Let's start with the result of my analysis:
• PHP equal operator == reference table
• PHP operator== tests (PHP 5.4.10 32-bit)
• PHP operator== tests (PHP 5.4.3 32-bit)
• PHP operator== tests (PHP 5.3.6 64-bit)
• php_cmp_tests.zip (42 KB) - test set (see README file for info how to use)
• A list of OBJECT handler sets I've found in PHP 5.4.10
The rest of the post is structured as follows: first there is a short explanation of PHP type internals, then a short paragraph on how the == operator works at a high level, and lastly several cases that I found interesting.
TL;DR: feel free to skip the first two sections and go straight for the examples.
Two more notes before we begin:
1. I have one more item on my "TODO" list: proxy objects. So for the time being please assume that the reference table, the tests, and this post all lack information about this kind of object.
2. This post is solely about the "equal" operator ==. There is also an "identical" operator === which is not described in this post; however it should be the first choice when it comes to variable comparison in PHP.
PHP types internally
This section focuses solely on variable types from the equality operator's point of view. For a more general description of PHP variables, as well as a glimpse of the internal zval structure (the variable container in the engine), check out this article on php.net.
Internally PHP has exactly 8 variable types (and yes, you can use the == operator to compare any type with any other):
• NULL
• BOOL - Stores either a 0 or a 1 in a LONG's data field.
• LONG - For some reason it's called int in the PHP language, but in the engine itself it's called LONG, so I'm going to stick with that name. One interesting behavior is that if you create a LONG of any value and keep incrementing it, at some point (the LONG overflow point) the type is automatically switched to DOUBLE (I guess my Polish-speaking readers could compare it to these behaviors). An interesting inconsistency for a high level language is that the switch point differs between 32-bit and 64-bit PHP (32-bit INT_MAX and 64-bit LONG_MAX respectively).
• DOUBLE - Called float in the PHP language. It's a standard C/C++ double.
• STRING - 8-bit-per-character length+data ("pascal") string.
• ARRAY - Either an indexed or an associative (or both actually) array. The key/index must be either a STRING or a LONG.
• OBJECT - This type is used for PHP language objects, internal and extension-provided objects, and closures (yes, closures are objects in PHP).
• and RESOURCE - Described by the resource ID (actually it's a LONG number stored in the LONG's data field) and resource type.
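The LONG-to-DOUBLE switch mentioned in the list above is easy to observe from a script. A minimal sketch, relying only on the built-in PHP_INT_MAX constant (the variable names are mine):

```php
<?php
// PHP_INT_MAX is the largest value a LONG can hold on this build
// (it differs between 32-bit and 64-bit PHP).
$max = PHP_INT_MAX;

$next = $max;
$next++;                 // one step past the LONG overflow point

var_dump(is_int($max));    // bool(true)  - still a LONG
var_dump(is_float($next)); // bool(true)  - silently became a DOUBLE
```

Note that no warning or notice is raised at the switch; only the type changes.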
The OBJECT type is the most interesting one here. The behavior of an object in PHP is defined by a set of handlers (C/C++ functions) placed in the zend_object_handlers structure of a given object. From the == operator's point of view there are three important handlers:
• compare_objects - Used to compare two objects that have the same compare_objects handler. This handler is obligatory, even if it does nothing (e.g. always returns "not equal").
• cast_object - Used to cast the object to a given type (used by the == operator when one operand is an OBJECT and the other is neither an OBJECT nor NULL). This handler is optional.
• get - Used only by proxy objects (which, as said in the note at the beginning of this post, I am yet to analyze). This handler is optional.
(For further information on handlers please refer to this article and the php-src/Zend/zend_object* files.)
Now, in the standard PHP source package (this includes the default extensions) there are about 50 different handler sets (i.e. 50 different "built-in object classes"; a small number of them is briefly described on this manual page). Some examples:
• default_exception_handlers - used by the standard PHP Exception class (php-src/Zend/zend_exceptions.c).
• closure_handlers - used by the PHP closures (the Closure class; php-src/Zend/zend_closures.c).
• date_object_handlers_date - used by the DateTime class (php-src/ext/date/php_date.c).
• spl_filesystem_object_handlers - used by the FilesystemIterator class (php-src/ext/spl/spl_directory.c).
• and finally the most important set of handlers: std_object_handlers - the standard PHP language class/object handlers - all of the objects you create from classes defined in PHP scripts use this set of handlers.
As an example, let's look at how the std_object_handlers's compare_objects/cast_object/get functions work:
• cast_object (function zend_std_cast_object_tostring) - Works as follows:
‣ Cast to STRING - Call the __tostring PHP method. If the method is not present, return FAILURE (failed to cast). If the __tostring method returns a string, that string is returned as the result of the cast. If the method returns a non-string, a Catchable Error is raised (E_RECOVERABLE_ERROR) and an empty string "" is returned as the result of the cast. If the method throws an exception, a Fatal Error is raised (E_ERROR).
‣ Cast to BOOL - Always returns BOOL(true) as the result of the cast.
‣ Cast to LONG - Always returns LONG(1) and raises a Notice (E_NOTICE).
‣ Cast to DOUBLE - Always returns DOUBLE(1.0) and raises a Notice (E_NOTICE).
‣ It returns FAILURE (failed to cast) for all other types (RESOURCE, ARRAY).
• compare_objects (function zend_std_compare_objects) - If the objects' classes differ, return "not equal". Otherwise do a deep* recursive comparison of all properties using the equality == operator (compare_function to be exact).
• get - Not present (NULL).
* One of the things I've noticed is that the recursion is not limited in any way. This means that if there are two objects that hold a reference to either themselves or one another in a property, then comparing them will result in a PHP interpreter crash due to stack exhaustion. Please note that comparing two ARRAYs constructed in a similar manner would not crash the interpreter. Instead a Fatal Error "Nesting level too deep - recursive dependency?" (E_ERROR) would be raised. I've filed a bug for this (#63882).
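To make the compare_objects behavior of std_object_handlers a bit more tangible, here is a small sketch. The classes PointA and PointB are hypothetical and exist only for this illustration; the point is that the class check comes first, and that properties are then compared with ==, not ===.

```php
<?php
// Hypothetical classes, used only for illustration.
class PointA { public $x = 1; }
class PointB { public $x = 1; }

$p1 = new PointA;
$p2 = new PointA;
$p3 = new PointB;

$sameClass  = ($p1 == $p2);  // same class, equal properties
$otherClass = ($p1 == $p3);  // same properties, but different classes

$p2->x = "1";                // STRING "1" instead of LONG 1
$looseProps = ($p1 == $p2);  // properties are compared with ==

var_dump($sameClass);   // bool(true)
var_dump($otherClass);  // bool(false)
var_dump($looseProps);  // bool(true)
```

The last result means that every quirk of == described in the rest of this post also applies to each property of the compared objects.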
And I guess that's all you need to know about PHP types to understand the rest of this post.
Equality operator == from a high level perspective
The main mechanics of the equality operator are implemented in the compare_function in php-src/Zend/zend_operators.c, however many cases call other functions or use big macros (which then call other functions that use even more macros), so reading this isn't too pleasant.
The operator basically works in two steps:
1. If both operands are of a type that the compare_function knows how to compare they are compared. This behavior includes the following pairs of types (please note the equality operator is symmetrical so comparison of A vs B is the same as B vs A):
• LONG vs LONG
• LONG vs DOUBLE (+ symmetrical)
• DOUBLE vs DOUBLE
• ARRAY vs ARRAY
• NULL vs NULL
• NULL vs BOOL (+ symmetrical)
• NULL vs OBJECT (+ symmetrical)
• BOOL vs BOOL
• STRING vs STRING
• and OBJECT vs OBJECT
2. In case the pair of types is not on the above list the compare_function tries to cast the operands to either the type of the second operand (in case of OBJECTs with cast_object handler), cast to BOOL (in case the second type is either NULL or BOOL), or cast to either LONG or DOUBLE in most other cases. After the cast the compare_function is rerun.
See my PHP equal operator == reference table for details on each specific case.
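The two steps can be sketched with a handful of comparisons. This is only an illustration of the rules above, not a dump of what compare_function does internally; the array keys are just labels I made up.

```php
<?php
$results = array(
    // Step 1: pairs that compare_function handles directly.
    'LONG vs DOUBLE'   => (1 == 1.0),
    'NULL vs BOOL'     => (null == false),

    // Step 2: the other operand is BOOL, so the first one is cast to BOOL.
    'STRING vs BOOL'   => ("abc" == true),     // non-empty string casts to true
    '"0" vs BOOL'      => ("0" == false),      // "0" casts to false
    'ARRAY vs BOOL'    => (array() == false),  // empty array casts to false
);

foreach ($results as $label => $equal) {
    echo $label, ': ', var_export($equal, true), "\n";
}
```

Every line printed by this script ends with true.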
The weird behaviors
There are three classes of weird (unexpected / not intuitive in my opinion) behaviors: seemingly unequal operands giving an "are equal" result, the lack of transitivity (and other inconsistencies), and the crashes. Let's start with the first group and continue down the list.
STRING vs STRING
This is my favorite one. Personally I expected the equal operator == to just compare two strings and, if they are identical to the character, return "are equal". However, that's not the case!
The first thing the compare_function (or actually the zendi_smart_strcmp function) does is to try to convert both strings to either a LONG or a DOUBLE, skipping all the leading white chars and supporting both the scientific "e" notation and hexadecimal "0x" notation, and if the conversion succeeds, compare the numerical values (in case one is converted to a LONG and the other to a DOUBLE, the LONG one is cast to a DOUBLE).
This means that the following strings are equal as far as PHP is concerned (I had a "WTF" moment when I first saw this):
"1.00000000000000001" == "0.1e1" → bool(true)
"+1" == "0.1e1" → bool(true)
"1e0" == "0.1e1" → bool(true)
"-0e10" == "0" → bool(true)
"1000" == "0x3e8" → bool(true)
"1234" == " \t\r\n 1234" → bool(true)
Furthermore in older PHP versions (like 5.4.3) the following strings would also be considered equal (that's because the operands are converted to DOUBLEs and it's above the DOUBLE precision range):
"1234512345123451234512345" == "1234512345123451234512346" → bool(true)
So, comparing a string that doesn't match the other, containing a numerical value that doesn't match the other, still gave "equal" as a result (another "WTF" moment). A good thing is that this was fixed in newer PHPs (in 5.4.10 this gives "not equal").
I guess the PHP designers wanted to take advantage of PHP being weakly typed. However, starting with a LONG/DOUBLE cast in case of a two-string comparison is certainly counterintuitive to me. From another point of view I understand what the PHP devs wanted to achieve - a smart (as the name of the function suggests anyway) comparison of two strings where the comparator actually understands what the strings represent (which makes you wonder why "blue" is not equal to "#0000ff", or "awesome" is not equal to "barney stinson").
Btw, if you stick anything non-numeric at the end of the numerical value the result will be "not equal", e.g.: "1000" is not equal to "1000xyz". Also, it's still case sensitive so "asdf" vs "ASDF" are also not equal.
Suggestion: DO NOT USE the equality operator == for STRING vs STRING comparison unless you know EXACTLY what you're doing. Go with the === instead.
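A minimal sketch of the suggestion above. The helper strings_identical is a hypothetical name of mine, not a PHP built-in; it just wraps === with a type check. The built-in strcmp is shown as an alternative, since it compares bytes and never tries a numeric conversion.

```php
<?php
// Hypothetical helper: true only if both operands are strings
// and match character for character.
function strings_identical($a, $b) {
    return is_string($a) && is_string($b) && $a === $b;
}

// Pairs taken from the examples above.
$pairs = array(
    array("1e0", "0.1e1"),
    array("+1", "0.1e1"),
    array("1234", " \t\r\n 1234"),
);

foreach ($pairs as $pair) {
    list($a, $b) = $pair;
    var_dump(strings_identical($a, $b)); // bool(false)
    var_dump(strcmp($a, $b) === 0);      // bool(false)
}

var_dump(strings_identical("1234", "1234")); // bool(true)
```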
LONG vs STRING
There are three interesting cases here.
1. Any non-numerical string is equal to 0, e.g.:
0 == "any non-numerical string is equal to zero" → bool(true)
0 == "and I mean any!" → bool(true)
2. In STRING vs STRING if you appended anything at the end of the string (like "xyz" in "1000xyz") it would stop being equal to "1000". However in LONG vs STRING you can append anything at the end of a numerical string and it will still be considered "equal", e.g.:
5 == "5and anything else" → bool(true)
Sadly this is quite inconsistent with the STRING vs STRING behavior (and actually I would prefer it to return "not equal"; consider this example: 5 == "5 million" or 5 == "5K" - these are not equal from the high level perspective yet PHP claims they are).
3. In case of 64-bit PHP where LONG is a 64-bit field and can contain an edge value of 9223372036854775807 (i.e. is_int(9223372036854775807) → true) this value would match a couple of different numerical values (since the string gets cast to DOUBLE instead of a LONG anyways, forcing a cast from LONG to DOUBLE for the second operand before the comparison is made), e.g.:
9223372036854775807 == "9223372036854775806" → bool(false) (not yet equal, string still cast to LONG)
9223372036854775807 == "9223372036854775807" → bool(true) (equal, string still is cast to LONG)
9223372036854775807 == "9223372036854775808" → bool(true) (equal (sic!), string is cast to DOUBLE)
Be cautious when using this comparison.
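If what you actually want is "this string is exactly the decimal form of this integer", one cautious option is to compare the string forms with ===. The function int_matches_string below is a hypothetical helper of mine, and a deliberately strict one: it also rejects inputs such as "05" or " 5", which may or may not be what you need.

```php
<?php
// Hypothetical helper: true only if $s is exactly the decimal
// representation of the integer $n (no padding, no suffix).
function int_matches_string($n, $s) {
    return is_int($n) && is_string($s) && (string)$n === $s;
}

var_dump(int_matches_string(5, "5"));                         // bool(true)
var_dump(int_matches_string(5, "5and anything else"));        // bool(false)
var_dump(int_matches_string(0, "any non-numerical string"));  // bool(false)
var_dump(int_matches_string(5, " 5"));                        // bool(false)
```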
OBJECT vs OBJECT
I guess the biggest surprise for me was a PHP 5.4.3 (and older) behavior in which two totally different objects would match. Consider the following code:
date_default_timezone_set("Europe/Zurich");
class A {}
$a = new A;
$b = new DateTime;
var_dump($a == $b); // will echo bool(true)
The $a object is of class A, which has neither methods nor properties. The $b object is of an internal ("built-in") class DateTime that has its own set of properties, methods, and even its own comparison handler, different from the zend_std_compare_objects one. Yet, PHP 5.4.3 would claim these objects are equal.
To be totally honest I must add that this comparison would additionally raise two surprising E_NOTICEs: "Object of class [class name here] could not be converted to int". "Converted to int (LONG)? What? Where?" - well, actually in case the objects had different compare_objects handlers they would be cast to LONGs using either their own cast_object handler (the zend_std_cast_object_tostring would return LONG(1) + raise the said E_NOTICE) or, if that one is not set, using the default behavior, which is returning LONG(1). And of course LONG(1) is equal to LONG(1) so obviously the objects match.
Thankfully this got fixed and the newer PHP version I tested (5.4.10) returns "not equal".
OBJECT vs STRING
The good news here is that you can define your own __tostring method in your class that would be used in comparison. There are two quirks here however.
1. In case the object doesn't have a cast_object handler (e.g. the PDORow class doesn't - it's the object class you get when calling the PDOStatement::fetch with the PDO::FETCH_LAZY parameter) the OBJECT is cast to LONG(1) (default fallback behavior) and so the STRING operand will be converted to either LONG or DOUBLE as well - hence you might get some unexpected "equals". Consider the following code:
$r = new \PDO("sqlite::memory:");
$r = $r->query("SELECT sqlite_version()");
$r = $r->fetch(\PDO::FETCH_LAZY);
var_dump($r == "1"); // bool(true)
var_dump($r == " \t\r\n+0.01e2"); // bool(true)
It's quite strange for the default fallback to be "cast to LONG(1)". I think returning "not equal" would be a better choice.
2. You might make the __tostring method return a non-string. In such case a catchable error is raised. If you catch it and handle it, the outcome is identical to that of the __tostring just returning an empty string. So, the following comparison is considered "equal":
set_error_handler(function($errno, $errstr) { return true; });
class TestNonString {
    function __tostring() {
        return $this; // returning a non-string raises a catchable
                      // error, but the cast still yields an
                      // empty string ("")
    }
}
var_dump(new TestNonString == ""); // bool(true)
In my opinion this should be "not equal" - after all, the __tostring did fail and the error handler is not able to correct the mistake.
RESOURCE vs STRING
Consider the following code:
fopen('asdf','w')=="0.002e3" → bool(true) (in case fopen returns resource of ID #2)
This one actually makes some sense (in the PHP world that is) - the STRING is cast to DOUBLE, and the resource is cast to LONG (its resource ID is taken as LONG) and then to DOUBLE, which leads to a DOUBLE vs DOUBLE comparison.
Outside of PHP world comparing a "0.002e3" string with a file handle doesn't make any sense in the first place. I would vote for making this return "not equal" in every case.
A bonus here would be a RESOURCE vs OBJECT comparison, where the resource of ID #1 would return "equal" if compared with any standard object. This doesn't make sense even in PHP world.
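Since the resource ID depends on what the script has opened before, here is a sketch that doesn't hardcode it. It uses the php://memory stream wrapper so no file is created (that choice is mine, not part of the original test set), reads the ID with an (int) cast and then compares the resource against that number and its string form.

```php
<?php
// php://memory avoids touching the filesystem; any stream resource would do.
$fp = fopen('php://memory', 'w+');
$id = (int)$fp;                        // the resource ID, as a LONG

$equalsInt    = ($fp == $id);          // RESOURCE vs LONG
$equalsString = ($fp == (string)$id);  // RESOURCE vs numeric STRING

var_dump($equalsInt);    // bool(true)
var_dump($equalsString); // bool(true)
fclose($fp);
```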
Non-transitive equality cases and other inconsistencies
The equality operator is one of the things I would expect to be transitive (it's when A equals B, and B equals C, then that means that A equals C). However, in some cases it's not transitive. Two examples:
Case 1: BOOL and STRING
"00" == "0" → equal
"0" == false → equal
but
"00" == false → not equal
Case 2: NULL vs BOOL vs STRING
NULL == false → equal
false == "0" → equal
but
NULL == "0" → not equal
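Both cases can be checked with a few lines. This is just the two examples above written down as a script; the variable names are mine.

```php
<?php
// Case 1: BOOL and STRING
$case1 = array(
    '"00" == "0"'   => ("00" == "0"),
    '"0" == false'  => ("0" == false),
    '"00" == false' => ("00" == false),
);

// Case 2: NULL vs BOOL vs STRING
$case2 = array(
    'NULL == false' => (null == false),
    'false == "0"'  => (false == "0"),
    'NULL == "0"'   => (null == "0"),
);

foreach (array_merge($case1, $case2) as $expr => $equal) {
    echo $expr, ' → ', ($equal ? 'equal' : 'not equal'), "\n";
}
```

In each case the first two comparisons print "equal" and the third prints "not equal", which is exactly what a transitive operator would not allow.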
And a different type of inconsistency - ARRAY cast to LONG:
array() == 0 → not equal (compare_function cannot cast ARRAY to LONG)
array(1,2,3) == 0 → not equal
but
(int)array() == 0 → equal (but seems a cast is possible after all)
(int)array(1,2,3) == 1 → equal
And another one - LONG vs STRING and LONG explicitly cast to STRING:
1234 == "1234 asdf" → equal
but
(string)1234 == "1234 asdf" → not equal
Crashes
I've stumbled on two ways to crash the PHP interpreter using the == operator; both involve objects and an unbounded recursion attempt.
1. I've mentioned the first one in the "PHP types internally" section (bug #63882) - it's the case when two standard objects of the same class refer to themselves in their properties, and then are compared. E.g.:
class Test { public $x = 5; }
$testobj1 = new Test;
$testobj2 = new Test;
$testobj1->x = $testobj1;
$testobj2->x = $testobj2;
$testobj1 == $testobj2; // Crash (stack exhaustion)
2. While nested function calls in PHP are subject to memory limits (and are killed with an E_ERROR), it seems the same limit was not put in place when a function (method) is called implicitly. Consider the following code:
class test {
    function __tostring() {
        //return $this->__tostring(); // -- this would not crash
        return $this == "crash me"; // -- stack exhaustion crash
    }
}
new test == "plz die";
This is a known bug from at least 5.2.6 from 2008 (see #46754), but it's low severity so it made it to 5.4.10 unfixed.
The end (?)
And that's all of the notes I had prepared. I might write another post in the near future when I get the proxy objects worked out.
I guess good ending words are (to repeat myself): DO NOT USE the equal == operator unless you know EXACTLY what you're doing and what will happen in the background. The "identical" operator === is better in almost every case.
Especially, NEVER use the == operator for security sensitive compares.
P.S. I wonder if Python, Perl or Ruby have similarly weird equality operators.
UPDATE: Also take a look at these two links:
• At Least Python Got Equality Right at Xion's blog.
• PHP a fractal of bad design (thx to Przemysław Pawełczyk for this one).
Comments:
I took a look at PHP 5.3.5 (Windows platform) and its results were kinda surprising:
$test = array( is_nan(123.456), is_nan(NULL), is_nan(NAN) );
array_map("var_dump", $test);
$test = array( NAN === 123.321, NAN === 1, NAN === NULL, NAN === "whatever" );
array_map("var_dump", $test);
$test = array( NAN == 123.321, NAN == 1, NAN == NULL, NAN == "whatever" );
array_map("var_dump", $test);
$test = array( sqrt(-5) == log(-5), sqrt(-5) === log(-5), sqrt(-5) == "whatever" );
array_map("var_dump", $test);
T_IS_EQUAL:
NAN compared to anything returns TRUE
T_IS_IDENTICAL:
NAN compared to float returns TRUE
AFAIR, the same code running under PHP 5.2.* gives completely different results (I'm not able to confirm that right now).
Hmm, I've run the above tests on 5.3.5-1ubuntu7.11, 5.3.20 (win32), 5.4.3 (win32) and 5.4.10 (win32), and all give the following results:
bool(false)
bool(false)
bool(true)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
bool(false)
Are you saying you're getting bool(true) everywhere for 5.3.5 (win32) ?
I had a couple of NaNs in my tests too (linked at the top of the post) and the only non IEEE 754 compliant result I've found was var_dump(true == NAN) returning bool(true).
That said, I wonder if that wouldn't be correct anyway. I mean, returning bool(true) for NaN compares with STRING, ARRAY, RESOURCE, BOOL, OBJECT and NULL. After all, these are Not a Numbers.
1. NAN compared to anything returns TRUE (`==` operator)
2. NAN compared to float returns TRUE (`===` operator)
Here's what I got:
boolean false
boolean false
boolean true
boolean true
boolean false
boolean false
boolean false
boolean true
boolean true
boolean true
boolean true
boolean true
boolean true
boolean true
I've done some googling and this problem is described here: https://bugs.php.net/bug.php?id=45712
It seems that this weird behaviour could be noticed only on Windows platform (someone put a comment, which suggests that NaN problem also affects PHP 5.3.3).
Well, I spotted that on my friend's laptop (which has PHP 5.3.5 installed on). However, I've tested it on my Linux platform today: (PHP 5.3.3; PHP 5.3.5; PHP 5.3.10 - 5.3.15) and the results are the same as yours.
root@monitor:~# cat p.php
<?php
if("1234" == "\t\r\n 1234")
echo 'equal';
else
echo 'not equal';
?>
root@monitor:~# php p.php
equal
PHP is strange;/