PHP getimagesize internals (part 1)

The getimagesize function is, in my humble opinion of course, one of the most interesting functions of the standard PHP library (yes, the standard library, even though its documentation is placed among the GD extension functions). Why is it so interesting? Firstly, its implementation is long, and as everyone knows, long code = many opportunities to make minor or major mistakes. Secondly, the function is commonly misused by PHP coders, which introduces interesting bugs into PHP code.

First, a little theoretical introduction (since not everybody might be familiar with this function; if you are, scroll down to just after the array(7) dump).

The getimagesize function, implemented in ext/standard/image.c, "will determine the size of any given image file and return the dimensions along with the file type and a height/width text string to be used inside a normal HTML <IMG> tag and the correspondant HTTP content type" (from PHP docs). The function prototype looks like this:

array getimagesize ( string $filename  [, array &$imageinfo  ] )

The first parameter is obvious, while the second is optional and is used to return certain additional information about JPEG files (as far as I remember, it returns the APP markers, e.g. APP13, in an associative array).
As one can see, the function returns an array, which looks like this:

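array(7) {
 [0]=> int(WIDTH)
 [1]=> int(HEIGHT)
 [2]=> int(IMAGE_TYPE)
 [3]=> string(24) "width="WIDTH" height="HEIGHT""
 ["bits"]=> int(BITS)
 ["channels"]=> int(CHANNELS)
 ["mime"]=> string(9) "MIME_TYPE"
}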

Here is where the inconsistencies start. The documentation states:

Returns an array with 7 elements.
For some image types, the presence of channels and bits values can be a bit confusing. As an example, GIF always uses 3 channels per pixel, but the number of bits per pixel cannot be calculated for an animated GIF with a global color table.

On failure, FALSE is returned.

However, in the getimagesize source code (I use the 5.3.0 version) one can find:

if (result->bits != 0) {
 add_assoc_long(return_value, "bits", result->bits);
}
if (result->channels != 0) {
 add_assoc_long(return_value, "channels", result->channels);
}

As one can see, the bits and channels fields are clearly optional: they are added to the returned array only when their value is non-zero. So, a PHP coder who reads them unconditionally might get unexpected E_NOTICE messages for certain types of images (of course, these are unlikely to be shown to the user, since notices are turned off by default; however, they will flood the logs). For example:

// $name - the name of the file
$arr = @getimagesize($name);
echo 'bits    : ' . $arr['bits'] . "<br/>\n";
echo 'channels: ' . $arr['channels'] . "<br/>\n";

If the above code is given an image for which bits and channels are 0 (so the fields are never added), it will emit the following messages:

Notice: Undefined index: bits in /.../test2.php on line 4
bits    :

Notice: Undefined index: channels in /.../test2.php on line 5

So, the first piece of advice to PHP coders: before referring to the bits or channels field, check whether it exists (isset or array_key_exists). Just in case, of course :)
(Ah, knowing life, some of you will say "hey, why the heck are you worried about some turned-off-by-default messages?". Well, one should always focus on handling errors properly; you should not fix bugs by concealing the error/warning/notice messages ;>)

Since we're already on the subject of warnings and notices... the documentation states:

If accessing the filename image is impossible, or if it isn't a valid picture, getimagesize() will generate an error of level E_WARNING. On read error, getimagesize() will generate an error of level E_NOTICE.

So, it looks like the function, apart from returning FALSE, also issues an E_WARNING or E_NOTICE in some cases. One could use set_error_handler to get more information about the error from the message at execution time or, if one is not interested in the additional information, just put @ in front of the function call (yes, here you can conceal the message, since it's just additional information, nothing more).

Since this post is about the internals, let's check which input (image data) causes these warnings/notices to show up. Let's start with the E_NOTICE messages (as one can see in the function's changelog, these messages were E_WARNING until PHP 5.2.3, when they were downgraded to E_NOTICE):

1. The preliminary checking of file type using the php_getimagetype internal function (1):
if((php_stream_read(stream, filetype, 3)) != 3) {
 php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Read error!");
 return IMAGE_FILETYPE_UNKNOWN;
}

The code is placed at the very beginning of the function. As one can see, to trigger the notice one just has to provide a stream shorter than 3 bytes (I recommend the data:// stream for such experiments), for example:
$data = "Hi";
getimagesize("data://text/plain;base64," . base64_encode($data));

This generated the following message (the (1) is my addition; I modified the notice messages a little to be sure my data streams reach the proper error):
Notice: getimagesize(): Read error! (1) in /.../test.php on line 10
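Several of the payloads below contain bytes that are awkward to type (the JP2 signature even contains NUL bytes), so it can help to build the data:// URL outside PHP. Here's a minimal C sketch that produces the same "data://text/plain;base64,..." string as the PHP snippet above from an arbitrary byte buffer. The helper name make_data_url is hypothetical; it uses standard base64 with '=' padding and takes an explicit length so embedded NULs survive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build "data://text/plain;base64,<payload>" for an arbitrary byte buffer.
 * The explicit length lets payloads contain NUL bytes. Caller frees. */
char *make_data_url(const unsigned char *buf, size_t len)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    static const char prefix[] = "data://text/plain;base64,";
    const size_t prefix_len = sizeof(prefix) - 1;
    const size_t enc_len = 4 * ((len + 2) / 3);   /* padded base64 length */
    char *out, *p;
    size_t i;

    assert(buf != NULL || len == 0);
    out = malloc(prefix_len + enc_len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, prefix, prefix_len);
    p = out + prefix_len;

    for (i = 0; i < len; i += 3) {
        /* Pack up to three input bytes into one 24-bit group. */
        unsigned long group = (unsigned long)buf[i] << 16;
        if (i + 1 < len)
            group |= (unsigned long)buf[i + 1] << 8;
        if (i + 2 < len)
            group |= buf[i + 2];

        /* Emit four 6-bit digits, padding with '=' for missing bytes. */
        *p++ = alphabet[(group >> 18) & 0x3f];
        *p++ = alphabet[(group >> 12) & 0x3f];
        *p++ = (i + 1 < len) ? alphabet[(group >> 6) & 0x3f] : '=';
        *p++ = (i + 2 < len) ? alphabet[group & 0x3f] : '=';
    }
    *p = '\0';
    return out;
}
```

For example, make_data_url((const unsigned char *)"Hi", 2) yields data://text/plain;base64,SGk=, the same string the PHP code builds with base64_encode("Hi").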

2. The preliminary checking of file type using the php_getimagetype internal function (2):
} else if (!memcmp(filetype, php_sig_png, 3)) {
 if (php_stream_read(stream, filetype+3, 5) != 5) {
   php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Read error!");
   return IMAGE_FILETYPE_UNKNOWN;
 }

This comes shortly after the previous check. In this case the stream has to start with the first 3 bytes of the PNG signature (89 50 4E), and must be shorter than 8 bytes in total. For example:
$data = "\x89\x50\x4eHi!";
The message:
Notice: getimagesize(): Read error! (2) in /.../test.php on line 10

3. The preliminary checking of file type using the php_getimagetype internal function (3):
if (php_stream_read(stream, filetype+3, 1) != 1) {
 php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Read error!");
 return IMAGE_FILETYPE_UNKNOWN;
}

The above code is reached if no 3-byte signature matches (nor the 2-byte BMP signature "BM"), and 4-byte signatures must be checked (hence another byte is read from the stream). So, the stream must contain exactly 3 bytes that do not match any of these signatures, and nothing more. For example:
$data = "ABC";
The message:
Notice: getimagesize(): Read error! (3) in /.../test.php on line 10

4. The preliminary checking of file type using the php_getimagetype internal function (4):
if (php_stream_read(stream, filetype+4, 8) != 8) {
 php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Read error!");
 return IMAGE_FILETYPE_UNKNOWN;
}

Yeeees, it's still the same function. There are no more 4-byte signatures to check, so the 12-byte signature must be checked next; hence an additional 8 bytes are read. So, to generate the notice one has to pass a stream longer than 3 bytes (at least 4) but shorter than 12 bytes, which of course must not match any 3-byte or 4-byte signature. For example:
$data = "ABCDEF";
This generates the following message:
Notice: getimagesize(): Read error! (4) in /.../test.php on line 10
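The four notices above all come from the staged reads in php_getimagetype (3 bytes, then 1 more, then 8 more). Here's a small C model of that sequence that treats the stream as an in-memory buffer whose reads fail only when the data runs out, and reports which stage an input reaches. The signature list is my reading of the 5.3 sources, so treat it as an assumption; getimagetype_stage and the enum names are hypothetical. A match on the PNG prefix is reported as MATCH_3BYTE here, even though the PNG branch then does its own 5-byte read (that is notice (2)).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* How far a buffer gets through the staged reads of php_getimagetype().
 * READ_ERROR_n follows the "(n)" tags used in this post. */
enum stage {
    READ_ERROR_1,   /* first 3-byte read is short */
    MATCH_3BYTE,    /* GIF, JPEG, PNG prefix, SWF, SWC, PSD, JPC, or "BM" */
    READ_ERROR_3,   /* extra 1-byte read is short */
    MATCH_4BYTE,    /* TIFF (II/MM), IFF, or ICO */
    READ_ERROR_4,   /* extra 8-byte read is short */
    READ_12_BYTES   /* 12 bytes available: JP2/WBMP/XBM checks come next */
};

enum stage getimagetype_stage(const unsigned char *buf, size_t len)
{
    static const char *const sig3[] = {
        "GIF", "\xff\xd8\xff", "\x89\x50\x4e", "FWS", "CWS", "8BP", "\xff\x4f\xff"
    };
    static const char *const sig4[] = {
        "II\x2a\x00", "MM\x00\x2a", "FORM", "\x00\x00\x01\x00"
    };
    size_t i;

    assert(buf != NULL || len == 0);
    if (len < 3)
        return READ_ERROR_1;
    for (i = 0; i < sizeof(sig3) / sizeof(sig3[0]); i++)
        if (memcmp(buf, sig3[i], 3) == 0)
            return MATCH_3BYTE;
    if (memcmp(buf, "BM", 2) == 0)      /* BMP is matched on 2 bytes only */
        return MATCH_3BYTE;
    if (len < 4)
        return READ_ERROR_3;
    for (i = 0; i < sizeof(sig4) / sizeof(sig4[0]); i++)
        if (memcmp(buf, sig4[i], 4) == 0)
            return MATCH_4BYTE;
    if (len < 12)
        return READ_ERROR_4;
    return READ_12_BYTES;
}
```

Running the example strings through it gives READ_ERROR_1 for "Hi", READ_ERROR_3 for "ABC" and READ_ERROR_4 for "ABCDEF", matching the notices above.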

5. There is another notice whose appearance depends on the PHP compilation options:
#if HAVE_ZLIB && !defined(COMPILE_DL_ZLIB)
 result = php_handle_swc(stream TSRMLS_CC);
#else
 php_error_docref(NULL TSRMLS_CC, E_NOTICE, "The image is a compressed SWF file, but you do not have a static version of the zlib extension enabled");
#endif

If zlib is compiled statically into PHP (not as a shared extension), then SWC (compressed SWF) files are supported. Otherwise, a notice is generated. So, a test stream can consist of just the SWC signature:
$data = "CWS";
The message:
Notice: getimagesize(): The image is a compressed SWF file, but you do not have a static version of the zlib extension enabled in /.../test.php on line 11

Now let's check how to generate warnings! The warnings might be more useful, since many hosting services have warning display turned on, so a tester/pentester could learn something about paths etc. if he could make the PHP application issue a warning.

1. The php_handle_jpc function, checking if the first marker (chunk id) is JPEG2000_MARKER_SIZ aka 0x51:
if (first_marker_id != JPEG2000_MARKER_SIZ) {
 php_error_docref(NULL TSRMLS_CC, E_WARNING, "JPEG2000 codestream corrupt(Expected SIZ marker not found after SOC)");
 return NULL;
}

To reach this code, the input must make php_getimagetype return IMAGE_FILETYPE_JPC (to do this, the first 3 bytes must be FF 4F FF), and the next byte (if it exists at all) must be different from 0x51.
$data = "\xff\x4f\xffHi!";
The message:
Warning: getimagesize(): JPEG2000 codestream corrupt(Expected SIZ marker not found after SOC) in /.../test.php on line 11
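Why does "if it exists at all" work? As far as I can tell from the 5.3 sources, the byte after FF 4F FF is fetched with php_stream_getc, which returns EOF when the stream ends, and EOF can never equal 0x51. The C sketch below models just that path on an in-memory buffer; jpc_siz_warning is a hypothetical name, and storing the byte in an unsigned char is an assumption about the original declaration (with EOF kept as -1 the result would be the same).

```c
#include <assert.h>
#include <stdio.h>   /* EOF */
#include <string.h>

#define JPEG2000_MARKER_SIZ 0x51

/* Returns 1 if the "Expected SIZ marker not found after SOC" warning
 * would fire for this buffer, 0 otherwise (simplified model). */
int jpc_siz_warning(const unsigned char *buf, size_t len)
{
    unsigned char first_marker_id;
    int c;

    assert(buf != NULL || len == 0);
    /* php_getimagetype() must first classify the input as JPC (FF 4F FF). */
    if (len < 3 || memcmp(buf, "\xff\x4f\xff", 3) != 0)
        return 0;

    /* Like php_stream_getc(): the next byte, or EOF when the data ends. */
    c = (len > 3) ? buf[3] : EOF;

    /* Truncated to unsigned char, EOF (-1) becomes 0xFF, never 0x51. */
    first_marker_id = (unsigned char)c;
    return first_marker_id != JPEG2000_MARKER_SIZ;
}
```

So both "\xff\x4f\xffHi!" and the bare 3-byte signature trigger the warning, while FF 4F FF 51 gets past this check.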

2. php_handle_jp2 function, checking if it was possible to acquire the size of the image (the end of the function):
if (result == NULL) {
 php_error_docref(NULL TSRMLS_CC, E_WARNING, "JP2 file has no codestreams at root level");
}

To reach this place, one has to start with a proper JP2 signature, 00 00 00 0c 6a 50 20 20 0d 0a 87 0a (12 bytes), and that's it; nothing more is required.
$data = "\x00\x00\x00\x0c\x6a\x50\x20\x20\x0d\x0a\x87\x0a";
The warning message:
Warning: getimagesize(): JP2 file has no codestreams at root level in /.../test.php on line 11
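Why is the bare signature enough? php_handle_jp2 walks the root-level boxes (each starting with a 4-byte big-endian length and a 4-byte type) looking for a contiguous codestream box, "jp2c"; if the data ends first, result stays NULL and the warning fires. Below is a simplified C model of that walk on an in-memory buffer. jp2_has_codestream is a hypothetical name, and edge cases such as extended-length (XLBox) boxes, which the real function treats differently, are deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of the root-level box scan in php_handle_jp2().
 * Returns 1 if a "jp2c" box follows the 12-byte JP2 signature, 0 if the
 * scan runs out of data (or stops) without finding one. */
int jp2_has_codestream(const unsigned char *buf, size_t len)
{
    size_t pos = 12;                      /* skip the JP2 signature box */

    assert(buf != NULL || len == 0);
    while (pos + 8 <= len) {
        /* Box header: 4-byte big-endian length (LBox), 4-byte type (TBox). */
        unsigned long box_len = ((unsigned long)buf[pos] << 24) |
                                ((unsigned long)buf[pos + 1] << 16) |
                                ((unsigned long)buf[pos + 2] << 8) |
                                (unsigned long)buf[pos + 3];

        if (memcmp(buf + pos + 4, "jp2c", 4) == 0)
            return 1;
        /* Stop on lengths this sketch does not follow (0, 1, or < 8)
         * and on boxes that claim to extend past the data. */
        if (box_len < 8 || box_len > len - pos)
            return 0;
        pos += box_len;
    }
    return 0;                             /* out of data: no codestream */
}
```

With only the 12-byte signature the loop body never runs, which matches the warning shown above.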

3. Preliminary image type check in php_getimagetype and a broken PNG signature:
} else if (!memcmp(filetype, php_sig_png, 3)) {
 if (php_stream_read(stream, filetype+3, 5) != 5) {
   php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Read error! (2)");
   return IMAGE_FILETYPE_UNKNOWN;
 }
 if (!memcmp(filetype, php_sig_png, 8)) {
   return IMAGE_FILETYPE_PNG;
 } else {
   php_error_docref(NULL TSRMLS_CC, E_WARNING, "PNG file corrupted by ASCII conversion");
   return IMAGE_FILETYPE_UNKNOWN;
 }

The stream must have at least 8 bytes, where the first 3 bytes must be the beginning of the PNG signature (89 50 4E), and the rest can be arbitrary, as long as the first 8 bytes do not form the full PNG signature:
$data = "\x89\x50\x4eALAMAKOTA";
The message:
Warning: getimagesize(): PNG file corrupted by ASCII conversion in /.../test.php on line 11
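The PNG branch thus yields either notice (2) or this warning, depending only on the length and the first 8 bytes. Here's a minimal C model of the branch; png_branch and the enum names are hypothetical, and the 8-byte signature is the standard PNG one (89 50 4E 47 0D 0A 1A 0A).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum png_result { NOT_PNG_PREFIX, PNG_READ_ERROR, PNG_ASCII_WARNING, PNG_OK };

/* Model of the PNG branch of php_getimagetype(): after the 3-byte prefix
 * matches, 5 more bytes are read and the full 8-byte signature compared. */
enum png_result png_branch(const unsigned char *buf, size_t len)
{
    static const unsigned char png_sig[8] =
        { 0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a };

    assert(buf != NULL || len == 0);
    if (len < 3 || memcmp(buf, png_sig, 3) != 0)
        return NOT_PNG_PREFIX;     /* other branches handle this input */
    if (len < 8)
        return PNG_READ_ERROR;     /* E_NOTICE "Read error!" (2) */
    if (memcmp(buf, png_sig, 8) != 0)
        return PNG_ASCII_WARNING;  /* E_WARNING "PNG file corrupted by ASCII conversion" */
    return PNG_OK;
}
```

One C-specific trap when porting the PHP test strings: "\x89\x50\x4eALAMAKOTA" must be written as "\x89\x50\x4e" "ALAMAKOTA", because a C hex escape keeps consuming hex digits and "A" is one.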

And that's it! There is no other way to make the image-parsing code of getimagesize generate a warning (apart from the stream layer failing to open the file in the first place). In all other cases the function just quietly returns FALSE, without any warning or notice.

I would advise (pen)testers to have a few warning/notice-generating "images" ready (yep, you can use the above strings), so they can test whether any warnings/notices are thrown that disclose valuable information (yes, the application path can be a very valuable piece of information, this I can say from my own experience).
Of course the best place to look for getimagesize being used is around image uploads (photos, avatars, etc).
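To have those "images" ready as actual files for an upload form, here's a small C sketch that writes every payload from this post into a directory. The file names and extensions are hypothetical (pick whatever the tested form accepts); lengths are taken with sizeof so the NUL bytes in the JP2 payload are written too.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct payload {
    const char *name;   /* hypothetical file name */
    const char *data;
    size_t len;
};

/* sizeof(lit) - 1 keeps embedded NULs; adjacent literals are concatenated
 * first, so split escapes like "\x4e" "ALAMAKOTA" are measured correctly. */
#define PAYLOAD(name, lit) { name, lit, sizeof(lit) - 1 }

static const struct payload payloads[] = {
    PAYLOAD("notice1.jpg", "Hi"),
    PAYLOAD("notice2.png", "\x89\x50\x4e" "Hi!"),
    PAYLOAD("notice3.jpg", "ABC"),
    PAYLOAD("notice4.jpg", "ABCDEF"),
    PAYLOAD("notice5_swc.jpg", "CWS"),
    PAYLOAD("warning1_jpc.jpg", "\xff\x4f\xff" "Hi!"),
    PAYLOAD("warning2_jp2.jpg", "\x00\x00\x00\x0c\x6a\x50\x20\x20\x0d\x0a\x87\x0a"),
    PAYLOAD("warning3_png.png", "\x89\x50\x4e" "ALAMAKOTA"),
};

/* Write every payload into dir (which must exist); returns files written. */
int write_payloads(const char *dir)
{
    char path[512];
    int written = 0;
    size_t i;

    assert(dir != NULL);
    for (i = 0; i < sizeof(payloads) / sizeof(payloads[0]); i++) {
        FILE *f;
        snprintf(path, sizeof(path), "%s/%s", dir, payloads[i].name);
        f = fopen(path, "wb");
        if (f == NULL)
            continue;
        if (fwrite(payloads[i].data, 1, payloads[i].len, f) == payloads[i].len)
            written++;
        fclose(f);
    }
    return written;
}
```

Note that notice (5) only shows up on builds without a static zlib, so an absent message there says little on its own.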

And that's all for today. Check out the second part of getimagesize internals in a few days :)


2010-10-12 17:44:07 = Brad
Great article! Very clearly outlines just what's going on under the hood of this function. I ran into the problem you mentioned where the 'bits' and 'channels' values don't get included in return array under certain circumstances and this really helped me out.

2010-10-13 00:36:50 = Gynvael Coldwind
Hey, thanks for the feedback. Glad you've found it useful :)
