Let's start with an example:
printf("%.8x\n", atoi("12345678901234567890"));
printf("%.8x\n", atoi("-12345678901234567890"));
The above code outputs:
* msvcrt (up to msvcr71 inclusive)
eb1f0ad2
14e0f52e
* msvcrt (msvcr80 and newer)
7fffffff
80000000
* glibc
7fffffff
80000000
So, as one can see, in the older versions of msvcrt an overflow occurs, and we are shown only the 32 least significant bits of the full number. In the case of glibc and the newer msvcrt, instead of an overflow we get saturation - if the number in the provided string is lower than INT_MIN or higher than INT_MAX, we get INT_MIN or INT_MAX respectively (the variable "saturates" at these limits). A conceptual sketch of both behaviors is shown below.
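The sketch below is only an illustration of the two behaviors, not actual library code (note that the test number above doesn't even fit in a long long, so real implementations detect the overflow while converting digit by digit):

#include <limits.h>

/* Conceptual sketch only - not actual library code. */

/* Saturation: out-of-range values get clamped to INT_MIN/INT_MAX. */
static int to_int_saturate(long long v)
{
  if (v > INT_MAX) return INT_MAX;  /* 0x7fffffff */
  if (v < INT_MIN) return INT_MIN;  /* 0x80000000 */
  return (int)v;
}

/* Overflow: only the 32 least significant bits are kept (the final
 * unsigned->int conversion is implementation-defined, but on the
 * platforms discussed here it simply preserves the bit pattern). */
static int to_int_overflow(long long v)
{
  return (int)(unsigned int)v;
}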
This behavior differs between functions (atoi / [fs]scanf %i / strtol), versions, implementations, etc.
I (and a few of my readers) have run some tests (for the same numbers as in the example code above) on different versions/libraries/architectures, and the results are quite interesting:
name | arch | ver | atoi- | atoi+ | sscanf- | sscanf+ | strtol- | strtol+ |
---|---|---|---|---|---|---|---|---|
crtdll.dll | 32-bit | ? | OF | OF | OF | OF | Sat. | Sat. |
msvcrt.dll | 32-bit | 7.0.7600.16385 | OF | OF | OF | OF | Sat. | Sat. |
msvcrt.dll | 32-bit | 7.0.3790.3959 | OF | OF | OF | OF | Sat. | Sat. |
msvcrt20.dll | 32-bit | 2.12.0.0 | OF | OF | OF | OF | Sat. | Sat. |
msvcrt40.dll | 32-bit | 6.1.7600.16385 | OF | OF | OF | OF | Sat. | Sat. |
msvcr71.dll | 32-bit | 7.10.3052.4 | OF | OF | OF | OF | Sat. | Sat. |
msvcr80.dll | 32-bit | 8.0.50727.4053 | Sat. | Sat. | OF | OF | Sat. | Sat. |
msvcr90.dll | 32-bit | 9.0.30729.1 | Sat. | Sat. | OF | OF | Sat. | Sat. |
msvcr100.dll | 32-bit | 10.0.30319.1 | Sat. | Sat. | OF | OF | Sat. | Sat. |
GNU Lib. C | 32-bit | 2.7 | Sat. | Sat. | Sat. | Sat. | Sat. | Sat. |
GNU Lib. C | 32-bit | 2.7-10 | Sat. | Sat. | Sat. | Sat. | Sat. | Sat. |
GNU Lib. C | 32-bit | 2.11.1 | Sat. | Sat. | Sat. | Sat. | Sat. | Sat. |
GNU Lib. C | 64-bit | 2.9 | -1 | 0 | -1 | 0 | -1 | 0 |
GNU Lib. C | 64-bit | 2.11.1 | -1 | 0 | -1 | 0 | -1 | 0 |
GNU Lib. C | 64-bit | 2.11.2 | -1 | 0 | -1 | 0 | -1 | 0 |
OSX ? Lib. C | 64-bit | OSX 10.6.4 | -1 | 0 | -1 | 0 | -1 | 0 |
OSX ? Lib. C | 32-bit | OSX 10.5.8 | Sat. | Sat. | -1 | 0 | Sat. | Sat. |
Thanks to Zarul Shahrin and Unavoweda for running these tests on OSX.
Thanks for the remarks and shared results go to: djstrong, przemoc, ppkt, Rolek, faramir.
In the above table, strtol's result is always cast to an int.
The function strtol returns a long int, meaning a 32-bit value on 32-bit systems and a 64-bit value on 64-bit systems. In the case of glibc's atoi and [sf]scanf %i, which rely on strtol to do the actual conversion, the result is truncated to the 32 least significant bits. So, even though strtol returns 0x7fffffffffffffff and 0x8000000000000000 on 64-bit systems, these values get truncated to 0xffffffff and 0x00000000, which represent -1 and 0 in two's complement.
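To see the truncation in isolation (a minimal sketch; converting an out-of-range long to int is formally implementation-defined, but on the platforms in question it simply keeps the low 32 bits):

#include <stdio.h>
#include <limits.h>

int main(void)
{
  long pos = LONG_MAX;  /* 0x7fffffffffffffff on a typical 64-bit system */
  long neg = LONG_MIN;  /* 0x8000000000000000 on a typical 64-bit system */
  /* Keeping only the 32 least significant bits: */
  printf("%d\n", (int)pos);  /* 0xffffffff -> prints -1 */
  printf("%d\n", (int)neg);  /* 0x00000000 -> prints 0 */
  return 0;
}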
(Another thanks goes to Zarul here.) On 32-bit OSX, sscanf %i internally uses the strtoimax function to do the conversion. That function returns intmax_t, which on 32-bit x86 is defined as int64_t (i.e. long long). So, even though the architecture is 32-bit, the results in this case are similar to those of the 64-bit functions that rely on strtol.
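A minimal sketch of that mechanism (mimicking what sscanf %i reportedly does on a 32-bit build where intmax_t is 64-bit; this is an illustration, not the actual library code):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
  /* strtoimax converts into the 64-bit intmax_t and saturates there... */
  intmax_t big = strtoimax("12345678901234567890", NULL, 10);
  /* ...and storing the result in an int keeps only the low 32 bits:
   * 0x7fffffffffffffff -> 0xffffffff == -1 */
  int res = (int)big;
  printf("%.8x\n", (unsigned)res);
  return 0;
}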
It might be worth checking in which of the above cases ERANGE (see the quotes from the C99 draft below) is actually stored in the errno global variable.
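A simple way to check (per the C99 wording quoted below, strtol is required to set errno to ERANGE on out-of-range input; whether atoi and sscanf %i do the same is exactly what would need testing):

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main(void)
{
  errno = 0;  /* clear first - strtol only sets errno, it never clears it */
  long res = strtol("12345678901234567890", NULL, 10);
  printf("result: %lx, ERANGE set: %s\n",
         (unsigned long)res, errno == ERANGE ? "yes" : "no");
  return 0;
}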
In a comment on the Polish side of the mirror, Rolek ran a test (and shared the results, thanks ;>), inter alia for the atoi/atol/wtoi/wtol functions, using "1234567890123456789012345678901234567890" and "-1234567890123456789012345678901234567890" as test strings (or their wide-char versions in the case of wto[li]). In the case of crtdll.dll (version ?) and msvcrt20.dll (version ?), the results differed between the atoi/atol and wtoi/wtol pairs:
* crtdll.dll and msvcrt20.dll
atoi 0xCE3F0AD2 0x31C0F52E (OF)
atol 0xCE3F0AD2 0x31C0F52E (OF)
wtoi 0xEB1F0AD2 0x82167EEB (truncate(20) → OF)
wtol 0xEB1F0AD2 0x82167EEB (truncate(20) → OF)
Where does this difference come from? Well, it looks like the authors of the wto[li] functions in the said libraries took a little shortcut in their implementations. Instead of writing a proper wide-string→int conversion function, they implemented wto[li] as a wrapper around atol. The wrapper works like this: first it converts the wide-char string to ASCII (using WideCharToMultiByte with the char limit set to 20; that's at most 20 digits for positive numbers, or a minus sign and at most 19 digits for negative ones), and then it calls atol. So the string gets truncated during the conversion, hence the difference in the returned results.
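For illustration, such a shortcut might have looked roughly like this (a hypothetical reconstruction based on the behavior described above, not the actual crtdll/msvcrt20 source; my_wtol is a made-up name):

#include <windows.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical reconstruction - NOT the actual msvcrt code. */
static long my_wtol(const wchar_t *s)
{
  char buf[21];
  size_t len = wcslen(s);
  if (len > 20)
    len = 20;  /* here's the shortcut: everything past 20 chars is dropped */
  int n = WideCharToMultiByte(CP_ACP, 0, s, (int)len, buf, 20, NULL, NULL);
  buf[n] = '\0';
  return atol(buf);  /* hand the (possibly truncated) ASCII string to atol */
}

Note that the wtoi result for the positive 40-digit string, 0xEB1F0AD2, is exactly the atoi("12345678901234567890") result from the first example - i.e. the 40-digit string truncated to its first 20 chars.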
In the newer versions of msvcrt, wto[li] is no longer a wrapper, but a fully functional converter which, in addition to ASCII digits (U+0030 to U+0039), supports a whole bunch of different Unicode digits, e.g. Arabic-Indic digits (U+0660 to U+0669) or Fullwidth digits (U+FF10 to U+FF19).
printf("%i\n", _wtoi(L"\uFF11\u0663\u0C69\u17e7")); → 1337
"\uFF11\u0663\u0C69\u17e7" are the codes of 1٣౩៧
Let's end this post with a few quotes from a C99 standard draft:
atoi
The functions atof, atoi, atol, and atoll need not affect the value of the integer
expression errno on an error. If the value of the result cannot be represented, the
behavior is undefined.
sscanf %i
I didn't find any remark on what should happen in case the number cannot be represented as an integer. There are, however, some shy remarks suggesting it should behave the same as strtol does. But it's UB (undefined behavior) afaic.
strtol
If the correct value
is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN,
LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type
and sign of the value, if any), and the value of the macro ERANGE is stored in errno.
That's it for today :)
P.S. Btw, does anyone know how to check the libc version on OSX?
P.S.2. I've included the test below. If you would like to share your results, please include the libc version and the OS/architecture you've run the test on ;>
#include <stdio.h>
#include <stdlib.h>

/* Prints the conversion results as hex. Note that strtol returns a long,
 * so passing it to %x raises a format warning; in practice this shows the
 * low 32 bits of the result, which is what we want to see here anyway. */
int main(void)
{
  puts("atoi");
  printf(" %.8x\n", atoi("12345678901234567890"));
  printf(" %.8x\n", atoi("-12345678901234567890"));

  printf("sscanf %%i\n");
  { int res = 0; sscanf("12345678901234567890", "%i", &res); printf(" %.8x\n", res); }
  { int res = 0; sscanf("-12345678901234567890", "%i", &res); printf(" %.8x\n", res); }

  printf("strtol\n");
  printf(" %.8x\n", strtol("12345678901234567890", NULL, 10));
  printf(" %.8x\n", strtol("-12345678901234567890", NULL, 10));

  return 0;
}
P.S.3. It might be worth taking a look at the comments on the Polish side of the mirror (using a translator or sth).
Comments:
$ otool -L /usr/lib/libc.dylib
/usr/lib/libc.dylib:
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.1)
/usr/lib/system/libmathCommon.A.dylib (compatibility version 1.0.0, current version 315.0.0)
Mine is 125.2.1 (OSX 10.6.5). Note that /usr/lib/libc.dylib indirectly points to /usr/lib/libSystem.B.dylib.
And the test results:
$ gcc -o test test.c && ./test
bla.c: In function ‘main’:
bla.c:15: warning: format ‘%.8x’ expects type ‘unsigned int’, but argument 2 has type ‘long int’
bla.c:16: warning: format ‘%.8x’ expects type ‘unsigned int’, but argument 2 has type ‘long int’
atoi
ffffffff
00000000
sscanf %i
ffffffff
00000000
strtol
ffffffff
00000000
Now on x86:
atoi
7fffffff
80000000
sscanf %i
ffffffff
00000000
strtol
7fffffff
80000000
atoi
7fffffff
80000000
sscanf %i
eb1f0ad2
14e0f52e
strtol
7fffffff
80000000
So it is the same as the 32-bit version.