Debugging story: a crash while mapping a texture

Recently on my Polish livestreams I've been writing a somewhat simple raytracer (see screenshot on the right; source code; test scene by ufukufuk), with the intention of talking a bit on optimization, multithreading, distributed rendering, etc. As expected, there were a multitude of bugs on the way, some more visual than others. My favorite one so far was a mysterious buffer overflow resulting with a C++ exception being thrown when rendering in 4K UHD (3840x2160) but not in 1080p (1920x1080). While trying to find the root cause I also run into a standard C library bug with the sqrt function (though it turned out not to be related in the end), which made the run even more entertaining.

It all started with a thrown C++ exception:

.............terminate called after throwing an instance of 'std::out_of_range'
 what():  vector::_M_range_check: __n (which is 9223372036854775808) >= this->size() (which is 845824)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Well, I had to agree that 9223372036854775808 is larger-or-equal to 845824. I immediately knew where the exception was thrown, as there was only one place in the whole code base where I used the at() method of a std::vector which could throw this exception*.
* The difference between using some_vector[i] and some_vector.at(i) is that the prior doesn't do any range checking (at least in popular implementations) while the latter does and throws an exception when i points out of bounds.

V3D Texture::GetColorAt(double u, double v, double distance) const {
 (void)distance; // TODO(gynvael): Add mipmaps.

 u = fmod(u, 1.0);
 v = fmod(v, 1.0);
 if (u < 0.0) u += 1.0;
 if (v < 0.0) v += 1.0;

 // Flip the vertical.
 v = 1.0 - v;

 double x = u * (double)(width - 1);
 double y = v * (double)(height - 1);

 size_t base_x = (size_t)x;
 size_t base_y = (size_t)y;

 size_t coords[4][2] = {
   { base_x,
     base_y },
   { base_x + 1 == width ? base_x : base_x + 1,
     base_y },
   { base_x,
     base_y + 1 == height ? base_y : base_y + 1 },
   { base_x + 1 == width ? base_x : base_x + 1,
     base_y + 1 == height ? base_y : base_y + 1 }

 V3D c[4];
 for (int i = 0; i < 4; i++) {
   c[i] = colors.at(coords[i][0] + coords[i][1] * width);

I started by running the raytracer with a debugger attached and waited for it to trigger again - it took about 20 minutes to reach the state where the bug was triggered (I felt like I was debugging something that's located on Mars). However it seems that GDB doesn't break by default on C++ exceptions (TIL: you have to issue the catch throw command first):

...........terminate called after throwing an instance of 'std::out_of_range'
 what():  vector::_M_range_check: __n (which is 9223372036854775808) >= this->size() (which is 845824)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
[Thread 26080.0x7b58 exited with code 3]
[Thread 26080.0x7024 exited with code 3]
[Thread 26080.0x8648 exited with code 3]
[Thread 26080.0x84fc exited with code 3]
[Thread 26080.0x81e8 exited with code 3]
[Thread 26080.0x525c exited with code 3]
[Thread 26080.0x53a8 exited with code 3]
[Thread 26080.0x87d0 exited with code 3]
[Thread 26080.0x7de0 exited with code 3]
[Thread 26080.0x41b0 exited with code 3]
[Thread 26080.0x63e4 exited with code 3]
[Thread 26080.0x7e10 exited with code 3]
[Thread 26080.0x6c28 exited with code 3]
[Thread 26080.0x84dc exited with code 3]
[Thread 26080.0x7044 exited with code 3]
[Thread 26080.0x816c exited with code 3]
[Inferior 1 (process 26080) exited with code 03]
(gdb) where
No stack.

I decided to change the strategy and replace the at() method with the operator[] so that the C++ exception is not thrown, but instead a Windows exception (ACCESS_VIOLATION) is raised (which are caught by default by the debugger). A quiet voice in my head told me "that won't work if the value multiplied by the size of the element will overflow and fall within the range" - and sure enough, after 40 minutes of rendering I got a 4K image without any crashes. The reason was that the actual index - 9223372036854775808 (0x8000000000000000 hexadecimal) multiplied by the size of the vector element (12 bytes) overflows into the value 0, and index 0 is perfectly valid.

The next strategy change involved actually catching the exception in C++ (as in a try-catch block) and doing lots of debug prints on such an event. This (after another 20 minutes) yielded good results:

.....u v: nan nan
x y: nan nan
base x y: 9223372036854775808 9223372036854775808
 0: 9223372036854775808 9223372036854775808
 1: 9223372036854775809 9223372036854775808
 2: 9223372036854775808 9223372036854775809
 3: 9223372036854775809 9223372036854775809
w h: 826 1024

The problem seemed to be that u and v were NaN, which is a special IEEE-754 floating point value that has the habit of being propagated all around once some expression yields it as a result, i.e. any operation on NaN will yield NaN, with a small exception of sign change which produces -NaN.

One more exception is actually casting it to integers. The C++ standard is pretty clear here - it's undefined behavior. However, in low-level land there actually needs to be an effect and usually there is an explanation for that effect. I started looking into it by running a simple snippet like cout << (size_t)NAN which produced the value 0 (and not the expected 0x8000000000000000). After doing some more experiments and reading in Intel Manual (Indefinites and Results of Operations with NaN Operands or a NaN Result for SSE/SSE2/SSE3 Numeric Instructions sections) I figured out that the x86 assembly instructions themselves (both fist/fistp from the FPU group, as well as cvttsd2si/vcvttsd2si from the SSE/AVX group) return 0x8000000000000000 (called the indefinite value) in such case (a more general rule is: all bits apart from the most significant one are cleared); the #IA fault might be thrown as well, but it's almost always masked out. That said, since technically this is an UB, the compiler can do whatever it feels like and e.g. put a 0 as the result of a (size_t)NAN cast during compilation time. To quote our IRC bot (courtesy of KrzaQ):

<@Gynvael> cxx: { double x = NAN; cout << (size_t)x; }
<+cxx> 9223372036854775808
<Gynvael> cxx: { cout << (size_t)NAN; }
<+cxx> 0

Back to the original problem. Looking at the code, u and v actually come from outside of the function and are modified only in the first lines of the function:

V3D Texture::GetColorAt(double u, double v, double distance) const {
 u = fmod(u, 1.0);
 v = fmod(v, 1.0);
 if (u < 0.0) u += 1.0;
 if (v < 0.0) v += 1.0;

The fmod function doesn't seem to ever return NaN, so u and v arrived at the function already containing the NaN value. The function itself is called from the main shader and the interesting values are the result of a call to the V3D Triangle::GetUVW(const V3D& point) function. The function itself uses the barycentric interpolation method to calculate the UVW texel coordinates given three points making a triangle, UVW values for each of them and an "intersection" point on the triangle that the interpolated value is calculated for. It looks like this:

V3D Triangle::GetUVW(const V3D& point) const {
 V3D::basetype a = vertex[0].Distance(vertex[1]);
 V3D::basetype b = vertex[1].Distance(vertex[2]);
 V3D::basetype c = vertex[2].Distance(vertex[0]);

 V3D::basetype p0 = point.Distance(vertex[0]);
 V3D::basetype p1 = point.Distance(vertex[1]);
 V3D::basetype p2 = point.Distance(vertex[2]);

 V3D::basetype n0 = AreaOfTriangle(b, p2, p1);
 V3D::basetype n1 = AreaOfTriangle(c, p0, p2);
 V3D::basetype n2 = AreaOfTriangle(a, p1, p0);

 V3D::basetype n = n0 + n1 + n2;

 return (uvw[0] * n0 + uvw[1] * n1 + uvw[2] * n2) / n;

The barycentric interpolation method itself is actually pretty simple - the final result is the weighted average of the UVWs of all three points making the triangle, where the weight of the value is actually the area of the triangle made by the provided "intersection" point and the two triangle points that are not the point the weight is calculated for (see the slides liked above for a better explaination).

After adding some more debug prints and waiting another 20 minutes it turned out that the AreaOfTriangle returned the NaN value in some cases when the triangle in question was actually a line (i.e. one of the three points that make the triangle was located exactly on the edge between two other points). This led me to the AreaOfTriangle function itself:

static V3D::basetype AreaOfTriangle(
   V3D::basetype a, V3D::basetype b, V3D::basetype c) {
 V3D::basetype p = (a + b + c) / 2.0;
 V3D::basetype area_sqr = p * (p - a) * (p - b) * (p - c);

 return sqrt(area_sqr);

The function is rather simple - it uses the Heron's Formula to calculate the area given the length of all three edges of the triangle, by calculating half of the perimeter (variable p in my code), then multiplying with the difference between each length of the edge and p and then calculating the square root out of the result. The last bit is what caught my attention - as far as I knew sqrt() function would actually return NaN if a negative value was fed to it.

I started by verifying both in the documentation and experimentally for the value -1 (no, these are not complex numbers, it's not i). The cppreference.com said "a domain error occurs, an implementation-defined value is returned (NaN where supported)", cplusplus.com stopped at "a domain error occurs", MSDN mentioned "returns an indefinite NaN" and man for glibc agreed. The experimental part had actually much weirder results: on Ubuntu sqrt(-1) yielded the value -NaN (well, I guess -NaN is still technically a NaN), but on Windows I've got -1 (wait, what?). I tried on Windows with a couple other values and it turned out that sqrt is just returning the negatives as I pass them. Given that MSDN claimed that a NaN will be returned I launched IDA and found that the mingw-w64 GCC 7.0.1 compiler I'm using actually has it's own implementation of the sqrt function, which happened to suffer from a regression in the recent months. Oh well.

Wait. Why did I actually get NaN in my raytracer if there is such a bug? It turned out that I'm actually using -O3 when compiling my app and using -O3 causes a sqrt to actually work correctly (probably due to some random compiler optimizations) and return NaN (a positive one for a change).

This left the question of "why is the value passed to sqrt negative?". The answer is the one you would expect: floating point inaccuracies. I printed the exact (somewhat - i.e. in decimal form) lengths of the triangle edges that were passed to the function upon crash:


Given that one of the points lies on the edge between the two other, a should be exactly equal to b+c, however that wasn't the case:

a      =66.2214810193791976189459091983735561370849609375
b+c    =66.2214810193791834080911939963698387145996093750

This also means that one of the brackets in the area_sqr calculation will probably yield a negative value (even though 0 would be the correct result):

p  =66.22148101937918340809119399636983871459960937500

After the multiplications and the sqrt call we arrive at the infamous NaN, which solves the whole riddle.

The fix was rather easy - since I know that there is no such thing as "negative area", I can just check if area_sqr is negative and correct it to 0.0 in such a case:

 // It seems that due to floating point inaccuracies it's possible to get a
 // negative result here when we are dealing with a triangle having all points
 // on the same line (i.e. with a zero size area).
 if (area_sqr < 0.0) {
   return 0.0;

To sum up, the invalid index calculation was caused by float inaccuracies earlier on in the calculations, which caused an area of triangle to be a square root of a negative number. I must admit I really had fun debugging this bug, especially that I encountered both the interesting NaN-to-integer cast scenario, as well as the sqrt mingw-w64 bug on the way.

And that's about it.


2017-07-16 18:36:13 = agust1n
Nice analysis. Have you tried using `rr` (http://rr-project.org/)? Sounds like it would have been of help, even though your stuff seems to be working on windows.
2017-07-16 19:54:33 = Lech Walesa
Wouldnt asserts be faster than debug prints?
2017-07-16 20:16:26 = Gynvael Coldwind
@Lech Walesa
Not really - a print is a print. And it wasn't the prints which were slow - it was the rendering itself :)
2017-07-16 20:21:29 = Gynvael Coldwind
I didn't, it's an interesting idea. Though given that it emulates only a single-core machine, and my raytracer took the said 20 minutes to reach the problematic state using 16 threads (8 cores with hyperthreading) it's safe to say that it wouldn't save me probably any time.
Still, thanks for mentioning it - it would be really useful in less-calculation-heavy cases :)

Add a comment:

URL (optional):
Math captcha: 9 ∗ 2 + 1 =