2011-07-13:

Initialization of static variables

datadump:c:c++
I've never given much thought to the initialization of a local variable with static storage in C++ (and C). I just blindly assumed that a static variable works identically to a global variable, but is directly accessible (using language-provided means) only in the block of code (and its child blocks) in which it was declared/defined. This is partly true. The big difference is that a global variable is initialized either at compilation time (constant/zeroed) or before the entry point, whereas a static variable is initialized either at compilation time (constant/zeroed) or when execution first reaches its declaration/definition. The interesting questions here are "how does the variable know if it has been initialized?", "can initialization fail and need to be rerun?", and "what about multi-threading?" (the last one has some minor stability/security consequences). Let's take a look at GCC and Microsoft Visual C++ and how they handle these issues...

Quick note:
I'm using MinGW GCC version 4.5.2 and CL (aka Microsoft C/C++ Optimizing Compiler) version 16.00.30319.01, both 32-bit versions.
The tests were run on Windows 7 on a 4-core PC.
Also, by writing "static variable" I mean "local variable with static storage".

Initialization of static variables


There are three cases here:

   1. (C/C++) Uninitialized variable, e.g.: static int x;
   2. (C/C++) Variable initialized with a constant, e.g.: static int y = 5;
   3. ( C++ ) Variable initialized with a non-constant expression, e.g.: static int z = func();

The first is simple - since a static variable is placed in the data section (same as global variables) it will be initialized (probably at compilation time) with a default value for a given type (aka 0).

The second case can be implemented (compiler-level of course) in two different ways: either by placing '5' in the data section at compilation time, or by setting the variable to 5 at runtime (see also case 3).
As one can see below, both GCC and CL in the default compilation mode (I didn't check any other compilation options than default ones) place the constant in the data section (i.e. in the memory area reserved for that specific static variable in the data section) at compilation time:

GCC:
 .data
 .align 4
_x.1659:
 .long 5

CL:
_DATA  SEGMENT
?x@?1??main@@9@9 DD 05H          ; `main'::`2'::x
_DATA  ENDS

Btw, both the first (zero-initialization) and second (constant initialization) cases are called "static initialization" in the standard (I'm looking at a half-year-old Working Draft for C++: sections "3.6.2 Initialization of non-local variables", "3.7.1 Static storage duration" and "6.7 Declaration statement"). Speaking of the standard, let's see what it says about the first and second case:

The zero-initialization [...] of all block-scope variables with static storage duration [...] is performed before any other initialization takes place.
Constant initialization [...] of a block-scope entity with static storage duration, if applicable, is performed before its block is first entered.

Looks like both GCC and CL meet the standard requirements in these cases (since compile-time initialization is in fact done waaay before the "block is first entered").

The third case is a little more tricky (and available only in C++), since obviously the "func" function must be called, and it can be called at most once. Let's start with the said standard:

[...] Otherwise such a variable is initialized the first time control passes through its declaration;
such a variable is considered initialized upon the completion of its initialization.

It basically says that func() (in our example) will be called when execution first reaches the static int z = func(); line, which makes sense. And, after the function has executed successfully, the variable will be marked as "initialized".
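A minimal sketch of that behavior. The counter g_calls and the helpers get_z() and count_initializer_runs() are made up for illustration and are not part of the original test code:

```cpp
// Hypothetical counter: tracks how many times the initializer runs.
static int g_calls = 0;

int func() {
    ++g_calls;
    return 42;
}

int get_z() {
    // Dynamic initialization: func() runs the first time control
    // passes through this declaration, and not on later passes.
    static int z = func();
    return z;
}

// Calls get_z() n times and reports how often func() actually ran.
int count_initializer_runs(int n) {
    for (int i = 0; i < n; ++i) {
        get_z();
    }
    return g_calls;
}
```

Before the first call to get_z() the counter stays at 0, since nothing runs func() until execution actually reaches the declaration; after any number of calls it stays at 1.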

Cool. But how do you mark that a variable is initialized? Well... you need another variable - an "is initialized" variable that will be set once the static variable is initialized, and will be checked each time the execution reaches the static variable declaration/definition point.
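The image below shows exactly how this is done in CL:

[image not preserved: graph of the CL-generated initialization code, with the guard variable marked as guard_x]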

So, there is another 4-byte (native-word size?) variable in the data section (I've marked it as guard_x) that is either 0 (not initialized) or 1 (initialized). If it's 0, it's set to 1 and then the variable is initialized (please note that it marks the variable as initialized before the initialization is actually done - this is incorrect according to the draft I'm looking at).

OK, and how does GCC do it? Let's take a look at this code generated with -fdump-tree-original option:

static int x;
if ((signed char) *(char *) &_ZGVZ4mainE1x == 0)
{
 if (<<cleanup_point __cxa_guard_acquire (&_ZGVZ4mainE1x) != 0>>)
 {
   <<cleanup_point <<< Unknown tree: expr_stmt
   TARGET_EXPR <D.2549, 0>;, x = func ();;, D.2549 = 1;;, __cxa_guard_release (&_ZGVZ4mainE1x); >>>
   >>;
 }
}

So, _ZGVZ4mainE1x (marked green in the original listing) is the "is initialized" variable, and x = func () (marked yellow) is the initialization itself.
The additional (if compared to the CL version) things here are the __cxa_guard_acquire and __cxa_guard_release functions, which basically synchronize threaded access to the guard variable (mutex) - so the GCC version is thread safe (or, to be more precise, the initialization is thread safe; using the static variable still poses the same old thread-related problems as always), but more about that in a moment. The __cxa_guard_release is also responsible for setting the "is initialized" flag (so it's more like "set and release" than "release" itself).
The implementation of these functions can be found in the libstdc++-v3/libsupc++/guard.cc file in GCC source code.

To summarize this part: dynamic (runtime) initialization of a static variable requires an additional compiler-generated variable (think: space, usually native-word size) and code (think: CPU cycles) to actually work.
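To make the hidden cost visible, here is a hand-written approximation of what a compiler might emit for static int x = func();. This is only a sketch: the names x_storage and guard_x are illustrative, there is no thread synchronization, and the flag is set after the initializer finishes (the order the standard asks for, not the order seen in the CL output above).

```cpp
// Hypothetical initializer standing in for func() from the post.
static int g_calls = 0;
int func() { ++g_calls; return 7; }

// Storage the compiler would reserve for us (both start zeroed).
static int x_storage;         // the static variable itself
static unsigned int guard_x;  // hidden "is initialized" flag

int get_x() {
    // This check runs every time execution reaches the declaration.
    if ((guard_x & 1u) == 0) {
        x_storage = func();   // run the initializer
        guard_x |= 1u;        // mark as initialized only after success
    }
    return x_storage;
}
```

Every call to get_x() pays for the flag check, and the flag itself takes up space next to the variable.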

Can initialization fail?


Actually, it can:

If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration.
So, in case the initialization is aborted CL needs to clear the "is initialized" variable, and GCC needs just to release the mutex. In both cases this is done in the "unwind" procedures that are executed when an exception is hit.

In case of CL the following code is called:

mov     eax, guard_x
and     eax, 0FFFFFFFEh
mov     guard_x, eax
retn

In case of GCC the ___cxa_guard_abort function is called, which basically releases the mutex without setting the "is initialized" bit in the guard variable.

In both cases the initialization is executed again the next time execution reaches the static variable declaration/definition.
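The retry behavior can be observed with a small sketch like the one below. The initializer flaky_init() and the helpers around it are hypothetical, written only to trigger one failed attempt:

```cpp
#include <stdexcept>

// Hypothetical counter of initialization attempts.
static int g_attempts = 0;

// Hypothetical initializer: fails on the first attempt only.
int flaky_init() {
    if (++g_attempts == 1) {
        throw std::runtime_error("first attempt fails");
    }
    return 99;
}

int get_value() {
    // If flaky_init() throws, the variable is not marked as
    // initialized, so the next call tries again.
    static int value = flaky_init();
    return value;
}

// Returns true if the first call threw and the second one succeeded.
bool retry_after_failure() {
    bool first_threw = false;
    try {
        get_value();
    } catch (const std::runtime_error&) {
        first_threw = true;
    }
    return first_threw && get_value() == 99 && g_attempts == 2;
}
```

On a conforming compiler the first call to get_value() propagates the exception and the second call runs the initializer again, so g_attempts ends up at 2.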

Multi threading?


This is where things get interesting. Let's take a look at the standard first:

If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
And this is exactly how it's implemented in GCC. CL, for some reason (maybe my version of CL is too old, or my draft of the C++ standard is too new), does not implement the synchronization mechanism, which can lead to some interesting situations:

1. N (where N>1) threads reach the static variable definition at the exact same time.
In this case the initialization expression is executed N times instead of once. This can lead to e.g. a memory leak (think: a constructor for a static object allocates a buffer; each thread will call new/malloc, but only one pointer will be stored in the end).
I've done a simple experiment with a function declaring a static object of a class that has a constructor that increments a global variable. The code fired 4 threads that were supposed to try to enter this function at the same time (or something close to that). I've run this code 1100 times, and the results are:

- Initialization expression was executed exactly 1 time : 338 times
- Initialization expression was executed exactly 2 times: 648 times
- Initialization expression was executed exactly 3 times: 114 times
- Initialization expression was executed exactly 4 times: 0 times (which is correct)

Please note that these are actually the statistics for "can a couple of threads be fired at the almost-exact same time" - and yes, they can (if coded correctly).
The "window" here is actually quite small (take a look at the image of the CL-generated code above) - just a couple of cycles. This decreases the probability of this issue appearing in real-world apps (even if the bug exists in the code).

2. A thread starts executing the initialization expression, but it takes time.
Well, since the CL initialization code sets the "is initialized" flag before the initialization even begins, it is possible that another thread reaches the static variable declaration/definition, finds the "is initialized" flag set, and continues execution using the static variable, even though the initialization itself hasn't finished yet (please note that if the static variable is considered "init once, read only later", then programmer-provided access synchronization mechanisms are not required).
So yes, this may lead to a use-of-uninitialized-variable condition (actually it should be "initialized" with zeroes, and it can already be partly initialized, which is an interesting case too).

3. A thread starts executing the initialization, and fails.
This is actually very similar to the previous case - there is a small time window (it starts when the 'is initialized' flag is set, and ends when the flag is cleared in the unwind procedure of the internal exception handler) during which some other thread might jump in and think that the variable is initialized, though its initialization actually failed (what "failed" means is very case-dependent).
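The experiment from scenario 1 can be approximated with the sketch below. The original test code isn't shown in the post, so this is a reconstruction under assumptions: it uses C++11's std::thread and std::atomic (not available in the compilers tested here), and the names Counted, touch_static() and run_race() are made up for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Counts constructor runs; atomic so the count itself is race-free.
static std::atomic<int> g_ctor_calls{0};

struct Counted {
    Counted() { ++g_ctor_calls; }
};

void touch_static() {
    static Counted instance;  // the dynamic initialization under test
    (void)instance;
}

// Starts n threads that spin on a shared flag and then all call
// touch_static() as close to simultaneously as possible.
int run_race(int n) {
    std::atomic<bool> go{false};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&go] {
            while (!go.load()) { /* busy-wait for the start signal */ }
            touch_static();
        });
    }
    go.store(true);
    for (auto& t : threads) {
        t.join();
    }
    return g_ctor_calls.load();
}
```

On a compiler that implements the synchronization required by the standard, run_race(4) should report exactly one constructor call. On a compiler without it, as described for CL above, the count may be higher.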

That being said, I must note that imho the odds of this kind of bug being found in real-world code are rather small. Though it would be very interesting if such a bug led to e.g. arbitrary code execution or a similar effect (a stability issue is more probable imho).

The End.


I'll end this post with warning messages in the MSDN documentation of the "static" keyword:

Visual Studio .NET 2003: (no warning message found)
Visual Studio 2005: Assigning to a static local variable is not thread safe and is not recommended as a programming practice.
Visual Studio 2010/2008: Assigning a value to a static local variable in a multithreaded application is not thread safe and we do not recommend it as a programming practice.

Guess that includes "dynamic initialization of a static local variable is not thread safe" too. I wonder if CL is going to introduce the mutex-guarded initialization any time soon, or will they just stick with the warnings in MSDN.

And that's that.

Update: Some comments are also available at my Google+ stream.

Comments:

2011-07-14 11:18:52 = MeMeK
{
As for your 3rd case (aka static int z = func(); ) I can add that sth like:
int foo(int x)
{
static int a = foo(x-1);
return a;
}
is UB and causes infinite recursion despite presence of "is initialised" flag.
}
2011-07-14 11:59:53 = Gynvael Coldwind
{
@MeMeK
Yeah, that's an interesting case too.
I've seen it in the standard, but forgot to describe it.
I'll take a look later how GCC and CL react to it (unless someone does it before me).
}
2011-07-14 14:08:41 = Seba
{
The first violet blog i've ever seen.
}
2011-07-14 15:51:02 = MeMeK
{
@Gynvael Coldwind
I've checked how it looks in GCC 3.4.5, GCC 4.5.2 and CL 16.00.30319.01 for 80x86

CL 16.00.30319.01 for 80x86 (http://memek.vexillium.org/tmp/static_init_cl.png)
The algorithm is as follows:
1. Check the guard_x flag. If already initialised then go to 4.
2. Mark guard_x flag as initialised.
3. Call (recursive) function foo().
4. Store result.

GCC 3.4.5: (http://memek.vexillium.org/tmp/static_init_gcc_3_4_5.png)
The algorithm is as follows:
1. Check the guard_x flag. If already initialised then go to 4.
2. Call (recursive) function foo().
3. Mark guard_x flag as initialised.
4. Store result.

At this point it should be noted that steps 2 and 3 are swapped. That is why an app compiled with GCC 3.4.5 falls into infinite recursion and one compiled with CL does not.

GCC 4.5.2 is described in the post, and it behaves much like GCC 3.4.5 (plus it uses mutex synchronization): the "is initialised" flag is set after the function call, so it also falls into infinite recursion. But in this case a "__gnu_cxx::recursive_init_error" exception is thrown.
}
2011-07-14 17:33:46 = MeMeK
{
As I wrote at Gynvael's Google+ Stream:
"I think, that "init section" fits. In this case (GCC 4.5.2) we have critical section within "init section", which in my opinion is incorrect and still does not fully protect against race condition. I'll try to describe it in comment at your blog ;)"

A picture is worth a thousand words, so:
http://memek.vexillium.org/tmp/init_critical.png

As we can see, under some circumstances (timing) a static variable can be initialised twice (or more). Additionally guard flag will be set twice (or more).
The better way is "init section" within critical section design. It should be sth like:
1. Try to enter critical section.
2. Check guard flag.
3. Initialise static variable.
4. Leave critical section + mark guard flag as initialised.

Correct me if I am wrong.
Greetings.
}
2011-07-16 20:36:59 = MeMeK
{
I did some tests but had no luck. Perhaps 4 cores are not enough. It is extremely unlikely for this particular race condition to occur, but still... If someone has more than 4 cores on his/her board I can send the testing code.
Greetings!
}
