2010-10-13:

Resolving macros in C/C++

easy:g++:c++:c
Recently I've been working on some C++ code that (ab)uses many language features in a deep way, and I found it necessary to do some digging to check whether a given behavior is guaranteed by the standard (i.e. it's defined in the language standard), defined by the compiler (i.e. it's described in the compiler documentation, GCC in this case, but not necessarily in the language standard), or plain UB (i.e. it isn't defined in any official documentation and cannot be relied on in any other version or compiler). So this post is basically a data dump about one feature (preprocessor macro resolution, to be exact), and seasoned programmers can probably skip it.

OK, let's get one question out of the way, since it's sure to appear (at least on the Polish side of the mirror):
Q: "Why are you still using the preprocessor when templates are available?"
A: Because I like preprocessors. And let's get this straight: templates are not a total replacement for the preprocessor, since they offer a different set of features.

OK, back to the main subject. The question is related to this piece of code:
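#include <cstdio>

int
main(void)
{
// No trailing semicolon inside the macros; the call sites supply it.
#define C(a) printf("C1: %s\n", a)
#define X(a) C(a)

X("before redefining C");

// Replace C while leaving X untouched.
#undef C
#define C(a) printf("C2: %s\n", a)

X("after redefining C");

return 0;
}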

The question is: will the second line of output start with C1 or with C2?
Note that there are two possible behaviors here:

1st possible behavior: The replacement list of a macro (the part that the macro name is expanded to) is expanded at definition time, so in the case of #define X(a) C(a) the C(a) would be expanded to printf("C1: %s\n", a), and then the pair [X(a) => printf("C1: %s\n", a)] would be placed in the preprocessor's translation table (or whatever that thing is called). This behavior would be quite fast, since only one expansion takes place where the macro name is used in the code.

2nd possible behavior: The replacement list of a macro is saved as is (i.e. [X(a) => C(a)]) in the preprocessor's translation table, and the (recursive? linear?) expansions take place where the macro name is used. This would be slower, especially if there are many expansions to be made, but it adds a certain level of flexibility, since you can change the "deeper" macro to something else (as in the code) and still keep the rest of the functionality of the top-level macro.

Personally, I would go for the 2nd behavior, just for the flexibility (I discussed this with furio yesterday, just before checking, and he also said the 2nd is more logical, so... both of us can't be wrong, right? ;p).

Let's check what the compiler (GCC) actually outputs:
$ g++-4.5.1 test.cpp
$ ./a.out
C1: before redefining C
C2: after redefining C


Right. So in the case of GCC it's the second behavior.
Now, is this UB, compiler-defined behavior, or defined in the language standard?

Let's start with GCC Preprocessor documentation (3.1 Object-like Macros):

When the preprocessor expands a macro name, the macro's expansion replaces the macro invocation, then the expansion is examined for more macros to expand.
And...
If the expansion of a macro contains its own name, either directly or via intermediate macros, it is not expanded again when the expansion is examined for more macros. This prevents infinite recursion.

(of course, the above quotes are related to expanding object-like macros, while my example had function-like macros, but one may (should?) assume that the macro expansion rules apply the same to both macro types)

So, this is definitely not UB! It's at least compiler-defined behavior. Let's look at a recent C++ draft:
16.3.4 Rescanning and further replacement [cpp.rescan]
1. After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. Then the resulting preprocessing token sequence is rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
2. If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.

And (16.3.1 Argument substitution)...
A parameter in the replacement list, unless preceded by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file;

Looks like the 2nd behavior is well defined in the standard.
The interesting part here is that # and ## processing takes place after argument substitution but before rescanning, so we can build macro names (just like variable names) with ##. Consider the following example:
#define A _my_name_is_A_
#define B _my_name_is_B_
#define AB _my_name_is_AB_
#define C(a,b) a##b
#define D(a,b) C(a,b)
C(A,B)
D(A,B)

In this example, C(A,B) and D(A,B) resolve to different token sequences, because C(A,B) is resolved in the following way:

C(A,B)
=> A##B (argument substitution)
=> AB (processing ##)
=> _my_name_is_AB_ (macro rescanning)
=> end

But, D(A,B) will result in a different expansion:

D(A,B)
=> C(_my_name_is_A_, _my_name_is_B_) (argument substitution, after A and B were fully macro-expanded)
=> _my_name_is_A_##_my_name_is_B_ (macro rescanning finds C; its argument substitution)
=> _my_name_is_A__my_name_is_B_ (processing ##)
=> end

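So, in the first case the arguments were never macro-expanded on their own, because they were operands of ##; only the pasted result AB got expanded during rescanning. In the second case the arguments were expanded before being substituted into D's replacement list, so C pasted the already-expanded names.
Let's check whether g++ -E really displays what I've described: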
$ g++-4.5.1 test2.cpp -E
# 1 "test2.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test2.cpp"
_my_name_is_AB_
_my_name_is_A__my_name_is_B_

And that's all I had in mind for this post. It's worth taking a second look at the C++ standard draft linked above, since there are some interesting cases of UB in the preprocessor. Maybe it would be worth checking how different compilers react to them? But that's a story for another post...

Comments:

2010-10-14 12:56:36 = Reg
{
I report that my Visual C++ 2010 gives same result in both cases you presented here.
}
2010-10-14 18:46:02 = Gynvael Coldwind
{
@Reg
Thanks for checking :)
}
