2009-03-10:

Automagical function list in C++

c++:medium:assembler:windows:linux:macosx
The story starts as usual. I was writing an application that generates some test files. The files were very similar in structure, so I factored out the common part into a function that creates the base of the file, and then wrote a few functions that each make a modification to this base before the file is written (file shaders, only on GF 15200 GTX! ;>). Of course, every modification function I wrote also had to be added to a list of functions in another part of the source file, and I added each 'shader' function I created to that list. After the 38th function I grew tired of this...

So a problem was formulated: who will add each and every function I create to that list, instead of me? Well, I can't hire an assistant, so I guess only the compiler is left.

Everything I write from this point on relates to GCC or, to be more precise, to the g++ compiler. It's very probable that the same can be done on other compilers by changing just a line or three in the macros shown below. I've tested the code on MinGW GCC 3.4.5 on Vista, GCC 4.1.2 20061115 on Debian on a Linux kernel, and GCC 4.0.1 (Apple Inc. build 5490) on Mac OS X (x86).

Two ideas came to my mind (UPDATE: but there are more ways to do it, as some commenters on the Polish side of the mirror said):
1) Use the function export mechanism, and parse the export table at run time (i.e. the EAT in PE, or similar mechanisms in ELF or Mach-O). This code will probably be universal between different compilers on a given OS, but one has to implement separate export parsing for each executable type (and it ain't 3 lines of code ;p, about 30-50 I would judge).
2) Use a certain property of the GNU assembler (GNU AS) that is used by GCC, and create a new section that will be filled with a pointer to each function at compile time. This will however work only on GCC compilers, or others that use GNU AS (are there such?). It should work with little or no changes on almost every OS where GCC is available (so far I've tested it on Windows x86, Linux @ x86 and ARM, and Mac OS X @ x86).

Well, I knew how to implement the first one, so for obvious reasons I've chosen the second one ;>
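
For comparison, here is a minimal sketch of one of the other possible ways, one that needs no assembler at all: every function gets a small static object whose constructor appends the function to a list at program start-up. This is an assumption about what "more ways" could look like, not necessarily what the commenters proposed, and all names in it (Registry, Registrar, REGISTERED_FUNC, run_all, kMaxFuncs) are made up for the illustration.

```cpp
#include <cstddef>
#include <string>

typedef void (*func_ptr)();

const std::size_t kMaxFuncs = 64;  // arbitrary capacity for this sketch

// NULL-terminated list; kept in a function-local static so that it is
// ready before any registrar object in any translation unit runs.
struct Registry {
  func_ptr funcs[kMaxFuncs + 1];
  std::size_t count;
};

inline Registry &registry() {
  static Registry r = {};  // zero-initialized: all entries NULL, count 0
  return r;
}

// Constructing one of these at namespace scope appends a function.
struct Registrar {
  explicit Registrar(func_ptr f) {
    Registry &r = registry();
    if (r.count < kMaxFuncs) r.funcs[r.count++] = f;
  }
};

// Declares the function, registers it, then opens its definition.
#define REGISTERED_FUNC(a) \
  void a(); \
  static Registrar registrar_##a(a); \
  void a()

static std::string trace;  // records which functions ran

REGISTERED_FUNC(first_func)  { trace += "first "; }
REGISTERED_FUNC(second_func) { trace += "second "; }

// Calls every registered function and returns the recorded trace.
std::string run_all() {
  trace.clear();
  for (func_ptr *p = registry().funcs; *p; ++p) (*p)();
  return trace;
}
```

The price of this variant is that the list is built at run time, during static initialization, instead of being laid out by the assembler and linker, and that the order across several source files is not defined.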

Everything comes down to three easy macros (they took me over an hour to write because of some "features" of "Linux" GCC):

1) LISTED_FUNC_LIST_START(a) - this macro starts/creates a new function list; its argument is the list name (or, to be more precise, the name of a to-be-created array of function pointers)
2) LISTED_FUNC(a) - this macro is used to define a new function; its argument is the function name (it declares that function in addition to adding it to the list)
3) LISTED_FUNC_LIST_END - marks the end of the functions that are to be included in the list; it basically terminates the list with a NULL entry

The first macro looks like this:

#define LISTED_FUNC_LIST_START(a) \
/* 0 */ __asm (".section .fnc" SECT_ATTR); \
/* 1 */ __asm (".globl " FUNC_UNDERSCORE #a); \
/* 2 */ __asm (FUNC_UNDERSCORE #a ":"); \
/* 3 */ extern func_ptr a[1]

Four things happen here. Line /* 0 */ declares a new section named .fnc. SECT_ATTR is an OS/platform-dependent set of section attributes. The example uses two sets of attributes; on Linux-based OSes it's defined the following way:
# define SECT_ATTR ",\"a\",@progbits"
The attribute "a" stands for "allocatable" - without this, the section won't be loaded into the process memory. The "@progbits" part means that there will be data in that section; however, it is optional (UPDATE: even less than optional, on GNU/Linux @ ARM it had to be removed).
As for Mac OS X and Windows:
# define SECT_ATTR ",\"dr\""
The "dr" attributes mean data and read-only.
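
The per-platform choice of attributes could be collected in one place. The sketch below is an assumption layered on top of the two cases described above: it adds a third branch for ARM, where GNU AS treats '@' as the start of a comment and the section type is written with '%' instead (which would explain why ",@progbits" had to go there). The ARM branch has not been tested as part of this article, and sect_attr() is a hypothetical helper that only makes the selected string visible.

```cpp
// Section attributes appended to ".section .fnc".
// The __arm__ branch is an assumption that goes beyond the two tested cases.
#if defined(__unix__) && defined(__arm__)
# define SECT_ATTR ",\"a\",%progbits"   /* '@' starts a comment in ARM GNU AS */
#elif defined(__unix__)
# define SECT_ATTR ",\"a\",@progbits"   /* ELF: allocatable, contains data */
#else
# define SECT_ATTR ",\"dr\""            /* Windows / Mac OS X: data, read-only */
#endif

// Hypothetical helper: returns the attribute string chosen at compile time.
inline const char *sect_attr() { return SECT_ATTR; }
```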
Let's get back to the LISTED_FUNC_LIST_START macro. Line /* 1 */ exports a global symbol for the function list (at the linker level). The FUNC_UNDERSCORE macro is "_" for Mac OS X and Windows, and an empty string "" for Linux-based OSes.
Line /* 2 */ declares a label to mark the place (the memory address) of the function list at the assembler level.
And the last line, /* 3 */, declares this function list at the C++ level. Of course a[1] in this case means "array with AT LEAST one element", not "array with EXACTLY one element" - this works because C++ does no bounds checking. Hmm, thanks to the extern keyword the [1] could be changed to [], but never mind, let's leave it as is ;>.
To sum up, this macro creates a section '.fnc' and declares an array of function pointers at the assembler, linker and C++ levels.
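
The remark about [1] versus [] can be illustrated without any assembler. In the sketch below the array is declared with an unknown bound and defined later in the same file; in the article the definition comes from the .fnc section instead. The names table, bump and walk are made up for this example.

```cpp
typedef void (*func_ptr)();

// Declaration only: an array of unknown bound. The compiler needs no size
// here because the list is walked until the NULL terminator.
extern func_ptr table[];

static int hits = 0;
void bump() { ++hits; }

// The definition; in the article this part is produced by the assembler
// directives rather than by C++ code.
func_ptr table[] = { bump, bump, bump, 0 };

// Calls every entry up to the terminator and returns how many were called.
int walk(func_ptr *list) {
  int n = 0;
  for (; *list; ++list) {
    (*list)();
    ++n;
  }
  return n;
}
```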

Time to explain the second macro, the one that's used to declare a single function:

#define LISTED_FUNC(a) \
/* 0 */ __asm (".section .fnc" SECT_ATTR); \
/* 1 */ __asm (".long " FUNC_UNDERSCORE #a); \
/* 2 */ __asm (".text"); \
/* 3 */ extern "C" void a()

Line /* 0 */ is identical to the one in the previous macro; however, one more remark must be made - everything between two section declarations is APPENDED (key word) to the former section, so from this line up to the ".text" section declaration, everything will be appended to the '.fnc' section.
Line /* 1 */ inserts the ADDRESS of function 'a' into the .fnc section and, at the same time, into the function list (in the assembler listing it becomes ".long _NAME", meaning "insert the ADDRESS of function _NAME here!").
Line /* 2 */ means 'go back to appending to the .text section' (".text" is short for ".section .text") - I added this one after a few empirical tests with some identical functions.
The last line, /* 3 */, is a C-style declaration of the function (a function body MUST follow it). One remark about why I used extern "C" here - without it, the function "a" would be called something like __Z12alicehasacatv at the assembler/linker level (C++ decorations), and line /* 1 */ uses the name of this function, so that name would HAVE TO include the decorations. Getting the C++ decorated name at the __asm level is a non-trivial task, though (because of the number after Z, which is the length of the name). So one either has to remove the decoration (using extern "C") or use some aliasing scheme - at first I did the latter, but it turned out that "Linux" GCC has a "feature" thanks to which it groups all the global inline assembler statements at the beginning of the listing (which broke the aliasing scheme on Linux-based OSes). So I moved to the extern "C" solution.
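
To see what the decoration consists of, a decorated name can be fed to the demangler that ships with GCC's C++ runtime. This is only an illustration, assuming a GCC-compatible toolchain that provides <cxxabi.h>; the demangle() wrapper is a made-up helper. The digits after _Z are the length of the function name (12 for alicehasacat), which is exactly the part that is hard to produce from a preprocessor macro. Note that the demangler expects the spelling with a single leading underscore; the second one seen on Windows and Mac OS X is the usual symbol prefix.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the human-readable form of a decorated name,
// or an empty string if the input is not a valid decorated name.
std::string demangle(const char *mangled) {
  int status = 0;
  char *s = abi::__cxa_demangle(mangled, 0, 0, &status);
  if (status != 0 || s == 0) return std::string();
  std::string out(s);
  std::free(s);  // the buffer is allocated with malloc
  return out;
}
```

For example, demangle("_Z12alicehasacatv") gives back "alicehasacat()": _Z marks a decorated name, 12 is the length, and the trailing v stands for an empty parameter list.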

It's time for the last macro:

#define LISTED_FUNC_LIST_END \
/* 0 */ __asm (".section .fnc" SECT_ATTR); \
/* 1 */ __asm (".long 0")

As one can see this one is pretty simple - it adds a 4-byte 0 value (/* 1 */) to the .fnc section (/* 0 */), at the end of the function list.
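
The 4-byte entries are what ties the macros to 32-bit targets. A possible adjustment for 64-bit builds, untested here and given only as a sketch, is to pick the data directive by pointer size. PTR_DIRECTIVE and ptr_directive() are names invented for this example; __LP64__ and _WIN64 are the macros that GCC normally predefines on 64-bit targets.

```cpp
// Assembler directive that emits one pointer-sized value.
#if defined(__LP64__) || defined(_WIN64)
# define PTR_DIRECTIVE ".quad "   /* 8-byte entries */
#else
# define PTR_DIRECTIVE ".long "   /* 4-byte entries, as in the article */
#endif

// With this, line /* 1 */ of LISTED_FUNC would read
//   __asm (PTR_DIRECTIVE FUNC_UNDERSCORE #a);
// and the terminator in LISTED_FUNC_LIST_END would become
//   __asm (PTR_DIRECTIVE "0");

// Hypothetical helper, only to show which directive was selected.
inline const char *ptr_directive() { return PTR_DIRECTIVE; }
```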

How to use these macros? Check out the code below:

#include <cstdio>

// Func pointer typedef
typedef void (*func_ptr)();

// Defines
#ifdef __unix__
# define FUNC_UNDERSCORE ""
# define SECT_ATTR ",\"a\",@progbits"
#else
# define FUNC_UNDERSCORE "_"
# define SECT_ATTR ",\"dr\""
#endif

#define LISTED_FUNC_LIST_START(a) \
/* 0 */ __asm (".section .fnc" SECT_ATTR); \
/* 1 */ __asm (".globl " FUNC_UNDERSCORE #a); \
/* 2 */ __asm (FUNC_UNDERSCORE #a ":"); \
/* 3 */ extern func_ptr a[1]

#define LISTED_FUNC_LIST_END \
/* 0 */ __asm (".section .fnc" SECT_ATTR); \
/* 1 */ __asm (".long 0")

#define LISTED_FUNC(a) \
/* 0 */ __asm (".section .fnc" SECT_ATTR); \
/* 1 */ __asm (".long " FUNC_UNDERSCORE #a); \
/* 2 */ __asm (".text"); \
/* 3 */ extern "C" void a()

// Listed functions
LISTED_FUNC_LIST_START(my_function_list);

// 1st func
LISTED_FUNC(first_func)
{
 puts("first function");
}

// 2nd func
LISTED_FUNC(second_func)
{
 puts("second function");
}

// 3rd func
LISTED_FUNC(third_func)
{
 puts("third function");
}

LISTED_FUNC_LIST_END;
// End of listed functions

int
main()
{
 func_ptr *ptr;

 for(ptr = my_function_list; *ptr; ptr++)
   (*ptr)();

 return 0;
}

At the top there are the macros, then LISTED_FUNC_LIST_START(my_function_list); declares a new array of function pointers called my_function_list, then 3 functions are defined using the LISTED_FUNC(name) scheme, and at the end there is the LISTED_FUNC_LIST_END; macro. The main() function simply walks the array and calls each function.

This is how it works on Vista:

12:42:33 gynvael >ver

Microsoft Windows [Wersja 6.0.6001]
Ansi hack ver 0.004b by gynvael.coldwind//vx

12:42:40 gynvael >g++ --version
g++ (GCC) 3.4.5 (mingw special)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE


12:42:43 gynvael >g++ test.cpp -Wall -Wextra

12:42:48 gynvael >a
first function
second function
third function

12:42:49 gynvael >


This is how it works on Debian with a Linux kernel on x86:

12:48:31 gynvael:debianvm> uname -a
Linux debianvm 2.6.18-6-686-bigmem #1 SMP Sat Dec 27 10:38:36 UTC 2008 i686 GNU/Linux
12:53:19 gynvael:debianvm> g++ --version
g++ (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

12:53:37 gynvael:debianvm> g++ ltest.cpp -Wall -Wextra
12:53:47 gynvael:debianvm> ./a.out
first function
second function
third function
12:53:49 gynvael:debianvm>


And the OS X @ x86 version:

Mac:~ gynvael$ uname -a
Darwin Mac.local 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386
Mac:~ gynvael$ g++ --version
i686-apple-darwin9-g++-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5490)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Mac:~ gynvael$ g++ -Wall -Wextra test.cpp
Mac:~ gynvael$ ./a.out
first function
second function
third function
Mac:~ gynvael$


And that's it.



P.S. compiling on Linux-based OSes with the -g option breaks it, I'll check out later wtf
P.S.2. the above code should work on all 32-bit platforms with minor changes (it's easy to make it work on 64-bit platforms too)
P.S.3. Unavowed has tested the above code on a Linux-based OS @ ARM - one has to remove the ",@progbits" from the section attributes to make it work. The output from the ARM machine:

<:tmp>$ uname -a
Linux 2.6.16.16 #1 Tue May 16 21:45:07 CEST 2006 armv4tl GNU/Linux
<:tmp>$ gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ g++ -o test test.cc
./test
<:tmp>$ ./test
first function
second function
third function
<:tmp>$

Comments:

2013-07-17 16:11:52 = mina86
{
I imagine “__attribute__((__section__(…)))” could make things easier: the LISTED_FUNC macro would just define a pointer to a function saved in the section and “extern "C"” would not be necessary.
}
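
The suggestion in the comment above could look roughly like the sketch below. It is an assumption about what was meant, not code from the article, and it was not tested on the platforms listed earlier. It relies on an ELF target and a GNU ld compatible linker, which defines __start_NAME and __stop_NAME symbols for any section whose name is a valid C identifier (hence "fnc_list" rather than ".fnc"). The names listed_entry_, seen and run_listed are invented for the illustration.

```cpp
typedef void (*func_ptr)();

// For a section whose name is a valid C identifier, GNU ld (and compatible
// ELF linkers) define these two symbols at the section's start and end.
extern "C" func_ptr __start_fnc_list[];
extern "C" func_ptr __stop_fnc_list[];

// Defines function 'a' and stores a pointer to it in section "fnc_list".
// No extern "C" is needed: the compiler itself resolves the decorated name.
#define LISTED_FUNC(a) \
  void a(); \
  static func_ptr listed_entry_##a \
      __attribute__((used, section("fnc_list"))) = a; \
  void a()

static unsigned seen = 0;  // one bit per function that has run

LISTED_FUNC(first_func)  { seen |= 1u; }
LISTED_FUNC(second_func) { seen |= 2u; }

// Calls everything stored in the section; returns the number of entries.
int run_listed() {
  int n = 0;
  for (func_ptr *p = __start_fnc_list; p != __stop_fnc_list; ++p) {
    (*p)();
    ++n;
  }
  return n;
}
```

With this variant the list needs neither a start macro nor a NULL terminator, but the order of the entries inside the section is up to the compiler, so the sketch does not depend on it.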
