Sometimes it's fun to forget why Undefined Behavior in C is bad and just write some code that works here and now, but won't necessarily work tomorrow (with a different compiler version or different compiler settings) or somewhere else (another platform/system/architecture). A few weeks ago I had a chance to do some such fun coding thanks to the thread "Hello world bez bibliotek i asm" (eng: "Hello world without libraries or asm") on a Polish programming forum - the thread's creator asked whether it's possible to write a program that prints "Hello World" without using any libraries (including includes) or inline assembly. While at the beginning the thread was still about proper C, it soon moved to low-level code (still written in C) that depended on the underlying system, the CPU architecture or even the way the compiler does its job. In this post I present my idea of how to write out "Hello World" to a GNU/Linux console; the thread itself is also worth a look (I guess you won't need to know Polish just to look at C code ;>).

The post below was originally published (in Polish) on the 4programmers.net forum, in the "Hello world bez bibliotek i asm" (link) thread.

--post start--
A piece of code from me - please note that I wanted to demonstrate a method, not to create always-working code :)

The code was written to work on Linux (32-bit x86), but you can use the same method on 64-bit Linux or on Windows, both 32- and 64-bit.
The code does not use any libraries (it doesn't even look for any in memory) and there is no inline assembly/etc (well, no direct or explicit inline assembly/etc ;>).

I've placed the explanation of the method below the code.

volatile unsigned int something_wicked_this_way_comes(
   int a, int b, int c, int d) {
 a ^= 0xC3CA8900;  b ^= 0xC3CB8900;  c ^= 0xC3CE8900;  d ^= 0x80CDF089;
 return a+b+c+d;
}

void* find_the_witch(unsigned short witch) {
 unsigned char *p = (unsigned char*)something_wicked_this_way_comes;
 int i;
 for(i = 0; i < 50; i++, p++) {
   if(*(unsigned short*)p == witch) return (void*)p;
 }

 return (void*)0;
}

typedef void (*gadget)() __attribute__((fastcall));

int main(void) {
 gadget eax_from_esi_call_int = (gadget)find_the_witch(0xF089);
 gadget set_esi = (gadget)find_the_witch(0xCE89);
 gadget set_ebx = (gadget)find_the_witch(0xCB89);
 gadget set_edx = (gadget)find_the_witch(0xCA89);

 if(!eax_from_esi_call_int) return 1;
 if(!set_esi) return 3;
 if(!set_ebx) return 4;
 if(!set_edx) return 5;

 set_edx(12), set_ebx(1), set_esi(4);
 eax_from_esi_call_int("Hello World\n");

 return 0;
}

This code uses a method really similar to the JIT-language exploitation techniques used when memory is protected via XD/NX/XN/DEP/etc - i.e. I tried to implicitly place a couple of "gadgets" in executable memory (think: ret2libc or return oriented programming - http://gynvael.coldwind.pl/?id=149) and then use them to make a syscall into the kernel (so no libraries are needed at all, but of course there is interaction with the environment, i.e. the Linux kernel).

These gadgets are placed in the something_wicked_this_way_comes function as the constants used in the XORs.

 a ^= 0xC3CA8900;  b ^= 0xC3CB8900;  c ^= 0xC3CE8900;  d ^= 0x80CDF089;

The above C code on assembly / machine code level might look like this (compiled using gcc; disassembled using objdump afair):

[...]
  6:        35 00 89 ca c3               xor    eax,0xc3ca8900
  b:        89 45 08                     mov    DWORD PTR [ebp+0x8],eax
  e:        8b 45 0c                     mov    eax,DWORD PTR [ebp+0xc]
 11:        35 00 89 cb c3               xor    eax,0xc3cb8900
 16:        89 45 0c                     mov    DWORD PTR [ebp+0xc],eax
 19:        8b 45 10                     mov    eax,DWORD PTR [ebp+0x10]
 1c:        35 00 89 ce c3               xor    eax,0xc3ce8900
 21:        89 45 10                     mov    DWORD PTR [ebp+0x10],eax
 24:        8b 45 14                     mov    eax,DWORD PTR [ebp+0x14]
 27:        35 89 f0 cd 80               xor    eax,0x80cdf089
[...]

So, if we disassembled this code with a slight misalignment (one or two bytes), we would get slightly different code:

 6: 35 00 89 ca c3 → mov edx, ecx ; ret
11: 35 00 89 cb c3 → mov ebx, ecx ; ret
1c: 35 00 89 ce c3 → mov esi, ecx ; ret
27: 35 89 f0 cd 80 → mov eax, esi ; int 0x80

Thanks to the above I'm certain that in this case the needed gadgets do reside in memory (of course, if the compiler worked in a slightly different way the opcodes might never show up; but in this specific compilation they did).
Going further into the code, I use the find_the_witch function to actually find these gadgets in memory, inside the something_wicked_this_way_comes function (the argument of the scanning function is the first two bytes of the gadget I'm looking for, represented as a little-endian uint16_t).

 gadget eax_from_esi_call_int = (gadget)find_the_witch(0xF089);
 gadget set_esi = (gadget)find_the_witch(0xCE89);
 gadget set_ebx = (gadget)find_the_witch(0xCB89);
 gadget set_edx = (gadget)find_the_witch(0xCA89);

One more important thing - here's the gadget type:

typedef void (*gadget)() __attribute__((fastcall));

It has two essential features:
1. The unspecified number of arguments denoted by C's () (please note that in C++ () means (void), but in C it's closer to (...)).
2. The fastcall convention, thanks to which the function arguments are placed in general purpose registers and not on the stack (for the first few arguments, of course) - in this specific case the first argument is always placed in the ecx register (the gadgets are designed to use this fact).

After that I "construct" a simple assembly-like hello world using the gadgets I have:

 set_edx(12), set_ebx(1), set_esi(4);
 eax_from_esi_call_int("Hello World\n");

This will be executed as follows:


(main)   mov ecx, 12
         mov eax, set_edx
         call eax
(gadget) mov edx, ecx
         ret
(main)   ...
...      ...
(gadget) ...
         int 0x80

Or, skipping the parts from the main() function:

[gadget 1] mov edx, 12 (length of the string)
[gadget 2] mov ebx, 1 (stdout)
[gadget 3] mov esi, 4 (sys_write)
[handled by fastcall] mov ecx, address "Hello World\n"
[gadget 4] mov eax, esi
[gadget 4] int 0x80

Of course I'm missing a C3 (ret) after the int 0x80 (there's no room left in a 4-byte gadget), so the program will crash AFTER writing out "Hello World". However, it would be fairly simple to fix this :)

Test:

$ gcc -m32 test.c -O0
$ ./a.out
Hello World
Segmentation fault (core dumped)
$

--post stop--

An elegant fix to the Segmentation fault problem was posted by Azarien in the same thread - he created another function called graceful_exit where, using the existing gadgets, he invoked the exit syscall. And then he added the call to this function in the something_wicked_this_way_comes just after d ^= 0x80CDF089; - thanks to this after the gadget 89 F0 CD 80 is executed the CPU will execute whatever is next after the CD 80 (int 0x80) and that would be the call to the graceful_exit function.
The said patch looks like this (Azarien's changes are the graceful_exit function and the call to it; there was another change in the patch - the gadget type declaration was moved to the top of the file - but I'll skip this in the listing):

void graceful_exit()
{
 set_ebx(0);
 set_esi(1);
 eax_from_esi_call_int(0);
}


volatile unsigned int something_wicked_this_way_comes(
   int a, int b, int c, int d) {
 a ^= 0xC3CA8900;  b ^= 0xC3CB8900;  c ^= 0xC3CE8900;  d ^= 0x80CDF089;
 graceful_exit();
 return a+b+c+d;
}

As I said, a very elegant solution :)

It's also worth taking a look at MSM's post and the discussion underneath it (in Polish) - MSM's method uses the commonly known (in RE/shellcoding) technique of looking up the kernel32 address in the loaded DLLs list in PEB, finding GetProcAddress in the import tables and acquiring the addresses of all API functions required to print out "Hello World" (that being said, it kinda relies on some libraries; still, fun to look at).

And that's that. Cheers ;>

Comments:

2012-07-12 15:15:11 = tehnicaorg
{
$ uname -a | cat b.c && gcc b.c -Wall -std=c99 -nostdlib && ./a.out

char _start[] __attribute__ ((section(".text#"))) = {
0xE8, 0x0D, 0x00, 0x00, 0x00, 0x48, 0x65, 0x6C, 0x6C, 0x6F,
0x20, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x21, 0x0A, 0x5E, 0x31,
0xC0, 0x89, 0xC2, 0xFF, 0xC0, 0x89, 0xC7, 0xB2, 0x0D, 0x0F,
0x05, 0x48, 0x31, 0xFF, 0x6A, 0x3C, 0x58, 0x0F, 0x05};
Hello World!

The variant without -nostdlib parameter is similar, but it also has a main():
[...]
int
main(void)
{
((void (*)(void))a)();
return 0;
}
}
2012-07-12 16:12:49 = Gynvael Coldwind
{
@tehnicaorg
Sure, that's the most obvious solution, but it doesn't really respect the rule of "no direct inline assembly/etc" - in this case it's inline machine code, Turbo Pascal style :)
We've talked about this kind of solution on the Polish side of this post.

That being said - sure, it would work ;)
}
2012-07-22 10:05:24 = curious
{
I'm curious, why use volatile?

--Thanks
}
2015-04-23 18:04:58 = John
{
" gcc -m32 test.c -O0 "

You forgot -nostdlib

So you are allowing libraries. And that makes the whole thing a lot easier.

extern long int syscall (long int __sysno, ...) __attribute__ ((__nothrow__ , __leaf__));
int main(void)
{
syscall( 1, 1, "Hello world!\n", 13);
}

}
2015-04-26 22:46:22 = Gynvael Coldwind
{
@curious
I think I was trying to make sure the function won't be optimized away.

@John
*cough* well the title of this post includes the phrase "without libraries" so I thought -nostdlib goes without saying :) *cough*
}
