2009-03-17:

OS X, Objective C and RE

macosx:objc:easy:re
Finally the day has arrived when I take a look at creating OS X GUI applications! Applications on the Mac are usually written in Objective C (which I hadn't had the pleasure of meeting yet) using the Cocoa API (the OS X equivalent of WinAPI; there was also the older Carbon API for Mac OS). From a programmer's point of view, the Objective C syntax has really caught my eye - it's very interesting! But I admit that from a reverse engineer's point of view Objective C gets* even better ;>
* gets - hmm, my gcc tells me I should not use that verb... that's the last time I use gcc for spell-checking ;F

First, a short Objective C program that, using the HTTP handling from Cocoa, downloads the main page of my blog to a buffer:

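#import <Cocoa/Cocoa.h>
#include <stdio.h>

// 1 MiB global buffer; being a global, it starts out zero-filled
char buffer[1024 * 1024];

int main(int argc, char **argv) {

 // Build the HTTP request; the URL object is created from an NSString literal
 NSURLRequest *theRequest =
   [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://gynvael.coldwind.pl/"]
                 cachePolicy:NSURLRequestReloadIgnoringLocalAndRemoteCacheData
                 timeoutInterval:20.0];

 // Send it synchronously; the response object comes back through theResponse
 NSURLResponse *theResponse = nil;
 NSData *theData =
   [NSURLConnection sendSynchronousRequest:theRequest
                    returningResponse:&theResponse
                    error:NULL];

 // Copy the downloaded bytes into the buffer and print them up to the first \0
 [theData getBytes:buffer length:sizeof(buffer)];

 puts(buffer);

 return 0;
}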



If you have never read Objective C code before, allow me to explain! (If you have written in Objective C, feel free to skip this part.) I assume the reader knows C++ syntax, so, for contrast, let's look at a C++ code snippet:

ObjectPointer->Method(Parameter1, Parameter2, Parameter3);
Class::StaticMethod(Parameter1, Parameter2, Parameter3);


A similar construction in Objective C looks like this:

[ObjectPointer Method:Parameter1 ParameterName2:Parameter2 ParameterName3:Parameter3];
[Class StaticMethod:Parameter1 ParameterName2:Parameter2 ParameterName3:Parameter3];


As one can see, the two look similar (in meaning) and different (in syntax) at the same time. Another novelty is the strings with @ in front, for example @"alice has a cat": this notation makes the compiler create a CFString/NSString object (CF stands for Core Foundation and NS for NeXTStep; at the source level the string is an NSString, but internally it is usually recognized/named as a CFString) which describes the given string (something similar to the string class in C++, but with its own literal notation in the language).
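To make the "object" part more concrete: in the compiled 32-bit binary each @"..." literal ends up as a 16-byte record in the __cfstring section (the layout is described in item 6 of the list further down). The C sketch below decodes one such record. The struct and function names are hypothetical, and it assumes the little-endian i386 layout observed in this particular binary, not a documented file format.

```c
#include <stdint.h>

/* Hypothetical model of one constant CFString record from a 32-bit
   __cfstring section: four little-endian 32-bit fields (as seen in this binary). */
typedef struct {
    uint32_t isa;     /* address of the CFConstantStringClassReference class */
    uint32_t flags;   /* flags word */
    uint32_t str;     /* address of the ASCIIZ text in the __cstring section */
    uint32_t length;  /* string length in bytes, without the terminating \0 */
} cfstring32;

/* Reads a 32-bit little-endian value regardless of the host's byte order. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Splits a raw 16-byte record into its four fields. */
static cfstring32 decode_cfstring32(const uint8_t rec[16]) {
    cfstring32 s;
    s.isa    = read_le32(rec);
    s.flags  = read_le32(rec + 4);
    s.str    = read_le32(rec + 8);
    s.length = read_le32(rec + 12);
    return s;
}
```

For @"alice has a cat" the length field would be 15, and the str field would hold the address of the matching ASCIIZ string in __cstring.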

What does the code at the top do? It starts by creating an HTTP request (theRequest), which is an object of the NSURLRequest class. The request is forged by a static method (from now on I'll refer to such methods as factories) called requestWithURL, which receives 3 parameters: the URL, the cache behavior (cachePolicy) and the timeout value (timeoutInterval). The URL is passed as a pointer to an object of the NSURL class, which is created from a CFString/NSString by a factory called URLWithString.
Once the request is ready, another static method is called, this time from the NSURLConnection class. The method is named sendSynchronousRequest, and it receives, among other things, the request. It returns an object of the NSData class, which wraps the data (I think of it as something similar to a C++ vector).
Next, another method, getBytes (non-static this time), is called on the NSData object to copy the data into a buffer.
Finally, the data is written to stdout with puts, up to the first \0 (let's ignore for now the case where the data fills the whole 1 MiB buffer and no \0 is found ;>)
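If one did not want to ignore that case, the usual fix is to copy at most one byte less than the buffer size and always leave a terminator. A minimal plain-C sketch of that idea follows; copy_terminated is a hypothetical helper, not a Cocoa API. In the Objective C program above, the equivalent would simply be passing sizeof(buffer) - 1 as the length, since the last byte of the zero-initialized global buffer then stays \0.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size - 1 bytes from src and always NUL-terminates dst,
   so a later puts(dst) can never run past the end of the buffer.
   Returns the number of data bytes copied. */
static size_t copy_terminated(char *dst, size_t dst_size,
                              const void *src, size_t src_len) {
    if (dst_size == 0)
        return 0;                      /* no room even for the terminator */
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

Note that puts still stops at the first \0 inside the data itself, so binary content would be cut short either way.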

An interesting detail (at least for me; since Objective C is an exotic language for me, many things considered 'normal' by Objective C coders are interesting to me) is that static methods are called 'class methods' here, and 'normal' methods are called 'instance methods' - it makes sense, doesn't it?
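Both kinds of methods are invoked through the same message-sending mechanism, which item 11 of the list below looks at from the binary side. As a rough intuition only, here is a toy C model of name-based dispatch: every object points at its class, and a message is delivered by looking the selector name up in that class's method table at run time. All names here are hypothetical; the real objc_msgSend is far more involved (uniqued selectors, method caches, inheritance).

```c
#include <stddef.h>
#include <string.h>

typedef struct object object;                 /* forward declaration */
typedef int (*imp_t)(object *self, int arg);  /* a method implementation */

typedef struct {
    const char *sel;   /* selector name, e.g. "add:" */
    imp_t imp;         /* C function implementing it */
} method;

typedef struct {
    const char *name;
    const method *methods;
    size_t count;
} toy_class;

struct object {
    const toy_class *isa;  /* every object points at its class */
    int value;
};

/* Toy message send: find the selector by name in the receiver's class
   and call the implementation, passing the receiver as self. */
static int toy_msg_send(object *self, const char *sel, int arg) {
    const toy_class *cls = self->isa;
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].sel, sel) == 0)
            return cls->methods[i].imp(self, arg);
    }
    return -1;  /* toy stand-in for "unrecognized selector" */
}

static int counter_add(object *self, int arg) {
    self->value += arg;
    return self->value;
}

static int counter_value(object *self, int arg) {
    (void)arg;  /* this "method" takes no parameter */
    return self->value;
}

static const method counter_methods[] = {
    { "add:",  counter_add },
    { "value", counter_value },
};

static const toy_class Counter = {
    "Counter", counter_methods, sizeof counter_methods / sizeof counter_methods[0]
};
```

In the real runtime a class is itself an object whose isa points at a metaclass holding the class methods, which is why a single dispatch path serves both class methods and instance methods.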

Now for compiling the example - I do it (as always) from the command line:

gcc test.m -Wl,-framework,Cocoa -o test

Of course this can also be compiled from within Xcode (Apple's IDE, quite good actually, but GVim starts faster ;>)

OK, the code is now compiled, so feed it to your favorite disassembler and let's look at what's inside.

In my humble opinion, the most interesting features of the reversed application are the following:
1) The function names have the same prefix as those compiled by MinGW GCC on Windows, an underscore _, for example _main, _func.
2) The function prologue, besides setting up the stack frame, also loads the address of the following instruction (the EIP value at that point) into EBX:
call $+5
pop ebx

3) All other data references go through the EBX register, i.e. they are relative to the address of the pop ebx instruction (this is position-independent code). It's quite peculiar; outside of shellcode one rarely sees something like this, at least on Windows.
4) There are 16 (!) sections (__text, __cstring, __literal8, __data, __dyld, __cfstring, __bss, __common, __message_refs, __cls_refs, __module_info, __image_info, __pointers, __jump_table, __LINKEDIT_hidden, ABS; in addition to these there is also a HEADER and something marked as UNDEF).
5) In the __cstring section there are (surprise!) ASCIIZ strings (aka C strings). Interestingly, besides the ASCIIZ strings used directly in the source code, there are also class names (for example NSURLConnection), method names (for example requestWithURL:cachePolicy:timeoutInterval:) and the strings described by the CFString objects.
6) In the __cfstring section there are the CFString objects. Their structure is simple: the address of the CFConstantStringClassReference class, some flags, the address of the described string (in the __cstring section), and the length of the string; each field is 4 bytes long.
7) The "buffer" (see the source code) has been placed in the __common section.
8) In the __cls_refs section there are pointers to the strings with class names:
__cls_refs+00: dd offset __cstring:"NSURLRequest"
__cls_refs+04: dd offset __cstring:"NSURL"
__cls_refs+08: dd offset __cstring:"NSURLConnection"

Interestingly, once the application starts, these pointers are changed to point at the class descriptors (in external libraries), which works much like the IAT in PE files on Windows.
9) In the __message_refs section there are similar pointers to the method names (selectors); at startup they are changed to point at the selectors registered by the runtime.
10) The __jump_table section is rwx, and it gets slightly modified during loading (or execution, I'm not sure which).
11) Method invocations are made differently than in C++. As a reminder, in C++ (at least with MSVC) the thiscall calling convention is used: the object pointer is placed in the ECX register, and the method is called just like a normal function (call). In Objective C, however, there is a different scheme: a function called objc_msgSend receives at least two parameters, the receiver (a pointer to the object or class) and the selector of the 'called' method. Any further parameters of this function are the parameters of the called method (without their names). The objc_msgSend function is responsible for calling the method with the given parameters, but to tell you the truth, there is a whole story behind that function, which I will tell another time (one can take a look here and here if one wants). Interestingly, the compiler emits code in which a pointer to a pointer is resolved (using the infamous EBX register, of course) and placed on the stack, which makes the whole thing rather unreadable (for example, what class/method is being called in the snippet below? No idea? Neither do I...):
__text:00001EBA                 lea     eax, dword_1011CD[ebx]
__text:00001EC0                 mov     eax, [eax]
__text:00001EC2                 mov     edx, eax
__text:00001EC4                 lea     eax, dword_1011B9[ebx]
__text:00001ECA                 mov     ecx, [eax]
__text:00001ECC                 mov     dword ptr [esp+38h+var_28], 0
__text:00001ED4                 lea     eax, [ebp+var_14]
__text:00001ED7                 mov     [esp+38h+var_2C], eax
__text:00001EDB                 mov     eax, [ebp+var_10]
__text:00001EDE                 mov     [esp+38h+var_30], eax
__text:00001EE2                 mov     [esp+38h+var_34], ecx
__text:00001EE6                 mov     [esp+38h+var_38], edx
__text:00001EE9                 call    _objc_msgSend

The listing is unreadable because there is no information on what is being called: the pointers are relative to EBX, and even IDA gets lost in that case. Charlie Miller talked about this at Black Hat 2008 in Japan in his 'Owning the Fanboys: Hacking Mac OS X' talk; he created an IDA plug-in that solves the problem. If one is not happy with plug-ins, here is my IDC script, which I created as an exercise for a meeting of the 'Reinventing The Wheel Club' (it works for this example; I don't guarantee it will work for other apps ;D, but it should):
#include <idc.idc>
static main(void)
{
 auto Start, Stop, i, RelAddr;
 auto OldAddr, NewAddr;

 Start = ScreenEA();
 if(Start == BADADDR)
 {
   Message("Invalid address\n");
   return;
 }

 Stop = FindFuncEnd(Start);
 if(Stop == BADADDR)
 {
   Message("Not in function or invalid address\n");
   return;
 }

 Start = GetFunctionAttr(Start, FUNCATTR_START);

 Message("Func <%x, %x>...\n", Start, Stop);

 RelAddr = 0;

 for(i = Start; i < Stop; i = ItemEnd(i))
 {
   // call $+5 (E8 00 00 00 00): the pop ebx right after it loads EBX with
   // the address following this call, so that address is the EBX base
   if(GetMnem(i) == "call" && Byte(i) == 0xE8 && Dword(i+1) == 0)
   {
     RelAddr = ItemEnd(i);
     Message("RelAddr found (%x)...\n", RelAddr);
   }

   // lea reg, [ebx+disp32]: opcode 8D, ModRM with mod=10 and rm=011
   if(RelAddr != 0 && GetMnem(i) == "lea" && GetOriginalByte(i) == 0x8D && (GetOriginalByte(i+1) & 0xC7) == (0x83 & 0xC7))
   {
     OldAddr = (GetOriginalByte(i+2) | (GetOriginalByte(i+3) << 8) | (GetOriginalByte(i+4) << 16) | (GetOriginalByte(i+5) << 24));
     NewAddr = OldAddr + RelAddr;
     Message("Fixing opcode at %x (%x -> %x)...\n", i, OldAddr, NewAddr);
     // mod=10,rm=011 -> mod=00,rm=101: lea reg, [ebx+disp32] becomes lea reg, [disp32]
     PatchByte(i+1, GetOriginalByte(i+1) ^ 0x86);
     PatchDword(i+2, NewAddr);

     // To restore the original instruction, uncomment these
     //PatchByte(i+1, GetOriginalByte(i+1));
     //PatchDword(i+2, OldAddr);

     // If the pointer stored at the new address points at a string, show it as a comment
     if(GetStringType(Dword(NewAddr)) != 0xffffffff)
       MakeComm(i, "->-> \"" +  GetString(Dword(NewAddr), -1, GetStringType(Dword(NewAddr))) + "\"");
   }
 }

 AnalyzeArea(MinEA(), MaxEA());
}

The script above works on a single function (the one containing the cursor). For example, the earlier fragment of the dead listing looks like this after being chewed by the script:
__text:00001EBA                 lea     eax, off_103018 ; ->-> "NSURLConnection"
__text:00001EC0                 mov     eax, [eax]
__text:00001EC2                 mov     edx, eax
__text:00001EC4                 lea     eax, off_103004 ; ->-> "sendSynchronousRequest:returningResponse:error:"
__text:00001ECA                 mov     ecx, [eax]
__text:00001ECC                 mov     dword ptr [esp+38h+var_28], 0
__text:00001ED4                 lea     eax, [ebp+var_14]
__text:00001ED7                 mov     [esp+38h+var_2C], eax
__text:00001EDB                 mov     eax, [ebp+var_10]
__text:00001EDE                 mov     [esp+38h+var_30], eax
__text:00001EE2                 mov     [esp+38h+var_34], ecx
__text:00001EE6                 mov     [esp+38h+var_38], edx
__text:00001EE9                 call    _objc_msgSend
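At its core, the IDC script above turns each `lea reg, [ebx+disp32]` into `lea reg, [disp32]` with the EBX base already added. A minimal C sketch of that byte-level rewrite follows; fix_ebx_lea is a hypothetical name, and the sketch assumes the 6-byte encoding (8D, ModRM, disp32) that the script also checks for. The base 0x1E4B used in the note below is not stated in the post; it is inferred from the two listings (0x1011CD + 0x1E4B = 0x103018, and 0x1011B9 + 0x1E4B = 0x103004).

```c
#include <stdint.h>

/* Rewrites a 6-byte "lea r32, [ebx+disp32]" (8D, ModRM mod=10 rm=011, disp32)
   into "lea r32, [disp32]" (ModRM mod=00 rm=101), adding base (the value EBX
   holds after the call $+5 / pop ebx prologue) to the displacement.
   Returns 1 if the instruction was patched, 0 if it did not match. */
static int fix_ebx_lea(uint8_t insn[6], uint32_t base) {
    if (insn[0] != 0x8D)               /* not LEA */
        return 0;
    if ((insn[1] & 0xC7) != 0x83)      /* ModRM is not [ebx+disp32] */
        return 0;

    uint32_t disp = (uint32_t)insn[2] | ((uint32_t)insn[3] << 8) |
                    ((uint32_t)insn[4] << 16) | ((uint32_t)insn[5] << 24);
    uint32_t target = disp + base;

    insn[1] ^= 0x86;                   /* 10 reg 011 -> 00 reg 101, reg bits kept */
    for (int k = 0; k < 4; k++)        /* store the absolute address little-endian */
        insn[2 + k] = (uint8_t)(target >> (8 * k));
    return 1;
}
```

Assuming the first lea of the listing is encoded as 8D 83 CD 11 10 00 (which its 6-byte length and the displayed displacement suggest), calling fix_ebx_lea with base 0x1E4B turns it into 8D 05 18 30 10 00, i.e. lea eax, [0x103018], matching off_103018 in the fixed listing.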


And that's all for now. I'll get back to the exotic Mac OS X platform soon.

P.S. Take a look at the comments on the Polish side of the mirror under the post about the automagical function list - a few ideas on solving that problem differently have been posted there. It's worth a look ;>
