As one may know, each PE packer does two things:
1. It compresses/encrypts the original application code
2. It adds a loader, which decompresses/decrypts the original code at runtime, and executes it
This way we turn an original exe into a packed exe that decrypts itself at runtime.
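To make the two steps concrete, here is a toy sketch in C. It assumes a repeating-key XOR as a stand-in for the real compression/encryption; real packers use something stronger. The names toy_pack and toy_unpack are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1, the "packer": scramble the original code bytes in place.
 * XOR with a repeating key is only a placeholder for a real
 * compression/encryption routine. */
void toy_pack(uint8_t *data, size_t len, const uint8_t *key, size_t key_len)
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= key[i % key_len];
}

/* Step 2, the "loader": undo the scrambling at runtime, just before the
 * original code runs. XOR is its own inverse, so this reuses toy_pack. */
void toy_unpack(uint8_t *data, size_t len, const uint8_t *key, size_t key_len)
{
    toy_pack(data, len, key, key_len);
}
```

Packing a buffer and then unpacking it with the same key gives back the original bytes. A real loader would jump to the unpacked code afterwards.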
The loader used in the mentioned malware does, in principle, the same thing, except that it decrypts into a child process. It does it the following way:
1. It gets the path/filename of its own executable file (GetModuleFileName)
2. It creates a new, suspended (CREATE_SUSPENDED) process based on its own exe file (CreateProcess)
3. It decompresses/decrypts the data, but each decrypted element is written to a proper place in the child process (WriteProcessMemory backed up by VirtualProtectEx)
4. It moves the EntryPoint to the decrypted code. I don't recall how its author did that, but there are a few ways: starting with using SetThreadContext to change EIP (I wouldn't recommend that) or some other register in which the EP is stored (hint: see at which moment the main thread is stopped when CREATE_SUSPENDED is used), up to injecting a PUSH Addr + RET at the OEP.
5. And then it resumes the thread of the child process (ResumeThread).
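The last option in step 4, the PUSH Addr + RET stub, is easy to sketch. The snippet below only builds the six bytes of the stub for 32-bit x86. It is not the code from the malware. The name build_push_ret_stub and the example address are made up, and actually placing the stub at the OEP of the child would still need WriteProcessMemory, as in step 3.

```c
#include <stdint.h>

#define PUSH_RET_STUB_SIZE 6

/* Encode "PUSH imm32; RET" for 32-bit x86:
 *   68 xx xx xx xx   push target   (imm32 is stored little-endian)
 *   C3               ret           (pops target into EIP)
 * Executing these bytes transfers control to `target`. */
void build_push_ret_stub(uint32_t target, uint8_t out[PUSH_RET_STUB_SIZE])
{
    out[0] = 0x68;                              /* PUSH imm32 opcode */
    out[1] = (uint8_t)(target & 0xFF);
    out[2] = (uint8_t)((target >> 8) & 0xFF);
    out[3] = (uint8_t)((target >> 16) & 0xFF);
    out[4] = (uint8_t)((target >> 24) & 0xFF);
    out[5] = 0xC3;                              /* RET opcode */
}
```

For a (hypothetical) decrypted entry at 0x00401000 this gives the bytes 68 00 10 40 00 C3.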
The above scheme has pros and cons, like everything. Let's start with the good things:
- OllyDbg for some strange reason doesn't show processes (when Attaching) that are stopped after CREATE_SUSPENDED (but one can set OllyDbg as JIT-Debugger, and use RMB/Debug in TaskMgr)
- Having to switch the debugger to another (child) process sucks (and this sucks even more if the child creates yet another child, and so on)
- It's not a common scheme, so it can buy the application a few more minutes before it's cracked (or maybe it's a standard packer which I know nothing about?)
- The AV emulators will have a hard time checking this scheme (of course, if the mother-exe is detected by a signature, then it's a completely different story)
Now the bad things (why this is not so good):
- WriteProcessMemory is anything but fast (but the data could be transferred in bigger chunks...)
- A breakpoint on ResumeThread and an image dump defeat the whole thing - and we don't have to worry about the EP too much either
- The dynamic unpacking is easy to make automagic
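As for the "bigger chunks" remark on WriteProcessMemory: one way to do it is to collect decrypted elements in a local staging buffer and flush them with a single write whenever the buffer fills up or the next element is not adjacent to the previous one. The sketch below is only an assumption of how this could look. The write itself is hidden behind a hypothetical callback (remote_write_fn) that stands in for WriteProcessMemory, and count_writes is a demo callback that just counts calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAGE_SIZE 4096

/* Hypothetical stand-in for WriteProcessMemory on the child process.
 * Returns nonzero on success. */
typedef int (*remote_write_fn)(uint32_t remote_addr, const uint8_t *buf,
                               size_t len, void *ctx);

typedef struct {
    uint8_t         buf[STAGE_SIZE];
    size_t          used;         /* bytes currently staged */
    uint32_t        remote_base;  /* child address that buf[0] maps to */
    remote_write_fn write;
    void           *ctx;
} stage_t;

void stage_init(stage_t *s, remote_write_fn write, void *ctx)
{
    s->used = 0;
    s->remote_base = 0;
    s->write = write;
    s->ctx = ctx;
}

/* Send everything staged so far with one write call. */
int stage_flush(stage_t *s)
{
    if (s->used == 0)
        return 1;
    int ok = s->write(s->remote_base, s->buf, s->used, s->ctx);
    s->used = 0;
    return ok;
}

/* Queue one decrypted element. A write is issued only when the buffer
 * is full or the element does not directly follow the staged data. */
int stage_put(stage_t *s, uint32_t remote_addr, const uint8_t *data, size_t len)
{
    while (len > 0) {
        int contiguous = s->used > 0 &&
                         remote_addr == (uint32_t)(s->remote_base + s->used);
        if (s->used > 0 && (!contiguous || s->used == STAGE_SIZE)) {
            if (!stage_flush(s))
                return 0;
        }
        if (s->used == 0)
            s->remote_base = remote_addr;
        size_t room = STAGE_SIZE - s->used;
        size_t n = len < room ? len : room;
        memcpy(s->buf + s->used, data, n);
        s->used += n;
        remote_addr += (uint32_t)n;
        data += n;
        len -= n;
    }
    return 1;
}

/* Demo callback: counts write calls instead of touching a real process. */
int count_writes(uint32_t remote_addr, const uint8_t *buf, size_t len, void *ctx)
{
    (void)remote_addr; (void)buf; (void)len;
    (*(int *)ctx)++;
    return 1;
}
```

With this, eight adjacent 16-byte elements end up as one write instead of eight. The caller has to call stage_flush once more at the end, before ResumeThread.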
However, one might want to extend this protection and make the child processes communicate with each other (for example by sending some data between them) - then debugging it is not pleasant at all (although it will also run slower). Maybe I'll try this out someday and throw the code somewhere here hehe...
OK, the end.