[UPDATE: Video from Recon conference in 2010: "Syndicate Wars Port: How to port a DOS game to modern systems"]
The initial attempt
We made the first attempt to port this game a few years back (I think it was 5 years ago). The plan was simple - create a disassembler, and try to find all the dependencies. Sounds simple!
OK, first of all, why the hell did we want to create a disassembler almost from scratch? Two reasons:
1. We didn't know of any disassembler that handled LE files correctly (we didn't know about IDA back then). For those of you who never played with the DOS4GW extender - the executable file consists (similarly to a PE file) of two parts: a 16-bit DOS stub that executes dos4gw.exe, and a 32-bit Linear Executable that is loaded by the dos4gw.exe loader. Of course the LE part contains the application code/data/etc. (check http://www.tenberry.com/dos4g/faq/format.html for additional information)
2. Since the dead-listing was going to be recompiled, it had to be compatible with the input format of an assembler of our choosing (and we chose the Netwide Assembler aka NASM).
Additionally, writing your own disassembler lets you build in features that are useful for the job at hand, like a user-provided symbol table, a whitelist of data regions (regions not on the list do not appear in the listing), or a list of vtable regions (at first we thought that the game was written in C++, having mistaken a switch jump table for a vtable :>).
The disassembler (called ledisasm) was written by Unavowed, and it used ndisasm (from the Netwide Assembler package) as the disassembling engine.
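To make the two-part layout described above a bit more concrete, here is a minimal Python sketch that locates the LE header inside an executable. It assumes the simple case in which the 32-bit value at offset 0x3C of the MZ stub header points at the LE header; executables with DOS4GW bound in may chain several stubs, which this does not handle. The function name is made up for illustration, and this is not how ledisasm itself worked.

```python
import struct


def find_le_header(data):
    """Return the file offset of the LE header, or raise ValueError.

    Assumption: the MZ stub stores the offset of the "new" header
    as a little-endian 32-bit value at 0x3C.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[offset:offset + 2] != b"LE":
        raise ValueError("no LE signature at offset 0x%x" % offset)
    return offset


# Synthetic image for demonstration: a 0x80-byte stub followed by "LE".
stub = bytearray(0x80)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x80)
image = bytes(stub) + b"LE\x00\x00" + bytes(16)
print(hex(find_le_header(image)))
```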
Ah, one thing - Unavowed is a GNU/Linux person, while I owned a Windows box, so we had to write everything in a way that would work on both our systems (which restricted us to creating only console applications). Back to the story...
Having a disassembler ready (it's a simplification - Unavowed made fixes to it from time to time, so we had to use a diff-patch (+ some sed scripts) method to carry our changes over between regenerations of the dead-listing), we could start looking for the dependencies: the C library functions, the I/O calls (mostly int/out/in instructions or memory-mapped I/O references), and DOS4GW-environment-specific dependencies.
But first! Some statistics: the listing we got weighed over 14 MB, and consisted of over 1,070,000 lines of assembly code and data (dd, db, etc.).
OK, but how do you find, let's say, an open/fopen function in such an insanely large assembly soup? Look for interrupts and trace the cross-references! Of course, we used the best existing interrupt list - Ralf Brown's Interrupt List (http://www.ctyme.com/rbrown.htm).
One might say "Hey! Wait a minute there! How come a 32-bit application was allowed to use the 16-bit DOS/VBIOS/BIOS/etc. interrupts?". It is a good question, and this is where DPMI (DOS Protected Mode Interface) comes in. In short, DPMI, which is integrated into DOS4GW, registers some ISRs (Interrupt Service Routines) in the IDT (Interrupt Descriptor Table). When called, they switch to 16-bit real mode (I'm not sure whether DOS4GW implements this using the VM86 method or a normal real-mode switch), call whatever they're supposed to call, and jump back to protected mode.
Additionally, there were functions like int386 or int386x which allowed calling the 16-bit interrupts through DPMI from high-level languages like C.
Btw, int386x is implemented in a funny way (though I admit that there are not many ways to do it, and this one is probably the fastest):
____int386x_:
[...]
call func_02628
[...]
func_02628:
lea esi,[esi+esi*2]
lea eax,[cs:esi+func_02631]
push eax
; push the return address
[...]
ret
func_02631:
int 0x0
ret
int 0x1
ret
int 0x2
ret
int3
nop
ret
int 0x4
ret
int 0x5
[...] ; yep, there is an int XX + ret for every interrupt
int 0xfc
ret
int 0xfd
ret
int 0xfe
ret
int 0xff
ret
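The trick in the listing above is that every "int N / ret" stub occupies exactly 3 bytes, so the stub for interrupt N sits at N*3 from the start of the table (that is what lea esi,[esi+esi*2] computes). A small Python sketch of the same layout follows. The byte encodings are inferred from the disassembly (CD nn C3 for int/ret, CC 90 C3 for the int3/nop/ret entry), not taken from the original binary, and the function names are made up.

```python
def build_int_table():
    """Build 256 three-byte stubs, one per interrupt number."""
    table = bytearray()
    for n in range(256):
        if n == 3:
            # The listing shows int3 / nop / ret here: the one-byte
            # breakpoint opcode padded so the entry is still 3 bytes.
            table += bytes([0xCC, 0x90, 0xC3])
        else:
            table += bytes([0xCD, n, 0xC3])  # int n ; ret
    return bytes(table)


def stub_offset(int_no):
    """Mirror lea esi,[esi+esi*2]: offset = int_no * 3."""
    return int_no + int_no * 2


table = build_int_table()
off = stub_offset(0x21)
print(table[off:off + 3].hex())  # cd21c3
```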
OK, back to the topic!
After finding a few functions, we found a reference to Watcom somewhere, and then found the Open Watcom compiler (http://www.openwatcom.org/), which comes with the source code of the standard libraries. Searching for functions when you have their source is much faster than when you have nothing (the sources were accurate to about 95%). Additionally, we could confirm our findings, and also change some names from __i_think_its_fopen_but_im_not_sure_please_double_check to _freopen :)
While reverse engineering the assembly code (we translated the code to pseudo-C, since it's easier to read C than assembly), we created some tools (in Perl) which helped us with these translations: a very simple “decompilation” of some instructions and if blocks. It was buggy, but it was enough to speed things up a little (it was really simple, nothing even close to what the modern Hex-Rays decompiler can do). Also, I had a script in CLI PHP that changed the function names to color names (it's easier to remember and distinguish func_red and loc_cyan than func_189275 and loc_9ac61b), but in the end, we didn't use it too much.
When we had found about 50% of the functions, we met a guy (hi joostp!) who was working on a remake of the first Syndicate. After we told him what we were doing, he showed mercy and gave us a list of functions found by IDA Pro in MAIN.EXE (the Syndicate Wars executable), which saved us a few weeks of finding the rest of the functions.
Having the functions, we could replace the function implementations with calls to the modern native libc (glibc or msvcrt). However, the calls couldn't be done with a simple 'jmp libc.func', since Watcom uses its own fastcall-style calling convention (take a look at “Calling conventions for different C++ compilers and operating systems” by Agner Fog), which, of course, is not compatible with the cdecl used in both glibc and msvcrt. So, we created a Python script that received a list of functions with some kind of prototype descriptors, and created the wrappers. Additionally, the script handled the win32/gnu differences (like the underscore required in cdecl functions in object files on Windows) and added debug-aiding messages.
The configuration file for the wrapper.py script looked like this:
# v - vararg: like cdecl but used for functions with v[name] variant
#
# args is a sequence of zero or more of:
# i - int
# x - int (displayed in hex)
# p - void * (general pointer)
# s - char *
# c - char
#
# name type args
access p sx
asctime p
atoi p s
[...]
A sample wrapper looks like this:
_c_access:
push ebx
push ecx
push edx
push esi
push edi
push edx
push eax
push edx
push eax
push dword .debug_str
call printf
add esp, byte +0xc
call access
add esp, byte +0x8
pop edi
pop esi
pop edx
pop ecx
pop ebx
ret
.debug_str:
db 'access("%s", 0x%x)', 0xa, 0x0
You may be surprised by the push/pop of edx and ecx in the above code, since normally the callee should save only the ebx, esi, edi and ebp registers, and both edx and ecx are considered scratch registers. Well, guess what, in the Watcom clib (clib, libc, crt, geeez, these people should make up their minds!) both edx and ecx are callee-save registers. Believe me, we learned this the hard way ;p
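For illustration, here is a minimal Python sketch of a generator that emits wrappers shaped like the sample above. It is not the original wrapper.py: it leaves out the debug printf, varargs and stack-passed arguments, takes an argument count instead of the descriptor format, and the names gen_wrapper, WATCOM_ARG_REGS and SAVED_REGS are made up. It relies on the Watcom register convention passing the first four arguments in eax, edx, ebx and ecx.

```python
# Watcom register calling convention: first four args, in this order.
WATCOM_ARG_REGS = ["eax", "edx", "ebx", "ecx"]
# Registers the Watcom-compiled caller expects to survive the call
# (same set as in the sample wrapper above).
SAVED_REGS = ["ebx", "ecx", "edx", "esi", "edi"]


def gen_wrapper(name, nargs, underscore=False):
    """Return NASM text for a Watcom-to-cdecl wrapper around `name`."""
    if nargs > len(WATCOM_ARG_REGS):
        raise ValueError("this sketch only handles register arguments")
    # Windows object files want a leading underscore on cdecl symbols.
    target = ("_" if underscore else "") + name
    lines = ["_c_%s:" % name]
    lines += ["    push %s" % reg for reg in SAVED_REGS]
    # cdecl pushes arguments right to left.
    for reg in reversed(WATCOM_ARG_REGS[:nargs]):
        lines.append("    push %s" % reg)
    lines.append("    call %s" % target)
    if nargs:
        lines.append("    add esp, %d" % (4 * nargs))
    lines += ["    pop %s" % reg for reg in reversed(SAVED_REGS)]
    lines.append("    ret")
    return "\n".join(lines)


print(gen_wrapper("access", 2))
```

The return value needs no handling because both conventions return it in eax.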
As for the debug messages, they were of course printed to stdout. At one point we also started printing the return address, and hooked the output up to a tool written in C that held a symbol map of the functions (to be exact: an objdump symbol table converted into a hash table and cached on the hard disk between runs) and replaced the addresses in the debug output with symbols.
The input looked like this:
004DAAA5 read(3, 0096F95C, 1024)
004DA9FD close(3)
004271EB strcmp("CD", "CD")
004271EB strcmp("InstallDrive", "CD")
004271EB strcmp("InstallDrive", "InstallDrive")
004271EB strcmp("Language", "CD")
and the output:
<func_02046+1d> read(3, 0096F95C, 1024)
<func_02044+11> close(3)
<func_00268+91/jump_02574+12> strcmp("CD", "CD")
<func_00268+91/jump_02574+12> strcmp("InstallDrive", "CD")
<func_00268+91/jump_02574+12> strcmp("InstallDrive", "InstallDrive")
<func_00268+91/jump_02574+12> strcmp("Language", "CD")
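The address-to-symbol translation shown above can be sketched in a few lines of Python. This is not the original C tool: it uses a sorted list with binary search instead of a cached hash table, it does not produce the combined func/jump labels, and the symbol table is a hypothetical two-entry one whose start addresses were back-computed from the sample lines.

```python
import bisect
import re

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = sorted([
    (0x004DA9EC, "func_02044"),
    (0x004DAA88, "func_02046"),
])
STARTS = [start for start, _ in SYMBOLS]

# A debug line starts with an 8-digit hex return address.
ADDR_RE = re.compile(r"^([0-9A-Fa-f]{8})\b")


def symbolize(addr):
    """Map an address to <symbol+offset> using the nearest lower symbol."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return "%08X" % addr  # below the first symbol: leave as-is
    start, name = SYMBOLS[i]
    return "<%s+%x>" % (name, addr - start)


def translate(line):
    """Replace the leading return address of a debug line with a symbol."""
    return ADDR_RE.sub(lambda m: symbolize(int(m.group(1), 16)), line)


print(translate("004DAAA5 read(3, 0096F95C, 1024)"))
print(translate("004DA9FD close(3)"))
```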
Ah, speaking of debugging – we mainly used the GNU Debugger (gdb), since it was the only debugger both Unavowed and I could use. To speed things up a little, we created some GDB scripts that made it a little more usable. E.g.:
define hardtrace
  echo Hardtracing the stack...\n
  set $max = 0x00520000
  set $min = 0x00400000
  set $cnt = $esp
  set $iter = 1
  printf "[00] "
  info symb $eip
  while 1
    set $temp = *(unsigned int*)$cnt & 0xffff0000
    if $temp >= $min && $temp <= $max
      printf "[%.2i] ", $iter
      set $iter = $iter + 1
      info symb *(unsigned int*)$cnt
    end
    set $cnt = $cnt + 4
  end
end
(yes, this script is a brute-force call-stack walker)
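The same heuristic, restated as a small Python sketch for readers who don't speak GDB script: walk the stack one 32-bit word at a time and report every word that looks like an address inside the code range. The range limits are the ones from the script; the function name and the sample stack contents are made up (the values are borrowed from the debug log shown earlier).

```python
def scan_stack(words, lo=0x00400000, hi=0x00520000):
    """Return (index, value) for every stack word that could be a code address.

    Like the GDB script, this masks off the low 16 bits before comparing,
    and it cannot tell real return addresses from stale data.
    """
    hits = []
    for index, word in enumerate(words):
        if lo <= (word & 0xFFFF0000) <= hi:
            hits.append((index, word))
    return hits


# Hypothetical stack contents, top of stack first.
stack = [0x00000003, 0x004DAAA5, 0x0096F95C, 0x00000400, 0x004271EB]
for index, word in scan_stack(stack):
    print("[%02d] %08X" % (index, word))
```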
After the standard C functions started working, the next step was to see what crashed first, analyze it, fix it, and do the same with the next place the game crashed (please note that at that moment we had nothing more than a few debug messages showing on the console). Of course, since the C functions worked, the things that crashed were the I/O functions.
After some time, we managed to block (block, not fix) the I/O functions of the keyboard, sound, and mouse, and we focused on the graphic routines.
Some, like the palette changing, were easy to find, since they used known port numbers - out/in instructions were the key here, together with Ralf Brown's Port List (yes, Ralf Brown's XYZ List again). For example, the palette changing function looks like this:
;------------------------------------------------------
func_00889: ; 0006f9dc
;------------------------------------------------------
push ecx
mov ch,dl
mov cl,al
mov dx,0x3c8
xor al,al
out dx,al ; palette color number
mov dl,0xc9
mov al,cl
out dx,al ; red
mov al,ch
out dx,al ; green
mov al,bl
out dx,al ; blue
[...]
pop ecx
ret
The above function was translated into SDL-compatible palette changing (in C of course):
void
set_palette(const uint8_t *palette)
{
    SDL_Color colors[256];
    int x;
    const uint8_t *p;
    printf("set_palette(%p)\n", palette);
    for (p = palette, x = 0; x < 256; x++, p += 3)
    {
        colors[x].r = p[0] * 4;
        colors[x].g = p[1] * 4;
        colors[x].b = p[2] * 4;
        printf("[ %i %i %i ], ", colors[x].r,colors[x].g,colors[x].b );
    }
    if (SDL_SetPalette(screen, SDL_LOGPAL | SDL_PHYSPAL, colors, 0, 256) != 1)
        fprintf(stderr, "set_palette() failed\n");
}
And the same had to be done with resolution changing, on-screen rendering, etc.
Finally, after some days, we saw the intro! Kinda... (the colors were OK)
That evening we worked until we finally got the proper intro showing up, playing at hi-speed (the timers were just stubs at that time), and finally crashing :)
The last thing we did in this attempt was displaying the menu. After that Unavowed moved abroad, went offline, and the project was forgotten, waiting for its time....
The final attempt
You know how it is – when you got that far, something might be forgotten, but it will pop up from time to time.
About two years ago the project was revived, but we decided to start from scratch – we learned a few things here and there, gained some experience, and we thought that it could be done better.
So, Unavowed created a new disassembler, this time based on the binutils package, that created re-compilable listings in GNU Assembler (as) format, AT&T style this time (he he he I can see some of you going 'at&t??? omg blah yuck'). The first disassembler did not give us any guarantee that its output could be correctly recompiled, because it walked linearly through the bytes in the image, so the new version traced all branching instructions to map out all reachable code. The wrapper-generating script and the C part of the port were rewritten as well.
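The difference between walking the bytes linearly and tracing branches can be shown on a toy example. The Python sketch below is not the binutils-based disassembler: the "instructions" are a made-up table mapping an address to (size, kind, branch target), standing in for what a real decoder would return, and trace_reachable is a hypothetical name.

```python
def trace_reachable(insns, entry):
    """Return the sorted addresses of instructions reachable from `entry`.

    insns maps address -> (size, kind, target), where kind is one of
    "seq", "jcc", "jmp", "call" or "ret".
    """
    seen = set()
    work = [entry]
    while work:
        addr = work.pop()
        while addr in insns and addr not in seen:
            seen.add(addr)
            size, kind, target = insns[addr]
            if kind in ("jcc", "jmp", "call"):
                work.append(target)  # follow the branch later
            if kind in ("jmp", "ret"):
                break  # execution never falls through these
            addr += size
    return sorted(seen)


# Toy image: code at 0..4, five data bytes at 5..9, code again at 10.
TOY = {
    0: (2, "jcc", 10),
    2: (1, "seq", None),
    3: (2, "jmp", 10),
    5: (5, "seq", None),  # what a linear sweep would "decode" from data
    10: (1, "seq", None),
    11: (1, "ret", None),
}
print(trace_reachable(TOY, 0))  # [0, 2, 3, 10, 11]
```

A linear sweep would have emitted the bogus instruction at address 5; tracing never visits it, so the bytes there can be emitted as data instead.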
Having notes from the previous attempts, we got to the same point very fast, and keyboard, mouse, and timers became our focus.
Well, I guess I can't tell you anything new here – the previously used method was good enough, and in a few days' work we got the first level of the game running.
Also, Unavowed insisted on playing the game music from ogg files, and that also was implemented at that time, as well as resizing (not resampling, we wanted to keep the old school looks) the 320x200 parts of the game (videos and low-res-mode in the game itself) to 640x480.
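One way to resize without resampling is plain nearest-neighbour scaling, where every destination pixel copies a single source pixel and nothing gets blended. The Python sketch below assumes that is what was meant and that the frame is an 8-bit paletted buffer; the port's actual scaler may differ, and scale_nearest is a made-up name.

```python
def scale_nearest(src, sw, sh, dw, dh):
    """Scale an 8-bit sw*sh image to dw*dh by copying the nearest source pixel."""
    dst = bytearray(dw * dh)
    for y in range(dh):
        sy = y * sh // dh          # source row for this destination row
        for x in range(dw):
            sx = x * sw // dw      # source column for this destination column
            dst[y * dw + x] = src[sy * sw + sx]
    return bytes(dst)


# 320x200 -> 640x480: columns double exactly, rows repeat 2 or 3 times.
frame = bytes(320 * 200)
big = scale_nearest(frame, 320, 200, 640, 480)
print(len(big))  # 307200
```

Since 480/200 is 2.4, some source rows end up three pixels tall and others two, which is the price of keeping hard pixel edges.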
A few more days, and Unavowed got the sound hooked with OpenAL, and it even stopped sounding like 'gzzzzzbzzzzzzmzzzzfzjiiiiiiiiiiiiiiiitbrrrrrrrrrrr!' :)
Well, it was far from being completed – some parts of menu crashed, and also the game crashed when starting the second mission.
Uh, that second bug took us two weeks to sort out! The funniest thing was that it was not in our code, but in the original code. And when we fixed it, and checked the cross-references, we found out that there actually is a flag, /g, that can be given on the command line and “turns off” this bug. And guess what – there was a BAT script in the original game that looked like this:
@main /w /g
I could say that we wasted some time there, but no.. we learned a thing or two there :)
In the meantime, both Unavowed and I got access to Macs, and we decided to port the game to OSX too.
I would like to say that porting to OSX went smoothly and without any problems. Yes, really, I would like to say that, but I can't, since it was terrible.
First of all, OSX has some terribly old binutils version that didn't like some of the directives and other syntax constructs that we used (e.g., instead of the .global keyword, the OSX binutils expected .globl).
Additionally, we found out that the OSX ABI needs the stack to be aligned to 16 bytes on each function entry, a requirement we had not run into on the other platforms we had Syndicate Wars Port running on.
But even after this, we got really strange crashes in the OSX version, with execution landing in the middle of instructions. It turned out that Apple's version of the binutils assembler miscompiled the loop and loopnz instructions: the target address always came out a few bytes off the real target. To fix this, and the problem with unsupported directives, we wrote a filter that replaced modern directives with ancient ones, and replaced loop/loopnz with longer instruction sequences that did the same job.
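A filter of this kind could look roughly like the Python sketch below. It is a guess at the idea, not the filter we actually shipped: the replacement sequence (lea to decrement ecx without touching the flags, then jecxz over a jump) is one possible equivalent of loop/loopnz, the loopfix_N labels are made up, and only the .global directive is handled.

```python
import re

GLOBAL_RE = re.compile(r"^(\s*)\.global\b")
LOOP_RE = re.compile(r"^(\s*)(loopnz|loopne|loop)\s+(\S+)\s*$")


def filter_listing(lines):
    """Rewrite an AT&T-syntax listing for an assembler lacking these features."""
    out = []
    fixes = 0
    for line in lines:
        line = GLOBAL_RE.sub(r"\1.globl", line)
        m = LOOP_RE.match(line)
        if m is None:
            out.append(line)
            continue
        indent, mnemonic, target = m.groups()
        skip = "loopfix_%d" % fixes  # hypothetical unique label
        fixes += 1
        # loop jumps whenever ecx != 0; loopnz also requires ZF == 0.
        jump = "jmp" if mnemonic == "loop" else "jnz"
        out.append(indent + "lea -1(%ecx), %ecx")  # ecx -= 1, flags untouched
        out.append(indent + "jecxz " + skip)       # ecx == 0: fall out of the loop
        out.append(indent + jump + " " + target)
        out.append(skip + ":")
    return out


sample = [".global _main", "    loop jump_1", "    loopnz jump_2", "    ret"]
print("\n".join(filter_listing(sample)))
```

Using lea rather than dec matters because loop and loopnz leave the flags alone, and code after the loop may still depend on them.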
Also, there were some other more or less time consuming issues, but finally, we got it running on OSX too.
Well, since the reverse-engineering work was done, we started to gather the library licenses, j00ru wrote a command line CDDA ripper for us, and Xa made a cool looking icon and the graphics for the project site. And the ring, er, the project, was forgotten again.
Until a week ago, when we finally decided that a year is enough, and a finished project should be published (maybe someone else also wants to play this game). Of course, during this year both Windows 7 and Mac OSX 10.6 came out, and we found out that there are some minor problems compiling / installing the Port on these systems. Well, actually the Windows one was related to x86-64 mode, not to Windows 7 itself, but I found it out later.
Anyway, the end of this story is known to you – we've finally released it, it can be downloaded, there are some screen shots, there is a video.
At the end, I would like to thank Unavowed for the chance to take part in this project. Additional thanks go to j00ru, MeMeK, oshogbo, Blount, and xa for their contributions. Also, I would like to thank joostp for his positive feedback during the making of the Port, and Arashi for patience :) Thanks :)
Comments:
I was wondering about the absolute offsets in the disassembled file, and if the program was reassembled and it grew in size / addresses were shifted, how did you deal with that?
Of course, using dosbox might have been an easier solution, but less interesting.
It may be even better to try and apply the method to other bullfrog (or just watcom-built?) games from the same era (hi-octane, magic carpet...)
I'm sure I played that game when I was young, I'll definitely look it up again and see if it brings back memories. :)
Out of curiosity, did you run into any licensing issues with the original game company?
The disassembler changes all memory addresses to labels in its output. All absolute addresses are easily recognised since they appear in LE relocation tables. So in the end there is almost no change in executable size. We could even decrease it by stripping out all unused statically-linked clib code, but we didn't bother.
Ange:
To me it was worth it, but I doubt I'd want to do this again, it's much too time consuming. Still, it might now be easier for others to use our stuff for porting the other Bullfrog games. Though if I wanted to port another one, I'd choose UFO: Enemy Unknown ;-)
Ron:
Nope, not yet at least. Bear in mind that we released the port only a couple of days ago ;-)
Even if nobody will play it... yeah, it was worth it :) It was fun to reverse, and a great feeling to reach the current state :)
@Ange
Hi! Nice seeing you here ;>
When we started this project, Syndicate Wars didn't work too well under DOSBox afair... It changed later, but hehe, it was too late to stop ;>
As for other Watcom games... well, it is an option to work on other games (fungos on OpenRCE suggested even to automate this). However, currently we don't have such plans (hence other projects)... but... who knows ;>
@Peter Ferrie
Hi!
Ah, so it's the rmode switch after all ;> Thanks ;>
And thanks for the positive feedback ;>
@Ron
Thanks ;>
Well, the original game company (Bullfrog) sadly does not exist. I'm not really sure who owns the copyrights (it might be EA who bought Bullfrog and shut it down later). Anyway, nobody contacted us... yet, and the only thing that might be questioned is mentioned on the project page ;>
The game license itself does not mention anything about reverse engineering, and additionally, the Polish law (well, we're both Polish hehe) allows one to patch a program he owns (as in "he is licensed to use") to work with other programs, like modern OS'es (Art. 75 and 76 of the Polish copyright laws). At least that's what we know :)
Thanks! :)
Btw, woah, impressive portfolio man... Witcher was great! :)
@Riddlemaster
Thanks! :)
And see you on IGK ;>
Thanks for this as I now can run SW again on my Linux box, but there seem to be a couple of bugs which prevent much play. These occur in the Cryo and Equipment screens when clicking on a body type or equipment type causes the program to end abruptly, with no obvious segfault warning...
Hi!
Thanks for the bug report! I forwarded it to the http://groups.google.com/group/syndicate-wars-port
Take a look there from time to time, we'll try to find out what's wrong, and maybe we'll have additional questions regarding the issue :)
If you indicate a (freenode?) IRC channel, I can help with a bit of interactive debugging.
You don't have to have a Google account :)
To reply, just send an e-mail to:
syndicate-wars-port U_KNOW_WHATS_HERE googlegroups.com
with the subject set to:
Re: [syndicate-wars-port] Linux Cryo/Equipment screen game termination
Take care :)
I always love to read such 'Reverse Engineering' works by people! Was cool to read your approach and how you handled difficult parts!
Great job :)
I wish I could get it working fully on OS X...
Thanks! :)
@Raveem
Thanks :)
Hmm, what's the problem on OS X? Please write to our mailing list :) (its address is stated on the project page afair - http://swars.vexillium.org/)
Though I'm saddened to hear you guys don't plan on further developing the port, all this stuff is way too hardcore for me ='( Multiplayer and MOD support would be, while crazy difficult, amazing... there's no game like Syndicate these days. Maybe it'd be better for someone to just code a new sourceport from scratch for something like that, who knows. I'd bounty that. Just a thought... ;)
With that said, where the #@%$ is your donate link? This is amazing work!
What did the "/w" and the "/g" even do? Seems strange that if these "options" were actually requirements - why were they not just hard-coded into the EXE?
Thanks for any information you can provide,
dvwjr
Ah, EA then ;) OK
As for further development, well, as I've said, no plans currently, especially since EA stated that they will be doing another Syndicate-series game, and that it will be released in the near future.
As for the donate link - there isn't one ;) It's a toootally non-profit project (hey, we mainly did it so WE could play this excellent game! ;>>>>)
@dvwjr
That is an excellent question dvwjr! And to tell you the truth, we've spent some hours trying to figure this out.
And the only thing we've found out, is that when these options are missing, when starting the second mission the code execution goes to some invalid code that is totally broken and has no right to work. So it may have been some unfinished feature that was disabled by /w or /g, or maybe some kind of hardware/drivers actually required that non-working code (and it did work in that case).
Summarizing - after a few hours on it, we still have no idea wtf ;D