2010-01-27:

The tale of Syndicate Wars Port

hard:reverse engineering:re:assembler:games:gamedev:x86:asm:windows:linux:macosx:c:syndicate wars
As promised, it's time to reveal the technical story behind the Syndicate Wars Port. The story is divided into two parts: the first and the second attempt to port this game. Comments are welcome!

[UPDATE: Video from Recon conference in 2010: "Syndicate Wars Port: How to port a DOS game to modern systems"]

The initial attempt


We made the first attempt to port this game a few years back (I think it was 5 years ago). The plan was simple - create a disassembler, and try to find all the dependencies. Sounds simple!
OK, first of all, why the hell did we want to create a disassembler almost from scratch? Two reasons:
1. We didn't know of any disassembler that handled LE files correctly (we didn't know about IDA back then). For those of you who never played with the DOS4GW extender: the executable file consists (similarly to a PE file) of two parts, a 16-bit DOS stub that executes dos4gw.exe, and a 32-bit Linear Executable that is loaded by the dos4gw.exe loader. Of course the LE part contains the application code/data/etc. (check http://www.tenberry.com/dos4g/faq/format.html for additional information)
2. Since the dead-listing was going to be recompiled, it had to be compatible with the input format of an assembler of our choosing (and we chose the Netwide Assembler, aka NASM).
Additionally, writing your own disassembler lets you add features that are useful in the case at hand, like a user-provided symbol table, a whitelist of data regions (data regions not on the list are left out of the listing), or a list of vtable regions (at first we thought that the game was written in C++, having mistaken a switch-jump table for a vtable :>).
The disassembler (called ledisasm) was written by Unavowed, and it used ndisasm (from the Netwide Assembler package) as the disassembling engine.
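To make the two-part layout described above a bit more concrete, here is a minimal sketch (not code from ledisasm) of locating the LE part of such a file. It assumes the MZ stub stores the offset of the next header at 0x3C, the same way NE and PE files do; executables with the extender bound in may need extra steps that this sketch does not cover. The function name find_le_header is made up for the example.

```python
import struct

def find_le_header(image):
    """Return the file offset of the LE header, or None if it is not found.

    Assumption: the MZ stub keeps the offset of the next header at 0x3C.
    """
    if len(image) < 0x40 or image[:2] != b"MZ":
        return None
    (offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[offset:offset + 2] == b"LE":
        return offset
    return None
```

Everything after the returned offset (object table, page map, fixup records) would still have to be parsed before any disassembly can start.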

Ah, one thing - Unavowed is a GNU/Linux person, while I owned a Windows box, so we had to write everything in a way that it would work on both our systems (which restricted us to console applications). Back to the story...

Having a disassembler ready (that's a simplification - Unavowed made fixes to it from time to time, so we had to use a diff/patch (+ some sed scripts) method to carry our changes across dead-listing regenerations), we could start looking for the dependencies: the C library functions, the I/O calls (mostly int/out/in instructions or memory-mapped I/O references), and DOS4GW environment specific dependencies.

But first! Some statistics: the listing we got weighed in at over 14 MB, and consisted of over 1,070,000 lines of assembly code and data (dd db etc).

OK, but how do you find, let's say, an open/fopen function in such an insanely large assembly soup? Look for interrupts and trace the cross-references! Of course, we used the best existing interrupt list - Ralf Brown's Interrupt List (http://www.ctyme.com/rbrown.htm).
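As a rough illustration of the "look for interrupts" step (this is a sketch, not the original tooling), the script below walks a listing and records which function labels contain which interrupt. It assumes labels of the form func_NNNNN: at the start of a line and indented instructions written as int 0xNN; the name find_interrupts is made up for the example.

```python
import re
from collections import defaultdict

LABEL_RE = re.compile(r"^(func_\w+):")
INT_RE = re.compile(r"^\s+int\s+(0x[0-9a-fA-F]+)\b")

def find_interrupts(listing):
    """Map each interrupt number to the function labels that invoke it."""
    users = defaultdict(list)
    current = None
    for line in listing.splitlines():
        label = LABEL_RE.match(line)
        if label:
            current = label.group(1)
            continue
        hit = INT_RE.match(line)
        if hit and current is not None:
            number = int(hit.group(1), 16)
            if current not in users[number]:
                users[number].append(current)
    return dict(users)
```

The functions reported for int 0x21 are the natural starting points: from there the callers can be traced upwards until something that looks like a C library function appears.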

One might say "Hey! Wait a minute there! How come the 32-bit application was allowed to use the 16-bit DOS/VBIOS/BIOS/etc interrupts?". It is a good question, and this is where DPMI (DOS Protected Mode Interface) enters. In short, the DPMI host, which is integrated in DOS4GW, registers some ISRs (Interrupt Service Routines) in the IDT (Interrupt Descriptor Table), which, when called, switch to 16-bit real mode (I'm not sure here whether DOS4GW implements it using the VM86 method, or normal real mode), call whatever they're supposed to call, and jump back to protected mode.
Additionally, there were functions like int386 or int386x which allowed calling the 16-bit interrupts through DPMI from high level languages like C.

Btw, the int386x is implemented in a funny way (however I admit that there are not many ways to do it, and this one probably is the fastest):

____int386x_:
[...]
 call func_02628
[...]
func_02628:
 lea esi,[esi+esi*2]
 lea eax,[cs:esi+func_02631]
 push eax
; push the return address
[...]
 ret

func_02631:
 int 0x0
 ret
 int 0x1
 ret
 int 0x2
 ret
 int3
 nop
 ret
 int 0x4
 ret
 int 0x5
[...] ; yep, there is an int XX + ret for every interrupt
 int 0xfc
 ret
 int 0xfd
 ret
 int 0xfe
 ret
 int 0xff
 ret
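The trick in the listing above is that every stub is exactly three bytes long, so multiplying the interrupt number by three (the lea esi,[esi+esi*2]) gives the offset of the right stub. The sketch below rebuilds such a table from the standard x86 encodings (CD nn for int, CC for int3, 90 for nop, C3 for ret) to show why the int3 entry carries a nop: presumably it pads the one-byte int3 to the common stub size. The helper names are made up for the example.

```python
def build_int_table():
    """Build 256 three-byte stubs: 'int N; ret', with 'int3; nop; ret' for N == 3."""
    table = bytearray()
    for n in range(256):
        if n == 3:
            table += bytes([0xCC, 0x90, 0xC3])  # int3 ; nop ; ret
        else:
            table += bytes([0xCD, n, 0xC3])     # int n ; ret
    return bytes(table)

def stub_offset(n):
    """Offset of the stub for interrupt n, i.e. what lea esi,[esi+esi*2] computes."""
    return n + n * 2
```

With this layout the whole table takes 768 bytes, and the dispatcher only needs to add the offset to the table base and jump there.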


OK, back to the topic!
After finding a few functions, we found a reference to Watcom somewhere, and then found the Open Watcom compiler (http://www.openwatcom.org/), with the source code of the standard libraries. Searching for functions when you have their source is much faster than when you have nothing (the sources were accurate to about 95%). Additionally, we could confirm our findings, and also change some names from __i_think_its_fopen_but_im_not_sure_please_double_check to _freopen :)

While reverse engineering the assembly code (we translated the code to pseudo-C, since it's easier to read C than assembly), we created some tools (in Perl) which helped us with these translations: a very simple “decompilation” of some instructions and if blocks. It was buggy, but it was enough to speed things up a little (it was really simple, nothing even close to what the modern Hex-Rays decompiler can do). Also, I had a command-line PHP script that changed the function names to color names (it's easier to remember and distinguish func_red and loc_cyan than func_189275 and loc_9ac61b), but in the end, we didn't use it too much.

When we had found about 50% of the functions, we met a guy (hi joostp!) who was working on a remake of the first Syndicate. After we told him what we were doing, he showed mercy and gave us a list of functions found by IDA Pro in MAIN.EXE (the Syndicate Wars executable), which saved us a few weeks of finding the rest of the functions.

Having the functions, we could swap the function implementations for calls to the modern native libc (glibc or msvcrt). However, the calls couldn't be done with a simple 'jmp libc.func', since Watcom uses its own register-based, fastcall-like calling convention (take a look at “Calling conventions for different C++ compilers and operating systems” by Agner Fog), which, of course, is not compatible with the cdecl used in both glibc and msvcrt. So, we created a Python script that received a list of functions with some kind of prototype descriptors, and created the wrappers. Additionally, the script handled the win32/gnu differences (like the underscore required in cdecl functions in object files on Windows) and added debug-aiding messages.

The configuration file for the wrapper.py script looked like this:
# v - vararg: like cdecl but used for functions with v[name] variant
#
# args is a sequence of zero or more of:
# i - int
# x - int (displayed in hex)
# p - void * (general pointer)
# s - char *
# c - char
#
# name type args
access p sx
asctime p
atoi p s
[...]

A sample wrapper looks like this:
_c_access:
       push ebx
       push ecx
       push edx
       push esi
       push edi
       push edx
       push eax
       push edx
       push eax
       push dword .debug_str
       call printf
       add esp, byte +0xc
       call access
       add esp, byte +0x8
       pop edi
       pop esi
       pop edx
       pop ecx
       pop ebx
       ret
.debug_str:
       db 'access("%s", 0x%x)', 0xa, 0x0


You may be surprised about the push/pop edx and ecx in the above code, since normally the callee should save only ebx, esi, edi and ebp registers, and both the edx and ecx registers are considered to be scratch registers. Well, guess what, in Watcom clib (clib, libc, crt, geeez, these people should make up their minds!) both edx and ecx are callee-save registers. Believe me, we learned this the hard way ;p
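For illustration only, here is a minimal sketch of what a generator for such wrappers could look like. It is not the original wrapper.py: it leaves out the debug printf, handles only up to four register arguments (Watcom passes them in eax, edx, ebx, ecx, in that order), treats every argument as a 32-bit value, and ignores the type column of the config. The function name make_wrapper and its underscore parameter are assumptions made for the example.

```python
WATCOM_ARG_REGS = ["eax", "edx", "ebx", "ecx"]   # Watcom register argument order
SAVED_REGS = ["ebx", "ecx", "edx", "esi", "edi"]  # preserved for the Watcom caller

def make_wrapper(name, args, underscore=False):
    """Return NASM text for a Watcom-to-cdecl wrapper around `name`.

    `args` is the descriptor string from the config, one letter per argument.
    `underscore` selects the Windows-style symbol name (_name).
    """
    if len(args) > len(WATCOM_ARG_REGS):
        raise ValueError("stack-passed arguments are not handled in this sketch")
    target = ("_" if underscore else "") + name
    lines = ["_c_%s:" % name]
    lines += ["        push %s" % reg for reg in SAVED_REGS]
    # cdecl expects the arguments on the stack, pushed right to left
    for reg in reversed(WATCOM_ARG_REGS[:len(args)]):
        lines.append("        push %s" % reg)
    lines.append("        call %s" % target)
    if args:
        lines.append("        add esp, %d" % (4 * len(args)))
    lines += ["        pop %s" % reg for reg in reversed(SAVED_REGS)]
    lines.append("        ret")
    return "\n".join(lines) + "\n"
```

Calling make_wrapper("access", "sx") produces the same save/push/call/restore skeleton as the sample above, minus the debug message. The return value stays in eax, which is deliberately not among the saved registers.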

About the debug messages: they were of course printed to stdout, and at one point we also started printing the return address. We then piped the output through a tool written in C which had a symbol map of the functions (to be exact, an objdump symbol table converted into a hash table and cached on the hard disk between runs), and which replaced the addresses in the debug output with symbols.

The input looked like this:
004DAAA5 read(3, 0096F95C, 1024)
004DA9FD close(3)
004271EB strcmp("CD", "CD")
004271EB strcmp("InstallDrive", "CD")
004271EB strcmp("InstallDrive", "InstallDrive")
004271EB strcmp("Language", "CD")

and the output:
<func_02046+1d> read(3, 0096F95C, 1024)
<func_02044+11> close(3)
<func_00268+91/jump_02574+12> strcmp("CD", "CD")
<func_00268+91/jump_02574+12> strcmp("InstallDrive", "CD")
<func_00268+91/jump_02574+12> strcmp("InstallDrive", "InstallDrive")
<func_00268+91/jump_02574+12> strcmp("Language", "CD")
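The address-to-symbol step can be sketched in a few lines. The original tool was written in C and used a cached hash table; the version below is only an illustration that uses a sorted list and binary search instead, and it reports a single nearest symbol rather than the function/jump pair shown above. The function name symbolize is made up for the example.

```python
import bisect
import re

def symbolize(line, symbols):
    """Replace a leading 8-digit hex address with <symbol+offset>.

    `symbols` is a list of (address, name) tuples sorted by address.
    Lines without a leading address, or below the first symbol, are returned as-is.
    """
    match = re.match(r"^([0-9A-Fa-f]{8})\b", line)
    if not match:
        return line
    address = int(match.group(1), 16)
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return line
    base, name = symbols[index]
    return "<%s+%x>%s" % (name, address - base, line[match.end():])
```

A real tool would build the address list once instead of on every call; it is rebuilt here only to keep the sketch short.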


Ah, speaking of debugging - we mainly used the GNU Debugger (gdb), since it was the only debugger both Unavowed and I could use. To speed things up a little, we created some GDB scripts that made it a little more usable. E.g.:
define hardtrace
 echo Hardtracing the stack...\n
 set $max = 0x00520000
 set $min = 0x00400000
 set $cnt = $esp
 set $iter = 1
 printf "[00] "
 info symb $eip
 while 1
   set $temp = *(unsigned int*)$cnt & 0xffff0000
   if $temp >= $min && $temp <= $max
     printf "[%.2i] ", $iter
     set $iter = $iter + 1
     info symb *(unsigned int*)$cnt
   end
   set $cnt = $cnt + 4
 end
end

(yes, this script is a brute-force call-stack walker)
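The same heuristic, expressed outside of gdb: treat every aligned 32-bit word on the stack as a potential return address and keep the ones that point into the code range. This is just a sketch of the idea operating on a raw dump of stack bytes; the function name scan_stack is made up, and the default range simply repeats the $min/$max values from the script above.

```python
import struct

def scan_stack(stack_bytes, code_min=0x00400000, code_max=0x00520000):
    """Yield (offset, word) for each aligned 32-bit word that looks like a code address.

    As in the gdb script, only the upper 16 bits are compared against the range.
    """
    for offset in range(0, len(stack_bytes) - 3, 4):
        (word,) = struct.unpack_from("<I", stack_bytes, offset)
        if code_min <= (word & 0xFFFF0000) <= code_max:
            yield offset, word
```

Like the gdb version, this reports false positives (any data that happens to look like a code address), but for a quick look at who called what it is usually good enough.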

After the standard C functions started working, the next step was to see what crashed first, analyze it, fix it, and then repeat with the next place the game crashed (please note that at that moment we had nothing more than a few debug messages showing on the console). Of course, since the C functions worked, the things that crashed were the I/O functions.

After some time, we managed to block (block, not fix) the I/O functions of the keyboard, sound, and mouse, and we focused on the graphic routines.

Some, like the palette changing, were easy to find, since they used known port numbers - the out/in instructions were the key here, together with Ralf Brown's Port List (yes, Ralf Brown's XYZ List again). For example, the palette changing function looks like this:

;------------------------------------------------------
func_00889:             ; 0006f9dc
;------------------------------------------------------
               push ecx
               mov ch,dl
               mov cl,al
               mov dx,0x3c8
               xor al,al
               out dx,al ; palette color number
               mov dl,0xc9
               mov al,cl
               out dx,al ; red
               mov al,ch
               out dx,al ; green
               mov al,bl
               out dx,al ; blue
               [...]
               pop ecx

               ret


The above function was translated into SDL-compatible palette changing (in C of course):
void
set_palette(const uint8_t *palette)
{
 SDL_Color colors[256];
 int x;
 const uint8_t *p;

 printf("set_palette(%p)\n", palette);

 for (p = palette, x = 0; x < 256; x++, p += 3)
 {
   colors[x].r = p[0] * 4;
   colors[x].g = p[1] * 4;
   colors[x].b = p[2] * 4;
   printf("[ %i %i %i ], ", colors[x].r,colors[x].g,colors[x].b );
 }
 
 if (SDL_SetPalette(screen, SDL_LOGPAL | SDL_PHYSPAL, colors, 0, 256) != 1)
   fprintf(stderr, "set_palette() failed\n");
}
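A side note on the * 4 in the code above: the VGA DAC uses 6 bits per channel (0-63), so the values have to be stretched to the 8-bit range SDL expects. Multiplying by four maps 63 to 252, which is slightly short of full white; replicating the top bits maps 63 to 255. The sketch below shows both variants. It is an illustration rather than code from the port, the name vga_to_rgb is made up, and unlike the C code it masks the input to 6 bits.

```python
def vga_to_rgb(palette, exact=False):
    """Convert a sequence of VGA DAC triplets (6 bits per channel) to 8-bit RGB tuples."""
    def scale(value):
        value &= 0x3F
        if exact:
            return (value << 2) | (value >> 4)  # full range: 63 becomes 255
        return value * 4                        # same scaling as set_palette(): 63 becomes 252
    return [tuple(scale(c) for c in palette[i:i + 3])
            for i in range(0, len(palette), 3)]
```

The difference is barely visible on screen, so the simpler multiplication is a perfectly reasonable choice.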


And the same had to be done with resolution changing, on-screen rendering, etc.

Finally, after some days, we saw the intro! Kinda... (the colors were OK)
intro... kinda


That evening we worked until we finally got the proper intro showing up, playing at hi-speed (the timers were just stubs at that time), and finally crashing :)

The last thing we did in this attempt, was displaying the menu. After that Unavowed moved abroad, got offline, and the project was forgotten, and waited for its own time....

The final attempt


You know how it is – when you got that far, something might be forgotten, but it will pop up from time to time.

About two years ago the project was revived, but we decided to start from scratch – we learned a few things here and there, gained some experience, and we thought that it could be done better.

So, Unavowed created a new disassembler, this time based on the binutils package, that created re-compilable listings in GNU Assembler (as) format, AT&T style this time (he he he I can see some of you going 'at&t??? omg blah yuck'). The first disassembler did not give us any guarantee that its output could be correctly recompiled, because it walked linearly through the bytes in the image, so the new version traced all branching instructions to map out all reachable code. Also the wrapper creating script and the C part of the port were rewritten.
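The difference between the two disassemblers is the classic linear sweep versus recursive traversal. Below is a toy sketch of the second approach, not the actual disassembler: instead of real x86 decoding it uses a dictionary that maps an address to (size, kind, target), and the function name reachable_code is made up. Indirect jumps, such as switch-jump tables, would need extra handling that is not shown here.

```python
def reachable_code(program, entry_points):
    """Return the set of instruction addresses reachable from the entry points.

    `program` maps address -> (size, kind, target), where kind is one of
    "seq", "jmp", "jcc", "call" or "ret" (a toy model, not real x86 decoding).
    """
    seen = set()
    work = list(entry_points)
    while work:
        address = work.pop()
        if address in seen or address not in program:
            continue
        seen.add(address)
        size, kind, target = program[address]
        if kind in ("jmp", "jcc", "call") and target is not None:
            work.append(target)          # follow the branch
        if kind not in ("jmp", "ret"):
            work.append(address + size)  # fall through to the next instruction
    return seen
```

Anything that never ends up in the returned set can be emitted as data, which avoids decoding data bytes as instructions the way a linear walk does.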

Having notes from the previous attempts, we got to the same point very fast, and keyboard, mouse, and timers became our focus.
Well, I guess I can't tell you anything new here - the previously used method was good enough, and in a few days' work we got the first level of the game running.
Also, Unavowed insisted on playing the game music from ogg files, and that also was implemented at that time, as well as resizing (not resampling, we wanted to keep the old school looks) the 320x200 parts of the game (videos and low-res-mode in the game itself) to 640x480.
A few more days, and Unavowed got the sound hooked with OpenAL, and it even stopped sounding like 'gzzzzzbzzzzzzmzzzzfzjiiiiiiiiiiiiiiiitbrrrrrrrrrrr!' :)
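The resizing without resampling mentioned above amounts to nearest-neighbour scaling: every destination pixel simply copies one source pixel, so the palette indices stay untouched and the picture keeps its blocky look. The sketch below shows the idea on an 8-bit indexed buffer. It is an illustration, not the port's code, and scale_nearest is a made-up name.

```python
def scale_nearest(src, src_w, src_h, dst_w, dst_h):
    """Scale an 8-bit indexed image (row-major bytes) by repeating source pixels."""
    dst = bytearray(dst_w * dst_h)
    for y in range(dst_h):
        src_row = (y * src_h // dst_h) * src_w   # nearest source row
        dst_row = y * dst_w
        for x in range(dst_w):
            dst[dst_row + x] = src[src_row + x * src_w // dst_w]
    return bytes(dst)
```

Note that going from 320x200 to 640x480 doubles the width but stretches the height by 2.4, so some source rows end up repeated twice and others three times.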

Well, it was far from complete - some parts of the menu crashed, and the game also crashed when starting the second mission.

Uh, that second bug took us two weeks to sort out! The funniest thing was that it was not in our code, but in the original code. And when we fixed it, and checked the cross-references, we found out that there actually is a flag, /g, that could be provided on the command line and that “turned off” this bug. And guess what - there was a BAT script in the original game that looked like this:

@main /w /g

I could say that we wasted some time there, but no.. we learned a thing or two there :)

In the meantime, both Unavowed and me got access to Macs, and we decided to port the game to OSX too.
I would like to say that porting to OSX went smoothly and without any problems. Yes, really, I would like to say that, but I can't, since it was terrible.

First of all, OSX ships a terribly old binutils version that didn't like some of the directives and other syntax constructs we used (e.g., instead of the .global keyword, the OSX binutils expected .globl).
Additionally, we found out that the OSX ABI needs the stack to be aligned to 16 bytes on each function entry, a requirement that did not exist on the other platforms we had Syndicate Wars Port running on.
But even after this, we got really strange crashes in the OSX version, with the execution landing in the middle of instructions. It turned out that the Apple version of the binutils assembler assembled the loop and loopnz instructions incorrectly: the encoded target address was always a few bytes off from the real target. To fix this, and the problem with unsupported directives, we wrote a filter that replaced modern directives with ancient directives, and replaced loop/loopnz with longer instruction sequences that did the same job.
Also, there were some other more or less time consuming issues, but finally, we got it running on OSX too.
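To give an idea of what such a filter can look like, here is a minimal sketch. It is not the original filter, and the replacement sequence is only one possible choice: it decrements ecx with lea so that the flags stay untouched (just like the real loop instruction), and it relies on GNU-style numeric local labels (1: and 1f), which is an assumption about what the target assembler accepts. It also assumes 32-bit code with one instruction per line, no label or comment on the same line. The name filter_line is made up.

```python
import re

LOOP_RE = re.compile(r"^(\s*)(loopnz|loopne|loop)\s+(\S+)\s*$")

def filter_line(line):
    """Rewrite one line of AT&T-syntax assembly; returns a list of output lines."""
    line = re.sub(r"^(\s*)\.global\b", r"\1.globl", line)
    match = LOOP_RE.match(line)
    if not match:
        return [line]
    indent, mnemonic, target = match.groups()
    out = [indent + "leal -1(%ecx), %ecx"]  # decrement ecx without touching the flags
    if mnemonic != "loop":
        out.append(indent + "jz 1f")        # loopnz/loopne also stop when ZF is set
    out.append(indent + "jecxz 1f")         # stop when ecx has reached zero
    out.append(indent + "jmp " + target)
    out.append("1:")
    return out
```

Running every line of the generated listing through filter_line and writing out the returned lines gives a file without .global and without loop/loopnz, at the cost of a few extra bytes per loop.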

SWars Port running on Mac with a bottle of Chanoine in the background


Well, since the reverse-engineering work was done, we started to gather the library licenses, j00ru wrote a command line CDDA ripper for us, and Xa made a cool looking icon and the graphics for the project site. And the ring, er, the project, was forgotten again.

Until a week ago, when we finally decided that a year is enough, and a finished project should be published (maybe someone else also wants to play this game). Of course, during this year both Windows 7 and Mac OSX 10.6 came out, and we found out that there are some minor problems compiling / installing the Port on these systems. Well, actually the Windows one was related to x86-64 mode, not to Windows 7 itself, but I found it out later.

Anyway, the end of this story is known to you – we've finally released it, it can be downloaded, there are some screen shots, there is a video.

At the end, I would like to thank Unavowed for the chance to take part in this project. Additional thanks go to j00ru, MeMeK, oshogbo, Blount, and xa for contributions. Also, I would like to thank joostp for his positive feedback during the making of the Port, and Arashi for patience :) Thanks :)

Victory

Comments:

2010-01-27 09:04:33 = lallous
{
Although some people might argue: "Is it worth all the efforts", but this remains an interesting practice.

I was wondering about the absolute offsets in the disassembled file, and if the program was reassembled and it grew in size / addresses were shifted, how did you deal with that?
}
2010-01-27 09:42:36 = Ange
{
from my mame experience, it was worth the effort: you spend time doing what you like (reverse engineering), and you know you'll get a good game in the end - and quite a unique one, in this case.
Of course, using dosbox might have been an easier solution, but less interesting.
It may be even better to try and apply the method to other bullfrog (or just watcom-built?) games from the same era (hi-octane, magic carpet...)
}
2010-01-27 16:23:10 = Peter Ferrie
{
To answer your question about interrupt handling - DOS4GW switches to real mode, not v86 mode, to handle them.
}
2010-01-27 18:07:51 = Peter Ferrie
{
And, of course, excellent work! Congratulations on this. One of my most favourite games ever. :-
}
2010-01-27 20:45:01 = Ron
{
Very awesome, thanks!

I'm sure I played that game when I was young, I'll definitely look it up again and see if it brings back memories. :)

Out of curiosity, did you run into any licensing issues with the original game company?
}
2010-01-27 21:40:55 = Unavowed
{
lallous:
The disassembler changes all memory addresses to labels in its output. All absolute addresses are easily recognised since they appear in LE relocation tables. So in the end there is almost no change in executable size. We could even decrease it by stripping out all unused statically-linked clib code, but we didn't bother.

Ange:
To me it was worth it, but I doubt I'd want to do this again, it's much too time consuming. Still, it might now be easier for others to use our stuff for porting the other Bullfrog games. Though if I wanted to port another one, I'd choose UFO: Enemy Unknown ;-)

Ron:
Nope, not yet at least. Bear in mind that we released the port only a couple of days ago ;-)
}
2010-01-28 01:04:22 = Gynvael Coldwind
{
@lallous
Even if nobody will play it... yeah, it was worth it :) It was fun to reverse, and a great feeling to reach the current state :)

@Ange
Hi! Nice seeing you here ;>
When we started this project, Syndicate Wars didn't work too well under DOSBox afair... It changed later, but hehe, it was too late to stop ;>
As for other Watcom games... well, it is an option to work on other games (fungos on OpenRCE suggested even to automate this). However, currently we don't have such plans (hence other projects)... but... who knows ;>

@Peter Ferrie
Hi!
Ah, so it's the rmode switch after all ;> Thanks ;>
And thanks for the positive feedback ;>

@Ron
Thanks ;>
Well, the original game company (Bullfrog) sadly does not exist. I'm not really sure who owns the copyrights (it might be EA who bought Bullfrog and shut it down later). Anyway, nobody contacted us... yet, and the only thing that might be questioned is mentioned on the project page ;>
The game license itself does not mention anything about reverse engineering, and additionally, the Polish law (well, we're both Polish hehe) allows one to patch a program he owns (as in "he is licensed to use") to work with other programs, like modern OS'es (Art. 75 and 76 of the Polish copyright laws). At least that's what we know :)
}
2010-01-28 10:34:52 = yarpen
{
Great story & great work! Congratulations.
}
2010-01-28 10:39:35 = Riddlemaster
{
Once again - good job! and congrats.
}
2010-01-28 11:44:55 = Gynvael Coldwind
{
@yarpen
Thanks! :)
Btw, woah, impressive portfolio man... Witcher was great! :)

@Riddlemaster
Thanks! :)
And see you on IGK ;>
}
2010-02-01 04:37:30 = quietman
{
@ Gynvael Coldwind :

Thanks for this as I now can run SW again on my Linux box, but there seem to be a couple of bugs which prevent much play. These occur in the Cryo and Equipment screens when clicking on a body type or equipment type causes the program to end abruptly, with no obvious segfault warning...
}
2010-02-01 10:08:50 = Gynvael Coldwind
{
@quietman
Hi!
Thanks for the bug report! I forwarded it to the http://groups.google.com/group/syndicate-wars-port
Take a look there from time to time, we'll try to find out what's wrong, and maybe we'll have additional questions regarding the issue :)
}
2010-02-01 13:52:12 = quietman
{
Thanks. Will watch that, but don't have and don't want a Google account though...

If you indicate a (freenode?) IRC channel, I can help with a bit of interactive debugging.
}
2010-02-01 20:03:58 = Gynvael Coldwind
{
@quietman
You don't have to have a Google account :)
To reply, just send an e-mail to:
syndicate-wars-port U_KNOW_WHATS_HERE googlegroups.com
with the subject set to:
Re: [syndicate-wars-port] Linux Cryo/Equipment screen game termination

Take care :)

}
2010-02-08 22:53:40 = Ben/PVDasm
{
Awesome post!
I always love to read such 'Reverse Engineering' works by people! Was cool to read your approach and how you handled difficult parts!

Great job :)
}
2010-02-20 18:32:09 = Raveem
{
Syndicate Wars is one of my most favourite games. I've played it many times over the years since it came out in 1996. Very well done in porting it!

I wish I could get it working fully on OS X...
}
2010-02-22 15:12:12 = Gynvael Coldwind
{
@Ben/PVDasm
Thanks! :)

@Raveem
Thanks :)
Hmm, what's the problem on OS X? Please write to our mailing list :) (its address is stated on the project page afair - http://swars.vexillium.org/)
}
2010-03-20 15:51:49 = DanielC
{
As for the licencing stuff, yeah it would be EA. Peter Molyneux envisioned/created the Syndicate franchise from memory, for those who don't know thats the face/lead designer of Fable and Black and White and stuff who works for Microsoft now. He said in an interview a few years ago that, he has been interested in taking a look back at the franchise to create a new one (possibly MMO-centric) however licencing is the main issue there so there's no plans for it apart from a personal interest.

Though I'm saddened to hear you guys don't plan on further developing the port, all this stuff is way too hardcore for me ='( Multiplayer and MOD support would be, while crazy difficult, amazing... there's no game like Syndicate these days. Maybe it'd be better for someone to just code a new sourceport from scratch for something like that, who knows. I'd bounty that. Just a thought... ;)

With that said, where the #@%$ is your donate link? This is amazing work!
}
2010-05-04 09:04:02 = dvwjr
{
Did you ever figure out why the SYNDICATE WARS executable required the two MAIN.exe /w /g command-line options?

What did the "/w" and the "/g" even do? Seems strange that if these "options" were actually requirements - why were they not just hard-coded into the EXE?

Thanks for any information you can provide,

dvwjr
}
2010-05-08 10:15:51 = Gynvael Coldwind
{
@DanielC
Ah, EA then ;) OK
As for the further develop, well, as I've said, no plans currently, especially when EA stated that they will be doing another Syndicate-series game, and that it will be released in near future.
As for the donate link - there isn't one ;) It's a toootally non-profit project (hey, we mainly did it so WE could play this excellent game! ;>>>>)

@dvwjr
That is an excellent question dvwjr! And to tell you the truth, we've spent some hours trying to figure this out.
And the only thing we've found out, is that when these options are missing, when starting the second mission the code execution goes to some invalid code that is totally broken and has no right to work. So it may have been some unfinished feature that was disabled by /w or /g, or maybe some kind of hardware/drivers actually required that non-working code (and it did work in that case).
Summarizing - after a few hours on it, we still have no idea wtf ;D

}
