LOOP vs. default Mac OS X assembler

The guys at Apple seem to like old tools. Last night I worked with Unavowed on a project (I'll write about it another time) - to be more accurate, we tried to port the project to Mac OS X - and we stumbled on an obstacle. The obstacle told us it was called Apple Inc version cctools-698.1~1, GNU assembler version 1.38. And yes, that is the default assembler (as) used on the current Mac OS X. I certainly hope that 1.38 is just a different version naming scheme: the current version (according to the wiki) is 2.19, my MinGW says it uses 2.18.50, version 2.11 was released in the year 2000, and the oldest entry in the current project changelog mentions version 1.93.01 - that would make 1.38 reaaaally old.

Of course, there is nothing wrong with using old software. Except the unfixed bugs. And missing functionality.

I'll skip the problems with the .fill, .string and .ascii directives, and with aliases for some instructions. The biggest surprises were LOOP and LOOPNE.

It seems that LOOP and LOOPNE are supported, and they even work in small programs. But once a bigger piece of code came into play, it turned out that the loop's jump offset was calculated incorrectly. Let's look at an example:

Source (AT&T syntax):
 mov    (%edi),%al
 mov    (%eax,%esi,1),%bl
 mov    %bl,(%edi)
 add    $0x1,%edi
 loop   jump_f09d4
 jmp    jump_f0a0b

The output:
(gdb) x/10i $eip
0xe318a <jump_f09d4>:    mov    (%edi),%al
0xe318c <jump_f09d4+2>:  mov    (%eax,%esi,1),%bl
0xe318f <jump_f09d4+5>:  mov    %bl,(%edi)
0xe3191 <jump_f09d4+7>:  add    $0x1,%edi
0xe3194 <jump_f09d4+10>: loop   0xe3187 <jump_f09bb+26>
0xe3196 <jump_f09d4+12>: jmp    0xe31cb <jump_f0a0b>

As one can see, the LOOP in the output should jump to 0xe318a, but it jumps to 0xe3187 - 3 bytes too early. And what's interesting, it was always 3 bytes. It looks as if the assembler decided that the LOOP instruction is not made of an opcode + 1-byte argument (2 bytes total) but of an opcode + 4-byte argument (5 bytes), and used eip+5 as the jump origin instead of eip+2 as it should - hence the 3-byte difference. LOOPNE behaves the same way. LOOPE we didn't use ;>
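To double-check that guess, here is a small Python sketch of the arithmetic. The addresses come straight from the gdb listing above; the helper names (rel8, landing) are made up for this illustration, and the "assembler assumed 5 bytes" part is only the hypothesis from this post, not something confirmed in the assembler's source.

```python
# Addresses taken from the gdb listing above.
LOOP_ADDR = 0xE3194   # address of the LOOP instruction
TARGET = 0xE318A      # jump_f09d4, where LOOP should land
LOOP_SIZE = 2         # real encoding: opcode byte + 1-byte displacement


def rel8(target, insn_addr, assumed_size):
    """Displacement an assembler would emit if it believed the
    instruction were `assumed_size` bytes long."""
    return target - (insn_addr + assumed_size)


def landing(insn_addr, disp):
    """Where the CPU really jumps: the true end of the 2-byte
    instruction plus the encoded displacement."""
    return insn_addr + LOOP_SIZE + disp


correct = rel8(TARGET, LOOP_ADDR, 2)  # origin eip+2
buggy = rel8(TARGET, LOOP_ADDR, 5)    # origin eip+5 (hypothesised bug)

print(correct, hex(landing(LOOP_ADDR, correct)))  # -12 0xe318a
print(buggy, hex(landing(LOOP_ADDR, buggy)))      # -15 0xe3187
```

With the hypothetical 5-byte origin the displacement comes out as -15 instead of -12, and the CPU lands on 0xe3187, which is exactly what gdb shows.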

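The LOOP problem was solved by replacing the LOOP instruction with the following code, which is equivalent as far as ECX and the jump are concerned (unlike LOOP, sub also modifies the flags):
 sub $0x1,%ecx
 jnz jump_f09d4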

LOOPNE was replaced by this (target stands for whatever label the original LOOPNE jumped to):
 jz 0f
 sub $0x1,%ecx
 jnz target
 jmp 1f
0:sub $0x1,%ecx
1:

And everything suddenly started to work.
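For the curious, here is a rough Python model comparing the original instructions with the replacements. It assumes the replacements are a sub followed by a jnz to the original target (plus the jz/jmp wrapper for LOOPNE), and it only models ECX and whether the branch is taken. It deliberately ignores the fact that sub clobbers the flags while LOOP/LOOPNE leave them alone, so treat it as a sanity check and not a proof. All function names are invented for this sketch.

```python
MASK = 0xFFFFFFFF  # ECX is a 32-bit register


def loop(ecx):
    """LOOP: decrement ECX, branch if the result is non-zero."""
    ecx = (ecx - 1) & MASK
    return ecx, ecx != 0


def loopne(ecx, zf):
    """LOOPNE: decrement ECX, branch if ECX != 0 and ZF is clear."""
    ecx = (ecx - 1) & MASK
    return ecx, (ecx != 0 and not zf)


def sub_jnz(ecx):
    """Assumed LOOP replacement: sub $1,%ecx ; jnz target."""
    ecx = (ecx - 1) & MASK
    zf = ecx == 0  # SUB sets ZF from its own result
    return ecx, not zf


def loopne_replacement(ecx, zf):
    """Assumed LOOPNE replacement:
    jz 0f ; sub ; jnz target ; jmp 1f ; 0: sub ; 1:"""
    if zf:  # jz 0f taken: just decrement and fall through
        return (ecx - 1) & MASK, False
    return sub_jnz(ecx)  # ZF clear: decrement, branch if ECX != 0


def equivalent(samples=(0, 1, 2, 3, 0x80000000, MASK)):
    """Compare ECX and branch decision for a few ECX values and both ZF states."""
    for ecx in samples:
        if loop(ecx) != sub_jnz(ecx):
            return False
        for zf in (False, True):
            if loopne(ecx, zf) != loopne_replacement(ecx, zf):
                return False
    return True


print(equivalent())  # True
```

The sample list includes 0 on purpose: both LOOP and sub wrap ECX around to 0xFFFFFFFF there, so the branch is taken in both versions.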

I hope Apple will upgrade its assembler some day ;D

