As I was working on the simulation of these two instructions (AAA and AAS), I found that they have a quirk. Although the algorithms for these instructions are described in Intel’s specs, which (seemingly) makes the output defined for all inputs, that is not the case. Every time I finish writing an implementation of a specific instruction, I add that instruction to my unit tests. The instruction is simulated with random (and some smarter) inputs, and the results are checked against native execution. This way I found a quirk for a range of inputs that reveals how the instruction is really implemented (probably microcode stuff) rather than how it’s documented.

In AAA, AL = AL + 6 is done when AF is set or the low nibble of AL is above 9 (and AH is then incremented by 1). According to the documentation the destination register of that addition is AL, but in reality it is AX. Now how do we know such a thing?

If we try the following input (with AH = 0):
    mov al, 0xff
    aaa

The result in AX will be 0x205, rather than 0x105 (which is what we expect according to the docs).

What really happens is that we supply a number that, when added to 6, produces a carry into AH, thus incrementing AH by 1. Looking at the docs again, we see that whenever 6 is added to AL, AH is also explicitly incremented by 1. So AH is really incremented by 2. :P
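To make the difference concrete, here is a minimal Python sketch of both readings of AAA. It is a hypothetical model, not vendor code: it tracks only AX and a single flag (AF on input; the returned flag stands for AF/CF, which are set together on adjustment), and it assumes the adjustment condition and the final masking of AL to its low nibble as described above. The names `aaa_documented` and `aaa_observed` are made up for illustration; the "observed" version encodes the behavior reported in this post.

```python
from typing import Tuple


def aaa_documented(ax: int, af: bool) -> Tuple[int, bool]:
    """AAA as the docs read: the +6 is an 8-bit add on AL, then AH + 1.

    Returns (new AX, new AF/CF). Hypothetical model for illustration.
    """
    al, ah = ax & 0xFF, (ax >> 8) & 0xFF
    adjust = (al & 0x0F) > 9 or af
    if adjust:
        al = (al + 6) & 0xFF   # 8-bit add: any carry out of AL is discarded
        ah = (ah + 1) & 0xFF   # the documented explicit increment of AH
    al &= 0x0F                 # AL is always reduced to its low nibble
    return (ah << 8) | al, adjust


def aaa_observed(ax: int, af: bool) -> Tuple[int, bool]:
    """AAA as it behaves in the tests: the +6 is a 16-bit add on AX.

    A carry out of AL therefore reaches AH, on top of the explicit AH + 1.
    """
    adjust = (ax & 0x0F) > 9 or af
    if adjust:
        ax = (ax + 6) & 0xFFFF      # 16-bit add: carry propagates into AH
        ax = (ax + 0x100) & 0xFFFF  # explicit AH + 1
    ax &= 0xFF0F                     # keep AH, mask AL to its low nibble
    return ax, adjust


# AL = 0xFF, AH = 0, AF clear: the only difference is the carry into AH.
print(hex(aaa_documented(0x00FF, False)[0]))  # 0x105
print(hex(aaa_observed(0x00FF, False)[0]))    # 0x205
```

The two models only disagree when AL is 0xFA or above, since that is the only range where adding 6 carries out of AL.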

The question is why they do AX = AX + 6, rather than operating on AL. No, actually the biggest question is why I get this same behavior on an AMD processor (whereas I work on an Intel processor). We already saw in my last post about SHLD that they don’t behave the same in some undefined-behavior cases (although some people believe AMD copied Intel’s architecture and implementation)…

Some people might say that I went too far with testing this instruction, because I supply input that is not in the valid range (it’s unpacked BCD after all), and therefore I must not rely on the output. The thing is, the algorithm is well defined for any input I pass it, hence I expect it to work even for invalid input. I believe there is no such thing as undefined input, only undefined output, and that’s why I implemented my instruction the way both processors behave. In particular, neither vendor states anything about undefined input or output here, which makes my case stronger. Anyway, the point is that they don’t tell us something here: the implementation differs from what both the AMD and Intel docs describe.

This quirk works the same for AAS, where instead of doing AL = AL - 6, it’s really AX = AX - 6. I also checked whether they operate on the whole EAX, but the high word wasn’t changed (by carry/borrow). And I also checked whether DAA/DAS show the same behavior, but they don’t.
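The same kind of sketch works for AAS, this time with a full 32-bit EAX so the model can show that the high word is left alone. Again this is a hypothetical model of the behavior described in this post (the names `aas_documented` and `aas_observed` are made up), assuming the documented flow: subtract 6, decrement AH, set AF/CF, then mask AL to its low nibble.

```python
from typing import Tuple


def aas_documented(eax: int, af: bool) -> Tuple[int, bool]:
    """AAS as the docs read: an 8-bit subtract on AL, then AH - 1."""
    high = eax & 0xFFFF0000          # upper word of EAX is never touched
    al, ah = eax & 0xFF, (eax >> 8) & 0xFF
    adjust = (al & 0x0F) > 9 or af
    if adjust:
        al = (al - 6) & 0xFF         # 8-bit subtract: borrow out of AL is lost
        ah = (ah - 1) & 0xFF         # the documented explicit decrement of AH
    al &= 0x0F
    return high | (ah << 8) | al, adjust


def aas_observed(eax: int, af: bool) -> Tuple[int, bool]:
    """AAS as reported here: a 16-bit subtract on AX, then AH - 1.

    The borrow reaches AH but stops at AX; the high word of EAX is kept.
    """
    high, ax = eax & 0xFFFF0000, eax & 0xFFFF
    adjust = (ax & 0x0F) > 9 or af
    if adjust:
        ax = (ax - 6) & 0xFFFF       # 16-bit subtract: borrow comes out of AH
        ax = (ax - 0x100) & 0xFFFF   # explicit AH - 1
    ax &= 0xFF0F                     # keep AH, mask AL to its low nibble
    return high | ax, adjust


# AL = 3, AH = 0, AF set: the borrow out of AL costs AH an extra 1.
print(f"{aas_documented(0x12340003, True)[0]:#010x}")  # 0x1234ff0d
print(f"{aas_observed(0x12340003, True)[0]:#010x}")    # 0x1234fe0d
```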

6 Responses to “AAA/AAS Quirk”

  1. lorg says:

    I thought about this a little bit.
    It is better to simulate instructions for inputs with undefined outputs accurately where possible.
    I can imagine some shellcode using this undefined behavior as an obfuscation technique. Having the expression of the instruction match its behavior will counter any such move. (At least where the implementation is consistent across AMD/Intel.)

  2. arkon says:

    Well, this one was consistent between the processors, but SHLD wasn’t, so what will you do then? The annoying thing is that SHLD does have undefined behavior documented while these two don’t. That’s why I preferred to implement them the same as the processor. I also took a look at Bochs and Xen, and they both have the same “bug”…

  3. Peter Ferrie says:

    I thought that this was well-known. Yes, it’s a single 16-bit operation, not two 8-bit operations.
    It occurs if the low nibble of al is > 9, or always if AF is set in eflags.
    AMD and Intel are compatible for the 80386 CPUs, which is why they both work in this case. The behaviour is reeeally old. :-)

  4. arkon says:

    Nope, it’s 16 bit and then *also* an increment on AH.

  5. Peter Ferrie says:

    No, it’s just the single add/sub of 0x106. The carry is implicit in the operation.
    aaa 0ffh = 0ffh+106h=205h

  6. arkon says:

    Cool, although it is effectively the same. Thanks for the enlightenment! :)
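Peter’s point that the carry is implicit can be checked mechanically: in 16-bit arithmetic, adding 6 and then adding 0x100 (AH + 1) is the same as adding 0x106 in one step, and likewise for subtraction. A small Python check over every AX value, with made-up helper names and ignoring flags and the final masking of AL:

```python
def two_step(ax: int, sign: int) -> int:
    """AX +/- 6 (16-bit), then AH +/- 1: the 'add, then also increment' reading."""
    ax = (ax + sign * 6) & 0xFFFF
    return (ax + sign * 0x100) & 0xFFFF


def single_step(ax: int, sign: int) -> int:
    """One 16-bit add/sub of 0x106, as suggested in the comments."""
    return (ax + sign * 0x106) & 0xFFFF


# sign = +1 models AAA, sign = -1 models AAS; check all 65536 AX values.
for sign in (1, -1):
    assert all(two_step(ax, sign) == single_step(ax, sign) for ax in range(0x10000))

print(hex(single_step(0x00FF, 1)))  # 0x205
```

The equality holds because both forms are the same sum reduced modulo 2^16, which is why the two explanations are effectively the same.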
