Archive for the ‘Algorithms’ Category

Instructions’ Prefixes Hell

Sunday, December 21st, 2008

Since the first day diStorm was out, people haven't known how to deal with the fact that I drop (ignore) some prefixes. It seems that dropping unused prefixes isn't such a great feature for many people, and it only complicates the scanning of streams. Therefore I am thinking about removing the whole mechanism, or maybe changing it in a way that still preserves the same interface but behaves differently.

For the following stream: “67 50”, the result by diStorm will be: “db 0x67” – “push eax”. The 0x67 prefix is supposed to change the address size, but no address is used in our case, thus it's dropped. However, if we look at the hex code of the “push eax” part, we will see “67 50”. And this is where most people become dumbfounded. Getting the same prefix byte of the stream twice, in two results, is somewhat confusing. Taking a look at other disassemblers will tell you that diStorm is not the only one to play such games with prefixes. Sometimes I get emails regarding this “impossible” prefix – since it gets output twice, which is wrong, right? Well, I don't know, it depends how you choose to decode it. The way I chose to decode prefixes was really advanced: each prefix could be ignored unless it really affected (one of) the operands itself. I had to keep track of each prefix and know whether it affected any operand in the instruction, and only then did I examine which prefixes to drop or not. This all sounds right, in a way. Hey, at least to me.

However, we didn't even talk about what you do if you have multiple prefixes of the same family (segment-override: DS, ES, SS, etc). Now this one is really up to the interpretation of the designer. Probably the way I did it in diStorm is wrong, I admit it; that's why I want to rewrite the whole prefixes thing from the beginning. There are 4 or 5 groups of prefixes, and according to the specs (Intel/AMD), I quote: “A single instruction should include a maximum of one prefix from each of the five groups.” …. “The result of using multiple prefixes from a single group is unpredictable.”. This pretty much sums up all the problems in the world related to prefixes. I guess you can see for yourself from these 2 lines that you can actually treat them in many different ways. We know now that it can lead to “unpredictable” results if you have many prefixes – in reality it won't shut down your CPU, it won't even throw an exception. So screw it, you say, and you're right. Now let's see some CPU (16 bits) logic for decoding the prefixes:

while (next byte is a prefix) {
 switch (prefix) {
  case seg_cs: use_seg = cs; break;
  case seg_ds: use_seg = ds; break;
  case seg_ss: use_seg = ss; break;
  ...
  case op_size: op_size = 32; break;
  case op_addr: addr_size = 32; break;
  case rep_z: rep = z; break;
  ...
 }
 // skip the byte in the stream
}

The processor will use those flags in order to know which prefixes were present. The thing about using a loop (in any form) is that now, when you have to show text out of a stream with many prefixes, you don't know whether the processor really uses the first occurrence of a prefix or the last one, or maybe both? And maybe Intel and AMD implement it differently?

You know what? Why the heck do I bother so much with some minor edge cases that never really happen in real code sections? I ask myself that too; maybe I shouldn't. Although I have happened to see for myself some malware code that tries to screw up the disassembler with many extra prefixes, etc., and I thought diStorm could help malware analyzers as well with advanced prefix decoding.

Anyways, according to the logic code above, I'm supposed to use the last prefix of each type. Given a stream such as: 66 66 67 67 40, I will get:
0: 66 (dropped)
2: 67 (dropped)
1: 66 67 40
Now you can see that the prefixes used are the second and the fourth, and that the instruction starts at the second byte of the stream. Now I can officially commit suicide; even I can't follow these addresses, it's hell. So, any better solution?
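
Just to make that policy concrete, here is a toy sketch (my own illustration, not diStorm's actual code; the group table is partial and the names are made up) of the "last prefix in each group wins" rule, run over that exact stream:

# Toy model: keep only the last prefix of each group; everything else is dropped.
PREFIX_GROUPS = {
    0x66: "op_size", 0x67: "addr_size",
    0x2e: "segment", 0x36: "segment", 0x3e: "segment", 0x26: "segment",
    0xf0: "lock", 0xf2: "rep", 0xf3: "rep",
}

def scan_prefixes(stream):
    last = {}                       # group name -> offset of the winning prefix
    opcode = 0
    for offset, byte in enumerate(stream):
        group = PREFIX_GROUPS.get(byte)
        if group is None:
            opcode = offset         # first non-prefix byte is the opcode
            break
        last[group] = offset        # a later prefix of the same group overrides
    used = sorted(last.values())
    dropped = [o for o in range(opcode) if o not in used]
    return used, dropped, opcode

print scan_prefixes([0x66, 0x66, 0x67, 0x67, 0x40])
# ([1, 3], [0, 2], 4) - offsets 0 and 2 are dropped, exactly the case above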

Signed Division In Python (and Vial)

Friday, April 25th, 2008

My stupid hosting company decided to move the site to a different server blah blah, the point is that I lost some of the recent DB changes and my email hasn’t been working for a week now :(

Anyways, I'm reposting it. The sad truth is that I had to find the post in Google's cache in order to restore it, way to go.

Friday, April 18th, 2008:

As I was working on Vial to implement the IDIV instruction, I needed a signed division operator in Python. And since the x86 is 2's complement based, I first have to convert the number into a Python negative (from unsigned) and only then perform the operation, in my case a simple division. It was supposed to be a matter of a few minutes to code this function, which gets the two operands of IDIV and returns the result, but in practice it took a few bad hours.

The conversion is really easy; say we mess with 8-bit integers, then 0xff is -1, and 0x80 is -128, etc. The equation to convert it to a Python negative is: val - (1 << sizeof(val)*8). Of course, you do that only if the most significant bit, the sign bit, is set. Eventually you return the result of val1 / val2. So far so good, but no: as I was trying to feed my IDIV with random input numbers, I saw that the result my Python code returns is not the same as the processor's. This was when I started to freak out, trying to figure out what's the problem with my very simple snippet of code. And alas, later on I realized nothing was wrong with my code, it's all Python's fault.

What's wrong with Python's divide operator? Well, to be strict, it does not round a negative result toward 0, but toward negative infinity. Now, to be honest, I'm not really into math stuff, but all x86 processors round negative numbers (and positive ones too, to be accurate) toward 0. So one would really assume Python does the same, as C does, for instance. The simple case to show what I mean is: 5/-3, which in Python results in -2, rather than -1, as the x86 IDIV instruction is expected to and does return. And besides, -(5/3) is not 5/-3 in Python (now is the time you say WTF), which is another annoying point. But again, as I'm not a math guy, though I was speaking with many friends about this behavior, that equality (or to be accurate, inequality) is fine in real-world math. Seriously, what do we, coders, care about real-world math now? I just want to simulate a simple instruction. I really wanted to go and shout “hey, there's a bug in Python's divide operator” and how come nobody saw it before? But after some digging, this behavior is actually documented in Python. As much as I and many other people I know would hate it, that's that. I even took a look at the source code of the integer division algorithm, and saw a ‘patch’ that fixes the numbers to be floored if the result is negative, because C89 doesn't define the rounding well enough.
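
Just to show the difference in isolation, a quick interactive check (Python 2 semantics):

>>> 5 / -3          # Python rounds toward negative infinity
-2
>>> -(5 / 3)        # so this is not the same thing
-1
>>> int(5 / -3.0)   # truncation toward zero, which is what IDIV does
-1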

While you're coding something and you have a bug, you usually just start debugging your code, track it down and then fix it easily while continuing to work on the code, because you're in the middle of the coding phase. There are those rare times when you really go crazy, when you're absolutely sure your code is supposed to work (which it does not), and then you realize that the layer you should trust is broken (in a way). Really, you want to kill someone… but being a good guy, I won't do that.

Did I hear anyone say modulo?? Oh don’t even bother, but this time I think that Python returns the (math) expected result rather than the CPU. But what does it matter now? I really want only to imitate the processor’s behavior. So I had to hack that one too.

The solution, after all, was to take the absolute value of the Python negative number and remember its original sign; we do that for both operands. Then we do an unsigned division, and if the signs of the inputs are not the same, we change the sign of the result. This works because we know that the unsigned division behaves the same as the processor's, so we can use it safely.

res = x/y; if (sign_of_x != sign_of_y) res = -res;
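
Putting it all together, here's a minimal sketch of the whole thing in Python (the names are mine, and it assumes 2's complement inputs of a known bit size; division by zero isn't handled):

def to_signed(val, bits):
    # Convert an unsigned 2's complement value into a Python negative if the sign bit is set.
    if val & (1 << (bits - 1)):
        val -= 1 << bits
    return val

def idiv(a, b, bits=8):
    x, y = to_signed(a, bits), to_signed(b, bits)
    # Divide the absolute values so Python's floor rounding can't kick in,
    # then fix the sign like the processor does (truncation toward zero).
    res = abs(x) / abs(y)
    if (x < 0) != (y < 0):
        res = -res
    return res

print idiv(0xfb, 0x03)   # -5 / 3 -> -1, like IDIV, not -2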

The bottom line is that I really hate this behavior in Python and it’s not a bug, after all. I’m not sure how many people like me encountered this issue. But it’s really annoying. I don’t believe they are going to fix it in Python 3, never know though.

Anyway, I got my IDIV working now, and that was the last instruction I had to cover in my unit tests. Now It’s analysis time :)

Shift Double Precision

Saturday, March 29th, 2008

If you were to ask me, I'd have no idea why Intel has support for shift double precision in the 80x86. Probably their answer would be “because it used to be a CISC processor”. Shift double precision is a pretty easy algorithm to implement. But maybe it was popular back then and they decided to support it in hardware. Like now, when they add very important instructions to the SSE sets. Even so, everyone (including me) seems to implement the algorithm like this:

(a << c) | (b >> (32-c))

Where a and b are the 32-bit input variables(/registers) and c is the count. The code shows a shift left double precision. Shifting right requires changing the direction of each of the shifts. However, if a and b were 16 bits, the second shift amount changes to (16-c). And now there is a problem. Why? Because we might enter the magical world of undefined behavior. And why is that? Because the first thing the docs say about the shift/rotate instructions is that the count operand is masked to preserve only the 5 least significant bits. This is because the largest shift amount for a 32-bit input is 32 shifts (and then you get a 0; ignore SAR for now). And if the input is 16 bits, the count is still masked with 31. That means you can shift a 16-bit register by more than its size, which doesn't make much sense, but is possible for the other shift instructions. But when you use a shift double precision, not only does it not make sense, it is also undefined. That is, the result is undefined, because you then try to move bits from b into a, but the count becomes negative. For example: shld ax, bx, 17. Internally the second shift amount is calculated as (16-c), which becomes (16-17). And that's bad, right?

In reality everything is defined when it comes to digital logic, even the undefined stuff. There must be a reason for the result I get from executing such an instruction like in the example above, even though it's correctly and officially undefined. And I know that there is a rationale behind it, because the result is consistent (at least on my Intel Core2Duo processor). So, stubborn as I am, I decided I want to know how that calculation is really done at the hardware level.

I forgot to mention that the reason I care about how to implement this instruction is that I have to simulate it for the Vial project. I guess eventually it's a waste of time, but I really wanted to know what's going on anyway. Therefore I decided to research the matter and come up with the algorithm my processor uses. Examining the officially undefined results, I quickly managed to see how to calculate the shift like the processor does, and it goes like this for 16-bit input (I guess it will work the same for 8-bit input as well; note that a 32-bit input can't have an undefined range, because you can't get a negative shift amount):

def shld(a, b, c):
    c &= 31
    if c <= 15:
        return ((a << c) | (b >> (16 - c))) & 0xffff
    else:
        # Undefined behavior:
        c &= 15
        return ((b << c) | (a >> (16 - c))) & 0xffff

Yes, the code is in Python. But you can see that if the count is bigger than 15, we swap the order of the inputs. And then comes the part where you say “NOW WTF?!”. Even though I got this algorithm to return the same results as the processor does, for defined and undefined input, I could wager the processor doesn't do this kind of stuff internally. So I sat down a (long) while more, stared at the code, and did a few experiments here and there. Eventually it occurred to me:

def shld(a, b, c):
    c &= 31
    x = a | (b << 16)
    return ((x << c) | (x >> (32 - c))) & 0xffff
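
For example, feeding both versions the same inputs gives the same results, for defined and undefined counts alike (a quick check of the Python code above, not of the real CPU):

>>> hex(shld(0x1234, 0x5678, 4))    # defined: (0x1234 << 4) | (0x5678 >> 12)
'0x2345'
>>> hex(shld(0x1234, 0x5678, 17))   # "undefined" count, both versions agree
'0xacf0'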

Now you can see that the input to the original equation is a single combined bit buffer, which contains both inputs together as one. Taking a count of 17 won't yield a negative shift amount, but something else. Anyway, I have no idea why they implemented this instruction like they did (and it applies to SHRD as well), but I believe it has something to do with the way their processor's so-called ‘engine’ works and hardware stuff.

After I learned how it works I was eager to see how it works on AMD. And guess what? They don't work the same, when it comes to the undefined behavior, of course. And since I don't have an AMD anymore I didn't get to see how they really implemented their shift double precision instructions.

In the Vial project, where I simulate these instructions, I added a special check for the count, to see that it's not bigger than the input size, and if it is, I mark the destination register and some of the flags as Undefined. This way I will know, when I do code analysis, that something is really wrong/buggy with the way the application works. Now, what if the application purposely uses the undefined behavior? Screw us both then. Now why would a sane application do that? Ohh, and that's another story…

By the way, other shift/rotate instructions don’t have any problem with the shift amount since they can’t yield negative shift amount internally in any way, therefore the results are always defined for every input.

Converting A Floating Point Number To A String – Explained

Sunday, March 16th, 2008

I was always curious about how to convert a floating point number into a decimal ASCII string. The equation is trivial after you get to implement it, but the idea behind it took me a long while to come up with. For the sake of example I will deal with a single precision floating point of IEEE 754. However, it doesn't really matter to me how many bits I have to convert into a string; the algorithm stays the same. Floating point numbers have all this crap about NaNs and QNaNs and other weird stuff that I didn't care about. Those are just technical details that you have to implement, nothing really hard about them. I am just gonna focus on the way to convert the mantissa (that's the fraction part of the real number) into decimal. It would be rather easy to just take a look at the implementation of printf in Linux's libc, but then you lose the challenge of coming up with this, eventually simple, algo on your own.

To simplify matters, I don’t handle big exponents, again it was not my focus. If you wish to change this code to something really usable, it’s possible with some work. Here I am not going to cover the format of the IEEE 754 floating point. I presume you know it, and if not, give it a look here.

void pf(unsigned long x)
{
 unsigned int sign = x >> 31;
 unsigned int exp = ((x >> 23) & 0xff) - 127; // remove the bias
 unsigned int man = x & ((1 << 23) - 1);
 man |= 1 << 23; // set the implied bit of the normalized mantissa
 if (sign) printf("-");
 printf("%d", man >> (23 - exp)); // integer part
 printf(".");
 unsigned int frac = man & ((1 << (23 - exp)) - 1); // fraction part
 unsigned int base = 1 << (23 - exp);
 int c = 0;
 while (frac != 0 && c++ < 6)
 {
  frac *= 10;
  printf("%d", (frac / base)); // pull out the next decimal digit
  frac %= base;
 }
 printf("\n");
}

You can see in the beginning that we extract the sign bit, exponent and mantissa from the raw number we got as a parameter. Then we fix the exponent, since it is stored biased; it's a really nice trick, but I won't cover it today. And we set the 24th bit (bit 23) in the mantissa, because we assume we got a normalized floating point number, where that bit is implied and not stored, to spare an extra bit… Printing the sign is self explanatory. And then we print the integer part of the mantissa, while taking care of the exponent. This is where a too big or too small exponent will screw things up. But for small numbers everything is fine just yet.

Now the fun begins, converting the mantissa from base 2 to base 10. Yipi kay hey. Let's see. We store the fraction part of the floating point number and the base. For now let's ignore this base variable; we will get back to it later. We limit the loop to print at most 6 digits, or fewer when there is nothing more to print. The algorithm here is to ‘pull’ the next digit over into the integer part of the number and print that part, which is straightforward (though it's only a digit and not a whole integer). Then we see how many more digits we have left to print and repeat the body of the loop until we're done or have printed enough digits.

I will try to explain this loop again this way: let's say you want to print the number 5/6 in decimal. How will you do that? That's the key algorithm for converting the number to decimal in our case as well. I was told by a friend that children are taught this method in elementary school; maybe I skipped that class or we didn't have it. :) What we are doing is: multiply the 5 by 10, then see how many times 6 goes into the result: 50/6 = 8, then we do a modulo with 6, getting 50%6=2. And starting again, 2*10=20; 20/6=3; 20%6=2; 2*10=20; 20/6=3, and over and over again; the final result is 0.8333… This is similar to 1/3. Try this algo on 1/4 and you will see that you get 2 and then 5 and then you reach 0 and stop, resulting in 0.25.

Back to the code above: we take the fraction part, and we have a base (take a long breath) of 2 to the power of the number of bits used to store the fraction. Why is that the base?

Like when you convert an integer to decimal, you have to multiply every set bit by 2**index, where the index is the index of the set bit in the integer stream. For example: 111b is 1*2**0 + 1*2**1 + 1*2**2 = 1 + 2 + 4 = 7. The same goes for bits that are on the right side of the point (thus fractions): .11b is 1*2**(-1) + 1*2**(-2) = 1/2 + 1/4 = 0.75. Notice that the indices this time are negative, and power of negative number is division, therefore: 1/base**(-index). Now instead of doing this conversion per bit, we can do it simply by dividing the fraction once, in our example: 11b by 4. (base powered by number of bits, 2**2=4). Now we treat the fraction as an integer and divide it by the new calculated base; we get 3/4 = 0.75. We saw how the conversion can be done easily and we need a way to print the result of such division… And now we’re back to the description above of how to print a simple fraction. This time we know the reasons behind the fraction and the base. Note that the base is bigger than the fraction by nature, otherwise we won’t have a correct input for the algo and then it won’t work well.
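
If it helps, here is the same digit-pulling idea stripped down to a few lines of Python (a sketch of mine, with made-up names), which is exactly what the loop in the C code above does:

def print_fraction(frac, base, digits=6):
    # frac/base is a fraction smaller than 1; pull out one decimal digit at a time.
    out = []
    while frac != 0 and len(out) < digits:
        frac *= 10
        out.append(str(frac / base))   # the next decimal digit
        frac %= base
    return "." + "".join(out)

print print_fraction(5, 6)   # .833333
print print_fraction(1, 4)   # .25
print print_fraction(3, 4)   # .11b is 3/4 -> .75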

How to use the above code:
float x = 1337.99;
pf(*(unsigned long*)&x);
printf("%f", x);

Running the code with a different number every time, you will see that printf's %f actually rounds the number, whereas we print it as a raw floating point number, without touching it. Therefore, next time I will talk about rounding the number.

Basic Ops #4 – Division

Saturday, March 1st, 2008

I wasn't sure whether to post this one or not. Moreover, my dilemma was whether to document it or not. I decided that it's so hard to explain what's going on that I will only dump my code here. The interface is ugly, because you cannot divide by zero…

// return success, *ret = a / b
int div(unsigned short a, unsigned short b, unsigned short *ret)
{
 if (b == 0) return 0;
 *ret = 0;
 unsigned short len = 0;

 // find the MSB of b; the quotient has at most 16 - MSB(b) bits.
 for (int i = 15; i >= 0; i--) {
  if (b & (1 << i)) {
   len = 16 - i;
   break;
  }
 }

 // left justify b.
 while ((~b) & (1 << 15)) b <<= 1;

 // shift and subtract: try the divisor at every bit position, MSB first.
 while (len--) {
  *ret <<= 1;
  if (a >= b) {
   *ret |= 1;
   a -= b;
  }
  b >>= 1;
 }

 return 1;
}

And yes, it's a 16-bit division; it will apply to 32 or 64 bits as well. It really was one of the toughest things to write from scratch. Division is not as trivial as subtraction or multiplication. The trick eventually was to use the fact that we're dealing with binary division… oh well, I quit; get someone else to describe this one. Although I think there are many ways to implement this one. I remember a friend telling me he didn't expect this way of solution. I don't think, however, that this technique can be used for long division of bit arrays or really big numbers; maybe there's something faster.
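
Still, for whoever wants some description after all: the same shift-and-subtract idea is easier to see in Python, where big integers let us shift the divisor up instead of left-justifying it inside a fixed-width register (a sketch of mine, not a line-by-line translation of the C code above):

def binary_div(a, b, bits=16):
    # Try the divisor at every bit position, most significant first.
    if b == 0:
        return None
    quotient = 0
    for i in range(bits - 1, -1, -1):
        if (b << i) <= a:        # does b*2**i still fit under what's left of a?
            quotient |= 1 << i   # then this quotient bit is 1
            a -= b << i
    return quotient, a           # quotient and remainder

print binary_div(100, 7)   # (14, 2)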

By the way, this is an unsigned division. As far as I know, and I might be wrong, you can't do signed division without using unsigned division. And to do signed division you just remember the signs of a and b, then make them absolute, and then you change the result accordingly. Anyways, if I'm wrong with my assumption, let me know.

Converting An Integer To Decimal – Assembly Style

Wednesday, February 13th, 2008

I know this is one of the most trivial things to implement in a high level language. In C it goes like this:

void print_integer(unsigned int x)
{
 if ((x / 10) != 0) print_integer(x / 10);
 putch((x % 10) + '0');
}

The annoying thing is that you have to divide twice and do a modulo once, which in reality can be merged into a single x86 instruction. Another thing is that if you want to be able to print a proper result for an input of 0, you have to test the result of the division instead of simply checking x itself. The conversion is done from the least significant digit to the most, but when we display the result (or put it in a buffer) we have to reverse it. Therefore recursion is so handy here. This is my go at it in 16-bit; it's code that I wrote a few years ago, and I just decided I should put it here for reference. I have to admit that I happened to use this same code for 32 bits and even for different processors, and since it's so elegant it works so well and is easy to port. But I leave it for you to judge ;)

bits 16
mov ax, 12345
call print_integer
xor ax, ax
call print_integer
ret

print_integer:
; base 10
push byte 10
pop bx
.next: ; divide dx:ax (32 bits) by bx, remainder in dx, quotient in ax
xor dx, dx
div bx
push dx ; push remainder
or ax, ax ; if the quotient is 0, we will stop recursing
jz .stop
call .next ; now this is the coolest twist ever, the IP that is pushed onto the stack…
.stop:
pop ax ; get the remainder (in reversed order)
add al, 0x30 ; convert it to a character
int 0x29 ; use (what used to be an undocumented) interrupt to print al
ret ; go back to ‘stop’ and read the next digit…

I urge you to compile the C code with full optimization and compare the codes for yourself.

Python: Converting Signedness

Friday, November 30th, 2007

Many times I encounter Python modules that were written in C/C++, and when you use them you hit a really annoying problem: the integers that such a module returns are all unsigned. That's really a problem, because Python longs are (potentially) infinite (they are digit strings) AFAIK, unlike C/C++ integers that have a limit of, usually, 32 or 64 bits.

If you pass back -1 to Python and format it as an unsigned long, you will get 2**32-1. Now suppose you really want to treat the number inside your Python code as a signed integer; you will have to convert 2**32-1 to -1 somehow. Most solutions I saw to this problem were to use struct in the following manner:

import struct
struct.unpack(">l", struct.pack(">L", 2**32-1))[0]

This packs the unsigned long value 0xffffffff and unpacks it back as a (signed) long, which gives -1 (the example is explicitly big endian, but the endianness doesn't matter here as long as both format strings agree).

You might want to use unsigned long long (that's 64-bit integers) in order to convert a bigger number. So you can do one of two things: convert your 32-bit integer to a 64-bit integer by sign extending it (and that's another whole challenge) and stuff it into the 64-bit unpack/pack, or test the size of the integer (by how many bits it takes) and call the correct unpack/pack pair.

It was then that I realized why I shouldn’t use this ugly trick. It simply doesn’t support Python’s longs. As I said earlier they are infinite and using this trick you are capped to 64 bits. So I thought of a better way, and not using any stdlib module at all, leaving it pure Python code…

The way we know to change the signedness of an integer is by negating it, which is NOTing all bits of that number and incrementing it by 1, right? (2’s complement) Well true and that should work:

(Say we work with 8 bits integers)

0xfb = -5

>>> ~0xfb
-252
>>> ~0xfb+1
-251
>>> 256-251
5
>>> 0-5
-5

The way it works is: -(256 + (~x+1)), where x is a byte. For every integer we need to scan for its most significant bit… We can do it the other way around with the following formula (x - NEXT_MSB(x)):

>>> 0xfb - 0x100
-5

This way it's like we changed the sign bit of the integer and fixed the number as well. Both formulas work for all integer sizes. But the key here is to find the MSB. I preferred to stick to the latter formula rather than the former, since it seems to be shorter. But it doesn't really matter, both work.

So now we have to code it, and as you should know me already – in one-liner device! The first challenge is to find the MSB, something like this should suffice in C(!):

for (int i = 7; i >= 0; i--)
  if (x & (1 << i)) break;

This will work for a byte integer, and note that we must start from the end towards the start. Otherwise we won’t find the MSB but the LSB. The problem in Python is that we don’t know the size of the longs we mess with and we need to come up with a trick to find its MSB.

The lame trick I used for converting a number into decimal ASCII string, will be used here too and it goes like this:

for i in xrange(2**32):
 if n / (1 << i) == 0:
  break

We try to divide the input number by 2, 4, 8, 16, 32, … and when the result is 0, we know that we are out of bits. I said it’s lame because we use division, which is slow. If you got any other idea write to me please.

Another drawback is the limit of the numbers we scan, we are limited to 2**32, this is huge enough and I guess you will never reach that, or I will be dead first prolly :o. Using Erez’s trick (see here), we can make it a bit more elegant and stop as soon as the MSB was found.
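
For the record, newer Python versions (2.7 and 3.x) grew int.bit_length(), which hands you the MSB position directly and makes the division loop unnecessary; a sketch of the same x - NEXT_MSB(x) formula using it:

def signed(n):
    # n.bit_length() is the MSB index plus one, so 1 << n.bit_length() is NEXT_MSB(n).
    return n if n <= 0 else n - (1 << n.bit_length())

print signed(0xfb)   # -5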

I am not sure whether you noticed, but supplying an input of a negative number isn’t a smart move, we will have to check for it specifically. Eventually this is the code I came up with:

(Note that the input “bits-stream” can be any size)

def signed(n):
 return n if n < 0 else n - [i for i in (2**j if n/(2**(j-1)) else iter(()).next() for j in xrange(2**31-1))][-1]

>>> signed(0xb)
-5
>>> signed(0xfb)
-5
>>> signed(0xffffb)
-5
>>> signed(0xffffffffffffffffffffffffb)
-5L 

GZ-LN

About DIV, IDIV and Overflows

Thursday, November 8th, 2007

The IDIV instruction is a divide operation. It is less popular than its counterpart DIV. The difference between the two is that IDIV is for signed numbers whereas DIV is for unsigned numbers. I guess the “i” in IDIV means Integer, thus implying a signed integer. Sometimes I still wonder why they didn't name it SDIV, which is more readable and self explanatory. But the name is not the real issue here. However, I would like to say that there is a difference between signed and unsigned division; otherwise they wouldn't have been two different instructions in the first place, right? :) What a smart ass… The reason it is necessary to have them both is that signed division behaves differently from unsigned division. Take a finite string of bits (i.e., an unsigned char) which has a value of -2 and try to unsigned divide it by -1: the result is 0, since if we look at the numbers as unsigned they are 0xfe and 0xff, and naively asking how many times 0xff is contained inside 0xfe gives 0. Now that's a shame, because we would like to treat the division as signed. For that, the algorithm is a bit more complex. I am really not a math guy, so I don't wanna get into the dirty details of how signed division works. I will leave that algorithm for the BasicOps column of posts… Anyway, I can just say that if you have an unsigned division you can use it to do a signed division of the same operand size.

Some processors only have signed division instructions. So for doing an unsigned division, one might convert the operands to the next bigger size and then do the signed division. Which means the high half of the operand is zero, which makes the division work as expected.

With x86, luckily, we don't have to do any nifty tricks; we have them both, DIV and IDIV, straight away for our use. Unlike multiplication, when there is an overflow in division, a division-overflow exception will be raised, whereas in multiplication only the CF and OF flags will be set. Whether we like it or not, this is the situation. Therefore it's necessary to convert the numbers before doing the operation: sign extension or zero extension (depending on the signedness of the operands), and only then do the division operation.

What I really wanted to talk about is the way the overflow is detected by the processor. I am interested in that behavior since I'm writing a simple x86 simulator as part of the diStorm3 project. So truly, my code is the “processor”, or should I say the virtual machine… Anyhow, the Intel documentation for the IDIV instruction shows some pseudo algorithm:

temp = AX / src; // signed division by an 8-bit divisor
if (temp > 0x7F) or (temp < 0x80)
 // if a positive result is greater than 7FH or a negative result is less than 80H
 then #DE; // divide error

src is a register/immediate or a memory indirection, which yields an 8-bit value; it is sign extended to 16 bits and only then is AX signed divided by it. So far so good, nothing special.

Then comes a stupid looking if statement, which at first glance says that if temp is above 0x7F or below 0x80 then bam, raise the exception. So you ask yourself what these special values have to do with overflowing.

Reading the next comment makes things clearer: since for an 8-bit divisor the division is done on 16 bits and the result is stored inside 8 bits as a signed value, the result can vary from -128 to 127. Thus, if the result is positive and the value is above 127, there is an overflow, because then the value would be treated as a negative number, which is a no-no. And the same for negative results: if the result is negative and the value is below -128, there is an overflow, since such a negative number cannot be represented in 8 bits as a signed number.

It is vital to understand that overflow means that a resulting value cannot be stored inside its destination because it’s too low or too big to be represented in that container. Don’t confuse it with carry [flag].

So how do we know if the result is positive or negative? If we look at temp as byte sized, we can't really know. But that's why we have temp as 16 bits. That extra half of temp (the high byte) is really the hint for the sign of the whole value. If the high byte is 0xff, we know the result is negative; otherwise the result is positive. Well, I'm not being 100% accurate here, but let's keep things simple for the sake of conversation. Anyway, it is enough to examine the most significant bit of temp to know its sign. So let's take a look at the if statement again now that we have more knowledge about the case.

if temp[15] == 0 and temp > 127 : raise overflow

Suddenly it makes sense, huh? Because we assure the number is positive (it doesn't have the sign bit set) and the result is yet higher than 127, and thus cannot be represented as a signed value in an 8-bit container.

Now, let’s examine its counterpart guard for negative numbers:

if temp[15] == 1 and temp < 128: raise overflow

Ok, I tried to fool you here. We have a problem. Remember that temp is 16 bits long? It means that if, for example, the result of temp after the division is -1 (0xffff), our condition is still true and will raise an overflow exception, even though the result is really valid (0xff represents -1 in 8 bits as well). The problem's origin is in the signed comparison. By now, you should have understood that the first if statement, for a positive number, uses an unsigned comparison as well, although temp is a signed value.

We are left with one option, since we are forced to use unsigned comparisons (my virtual processor supports only unsigned comparisons): we have to convert the signed -128 value into a 16-bit unsigned value, which is 0xff80. As easy as that, just sign extend it…

So taking that value and putting it in its place we get the following if statement:

if temp[15] == 1 and temp < 0xff80: raise exception

We know by now that temp is being compared to as an unsigned number. Therefore, if the result was a negative number (it must be 0x8000 or above) and yet it was below 0xff80, then we cannot represent that value in an 8-bit signed container, and we have to raise the division error exception.

Eventually we want to merge both if statements into one. Sparing you some basic boolean algebra (the positive-overflow range 0x7f < temp < 0x8000 and the negative-overflow range 0x8000 <= temp < 0xff80 join into one contiguous range), we end up with:

if (temp > 0x7f) && (temp < 0xff80):

    then raise exception…
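
And a tiny brute-force check of that merged condition against the naive "does the quotient fit in 8 bits as a signed value" test (my own sanity check, not Intel's pseudo code):

def fits_in_signed_byte(temp):
    # Interpret the 16-bit quotient as signed and test the representable range.
    value = temp - 0x10000 if temp & 0x8000 else temp
    return -128 <= value <= 127

for temp in range(0x10000):
    overflow = (temp > 0x7f) and (temp < 0xff80)
    assert overflow == (not fits_in_signed_byte(temp))
# no output means the merged check matches for every possible quotient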

Challenge: One-Liner For Converting a Decimal

Thursday, October 25th, 2007

Or: a one-liner device to convert a decimal number to a string, using any base up to 10. If you want to use bases above 10, you will have to construct a table somehow that goes from ‘0’ to ‘9’ and then continues from ‘a’ up to the required base (or you can use a static table). So suppose we are dealing with a base <= 10; we only need to convert it to ASCII, so it's pretty simple.

If you didn't figure it out until now (and how could you?), I'm talking about Python here. There is this int() function (actually it's a class type to be more accurate, its constructor), which converts any string to a decimal number. Say, int('101', 2) will result in 5. But the opposite operation is nowhere to be seen.

The straight forward way is easy:

l = []
while (n > 0):
     l.append(n % BASE)
     n /= BASE
"".join(map(str, l[::-1]))

Though, it’s an ugly way, just to show the principle. We can do it with recursion, and then we don’t need to reverse the result, by a side effect of recursion.

When I decided to write the conversion function just for the fun of it, I wanted it to not use recursion… because with recursion it's really easy. :) So why make our life simple when we do things for learning and sport? Besides, for some people recursion is less intuitive, although we might argue about it.

So here’s my first version:

"".join([str((n/i)%b) for i in [b**j for j in xrange(31, -1, -1)]]).lstrip('0')

In the beginning I used chr((n/i)%b + 0x30), because I'm used to dealing with char arrays and thinking in old school C code. So Kasperle came up with the str thingy, which is much better for code readability.

Anyway, I really got pissed by the idea that I have to strip all leading zeros, otherwise for n=5 I will end up with ‘00000000000000000000000000000101’, which is quite cumbersome.

One drawback is the size of the integer we want to convert; as you probably guessed, this code supports 32-bit numbers. It could support any size in a jiffy… but then you would probably have to strip even more zeros most of the time. ;( Enough fooling around.

What I’m really trying to achieve is to use the code to convert any sized number, without the need of any constant magic value in my one-liner.

So trying to come up with the accurate number of digits to convert in the first place is really the bugging trick. What we really need is something like math.log. Using the log we can know the number of digits at once. But then we need to import math. Do we count ‘import’s when we say one-liner or not? Well, I will take it as a No, making my life harder without math.

"".join([str((n/i)%b) for i in [b**j for j in xrange(int(math.log(n, b)), -1, -1)]])

I could have used the input number itself for the xrange, but then it won't return ‘0’ for an input of zero. And even so, it's kinda cheating and lame.

Technically, the solution is to generate a list with [1, 10, 100, 1000, ….]. The condition to stop is when n/entry == 0. The problem to make this list is how to generate it on the fly? :) or how to stop generating it.

Well, AFAIK in Python it’s not possible. So I’m trying to simulate log. Imri just suggested to use a rate number for a log approximation which will be base dependent. But I didn’t like that idea – magic numbers, remember? And maybe even losing precision.

By now, Kasperle, who was the recursion guy, lost his patience with my stupid challenge. Imri is trying to calculate crazy numbers for log approximations, which I stopped following long ago. :)

FYI: Kasperle’s code, which is pretty cool, goes like this:

foo = lambda n,base: (n or "") and (str(foo( n / base, base)) + str( n % base))

Notice the way the recursion stops… However, in one-liner code, I prefer assigning the result to a value, rather than assigning the lambda and calling it. But it's also possible to do, for instance: x = (lambda y: y+1)(0). But if you ask me, I don't really like this notation.

Then Imri suggested another idea using sqrt, but I objected since we need math. The truth is that you can do x**0.5 in Python. But eventually his solution wasn’t good enough.

ARRRG, As for now I am giving up :(. If you have another idea, let me know.
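
For what it's worth, the stop-the-generator trick from the signedness post above (raising StopIteration via iter(()).next() inside a generator expression) can end the list of powers exactly when n / entry == 0, with no magic constants; a sketch of mine (Python 2 semantics, zero still needs special casing, and it's two statements rather than a strict one-liner):

from itertools import count

def tobase(n, b):
    # the generator stops itself (StopIteration) at the first j where b**j > n
    powers = [p for p in (b**j if n / (b**j) else iter(()).next() for j in count())]
    return "".join(str((n / p) % b) for p in reversed(powers))

print tobase(5, 2)     # 101
print tobase(1337, 8)  # 2471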

Lambdas Forever

Saturday, October 20th, 2007

Ahhh Python, what a splendid scripting language. One of its most likeable features is anonymous functions, aka lambdas. A lambda is actually a way to write/implement a simple one-liner function in place. The official docs say:

“Lambda forms (lambda expressions) have the same syntactic position as expressions. They are a shorthand to create anonymous functions; the expression lambda arguments: expression yields a function object.”

Instead of implementing the damned comparison function for sorting (you probably all know what I'm talking about):
def ComparisonProc(x, y):
    return y - x
list.sort(ComparisonProc)

We can simply do:
list.sort(lambda x, y: y - x) # Descending
and voila.

This is a very simple example. Lambdas are really handy when you want to do one liner devices. Some of them which you manage to stuff in one line and some which you just can’t. However, without lambda it wouldn’t have been possible in the first place.

There are many samples on the Internet. I came up with something hopefully even useful. Let's say you want to print all .JPG files on your c:\, including subdirectories. So we have to scan the hard drive for all files, then filter those with a .JPG extension and afterwards print the result. :) Yes, this is all possible in a one-liner; let's see the original code first:

for root, dirs, files in os.walk('c:'):
    for i in files:
        if i[-4:] == ".jpg":
            print i

The one-liner version:

print filter(lambda name: name[-4:] == ".jpg", reduce(lambda x,y:x+y, [i[2] for i in os.walk('c:')]))

See? Easy :)

Actually now, I have to explain a few more things.
1) We are only interested in the Files list from os.walk, therefore we take the third entry in the result, that’s i[2].

2) The i[2] itself is a list, and we cannot filter a list of lists of file names, therefore we have to flatten the lists into a single list containing the file names. This is where the reduce comes in: it will return the accumulated result of all the lambdas, each time calling the lambda with the accumulated x and supplying the next item, y. Thus, adding the lists extends the resulting list and flattens them…

3) Now that we have a single list with all the file names in the system, we need to filter out the files which are not .JPG. So yet again we use a lambda that checks the last 4 characters of the file name to see whether it is a .JPG; all other files will be removed from the resulting list.

4) Lastly, print the result. Actually you can use pretty print (module pprint) to print it prettier :)

 Yes, Python’s crazy!

So what's the annoying thing with lambdas? They are slow relative to list comprehensions (which we used to get all the lists of file names above). But again, if we are using scripting, are we after speed? I am not sure. Another irritating thing about lambdas is that you cannot assign inside the expression, but then you have reduce.. :)

The worst thing about lambdas is when you use outer variables, and let me explain. Since the body of a lambda is evaluated only at call time, if you access variables from outside the lambda, they will be looked up again every time the lambda itself runs. Now suppose you wanted the lambda to capture a specific value when you created it; by the time you really call the lambda, that value may have already changed, and your result is screwed.

Enough words, let’s see the problem with some code:

>>> x, y = [lambda z: 5+z+i  for i in xrange(2)]
>>> x(0)
6
>>> y(0)
6

Oh uh, we’re in trouble!

Do you notice we get the same result for both functions? This is incorrect because they are not supposed to return the same value. Note this:

>>> i
1

So now when both lambdas, x and y, are evaluated they use i as 1. Sucks huh? 

>>> i = 3
>>> x(0)
8
>>> y(0)
8
“Use Nested Lambdas, Luke”

>>> x, y = [(lambda v: lambda z: 5 + v)(i) for i in xrange(2)]
>>> x(0)
5
>>> y(0)
6
>>>

The outer lambda gets evaluated immediately, and thus leaves the value, and not a pointer to the variable, in the created function. Next time, when the inner lambda is evaluated, it uses the value-of(i) and not the value-from-pointer-of(i).

This surely will help someone out there :) And then they say Lambdas are going to be deprecated in Python 3000…

[Updated]

Thanks to Kasperle, here’s another solution to the nested lambdas:

x, y = [lambda z, dummy = i: 5 + dummy for i in xrange(2)]

The drawback is that x and y can now take a second parameter, which you potentially let the caller override… Or, if we're talking about Python 2.5, you can use functools.partial.
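
For completeness, a quick sketch of that functools.partial variant (same idea, just binding i at creation time instead of using a default argument):

>>> import functools
>>> x, y = [functools.partial(lambda v, z: 5 + v, i) for i in xrange(2)]
>>> x(0)
5
>>> y(0)
6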