Archive for the ‘Algorithms’ Category

Basic Ops #3 – Averaging Without Overflow

Monday, October 15th, 2007

A few months ago I read this post here. Now I’m not talking about bugs in general; I want to focus on that specific bug: summing two integers without causing an overflow. It might sound like an easy thing to do, but a closer look reveals some implementation details you should be careful about. As I see it, there are two problems. On some processors an addition that overflows might generate an interrupt. Although I haven’t seen it on mainstream processors, it is still a challenge, because every time you add two numbers you have to make sure you won’t trigger that overflow interrupt. And handling that interrupt – what can you really do about it? Not much, I guess… The other serious problem is that you lose precision in your result, or get a wrong value altogether.

I decided to give an example by examining the average operation: adding two numbers and dividing the result by two, of course. The problem arises when the addition causes an overflow. Even that same post shows some ways to overcome it. Another wrong way that seems correct at first is a/2 + b/2. Messing with integers, we know that we lose precision twice (once per division) for odd numbers, though we have overcome the addition’s potential overflow. Getting back to the (a+b)/2 equation, summing the numbers first and doing one division yields only one precision loss for odd numbers, and the result is considered correct. That post’s writer suggests a correction: int mid = (low + high) >>> 1; I still don’t directly see why it’s better – notice the ugly unsigned shift operator. I guess the standard says to use the next bigger variable size. Looking at unsigned char x = 200, y = 100, avg = (x + y) / 2; this one will be calculated well, since the addition is done at integer size and the result is then written back as a byte. But I’m not sure whether two integers are promoted to ‘long’ automatically.

Let’s say, for the sake of conversation, that you have to calculate the average of the largest-sized registers of that processor (and for those smart asses who think of an add/adc combination, leave me alone now :) ) – you can’t get away without an overflow. Which leads to the next equation: a + (b – a)/2. This time we assume that b – a results in a positive number, otherwise you have to switch it to a – b, which means you will have to check (/cmp) the numbers first. And checking is out of the question. It’s all pure math; there must be some way, even in the computers’ practical world. And I forgot to mention that the latter equation doesn’t lose any precision (it’s the same result as (a+b)/2). Finally I get to this equation: ((a & b) << 1) + (a ^ b). And moreover, I finally understand it! hehe :)
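The flawed and the restricted variants above can be sketched in C (avg_lossy and avg_ordered are names I made up for this illustration, they are not from the original post):

```c
/* Loses one for two odd inputs, since each division truncates. */
unsigned int avg_lossy(unsigned int a, unsigned int b)
{
    return a / 2 + b / 2;      /* 3/2 + 5/2 = 1 + 2 = 3, not 4 */
}

/* Exact and overflow-free, but only valid when a <= b. */
unsigned int avg_ordered(unsigned int a, unsigned int b)
{
    return a + (b - a) / 2;    /* b - a cannot wrap when a <= b */
}
```

avg_lossy(3, 5) gives 3 instead of 4, while avg_ordered(3, 5) gives 4 – at the price of the a <= b precondition mentioned above.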

If you take a look at the addition algorithm I posted here, you will find that a&b calculates the first carry mask, and shifting it left by one makes it ready to be added back to the half-added value – you got it right, a ^ b. In my algo I didn’t use the + operator for adding the two numbers, because that would be stupid and non-challenging; but the + here is used for propagating the carry to the higher bits at once, without needing a loop. Why do I mention this equation, then? Because if we take it and shift it to the right, this time by one, we get (a & b) + ((a ^ b) >> 1), which adds the two numbers and divides them by two. And voila, you get an average of two integer numbers without a chance of overflowing. Now this is cool! The other advantages are that the result is correct (no precision loss) and you don’t have to do any prerequisite checks.
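Here is that final equation as a minimal C sketch (avg is just my name for it):

```c
/* Average without overflow:
   a & b keeps the bits the two numbers share (each contributes fully),
   (a ^ b) >> 1 halves the bits that appear in only one of them.
   The sum of the two terms never exceeds the larger input, so it
   cannot wrap. */
unsigned int avg(unsigned int a, unsigned int b)
{
    return (a & b) + ((a ^ b) >> 1);
}
```

For example, avg(6, 4) yields 5, and avg(UINT_MAX, UINT_MAX - 2) yields UINT_MAX - 1, where the naive (a+b)/2 would wrap.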

BTW – To be honest, I have no idea how to derive ((a & b) << 1) + (a ^ b) from a + b; I will try to find out sometime. But after implementing the addition myself, I surely understand this equation, and I hope you do too now.

Code Anaylsis #1 – Oh, Please Return

Sunday, September 30th, 2007

The first thing you do when you write a high-level disassembler (in contrast with diStorm, which is a flat stream disassembler) is start scanning from the entry point of the binary. Assuming you have one; for instance, in .COM files it will be the very first byte in the binary file. For MZ or PE it’s a bit of a more complicated story, yet easily achievable.

So what is this scanning really?

Well, as I have never taken a look at any high-level disassembler’s source code, my answer is from scratch only. The way I did it was to disassemble from the entry point, following the control flow (branches, such as jmp/jcc/loop, etc.) and recursively adding the new functions’ addresses (upon encountering a call instruction) to some list. This list is processed until it is exhausted. So there’s some algo that first inserts the entry point into that list, then pops the first address and starts analyzing it. Every time it stumbles upon a new function address, it adds it to that same list. And once it has finished analyzing the current function (for example, by hitting the ret instruction), it halts the inner loop, pops the next address off the list (if one exists) and continues again. The disassembled instructions/info are stored into your favorite collection – in my case, a dictionary (address->instruction’s info) – which you can later walk easily and print, or do anything you wish with.
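The worklist idea above can be sketched as a toy model in C. Everything here (the fake one-cell-per-instruction encoding, the array sizes, the names) is made up for illustration; a real disassembler would also have to track instruction lengths and follow branches, not just calls:

```c
/* Toy model of the scan: each cell of 'code' stands for one instruction.
   A positive value is a CALL to that address, 0 is an ordinary
   instruction, and -1 is the RET that ends the current function. */
#define CODE_SIZE 32
#define MAX_FUNCS 32

/* Scan from 'entry'; returns how many distinct functions were analyzed. */
int scan(const int *code, int entry)
{
    int worklist[MAX_FUNCS];           /* pending function addresses */
    int queued[CODE_SIZE] = { 0 };     /* don't analyze a function twice */
    int top = 0, funcs = 0;

    worklist[top++] = entry;           /* first, insert the entry point */
    queued[entry] = 1;

    while (top > 0) {                  /* until the list is exhausted */
        int addr = worklist[--top];    /* pop the next function address */
        funcs++;
        while (code[addr] != -1) {     /* analyze until we hit RET */
            int target = code[addr];
            if (target > 0 && !queued[target]) {  /* CALL to a new function */
                queued[target] = 1;
                worklist[top++] = target;
            }
            addr++;
        }
    }
    return funcs;
}
```

With an entry function at 0 that calls 5, and 5 calling 9, scan(code, 0) reports three functions analyzed.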

The thing is that some functions (generated by compilers, for the sake of conversation) are not always terminated by the RET instruction. It might be an IRET, and then you immediately know it’s an ISR. But that’s a simple case. Some functions end with INT3. Even that is OK. When do things get uglier? When they end with a CALL to ExitProcess (for the Win32-minded). Then your stupid scanning algo can’t determine where the function ends, because now it also has to ‘know’ the IAT and determine whether the called function was ExitProcess, ExitThread or whatever Exit API there exists. So before you have even made your first move in analyzing a binary, you have to make your scanner smarter. And that’s a bummer. My goal was to try and decide where a function starts (usually easy) and where a function ends. Parsing the PE and getting the IAT is no biggie, but it means that if you wanted to write a generic x86 disassembler, you’re screwed. You will have to write plugins or add-ons (whatever, you name it) to extend the disassembler’s capabilities for different systems…

But even that’s OK, because the job is the same, although the project is now much bigger. And again, it all depends on how accurate you wish to be. In my opinion, I try to be 99% accurate. With heuristics you cannot ask for 100%, right? :P

So tell me, you smart aleck compiler-engineers out there, why the heck do you emit generated function code in a way that it NEVER ends?

You all know the noreturn keyword or compiler extension, which states that the function doesn’t return. Yes, that’s good for functions from which the (invisible) system takes control, like ExitProcess, etc. I never really understood why a programmer would want to state such a behaviour for a function. So what? Now your generated code will be optimized? To omit the RET instruction? Wow, you rock! NOT.

To be honest, ExitProcess is not really the case here; to be more accurate, I was talking about this Linux code:

00000da6 (03) 8b40 14                  MOV EAX, [EAX+0x14] 
00000da9 (05) a3 ec7347c0              MOV [0xc04773ec], EAX 
00000dae (01) c3                       RET 
00000daf (05) 68 dcb834c0              PUSH 0xc034b8dc 
00000db4 (05) e8 8b09d1ff              CALL 0xffffffffffd11744 
00000db9 (05) 68 c5a034c0              PUSH 0xc034a0c5 
00000dbe (05) e8 8109d1ff              CALL 0xffffffffffd11744 
00000dc3 (03) 0068 00                  ADD [EAX+0x0], CH 
00000dc6 (05) 0d 39c06888              OR EAX, 0x8868c039

This is some disassembled code that I got from a good friend, Saul Tamari, while he was researching some stuff in the Linux kernel. He noticed that the panic() function never returns, but this time for real. The problem is that while flatly disassembling the stream you got, you go out of synchronization and start to disassemble real code at the wrong offset. You can see in the above snippet the second call, which is followed by a zero byte. That single byte is the end-of-function marker. How nice, huh? The next instruction, PUSH (68 00 …), is now out of synch and is actually considered a new, different function.

So now tell me, how are you supposed to find this kind of noreturn function when you want to solve this puzzle with static analysis only?? It is definitely not an easy question. We (Saul and I) had some ideas, but nothing 100% reliable. Performance was also an issue, which made things harder. And there are some crazy ideas, which I will cover next time.

Meanwhile, if you got any ideas, you’re more than welcome to write them here.

Basic Ops #2 – Sub

Tuesday, August 21st, 2007

As promised, though after a long delay, here is the algorithm for subtraction. Just notice how similar it is to the addition algorithm. Also, try to think about why we use the bitwise NOT operator, to make it a borrow rather than a carry.

/* A worked run of sub(5, 3):
     setup:  a = 101 ^ 011 = 110; borrow mask b = 011 & ~101 = 010
     iter 1: b = 100, a = 110 ^ 100 = 010, next mask b = 100 & 010 = 000
     result: 010 = 2. */
unsigned int sub(unsigned int a, unsigned int b)
{
   a ^= b; // Do a one-bit subtraction; now we have to calculate the borrow and apply it if required.
   b &= ~(a ^ b); // b will now contain the borrow-mask: bits set in b but clear in the original a.
   while (b) { // As long as there is a borrow to apply to the result...
      b <<= 1; // Shift left, so the borrow is taken from the next higher bit.
      a ^= b; // Apply the new borrow to the result.
      b &= a; // Get the next borrow-mask from the bits that just flipped on.
   }
   return a;
}

Basic Ops #2 – Add/Sub

Wednesday, July 25th, 2007

After we started with the really easy implementation of Multiplication’s algorithm (right here), I will now proceed to add and sub, which are a bit more complicated. But still, nothing like the dreaded Long Division algorithm. :)

Anyways, the interesting part about add and sub is that you will certainly use the XOR operator. If you think about it, XOR can actually calculate add and sub together, just without fixing the carry or borrow – that you will have to do yourself. But it’s still good enough for one-bit operations. For example: 1-1 = 0, and 1^1 = 0. Or another example: 1+0 = 1 –> 1^0 = 1. So this is all nice, but what happens when you have to borrow, or to calculate the carry? If you operate on 1-bit variables and you do 1+1, you will get 0, and a carry should be propagated to the next bit. Even in this case we can still use the XOR operator, because 1^1 yields a zero, which is what we expect. So now comes the big question – how do we calculate the carry? (For now I will stick to carry, and later I will talk about borrow.) The carry has to propagate from the least significant bit to the highest. In my opinion this is the most fascinating device in this algorithm, because the rest is the same for both add and sub.

So to answer that, let the input be two integers: a and b. Calculating the XOR between them is a good start and can be done as easily as that. But now we would like to fix the carry and apply it to the other bits. So if we AND (bitwise) the original a and b, we will get all parallel bits that match. For example, adding 2+6 will result in:

010 &
110
—-
010

This ANDed result is the mask of the carry we begin with. And I use the word “begin” since the carry has to propagate to the higher bits, and therefore we will need to iterate a specific operation that fixes the carry on the rest of the bits. The reason we AND a and b is that the result’s bits are surely those which will cause a carry… Once we have applied the current carry, we will have to calculate the next carry. As I said earlier, we have to iterate this operation, which means there must be some condition that stops the iteration, otherwise it won’t make any sense and will get stuck forever. Do you have any idea what this condition will be? It’s not the number of bits of a (aka sizeof a), nor of b (which we presume are the same, of course). We will run as long as there is a carry to apply to the constructed result of the addition.

By now we know almost everything; there’s only one thing left to be done, and that’s the actual calculation of the next carry. This is the trickiest thing in the whole algorithm. We have the mask of the carry from the a&b operation; every time we apply this mask by XORing it with the result, we have to remove (mask off) those bits and continue until the carry-mask is zero. It is therefore understandable that some bits should “go home”, but which bits do we remove? We have to remove the bits that were already applied as a carry (since the carry changes after every bit). The real problem is that we need to calculate the carry for all bits at once, since we can’t and don’t want to calculate the carry for each bit separately – that would be less optimized and no challenge at all. But even then, it’s not possible to apply the carry only once, since it ‘carries’ to the next bit by ‘nature’.

Well, usually I think that when code is shown no words need to be spoken, but in the case of this algorithm, some text must be written. And that was the above… So now that you should know a bit of what’s going on, the code will help you to fully understand the algorithm. I never claimed to explain this algorithm well – it’s a really hard one, honestly. Sorry.

I will just remind you that XOR is the key to the add and sub operations. If you take a look at XOR’s truth table, you will see that it is exactly what we need, like I tried to explain in the beginning. XOR alone is not enough to calculate the result of the wanted operation, since we have to apply the carry every time. And the carry is applied using, nothing else but, XORing again… you got that right.

unsigned int add(unsigned int a, unsigned int b) 
{ 
  a ^= b; // Do a one bit addition, now we have to calculate carry and apply it if required. 
  b &= a ^ b; // b will now contain the carry-mask, note that they are XORed again to eliminate a temporary value. 
  while (b) { // As long as there is a carry to apply to the result... 
     b <<= 1; // We've got to shift left, so the carry propagates to higher bits, right? 
     a ^= b; // Apply the new carry to the result. 
     b &= ~a; // Get the next carry-mask by removing bits that were already taken into account and fixed. 
  } 
  return a; 
}

If you really want to master this algorithm, start applying it to small numbers with 3 bits. You will see how wisely the next carry is calculated when you begin with a carry mask that has a few bits set, for example: 7+5…
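For the paper-and-pencil-averse, here is that 7+5 run written out as a trace (add() is repeated from above so the snippet stands on its own):

```c
/* A worked run of add(7, 5):
     setup:  a = 111 ^ 101 = 010, carry mask b = 101 & (010 ^ 101) = 101
     iter 1: b = 1010, a = 010 ^ 1010 = 1000, b = 1010 & ~1000 = 0010
     iter 2: b = 0100, a = 1000 ^ 0100 = 1100, b = 0100 & ~1100 = 0000
     result: 1100 = 12. */
unsigned int add(unsigned int a, unsigned int b)
{
  a ^= b;
  b &= a ^ b;
  while (b) {
     b <<= 1;
     a ^= b;
     b &= ~a;
  }
  return a;
}
```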

I will leave the implementation of the ‘sub’ to next week, so you can try to come up with it on your own. I can only say that it is really similar to the addition’s algorithm. Think of how you would change the carry into a borrow, and how it actually affects the mask.

Basic Ops #1 – Multiplication

Saturday, June 30th, 2007

A few years ago a friend challenged me to write the basic arithmetic operations – add/sub, mul/div – but the catch was to write each one without using that same instruction for doing the calculation, and to make it a long multiplication rather than log(A*B). I think the multiplication implementation is the easiest one amongst these operations, so I decided to start with it:

unsigned int mul(unsigned int a, unsigned int b)
{
  unsigned int sum = 0, d = 0;
  while (a) {
     if (a & 1) sum += (b << d);
     a >>= 1;
     d++;
  }
  return sum;
}

The idea of the algorithm, which I wrote from scratch, is to treat input A as a mask. Any time the next bit is set in the mask, I accumulate the shifted B. So what you actually get is long multiplication, which works for negative numbers as well as positive ones, thanks to the nature of 2’s complement numbers.

You should take paper and a pencil and start playing with the algorithm by applying it to small numbers to learn how it works… :) Though this one is easy, really.
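To see the 2’s-complement claim in action, here is mul() again (repeated so the snippet compiles on its own) with a thin signed wrapper; signed_mul is my name for it, not something from the post:

```c
unsigned int mul(unsigned int a, unsigned int b)
{
  unsigned int sum = 0, d = 0;
  while (a) {
     if (a & 1) sum += (b << d);  /* accumulate the shifted B */
     a >>= 1;                     /* move to the next mask bit */
     d++;
  }
  return sum;
}

/* The same unsigned shift-and-add produces the correct signed product
   modulo 2^32; only the reinterpretation of the bits changes. */
int signed_mul(int a, int b)
{
  return (int)mul((unsigned int)a, (unsigned int)b);
}
```

For example, signed_mul(-3, 5) comes out as -15 without the algorithm ever knowing about signs.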