Basic Ops #3 – Averaging Without Overflow

A few months ago I read this post here. Now I’m not talking about bugs in general but I want to focus on that specific bug. Summing two integers without causing an overflow. It might sound an easy thing to do, but taking another look reveals some implementation details that you should be careful about. As I see it, there are two problems. On some processors an addition operation that overflow might generate an interrupt. Although, I haven’t seen it on the mainstream processors, it still a challenge, because every time you add two numbers you should make sure you won’t trigger that overflow int. Or handling that interrupt – what can you really do about it? Not much, I guess… The other serious problem is that you lose precision of your result, or getting a wrong value if at all.

I decided to give an example from examining the average operation. Adding two numbers and dividing the result by two, of course. The problem arises when you add the numbers and cause an overflow. Even that same post shows some ways to overcome the overflow. Another wrong way that seems to be correct at the first time is a/2 + b/2. But messing with integers we know that we lose precision twice (for each division) for odd numbers, but we overcame the Addition potential overflow. Getting back to the (a+b)/2 equation, summing the numbers first and doing one division yeilds in only one precision lose for odd numbers, and yet the result is considered correct. That post writer suggests a correction int mid = (low + high) >>> 1; Which I still don’t see directly why it’s better, notice the ugly unsigned shift operator. I guess the standard says to use the next bigger variable size. Looking at  unsigned char x = 200, y = 100, avg = (x + y) / 2; this one will be calculated well, since the addition is done in integer size and then the result is written back as a byte. But I’m not sure whether two integers are promoted to ‘long’ automatically. Let’s say for the matter of conversation that you have to calculate the average of the largest sized registers of that processor (and for those smart asses who think of add/adc combination, leave me alone now :) ), you can’t get away without an overflow. Which leads to the next equation a + (b – a)/2. This time we assume that b – a results in a positive number, otherwise you have to switch it to a – b, which means that you will have to check (/cmp)  the numbers first. And checking is out of question. It’s all pure math, there must be some way even in the computers practical world. And I forgot to mention that the latter equation doesn’t lose any precision (it’s the same result as (a+b)/2). Finally I get to this equation: ((a & b) << 1) + (a ^ b). And more over, finally I understand it! hehe :)

If you take a look at the Addition algorithm I posted here. You will find that a&b calculates the first carry mask and shifted left by one, so it’s ready to be added again to the result of the half added value, you got it right, a ^ b. Although, in my algo I didn’t use the + operator for adding the two numbers, because that’s stupid and non-challenging. But the + here is used for propagating the carry to the higher bits at once without needing a loop. Why do I mention this equation then? Because if we take this equation and shift it to the right, this time, by one, we will get (a & b) + ((a ^ b) >> 1). Which adds the two numbers, and divides them by two. And voila, you got an average of two integer numbers without a chance of overflowing. Now this is cool! The other advantages are that the result is correct (no precision lose) and you don’t have to do any prerequisites.

BTW – To be honest I have no idea how the steps from a + b to ((a & b) << 1) + (a ^ b) are achieved, I will try to find it out sometime. But after implementing the Addition myself, I surely understand this equation and I hope you too now.

6 Responses to “Basic Ops #3 – Averaging Without Overflow”

  1. mxatone says:

    Hi,

    That’s a really interresting topic. As everyone I would add a check to see
    if the sum is inferior from one member :
    if ((a+b) < b || (a+b) < a)
    break;

    I didn’t know you could avoid such checks and then improve algorithm speed /
    security. Thanks.

    ————————————-

    Thanks for your blog and nice blog topic.

    bye

    mx-

  2. arkon says:

    Hey Mx,
    your checks are still relevant for integer overflow in the classic way. I talked about *averaging* two numbers, which for them you don’t need those tests, that’s correct.

    BTW – you can omit one of the conditions, they both mean the same thing. ;) and that’s because if A+ B overflows, the result will be necessarily less than A and less than B too…Think of the biggest unsigned number (i.e 255) and add any 8bit number, X, to it; eventually the result of the addition is x – 1…which must be less than x and 255.

  3. mxatone says:

    Yeah I know you can do only one but I was busy when you mailed me for correction (in fact my first comment was with only one check) so I just add both, lazy as I am to recheck the algorithm and remake everything in my head.

    I really love this integer overflow story, as always security “evole” with architecture and even if you got a strong algorithm (as it was) it doesn’t mean you know / see / predict everything.

  4. Jeff Doak says:

    Another simple solution is:
    (b-a)/2 + a

  5. arkon says:

    That can generate an underflow ? :)

  6. Jeff Doak says:

    I stick with my suggestion and wonder why it wasn’t tought in school that way. Of course if you are dealing with unsigned, then you also need to insure b is your higher number.

Leave a Reply