A few months ago I read this post here. I'm not talking about bugs in general; I want to focus on that specific bug: summing two integers without causing an overflow. It might sound like an easy thing to do, but a closer look reveals some implementation details that you should be careful about. As I see it, there are two problems. On some processors an addition that overflows might generate an interrupt. Although I haven't seen it on mainstream processors, it's still a challenge, because every time you add two numbers you have to make sure you won't trigger that overflow interrupt. And handling that interrupt – what can you really do about it? Not much, I guess... The other serious problem is that you lose precision in your result, or get an outright wrong value.
I decided to give an example by examining the average operation: adding two numbers and dividing the result by two, of course. The problem arises when the addition overflows. That same post shows some ways to overcome the overflow. Another way that seems correct at first glance is a/2 + b/2. With integers, though, we lose precision twice (once per division) when both numbers are odd, even though we avoided the potential addition overflow. Going back to the (a + b)/2 equation, summing the numbers first and doing one division yields only one precision loss for odd numbers, and the result is considered correct. The post's author suggests the correction int mid = (low + high) >>> 1; and I still don't see directly why it's better – notice the ugly unsigned shift operator (that's Java). I guess the standard says to promote to the next bigger variable size. Looking at unsigned char x = 200, y = 100, avg = (x + y) / 2; this one will be calculated correctly, since the addition is done at integer size and the result is then written back as a byte. But I'm not sure whether two integers are promoted to 'long' automatically (in C, they aren't).

Let's say, for the sake of conversation, that you have to calculate the average of the largest-sized registers of that processor (and for those smart asses who think of an add/adc combination, leave me alone now), so you can't get away without an overflow. Which leads to the next equation: a + (b - a)/2. This time we assume that b - a results in a positive number; otherwise you have to switch it to a - b, which means you have to check (/cmp) the numbers first. And checking is out of the question. It's all pure math; there must be some way, even in the computers' practical world. And I forgot to mention that the latter equation doesn't lose any precision (it gives the same result as (a + b)/2). Finally I get to this equation: ((a & b) << 1) + (a ^ b). And what's more, I finally understand it! hehe
If you take a look at the Addition algorithm I posted here, you will find that a & b calculates the first carry mask; shifted left by one, it's ready to be added to the result of the half-added value – you got it right, a ^ b. In my algo I didn't use the + operator for adding the two numbers, because that would have been stupid and non-challenging. But here the + is used to propagate the carries to the higher bits all at once, without needing a loop. Why do I mention this equation then? Because if we take it and shift it to the right by one this time, we get (a & b) + ((a ^ b) >> 1), which adds the two numbers and divides the sum by two. And voila, you've got the average of two integer numbers without any chance of overflowing. Now this is cool! The other advantages are that the result is exact (no precision loss) and you don't have to do any prerequisite checks.
BTW – to be honest, I have no idea how the steps from a + b to ((a & b) << 1) + (a ^ b) are derived; I will try to find that out sometime. But after implementing the Addition myself, I surely understand this equation, and I hope you do too now.