I know this subject, and yet every time it surprises me again. Even if you know the rules and have read K&R, it will still confuse you, and you will end up being wrong in some cases. At least, that’s what happened to me. So I decided to write about it and give a few examples.
Integer promotions probably happen in your code many times, and most of us are not even aware of it, let alone how it works. For those of you who have no idea what integer promotion is, to make a long story short, as MSDN puts it: “Objects of an integral type can be converted to another wider integral type (that is, a type that can represent a larger set of values). This widening type of conversion is called ‘integral promotion.’” Why? Partly so that calculations can sometimes be faster, and partly so that operands of different types can be converted to a common type seamlessly. The standard has exact rules for how and when it happens; you should check them out on your own.
Can you tell what it prints?
It will print “true”. Nothing special, right? It just works as we expect.
Check the next one out:
And this time? This one results in “false”, only because the type of ‘a’ is unsigned. Therefore it will be promoted to unsigned integer, 0x000000ff, and compared to 0xffffffff, which will of course yield false.
If ‘a’ were defined as signed, it would be OK, since the integer promotion would sign-extend it.
Another simple case:
Any idea what the result is? I would expect it to be (200*5) & 0xff, aka the low byte of the result, since we are multiplying uchars here. And you? But then I would be wrong as well. The result is 1000. You know why? … Integer promotions, ta-da. It’s not like c = (unsigned char)(a * b); and that is what is sometimes confusing.
Let’s look at some assembly, then:
movzx ecx,byte ptr [b]
imul eax,ecx
mov dword ptr [c],eax
Nasty: the unsigned char variables are promoted to int (movzx zero-extends them to 32 bits), so the multiplication happens with a 32-bit operand size! And the result is not truncated back to unsigned char, just like that.
Why is it dangerous? I think the answer is obvious: when you read or write the code, you naturally expect one result, but in reality something different happens. You end up with a small piece of code that doesn’t do what you expect, and then with an integer overflow vulnerability without even noticing. Ouch.
Update: Thanks to Daniel, I changed my erroneous (second) example to what I really had in mind when I wrote this post.
Actually the a = a * b result would be 232, not 244 (5 * 200 = 1000 = 0x3e8, take the lower byte, 0xe8 = 232).
Why would we expect 0, though? Even if you only do a byte multiply, it depends on what is done with the overflow. I wouldn’t expect it to be 0, just as I don’t expect additive overflow to give me a result of 0.
Thanks for posting this…
Here’s what kills me: it makes sense to promote in the case of multiplication, etc., especially if the result is written to something big… But here’s an example I didn’t expect:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t u1 = 35;
    uint8_t u2 = 55;
    uint8_t res = u1 - u2;
    printf("u35 - u55 = %u\n", (unsigned)res);
    //OK 236
    //Same math, between two integers of the same type, not multiplication...
    if (u1 - u2 != 236)
        printf("(u35 - u55 != 236)\n");
    // It's not!
    if ((uint8_t)(u1 - u2) == 236)
        printf("((uint8_t)(u35-u55) == 236)\n");
    //Yay!
    return 0;
}
Unfortunately, your explanation of your first and second examples is wrong (although the results remain the same). Many assumptions are also made about type sizes, etc.
Here is a more accurate explanation for the common case where sizeof(unsigned char) < sizeof(int) and CHAR_BIT == 8 in your environment (neither of which is guaranteed by C):
Second example:
unsigned char a = -1
-1 here is the integer constant 1 with the unary minus operator applied. As the constant has no suffix (like U, L, UL or …) and its value can be represented by an int, it IS of type int, and so is -1.
Now this int is assigned to a, which is an unsigned char. This could also be written as unsigned char a = (unsigned char)-1. A value of (signed) integer type is converted to an unsigned type here by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type, until the value is in the range of the new type.
So -1 + 1 * (UCHAR_MAX + 1) = UCHAR_MAX. a will therefore be assigned the value 255, which is 0xff, when above assumptions are met.
Now the comparison:
a == foo_a
Given that foo_a is still the enum constant -1 (therefore, it is also of type int), this is a comparison of unsigned char with value 255 and int with value -1.
As all values of type unsigned char (including 255) can be represented by type int, 255 is converted to int without problem. So the comparison reduces to
(int)255 == (int)-1
which of course yields 0 (or "false").
So your explanation that a is promoted to unsigned int is wrong.
a is promoted to (signed) int for the comparison, but it already holds the value 255, assigned by the previous statement through an implicit conversion of the int -1 to unsigned char.
The first example:
This only works because unsigned int a = -1 assigns the maximum value of type unsigned int to a.
In the following comparison, the enum constant -1, being of type SIGNED int, is ALSO converted to UNSIGNED int by the usual arithmetic conversions, because a is unsigned int (and unsigned int has a conversion rank greater than or equal to that of signed int; here the ranks are equal)!
It is not as obvious as you expected it to be (and told us it was).
Especially when signed and unsigned are mixed, things can and will get very dangerous…