Integer Promotion is Dodgey & Dangerous

I know this subject, and yet every time it surprises me again. Even if you know the rules and have read K&R, it will still confuse you, and you will end up being wrong in some cases. At least, that’s what happened to me. So I decided to bring the subject up and give two examples along the way.

Integer promotions probably happen in your code all the time, and most of us are not even aware of it and don’t understand how they work. For those of you who have no idea what integer promotion is, to make a long story short, here is how MSDN puts it: “Objects of an integral type can be converted to another wider integral type (that is, a type that can represent a larger set of values). This widening type of conversion is called ‘integral promotion.’” In C, this means that operands narrower than int are converted to int (or to unsigned int, if int cannot hold all of their values) before most operators are applied. Why? Sometimes so that calculations can be faster, and sometimes because the operands have different types and have to be converted seamlessly. The standard has exact rules for how and when this happens; you should check them out on your own.
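One quick way to see a promotion happen is to ask the compiler for the size of an expression. This is only a small sketch of my own (the function names are made up for illustration), and it assumes the usual case where int is wider than char:

```c
#include <stddef.h>

/* Size of the variable itself: sizeof(unsigned char) is always 1. */
size_t plain_size(void)
{
    unsigned char a = 1;
    return sizeof a;
}

/* Size of the expression a + a. Both operands are promoted to int
   before the addition, so the expression has type int. */
size_t promoted_sum_size(void)
{
    unsigned char a = 1;
    return sizeof (a + a);
}
```

On a typical desktop compiler plain_size() gives 1, while promoted_sum_size() gives sizeof(int), even though only unsigned chars appear in the source.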

typedef enum {foo_a = -1, foo_b} foo_t;

unsigned int a = -1;
printf("%s", a == foo_a ? "true" : "false");

Can you tell what it prints?
It will print “true”. Nothing special, right? It just works as we expect.
Check the next one out:

unsigned char a = -1;
printf("%s", a == foo_a ? "true" : "false");

And this time? This one will result in “false”, only because the type of ‘a’ is unsigned char. The -1 is stored in ‘a’ as 0xff (255). For the comparison, ‘a’ is promoted to int, which zero-extends it to 0x000000ff, and that is compared with foo_a, which is -1 (0xffffffff with a 32-bit int). That yields false, of course.
If ‘a’ were defined as signed char, it would be OK, since the integer promotion would sign-extend it back to -1.
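The three variants can be put side by side. This is a sketch under my own assumptions (CHAR_BIT == 8 and an int wider than char; the function names are invented for the example), not something taken from the standard:

```c
enum { foo_a = -1, foo_b };

/* unsigned int: a holds UINT_MAX, and foo_a is converted to
   unsigned int for the comparison, so both sides are UINT_MAX. */
int unsigned_int_matches(void)
{
    unsigned int a = -1;
    return a == foo_a;
}

/* unsigned char: a holds 255 and is promoted to int 255,
   which is then compared with int -1. */
int unsigned_char_matches(void)
{
    unsigned char a = -1;
    return a == foo_a;
}

/* signed char: a holds -1 and is promoted to int -1. */
int signed_char_matches(void)
{
    signed char a = -1;
    return a == foo_a;
}
```

Under those assumptions unsigned_int_matches() and signed_char_matches() return 1, and unsigned_char_matches() returns 0. A compiler with warnings turned on will probably complain about the comparisons, which is a good hint in itself.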

Another simple case:

unsigned char a = 5, b = 200;
unsigned int c = a * b;
printf("%u", c);

Any idea what the result is? I would expect it to be (200*5) & 0xff, aka the low byte of the result, since we multiply unsigned chars here. And you? But then I would be wrong as well. The result is 1000. You know why? … Integer promotions, ta-da. It’s not like c = (unsigned char)(a * b); and that is what gets confusing sometimes.
Let’s see some assembly then:

movzx       eax,byte ptr [a]
movzx       ecx,byte ptr [b]
imul        eax,ecx
mov         dword ptr [c],eax

Nasty: the unsigned char variables are zero-extended to 32 bits (that is the promotion to int). Then the multiplication happens with a 32-bit operand size! And the result is not truncated back to unsigned char; it is stored as is.
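If the low byte is what you actually want, the truncation has to be written out. A minimal sketch (the function names are mine, and the value 232 assumes CHAR_BIT == 8):

```c
/* The multiplication is done on ints, so nothing is lost:
   5 * 200 gives 1000. */
unsigned int product_promoted(unsigned char a, unsigned char b)
{
    return a * b;
}

/* The explicit cast throws away everything above the low byte:
   1000 is 0x3e8, and 0xe8 is 232. */
unsigned int product_truncated(unsigned char a, unsigned char b)
{
    return (unsigned char)(a * b);
}
```

With a = 5 and b = 200, product_promoted() returns 1000 and product_truncated() returns 232.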

Why is it dangerous? I think the answer is obvious: you naturally expect one result when you read or write the code, but in reality something different happens. Then you end up with a small piece of code that doesn’t do what you expect it to, and with some integer overflow vulnerability you didn’t even notice. Ouch.
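To make the danger a bit more concrete, here is a made-up length check of the kind I have in mind. Everything in it (BUF_SIZE, the function names, the idea of a header and a payload length) is hypothetical and only there for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64  /* hypothetical destination buffer size */

/* Buggy: the sum is computed as int, but storing it in a uint8_t
   wraps it around (200 + 100 = 300 becomes 44), so the check passes.
   The length used afterwards is recomputed as int and is 300 again.
   Returns the number of bytes that would be copied, 0 if rejected. */
size_t bytes_copied_buggy(uint8_t header_len, uint8_t payload_len)
{
    uint8_t total = header_len + payload_len;
    if (total > BUF_SIZE)
        return 0;
    return (size_t)(header_len + payload_len);
}

/* Safer: keep the sum in a type that can hold the promoted result
   and use that same value for both the check and the copy. */
size_t bytes_copied_checked(uint8_t header_len, uint8_t payload_len)
{
    unsigned int total = header_len + payload_len;
    if (total > BUF_SIZE)
        return 0;
    return total;
}
```

For lengths 200 and 100 the buggy version happily reports 300 bytes for a 64-byte buffer, while the checked version rejects the input. For small inputs such as 10 and 20 both agree on 30, which is exactly why this kind of bug survives testing.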

Update: Thanks to Daniel I changed my erroneous (second) example to what I really had in mind when I wrote this post.

5 Responses to “Integer Promotion is Dodgey & Dangerous”

  1. Daniel says:

    Actually the a = a * b result would be 232, not 244 (5 * 200 = 1000 = 0x3e8, take the lower byte, 0xe8 = 232).

    Why would we expect 0 though? Even if you only do a byte multiply, it depends on what is done with the overflow. I wouldn’t expect it to be 0 just like I don’t expect additive overflow to give me a result of 0.

  2. ericwazhung says:

    Thanks for posting this…

    Here’s what kills me: It makes sense to promote in the case of multiplications, etc. Especially if the result is written to something big… But here’s an example I didn’t expect:

    uint8_t u1 = 35;
    uint8_t u2 = 55;
    uint8_t res = u1 - u2;

    printf("u35 - u55 = %u\n", res);
    //OK 236

    //Same math, between two integers of the same type, not multiplication…
    if(u1 - u2 != 236)
        printf("(u35 - u55 != 236)\n");
    // It’s not!

    if((uint8_t)(u1-u2) == 236)
        printf("((uint8_t)(u35-u55) == 236)\n");
    //Yay!

  3. Felix says:

    Unfortunately, your explanation of your first and second examples is wrong (while the results remain the same). And many assumptions are made about type sizes etc.

    Here is a more accurate explanation for the popular case when sizeof(unsigned char) < sizeof(int) and CHAR_BIT == 8 for your environment (which is not guaranteed by C):

    Second example:

    unsigned char a = -1

    -1 here alone is an integer constant. As it has no suffix (like U, L, UL or …) and its value can be represented by an int, it IS of type int.

    Now this int is assigned to a, which is unsigned char. This could also be written as unsigned char a = (unsigned char)-1. A (signed) integer type is converted to unsigned here by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
    So -1 + 1 * (UCHAR_MAX + 1) = UCHAR_MAX. a will therefore be assigned the value 255, which is 0xff, when above assumptions are met.

    Now the comparison:

    a == foo_a

    Given that foo_a is still the enum constant -1 (therefore, it is also of type int), this is a comparison of unsigned char with value 255 and int with value -1.

    As all values of type unsigned char (including 255) can be represented by type int, 255 is converted to int without problem. So the comparison would reduce to
    (int)255 == (int)-1
    which of course yields 0 (or "false").

    So your explanation that a is promoted to unsigned int is wrong.
    a is promoted to (signed) int for the comparison, but already had a value of 255 assigned by the statement before, due to an implicit cast of int -1 to unsigned char.

    The first example:

    This only works because unsigned int a = -1 assigns the maximum value of type unsigned int to a.
    In the following comparison, the enum constant -1, being of type SIGNED int, is ALSO converted to UNSIGNED int, because a is unsigned int (unsigned int has the same conversion rank as signed int, and in that case the unsigned type wins)!

    It is not as obvious as you expected it (and told us) to be.
    Especially when signed and unsigned are mixed, things can and will get very dangerous…

  4. kagerato says:

    This is a rather old post now, but I think it’s worth a comment due to its relatively high google rank.

    First, I’d like to thank Felix for posting a much better analysis of what is actually going on here.

    Second, there are numerous reasons why C behaves the way it does with regards to integer promotions. Here’s just a few:

    (1) It causes arithmetic operators to better mimic real arithmetic. Yeah, you know, arithmetic on real numbers. People might want to do that with their programs.

    (2) It allows for many bitwise operations that require shifts to be correctly calculated without needing to use explicit expansion casts. The difference in readability of the code is quite large in that case.

    (3) Most people expect C to behave the way it does here, contrary to the OP’s claim that this is “dodgey”.

    (4) The “danger” introduced by promotions is mostly caused by poorly written code that makes bad assumptions. Intentionally assigning negative numbers to an unsigned integer is an example of broken code. It’s undefined behavior! The C standard does not specify what the internal representation of integers is (they’re allowed to be sign-magnitude, 1s complement, or 2s complement). Now, it really should specify 2s complement, since that’s exactly what the vast majority of real world architectures use nowadays, but ambiguous quirks like that are almost inevitable in 40 year old languages. In any case, you can’t count on reliable behavior there. More fundamentally, the operation being performed does not make sense in basically any language no matter how well specified. A positive integer simply cannot hold a negative value. Anything more is semantic quibbling and reinterpretation tricks.

    (5) Integer expansions are value preserving, whereas integer contractions are value destroying. That has enormous implications for correctness that I’m not going to go into.

    (6) Providing implicit promotions allows the use of narrow types to avoid many integer overflows in ordinary real arithmetic. Indeed, with the right input variables, it’s possible to guarantee the output will never overflow. (For a trivial example: adding two 32-bit integers never overflows a 64-bit integer.) Without promotions, overflow is essentially guaranteed to occur in many calculations at least some of the time. If you want your code to be correct all the time, in most cases that means having to add explicit overflow checks. Restricting the input variable sizes is often both easier and makes for much cleaner code, but it would never work without promotions.

    The bottom line is this: if you want a narrowing conversion that truncates, use one. The idea that it should be the default is ludicrous, and you’d realize that if you ever had to use a language with those semantics.

  5. David Feuer says:

    This exact sort of thing is what makes programming in a strongly typed language more predictable. Doing any of those conversions in, say, Haskell, requires an explicit conversion function. Nothing is promoted unless you promote it yourself. The equivalent of

    unsigned char a = 5, b = 200;
    unsigned int c = a * b;
    printf("%u", c);

    would be

    a, b :: Word8
    a = 5
    b = 200

    c :: Word
    c = (fromIntegral a) * (fromIntegral b)

    main = print c

    If, instead, you wanted unpromoted multiplication, you’d just write

    c = fromIntegral (a*b)
