Optimize My Index Yo

I happened to work with UNICODE_STRING recently for some kernel stuff. That simple structure is similar to Pascal strings in a way: you have the length, and the string doesn’t have to be null terminated. The length, though, is stored in bytes. Normally I don’t look at the assembly listing of the applications I compile, but when you debug them you get to see the code the compiler generated. Since some of my functions take null-terminated strings as input, I had to copy the original string into my own buffer and add the null character myself. And now that I think of it, I will rewrite everything to use lengths; I don’t like extra wcslen’s. :)

Here is a simple usage case:

p = (PWCHAR)ExAllocatePool(SomePool, Str->Length + sizeof(WCHAR));
if (p == NULL) return STATUS_NO_MEMORY;
memcpy(p, Str->Buffer, Str->Length);
p[Str->Length / sizeof(WCHAR)] = UNICODE_NULL;

Here is the assembly generated for the last line, so you can judge for yourself:

shr  esi,1
xor  ecx,ecx
mov  word ptr [edi+esi*2],cx

First the compiler converts the length to WCHAR units, as I asked. Then it uses that value as an index into the Unicode string, so it has to multiply the index by two to get the correct byte offset. That’s wasteful.
And this is the output of fully optimized code from VS08. Shame.

It’s silly, but this would generate what we really want:

*(PWCHAR)((PUCHAR)p + Str->Length) = UNICODE_NULL;

With this fix, the store is generated without the extra div/mul. I just did a few more tests, and it seems the dead-code removal and the simplifier algorithms are not perfect when there is a division inside pointer indexing.
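
To make the pointer arithmetic explicit, here is a rough user-mode sketch of the two ways to write the terminator. The function names terminate_by_index and terminate_by_bytes are made up for illustration, and uint16_t stands in for WCHAR. The byte-offset form assumes the length is even, which is exactly the knowledge the compiler does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index form: convert the byte length to an element index.
   The compiler then scales the index back by sizeof(uint16_t). */
static void terminate_by_index(uint16_t *p, size_t len_bytes)
{
    assert(p != NULL);
    p[len_bytes / sizeof(uint16_t)] = 0;
}

/* Byte-offset form: step in bytes through a 1-byte pointer type,
   then cast the result back to a 16-bit pointer for the store.
   ASSUMPTION: len_bytes is even; otherwise the store is misaligned. */
static void terminate_by_bytes(uint16_t *p, size_t len_bytes)
{
    assert(p != NULL && len_bytes % sizeof(uint16_t) == 0);
    *(uint16_t *)((unsigned char *)p + len_bytes) = 0;
}
```

Both functions write to the same address whenever the length is even. Note that adding the byte length to a 16-bit pointer directly would advance twice as far as intended, because pointer arithmetic is scaled by the element size.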

Update: Thanks to commenter Roee Shenberg, it is now clear why the compiler does this extra shr/mul. The compiler can’t know whether the length is even, so it has to round it down: for an odd length, the shift followed by the scaled index drops the low bit.
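
The rounding is easy to check in plain C. This is only a sketch: offset_via_index is a hypothetical helper that computes the byte offset the indexed store ends up using, with uint16_t standing in for WCHAR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset reached by p[len_bytes / sizeof(uint16_t)]:
   halve the length, then scale the index by the element size.
   This is the same as clearing the low bit of len_bytes. */
static size_t offset_via_index(size_t len_bytes)
{
    size_t index = len_bytes / sizeof(uint16_t);  /* the shr esi,1 */
    size_t off = index * sizeof(uint16_t);        /* the *2 in [edi+esi*2] */
    assert(off == (len_bytes & ~(size_t)1));
    return off;
}
```

For an even length such as 6 the offset comes back unchanged, but for 7 it is 6, so the compiler cannot replace the shift and scale with the raw length unless it can prove the length is even.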

2 Responses to “Optimize My Index Yo”

  1. Roee Shenberg says:

    I’m not sure the first case is wrong – theoretically, ESI could be odd – the shr and then multiplication round it off (the compiler can’t know that Length will always be even)

    Also, your second code seems to be missing some casts – for it to work as you expected, you’d have to cast to a pointer with a sizeof(*p)=1, and then cast the result again to PWCHAR, but, again, you’re incorporating the knowledge that you know Length is even here, something the compiler can’t know in most cases.

  2. arkon says:

    I didn’t think of it this way, you solved it.

    Though it doesn’t explain the dead code removal bug, but who cares ;)
    Also we learn that sometimes we do need to do things manually, since we trust length is even.

    Thanks!

    (I will update second sample)
