Archive for the ‘diStorm’ Category

Finding Kernel32 Base Address Shellcode

Thursday, July 7th, 2011

Yet another one…
This time, smaller, more correct, and still null-free.
I looked a bit at some shellcodes at exploit-db and googled too, to see whether anyone got a smaller way to no avail.

I based my code on:
AFAIK, who based his post on:

And this is my version:

00000000 (02) 6a30                     PUSH 0x30
00000002 (01) 5e                       POP ESI
; Use DB 0x64; LODSD
00000003 (02) 64ad                     LODS EAX, [FS:ESI]
00000005 (03) 8b700c                   MOV ESI, [EAX+0xc]
00000008 (03) 8b761c                   MOV ESI, [ESI+0x1c]
0000000b (03) 8b5608                   MOV EDX, [ESI+0x8]
0000000e (04) 807e1c18                 CMP BYTE [ESI+0x1c], 0x18
00000012 (02) 8b36                     MOV ESI, [ESI]
00000014 (02) 75f5                     JNZ 0xb

The tricky part was how to read from FS:0x30, and the way I use is the smallest one, at least from what I checked.
Another issue that was fixed is the check for kernel32.dll, usually the variation of this shellcode checks for a null byte, but it turned out to be bogous on W2k machines, so it was changed to check for a null word. Getting the shellcode by a byte or two longer.

This way, it’s only 22 bytes, it doesn’t assume that kernel32.dll is the second/third entry in the list, it actually loops till it finds the correct module length (len of ‘kernel32.dll’ * 2 bytes). Also since kernelbase.dll can come first and that renders lots of implementations of this technique unusable.
And obviously the resulting base address of kernel32.dll is in EDX.


[Update July 9th:]
Here’s a link to an explanation about PEB/LDR lists.
See first comment for a better version which is only 17 bytes.

Private Symbols Look Up by Binary Signatures

Friday, July 1st, 2011

This post could really be extended and divided into a few posts, but I decided to try and keep it small as much as I can. If I see it draws serious attention I might elaborate on the topic.

Signature matching for finding functions is a very old technique, but I haven’t found anyone who talks about it with juicy details or at all, and decided to show you a real life example. It is related to the last post about finding service functions in the kernel. The problem is that sometimes inside the kernel you want to use internal functions, which are not exported. Don’t start with “this is not documented story”, I don’t care, sometimes we need to get things done no matter what. Sometimes there is no documented way to do what you want. Even in legitimate code, it doesn’t have to be a rootkit, alright? I can say, however, that when you wanna add new functionality to an existing and working system, in whatever level it might be, you would better depend as much as you can on the existing functionality that was written by the original programmers of that system. So yes, it requires lot of good reversing, before injecting more code and mess up with the process.
The example of a signature I’m going to talk about is again about getting the function ZwProtectVirtualMemory address in the kernel. See the old post here to remember what’s going on. Obviously the solution in the older post is almost 100% reliable, because we have anchors to rely upon. But sometimes with signature matching the only anchors you have are binary pieces of:
* immediate operand values
* strings
* xrefs
* disassembled instructions
* a call graph to walk on
and the list gets longer and can really get crazy and it does, but that’s another story.

I don’t wanna convert this post into a guideline of how to write a good signature, though I have lots of experience with it, even for various archs, though I will just say that you never wanna put binary code as part of your signature, only in extreme cases (I am talking about the actual opcode bytes), simply because you usually don’t know what the compiler is going to do with the source code, how it’s going to look in assembly, etc. The idea of a good signature is that it will be as generic as possible so it will survive (hopefully) the updates of the target binary you’re searching in. This is probably the most important rule about binary signatures. Unfortunately we can never guarantee a signature is to be future compatible with new updates. But always test that the signature matches on a few versions of the binary file. Suppose it’s a .DLL, then try to get as many versions of that DLL file as possible and make a script to try it out on all of them, a must. The more DLLs the signature is able to work on successfully, the better the signature is! Usually the goal is to write a single signature that covers all versions at once.
The reason you can’t rely on opcodes in your binary signature is because they get changed many times, almost in every compilation of the code in a different version, the compiler will allocate new registers for the instructions and thus change the instructions. Or since code might get compiled to many variations which effectively do the same thing, I.E: MOV EAX, 0 and XOR EAX, EAX.
One more note, a good signature is one that you can find FAST. We don’t really wanna disassemble the whole file and run on the listing it generated. Anyway, caching is always a good idea and if you have many passes to do for many signatures to find different things, you can always cache lots of stuff, and save precious loading time. So think well before you write a signature and be sure you’re using a good algorithm. Finding an XREF for a relative branch takes lots of time, try to avoid that, that should be cached, in one pass of scanning the whole code section of the file, into a dictionary of “target:source” pairs, with false positives (another long story) that can be looked up for a range of addresses…

I almost forgot to mention, I used such a binary signature inside the patch I wrote as a member of ZERT, for closing a vulnerability in Internet Explorer, I needed to find the weak function and patch it in memory, so you can both grab the source code and see for yourself. Though the example does use opcodes (and lots of them) as part of the signature, but there’s special reason for it. Long story made short: The signature won’t match once the function will get officially patched by MS (recall that we published that solution before MS acted), and then this way the patcher will know that it didn’t find the signature and probably the function was already patched well, so we don’t need to patch it on top of the new patch.. confusing shit.

The reason I find signatures amazing is because only reversers can do them well and it takes lots of skills to generate good ones,
happy signaturing :)

And surprisingly I found the following link which is interesting:

So let’s delve into my example, at last.
Here’s a real example of a signature for ZwProtectVirtualMemory in a Kernel driver.

Signature Source Code

From my tests this signature worked well on many versions…though always expect it might be broken.

diStorm Goes on Diet

Saturday, June 11th, 2011

I just wanted to share my happiness with you guys. After a long hard work (over a month in my free time, which ain’t much these days), I managed to refactor all the data-structures of the instructions DB in diStorm3. As the title says, I spared around 40kb in data! The original distorm3.dll file took around 130kb and currently it takes 90kb. I then went ahead and reconfigured the settings of the project in Visual Studio and instructed the compiler not to include the CRT shits. Then it bitched about “static constructors won’t be called and the like”, well duh. But since diStorm is written in C and I don’t have anything static to initialize (which is based on code) before the program starts, I didn’t mind it at all. And eventually I got the .dll file size to ~65kb. That’s really 50% of the original file size. This is sick.

I really don’t want to elaborate with the details of what I did, it’s really deep shit into diStorm. Hey, actually I can give you an simple example. Suppose I have around 900 mnemonics. Do not confuse mnemonic with opcodes – some opcodes share the same mnemonic although they are completely different in their behavior. You have so many variations of the instruction ‘ADD’, for instance. Just to clarify: mnemonic=display name of an opcode, opcode: the binary byte code which identifies the operation to do, instruction: all the bytes which represent the opcode and the operands, the whole.
Anyway, so there are 900 mnemonics, and the longest mnemonic by length takes 17 characters, some AVX mofo. Now since we want a quick look up in the mnemonics table, it was an array of [900][19], which means 900 mnemonics X 19 characters per mnemonic. Why 19? An extra character for the null terminating char, right? And another one for the Pascal string style – means there’s a leading length byte in front of the string data. Now you ask why I need them both: C string and Pascal string together. That’s because in diStorm all strings are concatenated very fast by using Pascal strings. And also because the guy who uses diStorm wants to use printf to display the mnemonic too, which uses C string, he will need a null terminating character at the end of the string, right?
So back to business, remember we have to allocate 19 bytes per mnemonic, even if the mnemonic is as short as ‘OR’, or ‘JZ’, we waste tons of space, right? 900×19=~17kb. And this is where you get the CPU vs. MEMORY issue once again, you get random access into the mnemonic, which is very important but it takes lots of space. Fortunately I came up with a cooler idea. I packed all the strings into a very long string, which looks something like this (copied from the source):

“\x09” “UNDEFINED\0” “\x03” “ADD\0” “\x04” “PUSH\0” “\x03” “POP\0” “\x02” “OR\0” \
“\x03” “ADC\0” “\x03” “SBB\0” “\x03” “AND\0” “\x03” “DAA\0” “\x03” “SUB\0”
and so on… 


You can see the leading length and the extra null terminating character for each mnemonic, and then it’s being followed by another mnemonic. And now it seems like we’re lost with random-access cause each string has a varying length and we can never get to the one we want… but lo and behold! Each instruction in the DB contains a field ‘opcodeId’ which denotes the index in the mnemonics array, the offset into the new mnemonics uber string. And now if you use the macro mnemonics.h supplies, you will get to the same mnemonic nevertheless. And all in all I spared around 10kb only on mnemonic strings!

FYI the macro is:

#define GET_MNEMONIC_NAME(m) ((_WMnemonic*)&_MNEMONICS[(m)])->p

As you can see, I access the mnemonics string with the given OpcodeId field which is taken from the decoded instruction and returns a WMnemonic structure, which is a Pascal string (char length; char bytes[1])…

The DB was much harder to compact, but one thing I can tell you when you serialize trees is that you can (and should) use integer-indices, rather than pointers! In x64, each pointer takes 8 bytes, for crying out loud! Now in the new layout, each index in the tree takes only 13 bits, the rest 5 bits talks about the type, where/what the index really points to… And it indirectly means that now the DB takes the same size both for x86 and x64 images, since it is not based on pointers.

Thanks for your time, I surely had pure fun :)

diStorm for Java, JNI

Monday, October 4th, 2010

Since we decided to use Java for the reverse engineering studio, ReviveR, I had to wrap diStorm for Java. For now we decided that the core framework is going to be written in Java, and probably the UI too, although we haven’t concluded that yet. Anyway, now we are thinking about the design of the whole system, and I’m so excited about how things start to look. I will save a whole post to tell you about the design once it’s ready.

I wanted to talk a bit about the JNI, that’s the Java Native Interface. Since diStorm is written in C, I had to use JNI to use it inside Java now. It might remind P/Invoke to people, or Python extensions, etc.

The first thing I had to do is to define the same C structures of diStorm’s API, but in Java. And this time they are classes, encapsulated obviously. After I had this classes ready, and stupid Java, I had to put each public class in a separate file… Eventually I had like 10 files for all definitions and then next step was to compile the whole thing and use the javah tool to get the definitions for the native functions. I didn’t like the way it worked, for instance, any time you rename the package name, add/remove a package the name of the exported C function, of the native .DLL file, changes as well, big lose.
Once I decided on the names of the packages and classes finally I could move on to implement the native C functions that correspond to the native definitions in the Java class that I wrote earlier. If you’re familiar a bit with JNI, you probably know well jobject and its friends. And because I use classes rather than a few primitive type arguments, I had to work hard to wrap them, not mentioning arrays of the instructions I want to return to the caller.

The Java definition looks as such:

public static native void Decompose(CodeInfo ci, DecomposedResult dr);

The corresponding C function looks as such:

JNIEXPORT void JNICALL Java_distorm3_Distorm3_Decompose
  (JNIEnv *env, jobject thiz, jobject jciObj, jobject jdrObj);

Since the method is static, there’s no use for the thiz (equivalent of class’s this) argument. And then the two objects of input and output.
Now, the way we treat the jobjects is dependent on the classes we set in Java. I separated them in such a way that one class, CodeInfo, is used for the input of the disassembler. And the other class, DecomposedResult, is used for output, this one would contain an array to return the instructions that were disassembled.

Since we are now messing with arrays, we don’t need to use another out-argument to indicate the number of entries we returned in the array, right? Because now we can use something like array.length… As opposed to C function: void f(int a[], int n). So I found myself having to change the classes a bit to take this into account, nothing’s special though. Just need to get advantages of high level languages.

Moving on, we have to access the fields of the classes, this is where I got really irritated by the way the JNI works. I wish it were as easy as cTypes for Python, of course they are not parallel exactly, but they solve the same problem after all. Or a different approach like parsing a tuple in Embedded Python, PyArg_ParseTuple, which eases this process so much.

For each field in the class, you need to know both its type and its Id. The type is something you know at compile time, it’s predefined and simply depends on the way you defined your classes in Java, easy. The ugly part now begins, Ids – You have to know to which field you want to access, either for read or write access. The idea behind those Ids was to make the code more flexible, in the way that if you inherit a class, then the field you want to access probably moved to a new slot in the raw structure that contains it.
Think of something like this:

struct A {
int x, y;

struct B {
 int z, color;

struct AB {

Suddenly, accessing to AB::B.z has a different index than accessing to B.z. Can you see that?
So they guys who designed JNI came with the idea of querying the class, by using internal reflection to get this Id (or really an index to the variable in the struct, I take a guess). But this reflection thingy is really slow, obviously you need to do string comparisons on all members of the class, and all classes in the derived class… No good. So you might say, “but wait a sec, the class’s hierarchy is not going to change in the lifetime of the application, so why not reuse its value?”. This is where the JNI documentation talks about caching-ids. Now seriously, why don’t you guys do it for us internally, why I need to implement my own caching. Don’t give me this ‘more-control’ bullshit. I don’t want control, I want to access the field as quickly as possible and get on to other tasks.

Well, since the facts are different, and we have to do things the way we do, now we have to cache the stupid Ids for good. While I read how people did it and why they got mysterious crashes, I solved the problem quickly, but I want to elaborate on it.

In order to cache the Ids of the fields you want to have access to, you do the following:

if (g_ID_CodeOffset == NULL) {
    g_ID_CodeOffset = (*env)->GetFieldID(env, jObj, "mCodeOffset", "J");
    // Assume the field exists, otherwise your interfaces are broken anyway.
// Now we can use it…

Great right? Well, not so fast. The problem is that if you have a few functions that each accesses this same class and its members, you will need to have this few lines of code everywhere for each use. No go. Therefore the common solution is to have another native static InitIDs function and invoke it right after loading the native library in your Java code, for instance:

static {

Another option would be to use the JNI_OnLoad exported function to initialize all global Ids before the rest of the functions get ever called. I like that option more than the InitIDs, which is less artificial in my opinion.

Once we got the Id ready we can use it, for instance:

codeOffset = (*env)->GetLongField(env, jciObj, g_ID_CodeOffset);

Note that I use the C interface of the JNI API, just so you are aware to it. And jciObj is the argument we got from Java calling us in the Decompose function.

When calling the GetField function we have to pass a jclass, that’s a Java-class object’s pointer kinda. In contrast to the class instance, I hope you know the difference. Now since we cache the Ids for the rest of the application life time, we have to keep a reference to this Java-class, otherwise weird problems and nice crashes should (un)surprise you. This is crucial since we use the same Ids for the same classes along the code. So when we call the GetFieldID we should hold a reference to that class, by calling:

(*env)->NewWeakGlobalRef(env, jCls);

Note that jCls was retrieved using:

jCls = (*env)->FindClass(env, "distorm3/CodeInfo");

Of course, don’t forget to remove the reference to those classes you used in your code, by calling DeleteGlobalRef in JNI_OnUnload to avoid leaks…

The FindClass function is very good once you know how to use it. It took me a while to figure out the syntax and naming convention. For example, the String which seems to be a primitive type in Java, is really not, it’s just a normal class, therefore you will have to use “java/lang/String” if you want to access a string member.
Suppose you got a class “CodeInfo” in the “distorm3” package, then “distorm3/CodeInfo” is the package-name/class-name.
Suppose you got an inner class (inside another class), then “distorm3/Outer$Inner” is the package-name/outer-class-name$inner-class-name.
And probably there are a bit more to it, but that’s a good start.

About returning new objects to the caller. We said already that we don’t use out-arguments in Java.
Think of:

void f(int *n)
 *n = 5;

That’s exactly what an out-argument is, to return some value rather than using the return keyword…
When you want to return lots of info, it’s not a good idea, you will have to pass lots of arguments as well, pretty ugly.
The idea is to pass a structure/class that will hold this information, and even have some encapsulation to it.
The problem at hand is whether to use a constructor of the class, or just create the object and set each of its values manually.
Also, I wonder which method is faster, letting the JVM do it on its own in a constructor, or doing everything using JNI.
Unfortunately I don’t have an answer to this question. I can only say that I used the latter method of creating the raw object and setting its fields. I thought it would be better.
It looks like this:

jobject jOperand = (*env)->AllocObject(env, g_OperandIds.jCls);
if (jOperand == NULL) // Handle error!
(*env)->SetIntField(env, jOperand, g_OperandIds.ID_Type, insts[i].ops[j].type);
(*env)->SetIntField(env, jOperand, g_OperandIds.ID_Index, insts[i].ops[j].index);
(*env)->SetIntField(env, jOperand, g_OperandIds.ID_Size, insts[i].ops[j].size);

(*env)->SetObjectArrayElement(env, jOperands, j, jOperand);

This is real piece of code taken from the wrapper code. It constructs an Operand class from the Operand structure in C. Notice the way the AllocObject is used, using that jCls we hold a reference to, instead of calling FindClass again… Then setting the fields and setting this object in the array of Operands.

What I didn’t like much in the JNI is that I had to call SetField, GetField and those variations. On one hand, I understand they wanted you to know which type of field you access to. But on the other hand, when I queried the Id of the field, I specified its type, so I pretty much know what type-value I’m setting, so… Well, unless you have bugs in your code, but that will always cause problems.

To another issue, one of the members of the CodeInfo structure in diStorm is a pointer to the binary code that you want to disassemble. It means that we have to get some buffer from Java as well. But apparently, sometimes the JVM decided to make a copy of the buffer/array that is being passed to the native function. In the beginning I used a straight forward byte[] member in the class. This sucks hard. We don’t want to waste time on copying and freeing buffers that are read-only. Performance, if can be better, should be better by default, if you ask me. So reading the documentation there’s an extension to the JNI, to use java.nio.ByteBuffer, which gives you a direct access to the Java buffer without the extra efforts of copying. Note that it requires the caller to the native API to use this class specifically and sometimes you’re limited…

The bottom line is that it takes a short while to understand how to use JNI and then you get going with it. I found it cumbersome a bit… The most annoying part is all the extra preparations you have to do in order to access a class or a field, etc. Unless you don’t care at all about performance but then your code is less readable for sure. We don’t have any information about performance of allocating new objects and array usage. We can’t base our ways of coding on anything. I wish it could be more user friendly or parts of it eliminated somehow.

New Project – ReviveR

Saturday, September 25th, 2010

Hey all,

long time haven’t posted. I’m kinda busy with lots of stuff.
Anyway I just wanted to let you know that I’m starting to work on the sequel of diStorm, you guessed it right… A reversing studio!
Unlike what many people said, the core is going to be written in C++, the GUI is going to be written per OS. No thanks, QT. Top goals are performance, scripting, good UI and most important good analysis capabilities. Obviously it’s going to be open source, cross platform. For a start, it will support only x86 and AMD64 and PE file format, maybe ELF too, though not my priority. I’m not sure about a debugger yet, but it will probably be implemented later. GUI is going to be written using WPF under C#, just to give you an idea.

My main interests are performance and binary code analysis algorithms.

If there are highly skilled programmers who wish to help, please contact me.
For now it seems we are a group of 4 coders, I’m still not going to publish their names, until everything is settled.

Anyway, design is taking place nowadays. This is your time for suggesting new features and ideas.

Big good luck

diStorm3 is Ready

Monday, August 16th, 2010

diStorm3 is ready for the masses! :)
– if you want to maximize the information you get from a single instruction; Structure output rather than text, flow control analysis support and more!

Check it out now at its new google page.

Good luck!

diStorm3 Released

Thursday, January 14th, 2010

Hey people
I just uploaded the source code of diStorm3 to Google Code.
Now you can enjoy SVN too! And of course, better source code and features from the disassembler itself.
It is officially released under the GPLv3.

Special thanks to Michael Rolle, for tons of suggestions, ideas and fixes. Also thanks to many others who reported buggy instructions.

In the next few weeks I will update the source code some more, for more compilers, etc. Going to re-do the webpage of diStorm. Hopefully will have some nice logo too. And the most important thing, I will make a somehwhat tutorial of how-to use the new disassembler with the newest features. Stay tuned!


Thoughts About Open Source Community In RE

Monday, January 4th, 2010

Hello there,
if you wanna know me better and you use diStorm, take 10 minutes from your precious time and please read my post.

If you don’t give a damn about diStorm or open source community, you will waste your time reading this post, so visit here.

Believe me or not, I would have given this code (new version) under BSD. I really wanted to. A proof is that diStorm64 was BSD so far, and still is, however it is deprecated. But I don’t see a good end for BSD. Let me explain. It’s very permissive, everybody can use it and I like it more than GPL for that reason, truely. Why after all I decided on releasing diStorm in a GPL license, and not even LGPL? That’s a good question and I will answer this by telling you my story and more.

I began this project for pure fun and challenge of decoding x86, back in mid 2003 already. Soon enough I got some basic framework that could accept a stream, look in some data structure and fetch an instruction, then decode the operands. Sounds easy, right? Because it is easy. But that was only for integer instructions. After a few months I added FPU support. This is where I started to hate the x86 machine code, really. It’s a pain in the arse. Only at that time I started to look around and to see what other (disassembler) libraries we got on the Internet, it was already 2004, and I still enjoyed coding diStorm for the fun and sport of it, 100% from scratch, always. I knew from day one that I am going to make it an open source project. And yet, it wasn’t so useful, there were better disassemblers out there. I’m talking about binary stream diassemblers, mind you, not the GUI wrapped ones, with high level features. Anyway, diStorm wasn’t any special yet. Therefore I had to work hard, in my free time, no complaints ever. And added all those SSE instructions set. Like 5 sets eventually were added to diStorm, in addition to new sets every now and then in other computing fields. Honestly, I doubt people use diStorm for SSE, but you never know. Besides the goal of diStorm was to be a complete product, top quality and optimized, and I achieved them all within time. diStorm was opened source in the beginning of 2006. A few months later I added AMD64 support, and then diStorm was the first open source disassembler library to support it.

All the while I got lots of emails about diStorm. Some were about asking help of how to use it, some were about defects in specific isntructions, etc. And even two critical bugs, one is code regression that I put a bug in the code accidentally, and the other was some memory leak in the Python module, which I happened to fix before already.

The most appreciated work from the community was about the sample projects. People helped me with the code to make it more useful, and better code for each platform (there are Win32 and Linux separated projects). But never anything about diStorm’s code itself. Maybe the project is too complex. Maybe there was no need, overall it was stable and mature. I don’t know.

Since 2005 I got more than 50,000 downloads of the variants of diStorm (the sample projects, the full source code, the library, the Python modules, etc). It is a lot, like 10K a year. Don’t forget that after all it’s a mere disassembler, not some crazy application, and it’s even a library, so only developers can use it. Though later I added the flat disassembler project compiled and which can be downloaded in a binary form. And what we learn from this? Nothing, nada! You can never trust statistics as this, and it doesn’t mean much. Cause there are new releases and the same person can download the project a few times within some time, to get the latest version, etc. So I can’t have any info about how many people/companies use it around the world.

My own goal was to make diStorm the best disassembler out there. Only you can judge. I know what it’s worth.

Sometimes I had the doubts about this issue. And it gave me the inspiration to go on and bring the next generation disassembler library to the world, one that afterwards I can retire, in a way, and finally start to enjoy the fruits of my hard work with coding code analysis tools myself in the future. About time, yey, no more excuses, no more stupid string parsing, totally efficient.

I see many people complain about the general status of tools in the Reverse Engineering field. Not many people open their code. And the rest make money out of it. And I want to make a better community in this field. I really tried to do so, but without success. People that use diStorm either only use it as is, and they won’t open their code, because they want to keep their precious knowledge to themselves, or sell it for money. That’s legitimate. You work, you get money.  And it also gives the opportunity for tools to exist, otherwise they won’t be there at all. But I think we can do better. I am doing better myself this way, I belive so and therefore I open the code in GPL. GPL is ugly and the only reason I am for it is because I want the community to help me with this project and the coming one. Ohhhh, you will love the next project. It’s not a library anymore but a whole studio…dreams come true. I make them coming true on my end, but I need help. For crying out loud.

I, hereby, am asking from the community, the people out there, that do this stuff for fun or profit to help, to contribute back to the community. I can’t do it all myself, it will take years. Though I am willing to do it myself, and that’s what I do. But then WHAT FOR?! So some companies can enjoy my work and get money on my expense? NO THANKS.

Community, Community, Community, this is key, what the whole issue is about. You are saying you need open source tools, no problem, but share. Let’s do it together. Parsing strings is not the way. Finally we got a good weapon and let’s build a framework on top of it. We got Olly, we got WinDbg, and we got IDA. None of them is opened source. Each has a clear win in its niche. I think there is a room, a NEED, for something new, free and open source. If I am not going to get help this time, I am not going to open source the next project, because it’s pointless and by now you should know exactly why.

You know what, maybe it’s all my fault. Maybe diStorm’s code is TOO complex. (What do you expect though?). Maybe the stupid and ugly diStorm’s page is not easy to track. Otherwise why I see someone who spent a few hours taking a disassembler out of a bigger project and make it work for Windows kernel, while you can do that in diStorm in two clicks?

Maybe nobody gives a f@ck about it, probably, just a stupid disassembler. But no more. This is the time to make a change, a big one. It might be a disassembler, or anything else, but it just shows the attitude of the community.And don’t kid yourself, everybody looks for code analysis stuff, or eventually write them up on their own. C’est tout. Mailling list or not, if we are not going to help each other and only complain that there are not enough open source projects in the RE field, nothing good is gonna happen.


Gil Dabah, Arkon

P.S – You can start by forwarding a link to this post.

diStorm3 – News

Tuesday, December 29th, 2009

Yo yo yo… forgot to say happy xmas last time, never too late, ah? :)

This time I wanted to update you about diStorm3 once again. Yesterday I had a good coding session and I added some of the new features regarding flow control. The decode function gets a new parameter called ‘features’. Which is a bit field flag that lets you ask the disassembler to do some new stuff such as:

  1. Stop on INT instructions [INT, INT1, INT3, INTO]
  2. Stop on CALL instructions [CALL, CALL FAR]
  3. Stop on RET instructions [RET, RETF, IRET]
  4. Stop on JMP instructions [JMP, JMP FAR]
  5. Stop on any conditional branch instructions [JXXX, JXCX, LOOPXX]
  6. Stop on any flow control (all of the above)

I wasn’t sure about SYSCALL and the like and UD2, for now I left them out. So what we got now is the ability to instruct the disassembler to stop decoding after it encounters one of the above conditions. This makes the higer disassembler layer more efficient, because now you can disassemble code by basic blocks. Also building a call-graph or branches-graph faster.

Note that now you will be able to ask the disassembler to return a single instruction. I know it sounds stupid, but I talked about it already, and I had some reasons to avoid this behavior. Anyway, now you’re free to ask how many instructions you want, as long as the disassembler can read them from the stream you supply.

Another feature added is the ability to filter non-flow-control instructions. Suppose you are interested in building a call-graph only, there’s no reason that you will get all the data-control instructions, because they are probably useless for the case. Mixing this flag with ‘Stop on RET’ and ‘Stop on CALL’, you can do nice stuff.

Another thing is that I separated the memory-indirection description of an operand into two forms. First of all, memory indirection operand is when an instruction reads/writes from/to memory. Usually in Assembly text, you will see the brackets characters surrounding some expressions. Something like: MOV [EDX], EAX. Means we write a DWORD to EDX pointer. If you followed me ’till here, you should know exactly what I’m talking about anyway.

When you get the result of such instruction from diStorm3, the type of the operand will be SMEM (stands for simple-memory), which hints there’s only one register in the memory-indirection operand. Although it doesn’t hint anything about the displacement, that’s that offset you usually see in the brackets. Like MOV [EDX+0x12345678], EAX. So you will have to test if the displacement exists in both forms. The other form is MEM (Normal memory indirection, or probably should be called ‘complex’) since it supports the full memory indirection operand, like: MOV [EAX*4 + ESI + 0x12345678], EAX. Then you will have to read another register that supplies the base register, in addition to the index register and scale. Note that this applies for 16 bits mode addressing as well, when you can have a mix of [‘BX+SI]’ or only ‘[BX]’. Also note that sometimes in 32/64 bits mode, you can have a SIB byte, that sets only the base register and the index register is unused, but diStorm3 will return it as an SMEM, to simplify matters. This way it’s really easy to read the instruction’s parameters.

Another feature for text formatting is the ability to tell the disassembler to limit the address to 16 or 32 bits. This is good since the offsets are 64 bits today. And if you encounter an instruction that jumps backwards, you will get a huge negative value, which won’t make much sense if you disassemble 16 bits code…

diStorm3 still supplies the bad old interface. And now it supports two new additional functions. The decompose function, which returns the structures for each instruction read. And another function that formats a given structure into text, which is pretty straight forward. The text format is not an accurate behavior of diStorm64, it’s more simplified, but good enough. Besides I have never heard any special comments about the formatting of diStorm64, so I guess it doesn’t matter much to you guys. And maybe maybe I will add AT&T syntax later on.

Another field that is returned now, unlike diStorm64, is the instruction-set-class type of the instruction, with very broad categories, like Integer instructions, FPU instructions, SSE instructions, and so on. Still might be handy. And the hint about the flow-control type of the instruction.

Also I changed tons of code, and I really mean it, the skeleton is still the same, but the prefixes engine works totally different now. Trying to imitate a real processor this time. By including the last prefix found of that prefix-type. You can read more about this, here. I made the code way more optimized and eliminated double code and it’s still readable, if not for the better. Also I changed the way instruction are fetched, so the locate-instruction function is much smaller and better.

I’m pertty satisfied with the new version of diStorm and hopefully I will be able to share it with you guys soon. Still I got tons of tests to do, maybe I will add that unit-test module in Python to the proejct so you can enjoy it too, not sure yet.

Also I got a word from Mario Vilas, that he is going to help with compiling diStorm for different platforms, and I’m going to integrate his new Python wrappers that use ctypes, so you don’t need the Python extension anymore. Thanks Mario! 😉 However, diStorm3 has its own Python module for the new structure output.

If you have more ideas, comments, complaints or you just hate me, this is the time to say so.
Cheers, happy new year soon!

Short Update

Sunday, December 27th, 2009

I’ve slowed down with posting here, been working hard recently both on diStorm and real life job. Hopefully I will be able to release diStorm3 soon (matter of weeks for a start). I’m almost finished with coding everything I wanted, though I’m still left with the features you guys asked for and tons of tests since I also changed lots of code and added new AVX instruction set.