Archive for the ‘diStorm’ Category

Code Analysis #1 – Oh, Please Return

Sunday, September 30th, 2007

The first thing you do when you write a high-level disassembler (in contrast with diStorm, which is a flat stream disassembler) is to start scanning from the entry point of the binary. Assuming you have one, of course; for instance, in .COM files it will be the very first byte in the binary file. For MZ or PE it's a bit more complicated a story, yet easily achievable.

So what is this scanning really?

Well, as I have never taken a look at any high-level disassembler's source code, my answer is from scratch only. The way I did it was to disassemble from the entry point, following the control flow (branches, such as jmp/jcc/loop, etc.) and recursively adding new functions' addresses (upon encountering a call instruction) to some list. This list is processed until it is exhausted. So there's some algo that first inserts the entry point into that list, then pops the first address and starts analyzing it. Every time it stumbles upon a new function address, it adds it to that same list. And once it's finished analyzing the current function (for example, by hitting the ret instruction), it halts the inner loop, pops the next address off the list (if one exists) and continues again. The disassembled instruction/info is stored into your favorite collection; in my case it was a dictionary (address -> instruction's info), which you can later walk easily and print, or do anything else you wish with.
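In C it would boil down to something like the following sketch. Mind you, every name here (decode_one, the queue helpers, InsnInfo) is an assumption of mine for illustration; none of these are real diStorm APIs:

#include <stdint.h>

/* Assumed helpers for the sketch - not real diStorm APIs: */
typedef struct { uint64_t target; unsigned int size; int type; } InsnInfo;
enum { INSN_CALL, INSN_RET, INSN_OTHER };

int decode_one(uint64_t addr, InsnInfo* insn);        /* Decodes one instruction. */
void queue_push(uint64_t addr);                       /* The pending-functions list. */
uint64_t queue_pop(void);
int queue_is_empty(void);
void map_store(uint64_t addr, const InsnInfo* insn);  /* address -> instruction info. */

void scan(uint64_t entryPoint)
{
    InsnInfo insn;
    queue_push(entryPoint);                  /* Seed the list with the entry point. */
    while (!queue_is_empty()) {
        uint64_t addr = queue_pop();         /* Pop the next function to analyze. */
        while (decode_one(addr, &insn)) {
            map_store(addr, &insn);          /* Keep the info for later walking. */
            if (insn.type == INSN_CALL)
                queue_push(insn.target);     /* A new function - add it to the list. */
            else if (insn.type == INSN_RET)
                break;                       /* Function ended - pop the next one. */
            /* Branches (jmp/jcc/loop) should be followed here as well. */
            addr += insn.size;
        }
    }
}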

The thing is that some functions (generated by compilers, for the sake of conversation) are not always terminated by the RET instruction. It might be an IRET, and then you immediately know it's an ISR. But that's a simple case. Some functions end with INT3. Even that is OK. When do things get uglier? When they end with a CALL to ExitProcess (for the Win32-minded), because then your stupid scanning algo can't determine where the function ends: now it also has to 'know' the IAT and determine whether the called function was ExitProcess, ExitThread or whatever Exit API there exists. So before you've even made your first move with analyzing binary code, you have to make it smarter. And that's a bummer. My goal was to try and decide where a function starts (usually easy) and where a function ends. Parsing the PE and getting the IAT is no biggie, but now it means that if you wanted to write a generic x86 disassembler, you're screwed. So you will have to write plugins or add-ons (whatever, you name it) to extend the disassembler's capabilities for different systems…
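Reusing the InsnInfo sketch above, that extra smartness would look roughly like this. Again, is_import_of is a hypothetical helper that resolves a call target through the parsed IAT, and the API list is obviously incomplete:

#include <stddef.h>
#include <stdint.h>

int is_import_of(uint64_t target, const char* name);  /* Hypothetical IAT lookup. */

static const char* noReturnAPIs[] = { "ExitProcess", "ExitThread", "exit", "abort" };

int ends_function(const InsnInfo* insn)
{
    if (insn->type == INSN_RET)
        return 1;
    if (insn->type == INSN_CALL) {
        for (size_t i = 0; i < sizeof(noReturnAPIs) / sizeof(noReturnAPIs[0]); i++)
            if (is_import_of(insn->target, noReturnAPIs[i]))
                return 1;    /* A call that never returns also ends the function. */
    }
    return 0;
}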

But even that's OK, because the job is the same, although the project is now much bigger. And again, it all depends on how accurate you wish to be. In my opinion, I try to be 99% accurate. With heuristics you cannot ask for 100%, right? :P

So tell me, you smart-aleck compiler engineers out there, why the heck do you write the generated function code in a way that it NEVER ends?

You all know the noreturn keyword or compiler extension, which states that the function doesn't return. Yes, that's good for functions where the (invisible) system takes control from that point on, like ExitProcess, etc. I really never understood the reason a programmer would like to state such a behaviour for a function. So what? Now your generated code will be optimized? To omit the RET instruction? Wow, you rock! NOT.
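For reference, these are the declarations I'm talking about (fatal_error is a made-up example function, not anything from a real codebase):

/* GCC/Clang spelling: */
void fatal_error(const char* msg) __attribute__((noreturn));

/* MSVC spelling: */
__declspec(noreturn) void fatal_error2(const char* msg);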

To be honest, ExitProcess is not really the case here; to be more accurate, I was talking about this Linux code:

00000da6 (03) 8b40 14                  MOV EAX, [EAX+0x14] 
00000da9 (05) a3 ec7347c0              MOV [0xc04773ec], EAX 
00000dae (01) c3                       RET 
00000daf (05) 68 dcb834c0              PUSH 0xc034b8dc 
00000db4 (05) e8 8b09d1ff              CALL 0xffffffffffd11744 
00000db9 (05) 68 c5a034c0              PUSH 0xc034a0c5 
00000dbe (05) e8 8109d1ff              CALL 0xffffffffffd11744 
00000dc3 (03) 0068 00                  ADD [EAX+0x0], CH 
00000dc6 (05) 0d 39c06888              OR EAX, 0x8868c039

This is some disassembled code that I got from a good friend, Saul Tamari, while he was researching some stuff in the Linux kernel. He noticed that the panic() function never returns, but this time, for real. So the problem now is that while flatly disassembling the stream you got, you go out of synchronization and start to disassemble real code at the wrong offset. You can see in the above snippet the second call, which a zero byte follows. That single byte is the end-of-function marker. How nice, huh? The next instruction, PUSH (68 00 …), is now out of synch, and is actually considered a new, different function.

So now tell me, how should you find this kind of noreturn function when you want to solve this puzzle with static analysis only? It is definitely not an easy question. We (Saul and I) had some ideas, but nothing 100% reliable. Performance was also an issue, which made things harder. And there are some crazy ideas, which I will cover next time.

Meanwhile, if you got any ideas, you’re more than welcome to write them here.

Say No to Redundant Checks

Wednesday, August 29th, 2007

I had a long argument with a friend about some check that I don't do in diStorm. He only said that apparently I don't do that check (an if statement to test some value). But if it was not important enough, he wouldn't have said that in the first place, right? :) And I told him that he is right – I don't, because it is absolutely not necessary. Soon I will let you judge for yourself. Eventually we got nothing out of that argument; probably both sides are mad (at least I was pissed off) and the code was left the same.

Now let me show you an example where I will ask you whether you do an (extra?) test of the result or not. Since I'm a Win32 guy, here we go:

char buf[256];
int length = GetWindowText(hWnd, buf, 256); // hWnd: some window handle we got earlier.
if (length == 0) return; // No string for some reason, though I don't care why.

memcpy(DST, buf, length + 1); // Include the null-termination char.

Well, that's useless code, but still good enough for the sake of conversation. So I get the caption of some window and copy it into another destination buffer. Did you notice that I did not check the length before copying to DST?! (And yes, DST is the same size as buf.) Oh no!! Am I doomed now? Of course not, it is guaranteed that the length will be less than 256, for sure. Because otherwise the API is broken. Now we could start another flame war about whose fault it is, but spare me this, please. If the API is broken, that is probably a bug, and it will break other applications which use it. But that's not the point. Say the API is fine: why don't I check the length before I memcpy? Am I doing anything wrong? I will say it shortly: NO. My code is just fine (as long as DST's size is at least buf's size, of course). So his accusation was that I should be a code-monkey-freak and consistent. And then he said something like, "And what if in the future the API changes?" Well, bummer, then all the API's other users are broken as well. If it came down to that question, I would have to pad all my code with extra unnecessary IFs. No thanks. Besides, if I can't trust my APIs, what should I trust? They are the basis for every application.

Now in diStorm the case is almost the same. There is some internal function that you call with lots of parameters; one of them is an output param that receives the number of entries written to a buffer you supply, along with the maximum size of that buffer. So I examine the return code and see whether I can use the data in the buffer or not, and then I know that the number of entries written to my buffer cannot exceed the maximum size I supplied. So why the heck should I check that writtenEntriesCount < MaximumEntries? Tell me, please!

The only rationale I can see is to make an assertion (and only in debug mode) that everything is correct. But that's not a must, and not doing it doesn't even give you grounds to say that I am a bad coder. The thing is that both the internal function and its caller were written by me, and both are internal only (not exported in any way), so as long as one of them (preferably the callee) guarantees that the written entries don't exceed the maximum size, everyone is happy. If that weren't the case, I should have had a crash (buffer overflow, access violation), just something nasty that would hint at the buggy function. Now we can get into the philosophical issue of whether the callee or the caller should make that test, or both. But as I see it, it's better put inside the callee, since it must work well for the interface it supplies. And I always think ahead and ask myself, "…and what if someone calls that function and forgets (or doesn't know) to add the extra check?"
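In code, that debug-only verification would amount to no more than this sketch, where the names are stand-ins for the real internal parameters:

#include <assert.h>

void walk_entries(unsigned int writtenEntriesCount, unsigned int maximumEntries)
{
    /* Compiles away in release builds (when NDEBUG is defined), so the
     * callee's guarantee is verified for free in debug mode only. */
    assert(writtenEntriesCount <= maximumEntries);
    /* ... use the written entries ... */
}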

Don't get me wrong, assertions are really useful devices; I really suggest you use them. It just happened that diStorm doesn't have them. Maybe I should add them. And yet, it's not a must.

Lib hell (or DLL hell)

Tuesday, August 21st, 2007

In the last few months I got many emails regarding diStorm compilation, usually for different compilers and platforms. So for the last week I've been working on making diStorm compilable with all those compilers. Some of them are for DOS (OpenWatcom/DJGPP/DMC), some of them are for Linux (mostly GCC) in its varying distributions, and the rest are for Windows (MSVS, MinGW, TCC, etc.).

The problem begins with the one and only function of the diStorm library, called distorm_decode. This function gets a lot of parameters; the first one is the offset of the code as if it were loaded in memory (AKA the virtual address). For example, you would feed it with 0x100 for .COM binary files (that is, DOS). This very parameter's type varies according to your compiler and environment; it might be a 32-bit integer or, preferably, a 64-bit integer (unsigned long long/unsigned __int64). If you have a compiler that can generate 64-bit integer code, then you would prefer it over 32-bit code, so that when you disassemble 64-bit code you can set any 64-bit offset you like. Otherwise, you will be limited to 32-bit offsets.

So diStorm got a special config.h file, where you configure the relevant macros so your compiler will be able to compile diStorm. config.h is eventually there for all the portability macros. There you also set whether you would like to use 64-bit offsets or not.
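Conceptually, that part of config.h looks like this; the typedef name is illustrative only, check the real config.h for the exact spellings:

/* config.h sketch - OFFSET_INTEGER is an invented name. */
#ifdef SUPPORT64
typedef unsigned long long OFFSET_INTEGER;  /* 64-bit offsets. */
#else
typedef unsigned long OFFSET_INTEGER;       /* Limited to 32-bit offsets. */
#endif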

Now the distorm_decode function uses the integer type that depends on your decision (which will probably be derived from your compiler's support for 64-bit integers). To simplify the matter, the diStorm interface is a function with the following declaration:

#ifdef SUPPORT64
void distorm_decode(unsigned long long offset);
#else
void distorm_decode(unsigned long offset);
#endif

Now, this function is actually exported by the library, so a caller project will be able to use it. And as you can see, the declaration of this function may change. What I did till now was to have this macro (SUPPORT64) defined twice: once for the library itself when it's compiled, and once for the caller project, so when it uses the library, it will know the size of the integers it should use. So far, I didn't get any problems with it. But since many compilers (say, for DOS) don't support 64-bit integers, it is a pain in the ass to keep these two different files in synch. I couldn't come up with a single header file for the sake of the caller project, since it might get only a compiled version, and then what? Do trial and error until it finds the correct size of the integer? I can't allow things to be so loose.

Things could get really nasty if the library was compiled with 32-bit integers and the caller project then uses it with 64-bit integers. The stack is unbalanced; it will probably crash somewhere when trying to use some pointer… sooner or later you will notice it. And then you will have to know to comment out the macro in the caller project.

After I got this complaint from one of the diStorm users, I decided to try and put an end to this lib hell. I tried to think of different ways. It would be best if I had reflection in C, or if I had a scripting language in which I could determine the size of the integer at runtime and know how to invoke distorm_decode. Unfortunately, this is not the case. I started to think of COM, or SxS and other not-really-helpful ways to solve DLL hell. I sensed it was the wrong way. It is a library after all, and I don't have the luxury of running code. I mean, it could really be solved with running code, asking the library (or rather the DLL, actually) what interface I should use… reminds me of QueryInterface somewhat. But alas, I can't do that either, since it's compile time we're talking about. So all I got is this lousy idea:

The exported function name will be determined according to the size of the integer it was compiled with. It would look like this:

#ifdef SUPPORT64
void distorm_decode64(unsigned long long offset);
#define distorm_decode distorm_decode64
#else
void distorm_decode32(unsigned long offset);
#define distorm_decode distorm_decode32
#endif

But we are still stuck with the SUPPORT64 macro for the caller project, for which I think there is no way around; we will have to stick with it to the bitter end…

So what did we earn from this change? Well, pretty much a lot. Let's see:

1) The actual code of existing projects doesn't need to be changed at all; the macro substitution will do the work automatically.

2) The application won't crash anymore: if you got the wrong integer size, you simply won't find the exported symbol.

3) It now becomes a link-time error rather than a runtime crash. So before the caller project even gets fully linked, you will know to change the SUPPORT64 macro (whether to add it or drop it).

4) The code as it was before could lead you to a crash, since you could manually enable/disable the macro regardless of how the library was actually compiled.

Well, I still feel it's not the best idea out there, but hey, I really don't want to write code in the caller project that will determine the interface at runtime and only then know which function to use… it's simply ugly and not right, because you can do it earlier and cleaner.

I would like to hear how you would do it. I am still not committing my code… so hurry up :)

AMD SSE4a

Tuesday, August 21st, 2007

In the latest version of the programmer's manual (for the AMD64 architecture) of July 2007, AMD released a new instruction set – SSE4a. In the beginning we (the YASM mailing list) weren't sure whether this set is part of Intel's SSE4.1 or SSE4.2, until a simple check of the documentation for CPUID shed some light and showed that there is a different bit indicating that the instruction set is SSE4a rather than SSE4.1. So now we got a new instruction set. What's so special about it? Well, nothing in particular. It only has a few instructions: extrq, insertq, movntsd, movntss.

The ugly thing about these instructions is insertq, which gets four operands: two XMM operands and two byte-sized immediates. We have seen many instructions with 3 operands, so that's nothing new. Although most of them are in the SSE sets, we got a few in the basic/integer set, such as SHLD/SHRD/IMUL… But four operands? And two of them immediates? Hmm, the ENTER instruction, for example, gets two immediates of different sizes; that's the only one I can come up with quickly. Maybe a quick test with disOps can yield more, but it doesn't really matter; I'm just trying to show this irregularity. So what I did in diStorm was to add a fourth operand to my extended instruction-information structure (the structure which holds the data that describes a full instruction). I wonder where we are heading with all those new SSE sets and weird instructions. It gets harder to parse them every time.
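Roughly, the change amounts to something like the sketch below. All the field names are invented for illustration; the real structure in diStorm looks different:

/* Sketch of an extended instruction-information structure with room for a
 * fourth operand (e.g. INSERTQ xmm1, xmm2, imm8, imm8). */
typedef struct {
    unsigned short opcodeId;      /* Index into the mnemonics table. */
    unsigned char op1Type;
    unsigned char op2Type;
    unsigned char op3Type;
    unsigned char op4Type;        /* The new fourth operand; immediates only so far. */
    unsigned short flags;
} ExtendedInstInfo;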

I mean – come on, even in the internal engine of the AMD processor's pipeline, the engineers must have hated adding support for a fourth operand. Or was it rather a quick hack? Who knows… But I am sure they have a generic engine and not an "execution" module of circuitry per instruction.

Recursive Data-Structures

Saturday, June 2nd, 2007

For the sake of simplicity, a linked-list structure is a recursive one: it points to itself.

struct Node {
    int data;
    struct Node* next;
};

This case is easy to follow because no matter what, you always end up with accessing a Node structure.

The problem arises when you have recursive data-structures and terminals. I treat a terminal as a data structure which stops the recursion of the recursive data-structure. Unlike the above linked-list structure, which stops when the 'next' pointer is NULL, the 'next' in the complex data-structure might point either to another node of the same type or to a terminal.

struct TerminalNode {
    int data;
};

struct Node {
    int data;
    struct Node* next; /* Points either to another Node or to a TerminalNode. */
};

Now you might find yourselves asking why one would make the 'next' point to a TerminalNode, but that's a common data-structure layout. For example, an expression tree can represent arithmetic operations in a recursive way, where the values are the terminals – Add(5, Sub(3, 2)).

In a structured layout, it looks something like this (where children is an array of pointers to ‘next’ nodes):

0: ExpressionNode: (type) Add, (children) 1, 2

1: ExpressionNode: (type) Value 5, (children) NULL

2: ExpressionNode: (type) Sub, (children) 3, 4

3: ExpressionNode: (type) Value 3, (children) NULL

4: ExpressionNode: (type) Value 2, (children) NULL

So you can see that in this case an ExpressionNode of the 'Value' type is a TerminalNode, whereas the Add/Sub ExpressionNodes are the recursive ones.

This type of data-structure is not rare. To be honest, I have this situation in both diStorm64 and diStorm3. In diStorm64 the TerminalNode is actually the Instruction-Info structure, which describes the instruction, its mnemonic, its operand types, etc… and in diStorm3 it's the above ExpressionNode thingy, but much more complex, of course…

Everything is good and nice, but finally I can get to my point of this topic: how do you distinguish between a recursive node and a terminal node? That is, how do you know where the "tree" terminates?

So there are many solutions to this one; I chose to cover two of them. The simplest one is to change your structures by adding another flag which states "I'm a terminal" or "I'm a recursive node". But then it means you have to pay at least one extra byte in both structures, and you can examine the flag only after you have already accessed the structure. Not to mention it takes more memory space.

The other solution, which seems to be more common, is to set the LSB (least-significant bit) in the 'next' pointer; thus, if the bit is set, you know you have reached a terminal node, even before you accessed it. And seeing the LSB cleared, you know to continue normally with a recursive node. This is also good for caching, because the structures stay compact.

But what happens if you have 4 types of terminal structures? Well, in a 32-bit environment all structures are (usually, 99% of the time) aligned to a 4-byte boundary, so it means you can easily use the two LSBs, which can represent 4 different terminals… (that variant is sketched at the end of this post).

So accessing a single terminal structure is as simple as:

/* uintptr_t (from <stdint.h>) is used for the bit fiddling on the pointer. */
if ((uintptr_t)pNode->next & 1) {
    /* LSB is set - 'next' is a tagged terminal; clear the bit before using it. */
    pTerminal = (struct TerminalNode*)((uintptr_t)pNode->next & ~(uintptr_t)1);
    /* Use pTerminal here… */
} else {
    /* LSB is clear - continue with another recursive node. */
    pNode = pNode->next;
}

You should have noticed by now that a terminal structure doesn't have a 'next' pointer. So a NULL is out of the question for terminals, but it's still OK for recursive nodes, where it means "dead end", rather than using a terminal to mark that something bad happened…
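And the two-LSBs variant for multiple terminal types could look roughly like this. A sketch only, assuming structures are aligned to at least 4 bytes so both low bits are free; the terminal type names are invented:

#include <stdint.h>

struct TerminalNodeA; /* One of the invented terminal types. */

enum { TAG_NODE = 0, TAG_TERM_A = 1, TAG_TERM_B = 2, TAG_TERM_C = 3 };

void walk(struct Node* pNode)
{
    uintptr_t raw = (uintptr_t)pNode->next;
    if (raw == 0)
        return;                                /* Dead end on a recursive node. */
    switch (raw & 3) {                         /* The two LSBs select the type. */
    case TAG_NODE:
        walk((struct Node*)raw);               /* No tag - a recursive node. */
        break;
    case TAG_TERM_A: {
        struct TerminalNodeA* t = (struct TerminalNodeA*)(raw & ~(uintptr_t)3);
        (void)t;                               /* Use the terminal here… */
        break;
    }
    /* TAG_TERM_B and TAG_TERM_C are handled alike. */
    }
}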