Archive for the ‘Reversing’ Category

diStorm3 – News

Tuesday, December 29th, 2009

Yo yo yo… forgot to say happy xmas last time, never too late, ah? :)

This time I wanted to update you about diStorm3 once again. Yesterday I had a good coding session and I added some of the new features regarding flow control. The decode function gets a new parameter called ‘features’. Which is a bit field flag that lets you ask the disassembler to do some new stuff such as:

  1. Stop on INT instructions [INT, INT1, INT3, INTO]
  2. Stop on CALL instructions [CALL, CALL FAR]
  3. Stop on RET instructions [RET, RETF, IRET]
  4. Stop on JMP instructions [JMP, JMP FAR]
  5. Stop on any conditional branch instructions [JXXX, JXCX, LOOPXX]
  6. Stop on any flow control (all of the above)

I wasn’t sure about SYSCALL and the like and UD2, for now I left them out. So what we got now is the ability to instruct the disassembler to stop decoding after it encounters one of the above conditions. This makes the higer disassembler layer more efficient, because now you can disassemble code by basic blocks. Also building a call-graph or branches-graph faster.

Note that now you will be able to ask the disassembler to return a single instruction. I know it sounds stupid, but I talked about it already, and I had some reasons to avoid this behavior. Anyway, now you’re free to ask how many instructions you want, as long as the disassembler can read them from the stream you supply.

Another feature added is the ability to filter non-flow-control instructions. Suppose you are interested in building a call-graph only, there’s no reason that you will get all the data-control instructions, because they are probably useless for the case. Mixing this flag with ‘Stop on RET’ and ‘Stop on CALL’, you can do nice stuff.

Another thing is that I separated the memory-indirection description of an operand into two forms. First of all, memory indirection operand is when an instruction reads/writes from/to memory. Usually in Assembly text, you will see the brackets characters surrounding some expressions. Something like: MOV [EDX], EAX. Means we write a DWORD to EDX pointer. If you followed me ’till here, you should know exactly what I’m talking about anyway.

When you get the result of such instruction from diStorm3, the type of the operand will be SMEM (stands for simple-memory), which hints there’s only one register in the memory-indirection operand. Although it doesn’t hint anything about the displacement, that’s that offset you usually see in the brackets. Like MOV [EDX+0x12345678], EAX. So you will have to test if the displacement exists in both forms. The other form is MEM (Normal memory indirection, or probably should be called ‘complex’) since it supports the full memory indirection operand, like: MOV [EAX*4 + ESI + 0x12345678], EAX. Then you will have to read another register that supplies the base register, in addition to the index register and scale. Note that this applies for 16 bits mode addressing as well, when you can have a mix of [‘BX+SI]’ or only ‘[BX]’. Also note that sometimes in 32/64 bits mode, you can have a SIB byte, that sets only the base register and the index register is unused, but diStorm3 will return it as an SMEM, to simplify matters. This way it’s really easy to read the instruction’s parameters.

Another feature for text formatting is the ability to tell the disassembler to limit the address to 16 or 32 bits. This is good since the offsets are 64 bits today. And if you encounter an instruction that jumps backwards, you will get a huge negative value, which won’t make much sense if you disassemble 16 bits code…

diStorm3 still supplies the bad old interface. And now it supports two new additional functions. The decompose function, which returns the structures for each instruction read. And another function that formats a given structure into text, which is pretty straight forward. The text format is not an accurate behavior of diStorm64, it’s more simplified, but good enough. Besides I have never heard any special comments about the formatting of diStorm64, so I guess it doesn’t matter much to you guys. And maybe maybe I will add AT&T syntax later on.

Another field that is returned now, unlike diStorm64, is the instruction-set-class type of the instruction, with very broad categories, like Integer instructions, FPU instructions, SSE instructions, and so on. Still might be handy. And the hint about the flow-control type of the instruction.

Also I changed tons of code, and I really mean it, the skeleton is still the same, but the prefixes engine works totally different now. Trying to imitate a real processor this time. By including the last prefix found of that prefix-type. You can read more about this, here. I made the code way more optimized and eliminated double code and it’s still readable, if not for the better. Also I changed the way instruction are fetched, so the locate-instruction function is much smaller and better.

I’m pertty satisfied with the new version of diStorm and hopefully I will be able to share it with you guys soon. Still I got tons of tests to do, maybe I will add that unit-test module in Python to the proejct so you can enjoy it too, not sure yet.

Also I got a word from Mario Vilas, that he is going to help with compiling diStorm for different platforms, and I’m going to integrate his new Python wrappers that use ctypes, so you don’t need the Python extension anymore. Thanks Mario! ;) However, diStorm3 has its own Python module for the new structure output.

If you have more ideas, comments, complaints or you just hate me, this is the time to say so.
Cheers, happy new year soon!
Gil

VML + ANI ZERT Patches

Tuesday, February 3rd, 2009

It is time to release an old presentation about the VML and ANI vulnerabilities that were patched by ZERT. It explains the vulnerabilities and how they were closed. It is somewhat very technical, Assembly is required if you wanna really enjoy it. I also gave a talk using this presentation in CCC 2007. It so happened that I wrote the patches, with the extensive help of the team, of course.

ZERT Patches.ppt

Oh No, My XPSP3

Monday, February 2nd, 2009
#include <windows.h>
int main()
{
 WCHAR c[1000] = {0};
 memset(c, 'c', 1000);
 SystemParametersInfo(SPI_SETDESKWALLPAPER, 0, (PVOID)c, 0);

 WCHAR b[1000] = {0};
 SystemParametersInfo(SPI_GETDESKWALLPAPER, 1000, (PVOID)b, 0);
 return 0;
}

Two posts ago I talked about vulnerabilities. So here’s some Zero Day. This will crash your system, unless you’re on Vista (which is already immune to it). And why the heck on SP3 we are still having this thing not closed yet?

It might be exploitable, I didn’t research it any further than the BSOD of the security cookie…Maybe on some compilations without /GS it can be easily exploited. Or maybe overriding enough of the stack to trigger an exception could be it.

“Remember to let her into your heart,
Then you can start to make it better” – The Beatles.

NULL, Vulnerabilities and Fuzzing

Wednesday, December 31st, 2008

I remember seeing Ilja at BH07. We talked about Kernel attacks, aka privilege escalation. He told me, also, back then, that he found some holes that he managed to execute code through. I think the platform of target was Windows, although Ilja is specializing in Unix. Back in ’05 already he had a talk about Unix Kernel Auditing. Nothing new probably there, at least for the time being. However, the new approach of fuzzing the kernel, the system calls to be accurate, was pretty new. But feel free to correct me if I’m wrong about it. And it seems Ilja managed to find some holes using fuzzing. (BTW, a much more interesting paper from him about Unusual Bugs.)

Personally, I don’t believe in fuzzing. Usually the holes I find – there is no way a fuzzer will find. Although, I do believe that you need to mix tools/knowledge in order to find holes and audit a software in a better way. It is enough that there is a simple validation of some parameter you pass to a specific potential-hole’y function and all your test can be thrown away because of that validation, though, there is still a weakness in that function, you won’t get to it. Then you say “Ah Uh”, and you think that you can refine the randomness of the parameters you pass to that function and hopefully prevail. Well, it might work, it might not. As I said, I’m not a big fan of fuzzing.

Although, it might be cool to have a tool that analyzes the code of a function and builds the parameters in a special way to make a code coverage of 100% on that function, which is not fuzzing anymore and means: you walk all paths of execution and the chances to find a weakness are so much greater. Writing such a tool is crazyness, and yet possible, if you ask me.

Fuzzing or not, there are still weaknesses in Win32k, which supposed to be one of the most “secured”/audited components in the Kernel. Probably because many researches had their work on it as well. And that’s simply sad.

Speaking about Ilja’s fuzzing of kernel and stuff, and thinking we are cool to find weaknesses nowadays, Mark Russinovich wrote NTCrash back in ’96 for god sake and, it was a Fuzzer(!), but back then nobody called it or knew about fuzzers. And NTCrash as simple as it is, found some weaknesses in kernel system calls of NT4 ;) Respect (though today it won’t even scratch the kernel, so we might think ourselves cool for still finding stuff :) ).

A friend and I are trying to audit another application, and my friend found some NULL dereference which crashes that software. So we fired up Olly and tried to see what’s going on. It seems that some interface is queried and returns a successful code value and at the same time we get NULL for that interface, which means something is really f*cked up there. Thing is, as you probably can imagine for yourself, we want to execute code out of it. But odds seem to be against us at this time, since we can’t control that NULL or anything about it.

I then wanted to see what people have done with NULL before, how to exploit it better. And usually 99% of the applications running out there don’t have page 0 mapped to their address space. But CSRSS and NTVDM for instance, do have it mapped, but who cares now…? It doesn’t help our cause. Besides, you probably can’t control that page 0 and its data anyway. So I encountered that Flash Exploitation. To be honest, I didn’t read all of the white paper about the exploitation, I only looked for how the arbitary data write worked. And it seems that some CALLOC had failed to allocate memory because of an integer overflow weakness and from there you got a NULL pointer to begin with. But Flash didn’t access that pointer immediately – it had some pointer arithmetic added to it. And you guessed it right, you can control some offset before the pointer is really accessed, thus you can write (almost) anywhere you want. Now I really don’t underestimate the exploitation, from the bits I read it is a crazy and very beautiful exploitation. But to say that it is a new technique and a new class of exploitation is one thing that I really don’t agree to. You know what, looking at it in a different light – it was probably not leading to a code execution if that CALLOC not returned NULL, because then you won’t know where you are on the heap and you couldn’t really write to anywhere you knew accurately. And besides, the NULL wasn’t dererferenced directly and an offset was added to it (no matter what the calculation was for the sake of conversation), so therefore I don’t see it so exciting if you ask me (again, not the exploitation but the “new class of exploitation”). Still you should check it out :)

So, as I saw that no one did anything really useful with a real NULL dereference, it seems that the weakness he found is only a DoS, but maybe we can control something there, yet to be researched…

Proxy Functions – The Right Way

Thursday, August 21st, 2008

As much as I am an Assembly freak, I try to avoid it whenever possible. It’s just something like “pick the right language for your project” and don’t use overqualified stuff. Actually, in the beginning, when I started my patch on the IPhone, I compiled a simple stub for my proxy and then fixed it manually and only then used that code for the patch. Just to be sure about something here – a proxy function is a function that gets called instead of the original function, and then when the control belongs to the proxy function it might call the original function or not.

The way most people do this proxy function technique is using detour patching, which simply means, that we patch the first instruction (or a few, depends on the architecture) and change it to branch into our code. Now mind you that I’m messing with ARM here – iphone… However, the most important difference is that the return address of a function is stored on a register rather than in the stack, which if you’re not used to it – will get you confused easily and experiencing some crashes.

So suppose my target function begins with something like:

SUB SP, SP, #4
STMFD SP!, {R4-R7,LR}
ADD R7, SP, #0xC

This prologue is very equivalent to push ebp; mov ebp, esp thing on x86, plus storing a few registers so we can change their values without harming the caller, of course. And the last thing, we also store LR (link-register), the register which stores the return address of the caller.

Anyhow, in my case, I override (detour) the first instruction to branch into my code, wherever it is. Therefore, in order my proxy function to continue execution on the original function, I have to somehow emulate that overriden instruction and only then continue from the next instruction as if the original patched function wasn’t touched. Although, there are rare times when you cannot override some specific instructions, but then it means you only have to work harder and change the way your detour works (instructions that use the program counter as an operand or branches, etc).

Since the return address of the caller is stored onto a register, we can’t override the first instruction with a branch-link (‘call’ equivalent on x86). Because then we would have lost the original caller’s return address. Give it a thought for a second, it’s confusing in the first time, I know. Just an interesting point to note that it so happens that if there’s a function which don’t call internally to other functions, it doesn’t have to store LR on the stack and later pop the PC (program-counter, IP register) off the stack, because nobody touched that register, unless the function needs around 14 registers for optimizations, instead of using local stack variables… This way you can tell which of the functions are leaves on the call graph, although it is not guaranteed.

Once we understand how the ARM architecture works we can move on. However, I have to mention that the 4 first parameters are passed on registers (R0 to R3) and the rest on the stack, so in the proxy we will have to treat the parameters accordingly. The good thing is that this ABI (Application-Binary-Interface) is something known to the compiler (LLVM with GCC front-end in my case), so you don’t have to worry about it, unless you manually write the proxy function yourself.

My proxy function can be written fully in C, although it’s possible to use C++ as well, but then you can’t use all features…

int foo(int a, int b)
{
 if (a == 1000) b /= 2;
}

That’s my sample foo proxy function, which doesn’t do anything useful nor interesting, but usually in proxies, we want to change the arguments, before moving on to the original function.

Once it is compiled, we can rip the code from the object or executable file, doesn’t really matter, and put it inside our patched file, but we are still missing the glue code. The glue code is a sequence of manually crafted instructions that will allow you to use your C code within the rest of the binary file. And to be honest, this is what I really wanted to avoid in first place. Of course, you say, “but you could write it once and then copy paste that glue code and voila”. So in a way you’re right, I can do it. But it’s bothersome and takes too much time, even that simple copy paste. And besides it is enough that you have one or more data objects stored following your function that you have to relocate all the references to them. For instance, you might have a string that you use in the proxy function. Now the way ARM works it is all get compiled as PIC (Position-Independent-Code) for the good and bad of it, probably the good of it, in our case. But then if you want to put your glue code inside the function and before the string itself, you will have to change the offset from the current PC register to the string… Sometimes it’s just easier to see some code:

stmfd sp!, {lr} 
mov r0, #0
add r0, pc, r0
bl _strlen
ldmfd sp! {pc}
db “this function returns my length :)”, 0

 When you read the current PC, you get that current instruction’s address + 8, because of the way the pipeline works in ARM. So that’s why the offset to the string is 0. Trying to put another instruction at the end of the function, for the sake of glue code, you will have to change the offset to 4. This really gets complicated if you have more than one resource to read. Even 32 bits values are stored after the end of the function, rather than in the operand of the instruction itself, as we know it on the x86.

So to complete our proxy code in C, it will have to be:

int foo(int a, int b)
{

 int (*orig_code)(int, int) = (int (*)(int, int))<addr of orig_foo + 4>; 
// +4 = We skip the first instruction which branches into this code!
 if (a == 1000) b /= 2;
// Emulate the real instruction we overrode, so stack is balanced before we continue with original function.
 asm(“sub sp, sp, #4”);
 return orig_foo(a, b);
}

This code looks more complete than before but contains a potential bug, can you spot it? Ok, I will give you a hint, if you were to use this code for x86, it would blow, though for ARM it would work well to some extent.

The bug lies in the number of arguments the original function receives. And since on ARM, only the 5th argument is passed through the stack, our “sub sp, sp, #4” will make some things go wrong. The stack of the original function should be as if it were running without we touched that function. This means that we want to push the arguments on the stack, ONLY then, do the stack fix by 4, and afterwards branch to the second instruction of the original function. Sounds good, but this is not possible in C. :( cause it means we have to run ‘user-defined’ code between the ‘pushing-arguments’ phase and the ‘calling-function’ phase. Which is actually not possible in any language I’m aware of. Correct me if I’m wrong though. So my next sentence is going to be “except Assembly”. Saved again ;)

Since I don’t want to dirty my hands with editing the binary of my new proxy function after I compile it, we have to fix that problem I just desribed above. This is the way to do it, ladies and gentlemen:

int foo(int a, int b)
{
 if (a == 1000) b /= 2;
 return orig_foo(a, b);
}

void __attribute__((naked)) orig_foo(int a, int b)
{
// Emulate the real instruction we overrode, so stack is balanced before we continue with original function.
 asm(“sub sp, sp, #4\nldr r12, [pc]\n bx r12\n.long <FOO ADDR + 4>”);
}

The code simply fixes the stack, reads the address of the original absolute foo address, again skipping the first instruction, and branches into that code. Though, it won’t change the return address in LR, therefore when the original function is over, it will return straight to the caller of orig_foo, which is our proxy function, that way we can still control the return values, if we wish to do so.

We had to use the naked attribute (__declspec(naked) in VC) so that the compiler won’t put a prologue that will unbalance our stack again. In any way the epilogue wouldn’t get to run…

This technique will work on x86 the same way, though for branching into an absolute address, one should use: push <addr>; ret.

In the bottom line, I don’t mind to pay the price for a few code lines in Assembly, that’s perfectly ok with me. The problem was that I had to edit the binary after compilation in order to fix it so it’s becoming ready to be put in the original binary as a patch. Besides, the Assembly code is a must, if you wish to compile it without further a do, and as long as the first instruction of the function hasn’t changed, your code is good to go.

This code works well and just as I really wanted, so I thought so share it with you guys, for a better “infrastructure” to make proxy function patches.

However, it could have been perfect if the compiler would have stored the functions in the same order you write them in the source code, thus the first instruction of the block would be the first instruction you have to run. Now you might need to add another branch in the beginning of the code so it skips the non-entry code. This is really compiler dependent. GCC seems to be the best in preserving the functions’ order. VC and LLVM are more problematic when optimizations are enabled. I believe I will cover this topic in the future.

One last thing, if you use -O3, or functions inline, the orig_foo naked function gets to be part of the foo function, and then the way we assume the original function returns to our foo proxy function, won’t happen. So just be sure to peek at the code so everything is fine ;)

My Turn on the IPhone

Friday, July 25th, 2008

I tried adding some feature to the IPhone, and therefore I decided that I will do it my way, an in-memory patch, rather than on-disk patch. The reason I went for memory patching is simple, version 2 of IPSW contains code signing. Why should I smash my head against the wall trying to remove that code signing checks where I can easily do anything I want in runtime? Although, I read somewhere that you can sys-call something and disable the checks, they also said it makes the system a bit shaky… and later on I learnt there is a way to sign your own code, or existing patched files using ldid.

The thing is, I started my coding on beta 3 of version 2, and there everything worked well and wasn’t long time before my code work as expected. Then I updated my IPhone to the final second version and first thing I tried was my patch to know it doesn’t work. Now since there are no debuggers for the IPhone are really problematic, and except GDB, the other two I heard of are not free (DataRescue’s and DebuggerX).

Ok, to be honest, debuggers won’t even work because you can’t attach them to some of the tasks, this is since there is a new feature in OSX called ‘PT_DENY_ATTACH’, which simply means, no debugger can attach to this task, and if there is something currently attached, then detach it… This feature is implemented in the ptrace function, which lacks other important features, like reading and writing fro and to memory or getting registers’ values, etc. Of course, there are other ways to bypass those problems, and if you look well, you can find some resources about it.

Anyway, back to the story, I had to spot in my (long long) code what was wrong. After some short time I found that vm_protect failed. Another frustrating thing was to spot that failure in my code, because I didn’t have any way to print to debug (ala DebugView) or printf, or anything. I was kinda doomed, I had to crash the task in order to know that my code reached some point, and each time move the crash further a long the code, this is really lame, I know. Maybe if I had more knowledge with Linux/BSD/OSX I could track it down quicker. Hey, if you still got an idea how to do it next time, please drop a line, heh?

So once I knew the failing code, I tried to fix it, but actually I didn’t know what was wrong. The vm_protect returned something like ‘protection error’, and hell this doesn’t say much. I got really crazy at some point, that I used that same call on my own code block, and that failed too. I didn’t know what to do, I kept playing with that shit for hours :( and nothing came up to my mind. Then I left it and went to sleep with a bad mood (I hate when it happens, usually I keep on trying until I make it, but it was 6am in the morning already…) So later the next day, I decided I will read more in the MAN, and nothing special there, it only shows the parameters, the return value, bla bla, etc. By that time, I was sure that Apple touched something in the code related to vm_protect somehow, that was my hunch. The idea to RE the kernel and this function to see what was changed from beta 3 to the final version crossed my mind, not once. But I knew that I was missing something simple and I should not go that far, after all it’s a usermode API.

As stuck as I was, I googled for as much as possible information on this vm_protect and other OSX code snippets. Eventually I hit something interesting that used vm_protect and used another flag that I didn’t know that exists in OSX. The flag name is ‘vm_prot_copy’. This might finish the story for you if you know it. Otherwise, it means that when you try to write to a page, it will make a copy of that same page particulary for the requesting task and then let you write to it. This is used in many operating systems, when some file (code usually) is being loaded from disk, and the OS wants to optimize memory, it maps the same physical page of that code/data to all tasks which loaded that file. Then if you just want to write to that page, you are forbidden, of course. Here comes the COW (copy on write ) to save us.

The annoying thing is that since the documentation sucks I didn’t find this vm_prot_copy anywhere. I even took a look in the header files, where the ‘vm_prot_execute’ for instance, was defined, and didn’t see this extra flag. Only after I knew the solution I came back to that file again and I found this flag to be declared almost in the end of the file, LOL. The cool thing, which came too late, that Cydia had some notes regard ‘how to port applications from earlier versions to final version’ and they wrote something about NX and protections, though they didn’t say anything directly about this COW thingy…

It was kinda a surprise to see that I had to specify such a low level flag. As I come from the Windows world mostly, there you don’t have to specify such a thing when you try to change some page’s protection and write to it. Therefore I didn’t expect it to be the case in other OS’s.

Just wanted to share this frustrating story and the experience of how fun (or not) it is to code for the IPhone.

Anti-Unpacker Tricks

Friday, July 18th, 2008

Peter Ferrie, a former employee of Symantec, who now works for MS wrote a paper about Anti Unpacker tricks. I was really fascinated reading that paper. There were so many examples in there for tricks that still work nowadays. Some I knew already some were new to me, he covers so many tricks. The useful thing is that every trick has a thorough description and a code snippet (mostly Assembly). So now it becomes one of the most valueable papers in the subject and you should really read it to get up to date. The paper can be found here.

One idea I that I really like from the paper, is something that Peter himself found, that you can use ReadFile (or WriteProcessMemory) to override a memory block so no software breakpoints will be raised when you execute it. But on a second thought, why a simple memcpy won’t do the same trick?

If you guys remember the Tiny PE challenge I posted 2 years ago in Securiteam.com, then Peter was the only one who kicked my ass with a version of 232 byts,  where I came with 274 bytes. But no worries, after a long while I came back with a version of 213(!) bytes (over here) and used some new tricks. Today I still wait for Peter’s last word…

Have fun

Anti Debugging

Monday, January 14th, 2008

I found this nice page about Anti-Debugging tricks. It covers so many of them and if you know the techniques it’s really fun to read it quickly one by one. You can take a look yourself here: Window Anti-Debug Reference. One of the tricks really attracted my focus and it was soemthing like this:

push ss
pop ss
pushf

What really happens is that you write to SS and the processor has a protection mechanism, so you can safely update rSP immediately as well. Because it could have led to catastrophic results if an interrupt would occur precisely after only SS is updated but rSP wasn’t yet. Therefore the processor locks all interrupts until the end of the next instruction, whatever it is. However, it locks interrupts only once during the next instruction no matter what, and it won’t work if you pop ss and then do it again… This issue means that if you are under a debugger or a tracer, the above code will push onto the stack the real flags of the processor’s current execution context.

Thus doing this:
pop eax
and eax, 0x100
jnz under_debugging

Anding the flags we just popped with 0x100 actually examines the trap flag which if you simply try to pushf and then pop eax, will show that the trap flag is clear and you’re not being debugged, which is a potential lie. So even the trap flag is getting pended or just stalled ’till next instruction and then the debugger engine can’t get to recognize a pushf instruction and fix it. How lovely.

I really agree with some other posts I saw that claim that an anti-debugging trick is just like a zero-day, if you’re the first to use it – you will win and use it well, until it is taken care of and gets known. Although, to be honest, a zero-day is way cooler and another different story, but oh well… Besides anti-debugging can’t really harm, just waste some time for the reverser.

Since I wrote diStorm and read the specs of both Intel and AMD regarding most instructions upside down, I immediately knew about “mov ss” too. Even the docs state about this special behavior. But it never occurred to me to use this trick. Anyway, another way to do the same is:

mov eax, ss
mov ss, eax
pushf

A weird issue was that the mov ss, eax, must really be mov ss, ax. Although all disassemblers will show them all as mov ss, ax (as if it were in 16 bits). In truth you will need a db 0x66 to make this mov to work… You can do also lots of fooling around with this instruction, like mov ss, ax; jmp $-2; and if you single step that, without seeing the next instruction you might get crazy before you realize what’s going on. :)

I even went further and tried to use a priviliged instruction like CLI after the writing to SS in the hope that the processor is executing in a special mode and there might be a weird bug. And guess what? It didn’t work and an exception was raised, of course. Probably otherwise I won’t have written about it here :). It seems the processors’ logic have a kind of an internal flag to pend interrupts till end of next instruction and that’s all. To find bugs you need to be creative…never harm to try even if it sounds stupid. Maybe with another privileged instruction in different rings and modes (like pmode/realmode/etc) it can lead to something weird, but I doubt it, and I’m too lazy to check it out myself. But imagine you can run a privileged instruction from ring3…now stop.

A C Tidbit

Wednesday, December 12th, 2007

Yeah well, the title is a bit weird… So I’ve been reading the C standard for fun (get a girl?) and I was interested in two things actually. First of all, while I was reading it a friend asked me a question if the layout of local (auto) variables in the scope of a function have any special order. The short answer is no. And I won’t extend this one much. Although nothing is mentioned about the “stack” and how variables are supposed to be in memory in this case. Now this is a bit confusing because in structures (and unions) what you define is how they lie in memory. But this only sounds normal because if you need to read something from hardware in a way, you really care about the order of the members of the structure… Though as it happened to me while I was reversing I saw lots of variables-recycling. Means the compiler uses a variable (which is in the stack) and afterwards went it’s out of scope it re-uses the same memory place for another variable in the same function. Or it might be the coder sucked so much that he used the same variable a few times…:P which I doubt since for once the dword was used as a byte. So the only thing you know about the stack of the function is its layout according to that same function’s code of that specific version and only after compilation (assembly listing is good too).

The other thing I was curious about is the pointer arithmetic that is mixed with integer expressions:

char *p = 0;
p + 5

We all know that it will evaluate to 5. But what about:

long *p = 0;
p + 5 ?

This one will evalute to 20, 5 * sizeof(type)…

Actually the compiler translates the operator [], array subscript, such that p[i] = *(p + i). Nothing is special here as well, unless you didn’t mess with C for a long while… Now it becomes really powerful when you can cast from one pointer type to another and use the indirection operator, so for example, say you want to scan for a dword in a memory block:

long *p = <some input>;
for (int i = 0; i < 1000; i++)
   if (p[i] == value) found…

 But in this case, you can miss that dword since we didn’t scan in byte alignment, so we have two options, either changing the type of pointed p to be a char, or to make a cast… In both ways we need to fix the limit of scan, of course.

So now it becomes:

if (*(dword*) ( ((char*)p) +i) == value) found…

This way p is scanned in byte units, because i is multiplied by sizeof(char). And we take that pointer to char, which is incremented by 1 every iteration and cast it to a dword and then derference that… I think you cannot avoid the use of a cast (or maybe automatic conversion) in the matter of getting this task completed.

Now it might be obvious to most of you, but I doubt it, since I fell in this trap as well:

long a[4] = {0};
printf(“%d”, &a[1] – &a[0]);

What will this code print when executed? I thought 4, as the sizeof of ‘a’ element is a 4 (denoted from the ‘long’ type), but to my surprise ‘1’ was printed; as it will do it as if the subscripts were the operands, however the result will be signed integer. Thus 1 – 0.

The bottom line is that p[i] is *((char*)p + i*sizeof(p[0])) and p[i] – p[j] is ((char*)&p[i] – (char*)&p[j])/sizeof(p[0]). Well the latter is less useful, but if you thought the printf above will return the sizeof the element then you are wrong :)

C++ Singletons

Monday, October 29th, 2007

One of the first rules you learn when programming is not to use global variables. Sometimes it’s possible and sometimes it’s not. I believe that every rule has an exception, <– this too. But the thing is that if you code in C++, you say to yourself, let’s encapsulate it all in a class. How nice, huh? Thing is that from the beginning you knew that those variables should be global, to something. Everything is in relation. Certainly not all functions will use the globals. The real example I encountered was to use the AllocConsole API for creating a console screen for my process. The system allows for every process to have at most one console, so obviously wrapping it in a singleton class is a good move. That class will contain also some variables which hold the state of the console. Let’s spare the gross details for now…

So having the variables inside the class as a namespace led me to touch the variables as:

void MainClass() 
{ 
 Console::m_staticVariable = ... 
}

Then I found myself accessing that variable from the MainClass, and I understood that I should move some code from MainClass to the Console class. Hey, you can say that I designed the whole thing wrong from the first moment. But believe me, sometimes you just have that piece of code which might lie in both classes. Eventually you have to decide where it fits better. Even though after some coding I was satisfied with my code, all the methods were static as well, otherwise I can’t access the so called globals. Although, this time the globals are in a namespace, that’s a start. Noticing that all Console’s methods were static, I got disgusted at my own code. Seriously, that happens to me too ;) As long as you don’t commit the code, you are allowed to go all the way until you are satisfied and the code is to your liking. At least this is what I think.

So… singleton was my answer, of course.

The “standard” basic pattern for a singleton would be to make the constructor private and have a getInstance method which will return the one and only instance of the singleton class in the system. It looks something similar to:

class MyClass {
private:
MyClass() { … }

public:
static MyClass& getInstance()
{
 static MyClass instance; // This is all the trick.
 return instance;
}

};

Speaking technically the way the static is implemented in Assembly is just setting a boolean with true when you create the object the first time and even register its corresponding destructor to the _atexit CRT function. I found it interesting.

Here it is in a nutshell:

static MyClass& getInstance() 
{ 
 static bool isInitialized = false; 
 if (!isInitialized) { 
  isInitialized = true; 
  g_MyClassInstance::MyClass(); 
  _atexit(g_MyClassInstance::~MyClass); 
 } 
 return g_MyClassInstance; 
}

 Static variables are not magic, you have to check whether they are already initialized or not. The generated code, is no brainer, and uses a boolean to check it out as well.

So off to the way I went, changing a few bits of my code to become a singleton. While testing my code again, I got a crash when the program was finished. Now quickly thinking, there’s something wrong with a “destroy” code, either destructor or something similar doesn’t function well. Well, looking at my MainClass dtor I noticed that I have to control the death of the Console class. Note that when calling the getInstance method the first time of a singleton it will only then get initialized. So you are guaranteed when the singleton-instance is constructed but you don’t know when it will be destructed. Doh.

Well, let’s go to business, implementing a dynamic singleton. To be honest, I have no idea how that’s officially called, I’m pretty sure someone already named it with something. Using the dynamic singleton I control both construction and destruction timings.

The first getInstance implementation that comes to mind is something like this:

MyClass& MyClass:getInstance() 
{ 
 static MyClass* instancePtr = NULL; 
 if (instancePtr == NULL) { 
  instancePtr = new MyClass(); 
  // error code for handling bad_alloc... 
 } 
 return *&instancePtr; 
}

Now you need a destroy method:

void MyClass:Destroy() 
{ 
 ASSERT(instancePtr != NULL); // No no. 
 delete instancePtr; 
 instancePtr = NULL; 
}

Aha! Now notice how Destroy access instancePtr which was defined inside getInstance. Therefore, you have to move the pointer to be a static class member, how rude. But no biggy. Of course, you must not call Destroy from the destructor, to avoid recursion and bad stuff happening, in short. It is vital to remember to call Destroy, otherwise you are in big troubles. On the other hand, having C++ on our side, we can extend the code so it will use a sort of smart pointer that will know when to destroy the instance automatically. A static std::auto_ptr<MyClass> m_instancePtr, will certainly do the job.

Now you ask yourself, why the heck should I use the smart pointer mechanism if I want to control the construction and destruction of the singleton. Well that’s a good question, you have to consider multi-threading.

Some problems arise with singletons and multi-threaded applications. Even though my specific application was MT, I didn’t have to worry about the construction of the singleton instance because it was surely done before CreateThread was called. When you cannot be sure about that, you will have to protect the static instance on your own. But you cannot wrap the static declaration with acquiring any sort of a lock. Maybe with a nested anonymous scope? While I’m not really sure, it’s pretty ugly. So that’s why you need the smart pointer mechanism which will destroy the class on its own accord (assuming you don’t really care when) and you are the one who fully control the construction time…

Benny and the jets say hi.