Archive for the ‘Reversing’ Category

NTVDM #1

Sunday, July 15th, 2007

DOS is dead, and that’s a fact. But NTVDM is still a cool and handy tool. I guess that most of us are not satisfied with the way it works… usually the sound doesn’t work, which is a good enough reason to try the great open source projects which emulate DOS. Anyways, a few years ago, a friend of mine wrote a piece of code which writes to 0xb800, remember this one? That’s the segment where the text mode buffer starts (linear address 0xb8000). I was wondering how come you write to this address and something appears on the screen (each cell is a character byte plus an attribute byte, of course); mind you, it’s NTVDM we are talking about. But that wasn’t the interesting part. The interesting part was why your writes to this buffer sometimes work and sometimes simply don’t. I decided to study the situation and see why it happens.

So here’s what I did:

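mov ax, 0xb800   ; segment of the text mode buffer
mov ds, ax
xor bx, bx       ; offset 0 = top left cell (don't rely on BX's value at entry)
mov ax, 0x0741   ; AH = 0x07 attribute (grey on black), AL = 0x41 'A'
mov [bx], ax
ret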

Which prints a grey ‘A’ in the top left corner, yippee. Now if you open cmd and run the .com file built from this code, you won’t see anything at all, which is unexpected because you write to the ‘screen’, after all. My friend only knew that whenever he ran ‘debug’ before his program (the important part of which is shown above), the letter ‘A’ would be displayed. So I gave it a long thought… After that I tried adding the following before the original code:

mov ax, 3
int 0x10

This only sets the current video mode to text mode 80x25x16… And then voila, the write worked as expected. At first I suspected that the VM watches for int 0x10 function #0, set mode. But it seemed that any int 0x10 function enables the writes… and I later confirmed that this is true.

So now that I knew how to trigger the magic, I simply searched for ‘cd 10’ (that’s the encoding of int 0x10) in ‘debug’ and found a few occurrences, which explained my friend’s experience: after running ‘debug’, writing to 0xb800 works. Of course, if you run any other program which uses int 0x10, you’re good to go as well.

But that was only one part of the mystery. I also wanted to understand how the writes really happen: whether the VM monitors all instructions and checks the final effective address to see if it’s in the buffer range, or maybe the memory is specially mapped with some Win32 API. After all, the NTVDM screen is a normal console window (not speaking of graphical modes now). Surprisingly, I found out that the solution was even simpler. A timer fires every short interval and calls, among other things, a function that copies the 0xb800 buffer to the real console screen using some console APIs… And yes, your simulated writes really go to virtual memory at the same address in the real NTVDM.exe process. Maybe it has a filter or something, I assume, but I didn’t look for it, so I really don’t know.
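To make the mechanism concrete, here is a minimal Python sketch of the idea, not NTVDM’s actual code: a bytearray stands in for the guest’s text buffer, the snippet’s write is replayed into it, and a refresh function (what the timer would call) turns the buffer into screen lines. All names here are made up for illustration, and attributes are simply ignored when rendering.

```python
TEXT_SEGMENT = 0xB800      # real-mode segment used by the assembly snippet
COLUMNS, ROWS = 80, 25     # geometry of text mode 3

def linear_address(segment, offset):
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset

def cell_offset(row, col):
    """Byte offset of a cell; each cell is a character byte plus an attribute byte."""
    return (row * COLUMNS + col) * 2

def render_text_buffer(buffer):
    """What a periodic refresh could do: read the characters out of the buffer."""
    lines = []
    for row in range(ROWS):
        start = cell_offset(row, 0)
        row_bytes = bytes(buffer[start:start + COLUMNS * 2])
        # Even bytes are characters, odd bytes are attributes (dropped here).
        lines.append(row_bytes[0::2].decode("cp437"))
    return lines

# Hypothetical guest memory: a blank screen of grey-on-black spaces.
guest = bytearray(b"\x20\x07" * (COLUMNS * ROWS))

# Replay "mov [bx], ax" with AX = 0x0741 at offset 0 (little-endian word).
offset = cell_offset(0, 0)
guest[offset:offset + 2] = (0x0741).to_bytes(2, "little")

screen = render_text_buffer(guest)
print(hex(linear_address(TEXT_SEGMENT, offset)))  # 0xb8000
print(repr(screen[0][:4]))                        # 'A   '
```

Nothing appears on the "screen" until the refresh function runs, which is the same reason a guest write can sit in memory unnoticed until something makes the VM start copying the buffer.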

Hot Patching (/Detouring)

Thursday, July 12th, 2007

Hot Patching is a nice feature which lets you apply a patch in memory so that it affects the required code immediately. This is useful when you can’t restart your system to do the on-disk patching, and there are times when a restart is simply not allowed, probably mostly on servers…

Technically speaking, if you happen to look at how code is generated in MS files, for instance, you can often see 5 CC’s (INT3) in a row before every function, and then the function begins with the infamous MOV EDI, EDI.

It looks something like this:

0005951e (01) 90      NOP
0005951f (01) 90      NOP
00059520 (01) 90      NOP
00059521 (01) 90      NOP
00059522 (01) 90      NOP
00059523 (02) 8bff    MOV EDI, EDI
00059525 (01) 55      PUSH EBP
00059526 (02) 8bec    MOV EBP, ESP

This is a real example, but this time it uses NOP’s instead of INT3’s… It doesn’t really matter, since that piece of padding code isn’t executed.
First things first: why is the MOV EDI, EDI really executed?
Before I answer this question directly, I will just say that when you want to patch a function, you make a detour. Instead of patching a few bytes here and there, you will probably load a whole new copy of the fixed function into a new region in memory. This is easier than patching specific spots… And then you want this new code to run instead of the old one. Now you have two options. One is to patch all callers of this function, which is a crazy thing to do. The other, more popular way is where the trick comes in: the MOV EDI, EDI is used as a pseudo NOP, and it is executed on purpose every time the function runs. When the time comes and you apply the patch, you can simply overwrite this instruction with a short JMP instruction, which takes 2 bytes as well. The short JMP jumps backward to the beginning of the padding, which sits exactly 5 bytes before the patched function. So why 5 bytes of padding and not less or more? This is an easy one: in 5 bytes you can encode a JMP with a 32-bit relative offset, which reaches anywhere in a 32-bit address space. Thus, no matter where your new patched function lies in memory, you can jump to it. So the 5 bytes are patched to contain a long JMP instruction, and its relative offset is calculated once, when the patch is applied.
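As a rough sketch of the arithmetic, the Python below computes the two byte sequences such a patch would write. The function address is taken from the listing above; the address of the replacement function (0x00060000) and the helper name are made up for illustration. It assumes the usual x86 encodings: E9 followed by a 32-bit relative offset for the long JMP, and EB followed by an 8-bit relative offset for the short one, both relative to the end of the jump instruction.

```python
import struct

def hot_patch_bytes(func_addr, new_func_addr):
    """Return (long_jmp, short_jmp) byte strings for a hot patch.

    long_jmp goes into the 5 padding bytes at func_addr - 5,
    short_jmp replaces the 2-byte MOV EDI, EDI at func_addr.
    """
    pad_addr = func_addr - 5

    # E9 rel32: the offset is relative to the end of the 5-byte jump,
    # which is exactly func_addr.
    rel32 = (new_func_addr - (pad_addr + 5)) & 0xFFFFFFFF
    long_jmp = b"\xE9" + struct.pack("<I", rel32)

    # EB rel8: the offset is relative to the end of the 2-byte jump,
    # so reaching pad_addr takes -7, i.e. 0xF9.
    rel8 = (pad_addr - (func_addr + 2)) & 0xFF
    short_jmp = bytes([0xEB, rel8])

    return long_jmp, short_jmp

# Function address from the listing; the target address is hypothetical.
long_jmp, short_jmp = hot_patch_bytes(0x00059523, 0x00060000)
print(long_jmp.hex())   # e9dd6a0000
print(short_jmp.hex())  # ebf9
```

The short JMP is always the same two bytes, EB F9, regardless of where the function lives; only the long JMP depends on the addresses. Writing the long JMP into the padding first and the short JMP second means the function never jumps into a half-written padding area.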

Well, actually I didn’t really answer the first question yet. But now that you have a better understanding of this mechanism, I really can. The thing is that in the old days a proper patcher had to disassemble the beginning of the patched function in order to see which instructions it could replace with the 5-byte long JMP. The JMP transfers control to you at the beginning of the original function, and when you are done, you run the overwritten instructions, but as whole instructions(!), and then continue executing that same function from the place where you finished overwriting it.

Here’s an example: say, for the sake of conversation, the first instruction takes 3 bytes and the second instruction takes 3 bytes too. If you put the 5-byte long JMP at the first byte of the function and then, once you are done, continue execution at offset 5, you will be out of synchronization and run incorrect code, because you are supposed to continue execution from offset 6… Eventually it will crash, probably with an access-violation exception.
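The bookkeeping an old-style patcher has to do can be sketched like this, assuming a disassembler has already produced the lengths of the first few instructions (the function name and the sample lengths are illustrative, not taken from any real tool):

```python
def bytes_to_relocate(instruction_lengths, patch_size=5):
    """How many bytes must be copied out so that only whole instructions move.

    instruction_lengths: lengths of the function's first instructions, in order.
    Returns the offset at which the original function has to be resumed.
    """
    total = 0
    for length in instruction_lengths:
        if total >= patch_size:
            break
        total += length
    if total < patch_size:
        raise ValueError("not enough instruction bytes to hold the jump")
    return total

print(bytes_to_relocate([3, 3, 2]))     # 6: resume at offset 6, not 5
print(bytes_to_relocate([1, 2, 2, 4]))  # 5: the boundary happens to line up
```

With the two 3-byte instructions from the example, the patcher has to copy 6 bytes and resume at offset 6, even though the JMP itself only covers 5.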

So now, instead of having all this headache, you know that you can safely change the first 2 bytes to a short JMP and it will always work, no matter what.

Another crazy reason for this new way is that the patched function may be running in a few threads at the same time. Say you patched the first 5 bytes, and a different thread then resumes at offset 3 (it already ran the first instruction, so it just continues normally, but over changed code), then bam… you broke the instruction…

The reason for using this specific MOV instruction is understandable: it’s a pseudo NOP, so (although it is not a real NOP) it doesn’t affect anything in the CPU context but the program counter. And EDI was chosen, to my guess, because it makes the second byte of the instruction 0xFF when both operands are EDI, like in this case. And yet there is no specific reason that I can come up with.

You can see that with just two memcpy’s you can detour a function successfully without any potential problems. Piece of cake. The problem is that not all files support this feature yet, so sometimes you still have to stick to the old methods and find a generic solution, like I did in ZERT’s patches… but that’s another story.

Common PE-Parsing Pitfalls

Sunday, June 3rd, 2007

PE, or Portable-Executable, is Windows’ executable format. Looking only at the PE, as opposed to the code inside, can teach you a lot about the application. But sometimes the tools you use to parse the file don’t do their work well. I hereby want to show a few problems with this kind of tool. As a matter of fact, .DLL and .SYS files are also PE files under Windows, so I treat them the same when talking about PE files.

  1. If the Export-Directory offset is a garbage pointer, Windows will still load the PE and run it successfully. It will go crazy and probably crash only when you try to call GetProcAddress on this file. You can use this to render some tools useless while the file is still valid and runnable. Don’t confuse this with the Import-Directory, which is necessary for loading the PE.
  2. Another annoying thing, whether the so-called “PE-Standard” states it or not, is the way RVA (relative virtual address) offsets work. An RVA is the distance from the beginning of the file in memory to some specific field or section. Most people treat these RVA’s as if they must point into a section. But in reality RVA’s can point anywhere in the memory mapped file. They can even be negative numbers (at least, as long as they still point to valid memory of the file). The problem is that most tools try to find the pointee field/section by scanning the section list, but alas, it might be a valid RVA which points to no section but to another part of the memory mapped file, for example, the MZ header… While Windows loads these files well, some tools cannot parse them.
  3. The most interesting problem, which I found myself (not sure if anyone used it before), was changing the section’s File-Offset to be misaligned. The File-Offset is actually rounded down to a multiple of the sector size (that’s 512 bytes) no matter what. So adding some number to the original valid File-Offset of the code section will fool some tools into reading the code from that offset instead of the rounded offset. Guess what happens? You disassemble the code from a wrong address and everything gets messed up. Even the mighty IDA had this bug. I introduced this technique in my Tiny PE Challenge. It seems most anti-virus software couldn’t chew this file back when I released it… Not sure about nowadays.
  4. While researching for Tiny PE, Matthew Murphy hinted that you can load files from the Internet by feeding the loader a raw URL of the PE file. Later on it was extended such that Windows’ loader will use WebDAV to load an imported .DLL from the Internet! Think of an imported library with the following name, \\127.0.0.1\my.dll, inside the PE file itself. This one seemed to be a real blow to the AV industry. It means you can write an empty PE file which has this special import thingy and gets the rest off the Internet. For samples you can check it out here, which covers Tiny PE (not my ones) very nicely.
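Pitfalls 2 and 3 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code of any particular tool or of the Windows loader: sections are plain (virtual_address, virtual_size, pointer_to_raw_data) tuples, raw sizes are ignored, negative RVA’s are not handled, and the headers are assumed to be mapped one-to-one at the start of the image. All function names are made up.

```python
SECTOR_SIZE = 0x200  # 512 bytes

def effective_raw_offset(pointer_to_raw_data):
    """Pitfall 3: the section's File-Offset is rounded down to a sector boundary."""
    return pointer_to_raw_data & ~(SECTOR_SIZE - 1)

def rva_to_file_offset(rva, sections, size_of_headers):
    """Pitfall 2: an RVA may point into the headers, not only into a section.

    sections: list of (virtual_address, virtual_size, pointer_to_raw_data).
    Returns a file offset, or None if the RVA maps to nothing in the file.
    """
    for virtual_address, virtual_size, pointer_to_raw_data in sections:
        if virtual_address <= rva < virtual_address + virtual_size:
            return effective_raw_offset(pointer_to_raw_data) + (rva - virtual_address)
    # No section matched: fall back to the header area instead of giving up.
    if 0 <= rva < size_of_headers:
        return rva
    return None

# One code section whose File-Offset was bumped from 0x400 to 0x410.
sections = [(0x1000, 0x500, 0x410)]
print(hex(rva_to_file_offset(0x1010, sections, 0x400)))  # 0x410
print(hex(rva_to_file_offset(0x3C, sections, 0x400)))    # 0x3c
```

A tool that skips the rounding would read RVA 0x1010 from file offset 0x420 and disassemble from the wrong place, and a tool that only scans the section list would fail to resolve 0x3C at all.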

The bottom line is that the Windows loader is kinda robust and yet very permissive. It seems that viruses can exploit many features the PE format has to offer while AV’s still don’t support some of them. I guess some of the tools (some are really popular) will get better with time. As for now, my PE parser library for Python, diSlib64, seems to do the job quite well.