Archive for July, 2007

Basic Ops #2 – Add/Sub

Wednesday, July 25th, 2007

After we started with the really easy implementation of Multiplication’s algorithm (right here), I will now proceed to add and sub, which are a bit more complicated. But still, nothing like the dreaded Long Division algorithm. :)

Anyways, the intresting part about add and sub is that you cetainly will use the XOR operator. If you think of it, actually XOR can calculate add and sub together, but without fixing the carry or borrow. That you will have to do yourself. But it’s still good enough for one bit operations. For example: 1-1 = 0, and 1^1 = 0. Or another example, 1 + 0=1 –> 1^0=1. So this is all nice, but what happens when you have to borrow or to calculate the carry? So if you operate on 1 bit variables and you do 1+1, you will get 0 and a carry should be propagated to the next bit. Therefore, even if this is the case, we still can use the XOR operator, because 1^1 will yield a zero, which is what we expect. So now comes the big question – how do we calculate the carry (for now I will stick to carry, and later I will talk about borrow), the carry has to propagate from the low significant bit to the highest. In my opinion this is the most fascinating device in this algorithm, because the rest is the same for both add and sub.

So to answer that, let the input be two integers: a and b. Calculating the XOR between them is a good start and can be done as easily as that. But now we would like to fix the carry and apply it to the other bits. So if we AND (bitwise) the original a and b, we will get all parallel bits that match. For example, adding 2+6 will result in:

010 &
110
—-
010

This ANDed result is the mask of the carry we begin with. And I use the word “begin” since the carry has to propogate to the higher bits, and therefore we will need to iterate a specific operation that will fix the carry on the rest of the bits. The reason we AND a and b, is because surely the result’s bits are these which will cause a carry… Once we applied the current carry, we will have to calculate the next carry. As I said earlier, we have to iterate this operation, therefore it means there must be some condition which will stop this iteration, otherwise it won’t make any sense and stuck forever. Do you have any idea what will be this condition? It’s not the number of bits of a (aka sizeof a), nor b (which we presume are the same, of course). We will run as long as there is a carry to apply to the constructed result of the addition.

By now we know almost everything, there’s only one thing left out to be done, and that’s the actual calculation of the next carry. This is the most tricky thing in the whole algorithm. We have the mask of the carry from the a&b operation, every time we apply this mask by XORing it the result, we have to remove (mask off) these bits and continue until the carry-mask is zero. Therefore it is understandable that some bits should “go home”, but which bits do we remove? We have to remove the bits that were already applied as a carry (since carry changes after every bit). The real problem is that we need to calculate the carry for all bits at once, since we can’t and don’t want to calculate the carry for each bit, since it will be less optimized and not a challenge at all. But even then, it’s not possible to apply the carry only once, since it ‘carries’ to the next bit by ‘nature’.

Well, usually I think that when code is shown no words need to be spoken, but in the case of this algorithm, some text must be written. And that was the above…So now that you should know what’s going on abit, the code will help you to fully understand the algorithm. I never claimed to explain this algorithm well, it’s a really hard one, honestly. Sorry.

I will just remind you that XOR is the key to the add and sub operations. So if you take a look at the truth-tables of XOR, you will see that it is exactly what we need, like I tried to explain in the beginning. XOR alone is not enough to calculate the result of the wanted operation, since we have to apply the carry everytime. The carry is applied using, nothing else but, XORing again…you got that right.

unsigned int add(unsigned int a, unsigned int b) 
{ 
  a ^= b; // Do a one bit addition, now we have to calculate carry and apply it if required. 
  b &= a ^ b; // b will now contain the carry-mask, note that they are XORed again to eliminate a temporary value. 
  while (b) { // As long as there is a carry to apply to the result... 
     b <<= 1; // We've got to shift left, so the carry propagates to higher bits, right? 
     a ^= b; // Apply the new carry to the result. 
     b &= ~a; // Get the next carry-mask by removing bits that were already taken into account and fixed. 
  } 
  return a; 
}

If you really want to master this algorithm, start applying it on small number with 3 bits. You will see that the next carry is wisely calculated when you will begin with a carry that have a few bits set, for example: 7+5…

I will leave the implementation of the ‘sub’ to next week, so you can try to come up with it on your own. I can only say that it is really similar to the addition’s algorithm. Think of how you would change the carry into a borrow, and how it actually affects the mask.

NTVDM #1

Sunday, July 15th, 2007

DOS is dead; and that’s a fact. But NTVDM is still a cool and handy useful tool. I guess that most of us are not satisfied with the way it works…usually the sound doesn’t work, which is a good enough cause to try the great opened source projects which simulate DOS. Anyways, a few years a go, a friend of mine wrote some piece of code which writes to 0xb800, remember this one? That’s the text mode buffer starting address. Anyways, I was wondering how come you write to this address and something appears on the screen (of course, there is the attribute and the character), mind you it’s NTVDM we are talking about. But this wasn’t the interesting part – Why sometimes your writes to this buffer works and sometimes simply not. I decided to learn the situation and see why it happens.

So here’s what I did:

mov ax, 0xb800
mov ds, ax
mov ax, 0x0741
mov [bx], ax
ret

Which prints a grey ‘a’ in the top left corner, yeppi. Now if you open cmd and run the .com file of this binary you won’t see anything at all. Which is unexpected because you write to the ‘screen’, after all. Now, my friend only knew that whenever he runs ‘debug’ before his program, which I just showed the important part above, then the letter ‘a’ will be displayed. So I gave it a long thought…. …. After that I tried the following addition to the above code (I put it before the original code):

mov ax, 3
int 0x10

This will only set the current video mode to text mode 80x25x16… And then voila, the writing worked as expected. Then I suspected that the VM monitors for int 0x10 and function #0, set mode. But it had seemed that every function will enable the writes…And I later confirmed that it is true.

So now that I knew how to trigger the magic, I simply searched for ‘cd 10’ (that’s int 0x10) in ‘debug’ and found a few occurrences, which proved my friend’s experience – that after running ‘debug’, writing to 0xb800 would work. Of course, if you ran other programs which used int 0x10, you’re good to go as well.

But that was only one thing of the mystery, I wanted to also understand how the writes really happens. Whether the VM monitors all instructions and checks the final effective address to see if it’s in the buffer range, or maybe the memory is specially mapped with Win32 API. Because after all, the NTVDM screen is a normal console window (not speaking of graphical modes now). Surprisingly, I found out that the solution was even simpler, a timer was issued every short interval, which called among other things to a function that copies the 0xb800 buffer to the real console screen, using some console APIs… And yes, your simulated writes really go to the virtual memory of the same address in the real NTVDM.exe process. Maybe it has a filter or something I assume, but I didn’t look for it, so I really don’t know.

Hot Patching (/Detouring)

Thursday, July 12th, 2007

Hot Patching is a nice feature which lets you apply a patch in-memory to affect the required code immediately. This is good as long as you can’t restart your system to do the on-disk patching. Since there are times that you can’t allow to restart your computer, probably only in servers…

Well speaking technically about Hot Patching, if you happen to see how code is generated in MS files, for instance, you can always see the 5 CC’s in a row before every function and then the function will begin with the infamous MOV EDI, EDI.

It looks something like this:

0005951e (01) 90                      NOP
0005951f (01) 90                       NOP
00059520 (01) 90                      NOP
00059521 (01) 90                      NOP
00059522 (01) 90                      NOP
00059523 (02) 8bff                   MOV EDI, EDI
00059525 (01) 55                      PUSH EBP
00059526 (02) 8bec                  MOV EBP, ESP

This is a real example, but this time it uses NOP’s instead of INT3’s… It doesn’t really matter, that piece of padding code isn’t really executed.
First things first – So why the MOV EDI, EDI is really executed?
So before I answer directly to this question, I will just say that when you want to patch the function, you will make a detour. So instead of patching a few bytes here and there, you will probably load a new whole copy of the patched and fixed function to a new region in the memory. This will be easier than specific spots patching… And then you will want this new code to run instead of the old one. Now you have two options to patch all callers to this function, which is a crazy thing to do. Or the more popular way- the trick comes in, the MOV EDI, EDI is used as a pseudo NOP, and it is executed on purpose every time the function runs. So when time comes and you apply the patch you can simply override this instruction with a short JMP instruction which takes 2 bytes as well. The jump instruction will jump 5 bytes backward to the beginning of the padding precisely before the patched function. So why 5 bytes of padding and not less or more? This is an easy one, in 5 bytes you can jump anywhere in the address space of 32 bits. Thus, no matter where your new patched function lies in memory you can jump to it. So the 5 bytes will be patched to contain a long JMP instruction. The offset of the long JMP will be calculated once as a relative offset.

Well, actually I didn’t really answer the first question yet. But now that you got a better understanding of this mechanism I really can. The thing is, that in old times the perfect patchers had to disassemble the beginning of the patched function in order to see where it can replace a few instructions to put the 5 bytes long JMP. So it transfers control to you in the beginning of the original function and when you are done, you run the overriden instruction, but as whole instructions(!) and then continue executing that same function from the place you finished overriding it.

Here’s some example, the first instruction for the sake of conversation takes 3 bytes and then the second instruction takes 3 bytes too. Now if you put the long JMP instruction at the first byte of the function and then you want to continue execution after you got control at offset 5, you will be out of synchronization and run incorrect code, because you are supposed to continue execution from offset 6… Eventually it will crash, probably for a access-violation exception.

So now instead of having all this headache, you know that you can safely change the first 2 bytes, to a short JMP and it will always work no matter what.

Another crazy reason for this new way is because say the patched function can run in a few threads at the same time. Now think that you patched the first 5 bytes, and then a different thread start running at offset 3 (because it already ran the first instruction, it just continue normally, but with changed code), then bam… you broke the instruction…

 The reason for using the specific MOV instruction is understood, since it’s a pseudo NOP, it doesn’t really affect (although it is not a real NOP) the CPU context but the program counter. And EDI, was chosen to my guess, because it makes the second byte of the instruction as 0xFF when both operands are EDI, like in this case. And yet there is no specific reason that I can come up with.

You can see that in two memcpy’s for the matter, you can detour a function successfuly without any potential problems. Piece of cake. The problem is that not all files support this feature yet, thus sometimes you still have to stick to the old methods and find a generic solution, like I did in ZERT’s patches…but that’s another story.

std::bad_alloc

Saturday, July 7th, 2007

Exceptions are one of the most flamed topics in C++. But before I talk about the operator new I want to say a few things in general about them. There are many ways to use exceptions, different approaches. Although, it costs more than error checking in sake of speed, it is pretty convenient and usable. Well, some people use it to replace these return code tests. Others use it in rare cases where something less expected happens. I even saw some justified reasons to why not use it. For instance, you can’t see if a function throws an exception from merely looking at its interface. Every library has its own type of exception base class, CAtlException, std::exception, etc… And you should not let anything slip your catch clause. So sometimes it might be pretty annoying to catch them all, and then you have to multiple your inside-catch-clause-code for handling each different exception. And if you have C++, you won’t dare using goto, even not for a cleanup…Of course, there are good sides also and that’s why most people use them after all.

If we take a look at the operator new, that’s the C++ raw memory allocator, we will see that it might throw an std::bad_alloc in case of failure, which is 100% legitimate, because most of the times you just expect that it will return your a pointer to a valid contigious block of memory. But now, assuming I use my own internal exceptions type for my application; Why do I need everytime to catch this exception and rethrow the converted exception? This is waste of time both in development and in run time. And there are some approaches who believe you don’t have to catch it at all, because if it is thrown, you are dead anyways. Now tell me seriously, if you write a real software for a company, you are not going to catch it? now gimme a break! So catching it you must, and therefore I don’t think every usage of the new operator should be wrapped by its own try and catch if you wrap it anyways with catching your own application’s exception. So eventually, after lots of discussions with my colleagues, we decided it suits us mostly to overload the global operator new function with our own custom function with a single change, that it will throw our exception rather than std::bad_alloc in case of failure. Overloading that operator, requires you to overload the array interface also. Don’t forget, when overloading new you will want to overload delete… :)