Archive for the ‘Debugging’ Category

Anti-Unpacker Tricks

Friday, July 18th, 2008

Peter Ferrie, a former Symantec employee who now works for Microsoft, wrote a paper about anti-unpacker tricks. I was really fascinated reading that paper. There were so many examples of tricks that still work nowadays. Some I knew already, some were new to me; he covers so many tricks. The useful thing is that every trick comes with a thorough description and a code snippet (mostly Assembly). It is now one of the most valuable papers on the subject, and you should really read it to get up to date. The paper can be found here.

One idea from the paper that I really like, something Peter himself found, is that you can use ReadFile (or WriteProcessMemory) to overwrite a memory block so that no software breakpoints will be raised when you execute it. On second thought, though, why wouldn't a simple memcpy do the same trick?
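For the curious, here is a minimal ctypes sketch (Windows only) of the idea, assuming the code saved a pristine copy of itself beforehand; code_addr and clean_bytes are hypothetical names of mine, not from the paper:

import ctypes

def wipe_breakpoints(code_addr, clean_bytes):
    # Writing a clean copy of the code over itself removes any INT3 (0xCC)
    # bytes a debugger may have planted there as software breakpoints.
    kernel32 = ctypes.windll.kernel32
    written = ctypes.c_size_t(0)
    kernel32.WriteProcessMemory(kernel32.GetCurrentProcess(),
                                ctypes.c_void_p(code_addr),
                                clean_bytes, len(clean_bytes),
                                ctypes.byref(written))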

If you guys remember the Tiny PE challenge I posted two years ago on Securiteam.com, Peter was the only one who kicked my ass with a version of 232 bytes, where I came up with 274 bytes. But no worries, after a long while I came back with a version of 213(!) bytes (over here) that used some new tricks. I'm still waiting for Peter's last word…

Have fun

Signed Division In Python (and Vial)

Friday, April 25th, 2008

My stupid hosting company decided to move the site to a different server, blah blah; the point is that I lost some of the recent DB changes, and my email hasn't been working for a week now :(

Anyway, I'm reposting it. The sad truth is that I had to find the post in Google's cache in order to restore it, way to go.

Friday, April 18th, 2008:

As I was working on Vial to implement the IDIV instruction, I needed a signed division operator in Python. And since x86 is 2's complement based, I first have to convert the number from unsigned into Python's negative and only then perform the operation, in my case a simple division. It was supposed to be a matter of a few minutes to code this function, which gets the two operands of IDIV and returns the result, but in practice it took a few bad hours.

The conversion is really easy. Say we mess with 8-bit integers, then 0xff is -1, 0x80 is -128, etc. The equation to convert a value to a Python negative is: val - (1 << sizeof(val)*8). Of course, you do that only if the most significant bit, the sign bit, is set. Eventually you return the result of val1 / val2. So far so good, but no: as I was feeding my IDIV with random input numbers, I saw that the result my Python code returns is not the same as the processor's. This was when I started to freak out, trying to figure out what's the problem with my very simple snippet of code. And alas, later on I realized nothing was wrong with my code, it's all Python's fault.
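The conversion itself looks something like this (a minimal sketch, taking the size in bytes):

def to_signed(val, size):
    # Interpret an unsigned 2's complement value as a signed Python integer.
    bits = size * 8
    if val & (1 << (bits - 1)):  # sign bit set?
        return val - (1 << bits)
    return val

assert to_signed(0xff, 1) == -1
assert to_signed(0x80, 1) == -128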

What's wrong with Python's divide operator? Well, to be strict, it does not round the negative result toward 0, but toward negative infinity. Now, to be honest, I'm not really into math stuff, but all x86 processors round negative numbers (and positive too, to be accurate) toward 0. So one would really assume Python does the same, as C does, for instance. The simple case to show what I mean is: 5/-3, which in Python results in -2, rather than -1, which is what the x86 IDIV instruction returns. And besides, -(5/3) is not 5/-3 in Python; now it's the time you say WTF. Which is another annoying point. But again, as I'm not a math guy, though I was speaking with many friends about this behavior, that equality (or to be accurate, inequality) is OK in real-world math. Seriously, what do we, coders, care about real-world math now? I just want to simulate a simple instruction. I really wanted to go and shout "hey, there's a bug in Python's divide operator", and how come nobody saw it before? But after some digging, this behavior is actually documented in Python. As much as I, and many other people I know, would hate it, that's that. I even took a look at the source code of the integer division algorithm, and saw a 'patch' that fixes the numbers to be floored if the result is negative, because C89 doesn't define the rounding well enough.
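You can see it for yourself (using //, Python's explicit floor-division operator, which is what / does for integers in Python 2):

assert 5 // -3 == -2    # Python floors toward negative infinity...
assert -(5 // 3) == -1  # ...so -(5/3) != 5/-3, while IDIV returns -1 for both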

While you're coding something and you have a bug, you usually just start debugging your code, track it down and then fix it easily while keeping on working on the code, because you're in the middle of the coding phase. There are those rare times when you really get crazy: you're absolutely sure your code is supposed to work (which it does not) and then you realize that the layer you should trust is broken (in a way). Really, you want to kill someone… but being a good guy, I won't do that.

Did I hear anyone say modulo?? Oh, don't even bother, though this time I think Python returns the (mathematically) expected result rather than the CPU's. But what does it matter now? I only want to imitate the processor's behavior. So I had to hack that one too.

The solution, after all, was to take the absolute value of the Python negative number and remember its original sign, and we do that for both operands. Then we perform an unsigned division, and if the signs of the inputs differ, we negate the result. This works because we know that the unsigned division behaves the same as the processor's, so we can use it safely.

res = x // y  # x and y were already made absolute
if sign_of_x != sign_of_y: res = -res
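Putting it all together, a minimal sketch of the whole workaround (assuming 32-bit operands, and ignoring IDIV's 64-bit dividend and divide-error cases):

def signed_div(x, y, bits=32):
    # Convert the raw 2's complement inputs into Python integers.
    def to_signed(v):
        return v - (1 << bits) if v & (1 << (bits - 1)) else v
    a, b = to_signed(x), to_signed(y)
    # Unsigned (absolute) division truncates toward zero, exactly like IDIV.
    res = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        res = -res
    return res & ((1 << bits) - 1)  # back to a raw 2's complement result

assert signed_div(5, 0xfffffffd) == 0xffffffff  # 5 / -3 == -1, just like IDIV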

The bottom line is that I really hate this behavior in Python, and it's not a bug after all. I'm not sure how many people like me have encountered this issue, but it's really annoying. I don't believe they are going to fix it in Python 3 either, though you never know.

Anyway, I got my IDIV working now, and that was the last instruction I had to cover in my unit tests. Now it's analysis time :)

Debugging Symbols

Saturday, April 5th, 2008

We all like to use PDB files when they are available. Sometimes we have to get them from Microsoft over the Internet. Usually I download the whole package of symbols for my current OS and am done with it. The problem is that sometimes, after updates and the like, they become out of date, and then the files I got are not relevant anymore. And as I use WinDbg, I decided to set the symbol-path variable in the system environment, to have it supported for other apps as well. I really wonder why I haven't done it before, because at work I've been using it for a long time now…

Anyhow, I set that variable to: SRV*http://msdl.microsoft.com/download/symbols;C:\symbols;

And I was happy that everything loaded automatically. Afterwards I noticed that every time I started debugging code in MSVS, it accessed the Internet for something, blocked the whole application for a few seconds and then resumed with starting my application and let me debug it. The point is that it was my own application, with full source and everything, and it still accessed the Internet every time, maybe to check timestamps against loaded modules, etc. It even caused some problems, like claiming that my source isn't the same as the binary I want to debug, and it wouldn't let me use the source when debugging that code. After a few times with this same confusion, I couldn't continue working like that anymore, so I tried to think what I had changed that caused this weird debugging behavior, and only then did it occur to me that I had added that extra variable to the environment. So I took a look at the variable, and with a hint from a friend, I switched the places of the local symbols directory and the HTTP address. Since then I don't get any weird seeks to the Internet to fetch/check PDBs when it's not required, and everything runs as fast as before.
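That is, presumably the swapped value looked something like this, so the local directory is searched first:

C:\symbols;SRV*http://msdl.microsoft.com/download/symbols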

That's it, just wanted to share this issue. Of course, I don't blame any application for trying the first address in the variable first, because it's up to the user to define the priorities. It's just that I didn't think it would matter… only to learn I was wrong.

AAA/AAS Quirk

Thursday, April 3rd, 2008

As I was working on the simulation of these two instructions, I found that they have a quirk: although the algorithms for these instructions are described in Intel's specs, which (seem to) make the output defined for all inputs, that is not the case. Every time I finish writing an implementation for a specific instruction, I add that instruction to my unit tests. The instruction is simulated with random (and some smarter) input and then checked against pure native execution to see if the results are correct. This way I found a quirk, for a range of inputs, that reveals how the instruction is really implemented (microcode stuff, probably) rather than how it's documented.

AL = AL + 6 is done when AF is set or the low nibble of AL is above 9. According to the documentation the destination register is AL, but in reality the destination register is AX. Now how do we know such a thing?

If we try the following input:
mov al, 0xff
aaa

The result (in AX) will be 0x205, rather than 0x105 (which is what we expect according to the docs).

What really happens is that we supply a number which, when added to 6, creates a carry into AH, thus incrementing AH by 1. Then, looking at the docs again, we see that after AL is added to 6, AH is also explicitly incremented by 1. Thus AH is really incremented by 2. :P
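A minimal Python sketch of the observed behavior (ignoring the flag outputs), just to make the quirk concrete:

def aaa(ax, af=False):
    # Observed: the adjustment is AX-wide, so AX += 6 carries into AH,
    # and then the documented AH += 1 happens on top; effectively AX += 0x106.
    if (ax & 0x0f) > 9 or af:
        ax = (ax + 0x106) & 0xffff
    return ax & 0xff0f  # AL is masked to its low nibble

assert aaa(0x00ff) == 0x0205  # not 0x0105, as the docs would suggest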

The question is why they do AX = AX + 6 rather than operating on AL. No, actually the bigger question is why I get this same behavior on an AMD processor (whereas I work on an Intel processor). And we already know from my last post about SHLD that they don't behave the same in some undefined-behavior aspects (although some people believe AMD copied Intel's architecture and implementation)…

There might be some people who will say that I went too far with testing this instruction, because I supply, somewhat, an input which is not in the valid range (it's unpacked BCD after all), and therefore I must not rely on the output. The thing is, the algorithm is defined well enough to receive any input I pass it, hence I expect it to work even for undefined input. Though I believe there is no such thing as undefined input, only undefined output, and that's why I implemented my instruction the way they both did. Specifically, neither of them states anything about undefined input/output here, which makes my case stronger. Anyway, the point is that they don't tell us something: the real implementation is not the one documented in either AMD's or Intel's docs.

This quirk works the same for AAS, where instead of doing AL = AL - 6, it's really AX = AX - 6. I also tried to see whether they work on the whole EAX, but I saw that the high word isn't changed (by carry/borrow). And I also tried to see if this ill behavior exists in DAA/DAS, but no.

Shift Double Precision

Saturday, March 29th, 2008

If you were to ask me, I'd have no idea why Intel supports shift double precision in the 80x86. Probably their answer would be "because it used to be a CISC processor". Shift double precision is a pretty easy algorithm to implement. But maybe it was popular back then and they decided to support it in hardware. Like nowadays, when they add very important instructions to the SSE sets. Even so, everyone (me included) seems to implement the algorithm like this:

(a << c) | (b >> (32-c))

Where a and b are the 32-bit input variables (or registers) and c is the count. The code shows a shift-left double precision. Shifting right requires changing the direction of each of the shifts. However, if a and b are 16 bits, the second shift amount changes to (16-c). And now there is a problem. Why? Because we might enter the magical world of undefined behavior. And why is that? Because the first thing the spec says about the shift/rotate instructions is that the count operand is masked to preserve only the 5 least significant bits. This is because the largest shift amount for a 32-bit input is 32 shifts (and then you get 0; ignore SAR for now). But if the input is 16 bits, the count is still masked with 31. That means you can shift a 16-bit register by more than its size. Which doesn't make much sense, but is possible with the other shift instructions. When you use a shift double precision, though, not only does it not make sense, it is also undefined. That is, the result is undefined, because you try to move bits from b into a but the internal count becomes negative. For example: shld ax, bx, 17. Internally the second shift amount is calculated as (16-c), which becomes (16-17). And that's bad, right?

In reality everything is defined when it comes to digital logic, even the undefined stuff. There must be a reason for the result I get from executing such an instruction as in the example above, even though it's correctly and officially undefined. And I know there is a rationale behind it, because the result is consistent (at least on my Intel Core2Duo processor). So, being the stubborn person I am, I decided I wanted to know how that calculation is really done at the hardware level.

I forgot to mention that the reason I care how to implement this instruction is that I have to simulate it for the Vial project. I guess eventually it's a waste of time, but I really wanted to know what's going on anyway. Therefore I decided to research the matter and come up with the algorithm my processor uses. Examining the officially undefined results, I quickly managed to see how the processor calculates the shift, and it goes like this for 16-bit input (I guess it works the same for 8-bit input as well; note that 32-bit input can't have an undefined range, because you can't get a negative shift amount):

def shld(a, b, c):
    c &= 31
    if c <= 15:
        return ((a << c) | (b >> (16 - c))) & 0xffff
    else:
        # Undefined behavior:
        c &= 15
        return ((b << c) | (a >> (16 - c))) & 0xffff

Yes, the code is in Python. You can see that if the count is bigger than 15, we swap the order of the inputs. And then comes the part where you say "NOW WTF?!". Even though I got this algorithm to return the same results as the processor does, for both defined and undefined input, I would wager the processor doesn't do this kind of stuff internally. So I sat down some (long) more, staring at the code and doing a few experiments here and there. Eventually it occurred to me:

def shld(a, b, c):
    c &= 31
    x = a | (b << 16)
    return ((x << c) | (x >> (32 - c))) & 0xffff

Now you can see that the input to the original equation is a single bit-buffer that contains both inputs together as one. Taking a count of 17 won't yield a negative shift amount, but something else. Anyway, I have no idea why they implemented this instruction like they did (and it applies to SHRD as well), but I believe it has something to do with the way their processor's so-called 'engine' works and other hardware stuff.
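A quick sanity check that the two formulations agree for every count (the test values here are arbitrary):

def shld_ref(a, b, c):
    # The first, branching formulation from above.
    c &= 31
    if c <= 15:
        return ((a << c) | (b >> (16 - c))) & 0xffff
    c &= 15
    return ((b << c) | (a >> (16 - c))) & 0xffff

for c in range(32):
    assert shld(0x1234, 0xabcd, c) == shld_ref(0x1234, 0xabcd, c)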

After I learned how it works, I was eager to see how it works on AMD. And guess what? They don't work the same, when it comes to the undefined behavior, of course. And since I don't have an AMD machine anymore, I didn't get to see how they really implemented their shift double precision instructions.

In the Vial project, where I simulate these instructions, I added a special check for the count, to see that it's not bigger than the input size, and if it is, I mark the destination register and some of the flags as Undefined. This way, when I do code analysis, I will know that something is really wrong/buggy with the way the application works. Now what if the application purposely uses the undefined behavior? Screw us both then. Now why would a sane application do that? Ohh, that's another story…

By the way, the other shift/rotate instructions don't have any problem with the shift amount, since they can't yield a negative shift amount internally in any way; therefore their results are defined for every input.

Anti Debugging

Monday, January 14th, 2008

I found this nice page about anti-debugging tricks. It covers so many of them, and if you know the techniques it's really fun to read them quickly one by one. You can take a look yourself here: Windows Anti-Debug Reference. One of the tricks really caught my attention, and it went something like this:

push ss
pop ss
pushf

What really happens is that when you write to SS, the processor has a protection mechanism so you can safely update rSP immediately afterwards. It could lead to catastrophic results if an interrupt occurred precisely after SS was updated but rSP wasn't yet. Therefore the processor blocks all interrupts until the end of the next instruction, whatever it is. However, it blocks interrupts only for that single following instruction, no matter what, and it won't work if you pop ss and then do it again… This means that if you are under a debugger or a tracer, the above code will push onto the stack the real flags of the processor's current execution context.

Thus doing this:
pop eax
and eax, 0x100
jnz under_debugging

ANDing the flags we just popped with 0x100 examines the trap flag. If you simply do pushf and then pop eax, it will show that the trap flag is clear, as if you're not being debugged, which is a potential lie: the debugger engine recognizes the pushf instruction and fixes the pushed flags. But here, even the trap flag is pended, or just stalled, until the next instruction, so the debugger engine doesn't get a chance to recognize the pushf and fix it. How lovely.

I really agree with some other posts I saw claiming that an anti-debugging trick is just like a zero-day: if you're the first to use it, you will win and use it well, until it becomes known and gets taken care of. Although, to be honest, a zero-day is way cooler and a whole different story, but oh well… Besides, anti-debugging can't really do harm, it just wastes some of the reverser's time.

Since I wrote diStorm and read the specs of both Intel and AMD regarding most instructions upside down, I knew about "mov ss" immediately; even the docs mention this special behavior. But it never occurred to me to use it as a trick. Anyway, another way to do the same is:

mov eax, ss
mov ss, eax
pushf

A weird issue was that the mov ss, eax must really be mov ss, ax, although all disassemblers will show it as mov ss, ax (as if it were 16 bits) either way. In truth you will need a db 0x66 prefix to make this mov work… You can also do lots of fooling around with this instruction, like mov ss, ax; jmp $-2; and if you single-step that, without seeing the next instruction, you might go crazy before you realize what's going on. :)

I even went further and tried to use a privileged instruction like CLI after the write to SS, in the hope that the processor is executing in some special mode and there might be a weird bug. And guess what? It didn't work, and an exception was raised, of course. Probably otherwise I wouldn't have written about it here :). It seems the processor's logic has a kind of internal flag to pend interrupts till the end of the next instruction, and that's all. To find bugs you need to be creative… it never hurts to try, even if it sounds stupid. Maybe with another privileged instruction in different rings and modes (like pmode/realmode/etc.) it could lead to something weird, but I doubt it, and I'm too lazy to check it out myself. But imagine you could run a privileged instruction from ring 3… now stop.

SOAP Is So Tricksy

Friday, January 11th, 2008

I started to code in .Net at work a while ago. Yes, yes, have a laugh at me, I still kick your ass in Assembly ;) Anyway, we use SOAP heavily, and it is really wrapped well in C#: you invoke the remote methods seamlessly, as if they were a local interface/methods to use. We wanted to see the performance of our server, so I wrote some simple stress tests, in C# of course. I configured my IIS to work with my SOAP libraries and started running the tests. Then, after a long long while, an exception was thrown from the server notifying me that the session is null. Needless to say, our service needs authentication and stores some info about it (it requires a login before usage). I wasn't sure why the session was dropped, because all the other connections kept on working and invoking methods on the server. So something here was fishy.

I googled for 'dropped sessions' and all the stuff I could find about .Net, IIS and sessions. They gave tips like checking the Event Viewer for logs indicating that the server is recycling, etc. Looking at the Event Viewer of the server, I didn't find anything special, nor any indication that something happened to the server. And no, the server wasn't recycling, because otherwise all the other connections would have dropped at the same time as the one I saw; therefore I also eliminated options like the files in the virtual directory of the services being changed or scanned… (some weird feature of the ASP.Net Web Service, go figure).

Eventually I sniffed my own network with Wireshark, and since I was stress-testing the server, I gathered loads of packets in a jiffy, realizing it's not the way to go. However, I managed to catch the exception while sniffing the connections (restarting the sniffing every few seconds… :) ), and analyzing the data showed that the client (that is, my test app) used, out of the blue, some session id that I hadn't seen earlier in the same log I captured (mind you, I didn't have the full log). Spooky something. I did the test again and got the same results. Then I thought, why not change the session timeout of my webservice in IIS to 1 minute? You guessed it right: the problem occurred more frequently, so now I was sure the problem was in my code and not something wrongly configured on the server, nor the client using random session ids for some reason… You never know with bugs, you know, especially when you stress-test. hihi

The next thing I did, since I didn't want to sniff the whole session of the connection(s), was to add some code that logs the time of the last method invocation, so that when the null-session exception is thrown, I could see how many seconds had elapsed since that last invocation. It immediately showed that some invocation happened after 1 minute, meaning the session at the server had already expired, yet the client would still use the same session id it got at the beginning. Every connection was made from a different thread, and each thread could talk to a few webservices at the same time. When the exception is thrown, I know which webservice/method raised it, and to that function I added the last-time thingy.

In the end, it was like an 'if random(100) < 1' that comes out true only once in a while: the bug didn't surface at the beginning because every invocation of a remote method resets the session timeout, but on some rare occasions no invocation happened for more than the default session timeout (20 minutes), hence the exception and the dropped connection.

The solution was simple, though we had two options: add cookies on the client side so we don't need a session at the server, so that even if the session expires, the server will still recognize our cookie and serve us well; or, the simpler option for us, call the login method again to re-initialize the authentication, which re-creates the session on the server side. Everything works better now.
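In spirit, the fix amounts to a retry-with-relogin wrapper; here is a minimal Python sketch of the pattern (the client, login and exception names are all hypothetical, not our actual C# API):

class SessionExpiredError(Exception):
    pass  # stand-in for the 'session is null' SOAP fault

def invoke_with_relogin(client, method, *args):
    try:
        return method(*args)
    except SessionExpiredError:
        client.login()        # re-create the server-side session
        return method(*args)  # retry once with the fresh session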

I had this stupid bug because SOAP is wrapped so nicely in .Net that you don't 'feel' it, nor even remember that you're using a server here, I really mean it guys. So why in the world would I call login again after I did it once in the initialization?! Now I've got a reason, apparently :)

NTVDM #1

Sunday, July 15th, 2007

DOS is dead, and that's a fact. But NTVDM is still a cool, handy and useful tool. I guess most of us are not satisfied with the way it works… usually the sound doesn't work, which is a good enough reason to try the great open source projects which simulate DOS. Anyway, a few years ago a friend of mine wrote some piece of code which writes to 0xb800, remember this one? That's the text-mode buffer segment. I was wondering how come you write to this address and something appears on the screen (of course, there is the attribute and the character), mind you, it's NTVDM we are talking about. But this wasn't the interesting part: why do your writes to this buffer sometimes work and sometimes simply not? I decided to study the situation and see why it happens.

So here’s what I did:

mov ax, 0xb800   ; text-mode video segment
mov ds, ax
mov ax, 0x0741   ; attribute 0x07 (grey on black), character 0x41 ('A')
mov [bx], ax     ; assumes BX = 0, so this is the top-left cell
ret

Which prints a grey 'A' in the top left corner, yeppi. Now if you open cmd and run the .com file of this binary, you won't see anything at all. Which is unexpected, because you write to the 'screen', after all. Now, my friend only knew that whenever he ran 'debug' before his program (whose important part I just showed above), the letter would be displayed. So I gave it a long thought… After that I tried the following addition to the above code (I put it before the original code):

mov ax, 3   ; AH = 0 (set video mode), AL = 3 (80x25 16-color text)
int 0x10

This only sets the current video mode to text mode 80x25x16… And then, voila, the writing worked as expected. I then suspected that the VM monitors int 0x10 for function #0, set mode. But it seemed that any function would enable the writes… and I later confirmed that this is true.

So now that I knew how to trigger the magic, I simply searched for 'cd 10' (that's int 0x10) in 'debug' and found a few occurrences, which explained my friend's experience: after running 'debug', writing to 0xb800 would work. Of course, if you run other programs which use int 0x10, you're good to go as well.

But that was only one part of the mystery; I also wanted to understand how the writes really happen. Whether the VM monitors all instructions and checks the final effective address to see if it's in the buffer range, or maybe the memory is specially mapped with the Win32 API. Because, after all, the NTVDM screen is a normal console window (not speaking of graphical modes now). Surprisingly, I found out that the solution was even simpler: a timer fires every short interval and calls, among other things, a function that copies the 0xb800 buffer to the real console screen, using some console APIs… And yes, your simulated writes really go to the virtual memory of the same address inside the real NTVDM.exe process. Maybe it has a filter or something, I assume, but I didn't look for it, so I really don't know.
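Just as a toy illustration (not NTVDM's actual code), here is what such a periodic copy could look like in Python with ctypes, using the WriteConsoleOutputW console API; the whole thing is my own sketch:

import ctypes

class COORD(ctypes.Structure):
    _fields_ = [("X", ctypes.c_short), ("Y", ctypes.c_short)]

class SMALL_RECT(ctypes.Structure):
    _fields_ = [("Left", ctypes.c_short), ("Top", ctypes.c_short),
                ("Right", ctypes.c_short), ("Bottom", ctypes.c_short)]

class CHAR_INFO(ctypes.Structure):
    _fields_ = [("Char", ctypes.c_wchar), ("Attributes", ctypes.c_ushort)]

def blit_b800(buf):
    # buf holds 80*25 (character, attribute) byte pairs, like the 0xb800 buffer.
    cells = (CHAR_INFO * (80 * 25))()
    for i in range(80 * 25):
        cells[i].Char = chr(buf[2 * i])
        cells[i].Attributes = buf[2 * i + 1]  # VGA attributes map roughly to console ones
    region = SMALL_RECT(0, 0, 79, 24)
    handle = ctypes.windll.kernel32.GetStdHandle(-11)  # STD_OUTPUT_HANDLE
    ctypes.windll.kernel32.WriteConsoleOutputW(
        handle, cells, COORD(80, 25), COORD(0, 0), ctypes.byref(region))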

Hot Patching (/Detouring)

Thursday, July 12th, 2007

Hot patching is a nice feature which lets you apply a patch in memory to affect the running code immediately. This is good for when you can't restart your system to do the on-disk patching, since there are times you can't afford to restart your computer, mostly on servers…

Speaking technically about hot patching: if you happen to look at how code is generated in Microsoft binaries, for instance, you can always see 5 CC's in a row before every function, and then the function begins with the infamous MOV EDI, EDI.

It looks something like this:

0005951e (01) 90                NOP
0005951f (01) 90                NOP
00059520 (01) 90                NOP
00059521 (01) 90                NOP
00059522 (01) 90                NOP
00059523 (02) 8bff              MOV EDI, EDI
00059525 (01) 55                PUSH EBP
00059526 (02) 8bec              MOV EBP, ESP

This is a real example, but this time it uses NOP's instead of INT3's… It doesn't really matter; that piece of padding code isn't executed anyway.
First things first: why is the MOV EDI, EDI really executed?
Before I answer that question directly, I will just say that when you want to patch a function, you make a detour. Instead of patching a few bytes here and there, you will probably load a whole new copy of the patched and fixed function into a new region in memory. This is easier than patching specific spots… And then you will want this new code to run instead of the old one. Now you have two options: patch all callers of this function, which is a crazy thing to do, or go the more popular way, and here the trick comes in. The MOV EDI, EDI is used as a pseudo-NOP, and it is executed on purpose every time the function runs. When the time comes and you apply the patch, you can simply overwrite this instruction with a short JMP instruction, which also takes 2 bytes. The short jump jumps 5 bytes backwards, to the beginning of the padding precisely before the patched function. And why 5 bytes of padding, not less or more? This is an easy one: with 5 bytes you can jump anywhere in a 32-bit address space. Thus, no matter where your new patched function lies in memory, you can jump to it. So the 5 padding bytes will be patched to contain a long JMP instruction, whose offset is calculated once, as a relative offset.
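Just to make the byte arithmetic concrete, here is a small Python sketch (the function name is mine) of what the two writes would contain, assuming flat 32-bit addresses:

import struct

def build_hotpatch(func_addr, new_func_addr):
    # Long JMP (E9 rel32) goes into the 5-byte pad right before the function;
    # rel32 is relative to the end of the JMP, which is exactly func_addr.
    long_jmp = b'\xe9' + struct.pack('<i', new_func_addr - func_addr)
    # Short JMP (EB rel8) overwrites MOV EDI, EDI and jumps 7 bytes back:
    # 2 bytes of itself plus the 5-byte pad, so rel8 = -7 = 0xf9.
    short_jmp = b'\xeb\xf9'
    return long_jmp, short_jmp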

Well, actually I haven't really answered the first question yet, but now that you have a better understanding of this mechanism, I can. The thing is that in the old times, patchers had to disassemble the beginning of the patched function in order to see where they could replace a few instructions with the 5-byte long JMP. The detour transfers control to you at the beginning of the original function, and when you are done, you run the overwritten instructions, but as whole instructions(!), and then continue executing that same function from the place you finished overwriting.

Here's an example: say, for the sake of conversation, that the first instruction of the function takes 3 bytes and the second instruction takes 3 bytes too. Now, if you put the long JMP instruction at the first byte of the function and then want to continue execution after it, at offset 5, you will be out of synchronization and run incorrect code, because you are supposed to continue execution from offset 6… Eventually it will crash, probably with an access-violation exception.

So now, instead of having all this headache, you know that you can safely change the first 2 bytes to a short JMP, and it will always work, no matter what.

Another crazy reason for this new way: say the patched function can run in a few threads at the same time. Now suppose you patch the first 5 bytes, and right then a different thread starts running at offset 3 (it already ran the first instruction, so it just continues normally, but with changed code underneath); bam, you broke the instruction…

The reason for using this specific MOV instruction is understandable: since it's a pseudo-NOP, it doesn't really affect the CPU context, except the program counter (although it is not a real NOP). And EDI was chosen, my guess is, because it makes the second byte of the instruction 0xFF when both operands are EDI, as in this case. Yet there is no specific reason I can come up with.

You can see that with two memcpy's, for that matter, you can detour a function successfully without any potential problems. Piece of cake. The problem is that not all binaries support this feature yet, so sometimes you still have to stick to the old methods and find a generic solution, like I did in ZERT's patches… but that's another story.

Oh Boy, There’s a Bug

Tuesday, June 12th, 2007

Can you spot it?

void p(const CString & str)
{
    char* endPtr = NULL;
    unsigned long x = strtoul(str.Mid(1, 2), &endPtr, 10);
    if (*endPtr != '\0') {
        // invalid input...
    }
}

Using ATL's CString, you can run this code and nothing will happen; the bug won't be 'triggered'. More than that, everything will work as expected and correctly. But as most developers find bugs by running the software and fixing the crashes, this one didn't come up to the surface… so you have a bug and you don't know about it, how not fun. If you are really into the C++ business, it might be easy for you to spot this bug. But I'm not sure C++ is to blame here; I guess it's not. If you take a look at the Mid method, its declaration is as follows:

CStringT Mid(int iFirst, int nCount) const;

This means it returns a temporary object with the relevant slice of the original string. The destructor of this temporary is called at the end of the full expression, that is, right after strtoul returns, so endPtr ends up pointing into freed memory. Yey. And then you access endPtr anyway… Even though the memory is freed, it is usually still mapped (because CString dynamically allocates space for its buffers); only when AppVerifier forced the freed block to be removed did it generate an access violation, and then the bug surfaced and we fixed it. But, honestly, it was still tricky to spot this one.

So now that you know the bug and why it happened, I can only say that you should beware of methods/functions that return objects which live for a very short scope, if at all. Not to mention the copy-constructor issues… Of course, the scope alone isn't to blame; it's the mixture of a short-lived temporary, a stale pointer and dynamically allocated memory. To be honest, if the buffer had been local on the stack, there would have been no way to detect this bug, unless the compiler reused the same stack space for different variables and they were written to between the call to strtoul and the access through the pointer. And even that isn't reliable, though you might notice some weird mismatches between the input and that input validation…

 Any suggestions for tracking such bugs?