
Memory Management

Tuesday, December 7th, 2010

In this post I decided to write about a few things you have to keep in mind regarding memory management while writing or designing a big application. I will also try to give you a new way of looking at memory allocations.

The problem is that one day you wake up and you have either memory leaks or memory fragmentation, and you don't know which or why. To eliminate the possibility of memory fragmentation, you first have to rule out memory leaks.

Trying to find a solution to such a case, you discover that your dozens of DLLs work (normally, but) sharing the same CRT, so if you pass an allocated object from one DLL to be used and freed by another DLL, you're screwed. And you can't simply separate the CRTs, etc. The root of the problem is that each CRT has its own heap: if you statically link a DLL with its own CRT, it will have its own heap; if all DLLs use the same CRT, they share the same heap, or even the application's global heap. Why is that bad? Because then if you have memory leaks in one DLL, they dirty the whole global heap. I will talk about that soon. An advantage of separating heaps is that each thread that belongs to some DLL can access its own heap faster, without contention on the global heap lock which all DLLs/threads would otherwise need.
The idea is to separate the application into logical groups of components, where each logical group uses its own heap, so within a group you can pass pointers around freely. This is really hard to do if you have existing code and never thought about this issue. So you're saying, "oh yeah, but if I don't have leaks and my code is doing alright, I don't need it anyway". Well, I wouldn't be so quick to think that. Keep on reading.
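To make this concrete, here is a minimal sketch of the idea on Windows, using one private Win32 heap per logical group; the group names are made up for illustration, not taken from any real application:

#include <windows.h>

// Sketch only: one private heap per logical group of components.
// The point is that every allocation and free for a group goes to that
// group's heap, no matter which DLL performs it.
static HANDLE g_uiHeap   = HeapCreate(0, 0, 0);  // initial/max size 0: heap grows as needed
static HANDLE g_dataHeap = HeapCreate(0, 0, 0);

void* AllocFromUiHeap(size_t size)   { return HeapAlloc(g_uiHeap, 0, size); }
void  FreeToUiHeap(void* p)          { if (p) HeapFree(g_uiHeap, 0, p); }

void* AllocFromDataHeap(size_t size) { return HeapAlloc(g_dataHeap, 0, size); }
void  FreeToDataHeap(void* p)        { if (p) HeapFree(g_dataHeap, 0, p); }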

In C++, for instance, overriding new and delete for your class is really useful, because even if the object is used and deleted in a different component, C++ dispatches the "deletion" to your class's operator delete, and thus to the appropriate heap.
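Here is a rough sketch of what I mean, assuming a private Win32 heap; the class name and helper are hypothetical, just to show the shape of it:

#include <windows.h>
#include <new>

class Widget {
 public:
  // Even if another DLL deletes a Widget, these overrides make sure the
  // memory goes back to this component's private heap.
  void* operator new(size_t size) {
    void* p = HeapAlloc(PrivateHeap(), 0, size);
    if (p == NULL) throw std::bad_alloc();
    return p;
  }
  void operator delete(void* p) {
    if (p != NULL) HeapFree(PrivateHeap(), 0, p);
  }
 private:
  static HANDLE PrivateHeap() {
    static HANDLE heap = HeapCreate(0, 0, 0); // component-private heap, created once
    return heap;
  }
  int x, y;
};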

Now I want to go back a bit to some background on how memory allocators generally work, which is usually in a fairly naive fashion. It means that when you ask for a block of memory, the memory manager (malloc, for you) allocates some chunk and returns a pointer to you. When another request for memory is made, the next memory address is used to satisfy it (supposing the first chunk requested is still busy), and so on. What you can see here is that the memory allocator works linearly (it allocates one block after the other) and chronologically (the layout depends on the time you asked for the allocation).
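Real allocators are of course smarter about reuse and alignment, but a toy "bump" allocator captures this linear, chronological behavior; this is only an illustration, not how your CRT is actually written:

#include <cstddef>

// Toy illustration: hand out blocks one after the other from a fixed buffer,
// in the order the requests arrive. No free list, no reuse, no alignment.
class BumpAllocator {
 public:
  BumpAllocator(char* buffer, size_t size) : base_(buffer), size_(size), used_(0) {}

  void* Allocate(size_t bytes) {
    if (used_ + bytes > size_) return NULL;  // arena exhausted
    void* p = base_ + used_;                 // next block starts right after the previous one
    used_ += bytes;
    return p;
  }

 private:
  char* base_;
  size_t size_;
  size_t used_;
};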

Let me give you an example so you can see what I'm talking about; in this example there are two threads: Thread A and Thread B.
Thread A allocates chunks of 0.5kb in a loop, and at some point shuts down once it's done with its task.
Thread B allocates chunks of 0.5kb in a loop, simultaneously, and keeps on working.
And suppose they use the same CRT, and thus the same heap.

What we will see from the memory manager perspective is something like:
<0.5kb busy> <0.5kb busy> <0.5kb busy>…………………….and then something like ….

Each chunk might belong to either Thread A or Thread B. Don't be picky about it; it really depends on many environmental factors (processors, locks, etc.).

Suppose Thread A is finished with its memory, because it was only meant to do some initialization job(!), and it decides to free everything it has allocated. The memory picture would now be something like this:
<0.5kb free><0.5kb busy><0.5kb free><0.5kb busy> …………………lots of <0.5kb busy belongs to ThreadB> and then ….

What we get now is a hole-y memory map: some blocks are free, the others are busy. Now if we want to allocate another 0.5kb, no problem, we can use the first free block, it will fit perfectly, so everybody's happy so far. As long as the blocks we want to allocate in such a memory state are smaller than or equal to 0.5kb, we will get them for sure, right?
But what happens if we ask to allocate a block of 10kb? Ah ha, now we have to scan the whole memory to find a free chunk whose size is at least 10kb, right? Can you see what's wrong here?
What's really wrong is that you have a total of X free kbytes, which could surely contain 10kb, but they are all chunked up, so the memory allocator has to hunt for a single block big enough to satisfy the request, and all those free kbytes turn out to be useless to us.
In short, this example is a classic memory fragmentation scenario. To check how bad memory fragmentation is, you measure it by the total free memory and the largest free block size. For example, if you have 198mb free in total, scattered in tiny chunks across a 200mb heap, and the biggest free block is only 2 megs, it means you can't even allocate a 3mb block while the total free size is 198mb. Memory is cruel.
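If you can obtain those two figures for a heap (by walking it, for example), a crude way to put a number on the fragmentation is something like this sketch:

#include <cstddef>
#include <cstdio>

// Toy metric, not a real heap walker: 0 means the free space is one big block,
// values close to 1 mean it's shattered into small, useless pieces.
double FragmentationRatio(size_t totalFree, size_t largestFreeBlock) {
  if (totalFree == 0) return 0.0;
  return 1.0 - (double)largestFreeBlock / (double)totalFree;
}

int main() {
  const size_t MB = 1024 * 1024;
  // 198mb free in total, but the largest free block is only 2mb:
  printf("fragmentation = %.2f\n", FragmentationRatio(198 * MB, 2 * MB)); // prints ~0.99
  return 0;
}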

This brings me to say that each memory-consumer has a specific behavior. A memory-consumer like Thread A, which does some initialization job, then shuts down and frees all its blocks, is an init-phase memory user.
A memory-consumer like Thread B, which keeps working after Thread A is done and keeps using its allocated memory, is an application-lifetime memory user.
Basically, each memory-consumer can be classified into some behavior! Once you mix those behaviors on the same heap, sooner or later you're screwed.

The rule of thumb I would suggest you follow is:
Distinguish between memory-consumer types. If you know something consumes memory and really caches objects, make sure it runs as early as possible in your application, and only then allocate your application-lifetime objects. That way its memory is allocated first and doesn't cause fragmentation for consumers with different behaviors.

If you know you have to cache some objects while other application-lifetime objects have to be allocated from the same heap at the same time(!), this is where you should see the red light and create a separate heap for each behavior.
A practical example: you open some application that has a workspace, and the application is designed to initialize its UI only the first time you open a workspace. So while you're loading and parsing the data from the file into memory, you're also setting up the UI and doing lots of allocations. The UI is going to stay there until you shut down the application, but the data itself is going to be de-allocated once you close the workspace. This will create fragmentation if you don't split the heaps. Can you see why? Think of loading and unloading workspaces and what happens to the memory. Don't forget that each time you open a workspace, the memory chunks that need to be allocated have different sizes, and that the UI still uses its memory, which is by now probably fragmented, so it only gets worse over time. An alternative would be to make sure the UI allocations happen first, no matter what; only then can you allocate and de-allocate lots of memory blocks without worrying about fragmentation.
Even if you get the UI allocations done first, there still might be a problem. Suppose you load the file's data into memory and some logic keeps part of that data cached for the application's lifetime, even after the workspace goes down. This is a big no-no; it will leave you with fragmentation again, hence you should split your heap again.
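One way to picture the split, as a sketch on Win32 (WorkspaceHeap is a made-up name, not code from any real application): give every workspace its own heap, allocate all of its data from that heap, and when the workspace closes, destroy the heap wholesale. The UI keeps its own heap and never sees the workspace's churn.

#include <windows.h>

class WorkspaceHeap {
 public:
  WorkspaceHeap() : heap_(HeapCreate(0, 0, 0)) {}
  ~WorkspaceHeap() { HeapDestroy(heap_); }  // releases every workspace block at once

  void* Alloc(size_t size) { return HeapAlloc(heap_, 0, size); }
  void  Free(void* p)      { if (p) HeapFree(heap_, 0, p); }

 private:
  HANDLE heap_;
};

// Open a workspace: create a WorkspaceHeap and parse the file into it.
// Close the workspace: destroy the WorkspaceHeap; the UI heap is untouched.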

That’s it mostly, I hope I managed to make it clear.
I didn't want to talk about the implementation of the heaps, whether they are low-fragmentation heaps (LFH) or whatever; if you mix your memory-consumers, nothing will save you.

One thing to note: if your cached memory leaks, for whatever reason, then with the new strategy and a standalone heap you won't hurt anybody else with your leaks, and it will even be easier to track leaks per heap than across dozens of DLLs sharing the same global heap. And if you already have a big application with a jungle of one heap and various memory-consumer types, there are some techniques to save the day…

Deleting a New

Sunday, October 24th, 2010

Recently I've been working on some software to find memory leaks and to fix memory fragmentation. I want to start with an example of a leak; next time I will talk about fragmentation as well.

The leak happens when you allocate objects with new [] and release them with scalar delete. If you're C++ savvy, you might say "WTF, of course it leaks". But to be honest for a moment, aside from the destructors of the instances in the array not being called, I really did not expect it to leak. After all, it's a simple memory pointer we are talking about. But I was wrong, and I thought it's worth a post to share this with you.

Suppose we got the following code:

class A {
 public:
  A() { }
  ~A() { }
 private:
  int x, y;
};

...
A * a = new A[3];
...
delete a; // BUGBUG 

What really happens in this code? How do the constructors and destructors get called? How does the compiler know whether to destroy a single object, or how many objects to destroy, when you're finished working with the array?

Well, first of all, I have to warn you that some of it is implementation specific (per compiler). I’m going to focus on Visual Studio though.

The first thing to understand is that the compiler knows which objects to construct and destroy. All this information is available at compile time (in our case, since it's all static). If the type were polymorphic the destructor would be called dynamically, but I don't care about that case now…

To allocate memory for the objects, the compiler will eventually do malloc(3 * sizeof(A)) and assign the returned pointer to the variable 'a'. Here's the catch: the compiler can't know how many destructors to call when it later wants to delete the array, right? It has to bookkeep the count somehow. But where?
Therefore the call to the memory allocation has more to it. The way MSVC does it is as follows (some pseudo code):

int* tmpA = (int*)malloc(sizeof(A) * 3 + sizeof(int)); // Notice the extra space for the integer!
*tmpA = 3; // This is where it stores the count of instances! ta da
A* a = (A*)(tmpA + 1); // Skip that integer, really assigns the pointer allocated + 4 bytes in 'a'.

Now all the compiler has to do is call the constructor on each entry of the array; easy.
When you work with 'a' the implementation is hidden from you; eventually you get a pointer you should only use for accessing the array and nothing else.
At the end you're supposed to delete the array. The right way to delete this array is 'delete []a;'. The compiler then understands you're asking to delete a number of instances rather than a single instance, so it will loop over all the entries, call the destructor for each instance, and at last free the whole memory block. One of the questions I asked in the beginning was how the compiler would know how many objects to destroy. We already know the answer by now: simple, it stored the count just before the pointer you hold.

So deleting the array in a correct manner (and reading the counter) would be as easy as:

int* tmpA = (int*)a - 1; // Take the pointer the compiler handed you and step back to the stored 'count'.
for (int i = 0; i < *tmpA; i++) a[i].~A(); // Call the destructor of each instance (note: ~A, the class destructor).
free(tmpA); // Notice we call free() with the original pointer that was actually allocated!

So far so good, right? But the leak happens if you call scalar delete on the pointer you got from allocating a new array. And that's the problem: even if you have an array of primitive types (integers, chars, etc.) that don't require a destructor call, you can still leak memory. Why's that?
Because new[], as we saw in this implementation, returns a pointer that does not point to the beginning of the allocated block. When you eventually call scalar delete on it, the memory manager cannot find the start of the allocated block (you fed it an offset into the block), and then it has a few options: ignore your call and leak that block, crash your application, give you a notification (aka Debug mode), or maybe, in extreme cases, cause a security breach...

On some forum I read that there are many possible unexpected behaviors in our case; one of them made me laugh so hard that I feel I need to share it with you:
"A* a = new A[3]; delete a; Might get the first object destroyed and released, but keep the rest in memory".

Well, it doesn't take a genius to understand that the compiler prefers to bulk-allocate all the objects in one block... and yet, funny.
The point the guy tries to make is that you cannot know what the compiler's implementation is, and however weird it might be, don't ever rely on it. And I totally agree.

So in our case a leak happens in the following code:
(wrong:)

int* a = new int[100];
...
delete a;

The point is that when you use new[], you must call a corresponding delete [].
Beyond keeping your code readable and correct so it won't break, never trust the compiler's implementation details; just code it right in the first place.
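For the record, the correct pairings look like this:

int* a = new int[100];
// ... use a ...
delete [] a;   // array new is matched by array delete

A* p = new A;
// ... use p ...
delete p;      // scalar new is matched by scalar delete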

And now you can imagine what happens if you allocate a single object and try to delete[] it. Not healthy, to say the least.