
Memory Management

Tuesday, December 7th, 2010

In this post I decided to write about a few things you have to keep in mind regarding memory management while writing or designing a big application. I will also try to give you a new way of looking at memory allocations.

The problem is that one day you wake up and you have either memory leaks or memory fragmentation, and you don’t know which it is or why. To eliminate the possibility of memory fragmentation, you will have to rule out memory leaks first.

Trying to find a solution to such a case, you discover that your dozens of DLLs only work normally because they all share the same CRT: if you pass an object allocated in one DLL to be used and freed by another DLL that uses a different CRT, you’re screwed, so you can’t simply separate the CRTs either. The root of the problem is that each CRT has its own heap; if you statically link a DLL with its own CRT, it gets its own heap, while if all DLLs use the same CRT, they share the same heap, or even the application’s global heap. Why is that bad? Because then a memory leak in one DLL dirties the whole global heap, as I will show soon. An advantage of separating heaps is that each thread that belongs to some DLL can access its own heap faster, without contending on the global heap lock which all DLLs/threads need.
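To make the hazard concrete, here is a minimal sketch (the module and function names are hypothetical) of the kind of cross-module free that blows up when each module statically links its own CRT:

```cpp
// dll_a.cpp -- statically linked against its own CRT, so malloc() here
// takes memory from DLL A's private CRT heap.
#include <cstddef>
#include <cstdlib>

extern "C" __declspec(dllexport) char* DllA_MakeBuffer(size_t n)
{
    return static_cast<char*>(std::malloc(n));
}

// app.cpp -- the EXE links its own, different CRT instance.
//   char* p = DllA_MakeBuffer(64);
//   std::free(p);  // WRONG: releases the block on the EXE's CRT heap,
//                  // not DLL A's heap -> heap corruption / undefined behavior.
// The safe pattern is to export a matching DllA_FreeBuffer() from DLL A.
```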
The idea is to separate the application into logical groups of components, where each logical group uses its own heap, so within a group you can still pass pointers around freely. This is really hard to retrofit if you have existing code and never thought about this issue. So you’re saying, “oh yeah, but if I don’t have leaks, and my code is doing alright, I don’t need it anyway”. Well, I wouldn’t be so quick about thinking that. Keep on reading.

In C++, for instance, overriding new and delete for your class is really useful, because even if an instance gets deleted in a different component, C++ is responsible for dispatching the “deletion” to your operator delete, and thus to the appropriate heap.
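As a rough sketch of what that can look like (the class name and the MyClassHeap() helper are made up, and error handling is minimal), the class-level operators can route all allocations of the class to a dedicated Win32 heap:

```cpp
#include <windows.h>
#include <new>

// One private heap shared by all instances of this class (hypothetical helper).
static HANDLE MyClassHeap()
{
    static HANDLE heap = HeapCreate(0, 0, 0);   // growable heap, default sizes
    return heap;
}

class SharedObject
{
public:
    // In a real DLL you would define these out-of-line in the owning module,
    // so every component ends up calling the same definitions.
    static void* operator new(size_t size)
    {
        void* p = HeapAlloc(MyClassHeap(), 0, size);
        if (p == NULL)
            throw std::bad_alloc();
        return p;
    }

    static void operator delete(void* p)
    {
        if (p != NULL)
            HeapFree(MyClassHeap(), 0, p);
    }

    // ... actual members ...
};
```

Now `delete obj;` in any component returns the memory to this class’s heap, not to whichever heap happens to be the caller’s default.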

Now I want to go back a bit to some background on how memory allocators work in general, which is usually in a naive fashion. It really means that when you ask for a block of memory, the memory manager (malloc for you) carves out a chunk and returns a pointer to you. When the next request for memory comes in (and supposing the first chunk is still busy), the next memory address is used to satisfy it, and so on. What you can see here is that the memory allocator works linearly (it allocates one block after the other) and chronologically (the layout depends on the order in which you asked for the allocations).
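A toy bump allocator makes this linear, chronological behavior easy to see. This is only an illustration of the idea, not how a production malloc is implemented (and it ignores alignment and freeing for brevity):

```cpp
#include <cstddef>
#include <cstdint>

// Each request is carved out right after the previous one, in request order.
class BumpAllocator
{
    uint8_t* base_;
    size_t   size_;
    size_t   offset_ = 0;

public:
    BumpAllocator(void* buffer, size_t size)
        : base_(static_cast<uint8_t*>(buffer)), size_(size) {}

    void* Allocate(size_t bytes)
    {
        if (bytes > size_ - offset_)
            return nullptr;              // arena exhausted
        void* p = base_ + offset_;       // next address, right after the last block
        offset_ += bytes;
        return p;
    }
};
```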

Let me give you an example so you understand what I’m talking about. In this example there are two threads: Thread A and Thread B.
Thread A allocates chunks of 0.5kb in a loop, and at some point shuts down after it’s done with its task.
Thread B allocates chunks of 0.5kb in a loop, simultaneously, and keeps on working.
And suppose they are using the same CRT, thus the same heap.

What we will see from the memory manager perspective is something like:
<0.5kb busy> <0.5kb busy> <0.5kb busy>…………………….and then something like ….

Each chunk might belong to either Thread A or Thread B. Don’t be picky about it, it’s really up to so many environmental issues (processors, locks, etc).

Suppose Thread A is finished with its memory because it was only meant to do some initialization job(!) and decides to free everything it has allocated. The memory picture would be something like this now:
<0.5kb free><0.5kb busy><0.5kb free><0.5kb busy> …………………lots of <0.5kb busy belongs to ThreadB> and then ….

What we have now is a hole-y memory map: some blocks are free, the others are busy. Now if we want to allocate another 0.5kb, no problem, we can use the first free block, it will match perfectly, so everybody’s happy so far. As long as each block we want to allocate in such a memory state is smaller than or equal to 0.5kb, we will get it for sure, right?
But what happens if we ask to allocate a block of 10kb? Ah, now we have to scan the whole memory to find a free chunk whose size is at least 10kb, right? Can you see what’s wrong here?
What is really wrong is that you have a total of X free kbytes, which could surely contain 10kb, but they are all chunked up, so the memory allocator has to find a single contiguous block that can satisfy such a request, and it so happens that all these free kbytes are useless to us.
In short, this example is a classic memory fragmentation scenario. To check how bad memory fragmentation is, you measure it by the total free memory versus the largest free block size. This way, if your free memory is scattered in tiny chunks across 200mb and the biggest free block is 2 megs, it means you can’t even allocate a 3mb block while the total free size is 198mb. Memory is cruel.
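On Windows you can get both numbers for a given heap by walking it. A hedged sketch (assuming a valid heap handle; HeapWalk, HeapLock and HeapUnlock are the Win32 APIs used here):

```cpp
#include <windows.h>
#include <cstdio>

void ReportFragmentation(HANDLE heap)
{
    SIZE_T totalFree = 0, largestFree = 0;
    PROCESS_HEAP_ENTRY entry = {};
    entry.lpData = NULL;                 // start the walk from the beginning

    HeapLock(heap);                      // keep other threads out while walking
    while (HeapWalk(heap, &entry)) {
        // Entries that are neither busy blocks nor region/uncommitted markers
        // are committed free blocks.
        if (!(entry.wFlags & (PROCESS_HEAP_ENTRY_BUSY |
                              PROCESS_HEAP_REGION |
                              PROCESS_HEAP_UNCOMMITTED_RANGE))) {
            totalFree += entry.cbData;
            if (entry.cbData > largestFree)
                largestFree = entry.cbData;
        }
    }
    HeapUnlock(heap);

    printf("total free: %zu bytes, largest free block: %zu bytes\n",
           static_cast<size_t>(totalFree), static_cast<size_t>(largestFree));
}
```

The bigger the gap between the two numbers, the worse the fragmentation.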

This brings me to say that each memory-consumer has a specific behavior. A memory-consumer like Thread A, which did some initialization job, then shut down and freed all its blocks, is an init-phase memory user.
A memory-consumer like Thread B, which keeps working and using its allocated memory after Thread A is done, is an application-lifetime memory user.
Basically, each memory-consumer can be classified by some behavior! Once you mix those behaviors on the same heap, you’re screwed sooner or later.
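One way to make that classification explicit in code is to route every allocation through its behavior class, each backed by its own heap. A minimal sketch, assuming Win32 private heaps and made-up names:

```cpp
#include <windows.h>
#include <cstddef>

enum class MemBehavior { InitOnly = 0, AppLifetime, Count };

static HANDLE g_heaps[static_cast<size_t>(MemBehavior::Count)];

void InitHeaps()
{
    for (HANDLE& h : g_heaps)
        h = HeapCreate(0, 0, 0);         // one growable private heap per behavior
}

void* Alloc(MemBehavior behavior, size_t bytes)
{
    return HeapAlloc(g_heaps[static_cast<size_t>(behavior)], 0, bytes);
}

void Free(MemBehavior behavior, void* p)
{
    HeapFree(g_heaps[static_cast<size_t>(behavior)], 0, p);
}
```

This way an init-phase consumer can never punch holes in the heap the application-lifetime objects live on.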

The rule of thumb I would suggest you follow is:
Distinguish between the memory-consumer types. If you know something consumes memory and really caches some objects, make sure it does its work as early as possible in your application, and only then allocate your application-lifetime objects. That way the cache’s memory is allocated first and doesn’t cause fragmentation for consumers with different behaviors.

If you know you have to cache some objects while other application-lifetime objects have to be allocated from the same heap at the same time(!), this is where you should see the red light and create a separate heap for each behavior.
A practical example: you open an application that has a workspace, and the application is designed to initialize its UI only the first time you open a workspace. Therefore, while you’re loading and parsing the data from the file into memory, you’re also setting up the UI and doing lots of allocations; the UI is going to stay there until you shut down the application, but the data itself is going to be de-allocated once you close the workspace. This will create fragmentation if you don’t split the heaps. Can you see why? Think of loading and unloading workspaces and what happens to the memory. Don’t forget that each time you open a workspace, the chunks that need to be allocated have different sizes, and that the UI still uses its memory, which is probably fragmented by now, so it only gets worse. Another alternative would be to make sure the UI allocations happen first, no matter what; only then can you allocate and de-allocate lots of memory blocks without worrying about fragmentation.
Even if you got the UI allocations done first, there still might be a problem. Suppose you load the file’s data into memory and some logic keeps part of that data cached for the application’s lifetime, even after the workspace is closed. This is a big no-no; it will leave you with fragmentation again, hence you should split your heap again.
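To sketch the workspace case concretely (the Workspace type and helper functions are hypothetical): give each workspace its own private heap, allocate all its data from that heap, and throw the whole heap away when the workspace closes. That bulk release can’t fragment the UI/application-lifetime heap at all:

```cpp
#include <windows.h>
#include <cstddef>

struct Workspace
{
    HANDLE heap;     // private heap that owns everything this workspace loads
    // ... pointers into blocks allocated from 'heap' ...
};

Workspace* OpenWorkspace()
{
    Workspace* ws = new Workspace();
    ws->heap = HeapCreate(0, 0, 0);      // growable private heap
    // While parsing the file, every data chunk comes from the workspace heap:
    //   void* chunk = HeapAlloc(ws->heap, 0, size);
    return ws;
}

void CloseWorkspace(Workspace* ws)
{
    // No need to free the data block by block; destroying the heap releases
    // everything at once and leaves the other heaps untouched.
    HeapDestroy(ws->heap);
    delete ws;
}
```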

That’s it mostly, I hope I managed to make it clear.
I didn’t want to talk about the implementation of the heaps, whether it’s the low-fragmentation heap (LFH) or whatever; if you mix your memory-consumers, nothing will save you.

One note to mention: if your cached memory leaks, for whatever reason, then with the new strategy and a standalone heap you won’t hurt anybody else with your leaks, and it will even be easier to track leaks per heap rather than across dozens of DLLs sharing the same global heap. And if you already have a big application with a jungle of one heap and various memory-consumer types, there are some techniques to save the day…