Archive for June, 2007

Recursive Data-Structures

Saturday, June 2nd, 2007

For the sake of simplicity a linked list structure is a recursive one, it points to itself.

struct Node {
 int data;


 Node* next;
};

This case is easy to follow because no matter what, you always end up with accessing a Node structure.

The problem arises when you have recursive data-structures and terminals. I treat a terminal as a data structure which stops the recursion of the recursive data-structure. Unlike the above linked list structure, which stops when the ‘next’ pointer is NULL, the ‘next’ in the complex data-structure might point to either itself or a terminal.

 struct TerminalNode {

 int data;

};

 struct Node {

 int data;

 Node* next;

};

Now you might find yourselves asking why would one make the ‘next’ point to a TerminalNode, but that’s a common data-structure layout. For example, an expression tree can represent arithmetic operations in a recursive way, but the values are the terminals – Add(5, Sub(3, 2)).

In a structured layout, it looks something like this (where children is an array of pointers to ‘next’ nodes):

0: ExpressionNode: (type) Add, (children) 1, 2

1: ExpressionNode: (type) Value 5, (children) NULL

2: ExpressionNode: (type) Sub, (children) 3, 4

3: ExpressionNode: (type) Value 3, (children) NULL

4: ExpressionNode: (type) Value 2, (children) NULL

So you can see that in this case an ExpressionNode of ‘Value’ type is a TerminalNode whereas the ExpressionNode is a recursive one.

This type of data-structure is not rare: To be honest, I have this situation in both diStorm64 and diStorm3. In diStorm64 the TerminalNode is actually the Instruction-Info structure which describes the instruction, its mnemonic, its operand types, etc… and in diStorm3, it’s the above ExpressionNode thingy, but much more complex, of course…

Everything is good and nice, but finally I can get to my point in this topic – How do you distinguish between a recursive node and a terminal node? That is, how do you know where the “tree” terminals?

So there are many solutions to this one, I chose to cover two of them. The simplest one is to change your structures by adding another flag which states, “I’m a terminal” or “I’m a recursive node”. But then, it means you have to pay at least another extra byte for both structures. And then examining the flag only after you accessed its structure. Not mentioning, it takes more memory space.

The other solution, which seems to be more common, is to set the LSB (least-significant-bit) in the ‘next’ pointer, thus, if the bit is set, you know you reached a terminal node, even before you accessed it. And seeing the LSB is cleared, you know you continue normally with a recursive node. This is also good for caching, because the structures are compact.

But what happens, if you have 4 types of terminal structures? Well, in a 32 bits environment all structures are (usually/99%) aligned to a 4 bytes boundary, so it means you can easily use the two LSBs, which represent 4 different terminals…

So accessing a single terminal structure is as simple as:

if (pNode->next & 1) {

 pTerminal = (TerminalNode*)(pNode & -1);

 // Use pTerminal here…

} else {

 pNode &= -1;

 // Continue with another node…

}

You should have noticed by now that a terminal structure doesn’t have a ‘next’ pointer. So a NULL is out of question for terminals, but it’s still OK for recursive nodes, which means “dead-end”, rather than a terminal to mark something bad happened…