Recursive Data-Structures

For the sake of simplicity a linked list structure is a recursive one, it points to itself.

struct Node {
 int data;


 Node* next;
};

This case is easy to follow because no matter what, you always end up with accessing a Node structure.

The problem arises when you have recursive data-structures and terminals. I treat a terminal as a data structure which stops the recursion of the recursive data-structure. Unlike the above linked list structure, which stops when the ‘next’ pointer is NULL, the ‘next’ in the complex data-structure might point to either itself or a terminal.

 struct TerminalNode {

 int data;

};

 struct Node {

 int data;

 Node* next;

};

Now you might find yourselves asking why would one make the ‘next’ point to a TerminalNode, but that’s a common data-structure layout. For example, an expression tree can represent arithmetic operations in a recursive way, but the values are the terminals – Add(5, Sub(3, 2)).

In a structured layout, it looks something like this (where children is an array of pointers to ‘next’ nodes):

0: ExpressionNode: (type) Add, (children) 1, 2

1: ExpressionNode: (type) Value 5, (children) NULL

2: ExpressionNode: (type) Sub, (children) 3, 4

3: ExpressionNode: (type) Value 3, (children) NULL

4: ExpressionNode: (type) Value 2, (children) NULL

So you can see that in this case an ExpressionNode of ‘Value’ type is a TerminalNode whereas the ExpressionNode is a recursive one.

This type of data-structure is not rare: To be honest, I have this situation in both diStorm64 and diStorm3. In diStorm64 the TerminalNode is actually the Instruction-Info structure which describes the instruction, its mnemonic, its operand types, etc… and in diStorm3, it’s the above ExpressionNode thingy, but much more complex, of course…

Everything is good and nice, but finally I can get to my point in this topic – How do you distinguish between a recursive node and a terminal node? That is, how do you know where the “tree” terminals?

So there are many solutions to this one, I chose to cover two of them. The simplest one is to change your structures by adding another flag which states, “I’m a terminal” or “I’m a recursive node”. But then, it means you have to pay at least another extra byte for both structures. And then examining the flag only after you accessed its structure. Not mentioning, it takes more memory space.

The other solution, which seems to be more common, is to set the LSB (least-significant-bit) in the ‘next’ pointer, thus, if the bit is set, you know you reached a terminal node, even before you accessed it. And seeing the LSB is cleared, you know you continue normally with a recursive node. This is also good for caching, because the structures are compact.

But what happens, if you have 4 types of terminal structures? Well, in a 32 bits environment all structures are (usually/99%) aligned to a 4 bytes boundary, so it means you can easily use the two LSBs, which represent 4 different terminals…

So accessing a single terminal structure is as simple as:

if (pNode->next & 1) {

 pTerminal = (TerminalNode*)(pNode & -1);

 // Use pTerminal here…

} else {

 pNode &= -1;

 // Continue with another node…

}

You should have noticed by now that a terminal structure doesn’t have a ‘next’ pointer. So a NULL is out of question for terminals, but it’s still OK for recursive nodes, which means “dead-end”, rather than a terminal to mark something bad happened…

2 Responses to “Recursive Data-Structures”

  1. lorg says:

    You should note that you don’t always need to signal a terminal. Sometimes it is just enough to have a node with no children.
    I think that the solution doesn’t have to be so tightly coupled with the underlying representation, (the LSB\MSB stuff) although I know you like the low-level.

    My preference is a having a single class, without a special terminal class – the typecasting isn’t really a good habit :)
    A possible alternative might be to have the terminal class and the regular class both part of the same class hierarchy (for example have the terminal inherit from the regular class) and then use dynamic_cast, but I don’t think thats a pretty solution either…

  2. arkon says:

    Of course the solution doesn’t have to be tightly coupled with the underlying implementation. But speaking with more people who have this kind of data structures, I understood that sometimes it is performance critical and you want to use as less as possible memory space for the structures.

    As for casting you can use unions as well, but they are problematic with statis initializations, unless you can use Designated Initializers (that is, specifying the exact field to init).

    Anyways, low level or not, I don’t have to tell you what way _we_ use in diStorm3 ;) (ohh, that’s lorg’s latter idea)

Leave a Reply