T-tree


In computer science a T-tree is a type of binary tree
data structure that is used by main-memory databases, such as
Datablitz, EXtremeDB, MySQL Cluster, Oracle TimesTen and MobileLite.
A T-tree is a balanced index tree data structure optimized for cases
where both the index and the actual data are fully kept in memory,
just as a B-tree is an index structure optimized for storage on block
oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits
of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which
is common to them.
T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields.
The 'T' in T-tree refers to the shape of the node data structures
in the original paper which first described this type of index.

Node structures

A T-tree node usually consists of pointers to the parent node, the left and right child node,
an ordered array of data pointers and some extra control data. Nodes with two subtrees
are called internal nodes, nodes without subtrees are called leaf nodes
and nodes with only one subtree are named half-leaf nodes.
A node is called the bounding node for a value if the value is between the node's current minimum and maximum value, inclusively.
For each internal node, leaf or half leaf nodes exist that contain the predecessor of its smallest
data value and one that contains the successor of its largest
data value. Leaf and half-leaf nodes can contain any number of
data elements from one to the maximum size of the data array. Internal nodes keep their occupancy
between predefined minimum and maximum numbers of elements

Algorithms

Search

If a new node was added then the tree might need to be rebalanced, as described below.

Deletion

Now we have to distinguish by node type:
If the node's data array now has less than the minimum number of elements then move the greatest lower bound value of this node to its data value. Proceed with one of the following two steps for the half leaf or leaf node the value was removed from.
If this was the only element in the data array then delete the node. Rebalance the tree if needed.
If the node's data array can be merged with its leaf's data array without overflow then do so and remove the leaf node. Rebalance the tree if needed.

Rotation and balancing

A T-tree is implemented on top of an underlying self-balancing binary search tree.
Specifically, Lehman and Carey's article describes a T-tree balanced like an AVL tree: it becomes out of balance when a node's child trees differ in height by at least two levels.
This can happen after an insertion or deletion of a node.
After an insertion or deletion, the tree is scanned from the leaf to the root.
If an imbalance is found, one tree rotation or pair of rotations is performed, which is guaranteed to balance the whole tree.
When the rotation results in an internal node having fewer than the minimum number of items, items from the node's new child are moved into the internal node.

Performance

Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware.
The main reason seems to be that the traditional assumption of memory references having uniform cost is no longer valid given the current speed gap between cache access and main memory access.
The traditional T-tree can be modified for improved caching behavior, to create a Cache Sensitive T-Tree which substantially outperforms B+ trees and even modified, cache-sensitive B+ trees.

Other trees