HTree


An HTree is a specialized tree data structure for directory indexing, similar to a B-tree. They are constant depth of either one or two levels, have a high fanout factor, use a hash of the filename, and do not require balancing. The HTree algorithm is distinguished from standard B-tree methods by its treatment of hash collisions, which may overflow across multiple leaf and index blocks. HTree indexes are used in the ext3 and ext4 Linux filesystems, and were incorporated into the Linux kernel around 2.5.40. HTree indexing improved the scalability of Linux ext2 based filesystems from a practical limit of a few thousand files, into the range of tens of millions of files per directory.

History

The HTree index data structure and algorithm were developed by Daniel Phillips in 2000 and implemented for the ext2 filesystem in February 2001. A port to the ext3 filesystem by Christopher Li and Andrew Morton in 2002 during the 2.5 kernel series added journal based crash consistency. With minor improvements, HTree continues to be used in ext4 in the Linux 3.x.x kernel series.

Use

PHTree is a derivation intended as a successor. It fixes all the known issues with HTree except for write multiplication. It is used in the Tux3 filesystem.