B-Trees are balanced, multi-way search trees designed to optimize reads on block-based storage systems. Compared to Log Structured Merge Tree, BTrees do not delay and batch writes, resulting in higher update cost. However, while reading in LSM trees might require multiple SSTables to be scanned unless compaction has occurred, in B-trees this is never the case since the tree is always updated and balanced. With B+ tree, range queries are particularly efficient

Info

“Multi-way search tree” means that each node in the tree can have more than two children, unlike a binary tree where each node is restricted to at most two children (left and right).

B-Trees were introduced in 1972 by Rudolf Bayer and Edward McCreight in their paper “Organization and Maintenance of Large Ordered Indexes”. The goal was to create a tree structure optimized for secondary storage (e.g., magnetic disks) by minimizing the number of I/O operations required to access large datasets.Here’s the updated note based on your requirements, focusing on the structure of B-Trees, the emergence and adoption of B+ Trees, and a section on B Trees* and B^e Trees.

Original B-Trees

In the original design, internal nodes of the tree store both keys and associated values, enabling searches to terminate at any level of the tree, not just at the leaf nodes. Internal nodes also contain child pointers, which partition the key space into ranges. If a key is found in an internal node during a search, the value is returned directly, without needing to traverse further.

Tip

Since nodes have a fixed size (often aligned with the size of a disk block), storing values within each node reduce the number of keys significantly. Example:

  • With no values, 4kb could become 4kb/8 = 512 keys
  • With values (i.e. 24 bytes) stored, each node can only fit 4kb/32 = 128 values

The consequence of this limitation was an increased tree height

B+ Tree Optimization The B+ Tree emerged as a refinement of the original B-Tree, optimizing it for range queries and large datasets. In a B+ Tree, internal nodes act purely as routing points and store only keys. All key-value pairs are stored exclusively in the leaf nodes, which are linked together sequentially. This separation of data and structure increases the fan-out of internal nodes, reducing the tree’s height and improving storage efficiency.

By linking leaf nodes, the B+ Tree facilitates fast range queries, allowing sequential scans of key-value pairs without backtracking through the internal structure. These changes made B+ Trees the preferred choice for modern databases and filesystems. By the late 1970s, B+ Trees became the standard for most database and filesystem designs. Systems that adopted B+ Trees include:

  • Relational Databases: MySQL, PostgreSQL, Oracle, and DB2 all use B+ Trees for their indexing structures.
  • Filesystems: NTFS, HFS+, and ext4 rely on B+ Trees to manage metadata and directory structures.
  • Embedded Databases: SQLite uses B+ Trees for its indexing and storage layers.

B* Trees and B^e Trees

B* Trees

B* Trees are a variant of B-Trees designed to improve space utilization. Instead of splitting a node immediately when it overflows, B* Trees first attempt to redistribute keys among sibling nodes and only when redistribution is not possible does the tree perform a split.

B* Trees are rarely used in modern systems. Their added complexity offers limited practical benefits compared to B+ Trees. They are sometimes seen in embedded systems or environments where disk space is at a premium.


B^e Trees

B^e Trees are optimized for systems with high write latency, such as distributed databases or networked storage. They minimize disk writes during updates by deferring changes and batching them .

B^e Trees are niche and are primarily used in distributed systems like Ceph, a highly scalable distributed storage system. Their focus on minimizing write latency makes them suitable for environments where writes dominate workloads and efficiency is critical.