Data Structures That Power Your Database

LSM-Trees (Log-Structured Merge-Trees)

LSM-Trees turn random writes into sequential writes by using a MemTable (in-memory tree) and SSTables (Sorted String Tables on disk).

Performance Optimizations

Bloom Filters: A memory-efficient way to approximate the contents of a set. It saves unnecessary disk reads by telling you if a key does not exist in the database.
Compaction Strategies:
- Size-Tiered: Newer and smaller SSTables are merged into older and larger ones.
- Leveled: The key range is split into smaller SSTables, and older data is moved into separate "levels."

B-Trees

The standard index implementation in most relational databases. They break the database into fixed-size pages (usually 4KB) and overwrite pages in-place.

Reliability

Write-Ahead Log (WAL): An append-only file where every B-tree modification is written before it is applied to the pages. This allows recovery after a crash.
Copy-on-Write: Instead of overwriting, write a modified page to a new location and update parent pointers.

Knowledge Check

What is the primary purpose of a Bloom Filter in an LSM-Tree?

To sort keys in memory.

To avoid unnecessary disk reads for non-existent keys.

To compress SSTable files.