Data Structures That Power Your Database
LSM-Trees (Log-Structured Merge-Trees)
LSM-Trees turn random writes into sequential writes by using a MemTable (in-memory tree) and SSTables (Sorted String Tables on disk).
Performance Optimizations
- Bloom Filters: A memory-efficient way to approximate the contents of a set. It saves unnecessary disk reads by telling you if a key does not exist in the database.
- Compaction Strategies:
- Size-Tiered: Newer and smaller SSTables are merged into older and larger ones.
- Leveled: The key range is split into smaller SSTables, and older data is moved into separate "levels."
B-Trees
The standard index implementation in most relational databases. They break the database into fixed-size pages (usually 4KB) and overwrite pages in-place.
Reliability
- Write-Ahead Log (WAL): An append-only file where every B-tree modification is written before it is applied to the pages. This allows recovery after a crash.
- Copy-on-Write: Instead of overwriting, write a modified page to a new location and update parent pointers.
Knowledge Check
What is the primary purpose of a Bloom Filter in an LSM-Tree?
To sort keys in memory.
To avoid unnecessary disk reads for non-existent keys.
To compress SSTable files.