Unbundling Databases

Composing storage technologies.

The "Database Inside-Out" philosophy treats a large-scale data system as a single database whose internal components (commit log, query engine, indexes) are "unbundled" into separate, distributed services.

The Components

  • The Log (Kafka): Acts as the "Commit Log." Every write is appended here first, which fixes its order (in Kafka, ordering is guaranteed within a partition).
  • The Processors (Flink/Spark): Act as the "Query Engine" or "Materialized View" maintainers. They consume the log and update derived stores.
  • The Views (Elasticsearch, Redis, Postgres): Act as the "Indexes" optimized for specific queries (search, key-value, relational).
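The three roles above can be sketched in a few lines of plain Python. This is only an in-memory illustration, not how Kafka, Flink, or Elasticsearch are actually used: the names Log, KeyValueView, SearchView, and run_processor are hypothetical, and the list stands in for a single log partition.

```python
from collections import defaultdict


class Log:
    """Append-only list standing in for a single log partition (hypothetical)."""

    def __init__(self):
        self._entries = []

    def append(self, record):
        self._entries.append(record)
        return len(self._entries) - 1  # offset of the record just written

    def read_from(self, offset):
        # Return (offset, record) pairs starting at the given offset.
        return list(enumerate(self._entries))[offset:]


class KeyValueView:
    """Derived store answering 'what is the latest text for this id?'."""

    def __init__(self):
        self.data = {}

    def apply(self, record):
        self.data[record["id"]] = record["text"]


class SearchView:
    """Derived store answering 'which ids contain this word?'."""

    def __init__(self):
        self.index = defaultdict(set)
        self._words = {}  # id -> words currently indexed for that id

    def apply(self, record):
        doc_id = record["id"]
        # Remove stale entries left by an earlier version of the document.
        for word in self._words.get(doc_id, ()):
            self.index[word].discard(doc_id)
        words = set(record["text"].lower().split())
        for word in words:
            self.index[word].add(doc_id)
        self._words[doc_id] = words


def run_processor(log, view, offset=0):
    """Consume the log in order and keep one view up to date."""
    for position, record in log.read_from(offset):
        view.apply(record)
        offset = position + 1
    return offset  # next offset to read, so the caller can resume later


log = Log()
log.append({"id": 1, "text": "unbundled database"})
log.append({"id": 2, "text": "commit log"})
log.append({"id": 1, "text": "unbundled log"})

kv = KeyValueView()
search = SearchView()
kv_offset = run_processor(log, kv)
search_offset = run_processor(log, search)
```

Writers only ever touch the log. Each view is a disposable projection that can be rebuilt by running the processor again from offset 0.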

Why Unbundle?

  1. Flexibility: You can use the best tool for each query type.
  2. Scalability: You can scale the ingestion (log), processing, and serving layers independently.
  3. Correctness: Because every view applies the same writes in the same log order, the derived views converge to a consistent state once they have caught up, even if one temporarily lags behind another.
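A small sketch of the correctness argument, under the assumption that the log is a single ordered sequence and each consumer tracks its own offset. The View class and the "set"/"delete" record shape are hypothetical, chosen only for illustration.

```python
# The log: an ordered sequence of writes. Order is the only source of truth.
log = [
    ("set", "alice", 10),
    ("set", "bob", 5),
    ("set", "alice", 7),
    ("delete", "bob", None),
]


class View:
    """A derived store that remembers how far into the log it has read."""

    def __init__(self):
        self.offset = 0
        self.state = {}

    def catch_up(self, log, limit=None):
        # Apply records in log order, optionally stopping early to simulate lag.
        end = len(log) if limit is None else min(limit, len(log))
        for op, key, value in log[self.offset:end]:
            if op == "set":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)
        self.offset = max(self.offset, end)


fast, slow = View(), View()
fast.catch_up(log)            # fully caught up
slow.catch_up(log, limit=2)   # lagging: has seen only the first two writes
lagging_snapshot = dict(slow.state)

slow.catch_up(log)            # resumes from its own offset

# A view added later is built by replaying the log from the beginning.
late = View()
late.catch_up(log)
```

While slow lags, it disagrees with fast. Once both reach the end of the log they hold identical state, and so does a view created after the fact. This is also why the serving layer can be scaled or replaced independently of ingestion.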

Knowledge Check

In the 'Database Inside-Out' model, what takes the place of the traditional database commit log?

A SQL Table
A Message Broker like Apache Kafka
A JSON file on S3