Unbundling Databases
The "Database Inside-Out" philosophy treats a large-scale data system as one database whose internal components (commit log, query engine, indexes) have been "unbundled" into separate, distributed systems.
The Components
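- The Log (Kafka): Acts as the "Commit Log." All writes are appended here first, and the log fixes their order. In Kafka, that order is guaranteed within a partition, not across partitions.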
- The Processors (Flink/Spark): Act as the "Query Engine" or "Materialized View" maintainers. They consume the log and update derived stores.
- The Views (Elasticsearch, Redis, Postgres): Act as the "Indexes" optimized for specific queries (search, key-value, relational).
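The three roles above can be sketched in memory. This is a minimal illustration, not real Kafka, Flink, or Elasticsearch code: `Log`, `KeyValueView`, `SearchView`, and `run_processor` are hypothetical names invented for this example, and the log stands in for a single ordered partition.

```python
from collections import defaultdict


class Log:
    """Append-only, ordered list of events (stand-in for one log partition)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def read(self, from_offset=0):
        # Yield (offset, event) pairs in log order.
        for offset in range(from_offset, len(self._events)):
            yield offset, self._events[offset]


class KeyValueView:
    """Latest value per key (stand-in for a key-value store)."""

    def __init__(self):
        self.data = {}

    def apply(self, event):
        self.data[event["key"]] = event["value"]


class SearchView:
    """Inverted index from word to keys (stand-in for a search index)."""

    def __init__(self):
        self.index = defaultdict(set)
        self._words = {}  # key -> words currently indexed for that key

    def apply(self, event):
        key = event["key"]
        # Drop the postings of the previous version before indexing the new one.
        for word in self._words.get(key, ()):
            self.index[word].discard(key)
        words = set(event["value"].lower().split())
        for word in words:
            self.index[word].add(key)
        self._words[key] = words


def run_processor(log, view, from_offset=0):
    """Consume the log in order, fold each event into the view, return the next offset."""
    next_offset = from_offset
    for offset, event in log.read(from_offset):
        view.apply(event)
        next_offset = offset + 1
    return next_offset


log = Log()
log.append({"key": "doc1", "value": "Unbundled databases"})
log.append({"key": "doc2", "value": "Stream processing"})
log.append({"key": "doc1", "value": "Unbundled stream databases"})

kv, search = KeyValueView(), SearchView()
run_processor(log, kv)
run_processor(log, search)
```

Both views are derived from the same log but answer different questions: `kv.data` serves lookups by key, while `search.index` serves lookups by word.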
Why Unbundle?
- Flexibility: You can use the best tool for each query type.
- Scalability: You can scale the ingestion (log), processing, and serving layers independently.
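- Correctness: Every derived view applies the same writes in the same log order, so the views converge to states that are consistent with each other. A view may lag behind the log at any given moment, but it cannot apply writes in a conflicting order.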
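The correctness point can be illustrated with a small sketch, assuming a simple event shape with `set` and `delete` operations. The `apply_in_order` helper and the event fields are hypothetical, chosen only for this example.

```python
def apply_in_order(events, state=None, from_offset=0):
    """Fold events[from_offset:] into a dict and return (state, next_offset)."""
    state = dict(state or {})
    for event in events[from_offset:]:
        if event["op"] == "set":
            state[event["key"]] = event["value"]
        elif event["op"] == "delete":
            state.pop(event["key"], None)
    return state, len(events)


log = [
    {"op": "set", "key": "user:1", "value": "alice"},
    {"op": "set", "key": "user:2", "value": "bob"},
    {"op": "delete", "key": "user:1"},
    {"op": "set", "key": "user:1", "value": "alicia"},
]

# View A consumes the whole log in one pass.
view_a, _ = apply_in_order(log)

# View B lags: it sees only the first two events, then resumes from its saved offset.
view_b, offset_b = apply_in_order(log[:2])
view_b, offset_b = apply_in_order(log, view_b, from_offset=offset_b)

# A consumer that ignored log order (delete applied last) ends in a different state.
reordered = [log[0], log[1], log[3], log[2]]
view_wrong, _ = apply_in_order(reordered)
```

Here `view_a` and `view_b` end up identical even though B was temporarily behind, while `view_wrong` loses `user:1` because it applied the delete after the final write.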
Knowledge Check
In the 'Database Inside-Out' model, what takes the place of the traditional database commit log?
A SQL Table
A Message Broker like Apache Kafka
A JSON file on S3