Thinking About Data Systems
Many applications today are data-intensive, as opposed to compute-intensive. Raw CPU power is rarely the limiting factor for these applications. The bigger problems are usually the amount of data, its complexity, and the speed at which it changes.
A data-intensive application is typically built from standard building blocks that provide commonly needed functionality. For example:
- Databases: Store data so that the same application, or another one, can find it again later.
- Caches: Remember the result of an expensive operation, to speed up reads.
- Search Indexes: Allow users to search data by keyword or filter it in various ways.
- Stream Processing: Send a message to another process, to be handled asynchronously.
- Batch Processing: Periodically crunch a large amount of accumulated data.
Why "Data Systems"?
We typically think of databases, queues, caches, etc., as being very different categories of tools. However, the boundaries between these categories are becoming blurred. For example:
- Datastores like Redis are also used as message queues.
- Message queues like Kafka have database-like durability guarantees.
When you combine several tools to provide a service, the service's interface or API usually hides those implementation details from clients. You have essentially created a new, special-purpose data system from smaller, general-purpose components.
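The idea of a composite data system can be sketched as follows. This is an illustration under stated assumptions, not a definitive design: `UserProfileService` is a hypothetical name, and plain dictionaries stand in for a real database and a real cache. The point is that callers only see `get` and `put`, while the code behind that interface keeps the two components consistent.

```python
from typing import Dict, Optional


class UserProfileService:
    """Hypothetical service that hides a database and a cache behind one API.

    Plain dicts stand in for the real components (assumption for illustration).
    """

    def __init__(self) -> None:
        self._database: Dict[str, str] = {}  # stand-in for the primary store
        self._cache: Dict[str, str] = {}     # stand-in for a separate cache
        self.database_reads = 0              # lets us observe cache hits

    def put(self, user_id: str, profile: str) -> None:
        # Write to the primary store, then invalidate the cached copy so
        # that later reads do not return stale data.
        self._database[user_id] = profile
        self._cache.pop(user_id, None)

    def get(self, user_id: str) -> Optional[str]:
        # Serve from the cache when possible.
        if user_id in self._cache:
            return self._cache[user_id]
        # Otherwise fall back to the database and remember the result.
        self.database_reads += 1
        profile = self._database.get(user_id)
        if profile is not None:
            self._cache[user_id] = profile
        return profile


service = UserProfileService()
service.put("42", "Ada")
first_read = service.get("42")   # cache miss: reads the database
second_read = service.get("42")  # cache hit: database is not touched
```

Clients of `service` never learn that a cache exists. The invalidation on write is application code, which is why the combined service behaves like a new, special-purpose data system with its own guarantees to uphold.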
The Three Core Concerns
In this course, we focus on three concerns that are important in most software systems:
- Reliability: The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error).
- Scalability: As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.
- Maintainability: Over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively.