Leaders and Followers
Replication means keeping a copy of the same data on multiple machines. The main reasons for this are High Availability (tolerating faults), Latency (placing data closer to users), and Scalability (handling more read load).
The most common way to handle replication is Leader-Based Replication (also known as Master-Slave).
How it works
- One replica is designated the Leader. All writes must go to the leader.
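- The other replicas are Followers (read replicas). Whenever the leader writes new data locally, it also sends the change to every follower as part of a replication log.
- Each follower applies the writes in the same order in which the leader processed them.
- Reads can be served by the leader or any follower. With asynchronous replication, a follower may lag behind and return slightly stale data.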
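The flow above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular database implements it: the Replica and Leader classes, the (sequence, key, value) log entry shape, and the synchronous loop over followers are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A node holding a key-value copy of the data plus the log it has applied."""
    name: str
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def apply(self, entry):
        # Entries must be applied strictly in log order, with no gaps.
        seq, key, value = entry
        if seq != len(self.log):
            raise ValueError(f"{self.name}: expected entry {len(self.log)}, got {seq}")
        self.log.append(entry)
        self.data[key] = value


class Leader(Replica):
    """Hypothetical leader: the only node that accepts writes."""

    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def write(self, key, value):
        entry = (len(self.log), key, value)
        self.apply(entry)                  # the leader applies the write locally first
        for follower in self.followers:    # then ships the same entry to every follower
            follower.apply(entry)


followers = [Replica("f1"), Replica("f2")]
leader = Leader("leader", followers)
leader.write("user:5", "Alice")
leader.write("user:5", "Bob")

# Any replica can serve the read once it has applied the log.
print(followers[0].data["user:5"])  # prints "Bob"
```

Because every follower receives the same entries in the same order, all copies converge on the same value. A real system would send the entries over the network, often asynchronously, which is where replication lag comes from.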
Implementation of Replication Logs
How do the changes travel from leader to follower?
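- Statement-based: The leader logs every write statement it executes (e.g., INSERT...) and forwards it to the followers. Problem: Non-deterministic functions like NOW() or RAND() produce a different value on each replica, so the data diverges.
- Write-Ahead Log (WAL) Shipping: The leader sends its low-level storage log, which describes which bytes changed in which disk blocks. Problem: Very low-level; tied to the storage engine's on-disk format, so leader and followers typically cannot run different database versions.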
- Logical (Row-based) Log: Send a description of which rows changed (e.g., "Row 5 updated: name='Bob'"). Because this format is decoupled from the storage engine internals, it is the most flexible option and allows different versions of the DB to coexist.
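The difference between statement-based and row-based logs can be illustrated with a small simulation. This is a sketch under stated assumptions: each replica is a plain dict, the string "RAND()" stands in for a non-deterministic SQL function, and the two separately seeded random.Random instances stand in for two independent machines. The helper names run_statement and apply_row_change are hypothetical.

```python
import random


def run_statement(db, rng, statement):
    """Statement-based: each replica re-executes the statement itself."""
    key, expr = statement
    if expr == "RAND()":
        db[key] = rng.random()   # evaluated locally, so each replica gets its own value
    else:
        db[key] = expr


def apply_row_change(db, change):
    """Row-based: each replica copies the value the leader already computed."""
    key, value = change
    db[key] = value


# One random source per node, standing in for RAND() on separate machines.
leader_rng, follower_rng = random.Random(1), random.Random(2)

# Statement-based: the same statement text runs once on each node.
leader_db, follower_db = {}, {}
statement = ("token", "RAND()")
run_statement(leader_db, leader_rng, statement)
run_statement(follower_db, follower_rng, statement)
statement_diverged = leader_db != follower_db

# Row-based: the leader evaluates once and ships the resulting row.
leader_db, follower_db = {}, {}
value = leader_rng.random()
for db in (leader_db, follower_db):
    apply_row_change(db, ("token", value))
row_diverged = leader_db != follower_db

print(statement_diverged, row_diverged)  # prints "True False"
```

The statement-based replicas end up with different values for the same key, while the row-based replicas stay identical because only the leader evaluates the function.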
Handling Outages & Failover
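- Follower Failure: "Catch-up recovery". The follower knows from its local log the last change it processed before the fault, so after restarting it requests every change since that point from the leader.
- Leader Failure (Failover): A follower is promoted to leader, clients are reconfigured to send their writes to it, and the remaining followers start consuming changes from the new leader.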
- The Risk of Split Brain: Two nodes both think they are the leader. They both accept writes, and data diverges. Usually, one of them (typically the old leader) is forcibly shut down, a technique known as fencing or STONITH (Shoot The Other Node In The Head).
- Data Loss: If replication was asynchronous, the new leader might be missing the last few writes from the old leader.
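Catch-up recovery and the data-loss risk during failover can both be sketched with a toy log. The Node class and its catch_up method are hypothetical, and the sketch assumes a follower's log is always a prefix of the leader's log, so the log length doubles as the follower's position.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []   # ordered list of applied writes

    def append(self, write):
        self.log.append(write)

    def catch_up(self, leader):
        """Request every entry after the last one in the local log."""
        missing = leader.log[len(self.log):]
        self.log.extend(missing)
        return len(missing)


leader, follower = Node("leader"), Node("follower")

# w1 and w2 are replicated; then the follower goes offline and misses w3 and w4.
for w in ["w1", "w2"]:
    leader.append(w)
    follower.append(w)
for w in ["w3", "w4"]:
    leader.append(w)

# The follower restarts and fetches only what it missed.
recovered = follower.catch_up(leader)
print(recovered, follower.log == leader.log)  # prints "2 True"

# Asynchronous replication: the leader accepts w5 but crashes before shipping it.
leader.append("w5")
new_leader = follower                          # failover: promote the follower
lost = leader.log[len(new_leader.log):]
print(lost)  # prints "['w5']"
```

The write w5 exists only on the crashed leader, so promoting the follower silently drops it. Synchronous replication to at least one follower avoids this at the cost of higher write latency.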
Knowledge Check
Why is 'Statement-based' replication risky?
- It is too slow.
- Non-deterministic functions (like NOW()) can cause data to diverge.
- It requires all nodes to have the same hardware.