Database Replication
Summary of database replication chapter in book DDIA.
Replication means keeping a copy of data on multiple machines connected by network.
Why Replication?
1 For high availability, if one replica goes down, another replica will keep serving requests
2 Put replicas close to user for lower latency
3 Scale read requests, more replicas -> more read requests we can serve per second
How Replication works
If data doesn’t change at all, replication would be easy. The challenges lie in handling changes of data. There’re multiple algorithms for replication: Single Leader, Multiple Leader, Leaderless. And there are two modes of replication: Synchronous, Asynchronous. Not gonna talk about Multiple Leader and Leaderless here for simplicity of discussion.
Single Leader Replication
Key to replication is that every replica process the same write request, otherwise replicas would not be same. Single leader application achieves this as follows:
1 one of the replicas is elected as a leader, which is the only node that can process write requests, by writing into its local storage
2 leader sends logs of write requests to all its followers, which followers need in order to catch up with the leader. The followers receive the logs and apply each of them to their storage in the same order as they’re written in the leader.
3 users can either read from leader or followers
Synchronous Versus Asynchronous Replication
An important detail about database replication is whether it happens synchronously or asynchronously.
If it’s synchronous, the leader will only return “ok” to the user after data has finished replicating. If it’s asynchronous, leader will return “ok” asap.
However, most database does something in the middle, i.e some followers are replicated synchronously for reliability, and the rest are replicated asynchronously. Imagine if every write requests need to be replicated syncly, it’ll be a pain.
Disadvantage of synchronous replication: if follower doesn’t respond, your write requests are stuck, but this is usually mitigated by making an async follower synchronous in practice. Advantage of it: reliability.
Setting up new follower
To add a new follower, take a snapshot of the leader’s data, when follower starts up, ask leader to transfer whatever new data it has.
Handling Node Outages
Follower outage: when recover, ask leader for new data to catch up(use something called a log sequence number)
Leader outage: re-elect a new leader, take a snapshot of the data from most up-to-date follower
Implementation of Replication Logs
1 sql statement based, i.e execute the same sql statement on leader and follower
drawback: sql statement needs to be deterministic
2 write ahead log based, i.e record write operation in logs and ask followers to apply these logs. This is tied to implementation of storage engine, so no backward compatibility between leader follower, no gradual rollout.
3 logical row based, similar to write ahead log, but each log describes which row, column should be updated/deleted, this hides implementation detail and allows backward compatibility
Problems with Replication Lag
With asynchronous replication, the replication lag can lead to many problems
Read your own writes
What if you make a write request, and before the data is replicated to all followers, you read it from some lagged follower. To the user it looks like what they submitted is lost. In this case we could track if user has recently made a write request, if so handle his/her read requests with leader.
Monotonic reads
User may read data moving backward in time(imagine reading from a more recent replica, and then from an outdated replica)
To resolve this problem, make sure that each user always makes their reads from the same replica
Consistent prefix reads
Some writes have casual relationship, such as when you post a comment under Youtube’s comment section and someone else replies to your message. I’d be strange to see the reply earlier than your original comment. This happens because the comment and the reply lives in different shards, and your comment’s shard is lagging. Solution: arrange related data in same shard.
The solutions to above problems caused by replication lags are all in application level, in database-level, the solution is transaction, probably distributed transaction, but performance and availability is not as good.