Database Isolation Levels: Read Committed and Snapshot Isolation
Read Committed
Read Committed level has 2 safety guarantees:
1 no dirty reads
When reading from the database, you will only see data that has been committed
2 no dirty writes
When writing to the database, you will only overwrite data that has been committed. This is useful when a transaction needs to overwrite two pieces of data simultaneously.
(the explanation above reads strange because this is essentially what every database do and ppl take it for granted. Imagine dirty writes as you write to data without holding any locks and freely overwrite anything)
To implement no dirty writes, use row-level locks on objects/rows/records that the transaction wants to overwrite.
To implement no dirty reads, it’s possible to use the same row-level locks but it’s not efficient to have read-only transaction hold up for a long time waiting for a long-running write transaction. In this case, before a write transaction acquire lock to some row, it’ll remember the value of that row and return it to whoever wants to read that row, until the write transaction has committed/aborted.
Repeatable Read/Snapshot Isolation
Read Committed is usually good enough for a lot of use cases but there can still be concurrency bugs with it. For example:
In the figure above, Alice has two accounts each with $500 initially. Alices tries to look at her two account balance while transferring $100 from account1 to account2.
After Alice sees account1 balance = 500, the transfer operation starts and commit before Alices sees account2 balance($400). So, it appears as if Alice account has only $900 in total. You could argue that Alice will be able to see consistent results after a few seconds, but in use cases such as running analytics/backup process on database, it is unacceptable to see such transient inconsistent states in the database. This phenomenon is called non-repeatable read or read skew, ideally, we want Alice’s read transaction to see the same, consistent snapshot of the database throughout the entire transaction. This is why we need the Snapshot Isolation level.
How do we implement Snapshot Isolation level?
In this level, each transaction has an increasing, unique transaction id(txid)
In this same example, for each object/row in the database, whenever there’s a write operation, we create a new version of that object(MVCC, multi version concurrency control). Following the visibility rules below, we could achieve Snapshot Isolation level:
In my understanding, these visibility rules make sure that a transaction can only see write that have started and committed before it.
What if multiple versions of an object is visible for certain transaction according to the visibility rules?? the book DDIA didn’t mention this but i think it makes sense to take the one with newest txid.
References:
DDIA book