A beginner’s guide to Dirty Read anomaly
Database transactions are defined by the four properties known as ACID. The Isolation Level (I in ACID) allows you to trade off data integrity for performance.
The weaker the isolation level, the more anomalies can occur, and in this article, we are going to describe the Dirty Read phenomenon.
Reading uncommitted data
As previously mentioned, all database changes are applied to the actual data structures (memory buffers, data blocks, indexes). A dirty read happens when a transaction is allowed to read the uncommitted changes of some other concurrent transaction.
Taking a business decision on a value that has not been committed is risky because uncommitted changes might get rolled back.
In the diagram above, the flow of statements goes like this:
- Alice and Bob start two database transactions.
- Alice modifies the title of a given
- Bob reads the uncommitted
- If Alice commits her transaction, everything is fine. But if Alice rolls back, then Bob will see a record version which no longer exists in the database transaction log.
This anomaly is only permitted by the Read Uncommitted isolation level, and, because of the impact on data integrity, most database systems offer a higher default isolation level.
How the database prevents it
To prevent dirty reads, the database engine must hide uncommitted changes from all other concurrent transactions. Each transaction is allowed to see its own changes because otherwise the read-your-own-writes consistency guarantee is compromised.
If the underlying database uses 2PL (Two-Phase Locking), the uncommitted rows are protected by write locks which prevent other concurrent transactions from reading these records until they are committed.
When the underlying database uses MVCC (Multi-Version Concurrency Control), the database engine can use the undo log which already captures the previous version of every uncommitted record, to restore the previous value in other concurrent transaction queries. Because this mechanism is used by all other isolation levels (Read Committed, Repeatable Read, Serializable), most database systems optimize the before image restoring process (lowering its overhead on the overall application performance).
Normally, the Read Uncommitted isolation level is rarely needed (non-strict reporting queries where dirty reads are acceptable), so Read Committed is usually the lowest practical isolation level.
However, if you have a transaction that runs a batch processing task and you want to know its progress, you could start a new transaction in the Read Uncommitted isolation level and due to Dirty Reads, you will see how much progress the other transaction has done.
Download free ebook sample
If you subscribe to my newsletter, you'll get:
- A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
- 3 chapters from my book, High-Performance Java Persistence,
- a 10% discount coupon for my book.