A beginner’s guide to Dirty Read anomaly

(Last Updated On: May 22, 2018)

Introduction

Database transactions are defined by the four properties known as ACID. The Isolation Level (I in ACID) allows you to trade off data integrity for performance.

The weaker the isolation level, the more anomalies can occur, and in this article, we are going to describe the Dirty Read phenomenon.

Reading uncommitted data

As previously mentioned, all database changes are applied to the actual data structures (memory buffers, data blocks, indexes). A dirty read happens when a transaction is allowed to read the uncommitted changes of some other concurrent transaction.

Taking a business decision on a value that has not been committed is risky because uncommitted changes might get rolled back.

In the diagram above, the flow of statements goes like this:

  1. Alice and Bob start two database transactions.
  2. Alice modifies the title of a given post record.
  3. Bob reads the uncommitted post record.
  4. If Alice commits her transaction, everything is fine. But if Alice rolls back, then Bob will see a record version which no longer exists in the database transaction log.

This anomaly is only permitted by the Read Uncommitted isolation level, and, because of the impact on data integrity, most database systems offer a higher default isolation level.

How the database prevents it

To prevent dirty reads, the database engine must hide uncommitted changes from all other concurrent transactions. Each transaction is allowed to see its own changes because otherwise the read-your-own-writes consistency guarantee is compromised.

If the underlying database uses 2PL (Two-Phase Locking), the uncommitted rows are protected by write locks which prevent other concurrent transactions from reading these records until they are committed.

When the underlying database uses MVCC (Multi-Version Concurrency Control), the database engine can use the undo log which already captures the previous version of every uncommitted record, to restore the previous value in other concurrent transaction queries. Because this mechanism is used by all other isolation levels (Read Committed, Repeatable Read, Serializable), most database systems optimize the before image restoring process (lowering its overhead on the overall application performance).

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

Normally, the Read Uncommitted isolation level is rarely needed (non-strict reporting queries where dirty reads are acceptable), so Read Committed is usually the lowest practical isolation level.

However, if you have a transaction that runs a batch processing task and you want to know its progress, you could start a new transaction in the Read Uncommitted isolation level and due to Dirty Reads, you will see how much progress the other transaction has done.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.