A beginner’s guide to Dirty Read anomaly

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

Database transactions are defined by the four properties known as ACID. The Isolation Level (I in ACID) allows you to trade off data integrity for performance.

The weaker the isolation level, the more anomalies can occur, and in this article, we are going to describe the Dirty Read phenomenon.

Reading uncommitted data

As previously mentioned, all database changes are applied to the actual data structures (memory buffers, data blocks, indexes). A dirty read happens when a transaction is allowed to read the uncommitted changes of some other concurrent transaction.

Taking a business decision on a value that has not been committed is risky because uncommitted changes might get rolled back.

In the diagram above, the flow of statements goes like this:

  1. Alice and Bob start two database transactions.
  2. Alice modifies the title of a given post record.
  3. Bob reads the uncommitted post record.
  4. If Alice commits her transaction, everything is fine. But if Alice rolls back, then Bob will see a record version that no longer exists in the database transaction log.

This anomaly is only permitted by the Read Uncommitted isolation level, and, because of the impact on data integrity, most database systems offer a higher default isolation level.

How the database prevents it

To prevent dirty reads, the database engine must hide uncommitted changes from all other concurrent transactions. Each transaction is allowed to see its own changes because otherwise the read-your-own-writes consistency guarantee is compromised.

If the underlying database uses 2PL (Two-Phase Locking), the uncommitted rows are protected by write locks which prevent other concurrent transactions from reading these records until they are committed.

When the underlying database uses MVCC (Multi-Version Concurrency Control), the database engine can use the undo log which already captures the previous version of every uncommitted record, to restore the previous value in other concurrent transaction queries. Because this mechanism is used by all other isolation levels (Read Committed, Repeatable Read, Serializable), most database systems optimize the before image restoring process (lowering its overhead on the overall application performance).

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

Normally, the Read Uncommitted isolation level is rarely needed (non-strict reporting queries where dirty reads are acceptable), so Read Committed is usually the lowest practical isolation level.

However, if you have a transaction that runs a batch processing task and you want to know its progress, you could start a new transaction in the Read Uncommitted isolation level and due to Dirty Reads, you will see how much progress the other transaction has done.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.