A beginner’s guide to Phantom Read anomaly

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

Database transactions are defined by the four properties known as ACID. The Isolation Level (I in ACID) allows you to trade off data integrity for performance.

The weaker the isolation level, the more anomalies can occur, and in this article, we are going to describe the Phantom Read phenomenon.

A beginner’s guide to Phantom Read anomaly - @vlad_mihalcea https://t.co/TJ6otpwu4V pic.twitter.com/dI9hPZ7gXK
— Java (@java) July 12, 2018

Observing data changed by a concurrent transaction

If a transaction makes a business decision based on a set of rows satisfying a given predicate, without range locks, a concurrent transaction might insert a record matching that particular predicate.

In the diagram above, the flow of statements goes like this:

Alice and Bob start two database transactions.
Bob’s reads all the post_comment records associated with the post row with the identifier value of 1.
Alice adds a new post_comment record which is associated with the post row having the identifier value of 1.
Alice commits her database transaction.
If Bob’s re-reads the post_comment records having the post_id column value equal to 1, he will observe a different version of this result set.

This phenomenon is problematic when the current transaction makes a business decision based on the first version of the given result set.

How the database prevents it

The SQL standard says that Phantom Read occurs if two consecutive query executions render different results because a concurrent transaction has modified the range of records in between the two calls.

Although providing consistent reads is a mandatory requirement for Serializability, that is not sufficient. For instance, one buyer might purchase a product without being aware of a better offer that was added right after the user has finished fetching the offer list.

The 2PL-based Serializable isolation prevents Phantom Reads through the use of predicate locking while MVCC (Multi-Version Concurrency Control) database engines address the Phantom Read anomaly by returning consistent snapshots.

However, a concurrent transaction can still modify the range of records that was read previously. Even if the MVCC database engine introspects the transaction schedule, the outcome is not always the same as a 2PL-based implementation. One such example is when the second transaction issues an insert without reading the same range of records as the first transaction. In this particular use case, some MVCC database engines will not end up rolling back the first transaction.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

This phenomenon is typical for both Read Uncommitted, Read Committed and Repeatable Read isolation levels. The default isolation level being either Read Committed (Oracle, SQL Server or PostgreSQL) or Repeatable Read (MySQL) does not prevent this anomaly.

Nevertheless, preventing this anomaly is fairly simple. All you need to do is use a higher isolation level like Serializable. Or, if the underlying RDBMS supports predicate locks, you can simply lock the range of records using a share (read) lock or an exclusive (write) range lock as explained in this article.

Follow @vlad_mihalcea