A beginner’s guide to read and write skew phenomena

(Last Updated On: January 22, 2019)
Imagine having a tool that can automatically detect if you are using JPA and Hibernate properly. Hypersistence Optimizer is that tool!

Introduction

In my article about ACID and database transactions, I introduced the three phenomena described by the SQL standard:

  • dirty read
  • non-repeatable read
  • phantom read

While these are good to differentiate the four isolation levels (Read Uncommitted, Read Committed, Repeatable Read and Serializable), in reality, there are more phenomena to take into consideration as well. The 1995 paper (A Critique of ANSI SQL Isolation Levels) introduces the other phenomena that are omitted from the standard specification.

In my High-Performance Java Persistence book, I decided to insist on the Transaction chapter as it is very important for both data-access effectiveness and efficiency.

Domain model

For the following examples, I’m going to use the following two entities:

ReadWrite Skew

In our fictional application, when the Post title is changed, the author must be recorded in the associated PostDetails record.

If read and write skew are not prevented, this domain model constraint can be compromised, as you will see in the following test cases.

Read skew

Read Skew

  • Alice selects a Post entity.
  • Bob sneaks in and updates both the Post and the PostDetails entities.
  • Alice thread is resumed and she selects the PostDetails record.

If read skew is permitted, Alice sees Bob’s update and she can assume that the previous Post version (that she read at the beginning of her transaction) was issued by Bob, therefore breaking consistency.

Running this test on the four most common relation database systems gives the following results:

Database isolation level Read skew
Oracle Read Committed Yes
Oracle Serializable No
SQL Server Read Uncommitted Yes
SQL Server Read Committed Yes
SQL Server Read Committed Snapshot Isolation Yes
SQL Server Repeatable Read No
SQL Server Serializable No
SQL Server Snapshot Isolation No
PostgreSQL Read Uncommitted Yes
PostgreSQL Read Committed Yes
PostgreSQL Repeatable Read No
PostgreSQL Serializable No
MySQL Read Uncommitted Yes
MySQL Read Committed Yes
MySQL Repeatable Read No
MySQL Serializable No

Write skew

Write Skew

  • Both Alice and Bob select the Post and the PostDetails entities.
  • Bob modifies the Post title, but, since the PostDetails is already marked as updated by Bob, the dirty checking mechanism will skip updating the PostDetails entity, therefore preventing a redundant UPDATE statement.
  • Alice wants to update the Post entity, but the entity already has the same value as the one she wants to apply so only the PostDetails record will mark that the latest change is the one proposed by Alice.

If write skew is permitted, Alice and Bob disjoint writes will proceed, therefore breaking the guarantee that Post and PostDetails should always be in sync.

Running this test on the four most common relation database systems gives the following results:

Database isolation level Write skew
Oracle Read Committed Yes
Oracle Serializable Yes
SQL Server Read Uncommitted Yes
SQL Server Read Committed Yes
SQL Server Read Committed Snapshot Isolation Yes
SQL Server Repeatable Read No
SQL Server Serializable No
SQL Server Snapshot Isolation Yes
PostgreSQL Read Uncommitted Yes
PostgreSQL Read Committed Yes
PostgreSQL Repeatable Read Yes
PostgreSQL Serializable No
MySQL Read Uncommitted Yes
MySQL Read Committed Yes
MySQL Repeatable Read Yes
MySQL Serializable No

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

  • Write skew is prevalent among MVCC (Multi-Version Concurrency Control) mechanisms and Oracle cannot prevent it even when claiming to be using Serializable, which in fact is just the Snapshot Isolation level.
  • SQL Server default locking-based isolation levels can prevent write skews when using Repeatable Read and Serializable. Neither one of its MVCC-based isolation levels (MVCC-based) can prevent/detect it instead.
  • PostgreSQL prevents it using its more advanced Serializable Snapshot Isolation level.
  • MySQL employs shared locks when using Serializable so the write skew can be prevented even if InnoDB is also MVCC-based.

Download free ebook sample

Newsletter logo
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence,
  • a 10% discount coupon for my book.

8 Comments on “A beginner’s guide to read and write skew phenomena

  1. Wikipedia says write skew is possible in repeatable read isolation level: https://en.wikipedia.org/wiki/Isolation_(database_systems)#Repeatable_reads

    However I dont think so that is the case. In lock based concurrency control, repeatable reads require that transaction keeps read and write locks (acquired on selected data) until the end of the transaction.

    In your example of write skew, if either Alice or Bob keeps read locks of both Post and Post author, then other one cannot modify anything.

    Thus I believe write skew cannot occur with repeatable read isolation level. Am right with this?

    • There are two ways to implement concurrency control: 2PL or MVCC. Wikipedia and the SQL Standard assumes 2PL-based systems, which were traditionally implemented when the standard emerged. in 2PL, locks are aquired, hence read and write skew and lost updates can never happen if locks are taken upon read. However, in MVCC, they can happen, and most DBs use MVCC nowadays (e.g. Oracle, MySQL, MariaDB, PostgreSQL). For more details, check out my High-Performance Java Persistence book.

  2. Hello,
    I don’t see a problem with “Write skew” example. After both transactions commit both tables will be in consistent state.
    Could you please provide more obvious example of the inconsistency?

      • Thanks for a quick reply!
        In fact I’m reading your book right now and it uses the same example for the “Write skew” phenomenon.
        I understand where is the problem with other 7 phenomena that you mentioned, but for the “Write skew” I can’t see what might go wrong. I think the more obvious example might help.

      • The outcome is not equivalent to any serial execution: A – B or B – A. That’s it.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Hypersistence Optimizer can automatically detect if you are using JPA and Hibernate properly