Before we start discussing the 2PL algorithm implementation, it’s very important to explain how the read and write locks work.
A read or share lock prevents a resource from being written while allowing other concurrent reads.
A write or exclusive lock disallows both read and write operations on a given resource.
Some database systems, like PostgreSQL, MySQL, or SQL Server, offer the possibility of acquiring read and write locks on a given tuple or range of tuples. Other database systems, like Oracle, only allow write/exclusive locks to be acquired via the FOR UPDATE clause.
Read lock clause
Write lock clause
LOCK IN SHARE MODE
For more details about how you can acquire read or write locks with JPA and Hibernate, check out this article.
However, read and write locks are not limited to database systems only. While traditionally, entering a Java synchronized block allows the acquisition of an exclusive lock, since version 1.5, Java allows both read and write locks via the ReentrantReadWriteLock object.
Locks alone are not sufficient for preventing conflicts. A concurrency control strategy must define how locks are being acquired and released because this also has an impact on transaction interleaving.
For this purpose, the 2PL protocol defines a lock management strategy for ensuring strict serializability.
The 2PL protocol splits a transaction into two sections:
expanding phase (locks are acquired, and no lock is allowed to be released)
shrinking phase (all locks are released, and no other lock can be further acquired).
For a database transaction, the expanding phase means that locks are allowed to ve acquired from the beginning of the transaction until its end, while the shrinking phase is represented by the commit or rollback phase, as at the end of a transaction, all the acquired locks are being released.
The following diagram shows how transaction interleaving is coordinated by 2PL:
Both Alice and Bob acquire a read lock on a given a post record via a SELECT FOR SHARE PostgreSQL clause.
When Bob attempts to execute an UPDATE statement on the post entry, his statement is blocked by the Lock Manager because the UPDATE statement needs to acquire a write lock on the post row while Alice is still holding a read lock on this database record.
Only after Alice’s transaction ends and all her locks are being released, Bob can resume his UPDATE operation.
Bob’s UPDATE statement will generate a lock upgrade, so his previously acquired read lock is replaced by an exclusive lock, which will prevent other transactions from acquiring a read or write lock on the same post record.
Alice starts a new transaction and issues a SELECT FOR SHARE query with a read lock acquisition request for the same post entry, but the statement is blocked by the Lock Manager since Bob owns an exclusive lock on this record.
After Bob’s transaction is committed, all his locks are released, and Alice’s SELECT query can be resumed.
The 2PL algorithm offers Strict Serializability, which is the golden standard when it comes to data integrity. Strict Serializability means that the outcome is both Serializable and Linearizable.
Two or more transactions are Serializable if their associated read and write operations are interleaved in such a way that the outcome is equivalent to some serial execution. For example, if we have two transactions A and B, as long as the outcome is either A, B or B, A, the two transactions are Serializable. For N transactions, the outcome must be equivalent to one of the N! transaction permutations.
However, Serializability does not take the flow of time into consideration. On the other hand, Linearizability implies time-based ordering. For example, a system is linearizable if any subsequent read will reflect the changes done by a previous write operation. For more details about Lienearizbaility, check out this article.
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
The 2PL (Two-Phase Locking) algorithm was introduced in 1976 in [The Notions of Consistency and Predicate Locks in a Database System] (http://research.microsoft.com/en-us/um/people/gray/papers/On%20the%20Notions%20of%20Consistency%20and%20Predicate%20Locks%20in%20a%20Database%20System%20CACM.pdf) paper by Kapali Eswaran and Jim Gray (et al.), which demonstrated that Serializability could be obtained if all transactions used the 2PL algorithm.
Initially, all database systems employed 2PL for implementing Serializable transactions, but, with time, many vendors have moved towards MVCC (Multi-Version Concurrency Control) concurrency control mechanisms.
Nowadays, only SQL Server uses the 2PL algorithm by default, unless the READ_COMMITTED_SNAPSHOT orALLOW_SNAPSHOT_ISOLATIONMVCC modes are not set at the database level. Even if the InnoDB MySQL storage engine is MVCC-based when switching to the Serializable isolation level, the database will use the 2PL algorithm since locks will be acquired on both read and write operations.
For this reason, it’s very important to understand how the 2PL algorithm works and that it can guarantee Strict Serializability.
Based on my book, High-Performance Java Persistence, this workshop teaches you various data access performance optimizations from JDBC, to JPA, Hibernate and jOOQ for the major rational database systems (e.g. Oracle, SQL Server, MySQL and PostgreSQL).