How does the 2PL (Two-Phase Locking) algorithm work
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Hypersistence Optimizer is that tool!
The 2PL (Two-Phase Locking) algorithm is one of the oldest concurrency control mechanisms used by relational database systems to guarantee data integrity.
In this article, I’m going to explain how the 2PL algorithm works and how you can implement it in any programming language.
Before we start discussing the 2PL algorithm implementation, it’s very important to explain how the read and write locks work.
A read or share lock prevents a resource from being written while allowing other concurrent reads.
A write or exclusive lock disallows both read and write operations on a given resource.
|Compatibility matrix||Read lock||Write Lock|
Some database systems, like PostgreSQL, MySQL, or SQL Server, offer the possibility of acquiring read and write locks on a given tuple or range of tuples. Other database systems, like Oracle, only allow write/exclusive locks to be acquired via the
FOR UPDATE clause.
|Database name||Read lock clause||Write lock clause|
|Oracle||FOR UPDATE||FOR UPDATE|
|SQL Server||WITH (HOLDLOCK,
|PostgreSQL||FOR SHARE||FOR UPDATE|
|MySQL||LOCK IN SHARE MODE||FOR UPDATE|
For more details about how you can acquire read or write locks with JPA and Hibernate, check out this article.
However, read and write locks are not limited to database systems only. While traditionally, entering a Java
synchronized block allows the acquisition of an exclusive lock, since version 1.5, Java allows both read and write locks via the
Locks alone are not sufficient for preventing conflicts. A concurrency control strategy must define how locks are being acquired and released because this also has an impact on transaction interleaving.
For this purpose, the 2PL protocol defines a lock management strategy for ensuring strict serializability.
The 2PL protocol splits a transaction into two sections:
- expanding phase (locks are acquired, and no lock is allowed to be released)
- shrinking phase (all locks are released, and no other lock can be further acquired).
For a database transaction, the expanding phase means that locks are allowed to ve acquired from the beginning of the transaction until its end, while the shrinking phase is represented by the commit or rollback phase, as at the end of a transaction, all the acquired locks are being released.
The following diagram shows how transaction interleaving is coordinated by 2PL:
- Both Alice and Bob acquire a read lock on a given a
postrecord via a
SELECT FOR SHAREPostgreSQL clause.
- When Bob attempts to execute an UPDATE statement on the
postentry, his statement is blocked by the Lock Manager because the UPDATE statement needs to acquire a write lock on the
postrow while Alice is still holding a read lock on this database record.
- Only after Alice’s transaction ends and all her locks are being released, Bob can resume his UPDATE operation.
- Bob’s UPDATE statement will generate a lock upgrade, so his previously acquired read lock is replaced by an exclusive lock, which will prevent other transactions from acquiring a read or write lock on the same
- Alice starts a new transaction and issues a
SELECT FOR SHAREquery with a read lock acquisition request for the same
postentry, but the statement is blocked by the Lock Manager since Bob owns an exclusive lock on this record.
- After Bob’s transaction is committed, all his locks are released, and Alice’s SELECT query can be resumed.
The 2PL algorithm offers Strict Serializability, which is the golden standard when it comes to data integrity. Strict Serializability means that the outcome is both Serializable and Linearizable.
Two or more transactions are Serializable if their associated read and write operations are interleaved in such a way that the outcome is equivalent to some serial execution. For example, if we have two transactions A and B, as long as the outcome is either A, B or B, A, the two transactions are Serializable. For N transactions, the outcome must be equivalent to one of the
N! transaction permutations.
However, Serializability does not take the flow of time into consideration. On the other hand, Linearizability implies time-based ordering. For example, a system is linearizable if any subsequent read will reflect the changes done by a previous write operation. For more details about Lienearizbaility, check out this article.
The 2PL (Two-Phase Locking) algorithm was introduced in 1976 in The Notions of Consistency and Predicate Locks in a Database System paper by Kapali Eswaran and Jim Gray (et al.), which demonstrated that Serializability could be obtained if all transactions used the 2PL algorithm.
Initially, all database systems employed 2PL for implementing Serializable transactions, but, with time, many vendors have moved towards MVCC (Multi-Version Concurrency Control) concurrency control mechanisms.
Nowadays, only SQL Server uses the 2PL algorithm by default. However, if you set the
MVCC modes at the database level, then SQL Server will switch to using MVCC.
Even if the InnoDB MySQL storage engine is MVCC-based when switching to the Serializable isolation level, the database will use the 2PL algorithm since locks will be acquired on both read and write operations.
For this reason, it’s very important to understand how the 2PL algorithm works and that it can guarantee Strict Serializability.