A beginner’s guide to Linearizability
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Hypersistence Optimizer is that tool!
Linearizability is a lesser-known, yet omnipresent property of a data registry in the context of read and write operations that might happen concurrently.
This article aims to explain what linearizability consists of, and why it’s more prevalent that you might have previously thought.
Instantaneous reads and writes
Now, assuming you have a database system with a single node like in the following diagram:
The first SELECT statement reads the value of 50, while the second SELECT reads the value of 10 since in between the two read operations a write operation was executed.
Linearizability means that modifications happen instantaneously, and once a registry value is written, any subsequent read operation will find the very same value as long as the registry will not undergo any modification.
Linearizability is what the CAP Theorem calls Consistency.
To demonstrate what it means for a system to be non-lineraizable, consider the following diagram:
This time, we don’t have a single registry or a single source of truth. Our system uses asynchronous database replication, and we have a Primary node that takes both reads and writes and a Follower node used for read operations only.
Because replication happens asynchronously, there’s a lag between the Primary node row modification and the time when the Follower applies the same change.
One database connection changes the account balance from 50 to 10 and commits the transaction. Right after, a second transaction reads from the Follower node, but since replication did not apply the balance modification, the value of 50 is read.
Therefore, this system is not linearizable since changes don’t appear to happen instantaneously. In order to make this system linearizable, we need to use synchronous replication, and the Primary node UPDATE operation will not complete until the Follower node also applies the same modification.
However, if the number of nodes increases, synchronous replication will not be feasible for two reasons. First, updating multiple nodes synchronously increases response time which can impact application responsiveness. Second, if one node does not respond anymore, all writes will have to stall until that node becomes responsive or if the system is reconfigured to exclude that particular node.
Java Memory Model
Linearizability is not limited to distributed systems and databases. When using Java, reads and writes are not guaranteed to be linearizable unless the modifying variable is
volatile or if both the write and the read are executed from within a
Because most computer systems use multiple CPU cores, and each core has its own cache, a write operation might only change the variable in the CPU cache. For the change to propagate to the main memory, the write-behind CPU cache needs to be flushed, and that’s exactly what the
volatile keyword actually does.
In a single-threaded enthronement, every read and write operation is automatically linearizable, which makes it very easy to reason about state, as well as guaranteeing correctness when it comes to implementing a certain algorithm.
In a multi-threaded environment, if the system is not linearizable, it’s going to be much more challenging to guarantee correctness since reads and writes appear to happen at different times than their actual wall-clock time.
To conclude, a linearizable system guarantees a strict time ordering of read and writes operations matching the wall-clock time flow.