How does MVCC (Multi-Version Concurrency Control) work

Introduction

In Concurrency Control theory, there are two ways you can deal with conflicts:

  • You can avoid them, by employing a pessimistic locking mechanism (e.g. Read/Write locks, Two-Phase Locking)
  • You can allow conflicts to occur, but you need to detect them using an optimistic locking mechanism (e.g. logical clock, MVCC)

Because MVCC (Multi-Version Concurrency Control) is such a prevalent Concurrency Control technique (not only in relational database systems, in this article, I’m going to explain how it works.

What’s the goal

When the ACID transaction properties were first defined, Serializability was assumed. And to provide a Strict Serializable transaction outcome, the 2PL (Two-Phase Locking) mechanism was employed. When using 2PL, every read requires a shared lock acquisition, while a write operation requires taking an exclusive lock.

  • a shared lock blocks Writers, but it allows other Readers to acquire the same shared lock
  • an exclusive lock blocks both Readers and Writers concurring for the same lock

However, locking incurs contention, and contention affects scalability. The Amdhal’s Law or the Universal Scalability Law demonstrate how contention can affect response Time speedup.

For this reason, database researchers have come up with a different Concurrency Control model which tries to reduce locking to a bare minimum so that:

  • Readers don’t block Writers
  • Writers don’t block Readers

The only use case that can still generate contention is when two concurrent transactions try to modify the same record since, once modified, a row is always locked until the transaction that modified this record either commits or rolls back.

In order to specify the aforementioned Reader/Writer non-locking behavior, the Concurrency Control mechanism must operate on multiple versions of the same record, hence this mechanism is called Multi-Version Concurrency Control (MVCC).

While 2PL is pretty much standard, there’s no standard MVCC implementation, each database taking a slightly different approach. In this article, we are going to use PostgreSQL since its MVCC implementation is the easiest one to visualize.

PostgreSQL

While Oracle and MySQL use the undo log to capture uncommitted changes so that rows can be reconstructed to their previously committed version, PostgreSQL stores all row versions in the table data structure.

What’s even more interesting is that every row has two additional columns:

  • t_{\text{min}} – which defines the transaction id that inserted the record
  • t_{\text{max}} – which defines the transaction id that deleted the row

In PostgreSQL, the Transaction Id is a 32-bit integer, and the VACUUM process is responsible (among other things like reclaiming old row versions that are no longer in use) for making sure that the id does not overflow.

For this reason, you should never disable the VACUUM as transaction wraparound can lean to catastrophic situations.

Inserting a record

To understand how INSERT works in MVCC, consider the following diagram:

mvcc_insert

  1. Both Alice and Bob start a new transaction, and we can see their transaction ids by calling the txid_current() PostgreSQL function
  2. When Alice inserts a new post row, the t_{\text{min}} column value is set to Alice’s transaction id
  3. Under default Read Committed isolation level, Bob cannot see Alice’s newly inserted record until Alice committs her transaction
  4. After Alice has committed, Bob can now see Alice’s newly inserted row

As a rule of thumb, if the transaction id is lower than the t_{\text{min}} value of a committed row, the transaction is not allowed to read this record version.

Deleting a record

To understand how DELETE works in MVCC, consider the following diagram:

mvcc_delete

  1. Both Alice and Bob start a new transaction, and we can see their transaction ids by calling the txid_current() PostgreSQL function
  2. When Bob deletes a post row, the t_{\text{max}} column value is set to Bob’s transaction id
  3. Under default Read Committed isolation level, until Bob manages to commit his transaction, Alice can still see the record that was deleted by ob
  4. After Bob has committed, Alice can no longer see the deleted row

While in 2PL, Bob’s modification would block Alice read statement, in MVCC Alice is still allowed to see the previous version until Bob manages to commit his transaction.

The DELETE operation does not physically remove a record, it just marks it as ready for deletion, and the VACUUM process will collect it when this row is no longer in use by any current running transaction.

As a rule of thumb, if the transaction id is greater than the t_{\text{max}} value of a committed row, the transaction is not allowed to read this record version anymore.

Updating a record

To understand how UPDATE works in MVCC, consider the following diagram:

mvcc_update

  1. Both Alice and Bob start a new transaction, and we can see their transaction ids by calling the txid_current() PostgreSQL function
  2. When Bob updates a post record, we can see two operations happening: a DELETE and an INSERT.
    The previous row version is marked as deleted by setting the t_{\text{max}} column value to Bob’s transaction id, and a new row version is created which has the t_{\text{min}} column value set to Bob’s transaction id
  3. Under default Read Committed isolation level, until Bob manages to commit his transaction, Alice can still see the previous record version
  4. After Bob has committed, Alice can now see the new row version that was updated by Bob

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

By allowing multiple versions of the same record, there is going to be less contention on reading/writing records since Readers will not block writers and Writers will not block Readers as well.

Although not as intuitive as 2PL (Two-Phase Locking), MVCC is not very difficult to understand either. However, it’s very important to understand how it works, especially since data anomalies are treated differently than when locking is being employed.

If you liked this article, you might want to subscribe to my newsletter too.

Advertisements

13 thoughts on “How does MVCC (Multi-Version Concurrency Control) work

  1. First, thanks for the article …
    What I don’t understand or miss, is how t_min and t_max are actually used in all flows presented by the sequence diagrams.
    I’m aware about “applicative” approach based on increasing version counters for each row, and all operations performed with SQL filtering on the current version with “where version=N”. Is it the same approach, but in Postgres core?

  2. Thanks for the great article!

    Don’t you think the default wildfly h2 datasource (ExampleDS) should have MVCC enabled by default or is there any (good) reason against doing it?

    1. Implementing MVCC and providing a truly Serializable isolation level is very hard. Probably, that’s why H2 defaults to 2PL. It’s much easier to provide a Strict Serializable transcation outcome.

  3. from pg READ COMMITTED docs:
    “When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.”

  4. Hi Vlad,

    Thank you for this article.
    I have one question though.

    You write (in delete example):

    As a rule of thumb, if the transaction id is greater than the t_max value of a committed row, the transaction is not allowed to read this record version anymore.

    But in your example transaction id of reader (Alice) is 313409, so smaller than t_max of the committed (deleted) row (313410)…

    Best,
    Piotr

    PS: also in update example and delete example you write that Alice deletes and updates, but from the diagram it looks like Bob is doing it…

    1. Thanks for the tip. I updated the description to match the diagram.

      As for the DELETE example, that’s why I mentioned the “commit” part. Prior to committed a record, there are two extra fields (t_ctid, t_cid) that are taken into consideration. Check out this article about PostgreSQL internals for more details. However, the purpose of this article was to give an overview of how MVCC works, so it does not go into more complex details.

      1. Thanks for the link I will have a look.

        There is still one tiny correction to do in the text – for delete description you still have:
        “While in 2PL, Alice’s modification would block Bob read request, in MVCC Bob is still allowed to see the previous version until Alice manages to commit her transaction.”
        “Alice” and “Bob” names should be switched here.

      2. I think I found it.

        Essential info is:
        1) Rules of visibility in MVCC are applied to “active” transaction as denoted by the snapshot. Transactions are “active” when they are “in progress” or not yest started. When using the obtained snapshot for the visibility check, active transactions in the snapshot must be treated as in progress even if they have actually been committed or aborted. This rule is important because it causes the difference in the behavior between READ COMMITTED and REPEATABLE READ (or SERIALIZABLE)
        2) In the READ COMMITTED isolation level, the transaction obtains a snapshot whenever an SQL command is executed; otherwise (REPEATABLE READ or SERIALIZABLE), the transaction only gets a snapshot when the first SQL command is executed.

        So the examples you described are of course perfectly fine. They are for READ COMMITED, snapshot on which visibility rules are executed is fetched for each SQL statement.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s