A beginner’s guide to ACID and database transactions

(Last Updated On: January 4, 2018)

Introduction

Transactions are omnipresent in today’s enterprise systems, providing data integrity even in highly concurrent environments. So let’s get started by first defining the term and the context where you might usually employ it.

A transaction is a collection of read/write operations succeeding only if all contained operations succeed.

Transaction-workflow

Inherently a transaction is characterized by four properties (commonly referred as ACID):

  1. Atomicity
  2. Consistency
  3. Isolation
  4. Durability

In a relational database, every SQL statement must execute in the scope of a transaction. Without defining the transaction boundaries explicitly, the database is going to use an implicit transaction which is wraps around every individual statement. The implicit transaction begins before the statement is executed and end (commit or rollback) after the statement is executed.
The implicit transaction mode is commonly known as autocommit.

For an enterprise application, the auto-commit mode is something you’d generally want to avoid since it has serious performance penalties, and it doesn’t allow you to include multiple DML operations in a single atomic Unit of Work.

It’s very important to understand those, hence we will discuss each and every one of them as follows.

Atomicity

Atomicity takes individual operations and turns them into an all-or-nothing unit of work, succeeding if and only if all contained operations succeed.

A transaction might encapsulate a state change (unless it is a read-only one). A transaction must always leave the system in a consistent state, no matter how many concurrent transactions are interleaved at any given time.

Consistency

Consistency means that constraints are enforced for every committed transaction. That implies that all Keys, Data types, Checks and Trigger are successful and no constraint violation is triggered.

Isolation

Transactions require concurrency control mechanisms, and they guarantee correctness even when being interleaved. Isolation brings us the benefit of hiding uncommitted state changes from the outside world, as failing transactions shouldn’t ever corrupt the state of the system. Isolation is achieved through concurrency control using pessimistic or optimistic locking mechanisms.

Durability

A successful transaction must permanently change the state of a system, and before ending it, the state changes are recorded in a persisted transaction log. If our system is suddenly affected by a system crash or a power outage, then all unfinished committed transactions may be replayed.

For messaging systems like JMS, transactions are not mandatory. That’s why we have non-transacted acknowledgement modes.

File system operations are usually non-managed, but if your business requirements demand transaction file operations, you might make use a tool such as XADisk.

While messaging and file systems use transactions optionally, for database management systems transactions are compulsory.

Challanges

ACID is old school. Jim Gray described atomicity, consistency and durability long before I was even born. But that particular paper doesn’t mention anything about isolation. This is understandable if we think of the production systems of the late 70’s, which according to Jim Gray:

“At present, the largest airlines and banks have about 10,000 terminals and about 100 active transactions at any instant”.

So all efforts were spent on delivering correctness rather than concurrency. Things have changed drastically ever since, and nowadays even modest set-ups are able to run 1000 TPS.

From a database perspective, the atomicity is a fixed property, but everything else may be traded off for performance/scalability reasons.

If the database system is composed of multiple nodes, then distributed system consistency (C in CAP Theorem not C in ACID) mandates that all changes be propagated to all nodes (multi-master replication). If slaves nodes are updated asynchronously then we break the consistency rule, the system becoming “eventually consistent“.

Peter Bailis has a very good article explaining the difference between Consistency in CAP Theorem and Consistency in ACID.

A transaction is a data state transition, so the system must operate as if all transactions occur in a serial form even if those are concurrently executed.
If there would be only one connection running at all times, then serializability wouldn’t impose any concurrency control cost. In reality, all transactional systems must accommodate concurrent requests, hence serialization has its toll on scalability. The Amdahl’s law describes the relation between serial execution and concurrency:

“The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.”

As you’ll see later, most database management systems choose (by default) to relax correctness guarantees to achieve better concurrency.

Playing with durability makes sense for highly performing clustered databases if the enterprise system business requirements don’t mandate durable transactions. But, most often durability is better off untouched.

Isolation Levels

Although some database management systems offer MVCC, usually concurrency control is achieved through locking. But as we all know, locking increases the serializable portion of the executed code, affecting parallelization.

The SQL standard defines four Isolation levels:

  • READ_UNCOMMITTED
  • READ_COMMITTED
  • REPEATABLE_READ
  • SERIALIZABLE

All but the SERIALIZABLE level are subject to data anomalies (phenomena) that might occur according to the following pattern:

Isolation Level Dirty read Non-repeatable read Phantom read
READ_UNCOMMITTED allowed allowed allowed
READ_COMMITTED prevented allowed allowed
REPEATABLE_READ prevented prevented allowed
SERIALIZABLE prevented prevented prevented

Phenomena

But what are all those phenomena we have just listed? Let’s discuss each and every one of them.

Dirty read

ACID-dirty-read

A dirty read happens when a transaction is allowed to read uncommitted changes of some other running transaction. This happens because there is no locking preventing it. In the picture above, you can see that the second transaction uses an inconsistent value as of the first transaction had rollbacked.

Non-repeatable read

ACID-non-repeatable-read

A non-repeatable read manifests when consecutive reads yield different results due to a concurring transaction that has just updated the record we’re reading. This is undesirable since we end up using stale data. This is prevented by holding a shared lock (read lock) on the read record for the whole duration of the current transaction.

Phantom read

ACID-phantom-read

A phantom read happens when a second transaction inserts a row that matches a previous select criteria of the first transaction. We, therefore, end up using stale data, which might affect our business operation. This is prevented using range locks or predicate locking.

Even more phenomena

Even if not mentioned in the SQL Stanard, there are even more phenomena that you should be aware of, like:

Knowing when these phenomena can occur can addressing them properly is what data integrity is all about.

If you enjoyed this article, I bet you are going to love my book as well.

Default Isolation Levels

Even if the SOL standard mandates the use of the SERIALIZABLE isolation level, most database management systems use a different default level.

Database Default isolation Level
Oracle READ_COMMITTED
MySQL REPEATABLE_READ
Microsoft SQL Server READ_COMMITTED
PostgreSQL READ_COMMITTED
DB2 CURSOR STABILITY

Usually, READ_COMMITED is the right choice, since not even SERIALIZABLE can protect you from a lost update where the reads/writes happen in different transactions (and web requests). You should take into consideration your enterprise system requirements and set up tests for deciding which isolation level best suits your needs.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

23 thoughts on “A beginner’s guide to ACID and database transactions

  1. it was good briefing on ACID and database transaction. It is good for beginners to understand the importance of ACID in a database transaction.

  2. As you’ll see later, most database management systems choose (by default) to relax consistency to >achieve better concurrency.
    I think it should have been “relax isolation”, not “consistency”.

      1. I don’t know if I should agree, or disagree 🙂 I want to explain my interpretation of your words. Would you please tell me if this is what you mean?

        ACID Consistency is a property of an ACID systems that states that a transaction shifts the system from one state, which is consistent with some system rules (e.g. constraints), to another consistent state (see https://msdn.microsoft.com/en-us/library/aa480356.aspx , https://www.ibm.com/support/knowledgecenter/en/SSGMCP_5.3.0/com.ibm.cics.ts.productoverview.doc/concepts/acid.html ). This definition (if correct) presumes, that inside a transaction we still may have and observe an inconsistent state of the system even if all transactions are serializable (e.g. Oracle deferrable constraints https://asktom.oracle.com/pls/apex/asktom.search?tag=deferrable-constraints). Of course, by relaxing serializability (i.e. relaxing ACID Isolation by allowing transactions to see some side effects of the concurrently running transactions) we allow more situations when a transaction may observe state that is not consistent with the system rules. So one might say that we are relaxing ACID Consistency by relaxing ACID Isolation. Does this correctly describe your idea?

        Speaking of referring to serializability as strong consistency I believe that this comes not from ACID Consistency, but from consistency in concurrent programming. Isolation level in ACID databases is exactly the same concept as consistency model in concurrent programming / distributed systems. And in fact, sequential consistency (concurrent programming term) is a particular case of serializability (database term) when all transactions consist of exactly one action (either read or write). And because sequential consistency is a strong consistency model, people also refer to serializability as strong consistency. Btw, many also refer to linearizability as strong consistency.

      2. If you are interested in this topic, you should watch my Transactions and Concurrency Control presentation too.

        Now, back to your question. Consistency means different things in different contexts:

        1. In ACID, Consistency means Constraints: FK, Unique, Type, Rules, Checks, etc.
        2. In distributed systems, Consistency like in CAP, means Linearizability.
        3. In terms of read-write anomalies, Consistency means correctness.

        So one might say that we are relaxing ACID Consistency by relaxing ACID Isolation. Does this correctly describe your idea?

        No. You never relax ACID consistency. You just relax the read-write interleaving correctness level (a.k.a Isolation).

        Speaking of referring to serializability as strong consistency I believe that this comes not from ACID Consistency, but from consistency in concurrent programming.

        Serializability gives you strong consistency if consistency means having a transaction schedule that is equivalent to one that would otherwise be executed serially.

        Isolation level in ACID databases is exactly the same concept as consistency model in concurrent programming / distributed systems.

        It depends. Isolation Levels as defined by ACID are meant for when you write to a single Master, while in a distributed system, linearizability is harder to achieve without a quorum. Originally, isolation Levels were meant to define how much locking do you want to employ since reads would imply a Shared Lock in the 2PL concurrency control model that was standard in the 80’s.

        However, in a distributed system, achieving Serializability is way much harder to achieve especially when you use an unreliable network.

        And in fact, sequential consistency (concurrent programming term) is a particular case of serializability (database term) when all transactions consist of exactly one action (either read or write). And because sequential consistency is a strong consistency model, people also refer to serializability as strong consistency. Btw, many also refer to linearizability as strong consistency.

        Sequential consistency is below Linearizability or Serializability. I think you are talking about Serial execution there. Check out this article for more details.

      3. “No. You never relax ACID consistency. You just relax the read-write interleaving correctness level (a.k.a Isolation).”

        So, long story short, this means that your original phrase “most database management systems choose (by default) to relax consistency to achieve better concurrency” is not about ACID Consistency. In this case, I completely agree with it 🙂 But it might be a good idea to emphasize this fact in the article.

        “If you are interested in this topic, you should watch my Transactions and Concurrency Control presentation too.”

        Yes, I’ve seen it 🙂 And I am grateful for your presentations and the blog. I am also going to buy your book after I finish my current reading.

        “Sequential consistency is below Linearizability or Serializability. I think you are talking about Serial execution there. Check out this article for more details.”

        If by saying “below” you mean “weaker”, then yes. However, let’s imagine an ACID system only supporting transactions that constitute of exactly one operation (either read or write on a single key), then serializability (of transactions) for such a system would guarantee not more and not less than sequential consistency does for read / write operations on registers. The last Aphyr’s comment under the article you mentioned (https://aphyr.com/posts/313-strong-consistency-models#comment-2911) states exactly this. And this is what I meant by saying, that sequential consistency is a particular case of serializability. Does this make sense to you? 🙂

      4. And this is what I meant by saying, that sequential consistency is a particular case of serializability. Does this make sense to you?

        Sequential Consistency is a concurrency control model where you need a Session to queue your operations so that they all appear in the same order so it guarantees read-your-writes. In this light, I think that sequential consistency is worth when you need more operations per session.

        Comparing Sequential or Causal Consistency with Serializability is a little bit problematic. Check out the diagram on Aphyr blog post. On the left side, you have Visibility properties while on the right side you get Operation interleaving options. So, the consistency models on the left are not strictly related to the one of the right.

        What you see at the top is Strict Serializability which implies both Serializability and Linearizability, meaning that read and writes happen in the same order they were issued. The order of operations is the real order of operations in time as well. So, there is no reordering going on.

        Back, to your question, Serializability only guarantees that you get some order which is equivalent to a serial execution. However, there could be many possible serial executions. I even wrote an article about this topic which illustrates that what people think of Serializability is actual Strict Serializability because humans expect that operations are always executed in the same order they were issued according to the wall clock time. But, Strict Serializability is hard to achieve. You need 2PL to get that. Usually, you get just Serializability and the transaction schedule might surprise you in the end, due to how operations were ordered.

      5. Back, to your question, Serializability only guarantees that you get some order which is equivalent to a serial execution. However, there could be many possible serial executions. I even wrote an article about this topic which illustrates that what people think of Serializability is actual Strict Serializability

        Now I see! It’s strikingly, how such a thing that lies on the surface has been eluding me for all this time. It seems to be true that Serializability does not demand that transactions issued by the same client (DB session) take place in that same order. While sequential consistency demands the preservation of the order of actions made by each process. And this explains why you initially said: “I think you are talking about Serial execution there.”, when I compared serializability to sequential consistency. This is a revelation, thanks for helping me to realize this.

  3. Hi, Nice article…I have a doubt though. does optimistic lock work in mulitple JVM enronment? could you please let me know how it works in multi request update scenario as hte first request could one JVM and following request could go to other JVM.

    1. Of course, it works with multiple JVMs or even systems using .NET, Node.js, Python or Ruby. The reason why it works is that the RDBMS does the coordination, not the data access layer.

      The version is incremented by the database, and since even the lowest isolation level prohibits dirty writes, you are assured that once a Transaction has modified a record, no other concurrent Transaction can modify the same record until the firsts one either commits or rolls back.

      The data access layer just inspects the executeUpdate row count, and if the value is less than 1, you are seeing stale results: either the row was deleted or the version was changed.

      For a more detailed explanation, check out the Transaction and Concurrency Control chapters in my book, High-Performance Java Persistence.

Leave a Reply

Your email address will not be published. Required fields are marked *