A beginner’s guide to database locking and the lost update phenomena

Introduction

A database is a highly concurrent system, so there is always a chance of update conflicts, such as when two concurrent transactions try to update the same record. If there were only one database transaction at any given time, all operations would be executed sequentially. The challenge comes when multiple transactions try to update the same database rows, since we still have to ensure consistent data state transitions.

The SQL standard defines three consistency anomalies (phenomena):

  • Dirty reads, prevented by Read Committed, Repeatable Read and Serializable isolation levels
  • Non-repeatable reads, prevented by Repeatable Read and Serializable isolation levels
  • Phantom reads, prevented by the Serializable isolation level

A lesser-known phenomenon is the lost update anomaly, and that’s what we are going to discuss in this article.

Isolation levels

Most database systems use Read Committed as the default isolation level (MySQL uses Repeatable Read instead). Choosing the isolation level is about finding the right balance between consistency and scalability for our current application requirements.

All the following examples are going to be run on PostgreSQL 9.3. Other database systems may behave differently according to their specific ACID implementation.

PostgreSQL uses both locks and MVCC (Multiversion Concurrency Control). In MVCC read and write locks are not conflicting, so reading doesn’t block writing and writing doesn’t block reading either.
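As a toy illustration of this idea (purely illustrative, not PostgreSQL’s actual implementation), a row can keep several committed versions, and each reader simply picks the newest version visible to its snapshot, so it never has to wait for writers:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy MVCC sketch: every committed write adds a new version of the row;
// readers never block, they just pick the newest version committed at or
// before their snapshot. Class and method names are illustrative.
class MvccRow {
    // commit transaction id -> committed value
    private final TreeMap<Long, Integer> versions = new TreeMap<>();

    void commitWrite(long commitTxId, int value) {
        versions.put(commitTxId, value);
    }

    // Returns the newest value visible to the reader's snapshot, or null if none.
    Integer readAsOf(long snapshotTxId) {
        Map.Entry<Long, Integer> entry = versions.floorEntry(snapshotTxId);
        return entry == null ? null : entry.getValue();
    }
}
```

A reader with an older snapshot keeps seeing the old version even after a newer write commits, which is exactly why reading doesn’t block writing.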

Because most applications use the default isolation level, it’s very important to understand the Read Committed characteristics:

  • Queries only see data committed before the query began, as well as the current transaction’s uncommitted changes
  • Concurrent changes committed during a query execution won’t be visible to the current query
  • UPDATE/DELETE statements use locks to prevent concurrent modifications

If two transactions try to update the same row, the second transaction must wait for the first one to either commit or roll back. If the first transaction commits, the second transaction’s DML WHERE clause must be re-evaluated to see if it still matches the row.

[Figure: Uncontended transactions]

In this example Bob’s UPDATE must wait for Alice’s transaction to end (commit/rollback) in order to proceed further.

Read Committed accommodates more concurrent transactions than stricter isolation levels, but less locking also means a higher chance of losing updates.

Lost updates

If two transactions are updating different columns of the same row, then there is no conflict. The second update blocks until the first transaction is committed and the final result reflects both update changes.

If the two transactions want to change the same columns, the second transaction will overwrite the first one, therefore losing the first transaction’s update.

So an update is lost when a user overrides the current database state without realizing that someone else changed it between the moment of data loading and the moment the update occurs.
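The scenario boils down to a tiny in-memory simulation (purely illustrative, not real database code): both users load the same snapshot, and the last writer blindly flushes its stale copy back:

```java
// Illustrative last-write-wins simulation of a lost update.
// Both "transactions" read the same snapshot; the second flush
// overwrites the first without ever seeing it.
class LostUpdateDemo {
    static int quantity = 7; // shared row state

    static int load() { return quantity; }
    static void flush(int snapshot) { quantity = snapshot; }

    static int run() {
        int alice = load();   // Alice reads 7
        int bob = load();     // Bob reads 7 before Alice commits
        flush(alice - 1);     // Alice commits quantity = 6
        flush(bob + 3);       // Bob commits quantity = 10, unaware of Alice's change
        return quantity;      // Alice's update is lost
    }
}
```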

[Figure: Lost update in single-request transactions]

In this example Bob is not aware that Alice has just changed the quantity from 7 to 6, so her UPDATE is overwritten by Bob’s change.

The typical find-modify-flush ORM strategy

Hibernate (like any other ORM tool) automatically translates entity state transitions to SQL queries. You first load an entity, change it, and let the Hibernate flush mechanism synchronize all changes with the database.

public Product incrementLikes(Long id) {
	Product product = entityManager.find(Product.class, id);
	product.incrementLikes(); 
	return product;
}

public Product setProductQuantity(Long id, Long quantity) {
	Product product = entityManager.find(Product.class, id);
	product.setQuantity(quantity);
	return product;
}

As I’ve already pointed out, all UPDATE statements acquire write locks, even in Read Committed isolation. The persistence context write-behind policy aims to reduce the lock-holding interval, but the longer the period between the read and the write operations, the greater the chance of running into a lost update situation.

Hibernate includes all row columns in an UPDATE statement. This strategy can be changed to include only the dirty properties (through the @DynamicUpdate annotation), but the reference documentation warns us about its effectiveness:

Although these settings can increase performance in some cases, they can actually decrease performance in others.

So let’s see how Alice and Bob concurrently update the same Product using an ORM framework:

Alice Bob
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (5, 10) WHERE ID = 1;
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |       10
(1 row)

store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |       10
(1 row)

Again, Alice’s update is lost without Bob ever knowing he overwrote her changes. We should always prevent data integrity anomalies, so let’s see how we can overcome this phenomenon.

Repeatable Read

Using Repeatable Read (as well as Serializable, which offers an even stricter isolation level) can prevent lost updates across concurrent database transactions.

Alice Bob
store=# BEGIN;
store=# SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

store=# BEGIN;
store=# SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (5, 10) WHERE ID = 1;
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

ERROR: could not serialize access due to concurrent update
store=# SELECT * FROM PRODUCT WHERE ID = 1;
ERROR: current transaction is aborted, commands ignored until end of transaction block

This time, Bob couldn’t overwrite Alice’s changes and his transaction was aborted. In Repeatable Read, a query will see the data snapshot as of the start of the current transaction. Changes committed by other concurrent transactions are not visible to the current transaction.

If two transactions attempt to modify the same record, the second transaction will wait for the first one to either commit or rollback. If the first transaction commits, then the second one must be aborted to prevent lost updates.

SELECT FOR UPDATE

Another solution is to use FOR UPDATE with the default Read Committed isolation level. This locking clause acquires the same write locks as UPDATE and DELETE statements.

Alice Bob
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 10) WHERE ID = 1;
UPDATE 1
store=# COMMIT;
COMMIT
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |       10
(1 row)

Bob couldn’t proceed with his SELECT statement because Alice had already acquired the write locks on the same row. Bob had to wait for Alice to end her transaction, and when his SELECT was unblocked he automatically saw her changes, so Alice’s UPDATE wasn’t lost.

Both transactions should use the FOR UPDATE locking. If the first transaction doesn’t acquire the write locks, the lost update can still happen.

Alice Bob
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 10) WHERE ID = 1;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |       10
(1 row)

store=# COMMIT;

store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

store=# COMMIT;

store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

Alice’s UPDATE is blocked until Bob releases the write locks at the end of his transaction. But Alice’s persistence context is using a stale entity snapshot, so she overwrites Bob’s changes, leading to another lost update situation.

Optimistic Locking

My favorite approach is to replace pessimistic locking with an optimistic locking mechanism. Like MVCC, optimistic locking defines a versioning concurrency control model that works without acquiring additional database write locks.

The product table now also includes a version column that prevents stale data snapshots from overwriting the latest data.

Alice Bob
store=# BEGIN;
BEGIN
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity | version
----+-------+----------+---------
  1 |     5 |        7 |       2
(1 row)

store=# BEGIN;
BEGIN
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity | version
----+-------+----------+---------
  1 |     5 |        7 |       2
(1 row)

store=# UPDATE PRODUCT SET (LIKES, QUANTITY, VERSION) = (6, 7, 3) WHERE (ID, VERSION) = (1, 2);
UPDATE 1
store=# UPDATE PRODUCT SET (LIKES, QUANTITY, VERSION) = (5, 10, 3) WHERE (ID, VERSION) = (1, 2);
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity | version
----+-------+----------+---------
  1 |     6 |        7 |       3
(1 row)

UPDATE 0
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity | version
----+-------+----------+---------
  1 |     6 |        7 |       3
(1 row)

Every UPDATE includes the load-time version in its WHERE clause, assuming no one has changed this row since it was read from the database. If another transaction manages to commit a newer entity version, the UPDATE WHERE clause will no longer match any row, and so the lost update is prevented.

Hibernate uses the PreparedStatement#executeUpdate result to check the number of updated rows. If no row was matched, it then throws a StaleObjectStateException (when using Hibernate API) or an OptimisticLockException (when using JPA).

As with Repeatable Read, the current transaction and the persistence context are aborted, preserving the atomicity guarantees.
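The whole mechanism can be condensed into a small in-memory sketch (illustrative only; the class and method names are mine, not Hibernate’s): the update succeeds only if the caller’s load-time version still matches the current row version, mimicking the matched-row count of the versioned UPDATE statement.

```java
// Illustrative optimistic-locking sketch. update() mimics
// "UPDATE ... WHERE (ID, VERSION) = (?, ?)" and returns the matched row count.
class VersionedRow {
    int likes = 5;
    int quantity = 7;
    int version = 2;

    int update(int newLikes, int newQuantity, int expectedVersion) {
        if (version != expectedVersion) {
            return 0; // stale snapshot: UPDATE 0, the lost update is prevented
        }
        likes = newLikes;
        quantity = newQuantity;
        version = expectedVersion + 1; // bump the version on every successful write
        return 1;
    }
}
```

Replaying the Alice/Bob scenario against this sketch, Alice’s update matches one row and bumps the version to 3, while Bob’s update, still carrying version 2, matches zero rows, which is the point where Hibernate would raise the optimistic locking failure.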

Conclusion

Lost updates can happen unless you plan for preventing them. Unlike optimistic locking, the pessimistic locking approaches are effective only within the scope of a single database transaction, when both the SELECT and the UPDATE statements are executed in the same physical transaction.

In my next post I will explain why optimistic locking is the only viable solution when using application-level transactions, as is the case for most web applications.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

Hibernate bytecode enhancement

Introduction

Now that you know the basics of Hibernate dirty checking, we can dig into enhanced dirty checking mechanisms. While the default graph-traversal algorithm might be sufficient for most use cases, there might be times when you need an optimized dirty checking algorithm, and instrumentation is much more convenient than building your own custom strategy.

Using Ant Hibernate Tools

Traditionally, the Hibernate Tools have been focused on Ant and Eclipse. Bytecode instrumentation has been possible since Hibernate 3, but it required an Ant task to run the CGLIB or Javassist bytecode enhancement routines.

Maven supports running Ant tasks through the maven-antrun-plugin:

<build>
	<plugins>
		<plugin>
			<artifactId>maven-antrun-plugin</artifactId>
			<executions>
				<execution>
					<id>Instrument domain classes</id>
					<configuration>
						<tasks>
							<taskdef name="instrument"
									 classname="org.hibernate.tool.instrument.javassist.InstrumentTask">
								<classpath>
									<path refid="maven.dependency.classpath"/>
									<path refid="maven.plugin.classpath"/>
								</classpath>
							</taskdef>
							<instrument verbose="true">
								<fileset dir="${project.build.outputDirectory}">
									<include name="**/flushing/*.class"/>
								</fileset>
							</instrument>
						</tasks>
					</configuration>
					<phase>process-classes</phase>
					<goals>
						<goal>run</goal>
					</goals>
				</execution>
			</executions>
			<dependencies>
				<dependency>
					<groupId>org.hibernate</groupId>
					<artifactId>hibernate-core</artifactId>
					<version>${hibernate.version}</version>
				</dependency>
				<dependency>
					<groupId>org.javassist</groupId>
					<artifactId>javassist</artifactId>
					<version>${javassist.version}</version>
				</dependency>
			</dependencies>
		</plugin>
	</plugins>
</build>

So for the following entity source class:

@Entity
public class EnhancedOrderLine {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private Long number;

    private String orderedBy;

    private Date orderedOn;

    public Long getId() {
        return id;
    }

    public Long getNumber() {
        return number;
    }

    public void setNumber(Long number) {
        this.number = number;
    }

    public String getOrderedBy() {
        return orderedBy;
    }

    public void setOrderedBy(String orderedBy) {
        this.orderedBy = orderedBy;
    }

    public Date getOrderedOn() {
        return orderedOn;
    }

    public void setOrderedOn(Date orderedOn) {
        this.orderedOn = orderedOn;
    }
}

At build time, the following class is generated:

@Entity
public class EnhancedOrderLine implements FieldHandled {

  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  private Long id;
  private Long number;
  private String orderedBy;
  private Date orderedOn;
  private transient FieldHandler $JAVASSIST_READ_WRITE_HANDLER;

  public Long getId() {
    return $javassist_read_id();
  }

  public Long getNumber() {
    return $javassist_read_number();
  }

  public void setNumber(Long number) {
    $javassist_write_number(number);
  }

  public String getOrderedBy() {
    return $javassist_read_orderedBy();
  }

  public void setOrderedBy(String orderedBy) {
    $javassist_write_orderedBy(orderedBy);
  }

  public Date getOrderedOn() {
    return $javassist_read_orderedOn();
  }

  public void setOrderedOn(Date orderedOn) {
    $javassist_write_orderedOn(orderedOn);
  }

  public FieldHandler getFieldHandler() {
    return this.$JAVASSIST_READ_WRITE_HANDLER;
  }

  public void setFieldHandler(FieldHandler paramFieldHandler) {
    this.$JAVASSIST_READ_WRITE_HANDLER = paramFieldHandler;
  }

  public Long $javassist_read_id() {
    if (getFieldHandler() == null)
      return this.id;
    // decompiled interceptor branch restored for completeness
    return ((Long) getFieldHandler().readObject(this, "id", this.id));
  }

  public void $javassist_write_id(Long paramLong) {
    if (getFieldHandler() == null) {
      this.id = paramLong;
      return;
    }
    this.id = ((Long)getFieldHandler().writeObject(this, "id", this.id, paramLong));
  }

  public Long $javassist_read_number() {
    if (getFieldHandler() == null)
      return this.number;
    return ((Long) getFieldHandler().readObject(this, "number", this.number));
  }

  public void $javassist_write_number(Long paramLong) {
    if (getFieldHandler() == null) {
      this.number = paramLong;
      return;
    }
    this.number = ((Long)getFieldHandler().writeObject(this, "number", this.number, paramLong));
  }

  public String $javassist_read_orderedBy() {
    if (getFieldHandler() == null)
      return this.orderedBy;
    return ((String) getFieldHandler().readObject(this, "orderedBy", this.orderedBy));
  }

  public void $javassist_write_orderedBy(String paramString) {
    if (getFieldHandler() == null) {
      this.orderedBy = paramString;
      return;
    }
    this.orderedBy = ((String)getFieldHandler().writeObject(this, "orderedBy", this.orderedBy, paramString));
  }

  public Date $javassist_read_orderedOn() {
    if (getFieldHandler() == null)
      return this.orderedOn;
    return ((Date) getFieldHandler().readObject(this, "orderedOn", this.orderedOn));
  }

  public void $javassist_write_orderedOn(Date paramDate) {
    if (getFieldHandler() == null) {
      this.orderedOn = paramDate;
      return;
    }
    this.orderedOn = ((Date)getFieldHandler().writeObject(this, "orderedOn", this.orderedOn, paramDate));
  }
}

Although the org.hibernate.bytecode.instrumentation.spi.AbstractFieldInterceptor manages to intercept dirty fields, this information is never actually used during dirtiness tracking.

The InstrumentTask bytecode enhancement can only tell whether an entity is dirty; it lacks support for indicating which properties have been modified, therefore making the InstrumentTask more suitable for the “no-proxy” lazy fetching strategy.

hibernate-enhance-maven-plugin

Hibernate 4.2.8 added support for a dedicated Maven bytecode enhancement plugin.

The Maven bytecode enhancement plugin is easy to configure:

<build>
    <plugins>
        <plugin>
            <groupId>org.hibernate.orm.tooling</groupId>
            <artifactId>hibernate-enhance-maven-plugin</artifactId>
            <executions>
                 <execution>
                     <phase>compile</phase>
                     <goals>
                         <goal>enhance</goal>
                     </goals>
                 </execution>
             </executions>
        </plugin>
    </plugins>
</build>

During the project build, the following class is generated:

@Entity
public class EnhancedOrderLine
        implements ManagedEntity, PersistentAttributeInterceptable, SelfDirtinessTracker {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    private Long number;
    private String orderedBy;
    private Date orderedOn;

    @Transient
    private transient PersistentAttributeInterceptor $$_hibernate_attributeInterceptor;

    @Transient
    private transient Set $$_hibernate_tracker;

    @Transient
    private transient CollectionTracker $$_hibernate_collectionTracker;

    @Transient
    private transient EntityEntry $$_hibernate_entityEntryHolder;

    @Transient
    private transient ManagedEntity $$_hibernate_previousManagedEntity;

    @Transient
    private transient ManagedEntity $$_hibernate_nextManagedEntity;

    public Long getId() {
        return $$_hibernate_read_id();
    }

    public Long getNumber() {
        return $$_hibernate_read_number();
    }

    public void setNumber(Long number) {
        $$_hibernate_write_number(number);
    }

    public String getOrderedBy() {
        return $$_hibernate_read_orderedBy();
    }

    public void setOrderedBy(String orderedBy) {
        $$_hibernate_write_orderedBy(orderedBy);
    }

    public Date getOrderedOn() {
        return $$_hibernate_read_orderedOn();
    }

    public void setOrderedOn(Date orderedOn) {
        $$_hibernate_write_orderedOn(orderedOn);
    }

    public PersistentAttributeInterceptor $$_hibernate_getInterceptor() {
        return this.$$_hibernate_attributeInterceptor;
    }

    public void $$_hibernate_setInterceptor(PersistentAttributeInterceptor paramPersistentAttributeInterceptor) {
        this.$$_hibernate_attributeInterceptor = paramPersistentAttributeInterceptor;
    }

    public void $$_hibernate_trackChange(String paramString) {
        if (this.$$_hibernate_tracker == null)
            this.$$_hibernate_tracker = new HashSet();
        if (!this.$$_hibernate_tracker.contains(paramString))
            this.$$_hibernate_tracker.add(paramString);
    }

    private boolean $$_hibernate_areCollectionFieldsDirty() {
        return ($$_hibernate_getInterceptor() != null) && (this.$$_hibernate_collectionTracker != null);
    }

    private void $$_hibernate_getCollectionFieldDirtyNames(Set paramSet) {
        if (this.$$_hibernate_collectionTracker == null)
            return;
    }

    public boolean $$_hibernate_hasDirtyAttributes() {
        return ((this.$$_hibernate_tracker == null) || (this.$$_hibernate_tracker.isEmpty())) && ($$_hibernate_areCollectionFieldsDirty());
    }

    private void $$_hibernate_clearDirtyCollectionNames() {
        if (this.$$_hibernate_collectionTracker == null)
            this.$$_hibernate_collectionTracker = new CollectionTracker();
    }

    public void $$_hibernate_clearDirtyAttributes() {
        if (this.$$_hibernate_tracker != null)
            this.$$_hibernate_tracker.clear();
        $$_hibernate_clearDirtyCollectionNames();
    }

    public Set<String> $$_hibernate_getDirtyAttributes() {
        if (this.$$_hibernate_tracker == null)
            this.$$_hibernate_tracker = new HashSet();
        $$_hibernate_getCollectionFieldDirtyNames(this.$$_hibernate_tracker);
        return this.$$_hibernate_tracker;
    }

    private Long $$_hibernate_read_id() {
        if ($$_hibernate_getInterceptor() != null)
            this.id = ((Long) $$_hibernate_getInterceptor().readObject(this, "id", this.id));
        return this.id;
    }

    private void $$_hibernate_write_id(Long paramLong) {
        // the decompiler's "break label39" control flow, rewritten as plain Java
        if (($$_hibernate_getInterceptor() != null)
                && (this.id != null) && (!this.id.equals(paramLong)))
            $$_hibernate_trackChange("id");
        Long localLong = paramLong;
        if ($$_hibernate_getInterceptor() != null)
            localLong = (Long) $$_hibernate_getInterceptor().writeObject(this, "id", this.id, paramLong);
        this.id = localLong;
    }

    private Long $$_hibernate_read_number() {
        if ($$_hibernate_getInterceptor() != null)
            this.number = ((Long) $$_hibernate_getInterceptor().readObject(this, "number", this.number));
        return this.number;
    }

    private void $$_hibernate_write_number(Long paramLong) {
        if (($$_hibernate_getInterceptor() != null)
                && (this.number != null) && (!this.number.equals(paramLong)))
            $$_hibernate_trackChange("number");
        Long localLong = paramLong;
        if ($$_hibernate_getInterceptor() != null)
            localLong = (Long) $$_hibernate_getInterceptor().writeObject(this, "number", this.number, paramLong);
        this.number = localLong;
    }

    private String $$_hibernate_read_orderedBy() {
        if ($$_hibernate_getInterceptor() != null)
            this.orderedBy = ((String) $$_hibernate_getInterceptor().readObject(this, "orderedBy", this.orderedBy));
        return this.orderedBy;
    }

    private void $$_hibernate_write_orderedBy(String paramString) {
        if (($$_hibernate_getInterceptor() != null)
                && (this.orderedBy != null) && (!this.orderedBy.equals(paramString)))
            $$_hibernate_trackChange("orderedBy");
        String str = paramString;
        if ($$_hibernate_getInterceptor() != null)
            str = (String) $$_hibernate_getInterceptor().writeObject(this, "orderedBy", this.orderedBy, paramString);
        this.orderedBy = str;
    }

    private Date $$_hibernate_read_orderedOn() {
        if ($$_hibernate_getInterceptor() != null)
            this.orderedOn = ((Date) $$_hibernate_getInterceptor().readObject(this, "orderedOn", this.orderedOn));
        return this.orderedOn;
    }

    private void $$_hibernate_write_orderedOn(Date paramDate) {
        if (($$_hibernate_getInterceptor() != null)
                && (this.orderedOn != null) && (!this.orderedOn.equals(paramDate)))
            $$_hibernate_trackChange("orderedOn");
        Date localDate = paramDate;
        if ($$_hibernate_getInterceptor() != null)
            localDate = (Date) $$_hibernate_getInterceptor().writeObject(this, "orderedOn", this.orderedOn, paramDate);
        this.orderedOn = localDate;
    }

    public Object $$_hibernate_getEntityInstance() {
        return this;
    }

    public EntityEntry $$_hibernate_getEntityEntry() {
        return this.$$_hibernate_entityEntryHolder;
    }

    public void $$_hibernate_setEntityEntry(EntityEntry paramEntityEntry) {
        this.$$_hibernate_entityEntryHolder = paramEntityEntry;
    }

    public ManagedEntity $$_hibernate_getPreviousManagedEntity() {
        return this.$$_hibernate_previousManagedEntity;
    }

    public void $$_hibernate_setPreviousManagedEntity(ManagedEntity paramManagedEntity) {
        this.$$_hibernate_previousManagedEntity = paramManagedEntity;
    }

    public ManagedEntity $$_hibernate_getNextManagedEntity() {
        return this.$$_hibernate_nextManagedEntity;
    }

    public void $$_hibernate_setNextManagedEntity(ManagedEntity paramManagedEntity) {
        this.$$_hibernate_nextManagedEntity = paramManagedEntity;
    }
}

It’s easy to see that the new bytecode enhancement logic differs from the one generated by the previous InstrumentTask.

Like the custom dirty checking mechanism, the new bytecode enhancement version records which properties have changed, not just a simple dirty boolean flag. The enhancement logic marks dirty fields as they change. This approach is much more efficient than comparing all current property values against the load-time snapshot data.
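Stripped of the bytecode plumbing, the recording idea looks roughly like this (an illustrative sketch, not the generated code):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative setter-level dirty tracking: the property name is recorded
// at write time, so flushing never needs a full snapshot comparison.
class TrackedOrderLine {
    private Long number;
    private final Set<String> dirtyAttributes = new LinkedHashSet<>();

    void setNumber(Long value) {
        boolean changed = (number == null) ? value != null : !number.equals(value);
        if (changed) {
            dirtyAttributes.add("number"); // mark the field as it changes
        }
        number = value;
    }

    Set<String> getDirtyAttributes() {
        return dirtyAttributes;
    }
}
```

At flush time, only the recorded attribute names need to be consulted, instead of walking the whole entity graph.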

Are we there yet?

Even though the entity class bytecode is enhanced, with Hibernate 4.3.6 there are still missing puzzle pieces.

For instance, when calling setNumber(Long number) the following intercepting method gets executed:

private void $$_hibernate_write_number(Long paramLong) {
	if (($$_hibernate_getInterceptor() != null)
			&& (this.number != null) && (!this.number.equals(paramLong)))
		$$_hibernate_trackChange("number");
	Long localLong = paramLong;
	if ($$_hibernate_getInterceptor() != null)
		localLong = (Long) $$_hibernate_getInterceptor().writeObject(this, "number", this.number, paramLong);
	this.number = localLong;
}

In my examples, $$_hibernate_getInterceptor() is always null, which bypasses the $$_hibernate_trackChange("number") call. Because of this, no dirty property is recorded, forcing Hibernate to fall back to the default deep-comparison dirty checking algorithm.

So, even though Hibernate has made considerable progress in this area, the dirty checking enhancement still requires additional work to become readily available.

Code available on GitHub.


From mostly interested to most interesting

No money can buy this feeling

Being appreciated for my work is what pushes me to contribute more. I am proud to have been nominated as one of the most interesting developers.

Ever since I started this blog, began helping others on Stack Overflow, and started contributing to open source software, many good things have happened.

Being mentioned on the same page with John Sonmez or Christoph Engelbert (Hazelcast) is more than flattering. With all optimism, I’ve never thought I’d get this nomination.

Thank you Jelastic, you made my day!

For more, you can read the full article.

How to customize Hibernate dirty checking mechanism

Introduction

In my previous article I described the Hibernate automatic dirty checking mechanism. While you should always prefer it, there might be times when you want to add your own custom dirtiness detection strategy.

Custom dirty checking strategies

Hibernate offers the following customization mechanisms:

  • the Interceptor#findDirty callback
  • the CustomEntityDirtinessStrategy contract

A manual dirty checking exercise

As an exercise, I’ll build a manual dirty checking mechanism to illustrate how easily you can customize the change detection strategy:

Self dirty checking entity

First, I’ll define a DirtyAware interface all manual dirty checking entities will have to implement:

public interface DirtyAware {

    Set<String> getDirtyProperties();

    void clearDirtyProperties();
}

Next I am going to encapsulate our current dirty checking logic in a base class:

public abstract class SelfDirtyCheckingEntity implements DirtyAware {

    private final Map<String, String> setterToPropertyMap = new HashMap<String, String>();

    @Transient
    private Set<String> dirtyProperties = new LinkedHashSet<String>();

    public SelfDirtyCheckingEntity() {
        try {
            BeanInfo beanInfo = Introspector.getBeanInfo(getClass());
            PropertyDescriptor[] descriptors = beanInfo.getPropertyDescriptors();
            for (PropertyDescriptor descriptor : descriptors) {
                Method setter = descriptor.getWriteMethod();
                if (setter != null) {
                    setterToPropertyMap.put(setter.getName(), descriptor.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }

    }

    @Override
    public Set<String> getDirtyProperties() {
        return dirtyProperties;
    }

    @Override
    public void clearDirtyProperties() {
        dirtyProperties.clear();
    }

    protected void markDirtyProperty() {
        // resolve the calling setter's name from the current stack trace
        String methodName = Thread.currentThread().getStackTrace()[2].getMethodName();
        dirtyProperties.add(setterToPropertyMap.get(methodName));
    }
}

All manual dirty checking entities will have to extend this base class and explicitly flag the dirty properties through a call to the markDirtyProperty method.

The actual self dirty checking entity looks like this:

@Entity
@Table(name = "ORDER_LINE")
public class OrderLine extends SelfDirtyCheckingEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private Long number;

    private String orderedBy;

    private Date orderedOn;

    public Long getId() {
        return id;
    }

    public Long getNumber() {
        return number;
    }

    public void setNumber(Long number) {
        this.number = number;
        markDirtyProperty();
    }

    public String getOrderedBy() {
        return orderedBy;
    }

    public void setOrderedBy(String orderedBy) {
        this.orderedBy = orderedBy;
        markDirtyProperty();
    }

    public Date getOrderedOn() {
        return orderedOn;
    }

    public void setOrderedOn(Date orderedOn) {
        this.orderedOn = orderedOn;
        markDirtyProperty();
    }
}

Whenever a setter gets called, the associated property becomes dirty. For simplicity’s sake, this exercise doesn’t cover the use case where we revert a property to its original value.
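If you did want to handle reverts, one option (an illustrative extension, not part of the exercise’s code) is to keep a load-time snapshot and drop the property from the dirty set when its value returns to the original:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustrative revert-aware tracking: a property stays dirty only while
// its current value differs from the load-time snapshot.
class RevertAwareTracker {
    private final Map<String, Object> snapshot = new HashMap<>();
    private final Set<String> dirty = new LinkedHashSet<>();

    void loaded(String property, Object value) {
        snapshot.put(property, value); // capture the load-time value
    }

    void changed(String property, Object newValue) {
        if (Objects.equals(snapshot.get(property), newValue)) {
            dirty.remove(property); // reverted to the original value
        } else {
            dirty.add(property);
        }
    }

    Set<String> dirtyProperties() {
        return dirty;
    }
}
```

The trade-off is holding one extra snapshot per entity, which is exactly the memory cost the setter-only approach avoids.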

The dirty checking test

To test the self dirty checking mechanisms I’m going to run the following test case:

@Test
public void testDirtyChecking() {
    doInTransaction(new TransactionCallable<Void>() {
         @Override
         public Void execute(Session session) {
            OrderLine orderLine = new OrderLine();
            session.persist(orderLine);
            session.flush();
            orderLine.setNumber(123L);
            orderLine.setOrderedBy("Vlad");
            orderLine.setOrderedOn(new Date());
            session.flush();
            orderLine.setOrderedBy("Alex");
            return null;
        }
    });
}

The Hibernate Interceptor solution

The Hibernate Interceptor findDirty callback allows us to control the dirty properties discovery process. This method may return:

  • null, to delegate the dirty checking to Hibernate’s default strategy
  • an int[] array, containing the modified properties’ indices

Our Hibernate dirty checking interceptor looks like this:

public class DirtyCheckingInterceptor extends EmptyInterceptor {

    private static final Logger LOGGER = LoggerFactory.getLogger(DirtyCheckingInterceptor.class);

    @Override
    public int[] findDirty(Object entity, Serializable id, Object[] currentState, Object[] previousState, String[] propertyNames, Type[] types) {
        if (entity instanceof DirtyAware) {
            DirtyAware dirtyAware = (DirtyAware) entity;
            Set<String> dirtyProperties = dirtyAware.getDirtyProperties();
            int[] dirtyPropertiesIndices = new int[dirtyProperties.size()];
            List<String> propertyNamesList = Arrays.asList(propertyNames);
            int i = 0;
            for (String dirtyProperty : dirtyProperties) {
                LOGGER.info("The {} property is dirty", dirtyProperty);
                dirtyPropertiesIndices[i++] = propertyNamesList.indexOf(dirtyProperty);
            }
            dirtyAware.clearDirtyProperties();
            return dirtyPropertiesIndices;
        }
        return super.findDirty(entity, id, currentState, previousState, propertyNames, types);
    }
}

When passing this interceptor to our current SessionFactory configuration we get the following output:

INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The number property is dirty
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedBy property is dirty
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedOn property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:1 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Vlad,2014-08-20 07:35:05.649,1]} 
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedBy property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Alex,2014-08-20 07:35:05.649,1]}

The manual dirty checking mechanism has detected incoming changes and propagated them to the flushing event listener.

The lesser-known CustomEntityDirtinessStrategy

The CustomEntityDirtinessStrategy is a recent Hibernate API addition, allowing us to provide an application specific dirty checking mechanism. This interface can be implemented as follows:

    public static class EntityDirtinessStrategy implements CustomEntityDirtinessStrategy {

        @Override
        public boolean canDirtyCheck(Object entity, EntityPersister persister, Session session) {
            return entity instanceof DirtyAware;
        }

        @Override
        public boolean isDirty(Object entity, EntityPersister persister, Session session) {
            return !cast(entity).getDirtyProperties().isEmpty();
        }

        @Override
        public void resetDirty(Object entity, EntityPersister persister, Session session) {
            cast(entity).clearDirtyProperties();
        }

        @Override
        public void findDirty(Object entity, EntityPersister persister, Session session, DirtyCheckContext dirtyCheckContext) {
            final DirtyAware dirtyAware = cast(entity);
            dirtyCheckContext.doDirtyChecking(
                    new AttributeChecker() {
                        @Override
                        public boolean isDirty(AttributeInformation attributeInformation) {
                            String propertyName = attributeInformation.getName();
                            boolean dirty = dirtyAware.getDirtyProperties().contains( propertyName );
                            if (dirty) {
                                LOGGER.info("The {} property is dirty", propertyName);
                            }
                            return dirty;
                        }
                    }
            );
        }

        private DirtyAware cast(Object entity) {
            return DirtyAware.class.cast(entity);
        }
    }

To register the CustomEntityDirtinessStrategy implementation we have to set the following Hibernate property:

properties.setProperty("hibernate.entity_dirtiness_strategy", EntityDirtinessStrategy.class.getName());

Running our test yields the following output:

INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The number property is dirty
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedBy property is dirty
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedOn property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:1 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Vlad,2014-08-20 12:51:30.068,1]} 
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedBy property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Alex,2014-08-20 12:51:30.068,1]} 

Conclusion

Although the default field-level checking or the bytecode instrumentation alternative are sufficient for most applications, there might be times when you want to gain control over the change detection process. On a long-term project, it’s not uncommon to customize certain built-in mechanisms, to satisfy exceptional quality of service requirements. A framework adoption decision should also consider the framework extensibility and customization support.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

The anatomy of Hibernate dirty checking

Introduction

The persistence context enqueues entity state transitions that get translated to database statements upon flushing. For managed entities, Hibernate can auto-detect incoming changes and schedule SQL UPDATES on our behalf. This mechanism is called automatic dirty checking.

The default dirty checking strategy

By default Hibernate checks all managed entity properties. Every time an entity is loaded, Hibernate makes an additional copy of all entity property values. At flush time, every managed entity property is matched against the loading-time snapshot value:

[Figure: the default flush event flow]

So the number of individual dirty checks is given by the following formula:

N = \sum_{k=1}^{n} p_{k}

where

n = the number of managed entities
p_k = the number of properties of the k-th entity

Even if only one property of a single entity has ever changed, Hibernate will still check all managed entities. For a large number of managed entities, the default dirty checking mechanism may have a significant CPU and memory footprint. Since the initial entity snapshot is held separately, the persistence context requires twice as much memory as all managed entities would normally occupy.
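To make the cost model concrete, here is a simplified, non-Hibernate sketch of snapshot-based dirty checking (all names are made up for this illustration): flushing compares every property of every registered entity against its loading-time copy, which is exactly where the sum-of-properties formula comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified model of the default strategy: the "persistence context" keeps a
// snapshot of every entity's property values taken at load time, and flush-time
// dirty checking compares current values against that snapshot, property by property.
class SnapshotDirtyChecker {

    private final Map<Object, Map<String, Object>> snapshots = new IdentityHashMap<>();

    // Called when an entity is loaded: copy its state (this is the extra memory cost)
    void register(Object entity, Map<String, Object> currentState) {
        snapshots.put(entity, new HashMap<>(currentState));
    }

    // Called at flush time: returns the properties whose value changed since load time
    List<String> findDirtyProperties(Object entity, Map<String, Object> currentState) {
        Map<String, Object> snapshot = snapshots.get(entity);
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> e : currentState.entrySet()) {
            if (!Objects.equals(e.getValue(), snapshot.get(e.getKey()))) {
                dirty.add(e.getKey());
            }
        }
        return dirty;
    }
}
```

Every flush has to call findDirtyProperties for every registered entity, even if nothing changed, which is why the default strategy scales with the total property count rather than with the number of actual changes.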

Bytecode instrumentation

A more efficient approach would be to mark dirty properties as their values change. As with the original deep-comparison strategy, it's good practice to decouple the domain model structures from the change-detection logic. The automatic entity change detection mechanism is a cross-cutting concern that can be woven either at build-time or at runtime.

The entity class can be appended with bytecode level instructions implementing the automatic dirty checking mechanism.

Weaving types

The bytecode enhancement can happen at:

  • Build-time

    After the hibernate entities are compiled, the build tool (e.g. ANT, Maven) will insert bytecode level instructions into each compiled entity class. Because the classes are enhanced at build-time, this process exhibits no extra runtime penalty. Testing can be done against enhanced class versions, so that the actual production code is validated before the project gets built.

  • Runtime

    The runtime weaving can be done using:

Towards a default bytecode enhancement dirty checking

Hibernate 3 offered bytecode instrumentation through an ANT target, but it never became mainstream, and most Hibernate projects still use the default deep-comparison approach.

While other JPA providers (e.g. OpenJPA, DataNucleus) have been favouring the bytecode enhancement approach, Hibernate has only recently started moving in this direction, offering better build-time options and even custom dirty checking callbacks.

In my next post I’ll show you how you can customize the dirty checking mechanism with your own application specific strategy.

The dark side of Hibernate AUTO flush

Introduction

Now that I've described the basics of JPA and Hibernate flush strategies, I can continue unraveling the surprising behavior of Hibernate's AUTO flush mode.

Not all queries trigger a Session flush

Many would assume that Hibernate always flushes the Session before executing any query. While this might have been a more intuitive approach, and probably closer to JPA's AUTO FlushModeType, Hibernate tries to optimize it. If the currently executing query is not going to hit the pending SQL INSERT/UPDATE/DELETE statements, then the flush is not strictly required.

As stated in the reference documentation, the AUTO flush strategy may sometimes synchronize the current persistence context prior to a query execution. It would have been more intuitive if the framework authors had chosen to name it FlushMode.SOMETIMES.

JPQL/HQL and SQL

Like many other ORM solutions, Hibernate offers a limited Entity querying language (JPQL/HQL) that’s very much based on SQL-92 syntax.

The entity query language is translated to SQL by the current database dialect, and so it must offer the same functionality across different database products. Since most database systems are SQL-92 compliant, the Entity Query Language is an abstraction of the most common database querying syntax.

While you can use the Entity Query Language in many use cases (selecting entities and even projections), there are times when its limited capabilities are no match for an advanced querying request. Whenever we need a database-specific querying technique, we have no other option but to run native SQL queries.

Hibernate is a persistence framework. Hibernate was never meant to replace SQL. If some query is better expressed in a native query, then it’s not worth sacrificing application performance on the altar of database portability.

AUTO flush and HQL/JPQL

First we are going to test how the AUTO flush mode behaves when an HQL query is about to be executed. For this we define the following unrelated entities:

[Figure: the unrelated entities used in the AUTO flush tests]

The test will execute the following actions:

  • A Product is going to be persisted.
  • Selecting User(s) should not trigger the flush.
  • Querying for Product, the AUTO flush should trigger the entity state transition synchronization (a Product INSERT should be executed prior to executing the select query).

Product product = new Product();
session.persist(product);
assertEquals(0L,  session.createQuery("select count(id) from User").uniqueResult());
assertEquals(product.getId(), session.createQuery("select p.id from Product p").uniqueResult());

Giving the following SQL output:

[main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: f76f61e2-f3e3-4ea4-8f44-82e9804ceed0, using strategy: org.hibernate.id.UUIDGenerator
Query:{[select count(user0_.id) as col_0_0_ from user user0_][]} 
Query:{[insert into product (color, id) values (?, ?)][12,f76f61e2-f3e3-4ea4-8f44-82e9804ceed0]} 
Query:{[select product0_.id as col_0_0_ from product product0_][]}

As you can see, the User select hasn't triggered the Session flush. This is because Hibernate inspects the current query space against the pending table statements. If the currently executing query doesn't overlap with the unflushed table statements, the flush can be safely ignored.

HQL can detect the Product flush even for:

  • Sub-selects

    session.persist(product);
    assertEquals(0L,  session.createQuery(
        "select count(*) " +
        "from User u " +
        "where u.favoriteColor in (select distinct(p.color) from Product p)").uniqueResult());
    

    Resulting in a proper flush call:

    Query:{[insert into product (color, id) values (?, ?)][Blue,2d9d1b4f-eaee-45f1-a480-120eb66da9e8]} 
    Query:{[select count(*) as col_0_0_ from user user0_ where user0_.favoriteColor in (select distinct product1_.color from product product1_)][]}
    
  • Or theta-style joins

    session.persist(product);
    assertEquals(0L,  session.createQuery(
        "select count(*) " +
        "from User u, Product p " +
        "where u.favoriteColor = p.color").uniqueResult());
    

    Triggering the expected flush :

    Query:{[insert into product (color, id) values (?, ?)][Blue,4af0b843-da3f-4b38-aa42-1e590db186a9]} 
    Query:{[select count(*) as col_0_0_ from user user0_ cross join product product1_ where user0_.favoriteColor=product1_.color][]} 
    

The reason why this works is that Entity Queries are parsed and translated to SQL queries. Since Hibernate cannot reference a non-existing table, it always knows which database tables an HQL/JPQL query will hit.

So Hibernate is only aware of those tables we explicitly referenced in our HQL query. If the current pending DML statements imply database triggers or database level cascading, Hibernate won’t be aware of those. So even for HQL, the AUTO flush mode can cause consistency issues.
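The table-overlap check described above can be pictured as a simple set intersection between the query space of the executing statement and the tables targeted by pending DML. This is a conceptual sketch with made-up names, not Hibernate's actual implementation:

```java
import java.util.Set;

// Conceptual model of the AUTO flush decision: flush only when the query's
// table space overlaps the tables with pending INSERT/UPDATE/DELETE statements.
class AutoFlushDecision {

    static boolean requiresFlush(Set<String> queryTables, Set<String> pendingDmlTables) {
        for (String table : queryTables) {
            if (pendingDmlTables.contains(table)) {
                return true;
            }
        }
        return false;
    }
}
```

It also shows the blind spot mentioned above: tables modified indirectly (database triggers, database-level cascades) never enter the pending-DML set, so the overlap test misses them.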

AUTO flush and native SQL queries

When it comes to native SQL queries, things are getting much more complicated. Hibernate cannot parse SQL queries, because it only supports a limited database query syntax. Many database systems offer proprietary features that are beyond Hibernate Entity Query capabilities.

Querying the Product table with a native SQL query is not going to trigger the flush, causing an inconsistency issue:

Product product = new Product();
session.persist(product);
assertNull(session.createSQLQuery("select id from product").uniqueResult());

DEBUG [main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: 718b84d8-9270-48f3-86ff-0b8da7f9af7c, using strategy: org.hibernate.id.UUIDGenerator
Query:{[select id from product][]} 
Query:{[insert into product (color, id) values (?, ?)][12,718b84d8-9270-48f3-86ff-0b8da7f9af7c]} 

The newly persisted Product was only inserted at transaction commit time, because the native SQL query didn't trigger the flush. This is a major consistency problem, one that's hard to debug or even foresee. That's one more reason for always inspecting auto-generated SQL statements.

The same behaviour is observed even for named native queries:

@NamedNativeQueries(
    @NamedNativeQuery(name = "product_ids", query = "select id from product")
)
assertNull(session.getNamedQuery("product_ids").uniqueResult());

So even if the SQL query is pre-loaded, Hibernate won’t extract the associated query space for matching it against the pending DML statements.

Overruling the current flush strategy

Even if the current Session defines a default flush strategy, you can always override it on a query basis.

Query flush mode

The ALWAYS mode is going to flush the persistence context before any query execution (HQL or SQL). This time, Hibernate applies no optimization and all pending entity state transitions are going to be synchronized with the current database transaction.

assertEquals(product.getId(), session.createSQLQuery("select id from product").setFlushMode(FlushMode.ALWAYS).uniqueResult());

Instructing Hibernate which tables should be synchronized

You could also add a synchronization rule to the currently executing SQL query. Hibernate will then know which database tables need to be synchronized prior to executing the query. This is useful for second-level caching as well.

assertEquals(product.getId(), session.createSQLQuery("select id from product").addSynchronizedEntityClass(Product.class).uniqueResult());

Conclusion

The AUTO flush mode is tricky and fixing consistency issues on a query basis is a maintainer’s nightmare. If you decide to add a database trigger, you’ll have to check all Hibernate queries to make sure they won’t end up running against stale data.

My suggestion is to use the ALWAYS flush mode, even if Hibernate authors warned us that:

this strategy is almost always unnecessary and inefficient.

Inconsistency is much more of an issue than some occasional premature flushes. While mixing DML operations and queries may cause unnecessary flushing, this situation is not that difficult to mitigate. During a transaction, it's best to execute queries at the beginning (when no pending entity state transitions need to be synchronized) and towards the end of the transaction (when the current persistence context is going to be flushed anyway).

The entity state transition operations should be pushed towards the end of the transaction, trying to avoid interleaving them with query operations (therefore preventing a premature flush trigger).

A beginner’s guide to JPA/Hibernate flush strategies

Introduction

In my previous post I introduced the entity state transitions of the Object-relational mapping paradigm.

All managed entity state transitions are translated to associated database statements when the current Persistence Context gets flushed. Hibernate’s flush behavior is not always as obvious as one might think.

Write-behind

Hibernate tries to defer the Persistence Context flushing up until the last possible moment. This strategy has been traditionally known as transactional write-behind.

Write-behind relates to Hibernate's flushing mechanism rather than to any logical or physical transaction. During a transaction, the flush may occur multiple times.

The flushed changes are visible only to the current database transaction. Until the current transaction is committed, no change is visible to other concurrent transactions.

The persistence context, also known as the first level cache, acts as a buffer between the current entity state transitions and the database.

In caching theory, the write-behind synchronization requires that all changes happen against the cache, whose responsibility is to eventually synchronize with the backing store.
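As a plain-Java illustration of that idea (the class and method names are made up for this sketch), write-behind boils down to buffering changes in memory and only applying them to the backing store on an explicit flush:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual write-behind buffer: state transitions are queued in memory and
// only propagated to the backing store when flush() is called.
class WriteBehindBuffer {

    private final List<String> pendingActions = new ArrayList<>();
    private final List<String> executedStatements = new ArrayList<>();

    // Record a change without touching the backing store
    void queue(String action) {
        pendingActions.add(action);
    }

    // Synchronize all buffered changes with the backing store at once
    void flush() {
        executedStatements.addAll(pendingActions);
        pendingActions.clear();
    }

    List<String> executedStatements() {
        return executedStatements;
    }
}
```

Deferring the synchronization this way is what makes both the lock-contention and the batching optimizations discussed below possible.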

Reducing lock contention

Every DML statement runs inside a database transaction. Based on the current database transaction isolation level, locks (shared or exclusive) may be acquired for the currently selected/modified table rows.

Reducing the lock-holding time lowers the deadlock probability and, according to scalability theory, increases throughput. Locks always introduce serial execution, and according to Amdahl's law, the maximum speedup is inversely proportional to the serial fraction of the currently executing program.

Even in the READ_COMMITTED isolation level, UPDATE and DELETE statements acquire locks. This behavior prevents other concurrent transactions from reading uncommitted changes or modifying the rows in question.

So, deferring locking statements (UPDATE/DELETE) may increase performance, but we must make sure that data consistency is not affected whatsoever.

Batching

Postponing the entity state transition synchronization has another major advantage. Since all changes are being flushed at once, Hibernate may benefit from the JDBC batching optimization.

Batching improves performance by grouping multiple DML statements into a single operation, therefore reducing database round-trips.
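Conceptually, JDBC batching groups consecutive statements that share the same SQL template into a single round-trip, the way addBatch()/executeBatch() do. Here is a simplified plain-Java sketch of that grouping (the names are made up; this is not the JDBC API itself):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of JDBC batching: consecutive statements with the same SQL
// template are grouped into one batch, so N statements cost one round-trip each batch.
class StatementBatcher {

    static List<List<String>> groupIntoBatches(List<String> statements) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String sql : statements) {
            // A different SQL template closes the current batch and starts a new one
            if (!current.isEmpty() && !current.get(0).equals(sql)) {
                batches.add(current);
                current = new ArrayList<>();
            }
            current.add(sql);
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

This is why flushing all changes at once matters: only statements that end up adjacent in the flush order can be merged into the same batch.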

Read-your-own-writes consistency

Since queries always run against the database (unless the second-level query cache is hit), we need to make sure that all pending changes are synchronized before the query starts running.

Therefore, both JPA and Hibernate define a flush-before-query synchronization strategy.

From JPA to Hibernate flushing strategies

  • JPA AUTO / Hibernate AUTO: the Session is sometimes flushed before query execution.
  • JPA COMMIT / Hibernate COMMIT: the Session is only flushed prior to a transaction commit.
  • Hibernate ALWAYS (no JPA equivalent): the Session is always flushed before query execution.
  • Hibernate MANUAL (no JPA equivalent): the Session can only be manually flushed.
  • Hibernate NEVER (deprecated, use MANUAL instead): this was the original name given to manual flushing, but it was misleading users into thinking that the Session won't ever be flushed.

Current Flush scope

The Persistence Context defines a default flush mode, that can be overridden upon Hibernate Session creation. Queries can also take a flush strategy, therefore overruling the current Persistence Context flush mode.

  • Persistence Context scope: Hibernate Session, JPA EntityManager
  • Query scope: Hibernate Query and Criteria, JPA Query and TypedQuery

Stay tuned

In my next post, you’ll find out that Hibernate FlushMode.AUTO breaks data consistency for SQL queries and you’ll see how you can overcome this shortcoming.
