The anatomy of Hibernate dirty checking

Introduction

The persistence context enqueues entity state transitions that get translated to database statements upon flushing. For managed entities, Hibernate can auto-detect incoming changes and schedule SQL UPDATES on our behalf. This mechanism is called automatic dirty checking.

The default dirty checking strategy

By default Hibernate checks all managed entity properties. Every time an entity is loaded, Hibernate makes an additional copy of all entity property values. At flush time, every managed entity property is matched against the loading-time snapshot value:

[Figure: DefaultFlushEventFlow]

So the number of individual dirty checks is given by the following formula:

N = \sum\limits_{k=1}^n p_{k}

where

n = The number of managed entities
p_k = The number of properties of the k-th managed entity

Even if only one property of a single entity has ever changed, Hibernate will still check all managed entities. For a large number of managed entities, the default dirty checking mechanism may have a significant CPU and memory footprint. Since the initial entity snapshot is held separately, the persistence context requires twice as much memory as all managed entities would normally occupy.
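The snapshot comparison described above can be sketched as follows. This is a minimal illustration only, not Hibernate's actual implementation: the SnapshotDirtyCheck class is hypothetical, and an entity is modelled as a plain map of property names to values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of snapshot-based dirty checking.
final class SnapshotDirtyCheck {

    // Takes a loading-time copy of every property value (the extra memory cost).
    static Map<String, Object> snapshot(Map<String, Object> entityState) {
        return new LinkedHashMap<>(entityState);
    }

    // Compares every property against the snapshot (the CPU cost at flush time).
    static List<String> dirtyProperties(Map<String, Object> snapshot,
                                        Map<String, Object> current) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> property : current.entrySet()) {
            if (!Objects.equals(snapshot.get(property.getKey()), property.getValue())) {
                dirty.add(property.getKey());
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        Map<String, Object> product = new LinkedHashMap<>();
        product.put("id", 1L);
        product.put("color", "Blue");

        Map<String, Object> loadedState = snapshot(product);
        product.put("color", "Red");

        System.out.println(dirtyProperties(loadedState, product)); // prints [color]
    }
}
```

Note that every property is compared even though only one has changed, which is exactly the cost the formula above accounts for.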

Bytecode instrumentation

A more efficient approach would be to mark dirty properties upon value changing. Analogous to the original deep comparison strategy, it’s good practice to decouple the domain model structures from the change detection logic. The automatic entity change detection mechanism is a cross-cutting concern that can be woven either at build-time or at runtime.

The entity class can be appended with bytecode level instructions implementing the automatic dirty checking mechanism.
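To make the idea more concrete, here is a hand-written approximation of what an enhanced entity could do. The TrackedProduct class and its method names are hypothetical; real bytecode enhancement generates equivalent logic without touching the source code.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical hand-written equivalent of an instrumented entity.
final class TrackedProduct {

    private String color;

    // Names of the properties changed since the last flush.
    private final Set<String> dirtyFields = new LinkedHashSet<>();

    String getColor() {
        return color;
    }

    // The setter records the change itself, so no snapshot comparison is needed.
    void setColor(String color) {
        if (!Objects.equals(this.color, color)) {
            dirtyFields.add("color");
        }
        this.color = color;
    }

    Set<String> dirtyFields() {
        return dirtyFields;
    }

    // Would be called once the pending UPDATE has been flushed.
    void clearDirtyFields() {
        dirtyFields.clear();
    }

    public static void main(String[] args) {
        TrackedProduct product = new TrackedProduct();
        product.setColor("Blue");
        System.out.println(product.dirtyFields()); // prints [color]
    }
}
```

With this approach the flush only has to ask each entity for its dirty fields, and no loading-time copy has to be kept.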

Weaving types

The bytecode enhancement can happen at:

  • Build-time

    After the Hibernate entities are compiled, the build tool (e.g. ANT, Maven) will insert bytecode level instructions into each compiled entity class. Because the classes are enhanced at build-time, this process exhibits no extra runtime penalty. Testing can be done against enhanced class versions, so that the actual production code is validated before the project gets built.

  • Runtime

    The runtime weaving can be done using:

Towards a default bytecode enhancement dirty checking

Hibernate 3 has been offering bytecode instrumentation through an ANT target but it never became mainstream and most Hibernate projects are still currently using the default deep comparison approach.

While other JPA providers (e.g. OpenJPA, DataNucleus) have been favouring the bytecode enhancement approach, Hibernate has only recently started moving in this direction, offering better build-time options and even custom dirty checking callbacks.

In my next post I’ll show you how you can customize the dirty checking mechanism with your own application specific strategy.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

The dark side of Hibernate AUTO flush

Introduction

Now that I have described the basics of JPA and Hibernate flush strategies, I can continue unraveling the surprising behavior of Hibernate’s AUTO flush mode.

Not all queries trigger a Session flush

Many would assume that Hibernate always flushes the Session before executing any query. While this might have been a more intuitive approach, and probably closer to the JPA’s AUTO FlushModeType, Hibernate tries to optimize that. If the currently executed query is not going to hit the pending SQL INSERT/UPDATE/DELETE statements then the flush is not strictly required.

As stated in the reference documentation, the AUTO flush strategy may sometimes synchronize the current persistence context prior to a query execution. It would have been more intuitive if the framework authors had chosen to name it FlushMode.SOMETIMES.

JPQL/HQL and SQL

Like many other ORM solutions, Hibernate offers a limited Entity querying language (JPQL/HQL) that’s very much based on SQL-92 syntax.

The entity query language is translated to SQL by the current database dialect and so it must offer the same functionality across different database products. Since most database systems are SQL-92 compliant, the Entity Query Language is an abstraction of the most common database querying syntax.

While you can use the Entity Query Language in many use cases (selecting Entities and even projections), there are times when its limited capabilities are no match for an advanced querying request. Whenever we want to make use of some specific querying techniques, such as:

we have no other option, but to run native SQL queries.

Hibernate is a persistence framework. Hibernate was never meant to replace SQL. If some query is better expressed in a native query, then it’s not worth sacrificing application performance on the altar of database portability.

AUTO flush and HQL/JPQL

First we are going to test how the AUTO flush mode behaves when an HQL query is about to be executed. For this we define the following unrelated entities:

[Figure: FlushAUTOEntities]

The test will execute the following actions:

  • A Product is going to be persisted.
  • Selecting User(s) should not trigger the flush.
  • Querying for Product, the AUTO flush should trigger the entity state transition synchronization (a Product INSERT should be executed prior to executing the select query).
Product product = new Product();
session.persist(product);
assertEquals(0L,  session.createQuery("select count(id) from User").uniqueResult());
assertEquals(product.getId(), session.createQuery("select p.id from Product p").uniqueResult());

Giving the following SQL output:

[main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: f76f61e2-f3e3-4ea4-8f44-82e9804ceed0, using strategy: org.hibernate.id.UUIDGenerator
Query:{[select count(user0_.id) as col_0_0_ from user user0_][]} 
Query:{[insert into product (color, id) values (?, ?)][12,f76f61e2-f3e3-4ea4-8f44-82e9804ceed0]} 
Query:{[select product0_.id as col_0_0_ from product product0_][]}

As you can see, the User select hasn’t triggered the Session flush. This is because Hibernate inspects the current query space against the pending table statements. If the currently executing query doesn’t overlap with the unflushed table statements, then the flush can be safely skipped.

HQL can detect the Product flush even for:

  • Sub-selects

    session.persist(product);
    assertEquals(0L,  session.createQuery(
        "select count(*) " +
        "from User u " +
        "where u.favoriteColor in (select distinct(p.color) from Product p)").uniqueResult());
    

    Resulting in a proper flush call:

    Query:{[insert into product (color, id) values (?, ?)][Blue,2d9d1b4f-eaee-45f1-a480-120eb66da9e8]} 
    Query:{[select count(*) as col_0_0_ from user user0_ where user0_.favoriteColor in (select distinct product1_.color from product product1_)][]}
    
  • Or theta-style joins

    session.persist(product);
    assertEquals(0L,  session.createQuery(
        "select count(*) " +
        "from User u, Product p " +
        "where u.favoriteColor = p.color").uniqueResult());
    

    Triggering the expected flush :

    Query:{[insert into product (color, id) values (?, ?)][Blue,4af0b843-da3f-4b38-aa42-1e590db186a9]} 
    Query:{[select count(*) as col_0_0_ from user user0_ cross join product product1_ where user0_.favoriteColor=product1_.color][]} 
    

This works because Entity Queries are parsed and translated to SQL queries. Hibernate cannot reference a non-existent table, therefore it always knows the database tables an HQL/JPQL query will hit.

So Hibernate is only aware of those tables we explicitly referenced in our HQL query. If the current pending DML statements imply database triggers or database level cascading, Hibernate won’t be aware of those. So even for HQL, the AUTO flush mode can cause consistency issues.
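The query space check described in this section boils down to a set intersection. The following sketch is only an illustration of that idea: the AutoFlushSketch class and its flushRequired method are hypothetical and do not mirror Hibernate's internal API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the AUTO flush decision.
final class AutoFlushSketch {

    // A flush is needed only if the query touches a table with pending DML statements.
    static boolean flushRequired(Set<String> pendingTables, Set<String> querySpaces) {
        return !Collections.disjoint(pendingTables, querySpaces);
    }

    public static void main(String[] args) {
        Set<String> pendingTables = new HashSet<>(Arrays.asList("product"));

        // select count(id) from User: no overlap, so no flush
        System.out.println(flushRequired(pendingTables,
            new HashSet<>(Arrays.asList("user")))); // prints false

        // from User u, Product p: overlaps with the pending Product INSERT
        System.out.println(flushRequired(pendingTables,
            new HashSet<>(Arrays.asList("user", "product")))); // prints true

        // A query whose tables are unknown contributes an empty query space
        System.out.println(flushRequired(pendingTables,
            new HashSet<String>())); // prints false
    }
}
```

The last case shows why the decision is only as good as the known query space: tables touched by triggers or database-level cascading never show up in it.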

AUTO flush and native SQL queries

When it comes to native SQL queries, things are getting much more complicated. Hibernate cannot parse SQL queries, because it only supports a limited database query syntax. Many database systems offer proprietary features that are beyond Hibernate Entity Query capabilities.

Querying the Product table with a native SQL query is not going to trigger the flush, causing an inconsistency issue:

Product product = new Product();
session.persist(product);
assertNull(session.createSQLQuery("select id from product").uniqueResult());
DEBUG [main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: 718b84d8-9270-48f3-86ff-0b8da7f9af7c, using strategy: org.hibernate.id.UUIDGenerator
Query:{[select id from product][]} 
Query:{[insert into product (color, id) values (?, ?)][12,718b84d8-9270-48f3-86ff-0b8da7f9af7c]} 

The newly persisted Product was only inserted during transaction commit, because the native SQL query didn’t trigger the flush. This is a major consistency problem, one that’s hard to debug or even to foresee for many developers. That’s one more reason for always inspecting auto-generated SQL statements.

The same behaviour is observed even for named native queries:

@NamedNativeQueries(
    @NamedNativeQuery(name = "product_ids", query = "select id from product")
)
assertNull(session.getNamedQuery("product_ids").uniqueResult());

So even if the SQL query is pre-loaded, Hibernate won’t extract the associated query space for matching it against the pending DML statements.

Overruling the current flush strategy

Even if the current Session defines a default flush strategy, you can always override it on a query basis.

Query flush mode

The ALWAYS mode is going to flush the persistence context before any query execution (HQL or SQL). This time, Hibernate applies no optimization and all pending entity state transitions are going to be synchronized with the current database transaction.

assertEquals(product.getId(), session.createSQLQuery("select id from product").setFlushMode(FlushMode.ALWAYS).uniqueResult());

Instructing Hibernate which tables should be synchronized

You could also add a synchronization rule to the currently executing SQL query. Hibernate will then know which database tables need to be synchronized prior to executing the query. This is useful for second level caching as well.

assertEquals(product.getId(), session.createSQLQuery("select id from product").addSynchronizedEntityClass(Product.class).uniqueResult());

Conclusion

The AUTO flush mode is tricky and fixing consistency issues on a query basis is a maintainer’s nightmare. If you decide to add a database trigger, you’ll have to check all Hibernate queries to make sure they won’t end up running against stale data.

My suggestion is to use the ALWAYS flush mode, even if Hibernate authors warned us that:

this strategy is almost always unnecessary and inefficient.

Inconsistency is much more of an issue than some occasional premature flushes. While mixing DML operations and queries may cause unnecessary flushing, this situation is not that difficult to mitigate. During a session transaction, it’s best to execute queries at the beginning (when no pending entity state transitions are to be synchronized) and towards the end of the transaction (when the current persistence context is going to be flushed anyway).

The entity state transition operations should be pushed towards the end of the transaction, trying to avoid interleaving them with query operations (therefore preventing a premature flush trigger).


A beginner’s guide to JPA/Hibernate flush strategies

Introduction

In my previous post I introduced the entity state transitions of the object-relational mapping paradigm.

All managed entity state transitions are translated to associated database statements when the current Persistence Context gets flushed. Hibernate’s flush behavior is not always as obvious as one might think.

Write-behind

Hibernate tries to defer the Persistence Context flushing up until the last possible moment. This strategy has been traditionally known as transactional write-behind.

The write-behind is related to Hibernate flushing rather than to any logical or physical transaction. During a transaction, the flush may occur multiple times.

The flushed changes are visible only for the current database transaction. Until the current transaction is committed, no change is visible by other concurrent transactions.

The persistence context, also known as the first level cache, acts as a buffer between the current entity state transitions and the database.

In caching theory, the write-behind synchronization requires that all changes happen against the cache, whose responsibility is to eventually synchronize with the backing store.
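As a rough illustration of the write-behind idea, the sketch below buffers statements in memory and only hands them to the backing store on flush. The WriteBehindBuffer class is hypothetical, and a simple list stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical write-behind buffer; a list stands in for the database.
final class WriteBehindBuffer {

    private final List<String> pending = new ArrayList<>();
    private final List<String> executed = new ArrayList<>();

    // Changes are only recorded against the buffer.
    void enqueue(String statement) {
        pending.add(statement);
    }

    // Synchronizes all pending changes with the backing store and returns their count.
    int flush() {
        int count = pending.size();
        executed.addAll(pending);
        pending.clear();
        return count;
    }

    List<String> executed() {
        return executed;
    }

    public static void main(String[] args) {
        WriteBehindBuffer buffer = new WriteBehindBuffer();
        buffer.enqueue("insert into product (color, id) values (?, ?)");
        System.out.println(buffer.executed().size()); // prints 0, nothing written yet
        buffer.flush();
        System.out.println(buffer.executed().size()); // prints 1
    }
}
```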

Reducing lock contention

Every DML statement runs inside a database transaction. Based on the current database transaction isolation level, locks (shared or explicit) may be acquired for the current selected/modified table rows.

Reducing the lock holding time lowers the dead-lock probability, and according to the scalability theory, it increases throughput. Locks always introduce serial executions, and according to Amdahl’s law, the maximum speedup is inversely proportional to the serial part of the currently executing program.
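For reference, Amdahl’s law can be written as follows, where s is the serial fraction of the program and n is the number of parallel workers (both symbols are introduced here for illustration only):

```latex
S(n) = \frac{1}{s + \frac{1 - s}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{s}
```

For example, if lock contention makes 10% of the work serial (s = 0.1), the speedup can never exceed 1 / 0.1 = 10, no matter how many concurrent transactions are running.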

Even in READ_COMMITTED isolation level, UPDATE and DELETE statements acquire locks. This behavior prevents other concurrent transactions from reading uncommitted changes or modifying the rows in question.

So, deferring locking statements (UPDATE/DELETE) may increase performance, but we must make sure that data consistency is not affected whatsoever.

Batching

Postponing the entity state transition synchronization has another major advantage. Since all changes are being flushed at once, Hibernate may benefit from the JDBC batching optimization.

Batching improves performance by grouping multiple DML statements into a single operation, therefore reducing database round-trips.
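JDBC batching is not enabled unless it is configured. The following configuration sketch shows the Hibernate properties commonly used for this purpose; the BatchingSettings helper class and the batch size of 50 are assumptions made for the example, not recommendations.

```java
import java.util.Properties;

// Hypothetical helper that collects the batching-related Hibernate settings.
final class BatchingSettings {

    static Properties batchingProperties(int batchSize) {
        Properties properties = new Properties();
        // Maximum number of statements grouped into one JDBC batch
        properties.put("hibernate.jdbc.batch_size", String.valueOf(batchSize));
        // Ordering statements lets more of them end up in the same batch
        properties.put("hibernate.order_inserts", "true");
        properties.put("hibernate.order_updates", "true");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(batchingProperties(50).getProperty("hibernate.jdbc.batch_size")); // prints 50
    }
}
```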

Read-your-own-writes consistency

Since queries are always running against the database (unless second level query cache is being hit), we need to make sure that all pending changes are synchronized before the query starts running.

Therefore, both JPA and Hibernate define a flush-before-query synchronization strategy.

From JPA to Hibernate flushing strategies

JPA FlushModeType | Hibernate FlushMode | Hibernate implementation details
--- | --- | ---
AUTO | AUTO | The Session is sometimes flushed before query execution.
COMMIT | COMMIT | The Session is only flushed prior to a transaction commit.
- | ALWAYS | The Session is always flushed before query execution.
- | MANUAL | The Session can only be manually flushed.
- | NEVER | Deprecated. Use MANUAL instead. This was the original name given to manual flushing, but it was misleading users into thinking that the Session won’t ever be flushed.

Current Flush scope

The Persistence Context defines a default flush mode, that can be overridden upon Hibernate Session creation. Queries can also take a flush strategy, therefore overruling the current Persistence Context flush mode.

Scope | Hibernate | JPA
--- | --- | ---
Persistence Context | Session | EntityManager
Query | Query, Criteria | Query, TypedQuery

Stay tuned

In my next post, you’ll find out that Hibernate FlushMode.AUTO breaks data consistency for SQL queries and you’ll see how you can overcome this shortcoming.


A beginner’s guide to JPA/Hibernate entity state transitions

Introduction

Hibernate shifts the developer mindset from SQL statements to entity state transitions. Once an entity is actively managed by Hibernate, all changes are going to be automatically propagated to the database.

Manipulating domain model entities (along with their associations) is much easier than writing and maintaining SQL statements. Without an ORM tool, adding a new column requires modifying all associated INSERT/UPDATE statements.

But Hibernate is no silver bullet either. Hibernate doesn’t free us from ever worrying about the actual executed SQL statements. Controlling Hibernate is not as straightforward as one might think and it’s mandatory to check all SQL statements Hibernate executes on our behalf.

The entity states

As I previously mentioned, Hibernate monitors currently attached entities. But for an entity to become managed, it must be in the right entity state.

First we must define all entity states:

  • New (Transient)

    A newly created object that hasn’t ever been associated with a Hibernate Session (a.k.a Persistence Context) and is not mapped to any database table row is considered to be in the New (Transient) state.

    For it to become persisted, we need to either explicitly call the EntityManager#persist method or make use of the transitive persistence mechanism.

  • Persistent (Managed)

    A persistent entity has been associated with a database table row and it’s being managed by the current running Persistence Context. Any change made to such an entity is going to be detected and propagated to the database (during the Session flush-time). With Hibernate, we no longer have to execute INSERT/UPDATE/DELETE statements. Hibernate employs a “transactional write-behind” working style and changes are synchronized at the very last responsible moment, during the current Session flush-time.

  • Detached

    Once the current running Persistence Context is closed all the previously managed entities become detached. Successive changes will no longer be tracked and no automatic database synchronization is going to happen.

    To associate a detached entity to an active Hibernate Session, you can choose one of the following options:

    • Reattaching

      Hibernate (but not JPA 2.1) supports reattaching through the Session#update method.

      A Hibernate Session can only associate one Entity object for a given database row. This is because the Persistence Context acts as an in-memory cache (first level cache) and only one value (entity) is associated to a given key (entity type and database identifier).

      An entity can be reattached only if there is no other JVM object (matching the same database row) already associated to the current Hibernate Session.

    • Merging

      The merge is going to copy the detached entity state (source) to a managed entity instance (destination). If the merging entity has no equivalent in the current Session, one will be fetched from the database.

      The detached object instance will continue to remain detached even after the merge operation.

  • Removed

    Although JPA demands that only managed entities are allowed to be removed, Hibernate can also delete detached entities (but only through a Session#delete method call).

    A removed entity is only scheduled for deletion and the actual database DELETE statement will be executed during Session flush-time.
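The reattaching constraint described above (one entity object per entity type and identifier) can be pictured as a map keyed by type and identifier. The sketch below is a simplified model under that assumption. The FirstLevelCacheSketch class is hypothetical, and the IllegalStateException is only a stand-in for whatever error the persistence provider raises.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the Persistence Context as an identity map.
final class FirstLevelCacheSketch {

    // Key: (entity type, database identifier); value: the single managed instance.
    private final Map<List<Object>, Object> managed = new HashMap<>();

    private static List<Object> key(Class<?> type, Object id) {
        return Arrays.<Object>asList(type, id);
    }

    // Reattaching succeeds only if no other object is managed for the same row.
    void reattach(Class<?> type, Object id, Object entity) {
        Object existing = managed.putIfAbsent(key(type, id), entity);
        if (existing != null && existing != entity) {
            throw new IllegalStateException(
                "Another object is already associated with " + type.getSimpleName() + "#" + id);
        }
    }

    Object find(Class<?> type, Object id) {
        return managed.get(key(type, id));
    }

    public static void main(String[] args) {
        FirstLevelCacheSketch session = new FirstLevelCacheSketch();
        Object first = new Object();
        session.reattach(Object.class, 1L, first);
        System.out.println(session.find(Object.class, 1L) == first); // prints true
    }
}
```

Merging avoids the conflict because it copies state onto the already managed instance instead of trying to register a second object under the same key.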

Entity state transitions

To change one Entity state, we need to use one of the following entity management interfaces:

These interfaces define the entity state transition operations we must explicitly call to notify Hibernate of the entity state change. At flush-time the entity state transition is materialized into a database SQL statement (INSERT/UPDATE/DELETE).


Hibernate hidden gem: the pooled-lo optimizer

Introduction

In this post we’ll uncover a sequence identifier generator combining identifier assignment efficiency and interoperability with other external systems (concurrently accessing the underlying database system).

Traditionally there have been two sequence identifier strategies to choose from.

  • The sequence identifier, always hitting the database for every new value assignment. Even with database sequence preallocation we have a significant database round-trip cost.
  • The seqhilo identifier, using the hi/lo algorithm. This generator calculates some identifier values in-memory, therefore reducing the database round-trip calls. The problem with this optimization technique is that the current database sequence value no longer reflects the current highest in-memory generated value. The database sequence is used as a bucket number, making it difficult for other systems to interoperate with the database table in question. Other applications must know the inner-workings of the hi/lo identifier strategy to properly generate non-clashing identifiers.
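To see why the database sequence stops reflecting the highest assigned identifier, consider this simplified hi/lo sketch. The HiLoSketch class is hypothetical, an AtomicLong simulates the database sequence, and the exact arithmetic used by Hibernate’s generators may differ.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical, simplified hi/lo allocator.
final class HiLoSketch {

    private final LongSupplier sequence; // one call = one database round-trip
    private final int incrementSize;
    private long hi;
    private int lo;

    HiLoSketch(LongSupplier sequence, int incrementSize) {
        this.sequence = sequence;
        this.incrementSize = incrementSize;
        this.lo = incrementSize; // forces a sequence call on first use
    }

    long generate() {
        if (lo >= incrementSize) {
            hi = sequence.getAsLong(); // the sequence value is only a bucket number
            lo = 0;
        }
        return hi * incrementSize + lo++;
    }

    public static void main(String[] args) {
        AtomicLong databaseSequence = new AtomicLong(1);
        HiLoSketch generator = new HiLoSketch(databaseSequence::getAndIncrement, 10);

        long last = 0;
        for (int i = 0; i < 10; i++) {
            last = generator.generate();
        }
        // Ten identifiers (10..19) cost a single sequence call
        System.out.println(last);                   // prints 19
        System.out.println(databaseSequence.get()); // prints 2
    }
}
```

An external system reading the sequence would see 2 and have no way of knowing that identifiers up to 19 are already taken, unless it applies the same formula.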

The enhanced identifiers

Hibernate offers a new class of identifier generators, addressing many shortcomings of the original ones. The enhanced identifier generators don’t come with a fixed identifier allocation strategy. The optimization strategy is configurable and we can even supply our own optimization implementation. By default Hibernate comes with the following built-in optimizers:

  • none: every identifier is fetched from the database, so it’s equivalent to the original sequence generator.
  • hi/lo: it uses the hi/lo algorithm and it’s equivalent to the original seqhilo generator.
  • pooled: This optimizer uses a hi/lo optimization strategy, but the current in-memory identifiers highest boundary is extracted from an actual database sequence value.
  • pooled-lo: It’s similar to the pooled optimizer but the database sequence value is used as the current in-memory lowest boundary.

In the official release announcement, the pooled optimizers are advertised as being interoperable with other external systems:

Even if other applications are also inserting values, we’ll be perfectly safe because the SEQUENCE itself will handle applying this increment_size.

This is actually what we are looking for; an identifier generator that’s both efficient and doesn’t clash when other external systems are concurrently inserting rows in the same database tables.

Testing time

The following test is going to check how the new optimizers get along with other external database table inserts. In our case the external system will be some native JDBC insert statements on the same database table/sequence.

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 8; i++) {
			session.persist(newEntityInstance());
		}
		session.flush();
		assertEquals(8, ((Number) session.createSQLQuery("SELECT COUNT(*) FROM sequenceIdentifier").uniqueResult()).intValue());
		insertNewRow(session);
		insertNewRow(session);
		insertNewRow(session);
		assertEquals(11, ((Number) session.createSQLQuery("SELECT COUNT(*) FROM sequenceIdentifier").uniqueResult()).intValue());
		List<Number> ids = session.createSQLQuery("SELECT id FROM sequenceIdentifier").list();
		for (Number id : ids) {
			LOGGER.debug("Found id: {}", id);
		}
		for (int i = 0; i < 3; i++) {
			session.persist(newEntityInstance());
		}
		session.flush();
		return null;
	}
});

The pooled optimizer

We’ll first use the pooled optimizer strategy:

@Entity(name = "sequenceIdentifier")
public static class PooledSequenceIdentifier {

	@Id
	@GenericGenerator(name = "sequenceGenerator", strategy = "enhanced-sequence",
			parameters = {
					@org.hibernate.annotations.Parameter(name = "optimizer", value = "pooled"),
					@org.hibernate.annotations.Parameter(name = "initial_value", value = "1"),
					@org.hibernate.annotations.Parameter(name = "increment_size", value = "5")
			}
	)
	@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "sequenceGenerator")
	private Long id;
}

Running the test ends up throwing the following exception:

DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][9]} 
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][10]} 
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][26]} 
WARN  [main]: o.h.e.j.s.SqlExceptionHelper - SQL Error: -104, SQLState: 23505
ERROR [main]: o.h.e.j.s.SqlExceptionHelper - integrity constraint violation: unique constraint or index violation; SYS_PK_10104 table: SEQUENCEIDENTIFIER
ERROR [main]: c.v.h.m.l.i.PooledSequenceIdentifierTest - Pooled optimizer threw
org.hibernate.exception.ConstraintViolationException: could not execute statement
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:72) ~[hibernate-core-4.3.5.Final.jar:4.3.5.Final]	
Caused by: java.sql.SQLIntegrityConstraintViolationException: integrity constraint violation: unique constraint or index violation; SYS_PK_10104 table: SEQUENCEIDENTIFIER
	at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source) ~[hsqldb-2.3.2.jar:2.3.2]	

I am not sure if this is a bug or just a design limitation, but the pooled optimizer doesn’t meet the interoperability requirement.

To visualize what happens I summarized the sequence calls in the following diagram:

[Figure: PooledOptimizer]

When the pooled optimizer retrieves the current sequence value, it uses it to calculate the lowest in-memory boundary. The lowest value is the actual previous sequence value and this value might have been already used by some other external INSERT statement.
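The clash can be reproduced with a small simulation. The PooledSketch class below is a simplified model written to match the behaviour described in this post (the lower boundary is derived from the previous sequence value); it is not Hibernate’s source code, and the sequence is simulated with an AtomicLong incrementing by 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical, simplified model of the pooled optimizer.
final class PooledSketch {

    private final LongSupplier sequence;
    private final int incrementSize;
    private long value;
    private long hi;
    private boolean initialized;

    PooledSketch(LongSupplier sequence, int incrementSize) {
        this.sequence = sequence;
        this.incrementSize = incrementSize;
    }

    long generate() {
        if (!initialized) {
            value = sequence.getAsLong();
            hi = sequence.getAsLong();
            initialized = true;
        } else if (hi <= value) {
            hi = sequence.getAsLong();
            value = hi - incrementSize; // the previous sequence value becomes the lower boundary
        }
        return value++;
    }

    // Replays the test: 8 Hibernate inserts, 3 external inserts, 3 Hibernate inserts.
    static List<Long> replay() {
        AtomicLong next = new AtomicLong(1);
        LongSupplier databaseSequence = () -> next.getAndAdd(5);
        PooledSketch optimizer = new PooledSketch(databaseSequence, 5);

        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < 8; i++) ids.add(optimizer.generate());
        for (int i = 0; i < 3; i++) ids.add(databaseSequence.getAsLong()); // external system
        for (int i = 0; i < 3; i++) ids.add(optimizer.generate());
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(replay());
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 16, 21, 26, 9, 10, 26]
    }
}
```

The external system obtained 26 directly from the sequence, and the optimizer later computed 26 again as 31 - 5, which corresponds to the last three inserts (9, 10, 26) in the log above.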

The pooled-lo optimizer

Fortunately, there is one more optimizer (not mentioned in the reference documentation) to be tested. The pooled-lo optimizer uses the current database sequence value as the lowest in-memory boundary, so other systems may freely use the next sequence values without risking identifier clashing:

@Entity(name = "sequenceIdentifier")
public static class PooledLoSequenceIdentifier {

	@Id
	@GenericGenerator(name = "sequenceGenerator", strategy = "enhanced-sequence",
			parameters = {
					@org.hibernate.annotations.Parameter(name = "optimizer",
							value = "pooled-lo"
					),
					@org.hibernate.annotations.Parameter(name = "initial_value", value = "1"),
					@org.hibernate.annotations.Parameter(name = "increment_size", value = "5")
			}
	)
	@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "sequenceGenerator")
	private Long id;
}

To better understand the inner-workings of this optimizer, the following diagram summarizes the identifier assignment process:

[Figure: PooledLoOptimizer]
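The same scenario can be replayed against a simplified pooled-lo model. As before, the PooledLoSketch class is hypothetical and only follows the description given here (the sequence value is the lowest in-memory boundary); the database sequence is simulated with an AtomicLong incrementing by 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical, simplified model of the pooled-lo optimizer.
final class PooledLoSketch {

    private final LongSupplier sequence;
    private final int incrementSize;
    private long lo;
    private long value;
    private boolean initialized;

    PooledLoSketch(LongSupplier sequence, int incrementSize) {
        this.sequence = sequence;
        this.incrementSize = incrementSize;
    }

    long generate() {
        if (!initialized || value >= lo + incrementSize) {
            lo = sequence.getAsLong(); // the sequence value is the lowest boundary
            value = lo;
            initialized = true;
        }
        return value++;
    }

    // Replays the test: 8 Hibernate inserts, 3 external inserts, 3 Hibernate inserts.
    static List<Long> replay() {
        AtomicLong next = new AtomicLong(1);
        LongSupplier databaseSequence = () -> next.getAndAdd(5);
        PooledLoSketch optimizer = new PooledLoSketch(databaseSequence, 5);

        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < 8; i++) ids.add(optimizer.generate());
        for (int i = 0; i < 3; i++) ids.add(databaseSequence.getAsLong()); // external system
        for (int i = 0; i < 3; i++) ids.add(optimizer.generate());
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(replay());
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 11, 16, 21, 9, 10, 26]
    }
}
```

Every range handed out in memory starts at a value that the sequence returned to this generator alone, so in this model the external inserts (11, 16, 21) never collide with it.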

Conclusion

A hidden gem is one of those great features that most people don’t even know exists. The pooled-lo optimizer is extremely useful, yet it remains largely unknown.

Code available on GitHub.


From JPA to Hibernate’s legacy and enhanced identifier generators

JPA identifier generators

JPA defines the following identifier strategies:

Strategy | Description
--- | ---
AUTO | The persistence provider picks the most appropriate identifier strategy supported by the underlying database
IDENTITY | Identifiers are assigned by a database IDENTITY column
SEQUENCE | The persistence provider uses a database sequence for generating identifiers
TABLE | The persistence provider uses a separate database table to emulate a sequence object

In my previous post I examined the pros and cons of all these surrogate identifier strategies.

Identifier optimizers

While there’s not much application-side IDENTITY generator optimization (other than configuring database identity preallocation), the sequence identifiers offer much more flexibility in this regard. One of the most common optimization strategies is based on the hi/lo allocation algorithm.

For this Hibernate offers:

  • SequenceHiLoGenerator

    It uses a database sequence to generate the hi value, while the low value is incremented according to the hi/lo algorithm.

  • TableHiLoGenerator

    A database table is used for generating the hi values. This generator is deprecated in favour of the MultipleHiLoPerTableGenerator, the enhanced TableGenerator or the SequenceStyleGenerator.

  • MultipleHiLoPerTableGenerator

    It’s a hi/lo table generator capable of using a single database table even for multiple identifier sequences.

  • SequenceStyleGenerator

    It’s an enhanced version of the previous sequence generator. It uses a sequence if the underlying database supports them. If the current database doesn’t support sequences it switches to using a table for generating sequence values. While the previous generators had a predefined optimization algorithm, the enhanced generators can be configured with an optimizer strategy:

    • none: there is no optimizing strategy applied, so every identifier is fetched from the database
    • hi/lo: it uses the original hi/lo algorithm. This strategy makes it difficult for other systems to share the same identifier sequence, requiring other systems to implement the same identifier generation logic.
    • pooled: This optimizer uses a hi/lo optimization strategy, but instead of saving the current hi value it stores the current range upper boundary (or lower boundary – hibernate.id.optimizer.pooled.prefer_lo).

    Pooled is the default optimizer strategy.

  • TableGenerator

    Like MultipleHiLoPerTableGenerator, it may use one single table for multiple identifier generators, while offering configurable optimizer strategies.

    Pooled is the default optimizer strategy.

JPA to Hibernate identifier mapping

With such an abundant generator offering, we cannot help asking which of those are being used as the default JPA generators.

While the JPA specification doesn’t imply any particular optimization, Hibernate will prefer an optimized generator over one that always hits the database for every new identifier.

The JPA SequenceGenerator

We’ll define one entity configured with the SEQUENCE JPA identifier generator. A unit test is going to persist five such entities.

@Entity(name = "sequenceIdentifier")
public static class SequenceIdentifier {

    @Id
    @GeneratedValue(generator = "sequence", strategy=GenerationType.SEQUENCE)
    @SequenceGenerator(name = "sequence", allocationSize = 10)
    private Long id;
}

@Test
public void testSequenceIdentifierGenerator() {
    LOGGER.debug("testSequenceIdentifierGenerator");
    doInTransaction(new TransactionCallable<Void>() {
        @Override
        public Void execute(Session session) {
            for (int i = 0; i < 5; i++) {
                session.persist(new SequenceIdentifier());
            }
            session.flush();
            return null;
        }
    });
}

Running this test gives us the following output:

Query:{[call next value for hibernate_sequence][]} 
Generated identifier: 10, using strategy: org.hibernate.id.SequenceHiLoGenerator
Generated identifier: 11, using strategy: org.hibernate.id.SequenceHiLoGenerator
Generated identifier: 12, using strategy: org.hibernate.id.SequenceHiLoGenerator
Generated identifier: 13, using strategy: org.hibernate.id.SequenceHiLoGenerator
Generated identifier: 14, using strategy: org.hibernate.id.SequenceHiLoGenerator
Query:{[insert into sequenceIdentifier (id) values (?)][10]} 
Query:{[insert into sequenceIdentifier (id) values (?)][11]} 
Query:{[insert into sequenceIdentifier (id) values (?)][12]} 
Query:{[insert into sequenceIdentifier (id) values (?)][13]} 
Query:{[insert into sequenceIdentifier (id) values (?)][14]} 

Hibernate chooses to use the legacy SequenceHiLoGenerator for backward compatibility with all those applications that were developed prior to releasing the enhanced generators. Migrating a legacy application to the new generators is not an easy process, so the enhanced generators are a better alternative for new applications instead.

Hibernate prefers using the “seqhilo” generator by default, which is not an intuitive assumption, since many might expect the raw “sequence” generator (always calling the database sequence for every new identifier value).

To enable the enhanced generators we need to set the following Hibernate property:

properties.put("hibernate.id.new_generator_mappings", "true");

This gives us the following output:

Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Generated identifier: 1, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Generated identifier: 2, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Generated identifier: 3, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Generated identifier: 4, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Generated identifier: 5, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Query:{[insert into sequenceIdentifier (id) values (?)][1]} 
Query:{[insert into sequenceIdentifier (id) values (?)][2]} 
Query:{[insert into sequenceIdentifier (id) values (?)][3]} 
Query:{[insert into sequenceIdentifier (id) values (?)][4]} 
Query:{[insert into sequenceIdentifier (id) values (?)][5]} 

The new SequenceStyleGenerator generates different identifier values than the legacy SequenceHiLoGenerator. The generated statements differ between the old and the new generators because the new generators’ default optimizer strategy is “pooled”, while the old generators can only use the “hi/lo” strategy.
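To see where the two sets of identifiers come from, here is a minimal sketch of both optimizer strategies. It is a simplified model, not Hibernate's actual implementation: the class and method names are hypothetical, the database sequence is simulated in memory, and it assumes the legacy sequence advances by 1 while the enhanced one advances by the allocation size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Simplified model of the two optimizer strategies (hypothetical names, not Hibernate code). */
class OptimizerSketch {

    /** Simulates a database sequence that starts at 1 and advances by the given step. */
    static LongSupplier sequence(long step) {
        long[] next = {1};
        return () -> {
            long current = next[0];
            next[0] += step;
            return current;
        };
    }

    /** Hi/lo: each sequence value is multiplied by the allocation size to get the block start. */
    static List<Long> hiLo(LongSupplier sequence, int allocationSize, int count) {
        List<Long> ids = new ArrayList<>();
        long hi = 0;
        int lo = allocationSize; // forces a sequence call on first use
        for (int i = 0; i < count; i++) {
            if (lo >= allocationSize) {
                hi = sequence.getAsLong() * allocationSize;
                lo = 0;
            }
            ids.add(hi + lo++);
        }
        return ids;
    }

    /** Pooled: the sequence values themselves delimit the block of identifiers. */
    static List<Long> pooled(LongSupplier sequence, int allocationSize, int count) {
        List<Long> ids = new ArrayList<>();
        long value = sequence.getAsLong();      // first call: initial value
        long upperBound = sequence.getAsLong(); // second call: end of the current block
        for (int i = 0; i < count; i++) {
            if (value >= upperBound) {
                upperBound = sequence.getAsLong();
                value = upperBound - allocationSize;
            }
            ids.add(value++);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(hiLo(sequence(1), 10, 5));    // [10, 11, 12, 13, 14]
        System.out.println(pooled(sequence(10), 10, 5)); // [1, 2, 3, 4, 5]
    }
}
```

In this model the hi/lo variant needs a single sequence call for the five entities, while the pooled variant makes two calls up front, which is consistent with the two outputs shown above. The pooled identifiers stay in the same range as the database sequence values, so other clients reading the sequence directly would not collide with them.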

The JPA TableGenerator

@Entity(name = "tableIdentifier")
public static class TableSequenceIdentifier {

    @Id
    @GeneratedValue(generator = "table", strategy=GenerationType.TABLE)
    @TableGenerator(name = "table", allocationSize = 10)
    private Long id;
}

Running the following test:

@Test
public void testTableSequenceIdentifierGenerator() {
    LOGGER.debug("testTableSequenceIdentifierGenerator");
    doInTransaction(new TransactionCallable<Void>() {
        @Override
        public Void execute(Session session) {
            for (int i = 0; i < 5; i++) {
                session.persist(new TableSequenceIdentifier());
            }
            session.flush();
            return null;
        }
    });
}

Generates the following SQL statement output:

Query:{[select sequence_next_hi_value from hibernate_sequences where sequence_name = 'tableIdentifier' for update][]} 
Query:{[insert into hibernate_sequences(sequence_name, sequence_next_hi_value) values('tableIdentifier', ?)][0]} 
Query:{[update hibernate_sequences set sequence_next_hi_value = ? where sequence_next_hi_value = ? and sequence_name = 'tableIdentifier'][1,0]} 
Generated identifier: 1, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Generated identifier: 2, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Generated identifier: 3, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Generated identifier: 4, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Generated identifier: 5, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Query:{[insert into tableIdentifier (id) values (?)][1]} 
Query:{[insert into tableIdentifier (id) values (?)][2]} 
Query:{[insert into tableIdentifier (id) values (?)][3]} 
Query:{[insert into tableIdentifier (id) values (?)][4]} 
Query:{[insert into tableIdentifier (id) values (?)][5]}

As with the previous SEQUENCE example, Hibernate uses the MultipleHiLoPerTableGenerator to maintain backward compatibility.

Switching to the enhanced id generators:

properties.put("hibernate.id.new_generator_mappings", "true");

Gives us the following output:

Query:{[select tbl.next_val from hibernate_sequences tbl where tbl.sequence_name=? for update][tableIdentifier]} 
Query:{[insert into hibernate_sequences (sequence_name, next_val)  values (?,?)][tableIdentifier,1]} 
Query:{[update hibernate_sequences set next_val=?  where next_val=? and sequence_name=?][11,1,tableIdentifier]} 
Query:{[select tbl.next_val from hibernate_sequences tbl where tbl.sequence_name=? for update][tableIdentifier]} 
Query:{[update hibernate_sequences set next_val=?  where next_val=? and sequence_name=?][21,11,tableIdentifier]} 
Generated identifier: 1, using strategy: org.hibernate.id.enhanced.TableGenerator
Generated identifier: 2, using strategy: org.hibernate.id.enhanced.TableGenerator
Generated identifier: 3, using strategy: org.hibernate.id.enhanced.TableGenerator
Generated identifier: 4, using strategy: org.hibernate.id.enhanced.TableGenerator
Generated identifier: 5, using strategy: org.hibernate.id.enhanced.TableGenerator
Query:{[insert into tableIdentifier (id) values (?)][1]} 
Query:{[insert into tableIdentifier (id) values (?)][2]} 
Query:{[insert into tableIdentifier (id) values (?)][3]} 
Query:{[insert into tableIdentifier (id) values (?)][4]} 
Query:{[insert into tableIdentifier (id) values (?)][5]} 

You can see that the new enhanced TableGenerator was used this time.

For more about these optimization strategies you can read the original release note.

Code available on GitHub.


Hibernate Identity, Sequence and Table (Sequence) generator

Introduction

In my previous post I talked about different database identifier strategies. This post will compare the most common surrogate primary key strategies:

  • IDENTITY
  • SEQUENCE
  • TABLE (SEQUENCE)

IDENTITY

The IDENTITY type is included in the SQL:2003 standard and is supported by several major relational databases.

The IDENTITY generator allows an integer/bigint column to be auto-incremented on demand. The increment process happens outside of the current running transaction, so a rollback may end up discarding already assigned values (value gaps may happen).

The increment process is very efficient since it uses a database-internal lightweight locking mechanism, as opposed to the more heavyweight transactional coarse-grained locks.

The only drawback is that we can’t know the newly assigned value prior to executing the INSERT statement. This restriction hinders the “transactional write-behind” flushing strategy adopted by Hibernate. For this reason, Hibernate disables JDBC batch support for entities using the IDENTITY generator.

For the following examples we’ll enable Session Factory JDBC batching:

properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");
properties.put("hibernate.jdbc.batch_size", "2");

Let’s define an Entity using the IDENTITY generation strategy:

@Entity(name = "identityIdentifier")
public static class IdentityIdentifier {

	@Id
	@GeneratedValue(strategy = GenerationType.IDENTITY)
	private Long id;
}

Persisting 5 entities:

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 5; i++) {
			session.persist(new IdentityIdentifier());
		}
		session.flush();
		return null;
	}
});

Will execute one query after the other (there is no JDBC batching involved):

Query:{[insert into identityIdentifier (id) values (default)][]} 
Query:{[insert into identityIdentifier (id) values (default)][]} 
Query:{[insert into identityIdentifier (id) values (default)][]} 
Query:{[insert into identityIdentifier (id) values (default)][]} 
Query:{[insert into identityIdentifier (id) values (default)][]} 

Aside from disabling JDBC batching, the IDENTITY generator strategy doesn’t work with the Table per concrete class inheritance model, because there could be multiple subclass entities having the same identifier and a base class query will end up retrieving entities with the same identifier (even if belonging to different types).
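The identifier clash can be illustrated with a minimal in-memory sketch. The counters below only stand in for database-side generators, and the table names are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

/** In-memory illustration only: each counter stands in for a database-side generator. */
class IdentityCollisionSketch {

    /** With IDENTITY, every concrete table owns an independent auto-increment counter. */
    static long[] identityPerTable() {
        AtomicLong carTable = new AtomicLong();   // hypothetical subclass table
        AtomicLong truckTable = new AtomicLong(); // hypothetical subclass table
        return new long[] { carTable.incrementAndGet(), truckTable.incrementAndGet() };
    }

    /** With a shared SEQUENCE, all subclass tables draw from a single counter. */
    static long[] sharedSequence() {
        AtomicLong sequence = new AtomicLong();
        return new long[] { sequence.incrementAndGet(), sequence.incrementAndGet() };
    }

    public static void main(String[] args) {
        long[] identityIds = identityPerTable();
        long[] sequenceIds = sharedSequence();
        System.out.println(identityIds[0] == identityIds[1]); // true: both rows got id 1
        System.out.println(sequenceIds[0] == sequenceIds[1]); // false: ids 1 and 2
    }
}
```

A base class query spanning both tables would therefore see two entities with id 1 in the IDENTITY case, while the shared counter keeps identifiers unique across the whole hierarchy.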

SEQUENCE

The SEQUENCE generator is defined in the SQL:2003 standard and is supported by several major relational databases.

A SEQUENCE is a database object that generates incremental integers on each successive request. SEQUENCES are much more flexible than IDENTITY columns because:

  • A SEQUENCE is table free and the same sequence can be assigned to multiple columns or tables
  • A SEQUENCE may preallocate values to improve performance
  • A SEQUENCE may define an incremental step, allowing us to benefit from a “pooled” hi/lo algorithm
  • A SEQUENCE doesn’t restrict Hibernate JDBC batching
  • A SEQUENCE doesn’t restrict Hibernate inheritance models

Let’s define an Entity using the SEQUENCE generation strategy:

@Entity(name = "sequenceIdentifier")
public static class SequenceIdentifier {
	@Id
	@GenericGenerator(name = "sequence", strategy = "sequence", parameters = {
			@org.hibernate.annotations.Parameter(name = "sequenceName", value = "sequence"),
			@org.hibernate.annotations.Parameter(name = "allocationSize", value = "1"),
	})
	@GeneratedValue(generator = "sequence", strategy=GenerationType.SEQUENCE)
	private Long id;
}

I used the “sequence” generator because I didn’t want Hibernate to choose a SequenceHiLoGenerator or a SequenceStyleGenerator on our behalf.

Adding 5 entities:

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 5; i++) {
			session.persist(new SequenceIdentifier());
		}
		session.flush();
		return null;
	}
});

Generates the following queries:

Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Query:{[insert into sequenceIdentifier (id) values (?)][1]} {[insert into sequenceIdentifier (id) values (?)][2]} 
Query:{[insert into sequenceIdentifier (id) values (?)][3]} {[insert into sequenceIdentifier (id) values (?)][4]} 
Query:{[insert into sequenceIdentifier (id) values (?)][5]} 

This time the inserts are batched, but we now have 5 sequence calls prior to inserting the entities. This can be optimized by using a hi/lo algorithm.
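The grouping of the insert statements follows from the batch size of 2 configured earlier. As a rough sketch (the class and method names are hypothetical, and this only models the grouping, not the JDBC calls themselves):

```java
import java.util.ArrayList;
import java.util.List;

/** Models how queued inserts are grouped into batches; not actual JDBC code. */
class BatchingSketch {

    /** Splits the queued identifiers into batches of at most batchSize statements. */
    static List<List<Long>> batches(List<Long> ids, int batchSize) {
        List<List<Long>> result = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            result.add(new ArrayList<>(ids.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Five entities whose ids were already fetched from the sequence
        System.out.println(batches(List.of(1L, 2L, 3L, 4L, 5L), 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

Five inserts with a batch size of 2 result in three round trips instead of five. The grouping is only possible because all identifiers are known before any INSERT is executed.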

TABLE (SEQUENCE)

There is another database-independent alternative to generating sequences. One or multiple tables can be used to hold the identifier sequence counter. But it means trading write performance for database portability.

While IDENTITY and SEQUENCE are transaction-less, using a database table mandates ACID guarantees for synchronizing multiple concurrent id generation requests.

This is made possible by using row-level locking which comes at a higher cost than IDENTITY or SEQUENCE generators.

The sequence must be calculated in a separate database transaction and this requires the IsolationDelegate mechanism, which has support for both local (JDBC) and global (JTA) transactions.

  • For local transactions, it must open a new JDBC connection, therefore putting more pressure on the current connection pooling mechanism.
  • For global transactions, it requires suspending the current running transaction. After the sequence value is generated, the actual transaction has to be resumed. This process has its own cost, so the overall application performance might be affected.
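The read-then-guarded-update cycle that a table-based generator has to perform can be approximated in memory. This is an analogy under stated assumptions, not Hibernate's code: the class name is hypothetical, a map entry stands in for the sequence row, and a compare-and-set replaces the row lock and the separate transaction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory analogy of a table-backed identifier generator (hypothetical, simplified). */
class TableGeneratorSketch {

    // sequence_name -> next_val, standing in for the rows of the sequence table
    private final Map<String, AtomicLong> rows = new ConcurrentHashMap<>();

    long nextId(String sequenceName) {
        // Create the row with next_val = 1 when the sequence name is seen for the first time
        AtomicLong row = rows.computeIfAbsent(sequenceName, name -> new AtomicLong(1));
        while (true) {
            long current = row.get(); // read the current next_val
            // Advance the counter only if nobody else changed it in the meantime
            if (row.compareAndSet(current, current + 1)) {
                return current;
            }
        }
    }

    public static void main(String[] args) {
        TableGeneratorSketch generator = new TableGeneratorSketch();
        System.out.println(generator.nextId("default")); // 1
        System.out.println(generator.nextId("default")); // 2
    }
}
```

Every identifier costs a read plus a guarded write on shared state. In a real database that work additionally involves row-level locks and a dedicated transaction, which is where the extra cost compared to IDENTITY or SEQUENCE comes from.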

Let’s define an Entity using the TABLE generation strategy:

@Entity(name = "tableIdentifier")
public static class TableSequenceIdentifier {

	@Id
	@GenericGenerator(name = "table", strategy = "enhanced-table", parameters = {
			@org.hibernate.annotations.Parameter(name = "table_name", value = "sequence_table")
	})
	@GeneratedValue(generator = "table", strategy=GenerationType.TABLE)
	private Long id;
}
	

I used the newer “enhanced-table” generator, because the legacy “table” generator has been deprecated.

Adding 5 entities:

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 5; i++) {
			session.persist(new TableSequenceIdentifier());
		}
		session.flush();
		return null;
	}
});

Generates the following queries:

Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[insert into sequence_table (sequence_name, next_val)  values (?,?)][default,1]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][2,1,default]} 
Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][3,2,default]} 
Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][4,3,default]} 
Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][5,4,default]} 
Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][6,5,default]} 
Query:{[insert into tableIdentifier (id) values (?)][1]} {[insert into tableIdentifier (id) values (?)][2]} 
Query:{[insert into tableIdentifier (id) values (?)][3]} {[insert into tableIdentifier (id) values (?)][4]} 
Query:{[insert into tableIdentifier (id) values (?)][5]}

The table generator allows JDBC batching, but it resorts to SELECT FOR UPDATE queries. Row-level locking is definitely less efficient than using a native IDENTITY or SEQUENCE.

So, based on your application requirements you have multiple options to choose from. There isn’t one single winning strategy, each one having both advantages and disadvantages.

Code available on GitHub.
