Vlad Mihalcea

How to customize Hibernate dirty checking mechanism

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In my previous article I described the Hibernate automatic dirty checking mechanism. While you should always prefer it, there might be times when you want to add your own custom dirtiness detection strategy.

Custom dirty checking strategies

Hibernate offers the following customization mechanisms:

Read More

The anatomy of Hibernate dirty checking mechanism

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

The persistence context enqueues entity state transitions that get translated to database statements upon flushing. For managed entities, Hibernate can auto-detect incoming changes and schedule SQL UPDATES on our behalf. This mechanism is called automatic dirty checking.

The default dirty checking strategy

By default Hibernate checks all managed entity properties. Every time an entity is loaded, Hibernate makes an additional copy of all entity property values. At flush time, every managed entity property is matched against the loading-time snapshot value:

Default Flush Event Flow

So the number of individual dirty checks is given by the following formula:

N = \sum\limits_{k=1}^n p_{k}

where

n = The number of managed entities
p = The number of properties of a given entity

Even if only one property of a single entity has ever changed, Hibernate will still check all managed entities. For a large number of managed entities, the default dirty checking mechanism may have a significant CPU and memory footprint. Since the initial entity snapshot is held separately, the persistence context requires twice as much memory as all managed entities would normally occupy.

Read More

How does AUTO flush strategy work in JPA and Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

The Hibernate AUTO flush mode behaves differently whether you are bootstrapping Hibernate via JPA or using the stand-alone mechanism.

When using JPA, the AUTO flush mode causes all queries (JPQL, Criteria API, and native SQL) to trigger a flush prior to the query execution. However, this is not the case when bootstrapping Hibernate using the native API.

Not all queries trigger a Session flush

Many would assume that Hibernate always flushes the Session before any executing query. While this might have been a more intuitive approach, and probably closer to the JPA’s AUTO FlushModeType, Hibernate tries to optimize that. If the currently executed query is not going to hit the pending SQL INSERT/UPDATE/DELETE statements then the flush is not strictly required.

As stated in the reference documentation, the AUTO flush strategy may sometimes synchronize the current persistence context prior to a query execution. It would have been more intuitive if the framework authors had chosen to name it FlushMode.SOMETIMES.

Read More

A beginner’s guide to flush strategies in JPA and Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In my previous post I introduced the entity state transitions Object-relational mapping paradigm.

All managed entity state transitions are translated to associated database statements when the current Persistence Context gets flushed. Hibernate’s flush behavior is not always as obvious as one might think.

Write-behind

Hibernate tries to defer the Persistence Context flushing up until the last possible moment. This strategy has been traditionally known as transactional write-behind.

The write-behind is more related to Hibernate flushing rather than any logical or physical transaction. During a transaction, the flush may occur multiple times.

The flushed changes are visible only for the current database transaction. Until the current transaction is committed, no change is visible by other concurrent transactions.

The persistence context, also known as the first level cache, acts as a buffer between the current entity state transitions and the database.

In caching theory, the write-behind synchronization requires that all changes happen against the cache, whose responsibility is to eventually synchronize with the backing store.

Read More

A beginner’s guide to entity state transitions with JPA and Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

Hibernate shifts the developer mindset from SQL statements to entity state transitions. Once an entity is actively managed by Hibernate, all changes are going to be automatically propagated to the database.

Manipulating domain model entities (along with their associations) is much easier than writing and maintaining SQL statements. Without an ORM tool, adding a new column requires modifying all associated INSERT/UPDATE statements.

But Hibernate is no silver bullet either. Hibernate doesn’t free us from ever worrying about the actual executed SQL statements. Controlling Hibernate is not as straightforward as one might think and it’s mandatory to check all SQL statements Hibernate executes on our behalf.

Read More

Hibernate pooled and pooled-lo identifier generators

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this post, we’ll uncover a sequence identifier generator combining identifier assignment efficiency and interoperability with other external systems (concurrently accessing the underlying database system).

Traditionally there have been two sequence identifier strategies to choose from: sequence and seqhilo.

The sequence identifier, always hitting the database for every new value assignment. Even with database sequence preallocation, we have a significant database round-trip cost.

The seqhilo identifier, using the hilo algorithm. This generator calculates some identifier values in-memory, therefore reducing the database round-trip calls. The problem with this optimization technique is that the current database sequence value no longer reflects the current highest in-memory generated value.

The database sequence is used as a bucket number, making it difficult for other systems to interoperate with the database table in question. Other applications must know the inner-workings of the hilo identifier strategy to properly generate non-clashing identifiers.

Read More

A beginner’s guide to Hibernate enhanced identifier generators

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

JPA identifier generators

JPA defines the following identifier strategies:

Strategy Description
AUTO The persistence provider picks the most appropriate identifier strategy supported by the underlying database
IDENTITY Identifiers are assigned by a database IDENTITY column
SEQUENCE The persistence provider uses a database sequence for generating identifiers
TABLE The persistence provider uses a separate database table to emulate a sequence object

In my previous post I exampled the pros and cons of all these surrogate identifier strategies.

Read More

How do Identity, Sequence, and Table (sequence-like) generators work in JPA and Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In my previous post I talked about different database identifier strategies. This post will compare the most common surrogate primary key strategies:

  • IDENTITY
  • SEQUENCE
  • TABLE (SEQUENCE)

IDENTITY

The IDENTITY type (included in the SQL:2003 standard) is supported by:

The IDENTITY generator allows an integer/bigint column to be auto-incremented on demand. The increment process happens outside of the current running transaction, so a roll-back may end-up discarding already assigned values (value gaps may happen).

The increment process is very efficient since it uses a database internal lightweight locking mechanism as opposed to the more heavyweight transactional course-grain locks.

The only drawback is that we can’t know the newly assigned value prior to executing the INSERT statement. This restriction is hindering the transactional write-behind strategy adopted by Hibernate. For this reason, Hibernates cannot use JDBC batching when persisting entities that are using the IDENTITY generator.

Read More

Hibernate and UUID identifiers

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see how the UUID entity attributes are persisted when using JPA and Hibernate, for both assigned and auto-generated identifiers.

In my previous post I talked about UUID surrogate keys and the use cases when there are more appropriate than the more common auto-incrementing identifiers.

A UUID database type

There are several ways to represent a 128-bit UUID, and whenever in doubt I like to resort to Stack Exchange for an expert advice.

Because table identifiers are usually indexed, the more compact the database type the less space will the index require. From the most efficient to the least, here are our options:

  1. Some databases (PostgreSQL, SQL Server) offer a dedicated UUID storage type
  2. Otherwise we can store the bits as a byte array (e.g. RAW(16) in Oracle or the standard BINARY(16) type)
  3. Alternatively we can use 2 bigint (64-bit) columns, but a composite identifier is less efficient than a single column one
  4. We can store the hex value in a CHAR(36) column (e.g 32 hex values and 4 dashes), but this will take the most amount of space, hence it’s the least efficient alternative

Hibernate offers many identifier strategies to choose from and for UUID identifiers we have three options:

  • the assigned generator accompanied by the application logic UUID generation
  • the hexadecimal “uuid” string generator
  • the more flexible “uuid2” generator, allowing us to use java.lang.UUID, a 16 byte array or a hexadecimal String value

Read More

The hi/lo algorithm

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In my previous post I talked about various database identifier strategies, you need to be aware of when designing the database model. We concluded that database sequences are very convenient because they are both flexible and efficient for most use cases.

But even with cached sequences, the application requires a database round-trip for every new the sequence value. If your applications demand a high number of insert operations per transaction, the sequence allocation may be optimized with a hi/lo algorithm.

The hi/lo algorithm

The hi/lo algorithms split the sequences domain into “hi” groups. A “hi” value is assigned synchronously. Every “hi” group is given a maximum number of “lo” entries, that can by assigned off-line without worrying about concurrent duplicate entries.

  1. The “hi” token is assigned by the database, and two concurrent calls are guaranteed to see unique consecutive values
  2. Once a “hi” token is retrieved we only need the “incrementSize” (the number of “lo” entries)
  3. The identifiers range is given by the following formula: [(hi -1) * incrementSize) + 1, (hi * incrementSize))

    and the “lo” value will be taken from:

    [0, incrementSize)

    starting from

    [(hi -1) * incrementSize) + 1)

  4. When all “lo” values are used, a new “hi” value is fetched and the cycle continues

Here you can have an example of two concurrent transactions, each one inserting multiple entities:

Read More