How does Hibernate Collection Cache work

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

Previously, I described the second-level cache entry structure, Hibernate uses for storing entities. Besides entities, Hibernate can also store entity associations and this article will unravel the inner workings of collection caching.

Domain model

For the up-coming tests we are going to use the following entity model:

A Repository has a collection of Commit entities:

@org.hibernate.annotations.Cache(
    usage = CacheConcurrencyStrategy.READ_WRITE
)
@OneToMany(mappedBy = "repository", 
    cascade = CascadeType.ALL, orphanRemoval = true)
private List<Commit> commits = new ArrayList<>();

Each Commit entity has a collection of Change embeddable elements.

@ElementCollection
@CollectionTable(
    name="commit_change",
    joinColumns = @JoinColumn(name="commit_id")
)
@org.hibernate.annotations.Cache(
    usage = CacheConcurrencyStrategy.READ_WRITE
)
@OrderColumn(name = "index_id")
private List<Change> changes = new ArrayList<>();

And we’ll now insert some test data:

doInTransaction(session -> {
    Repository repository = 
        new Repository("High-Performance Hibernate");
    session.persist(repository);

    Commit commit1 = new Commit();
    commit1.getChanges().add(
        new Change("README.txt", "0a1,5...")
    );
    commit1.getChanges().add(
        new Change("web.xml", "17c17...")
    );

    Commit commit2 = new Commit();
    commit2.getChanges().add(
        new Change("README.txt", "0b2,5...")
    );

    repository.addCommit(commit1);
    repository.addCommit(commit2);
    session.persist(commit1);
});

Read-through caching

The Collection cache employs a read-through synchronization strategy:

doInTransaction(session -> {
    Repository repository = (Repository) 
        session.get(Repository.class, 1L);
    for (Commit commit : repository.getCommits()) {
        assertFalse(commit.getChanges().isEmpty());
    }
});

and collections are cached upon being accessed for the first time:

select
    collection0_.id as id1_0_0_,
    collection0_.name as name2_0_0_ 
from
    Repository collection0_ 
where
    collection0_.id=1  

select
    commits0_.repository_id as reposito3_0_0_,
    commits0_.id as id1_1_0_,
    commits0_.id as id1_1_1_,
    commits0_.repository_id as reposito3_1_1_,
    commits0_.review as review2_1_1_ 
from
    commit commits0_ 
where
    commits0_.r  

select
    changes0_.commit_id as commit_i1_1_0_,
    changes0_.diff as diff2_2_0_,
    changes0_.path as path3_2_0_,
    changes0_.index_id as index_id4_0_ 
from
    commit_change changes0_ 
where
    changes0_.commit_id=1  

select
    changes0_.commit_id as commit_i1_1_0_,
    changes0_.diff as diff2_2_0_,
    changes0_.path as path3_2_0_,
    changes0_.index_id as index_id4_0_ 
from
    commit_change changes0_ 
where
    changes0_.commit_id=2

After the Repository and its associated Commits get cached, loading the Repository and traversing the Commit and Change collections will not hit the database, since all entities and their associations are served from the second-level cache:

LOGGER.info("Load collections from cache");
doInTransaction(session -> {
    Repository repository = (Repository) 
        session.get(Repository.class, 1L);
    assertEquals(2, repository.getCommits().size());
});

There’s no SQL SELECT statement executed when running the previous test case:

CollectionCacheTest - Load collections from cache
JdbcTransaction - committed JDBC Connection

Collection cache entry structure

For entity collections, Hibernate only stores the entity identifiers, therefore requiring that entities be cached as well:

key = {org.hibernate.cache.spi.CacheKey@3981}
    key = {java.lang.Long@3597} "1"
    type = {org.hibernate.type.LongType@3598} 
    entityOrRoleName = {java.lang.String@3599} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Repository.commits"
    tenantId = null
    hashCode = 31
value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3982} 
    value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3986} "CollectionCacheEntry[1,2]"
    version = null
    timestamp = 5858841154416640

The CollectionCacheEntry stores the Commit identifiers associated with a given Repository entity.
Because element types don’t have identifiers, Hibernate stores their dehydrated state instead. The Change embeddable is cached as follows:

key = {org.hibernate.cache.spi.CacheKey@3970} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Commit.changes#1"
    key = {java.lang.Long@3974} "1"
    type = {org.hibernate.type.LongType@3975} 
    entityOrRoleName = {java.lang.String@3976} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Commit.changes"
    tenantId = null
    hashCode = 31
value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3971} 
    value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3978}
        state = {java.io.Serializable[2]@3980} 
            0 = {java.lang.Object[2]@3981} 
                0 = {java.lang.String@3985} "0a1,5..."
                1 = {java.lang.String@3986} "README.txt"
            1 = {java.lang.Object[2]@3982} 
                0 = {java.lang.String@3983} "17c17..."
                1 = {java.lang.String@3984} "web.xml"
    version = null
    timestamp = 5858843026345984

Collection Cache consistency model

Consistency is the biggest concern when employing caching, so we need to understand how the Hibernate Collection Cache handles entity state changes.

The CollectionUpdateAction is responsible for all Collection modifications and whenever the collection changes, the associated cache entry is evicted:

protected final void evict() throws CacheException {
    if ( persister.hasCache() ) {
        final CacheKey ck = session.generateCacheKey(
            key, 
            persister.getKeyType(), 
            persister.getRole()
        );
        persister.getCacheAccessStrategy().remove( ck );
    }
}

This behavior is also documented by the CollectionRegionAccessStrategy specification:

For cached collection data, all modification actions actually just invalidate the entry(s).

Based on the current concurrency strategy, the Collection Cache entry is evicted:

before the current transaction is committed, for CacheConcurrencyStrategy.NONSTRICT_READ_WRITE
right after the current transaction is committed, for CacheConcurrencyStrategy.READ_WRITE
exactly when the current transaction is committed, for CacheConcurrencyStrategy.TRANSACTIONAL

Adding new Collection entries

The following test case adds a new Commit entity to our Repository:

LOGGER.info("Adding invalidates Collection Cache");
doInTransaction(session -> {
    Repository repository = (Repository) 
        session.get(Repository.class, 1L);
    assertEquals(2, repository.getCommits().size());

    Commit commit = new Commit();
    commit.getChanges().add(
        new Change("Main.java", "0b3,17...")
    );
    repository.addCommit(commit);
});
doInTransaction(session -> {
    Repository repository = (Repository) 
        session.get(Repository.class, 1L);
    assertEquals(3, repository.getCommits().size());
});

Running this test generates the following output:

--Adding invalidates Collection Cache

insert 
into
   commit
   (id, repository_id, review) 
values
   (default, 1, false)

insert 
into
   commit_change
   (commit_id, index_id, diff, path) 
values
   (3, 0, '0b3,17...', 'Main.java')

--committed JDBC Connection

select
   commits0_.repository_id as reposito3_0_0_,
   commits0_.id as id1_1_0_,
   commits0_.id as id11_1_1_,
   commits0_.repository_id as reposito3_1_1_,
   commits0_.review as review2_1_1_ 
from
   commit commits0_ 
where
   commits0_.repository_id=1

--committed JDBC Connection

After a new Commit entity is persisted, the Repository.commits collection cache is cleared and the associated Commits entities are fetched from the database (the next time the collection is accessed).

Removing existing Collection entries

Removing a Collection element follows the same pattern:

LOGGER.info("Removing invalidates Collection Cache");
doInTransaction(session -> {
    Repository repository = (Repository) 
        session.get(Repository.class, 1L);
    assertEquals(2, repository.getCommits().size());
    Commit removable = repository.getCommits().get(0);
    repository.removeCommit(removable);
});
doInTransaction(session -> {
    Repository repository = (Repository) 
        session.get(Repository.class, 1L);
    assertEquals(1, repository.getCommits().size());
});

The following output gets generated:

--Removing invalidates Collection Cache

delete 
from
   commit_change 
where
   commit_id=1

delete 
from
   commit 
where
   id=1

--committed JDBC Connection

select
   commits0_.repository_id as reposito3_0_0_,
   commits0_.id as id1_1_0_,
   commits0_.id as id1_1_1_,
   commits0_.repository_id as reposito3_1_1_,
   commits0_.review as review2_1_1_ 
from
   commit commits0_ 
where
   commits0_.repository_id=1

--committed JDBC Connection

The Collection Cache is evicted once its structure gets changed.

Removing Collection elements directly

Hibernate can ensure cache consistency, as long as it’s aware of all changes the target cached collection undergoes. Hibernate uses its own Collection types (e.g. PersistentBag, PersistentSet) to allow lazy-loading or detect dirty state.

If an internal Collection element is deleted without updating the Collection state, Hibernate won’t be able to invalidate the currently cached Collection entry:

LOGGER.info("Removing Child causes inconsistencies");
doInTransaction(session -> {
    Commit commit = (Commit) 
        session.get(Commit.class, 1L);
    session.delete(commit);
});
try {
    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        assertEquals(1, repository.getCommits().size());
    });
} catch (ObjectNotFoundException e) {
    LOGGER.warn("Object not found", e);
}

--Removing Child causes inconsistencies

delete 
from
   commit_change 
where
   commit_id=1

delete 
from
   commit 
where
   id=1

-committed JDBC Connection

select
   collection0_.id as id1_1_0_,
   collection0_.repository_id as reposito3_1_0_,
   collection0_.review as review2_1_0_ 
from
   commit collection0_ 
where
   collection0_.id=1

--No row with the given identifier exists: 
-- [CollectionCacheTest$Commit#1]

--rolled JDBC Connection

When the Commit entity was deleted, Hibernate didn’t know it had to update all the associated Collection Caches. The next time we load the Commit collection, Hibernate will realize some entities don’t exist anymore and it will throw an exception.

Updating Collection elements using HQL

Hibernate can maintain cache consistency when executing bulk updates through HQL:

LOGGER.info("Updating Child entities using HQL");
doInTransaction(session -> {
    Repository repository = (Repository)
         session.get(Repository.class, 1L);
    for (Commit commit : repository.getCommits()) {
        assertFalse(commit.review);
    }
});
doInTransaction(session -> {
    session.createQuery(
        "update Commit c " +
        "set c.review = true ")
    .executeUpdate();
});
doInTransaction(session -> {
    Repository repository = (Repository)
        session.get(Repository.class, 1L);
    for(Commit commit : repository.getCommits()) {
        assertTrue(commit.review);
    }
});

Running this test case generates the following SQL:

--Updating Child entities using HQL

--committed JDBC Connection

update
   commit 
set
   review=true

--committed JDBC Connection

select
   commits0_.repository_id as reposito3_0_0_,
   commits0_.id as id1_1_0_,
   commits0_.id as id1_1_1_,
   commits0_.repository_id as reposito3_1_1_,
   commits0_.review as review2_1_1_ 
from
   commit commits0_ 
where
   commits0_.repository_id=1

--committed JDBC Connection

The first transaction doesn’t require hitting the database, only relying on the second-level cache. The HQL UPDATE clears the Collection Cache, so Hibernate will have to reload it from the database when the collection is accessed afterward.

Updating Collection elements using SQL

Hibernate can also invalidate cache entries for bulk SQL UPDATE statements:

LOGGER.info("Updating Child entities using SQL");
doInTransaction(session -> {
    Repository repository = (Repository) 
        session.get(Repository.class, 1L);
    for (Commit commit : repository.getCommits()) {
        assertFalse(commit.review);
    }
});
doInTransaction(session -> {
    session.createSQLQuery(
        "update Commit c " +
        "set c.review = true ")
    .addSynchronizedEntityClass(Commit.class)
    .executeUpdate();
});
doInTransaction(session -> {
    Repository repository = (Repository) 
        session.get(Repository.class, 1L);
    for(Commit commit : repository.getCommits()) {
        assertTrue(commit.review);
    }
});

Generating the following output:

--Updating Child entities using SQL

--committed JDBC Connection

update
   commit 
set
   review=true

--committed JDBC Connection

select
   commits0_.repository_id as reposito3_0_0_,
   commits0_.id as id1_1_0_,
   commits0_.id as id1_1_1_,
   commits0_.repository_id as reposito3_1_1_,
   commits0_.review as review2_1_1_ 
from
   commit commits0_ 
where
   commits0_.repository_id=1  

--committed JDBC Connection

The BulkOperationCleanupAction is responsible for cleaning up the second-level cache on bulk DML statements. While Hibernate can detect the affected cache regions when executing a HQL statement, for native queries you need to instruct Hibernate what regions the statement should invalidate. If you don’t specify any such region, Hibernate will clear all second-level cache regions.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

The Collection Cache is a very useful feature, complementing the second-level entity cache. This way we can store an entire entity graph, reducing the database querying workload in read-mostly applications. Like with AUTO flushing, Hibernate cannot introspect the affected tablespaces when executing native queries. To avoid consistency issues (when using AUTO flushing) or cache misses (second-level cache), whenever we need to run a native query we have to explicitly declare the targeted tables, so Hibernate can take the appropriate actions (e.g. flushing or invalidating cache regions).

Follow @vlad_mihalcea