Introduction
Previously, I described the second-level cache entry structure that Hibernate uses for storing entities. Besides entities, Hibernate can also cache entity associations, and this article will unravel the inner workings of collection caching.
Domain model
For the upcoming tests, we are going to use the following entity model:
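The entity definitions themselves are not reproduced in this extract, so here is a minimal sketch reconstructed from the SQL statements and cache entries shown below. The exact annotations, field names, and the convenience Change constructor are assumptions (in the original test these types appear to be nested classes of CollectionCacheTest):

import java.util.ArrayList;
import java.util.List;

import javax.persistence.*;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Repository {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    // the collection cache entry stores only the Commit identifiers
    @OneToMany(mappedBy = "repository", cascade = CascadeType.ALL, orphanRemoval = true)
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    private List<Commit> commits = new ArrayList<>();

    // getters and setters omitted for brevity
}

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Commit {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne
    private Repository repository;

    private boolean review;

    // element collection entries are cached in their dehydrated form
    @ElementCollection
    @CollectionTable(name = "commit_change", joinColumns = @JoinColumn(name = "commit_id"))
    @OrderColumn(name = "index_id")
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    private List<Change> changes = new ArrayList<>();

    // getters and setters omitted for brevity
}

@Embeddable
public class Change {

    private String path;

    private String diff;

    // a no-arg constructor is required by JPA; the convenience constructor is assumed
    public Change() {}

    public Change(String path, String diff) {
        this.path = path;
        this.diff = diff;
    }
}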
Both entities and their collections are cached upon being accessed for the first time:
select
collection0_.id as id1_0_0_,
collection0_.name as name2_0_0_
from
Repository collection0_
where
collection0_.id=1
select
commits0_.repository_id as reposito3_0_0_,
commits0_.id as id1_1_0_,
commits0_.id as id1_1_1_,
commits0_.repository_id as reposito3_1_1_,
commits0_.review as review2_1_1_
from
commit commits0_
where
commits0_.repository_id=1
select
changes0_.commit_id as commit_i1_1_0_,
changes0_.diff as diff2_2_0_,
changes0_.path as path3_2_0_,
changes0_.index_id as index_id4_0_
from
commit_change changes0_
where
changes0_.commit_id=1
select
changes0_.commit_id as commit_i1_1_0_,
changes0_.diff as diff2_2_0_,
changes0_.path as path3_2_0_,
changes0_.index_id as index_id4_0_
from
commit_change changes0_
where
changes0_.commit_id=2
After the Repository and its associated Commits get cached, loading the Repository and traversing the Commit and Change collections will not hit the database, since all entities and their associations are served from the second-level cache:
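The test code for this scenario is not included in this extract; a minimal sketch of what it might look like, assuming a Hibernate SessionFactory named sessionFactory, the accessors from the domain model sketch above, and a static import of JUnit's assertFalse:

// second transaction: everything is resolved from the second-level cache,
// so no SQL statement should appear in the log
Session session = sessionFactory.openSession();
session.beginTransaction();

Repository repository = (Repository) session.get(Repository.class, 1L);
for (Commit commit : repository.getCommits()) {
    // traversing the cached collections triggers no database query
    assertFalse(commit.getChanges().isEmpty());
}

session.getTransaction().commit();
session.close();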
For entity collections, Hibernate only stores the entity identifiers, therefore requiring that entities be cached as well:
key = {org.hibernate.cache.spi.CacheKey@3981}
key = {java.lang.Long@3597} "1"
type = {org.hibernate.type.LongType@3598}
entityOrRoleName = {java.lang.String@3599} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Repository.commits"
tenantId = null
hashCode = 31
value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3982}
value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3986} "CollectionCacheEntry[1,2]"
version = null
timestamp = 5858841154416640
The CollectionCacheEntry stores the Commit identifiers associated with a given Repository entity.
Because element types don’t have identifiers, Hibernate stores their dehydrated state instead, so the Change embeddable entries are cached using their column values (the diff and path) rather than a list of identifiers.
Adding new Collection entries
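The test that produces the following statements is not included in this extract; a minimal sketch, assuming the domain model above and its accessors, could be:

// adding a new Commit through the mapped collection; this invalidates
// the Repository.commits collection cache entry
Session session = sessionFactory.openSession();
session.beginTransaction();

Repository repository = (Repository) session.get(Repository.class, 1L);

Commit commit = new Commit();
commit.setRepository(repository);
commit.getChanges().add(new Change("Main.java", "0b3,17..."));

repository.getCommits().add(commit);
session.persist(commit);

session.getTransaction().commit();
session.close();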
--Adding invalidates Collection Cache
insert
into
commit
(id, repository_id, review)
values
(default, 1, false)
insert
into
commit_change
(commit_id, index_id, diff, path)
values
(3, 0, '0b3,17...', 'Main.java')
--committed JDBC Connection
select
commits0_.repository_id as reposito3_0_0_,
commits0_.id as id1_1_0_,
commits0_.id as id1_1_1_,
commits0_.repository_id as reposito3_1_1_,
commits0_.review as review2_1_1_
from
commit commits0_
where
commits0_.repository_id=1
--committed JDBC Connection
After a new Commit entity is persisted, the Repository.commits collection cache is cleared, and the associated Commit entities are fetched from the database the next time the collection is accessed.
Removing existing Collection entries
Removing a Collection element follows the same pattern:
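The removal test is likewise not shown here; a hedged sketch, assuming orphanRemoval is enabled as in the domain model sketch above, might be:

// removing the Commit through the mapped collection keeps the collection
// cache consistent; orphanRemoval also deletes the Commit and its changes
Session session = sessionFactory.openSession();
session.beginTransaction();

Repository repository = (Repository) session.get(Repository.class, 1L);
Commit commit = repository.getCommits().get(0);
repository.getCommits().remove(commit);

session.getTransaction().commit();
session.close();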
--Removing invalidates Collection Cache
delete
from
commit_change
where
commit_id=1
delete
from
commit
where
id=1
--committed JDBC Connection
select
commits0_.repository_id as reposito3_0_0_,
commits0_.id as id1_1_0_,
commits0_.id as id1_1_1_,
commits0_.repository_id as reposito3_1_1_,
commits0_.review as review2_1_1_
from
commit commits0_
where
commits0_.repository_id=1
--committed JDBC Connection
The Collection Cache is evicted once its structure gets changed.
Removing Collection elements directly
Hibernate can ensure cache consistency as long as it’s aware of all the changes that the target cached collection undergoes. Hibernate uses its own Collection types (e.g. PersistentBag, PersistentSet) to support lazy loading and dirty state detection.
If an internal Collection element is deleted without updating the Collection state, Hibernate won’t be able to invalidate the currently cached Collection entry:
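A minimal sketch of such a direct deletion, bypassing the Repository.commits collection (names taken from the domain model sketch above), could look like this:

// deleting the Commit entity directly, without removing it from
// repository.getCommits(), leaves a stale collection cache entry behind
Session session = sessionFactory.openSession();
session.beginTransaction();

Commit commit = (Commit) session.get(Commit.class, 1L);
session.delete(commit);

session.getTransaction().commit();
session.close();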
--Removing Child causes inconsistencies
delete
from
commit_change
where
commit_id=1
delete
from
commit
where
id=1
--committed JDBC Connection
select
collection0_.id as id1_1_0_,
collection0_.repository_id as reposito3_1_0_,
collection0_.review as review2_1_0_
from
commit collection0_
where
collection0_.id=1
--No row with the given identifier exists:
-- [CollectionCacheTest$Commit#1]
--rolled JDBC Connection
When the Commit entity was deleted, Hibernate didn’t know it had to update all the associated Collection Caches. The next time the Repository.commits collection is loaded, Hibernate realizes that some of the cached entity identifiers no longer exist in the database, and it throws an ObjectNotFoundException.
Updating Collection elements using HQL
Hibernate can maintain cache consistency when executing bulk updates through HQL:
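The original test case is not reproduced in this extract; a minimal sketch of the bulk update it runs might look like this:

// bulk HQL update: Hibernate can determine the affected cache regions
// and invalidates the Commit entity and collection cache entries
Session session = sessionFactory.openSession();
session.beginTransaction();

session.createQuery("update Commit c set c.review = true")
       .executeUpdate();

session.getTransaction().commit();
session.close();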
Running this test case generates the following SQL:
--Updating Child entities using HQL
--committed JDBC Connection
update
commit
set
review=true
--committed JDBC Connection
select
commits0_.repository_id as reposito3_0_0_,
commits0_.id as id1_1_0_,
commits0_.id as id1_1_1_,
commits0_.repository_id as reposito3_1_1_,
commits0_.review as review2_1_1_
from
commit commits0_
where
commits0_.repository_id=1
--committed JDBC Connection
The first transaction doesn’t hit the database at all, relying solely on the second-level cache. The HQL UPDATE clears the Collection Cache, so Hibernate has to reload it from the database the next time the collection is accessed.
Updating Collection elements using SQL
Hibernate can also invalidate cache entries for bulk SQL UPDATE statements:
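The corresponding test is not shown here either; a minimal sketch might be:

// bulk native SQL update: with no synchronization hints, Hibernate plays it
// safe and clears all second-level cache regions after the statement runs
Session session = sessionFactory.openSession();
session.beginTransaction();

session.createSQLQuery("update commit set review = true")
       .executeUpdate();

session.getTransaction().commit();
session.close();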
--Updating Child entities using SQL
--committed JDBC Connection
update
commit
set
review=true
--committed JDBC Connection
select
commits0_.repository_id as reposito3_0_0_,
commits0_.id as id1_1_0_,
commits0_.id as id1_1_1_,
commits0_.repository_id as reposito3_1_1_,
commits0_.review as review2_1_1_
from
commit commits0_
where
commits0_.repository_id=1
--committed JDBC Connection
The BulkOperationCleanupAction is responsible for cleaning up the second-level cache on bulk DML statements. While Hibernate can detect the affected cache regions when executing an HQL statement, for native queries you need to instruct Hibernate which regions the statement should invalidate. If you don’t specify any such region, Hibernate will clear all second-level cache regions.
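For illustration, Hibernate’s SQLQuery API exposes synchronization hints for exactly this purpose; a minimal sketch (not taken from the original test) might be:

// restricting the invalidation to the regions affected by the Commit entity;
// without such a hint, the bulk native statement clears every cache region
session.createSQLQuery("update commit set review = true")
       .addSynchronizedEntityClass(Commit.class)
       .executeUpdate();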
Conclusion
The Collection Cache is a very useful feature, complementing the second-level entity cache. It allows us to store an entire entity graph, reducing the database querying workload in read-mostly applications. As with AUTO flushing, Hibernate cannot introspect the affected table spaces when executing native queries. To avoid consistency issues (when using AUTO flushing) or cache misses (with the second-level cache), whenever we need to run a native query we have to explicitly declare the targeted tables, so Hibernate can take the appropriate action (e.g. flushing or invalidating the affected cache regions).