How does Hibernate READ_WRITE CacheConcurrencyStrategy work

Introduction

In my previous post, I introduced the NONSTRICT_READ_WRITE second-level cache concurrency mechanism. In this article, I am going to continue this topic with the READ_WRITE strategy.

Write-through caching

NONSTRICT_READ_WRITE is a read-through caching strategy, and updates end up invalidating cache entries. As simple as this strategy may be, its performance drops as the number of write operations increases. A write-through cache strategy is a better choice for write-intensive applications, since cache entries can be updated rather than discarded.

Because the database is the system of record and database operations are wrapped inside physical transactions, the cache can be updated either synchronously (as is the case with the TRANSACTIONAL cache concurrency strategy) or asynchronously (right after the database transaction is committed).

The READ_WRITE strategy is an asynchronous cache concurrency mechanism, and to prevent data integrity issues (e.g. stale cache entries), it uses a locking mechanism that provides unit-of-work isolation guarantees.
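
Before looking at the internals, here is a minimal mapping and configuration sketch showing how READ_WRITE caching is typically enabled (assuming Hibernate 4 with the Ehcache region factory; the Repository entity body is abbreviated):

// hibernate.cfg.xml / configuration properties:
// hibernate.cache.use_second_level_cache = true
// hibernate.cache.region.factory_class = org.hibernate.cache.ehcache.EhCacheRegionFactory

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Repository {

    @Id
    private Long id;

    private String name;

    // the version is what the READ_WRITE strategy compares when deciding
    // whether a cache entry may be overwritten
    @Version
    private int version;

    // getters and setters omitted for brevity
}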

Inserting data

Because persisted entities are uniquely identified (each entity being assigned to a distinct database row), the newly created entities get cached right after the database transaction is committed:

@Override
public boolean afterInsert(
    Object key, Object value, Object version) 
        throws CacheException {
    region().writeLock( key );
    try {
        final Lockable item = 
            (Lockable) region().get( key );
        if ( item == null ) {
            region().put( key, 
                new Item( value, version, 
                    region().nextTimestamp() 
                ) 
            );
            return true;
        }
        else {
            return false;
        }
    }
    finally {
        region().writeUnlock( key );
    }
}

For an entity to be cached upon insertion, it must use a SEQUENCE generator, the cache being populated by the EntityInsertAction:

@Override
public void doAfterTransactionCompletion(boolean success, 
    SessionImplementor session) 
    throws HibernateException {

    final EntityPersister persister = getPersister();
    if ( success && isCachePutEnabled( persister, 
        getSession() ) ) {
        final CacheKey ck = getSession()
            .generateCacheKey( 
                getId(), 
                persister.getIdentifierType(), 
                persister.getRootEntityName() );

        final boolean put = cacheAfterInsert( 
            persister, ck );
    }
    postCommitInsert( success );
}

The IDENTITY generator doesn’t play well with the transactional write-behind first-level cache design, so the associated EntityIdentityInsertAction doesn’t cache newly inserted entries (at least until HHH-7964 is fixed).
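
For reference, a SEQUENCE-based identifier mapping along these lines allows the afterInsert cache put shown above (the entity, generator and sequence names are just placeholders):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Commit {

    // with SEQUENCE, the identifier is fetched at persist time, without
    // executing the INSERT, so it fits the transactional write-behind design
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "commit_seq")
    @SequenceGenerator(name = "commit_seq", sequenceName = "commit_sequence")
    private Long id;

    private boolean review;

    // getters and setters omitted for brevity
}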

Theoretically, between the database transaction commit and the second-level cache insert, one concurrent transaction might load the newly created entity, therefore triggering the cache insert itself. Although possible, the cache synchronization lag is very short, and if a concurrent transaction is interleaved, it only makes the other transaction hit the database instead of loading the entity from the cache.

Updating data

While inserting entities is a rather simple operation, for updates, we need to synchronize both the database and the cache entry. The READ_WRITE concurrency strategy employs a locking mechanism to ensure data integrity:

(Sequence diagram: READ_WRITE cache concurrency strategy update flow)

  1. The Hibernate Transaction commit procedure triggers a Session flush
  2. The EntityUpdateAction replaces the current cache entry with a Lock object
  3. The update method is used for synchronous cache updates so it doesn’t do anything when using an asynchronous cache concurrency strategy, like READ_WRITE
  4. After the database transaction is committed, the after-transaction-completion callbacks are called
  5. The EntityUpdateAction calls the afterUpdate method of the EntityRegionAccessStrategy
  6. The ReadWriteEhcacheEntityRegionAccessStrategy replaces the Lock entry with an actual Item, encapsulating the entity disassembled state

Deleting data

Deleting entities is similar to the update process, as we can see from the following sequence diagram:

(Sequence diagram: READ_WRITE cache concurrency strategy delete flow)

  • The Hibernate Transaction commit procedure triggers a Session flush
  • The EntityDeleteAction replaces the current cache entry with a Lock object
  • The remove method call doesn’t do anything, since READ_WRITE is an asynchronous cache concurrency strategy
  • After the database transaction is committed, the after-transaction-completion callbacks are called
  • The EntityDeleteAction calls the unlockItem method of the EntityRegionAccessStrategy
  • The ReadWriteEhcacheEntityRegionAccessStrategy replaces the Lock entry with another Lock object whose timeout period is increased

After an entity is deleted, its associated second-level cache entry is replaced by a Lock object, which makes any subsequent request read from the database instead of using the cache entry.

    Locking constructs

    Both the Item and the Lock classes inherit from the Lockable type and each of these two has a specific policy for allowing a cache entry to be read or written.

    The READ_WRITE Lock object

    The Lock class defines the following methods:

    @Override
    public boolean isReadable(long txTimestamp) {
        return false;
    }
    
    @Override
    public boolean isWriteable(long txTimestamp, 
        Object newVersion, Comparator versionComparator) {
        if ( txTimestamp > timeout ) {
            // if timedout then allow write
            return true;
        }
        if ( multiplicity > 0 ) {
            // if still locked then disallow write
            return false;
        }
        return version == null
            ? txTimestamp > unlockTimestamp
            : versionComparator.compare( version, 
                newVersion ) < 0;
    }
    
    • A Lock object doesn’t allow reading the cache entry, so any subsequent request must go to the database
    • If the current Session creation timestamp is greater than the Lock timeout threshold, the cache entry is allowed to be written
    • If at least one Session has managed to lock this entry, any write operation is forbidden
    • A Lock entry allows writing if the incoming entity state has a newer version or, for unversioned entities, if the current Session creation timestamp is greater than the entry unlocking timestamp

    The READ_WRITE Item object

    The Item class defines the following read/write access policy:

    @Override
    public boolean isReadable(long txTimestamp) {
        return txTimestamp > timestamp;
    }
    
    @Override
    public boolean isWriteable(long txTimestamp, 
        Object newVersion, Comparator versionComparator) {
        return version != null && versionComparator
            .compare( version, newVersion ) < 0;
    }
    
    • An Item is readable only from a Session that’s been started after the cache entry creation time
    • An Item entry allows writing only if the incoming entity state has incremented its version

    Cache entry concurrency control

    These concurrency control mechanisms are invoked when saving and reading the underlying cache entries.

    The cache entry is read when the ReadWriteEhcacheEntityRegionAccessStrategy get method is called:

    public final Object get(Object key, long txTimestamp) 
        throws CacheException {
        readLockIfNeeded( key );
        try {
            final Lockable item = 
                (Lockable) region().get( key );
    
            final boolean readable = 
                item != null && 
                item.isReadable( txTimestamp );
                
            if ( readable ) {
                return item.getValue();
            }
            else {
                return null;
            }
        }
        finally {
            readUnlockIfNeeded( key );
        }
    }
    

    The cache entry is written by the ReadWriteEhcacheEntityRegionAccessStrategy putFromLoad method:

    public final boolean putFromLoad(
            Object key,
            Object value,
            long txTimestamp,
            Object version,
            boolean minimalPutOverride)
            throws CacheException {
        region().writeLock( key );
        try {
            final Lockable item = 
                (Lockable) region().get( key );
                
            final boolean writeable = 
                item == null || 
                item.isWriteable( 
                    txTimestamp, 
                    version, 
                    versionComparator );
                    
            if ( writeable ) {
                region().put( 
                    key, 
                    new Item( 
                        value, 
                        version, 
                        region().nextTimestamp() 
                    ) 
                );
                return true;
            }
            else {
                return false;
            }
        }
        finally {
            region().writeUnlock( key );
        }
    }
    

    Timing out

    If the database operation fails, the current cache entry holds a Lock object and it cannot roll back to its previous Item state. For this reason, the Lock must time out so that the cache entry can be replaced by an actual Item object. The EhcacheDataRegion defines the following timeout property:

    private static final String CACHE_LOCK_TIMEOUT_PROPERTY = 
        "net.sf.ehcache.hibernate.cache_lock_timeout";
    private static final int DEFAULT_CACHE_LOCK_TIMEOUT = 60000;
    

    Unless we override the net.sf.ehcache.hibernate.cache_lock_timeout property, the default timeout is 60 seconds:

    final String timeout = properties.getProperty(
        CACHE_LOCK_TIMEOUT_PROPERTY,
        Integer.toString( DEFAULT_CACHE_LOCK_TIMEOUT )
    );
    

    The following test will emulate a failing database transaction, so we can observe how the READ_WRITE cache only allows writing after the timeout threshold expires. First we are going to lower the timeout value, to reduce the cache freezing period:

    properties.put(
        "net.sf.ehcache.hibernate.cache_lock_timeout", 
        String.valueOf(250));
    

    We’ll use a custom interceptor to manually rollback the currently running transaction:

    @Override
    protected Interceptor interceptor() {
        return new EmptyInterceptor() {
            @Override
            public void beforeTransactionCompletion(
                Transaction tx) {
                if(applyInterceptor.get()) {
                    tx.rollback();
                }
            }
        };
    }
    

    The following routine will test the lock timeout behavior:

    try {
        doInTransaction(session -> {
            Repository repository = (Repository)
                session.get(Repository.class, 1L);
            repository.setName("High-Performance Hibernate");
            applyInterceptor.set(true);
        });
    } catch (Exception e) {
        LOGGER.info("Expected", e);
    }
    applyInterceptor.set(false);
    
    AtomicReference<Object> previousCacheEntryReference =
            new AtomicReference<>();
    AtomicBoolean cacheEntryChanged = new AtomicBoolean();
    
    while (!cacheEntryChanged.get()) {
        doInTransaction(session -> {
            boolean entryChange;
            session.get(Repository.class, 1L);
            
            try {
                Object previousCacheEntry = 
                    previousCacheEntryReference.get();
                Object cacheEntry = 
                    getCacheEntry(Repository.class, 1L);
                
                entryChange = previousCacheEntry != null &&
                    previousCacheEntry != cacheEntry;
                previousCacheEntryReference.set(cacheEntry);
                LOGGER.info("Cache entry {}", 
                    ToStringBuilder.reflectionToString(
                        cacheEntry));
                        
                if(!entryChange) {
                    sleep(100);
                } else {
                    cacheEntryChanged.set(true);
                }
            } catch (IllegalAccessException e) {
                LOGGER.error("Error accessing Cache", e);
            }
        });
    }
    

    Running this test generates the following output:

    select
       readwritec0_.id as id1_0_0_,
       readwritec0_.name as name2_0_0_,
       readwritec0_.version as version3_0_0_ 
    from
       repository readwritec0_ 
    where
       readwritec0_.id=1
       
    update
       repository 
    set
       name='High-Performance Hibernate',
       version=1 
    where
       id=1 
       and version=0
    
    JdbcTransaction - rolled JDBC Connection
    
    select
       readwritec0_.id as id1_0_0_,
       readwritec0_.name as name2_0_0_,
       readwritec0_.version as version3_0_0_ 
    from
       repository readwritec0_ 
    where
       readwritec0_.id = 1
    
    Cache entry net.sf.ehcache.Element@3f9a0805[
        key=ReadWriteCacheConcurrencyStrategyWithLockTimeoutTest$Repository#1,
        value=Lock Source-UUID:ac775350-3930-4042-84b8-362b64c47e4b Lock-ID:0,
            version=1,
            hitCount=3,
            timeToLive=120,
            timeToIdle=120,
            lastUpdateTime=1432280657865,
            cacheDefaultLifespan=true,id=0
    ]
    Wait 100 ms!
    JdbcTransaction - committed JDBC Connection
    
    select
       readwritec0_.id as id1_0_0_,
       readwritec0_.name as name2_0_0_,
       readwritec0_.version as version3_0_0_ 
    from
       repository readwritec0_ 
    where
       readwritec0_.id = 1
       
    Cache entry net.sf.ehcache.Element@3f9a0805[
        key=ReadWriteCacheConcurrencyStrategyWithLockTimeoutTest$Repository#1,
        value=Lock Source-UUID:ac775350-3930-4042-84b8-362b64c47e4b Lock-ID:0,
            version=1,
            hitCount=3,
            timeToLive=120,
            timeToIdle=120,
            lastUpdateTime=1432280657865,
            cacheDefaultLifespan=true,
            id=0
    ]
    Wait 100 ms!
    JdbcTransaction - committed JDBC Connection
    
    select
       readwritec0_.id as id1_0_0_,
       readwritec0_.name as name2_0_0_,
       readwritec0_.version as version3_0_0_ 
    from
       repository readwritec0_ 
    where
       readwritec0_.id = 1
    Cache entry net.sf.ehcache.Element@305f031[
        key=ReadWriteCacheConcurrencyStrategyWithLockTimeoutTest$Repository#1,
        value=org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@592e843a,
            version=1,
            hitCount=1,
            timeToLive=120,
            timeToIdle=120,
            lastUpdateTime=1432280658322,
            cacheDefaultLifespan=true,
            id=0
    ]
    JdbcTransaction - committed JDBC Connection
    
    • The first transaction tries to update an entity, so the associated second-level cache entry is locked prior to committing the transaction.
    • The first transaction fails and it gets rolled back
    • The lock is being held, so the next two successive transactions are going to the database, without replacing the Lock entry with the current loaded database entity state
    • After the Lock timeout period expires, the third transaction can finally replace the Lock with an Item cache entry (holding the entity disassembled hydrated state)

    Conclusion

    The READ_WRITE concurrency strategy offers the benefits of a write-through caching mechanism, but you need to understand its inner workings to decide if it's a good fit for your current project's data access requirements.

    For heavy write contention scenarios, the locking constructs will make other concurrent transactions hit the database, so you must decide if a synchronous cache concurrency strategy is better suited in this situation.

    Code available on GitHub.

    If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.

    How does Hibernate NONSTRICT_READ_WRITE CacheConcurrencyStrategy work

    Introduction

    In my previous post, I introduced the READ_ONLY CacheConcurrencyStrategy, which is the obvious choice for immutable entity graphs. When cached data is changeable, we need to use a read-write caching strategy, and this post will describe how the NONSTRICT_READ_WRITE second-level cache concurrency strategy works.

    Inner workings

    When the Hibernate transaction is committed, the following sequence of operations is executed:

    (Sequence diagram: NONSTRICT_READ_WRITE cache concurrency strategy)

    First, the cache is invalidated before the database transaction gets committed, during flush time:

    1. The current Hibernate Transaction (e.g. JdbcTransaction, JtaTransaction) is flushed
    2. The DefaultFlushEventListener executes the current ActionQueue
    3. The EntityUpdateAction calls the update method of the EntityRegionAccessStrategy
    4. The NonStrictReadWriteEhcacheEntityRegionAccessStrategy removes the cache entry from the underlying EhcacheEntityRegion

    After the database transaction is committed, the cache entry is removed once more:

    1. The current Hibernate Transaction after completion callback is called
    2. The current Session propagates this event to its internal ActionQueue
    3. The EntityUpdateAction calls the afterUpdate method on the EntityRegionAccessStrategy
    4. The NonStrictReadWriteEhcacheEntityRegionAccessStrategy calls the remove method on the underlying EhcacheEntityRegion

    Inconsistency warning

    The NONSTRICT_READ_WRITE mode is not a write-through caching strategy because cache entries are invalidated instead of being updated, and the cache invalidation is not synchronized with the current database transaction. Even if the associated Cache region entry gets invalidated twice (before and after transaction completion), there's still a tiny time window when the cache and the database might drift apart.

    The following test will demonstrate this issue. First we are going to define Alice transaction logic:

    doInTransaction(session -> {
        LOGGER.info("Load and modify Repository");
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        assertTrue(getSessionFactory().getCache()
            .containsEntity(Repository.class, 1L));
        repository.setName("High-Performance Hibernate");
        applyInterceptor.set(true);
    });
    
    endLatch.await();
    
    assertFalse(getSessionFactory().getCache()
        .containsEntity(Repository.class, 1L));
    
    doInTransaction(session -> {
        applyInterceptor.set(false);
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        LOGGER.info("Cached Repository {}", repository);
    });
    

    Alice loads a Repository entity and modifies it in her first database transaction.
    To spawn another concurrent transaction right when Alice prepares to commit, we are going to use the following Hibernate Interceptor:

    private AtomicBoolean applyInterceptor = 
        new AtomicBoolean();
    
    private final CountDownLatch endLatch = 
        new CountDownLatch(1);
    
    private class BobTransaction extends EmptyInterceptor {
        @Override
        public void beforeTransactionCompletion(Transaction tx) {
            if(applyInterceptor.get()) {
                LOGGER.info("Fetch Repository");
    
                assertFalse(getSessionFactory().getCache()
                    .containsEntity(Repository.class, 1L));
    
                executeSync(() -> {
                    Session _session = getSessionFactory()
                        .openSession();
                    Repository repository = (Repository) 
                        _session.get(Repository.class, 1L);
                    LOGGER.info("Cached Repository {}", 
                        repository);
                    _session.close();
                    endLatch.countDown();
                });
    
                assertTrue(getSessionFactory().getCache()
                    .containsEntity(Repository.class, 1L));
            }
        }
    }
    

    Running this code generates the following output:

    [Alice]: Load and modify Repository
    [Alice]: select nonstrictr0_.id as id1_0_0_, nonstrictr0_.name as name2_0_0_ from repository nonstrictr0_ where nonstrictr0_.id=1
    [Alice]: update repository set name='High-Performance Hibernate' where id=1
    
    [Alice]: Fetch Repository from another transaction
    [Bob]: select nonstrictr0_.id as id1_0_0_, nonstrictr0_.name as name2_0_0_ from repository nonstrictr0_ where nonstrictr0_.id=1
    [Bob]: Cached Repository from Bob's transaction Repository{id=1, name='Hibernate-Master-Class'}
    
    [Alice]: committed JDBC Connection
    
    [Alice]: select nonstrictr0_.id as id1_0_0_, nonstrictr0_.name as name2_0_0_ from repository nonstrictr0_ where nonstrictr0_.id=1
    [Alice]: Cached Repository Repository{id=1, name='High-Performance Hibernate'}
    
    1. Alice fetches a Repository and updates its name
    2. The custom Hibernate Interceptor is invoked and Bob’s transaction is started
    3. Because the Repository was evicted from the Cache, Bob will load the 2nd level cache with the current database snapshot
    4. Alice's transaction commits, but now the Cache contains the previous database snapshot that Bob has just loaded
    5. If a third user now fetches the Repository entity, they will also see a stale entity version, which is different from the current database snapshot
    6. After Alice's transaction is committed, the Cache entry is evicted again and any subsequent entity load request will populate the Cache with the current database snapshot

    Stale data vs lost updates

    The NONSTRICT_READ_WRITE concurrency strategy introduces a tiny window of inconsistency during which the database and the second-level cache can go out of sync. While this might sound terrible, in reality we should always design our applications to cope with these situations, even if we don't use a second-level cache. Hibernate offers application-level repeatable reads through its transactional write-behind first-level cache, and all managed entities are subject to becoming stale. Right after an entity is loaded into the current Persistence Context, another concurrent transaction might update it, and so we need to prevent stale data from escalating into lost updates.

    Optimistic concurrency control is an effective way of dealing with lost updates in long conversations and this technique can mitigate the NONSTRICT_READ_WRITE inconsistency issue as well.
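
    A minimal sketch of such a mapping only requires a version attribute on the entity (field names are illustrative):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Repository {

        @Id
        private Long id;

        private String name;

        // Hibernate increments this value on every update and rejects a
        // conflicting modification with a StaleObjectStateException
        @Version
        private int version;

        // getters and setters omitted for brevity
    }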

    Conclusion

    The NONSTRICT_READ_WRITE concurrency strategy is a good choice for read-mostly applications (if backed by the optimistic locking mechanism). For write-intensive scenarios, the cache invalidation mechanism would increase the cache miss rate, therefore rendering this technique inefficient.

    Code available on GitHub.

    If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.

    How does Hibernate Collection Cache work

    Introduction

    Previously, I described the second-level cache entry structure that Hibernate uses for storing entities. Besides entities, Hibernate can also cache entity associations, and this article will unravel the inner workings of collection caching.

    Domain model

    For the upcoming tests, we are going to use the following entity model:

    (Domain model diagram: Repository, Commit and Change)

    A Repository has a collection of Commit entities:

    @org.hibernate.annotations.Cache(
        usage = CacheConcurrencyStrategy.READ_WRITE
    )
    @OneToMany(mappedBy = "repository", 
        cascade = CascadeType.ALL, orphanRemoval = true)
    private List<Commit> commits = new ArrayList<>();
    

    Each Commit entity has a collection of Change embeddable elements.

    @ElementCollection
    @CollectionTable(
        name="commit_change",
        joinColumns = @JoinColumn(name="commit_id")
    )
    @org.hibernate.annotations.Cache(
        usage = CacheConcurrencyStrategy.READ_WRITE
    )
    @OrderColumn(name = "index_id")
    private List<Change> changes = new ArrayList<>();
    

    And we’ll now insert some test data:

    doInTransaction(session -> {
        Repository repository = 
            new Repository("Hibernate-Master-Class");
        session.persist(repository);
    
        Commit commit1 = new Commit();
        commit1.getChanges().add(
            new Change("README.txt", "0a1,5...")
        );
        commit1.getChanges().add(
            new Change("web.xml", "17c17...")
        );
    
        Commit commit2 = new Commit();
        commit2.getChanges().add(
            new Change("README.txt", "0b2,5...")
        );
    
        repository.addCommit(commit1);
        repository.addCommit(commit2);
        session.persist(commit1);
    });
    

    Read-through caching

    The Collection cache employs a read-through synchronization strategy:

    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        for (Commit commit : repository.getCommits()) {
            assertFalse(commit.getChanges().isEmpty());
        }
    });
    

    and collections are cached upon being accessed for the first time:

    select
        collection0_.id as id1_0_0_,
        collection0_.name as name2_0_0_ 
    from
        Repository collection0_ 
    where
        collection0_.id=1  
        
    select
        commits0_.repository_id as reposito3_0_0_,
        commits0_.id as id1_1_0_,
        commits0_.id as id1_1_1_,
        commits0_.repository_id as reposito3_1_1_,
        commits0_.review as review2_1_1_ 
    from
        commit commits0_ 
    where
        commits0_.repository_id=1  
            
    select
        changes0_.commit_id as commit_i1_1_0_,
        changes0_.diff as diff2_2_0_,
        changes0_.path as path3_2_0_,
        changes0_.index_id as index_id4_0_ 
    from
        commit_change changes0_ 
    where
        changes0_.commit_id=1  
                
    select
        changes0_.commit_id as commit_i1_1_0_,
        changes0_.diff as diff2_2_0_,
        changes0_.path as path3_2_0_,
        changes0_.index_id as index_id4_0_ 
    from
        commit_change changes0_ 
    where
        changes0_.commit_id=2
    

    After the Repository and its associated Commits get cached, loading the Repository and traversing the Commit and Change collections will not hit the database, since all entities and their associations are served from the second-level cache:

    LOGGER.info("Load collections from cache");
    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        assertEquals(2, repository.getCommits().size());
    });
    

    There’s no SQL SELECT statement executed when running the previous test case:

    CollectionCacheTest - Load collections from cache
    JdbcTransaction - committed JDBC Connection
    

    Collection cache entry structure

    For entity collections, Hibernate only stores the entity identifiers, therefore requiring that entities be cached as well:

    key = {org.hibernate.cache.spi.CacheKey@3981}
        key = {java.lang.Long@3597} "1"
        type = {org.hibernate.type.LongType@3598} 
        entityOrRoleName = {java.lang.String@3599} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Repository.commits"
        tenantId = null
        hashCode = 31
    value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3982} 
        value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3986} "CollectionCacheEntry[1,2]"
        version = null
        timestamp = 5858841154416640    
    

    The CollectionCacheEntry stores the Commit identifiers associated with a given Repository entity.
    Because element types don’t have identifiers, Hibernate stores their dehydrated state instead. The Change embeddable is cached as follows:

    key = {org.hibernate.cache.spi.CacheKey@3970} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Commit.changes#1"
        key = {java.lang.Long@3974} "1"
        type = {org.hibernate.type.LongType@3975} 
        entityOrRoleName = {java.lang.String@3976} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.CollectionCacheTest$Commit.changes"
        tenantId = null
        hashCode = 31
    value = {org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@3971} 
        value = {org.hibernate.cache.spi.entry.CollectionCacheEntry@3978}
            state = {java.io.Serializable[2]@3980} 
                0 = {java.lang.Object[2]@3981} 
                    0 = {java.lang.String@3985} "0a1,5..."
                    1 = {java.lang.String@3986} "README.txt"
                1 = {java.lang.Object[2]@3982} 
                    0 = {java.lang.String@3983} "17c17..."
                    1 = {java.lang.String@3984} "web.xml"
        version = null
        timestamp = 5858843026345984
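
    Because only identifiers are stored in the collection cache, the referenced entity should be cacheable as well, otherwise every identifier has to be resolved against the database. A minimal mapping sketch (the entity body is abbreviated):

    import javax.persistence.Entity;
    import javax.persistence.Id;

    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public class Commit {

        @Id
        private Long id;

        private boolean review;

        // the CollectionCacheEntry above only holds Commit identifiers;
        // the entities themselves are served from this entity cache region
        // getters, setters and associations omitted for brevity
    }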
    

    Collection Cache consistency model

    Consistency is the biggest concern when employing caching, so we need to understand how the Hibernate Collection Cache handles entity state changes.

    The CollectionUpdateAction is responsible for all Collection modifications and whenever the collection changes, the associated cache entry is evicted:

    protected final void evict() throws CacheException {
        if ( persister.hasCache() ) {
            final CacheKey ck = session.generateCacheKey(
                key, 
                persister.getKeyType(), 
                persister.getRole()
            );
            persister.getCacheAccessStrategy().remove( ck );
        }
    }
    

    This behavior is also documented by the CollectionRegionAccessStrategy specification:

    For cached collection data, all modification actions actually just invalidate the entry(s).

    So, depending on the current concurrency strategy, the Collection Cache entry gets evicted whenever the collection changes, as the following scenarios demonstrate.

    Adding new Collection entries

    The following test case adds a new Commit entity to our Repository:

    LOGGER.info("Adding invalidates Collection Cache");
    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        assertEquals(2, repository.getCommits().size());
    
        Commit commit = new Commit();
        commit.getChanges().add(
            new Change("Main.java", "0b3,17...")
        );
        repository.addCommit(commit);
    });
    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        assertEquals(3, repository.getCommits().size());
    });
    

    Running this test generates the following output:

    --Adding invalidates Collection Cache
    
    insert 
    into
       commit
       (id, repository_id, review) 
    values
       (default, 1, false)
       
    insert 
    into
       commit_change
       (commit_id, index_id, diff, path) 
    values
       (3, 0, '0b3,17...', 'Main.java')
     
    --committed JDBC Connection
    
    select
       commits0_.repository_id as reposito3_0_0_,
       commits0_.id as id1_1_0_,
       commits0_.id as id11_1_1_,
       commits0_.repository_id as reposito3_1_1_,
       commits0_.review as review2_1_1_ 
    from
       commit commits0_ 
    where
       commits0_.repository_id=1
       
    --committed JDBC Connection
    

    After a new Commit entity is persisted, the Repository.commits collection cache entry is evicted, and the associated Commit entities are fetched from the database the next time the collection is accessed.

    Removing existing Collection entries

    Removing a Collection element follows the same pattern:

    LOGGER.info("Removing invalidates Collection Cache");
    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        assertEquals(2, repository.getCommits().size());
        Commit removable = repository.getCommits().get(0);
        repository.removeCommit(removable);
    });
    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        assertEquals(1, repository.getCommits().size());
    });
    

    The following output gets generated:

    --Removing invalidates Collection Cache
    
    delete 
    from
       commit_change 
    where
       commit_id=1
       
    delete 
    from
       commit 
    where
       id=1
       
    --committed JDBC Connection
    
    select
       commits0_.repository_id as reposito3_0_0_,
       commits0_.id as id1_1_0_,
       commits0_.id as id1_1_1_,
       commits0_.repository_id as reposito3_1_1_,
       commits0_.review as review2_1_1_ 
    from
       commit commits0_ 
    where
       commits0_.repository_id=1
       
    --committed JDBC Connection
    

    The Collection Cache is evicted once its structure gets changed.

    Removing Collection elements directly

    Hibernate can ensure cache consistency as long as it's aware of all the changes the cached collection undergoes. Hibernate uses its own Collection types (e.g. PersistentBag, PersistentSet) to support lazy loading and dirty state detection.

    If an internal Collection element is deleted without updating the Collection state, Hibernate won’t be able to invalidate the currently cached Collection entry:

    LOGGER.info("Removing Child causes inconsistencies");
    doInTransaction(session -> {
        Commit commit = (Commit) 
            session.get(Commit.class, 1L);
        session.delete(commit);
    });
    try {
        doInTransaction(session -> {
            Repository repository = (Repository) 
                session.get(Repository.class, 1L);
            assertEquals(1, repository.getCommits().size());
        });
    } catch (ObjectNotFoundException e) {
        LOGGER.warn("Object not found", e);
    }
    
    --Removing Child causes inconsistencies
    
    delete 
    from
       commit_change 
    where
       commit_id=1
       
    delete 
    from
       commit 
    where
       id=1
    
    --committed JDBC Connection
    
    select
       collection0_.id as id1_1_0_,
       collection0_.repository_id as reposito3_1_0_,
       collection0_.review as review2_1_0_ 
    from
       commit collection0_ 
    where
       collection0_.id=1
    
    --No row with the given identifier exists: 
    -- [CollectionCacheTest$Commit#1]
    
    --rolled JDBC Connection
    

    When the Commit entity was deleted directly, Hibernate didn't know it had to update the associated Collection Cache as well. The next time the commits collection is loaded, Hibernate realizes that some of its entries no longer exist in the database, and it throws an exception.
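
    If such out-of-band deletes cannot be avoided, one mitigation is to evict the affected collection entry explicitly through the org.hibernate.Cache API. This is only a sketch; the collection role string is derived from this test's inner classes and must match your own mapping:

    doInTransaction(session -> {
        Commit commit = (Commit)
            session.get(Commit.class, 1L);
        session.delete(commit);
    });

    // evict the stale Repository.commits entry, since Hibernate did not
    // invalidate it for the direct child delete
    getSessionFactory().getCache().evictCollection(
        Repository.class.getName() + ".commits", 1L);

    The cleaner alternative is, of course, to remove the Commit through its parent (repository.removeCommit(commit)), so that the CollectionUpdateAction can invalidate the cache entry itself.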

    Updating Collection elements using HQL

    Hibernate can maintain cache consistency when executing bulk updates through HQL:

    LOGGER.info("Updating Child entities using HQL");
    doInTransaction(session -> {
        Repository repository = (Repository)
             session.get(Repository.class, 1L);
        for (Commit commit : repository.getCommits()) {
            assertFalse(commit.review);
        }
    });
    doInTransaction(session -> {
        session.createQuery(
            "update Commit c " +
            "set c.review = true ")
        .executeUpdate();
    });
    doInTransaction(session -> {
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        for(Commit commit : repository.getCommits()) {
            assertTrue(commit.review);
        }
    });
    

    Running this test case generates the following SQL:

    --Updating Child entities using HQL
    
    --committed JDBC Connection
    
    update
       commit 
    set
       review=true
       
    --committed JDBC Connection
    
    select
       commits0_.repository_id as reposito3_0_0_,
       commits0_.id as id1_1_0_,
       commits0_.id as id1_1_1_,
       commits0_.repository_id as reposito3_1_1_,
       commits0_.review as review2_1_1_ 
    from
       commit commits0_ 
    where
       commits0_.repository_id=1
       
    --committed JDBC Connection
    

    The first transaction doesn’t require hitting the database, only relying on the second-level cache. The HQL UPDATE clears the Collection Cache, so Hibernate will have to reload it from the database when the collection is accessed afterwards.

    Updating Collection elements using SQL

    Hibernate can also invalidate cache entries for bulk SQL UPDATE statements:

    LOGGER.info("Updating Child entities using SQL");
    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        for (Commit commit : repository.getCommits()) {
            assertFalse(commit.review);
        }
    });
    doInTransaction(session -> {
        session.createSQLQuery(
            "update Commit c " +
            "set c.review = true ")
        .addSynchronizedEntityClass(Commit.class)
        .executeUpdate();
    });
    doInTransaction(session -> {
        Repository repository = (Repository) 
            session.get(Repository.class, 1L);
        for(Commit commit : repository.getCommits()) {
            assertTrue(commit.review);
        }
    });
    

    Generating the following output:

    --Updating Child entities using SQL
    
    --committed JDBC Connection
    
    update
       commit 
    set
       review=true
       
    --committed JDBC Connection
       
    select
       commits0_.repository_id as reposito3_0_0_,
       commits0_.id as id1_1_0_,
       commits0_.id as id1_1_1_,
       commits0_.repository_id as reposito3_1_1_,
       commits0_.review as review2_1_1_ 
    from
       commit commits0_ 
    where
       commits0_.repository_id=1  
       
    --committed JDBC Connection
    

    The BulkOperationCleanupAction is responsible for cleaning up the second-level cache on bulk DML statements. While Hibernate can detect the affected cache regions when executing an HQL statement, for native queries you need to instruct Hibernate what regions the statement should invalidate. If you don’t specify any such region, Hibernate will clear all second-level cache regions.
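
    If the native statement targets a collection table rather than an entity, you can narrow the invalidation with a synchronized query space instead. A rough sketch, reusing this article's commit_change collection table (the SQL statement itself is only illustrative):

    doInTransaction(session -> {
        session.createSQLQuery(
            "update commit_change set path = trim(path) ")
        // restricts invalidation to the cache regions backed by the
        // commit_change table space, instead of clearing every region
        .addSynchronizedQuerySpace("commit_change")
        .executeUpdate();
    });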

    Conclusion

    The Collection Cache is a very useful feature, complementing the second-level entity cache. This way we can store an entire entity graph, reducing the database querying workload in read-mostly applications. Like with AUTO flushing, Hibernate cannot introspect the affected table spaces when executing native queries. To avoid consistency issues (when using AUTO flushing) or cache misses (second-level cache), whenever we need to run a native query we have to explicitly declare the targeted tables, so Hibernate can take the appropriate actions (e.g. flushing or invalidating cache regions).

    Code available on GitHub.

    If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.

    How to optimize Hibernate ElementCollection statements

    Introduction

    Hibernate supports three data mapping types: basic (e.g. String, int), Embeddable and Entity. Most often, a database row is mapped to an Entity, each database column being associated with a basic attribute. Embeddable types are more common when combining several field mappings into a reusable group (the Embeddable being merged into the owning Entity mapping structure).

    Both basic types and Embeddables can be associated with an Entity through the @ElementCollection mapping, in a one-Entity-to-many-non-Entity relationship.
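
    As a point of reference, a minimal sketch of such an Embeddable, matching the Change type used below, could look like this:

    import javax.persistence.Embeddable;

    @Embeddable
    public class Change {

        private String path;

        private String diff;

        // Embeddable types need a no-arg constructor
        public Change() {
        }

        public Change(String path, String diff) {
            this.path = path;
            this.diff = diff;
        }

        // getters omitted for brevity
    }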

    Testing time

    For the upcoming test cases we are going to use the following entity model:

    (Domain model diagram: Patch and Change)

    A Patch has a collection of Change Embeddable objects.

    @ElementCollection
    @CollectionTable(
        name="patch_change",
        joinColumns=@JoinColumn(name="patch_id")
    )
    private List<Change> changes = new ArrayList<>();
    

    The Change object is modeled as an Embeddable type and it can only be accessed through its owner Entity. The Embeddable has no identifier and it cannot be queried through JPQL. The Embeddable life-cycle is bound to that of its owner, so any Entity state transition is automatically propagated to the Embeddable collection.

    First, we need to add some test data:

    doInTransaction(session -> {
        Patch patch = new Patch();
        patch.getChanges().add(
            new Change("README.txt", "0a1,5...")
        );
        patch.getChanges().add(
            new Change("web.xml", "17c17...")
        );
        session.persist(patch);
    });
    

    Adding a new element

    Let’s see what happens when we add a new Change to an existing Patch:

    doInTransaction(session -> {
        Patch patch = (Patch) session.get(Patch.class, 1L);
        patch.getChanges().add(
            new Change("web.xml", "1d17...")
        );
    });
    

    This test generates the following SQL output:

    DELETE FROM patch_change 
    WHERE  patch_id = 1
    
    INSERT INTO patch_change (patch_id, diff, path)
    VALUES (1, '0a1,5...', 'README.txt') 
    			
    INSERT INTO patch_change(patch_id, diff, path)
    VALUES (1, '17c17...', 'web.xml') 
    
    INSERT INTO patch_change(patch_id, diff, path)
    VALUES (1, '1d17...', 'web.xml') 
    

    By default, any collection operation ends up recreating the whole data set. This behavior is only acceptable for an in-memory collection, but it's not suitable from a database perspective. The database has to delete all existing rows, only to re-add them afterwards. The more indexes we have on this table, the greater the performance penalty.

    Removing an element

    Removing an element is no different:

    doInTransaction(session -> {
        Patch patch = (Patch) session.get(Patch.class, 1L);
        patch.getChanges().remove(0);
    });
    

    This test case generates these SQL statements:

    DELETE FROM patch_change 
    WHERE  patch_id = 1
    
    INSERT INTO patch_change(patch_id, diff, path)
    VALUES (1, '17c17...', 'web.xml') 
    

    All table rows were removed and the remaining in-memory entries have been flushed to the database.

    The Java Persistence Wiki Book clearly documents this behavior:

    The JPA 2.0 specification does not provide a way to define the Id in the Embeddable. However, to delete or update a element of the ElementCollection mapping, some unique key is normally required. Otherwise, on every update the JPA provider would need to delete everything from the CollectionTable for the Entity, and then insert the values back. So, the JPA provider will most likely assume that the combination of all of the fields in the Embeddable are unique, in combination with the foreign key (JoinColumn(s)). This however could be inefficient, or just not feasible if the Embeddable is big, or complex.

    Some JPA providers may allow the Id to be specified in the Embeddable, to resolve this issue. Note in this case the Id only needs to be unique for the collection, not the table, as the foreign key is included. Some may also allow the unique option on the CollectionTable to be used for this. Otherwise, if your Embeddable is complex, you may consider making it an Entity and use a OneToMany instead.

    Adding an OrderColumn

    To optimize the ElementCollection behavior we need to apply the same techniques that work for one-to-many associations. The collection of elements is like a unidirectional one-to-many relationship, and we already know that an idbag performs better than a unidirectional bag.

    Because an Embeddable cannot contain an identifier, we can at least add an order column so that each row can be uniquely identified. Let’s see what happens when we add an @OrderColumn to our element collection:

    @ElementCollection
    @CollectionTable(
        name="patch_change",
        joinColumns=@JoinColumn(name="patch_id")
    )
    @OrderColumn(name = "index_id")
    private List<Change> changes = new ArrayList<>();
    

    Removing an element shows no improvement over the previous test results:

    DELETE FROM patch_change 
    WHERE  patch_id = 1
    
    INSERT INTO patch_change(patch_id, diff, path)
    VALUES (1, '17c17...', 'web.xml') 
    

    This is because the AbstractPersistentCollection checks for nullable columns when deciding whether the collection must be recreated:

    @Override
    public boolean needsRecreate(CollectionPersister persister) {
        if (persister.getElementType() instanceof ComponentType) {
            ComponentType componentType = 
                (ComponentType) persister.getElementType();
            return !componentType.hasNotNullProperty();
        }
        return false;
    }
    

    We’ll now add the NOT NULL constraints and rerun our tests:

    @Column(name = "path", nullable = false)
    private String path;
    
    @Column(name = "diff", nullable = false)
    private String diff;
    

    Adding a new ordered element

    Adding an element to the end of the list generates the following statement:

    INSERT INTO patch_change(patch_id, index_id, diff, path)
    VALUES (1, 2, '1d17...', 'web.xml') 
    

    The index_id column is used to persist the in-memory collection order. Adding to the end of the collection doesn’t affect the existing elements order, hence only one INSERT statement is required.
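
    For reference, the statement above results from simply appending to the list, along the lines of this sketch:

    doInTransaction(session -> {
        Patch patch = (Patch) session.get(Patch.class, 1L);
        // appending leaves the existing index_id values untouched,
        // so a single INSERT with the next index is enough
        patch.getChanges().add(
            new Change("web.xml", "1d17...")
        );
    });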

    Adding a new first element

    If we add a new element at the beginning of the list:

    doInTransaction(session -> {
        Patch patch = (Patch) session.get(Patch.class, 1L);
        patch.getChanges().add(0, 
            new Change("web.xml", "1d17...")
        );
    });
    

    Generates the following SQL output:

    UPDATE patch_change
    SET    diff = '1d17...',
           path = 'web.xml'
    WHERE  patch_id = 1
           AND index_id = 0 
    
    UPDATE patch_change
    SET    diff = '0a1,5...',
           path = 'README.txt'
    WHERE  patch_id = 1
           AND index_id = 1
    
    INSERT INTO patch_change (patch_id, index_id, diff, path)
    VALUES (1, 2, '17c17...', 'web.xml') 
    

    The existing database entries are updated to reflect the new in-memory data structure. Because the new element was added at the beginning of the list, every existing row must shift one position: Hibernate updates the existing rows to hold the shifted element values, while INSERT statements are only issued for the trailing positions.

    This behavior is explained in the @OrderColumn Java Persistence documentation:

    The persistence provider maintains a contiguous (non-sparse) ordering of the values of the order column when updating the association or element collection. The order column value for the first element is 0.

    Removing an ordered element

    If we delete the last entry:

    doInTransaction(session -> {
        Patch patch = (Patch) session.get(Patch.class, 1L);
        patch.getChanges().remove(patch.getChanges().size() - 1);
    });
    

    There’s only one DELETE statement being issued:

    DELETE FROM patch_change
    WHERE  patch_id = 1
           AND index_id = 1 
    

    Deleting the first element entry

    If we delete the first element the following statements are executed:

    DELETE FROM patch_change
    WHERE  patch_id = 1
           AND index_id = 1 
    
    UPDATE patch_change
    SET    diff = '17c17...',
           path = 'web.xml'
    WHERE  patch_id = 1
           AND index_id = 0 
    

    Hibernate deletes all extra rows and then it updates the remaining ones.

    Deleting from the middle

    If we delete an element from the middle of the list:

    doInTransaction(session -> {
        Patch patch = (Patch) session.get(Patch.class, 1L);
        patch.getChanges().add(new Change("web.xml", "1d17..."));
        patch.getChanges().add(new Change("server.xml", "3a5..."));
    });
    
    doInTransaction(session -> {
        Patch patch = (Patch) session.get(Patch.class, 1L);
        patch.getChanges().remove(1);
    });
    

    The following statements are executed:

    DELETE FROM patch_change
    WHERE  patch_id = 1
           AND index_id = 3
    
    UPDATE patch_change
    SET    diff = '1d17...',
           path = 'web.xml'
    WHERE  patch_id = 1
           AND index_id = 1 
    
    UPDATE patch_change
    SET    diff = '3a5...',
           path = 'server.xml'
    WHERE  patch_id = 1
           AND index_id = 2 
    

    An ordered ElementCollection is updated like this:

    • The database table size is adjusted, the DELETE statements removing the extra rows located at the end of the table. If the in-memory collection is larger than its database counterpart then all INSERT statements will be executed at the end of the list
    • All elements situated before the added/removed entry are left untouched
    • The remaining elements, located after the added/removed one, are updated to match the new in-memory collection state

    Conclusion

    Compared to an inverse one-to-many association, the ElementCollection is more difficult to optimize. If the collection is frequently updated, a collection of elements is better replaced by a one-to-many association (see the sketch below). Element collections are more suitable for data that seldom changes, when we don't want to add an extra Entity just to represent the foreign key side.
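
    If you go down that route, the element becomes a child entity with its own identifier, along the lines of this sketch (the class and column names are illustrative):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.JoinColumn;
    import javax.persistence.ManyToOne;

    @Entity
    public class PatchChange {

        // a dedicated identifier lets Hibernate target individual rows with
        // INSERT/UPDATE/DELETE instead of recreating the whole collection
        @Id
        @GeneratedValue
        private Long id;

        @ManyToOne
        @JoinColumn(name = "patch_id")
        private Patch patch;

        private String path;

        private String diff;

        // constructors, getters and setters omitted for brevity
    }

    On the Patch side, the collection then becomes a @OneToMany(mappedBy = "patch") association.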

    Code available on GitHub.

    If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.

    How does Hibernate READ_ONLY CacheConcurrencyStrategy work

    Introduction

    As I previously explained, enterprise caching requires diligence. Because data is duplicated between the database (system of record) and the caching layer, we need to make sure the two separate data sources don’t drift apart.

    If the cached data is immutable (neither the database nor the cache is able to modify it), we can safely cache it without worrying about any consistency issues. Read-only data is always a good candidate for application-level caching, improving read performance without having to relax consistency guarantees.

    Read-only second-level caching

    For testing the read-only second-level cache strategy, we are going to use the following domain model:

    (Domain model diagram: Repository, Commit and Change, cached with READ_ONLY)

    The Repository is the root entity, being the parent of any Commit entity. Each Commit has a list of Change components (embeddable value types).

    All entities are cached as read-only elements:

    @org.hibernate.annotations.Cache(
        usage = CacheConcurrencyStrategy.READ_ONLY
    )
    

    Persisting entities

    The read-only second-level cache uses a read-through caching strategy, entities being cached upon fetching.

    doInTransaction(session -> {
        Repository repository = 
            new Repository("Hibernate-Master-Class");
        session.persist(repository);
    });
    

    When an entity is persisted, only the database contains a copy of it. The entity state is copied from the system of record into the caching layer when the entity gets fetched for the first time.

    @Test
    public void testRepositoryEntityLoad() {
        LOGGER.info("Read-only entities are read-through");
    
        doInTransaction(session -> {
            Repository repository = (Repository) 
                session.get(Repository.class, 1L);
            assertNotNull(repository);
        });
    
        doInTransaction(session -> {
            LOGGER.info("Load Repository from cache");
            session.get(Repository.class, 1L);
        });
    }
    

    This test generates the output:

    --Read-only entities are read-through
    
    SELECT readonlyca0_.id   AS id1_2_0_,
           readonlyca0_.NAME AS name2_2_0_
    FROM   repository readonlyca0_
    WHERE  readonlyca0_.id = 1 
    
    --JdbcTransaction - committed JDBC Connection
    
    --Load Repository from cache
    
    --JdbcTransaction - committed JDBC Connection
    

    Once the entity is loaded into the second-level cache, any subsequent call will be served by the cache, therefore bypassing the database.

    Updating entities

    Read-only cache entries are not allowed to be updated. Any such attempt ends up in an exception being thrown:

    @Test
    public void testReadOnlyEntityUpdate() {
        try {
            LOGGER.info("Read-only cache entries cannot be updated");
            doInTransaction(session -> {
                Repository repository = (Repository) 
                    session.get(Repository.class, 1L);
                repository.setName(
                    "High-Performance Hibernate"
                );
            });
        } catch (Exception e) {
            LOGGER.error("Expected", e);
        }
    }
    

    Running this test generates the following output:

    --Read-only cache entries cannot be updated
    
    SELECT readonlyca0_.id   AS id1_2_0_,
           readonlyca0_.NAME AS name2_2_0_
    FROM   repository readonlyca0_
    WHERE  readonlyca0_.id = 1 
    
    UPDATE repository
    SET    NAME = 'High-Performance Hibernate'
    WHERE  id = 1 
    
    --JdbcTransaction - rolled JDBC Connection
    
    --ERROR Expected
    --java.lang.UnsupportedOperationException: Can't write to a readonly object
    

    Because read-only cached entities are practically immutable, it's good practice to mark them with the Hibernate-specific @Immutable annotation.
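
    A minimal sketch of such a mapping (the entity body is abbreviated):

    import javax.persistence.Entity;
    import javax.persistence.Id;

    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;
    import org.hibernate.annotations.Immutable;

    @Entity
    @Immutable
    @Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
    public class Repository {

        @Id
        private Long id;

        private String name;

        // @Immutable makes Hibernate skip dirty checking for this entity,
        // which matches the READ_ONLY cache contract
        // getters omitted for brevity
    }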

    Deleting entities

    Read-only cache entries are removed when the associated entity is deleted as well:

    @Test
    public void testReadOnlyEntityDelete() {
        LOGGER.info("Read-only cache entries can be deleted");
        doInTransaction(session -> {
            Repository repository = (Repository) 
                session.get(Repository.class, 1L);
            assertNotNull(repository);
            session.delete(repository);
        });
        doInTransaction(session -> {
            Repository repository = (Repository) 
                session.get(Repository.class, 1L);
            assertNull(repository);
        });
    }
    

    Generating the following output:

    --Read-only cache entries can be deleted
    
    SELECT readonlyca0_.id   AS id1_2_0_,
           readonlyca0_.NAME AS name2_2_0_
    FROM   repository readonlyca0_
    WHERE  readonlyca0_.id = 1;
    
    DELETE FROM repository
    WHERE  id = 1
    
    --JdbcTransaction - committed JDBC Connection
    
    SELECT readonlyca0_.id   AS id1_2_0_,
           readonlyca0_.NAME AS name2_2_0_
    FROM   repository readonlyca0_
    WHERE  readonlyca0_.id = 1; 
    
    --JdbcTransaction - committed JDBC Connection
    

    The remove entity state transition is enqueued by the PersistenceContext, and at flush time, both the database and the second-level cache will delete the associated entity record.

    Collection caching

    The Commit entity has a collection of Change components.

    @ElementCollection
    @CollectionTable(
        name="commit_change",
        joinColumns=@JoinColumn(name="commit_id")
    )
    private List<Change> changes = new ArrayList<>();
    

    Although the Commit entity is cached as a read-only element, the Change collection is ignored by the second-level cache.

    @Test
    public void testCollectionCache() {
        LOGGER.info("Collections require separate caching");
        doInTransaction(session -> {
            Repository repository = (Repository) 
                session.get(Repository.class, 1L);
            Commit commit = new Commit(repository);
            commit.getChanges().add(
                new Change("README.txt", "0a1,5...")
            );
            commit.getChanges().add(
                new Change("web.xml", "17c17...")
            );
            session.persist(commit);
        });
        doInTransaction(session -> {
            LOGGER.info("Load Commit from database");
            Commit commit = (Commit) 
                session.get(Commit.class, 1L);
            assertEquals(2, commit.getChanges().size());
        });
        doInTransaction(session -> {
            LOGGER.info("Load Commit from cache");
            Commit commit = (Commit) 
                session.get(Commit.class, 1L);
            assertEquals(2, commit.getChanges().size());
        });
    }
    

    Running this test generates the following output:

    --Collections require separate caching
    
    SELECT readonlyca0_.id   AS id1_2_0_,
           readonlyca0_.NAME AS name2_2_0_
    FROM   repository readonlyca0_
    WHERE  readonlyca0_.id = 1;
    
    
    INSERT INTO commit
                (id, repository_id)
    VALUES      (DEFAULT, 1);
    			 
    INSERT INTO commit_change
                (commit_id, diff, path)
    VALUES      (1, '0a1,5...', 'README.txt');		 
    
    INSERT INTO commit_change
                (commit_id, diff, path)
    VALUES      (1, '17c17...', 'web.xml');
    			 
    --JdbcTransaction - committed JDBC Connection
    
    --Load Commit from database
    
    SELECT readonlyca0_.id   AS id1_2_0_,
           readonlyca0_.NAME AS name2_2_0_
    FROM   repository readonlyca0_
    WHERE  readonlyca0_.id = 1;
    
    
    SELECT changes0_.commit_id AS commit_i1_0_0_,
           changes0_.diff      AS diff2_1_0_,
           changes0_.path      AS path3_1_0_
    FROM   commit_change changes0_
    WHERE  changes0_.commit_id = 1 
    
    --JdbcTransaction - committed JDBC Connection
    
    --Load Commit from cache
    
    SELECT changes0_.commit_id AS commit_i1_0_0_,
           changes0_.diff      AS diff2_1_0_,
           changes0_.path      AS path3_1_0_
    FROM   commit_change changes0_
    WHERE  changes0_.commit_id = 1 
    
    --JdbcTransaction - committed JDBC Connection
    

    Although the Commit entity is retrieved from the cache, the Change collection is always fetched from the database. Since the Changes are immutable too, we would like to cache them as well, to avoid unnecessary database round-trips.

    Enabling Collection cache support

    Collections are not cached by default, and to enable this behavior, we have to annotate them with a cache concurrency strategy:

    @ElementCollection
    @CollectionTable(
        name="commit_change",
        joinColumns=@JoinColumn(name="commit_id")
    )
    @org.hibernate.annotations.Cache(
        usage = CacheConcurrencyStrategy.READ_ONLY
    )
    private List<Change> changes = new ArrayList<>();
    

    Re-running the previous test generates the following output:

    --Collections require separate caching
    
    SELECT readonlyca0_.id   AS id1_2_0_,
           readonlyca0_.NAME AS name2_2_0_
    FROM   repository readonlyca0_
    WHERE  readonlyca0_.id = 1;
    
    
    INSERT INTO commit
                (id, repository_id)
    VALUES      (DEFAULT, 1);
    			 
    INSERT INTO commit_change
                (commit_id, diff, path)
    VALUES      (1, '0a1,5...', 'README.txt');		 
    
    INSERT INTO commit_change
                (commit_id, diff, path)
    VALUES      (1, '17c17...', 'web.xml');
    			 
    --JdbcTransaction - committed JDBC Connection
    
    --Load Commit from database
    
    SELECT readonlyca0_.id   AS id1_2_0_,
           readonlyca0_.NAME AS name2_2_0_
    FROM   repository readonlyca0_
    WHERE  readonlyca0_.id = 1;
    
    
    SELECT changes0_.commit_id AS commit_i1_0_0_,
           changes0_.diff      AS diff2_1_0_,
           changes0_.path      AS path3_1_0_
    FROM   commit_change changes0_
    WHERE  changes0_.commit_id = 1 
    
    --JdbcTransaction - committed JDBC Connection
    
    --Load Commit from cache
    
    --JdbcTransaction - committed JDBC Connection
    

    Once the collection is cached, we can fetch the Commit entity along with all its Changes without hitting the database.

    Conclusion

    Read-only entities are safe for caching and we can load an entire immutable entity graph using the second-level cache alone. Because the cache is read-through, entities are cached upon being fetched from the database. The read-only cache is not write-through: persisting an entity only materializes into a new database row, without the change being propagated to the cache as well.

    Code available on GitHub.


    A beginner’s guide to Cache synchronization strategies

    Introduction

    A system of record is the authoritative data source when information is scattered among various data providers. When we introduce a caching solution, we automatically duplicate our data. To avoid inconsistent reads and data integrity issues, it’s very important to synchronize the database and the cache (whenever a change occurs in the system).

    There are various ways to keep the cache and the underlying database in sync and this article will present some of the most common cache synchronization strategies.

    Cache-aside

    The application code can manually manage both the database and the cache information. The application logic inspects the cache before hitting the database and it updates the cache after any database modification.

    CacheAside

    Mixing cache management with application logic is not very appealing, especially if we have to repeat these steps in every data retrieval method. Leveraging an Aspect-Oriented caching interceptor can mitigate the cache logic leaking into the application code, but it doesn’t exonerate us from making sure that both the database and the cache are properly synchronized.
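
    As a minimal sketch of the cache-aside flow, assuming a hypothetical Product entity, a hypothetical ProductRepository DAO, and a plain in-memory map standing in for the cache provider:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ProductCacheAsideService {

        //a real application would use an actual cache provider;
        //the ConcurrentHashMap is only a stand-in for this sketch
        private final Map<Long, Product> cache = new ConcurrentHashMap<>();

        //hypothetical DAO encapsulating the database access
        private final ProductRepository productRepository;

        public ProductCacheAsideService(ProductRepository productRepository) {
            this.productRepository = productRepository;
        }

        public Product findById(Long id) {
            //inspect the cache before hitting the database
            Product cached = cache.get(id);
            if (cached != null) {
                return cached;
            }
            //on a cache miss, load from the database and populate the cache
            Product product = productRepository.findById(id);
            if (product != null) {
                cache.put(id, product);
            }
            return product;
        }

        public void update(Product product) {
            //update the database first, then refresh the cache entry
            productRepository.update(product);
            cache.put(product.getId(), product);
        }
    }

    Every repository method ends up duplicating this cache management logic, which is exactly why the following strategies delegate it to the cache provider instead.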

    Read-through

    Instead of managing both the database and the cache, we can simply delegate the database synchronization to the cache provider. All data interaction is therefore done through the cache abstraction layer.

    CacheReadThrough

    Upon fetching a cache entry, the cache verifies whether the cached element is available and, on a miss, loads the underlying resource on our behalf. The application uses the cache as the system of record and the cache is able to auto-populate on demand.
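
    As a sketch of the idea (not any particular provider’s API), a hypothetical ReadThroughCache can be given a databaseLoader function that knows how to fetch a missing entry from the system of record:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    //hypothetical read-through cache: the application only talks to the cache,
    //which loads missing entries from the system of record on demand
    public class ReadThroughCache<K, V> {

        private final Map<K, V> store = new ConcurrentHashMap<>();
        private final Function<K, V> databaseLoader;

        public ReadThroughCache(Function<K, V> databaseLoader) {
            this.databaseLoader = databaseLoader;
        }

        public V get(K key) {
            //on a cache miss, computeIfAbsent invokes the loader
            //and stores the freshly loaded value
            return store.computeIfAbsent(key, databaseLoader);
        }
    }

    The application calls get(key) and never touches the database layer directly; real cache providers expose this behavior through their own loader configuration.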

    Write-through

    Analogous to the read-through data fetching strategy, the cache can update the underlying database every time a cache entry is changed.

    CacheWriteThrough

    Although the database and the cache are updated synchronously, we have the liberty of choosing the transaction boundaries according to our current business requirements.

    • If strong consistency is mandatory and the cache provider offers an XAResource, we can enlist the cache and the database in the same global transaction. The database and the cache are therefore updated in a single atomic unit-of-work
    • If consistency can be weakened, we can update the cache and the database sequentially, without using a global transaction. Usually the cache is changed first, and if the database update fails, the cache can use a compensating action to roll back the current transaction changes
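
    Continuing the previous sketch, a hypothetical WriteThroughCache could route every cache write through a databaseWriter callback; for simplicity this sketch updates the database first, rather than the cache-first-with-compensation ordering mentioned above:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.BiConsumer;

    //hypothetical write-through cache: every put is synchronously
    //propagated to the underlying database before the call returns
    public class WriteThroughCache<K, V> {

        private final Map<K, V> store = new ConcurrentHashMap<>();
        private final BiConsumer<K, V> databaseWriter;

        public WriteThroughCache(BiConsumer<K, V> databaseWriter) {
            this.databaseWriter = databaseWriter;
        }

        public void put(K key, V value) {
            //write the database first, so a failed database update
            //never leaves a stale entry behind in the cache
            databaseWriter.accept(key, value);
            store.put(key, value);
        }

        public V get(K key) {
            return store.get(key);
        }
    }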

    Write-behind

    If strong consistency is not mandated, we can simply enqueue the cache changes and periodically flush them to the database.

    CacheWriteBehind

    This strategy is employed by the Java Persistence EntityManager (first-level cache), all entity state transitions being flushed towards the end of the current running transaction (or when a query is issued).

    Although it breaks transaction guarantees, the write-behind caching strategy can outperform the write-through policy, because database updates can be batched and the number of DML transactions is also reduced.
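
    As a rough illustration of the idea (not how any particular provider or the EntityManager implements it), a hypothetical WriteBehindCache can buffer pending changes in memory and flush them to the database in a batch:

    import java.util.AbstractMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.function.BiConsumer;

    //hypothetical write-behind cache: puts are served from memory immediately,
    //while the database writes are enqueued and flushed later in a batch
    public class WriteBehindCache<K, V> {

        private final Map<K, V> store = new ConcurrentHashMap<>();
        private final ConcurrentLinkedQueue<Map.Entry<K, V>> pendingWrites =
            new ConcurrentLinkedQueue<>();
        private final BiConsumer<K, V> databaseWriter;

        public WriteBehindCache(BiConsumer<K, V> databaseWriter) {
            this.databaseWriter = databaseWriter;
        }

        public void put(K key, V value) {
            store.put(key, value);
            pendingWrites.add(new AbstractMap.SimpleEntry<>(key, value));
        }

        public V get(K key) {
            return store.get(key);
        }

        //called periodically (e.g. by a scheduler) or at transaction end,
        //so that the pending database updates can be batched together
        public void flush() {
            Map.Entry<K, V> entry;
            while ((entry = pendingWrites.poll()) != null) {
                databaseWriter.accept(entry.getKey(), entry.getValue());
            }
        }
    }

    If the process crashes before flush() runs, the enqueued changes are lost, which is the transaction guarantee trade-off mentioned above.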


    Things to consider before jumping to enterprise caching

    Introduction

    Relational database transactions are ACID and the strong consistency model simplifies application development. Because enabling Hibernate caching is only one configuration away, it’s very appealing to turn to caching whenever the data access layer starts showing performance issues. Adding a caching layer can indeed improve application performance, but it has its price and you need to be aware of it.

    Database performance tuning

    The database is the central part of any enterprise application, containing valuable business assets. A database server has limited resources, and it can therefore serve only a finite number of connections. The shorter the database transactions, the more transactions can be accommodated. The first performance tuning action is to reduce query execution times by indexing properly and optimizing queries.

    When all queries and statements are optimized, we can either add more resources (scale up) or add more database nodes (scale out). Horizontal scaling requires database replication, which implies synchronizing nodes. Synchronous replication preserves strong consistency, while asynchronous master-slave replication leads to eventual consistency.

    Analogous to database replication challenges, cache nodes induce data synchronization issues, especially for distributed enterprise applications.

    Caching

    Even if the database access patterns are properly optimized, higher loads might increase latency. To provide predictable and constant response times, we need to turn to caching. Caching allows us to reuse a database response for multiple user requests.

    The cache can therefore:

    • reduce CPU/Memory/IO resource consumption on the database side
    • reduce network traffic between application nodes and the database tier
    • provide constant result fetch time, insensitive to traffic bursts
    • provide a read-only view when the application is in maintenance mode (e.g. when upgrading the database schema)

    The downside of introducing a caching solution is that data is duplicated in two separate technologies that may easily desynchronise.

    In the simplest use case you have one database server and one cache node:

    SingleCacheNode

    The caching abstraction layer is aware of the database server, but the database knows nothing of the application-level cache. If some external process updates the database without touching the cache, the two data sources will get out of sync. Because few database servers support application-level notifications, the cache may break the strong consistency guarantees.

    To avoid eventual consistency, both the database and the cache need to be enrolled in a distributed XA transaction, so the affected cache entries are either updated or invalidated synchronously.

    Most often, there are multiple application nodes or multiple distinct applications (web fronts, batch processors, schedulers) comprising the whole enterprise system:

    MultipleCacheNodes

    If each application node has its own isolated cache, we need to be aware of possible data synchronization issues. If one node updates the database and its own cache without notifying the others, then the other cache nodes get out of sync.

    In a distributed environment, when multiple applications or application nodes use caching, we need to use a distributed caching solution, in which either:

    • cache nodes communicate in a peer-to-peer topology, or
    • cache nodes communicate in a client-server topology, where a central cache server takes care of data synchronization

    DistributedCacheNodes

    Conclusion

    Caching is a fine scaling technique, but you have to be aware of possible consistency issues. Taking into consideration your project’s data integrity requirements, you need to design your application to take advantage of caching without compromising critical data.

    Caching is not just a cross-cutting concern: it leaks into your application architecture and requires a well-thought-out plan for compensating data integrity anomalies.
