How does FlexyPool support the Dropwizard Metrics package renaming

Introduction

FlexyPool relies heavily on Dropwizard (previously Codahale) Metrics for monitoring connection pool usage. Since the Metrics project got integrated into Dropwizard, its package was bound to be renamed.

So instead of com.codahale.metrics, the 4.0.0 release will use the io.dropwizard.metrics package name.

The challenge

Apart from the obvious backward incompatibility, the most challenging aspect of this change is that the Maven dependency will only see a version increment. This means that you won’t be able to include both versions in the same Maven module, because the groupId and the artifactId do not change between the 3.x.x and the 4.x.x versions:

<dependency>
     <groupId>io.dropwizard.metrics</groupId>
     <artifactId>metrics-core</artifactId>
     <version>${codahale.metrics.version}</version>
</dependency>

<dependency>
     <groupId>io.dropwizard.metrics</groupId>
     <artifactId>metrics-core</artifactId>
     <version>${dropwizard.metrics.version}</version>
</dependency>

This change is manageable in an end-user application as you only have to migrate from one version to the other. An open source framework built on top of Dropwizard Metrics is much more difficult to refactor as you need to support two incompatible versions of the same library. After all, you don’t want to force your clients to migrate to a certain Metrics dependency.

Luckily, FlexyPool has had its own Metrics abstraction layer from the very beginning. Insulating a framework from external dependencies is a safe measure, allowing you to swap dependencies without much effort.
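
Conceptually, the abstraction boils down to FlexyPool coding against its own metrics interfaces, while a factory supplies the library-specific implementation. The following sketch is purely illustrative and does not mirror FlexyPool’s exact API:

import java.util.concurrent.TimeUnit;

// Illustrative sketch only; FlexyPool's real interfaces may differ.
interface Timer {
    void update(long value, TimeUnit unit);
}

interface Histogram {
    void update(long value);
}

// The factory hides which metrics library backs the implementations.
interface MetricsFactory {
    Timer timer(String name);
    Histogram histogram(String name);
}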

To support both the Codahale and the Dropwizard package names, FlexyPool metrics are built like this:

[Class diagram: the FlexyPool Metrics abstraction with its Codahale and Dropwizard implementations]

Because those classes cannot reside in one jar, there are three modules hosting this hierarchy:

  • flexy-pool-core: defines the FlexyPool Metrics abstraction
  • flexy-codahale-metrics: implements the FlexyPool metrics abstraction on top of Codahale Metrics
  • flexy-dropwizard-metrics: implements the FlexyPool metrics abstraction on top of Dropwizard Metrics

Each MetricsFactory is registered as a Service Provider:

public class CodahaleMetricsFactoryService 
    implements MetricsFactoryService {

    public static final String METRICS_CLASS_NAME = 
        "com.codahale.metrics.Metric";

    @Override
    public MetricsFactory load() {
        return ClassLoaderUtils
            .findClass(METRICS_CLASS_NAME) ? 
                CodahaleMetrics.FACTORY : null;
    }
}

public class DropwizardMetricsFactoryService 
    implements MetricsFactoryService {

    public static final String METRICS_CLASS_NAME = 
        "io.dropwizard.metrics.Metric";

    @Override
    public MetricsFactory load() {
        return ClassLoaderUtils
            .findClass(METRICS_CLASS_NAME) ? 
                DropwizardMetrics.FACTORY : null;
    }
}

and the services are resolved at runtime:

private ServiceLoader<MetricsFactoryService> 
    serviceLoader = ServiceLoader.load(
        MetricsFactoryService.class);

public MetricsFactory resolve() {
    for(MetricsFactoryService service : serviceLoader) {
        MetricsFactory metricsFactory = service.load();
        if(metricsFactory != null) {
            return metricsFactory;
        }
    }
    throw new IllegalStateException(
        "No MetricsFactory could be loaded!"
    );
}
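
The registration itself relies on the standard Java ServiceLoader mechanism: each metrics module ships a provider-configuration file named after the fully qualified MetricsFactoryService interface and listing its implementation class. A sketch, where the exact package names are assumptions:

# flexy-codahale-metrics
# META-INF/services/com.vladmihalcea.flexypool.metric.MetricsFactoryService
com.vladmihalcea.flexypool.metric.codahale.CodahaleMetricsFactoryService

# flexy-dropwizard-metrics
# META-INF/services/com.vladmihalcea.flexypool.metric.MetricsFactoryService
com.vladmihalcea.flexypool.metric.dropwizard.DropwizardMetricsFactoryService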

Conclusion

This way, FlexyPool can use both Metrics implementations, the decision being taken dynamically, based on the library that is currently available. Dropwizard Metrics 4.0.0 has not been released yet, but FlexyPool is ready for the upcoming changes.

The High-Performance Java Persistence book

A book in the making

It’s been a year since I started the quest for a highly-effective Data Knowledge Stack and the Hibernate Master Class contains over fifty articles already.

Now that I have covered many aspects of database transactions, JDBC and Java Persistence, it’s time to assemble all the pieces into the High-Performance Java Persistence book.

An Agile publishing experience

Writing a book is a very time-consuming and stressful process and the last thing I needed was a very tight schedule. After reading Antonio Goncalves’s story, I chose the self-publishing way.

In the end, I settled for Leanpub because it allows me to publish the book incrementally. This leads to better engagement with readers, allowing me to adapt the book content along the way.

The content

At its core, the book is about getting the most out of your persistence layer and that can only happen when your application resonates with the database system. Because concurrency is inherent to database processing, transactions play a very important role in this regard.

The first part will be about basic performance-related database concepts, such as locking, batching and connection pooling.

In the second part, I will explain how an ORM can actually improve DML performance. This part will include the Hibernate Master Class findings.

The third part is about advanced querying techniques with jOOQ.

Get involved

The Agile methodologies are not just for software development. Writing a book in a Lean style shortens the feedback period and lets readers get involved along the way.

If you have any specific request or you are interested in this project, you can join my newsletter and follow my progress.

How to monitor a Java EE DataSource

Introduction

FlexyPool is an open-source framework that can monitor a DataSource connection usage. This tool came out of necessity, since we previously lacked support for provisioning connection pools.

FlexyPool was initially designed for stand-alone environments and the DataSource proxy configuration was done programmatically. Using Spring bean aliases, we could even substitute an already configured DataSource with the FlexyPool Metrics-aware proxy alternative.

Java EE support

Recently, I’ve been asked about supporting Java EE environments and in the true open-source spirit, I accepted the challenge. Supporting a managed environment is tricky because the DataSource is totally decoupled from the application-logic and made available through a JNDI lookup.

One drawback is that we can’t use automatic pool sizing strategies, since most Application Servers return a custom DataSource implementation (closely integrated with their in-house JTA transaction manager solution) that doesn’t offer access for reading or writing the connection pool size.

While the DataSource might not be adjustable, we can at least monitor the connection usage and that’s enough reason to support Java EE environments too.

Adding declarative configuration

Because we operate in a managed environment, we can no longer configure the DataSource programmatically, so we need to use the declarative configuration support.

By default, FlexyPool looks for the flexy-pool.properties file in the current Class-path. The location can be customized using the flexy.pool.properties.path System property, which can be a:

  • URL (e.g. file:/D:/wrk/vladmihalcea/flexy-pool/flexy-pool-core/target/test-classes/flexy-pool.properties)
  • File system path (e.g. D:\wrk\vladmihalcea\flexy-pool\flexy-pool-core\target\test-classes\flexy-pool.properties)
  • Class-path nested path (e.g. nested/fp.properties)
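
For instance, the default location can be overridden when starting the JVM (the path below is just an example):

java -Dflexy.pool.properties.path=file:/etc/flexy-pool.properties -jar my-app.jar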

The properties file may contain the following configuration options:

  • flexy.pool.data.source.unique.name: each FlexyPool instance requires a unique name so that JMX domains won’t clash
  • flexy.pool.data.source.jndi.name: the JNDI DataSource location
  • flexy.pool.data.source.jndi.lazy.lookup: whether to look up the DataSource lazily (useful when the target DataSource is not available when the FlexyPoolDataSource is instantiated)
  • flexy.pool.data.source.class.name: the DataSource can be instantiated at runtime using this class name
  • flexy.pool.data.source.property.*: if the DataSource is instantiated at runtime, each flexy.pool.data.source.property.${java-bean-property} entry will set the java-bean-property of the newly instantiated DataSource (e.g. flexy.pool.data.source.property.user=sa)
  • flexy.pool.adapter.factory: specifies the PoolAdapterFactory, in case the DataSource supports dynamic sizing. By default it uses the generic DataSourcePoolAdapter, which doesn’t support auto-scaling
  • flexy.pool.metrics.factory: specifies the MetricsFactory used for creating Metrics
  • flexy.pool.metrics.reporter.log.millis: specifies the metrics log reporting interval
  • flexy.pool.metrics.reporter.jmx.enable: specifies whether JMX reporting should be enabled
  • flexy.pool.metrics.reporter.jmx.auto.start: specifies whether the JMX service should be auto-started (set this to true in Java EE environments)
  • flexy.pool.strategies.factory.resolver: specifies a ConnectionAcquiringStrategyFactoryResolver class to be used for obtaining a list of ConnectionAcquiringStrategyFactory objects. This should be set only if the PoolAdapter supports accessing the DataSource pool size

Hibernate ConnectionProvider

Most Java EE applications already use JPA and for those who happen to be using Hibernate, we can make use of the hibernate.connection.provider_class configuration property for injecting our proxy DataSource.

Hibernate provides many built-in extension points and the connection management is totally configurable. By providing a custom ConnectionProvider we can substitute the original DataSource with the FlexyPool proxy.

All we have to do is add the following property to our persistence.xml file:

<property name="hibernate.connection.provider_class"
          value="com.vladmihalcea.flexypool.adaptor.FlexyPoolHibernateConnectionProvider"/>

Behind the scenes, this provider will configure a FlexyPoolDataSource and use it whenever a new connection is requested:

private FlexyPoolDataSource<DataSource> flexyPoolDataSource;

@Override
public void configure(Map props) {
    super.configure(props);
    LOGGER.debug(
        "Hibernate switched to using FlexyPoolDataSource");
    flexyPoolDataSource = new FlexyPoolDataSource<DataSource>(
        getDataSource()
    );
}

@Override
public Connection getConnection() throws SQLException {
    return flexyPoolDataSource.getConnection();
}

Instantiating the actual DataSource at runtime

If you’re not using Hibernate, you need to have the FlexyPoolDataSource ready before the EntityManagerFactory finishes bootstrapping:

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.0" xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://java.sun.com/xml/ns/persistence
        http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">

    <persistence-unit name="persistenceUnit" transaction-type="JTA">

        <provider>org.hibernate.jpa.HibernatePersistenceProvider</provider>

        <jta-data-source>java:global/jdbc/flexypool</jta-data-source>

        <properties>
            <property 
                name="hibernate.hbm2ddl.auto" 
                value="update"/>

            <property 
                name="hibernate.show_sql" 
                value="true"/>

            <property 
                name="hibernate.dialect" 
                value="org.hibernate.dialect.HSQLDialect"/>

            <property 
                name="hibernate.transaction.jta.platform" 
                value="org.hibernate.service.jta.platform.internal.SunOneJtaPlatform"/>
        </properties>
    </persistence-unit>
</persistence>

While a production Java EE environment would use an Application Server specific DataSource configuration, for simplicity’s sake, I’m going to configure the FlexyPoolDataSource using the DataSourceDefinition annotation:

@DataSourceDefinition(
    name = "java:global/jdbc/flexypool",
    className = "com.vladmihalcea.flexypool.FlexyPoolDataSource")
@Stateless
public class FlexyPoolDataSourceConfiguration {}

We now need to pass the actual DataSource properties to FlexyPool and this is done through the flexy-pool.properties configuration file:

flexy.pool.data.source.unique.name=unique-name
flexy.pool.data.source.class.name=org.hsqldb.jdbc.JDBCDataSource
flexy.pool.data.source.property.user=sa
flexy.pool.data.source.property.password=
flexy.pool.data.source.property.url=jdbc:hsqldb:mem:test
flexy.pool.metrics.reporter.jmx.auto.start=true

The actual DataSource is going to be created by the FlexyPoolDataSource on start-up.

Locating the actual DataSource from JNDI

If the actual DataSource is already configured by the Application Server, we can instruct FlexyPool to fetch it from JNDI. Let’s say we have the following DataSource configuration:

@DataSourceDefinition(
	name = "java:global/jdbc/default",
	className = "org.hsqldb.jdbc.JDBCDataSource",
	url = "jdbc:hsqldb:mem:test",
	initialPoolSize = 3,
	maxPoolSize = 5
)
@Stateless
public class DefaultDataSourceConfiguration {}

To proxy the JNDI DataSource, we need to configure FlexyPool like this:

flexy.pool.data.source.unique.name=unique-name
flexy.pool.data.source.jndi.name=java:global/jdbc/default
flexy.pool.metrics.reporter.jmx.auto.start=true

The FlexyPoolDataSource is defined alongside the actual DataSource:

@DataSourceDefinition(
	name = "java:global/jdbc/flexypool",
	className = "com.vladmihalcea.flexypool.FlexyPoolDataSource")
@Stateless
public class FlexyPoolDataSourceConfiguration {}

JPA will then have to fetch the FlexyPoolDataSource instead of the actual one:

<jta-data-source>java:global/jdbc/flexypool</jta-data-source>

In TomEE, because the DataSourceDefinitions are not lazily instantiated, the actual DataSource might not be available in the JNDI registry when the FlexyPoolDataSource definition is processed.

For this, we need to instruct FlexyPool to delay the JNDI lookup until the DataSource is actually requested:

flexy.pool.data.source.jndi.lazy.lookup=true

Conclusion

The last time I used Java EE was in 2008, on a project that was using Java EE 1.4 with EJB 2.1. After 7 years of using Spring exclusively, I’m pleasantly surprised by the Java EE experience. Arquillian is definitely my favourite add-on, since integration testing is of paramount importance in enterprise applications. CDI is both easy and powerful and I’m glad the dependency injection got standardised.

But the best asset of the Java EE platform is the community itself. Java EE has a very strong community, willing to give you a hand when in need. I’d like to thank Steve Millidge (Founder of Payara and C2B2) for giving me some great tips on designing the FlexyPool Java EE integration, and Alex Soto, Antonio Goncalves, Markus Eisele and all the other Java EE members with whom I had some very interesting conversations on Twitter.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.

How does Hibernate Query Cache work

Introduction

Now that I covered both Entity and Collection caching, it’s time to investigate how Query Caching works.

The Query Cache is strictly related to Entities and it draws an association between a search criteria and the Entities fulfilling that specific query filter. Like other Hibernate features, the Query Cache is not as trivial as one might think.

Entity model

For our test cases, we are going to use the following domain model:

[Entity diagram: Post and Author]

The Post entity has a many-to-one association to an Author and both entities are stored in the second-level cache.
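
Although the mappings are not listed here, caching an entity in the second-level cache comes down to the usual recipe; a minimal sketch (the concurrency strategy is an assumption):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    // Author is mapped and cached in a similar fashion
    @ManyToOne
    private Author author;

    // name, createdOn and accessors omitted for brevity
}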

Enabling query cache

The Query Cache is disabled by default, and to activate it, we need to supply the following Hibernate property:

properties.put("hibernate.cache.use_query_cache", 
    Boolean.TRUE.toString());

For Hibernate to cache a given query result, we need to explicitly set the cacheable query attribute when creating the Query.
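
When using the JPA API instead of the Hibernate Query, the same effect can be obtained through the org.hibernate.cacheable query hint:

List<Post> posts = entityManager.createQuery(
    "select p from Post p order by p.createdOn desc", Post.class)
.setMaxResults(10)
.setHint("org.hibernate.cacheable", true)
.getResultList();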

Read-through caching

The Query Cache is read-through and, like the NONSTRICT_READ_WRITE concurrency strategy, it can only invalidate stale entries.

In the next example, we are going to cache the following query:

private List<Post> getLatestPosts(Session session) {
    return (List<Post>) session.createQuery(
        "select p " +
        "from Post p " +
        "order by p.createdOn desc")
    .setMaxResults(10)
    .setCacheable(true)
    .list();
}

First, we are going to investigate the Query Cache internal structure using the following test case:

doInTransaction(session -> {
    LOGGER.info(
        "Evict regions and run query");
    session.getSessionFactory()
        .getCache().evictAllRegions();
    assertEquals(1, getLatestPosts(session).size());
});

doInTransaction(session -> {
    LOGGER.info(
        "Check get entity is cached");
    Post post = (Post) session.get(Post.class, 1L);
});

doInTransaction(session -> {
    LOGGER.info(
        "Check query result is cached");
    assertEquals(1, getLatestPosts(session).size());
});

This test generates the following output:

QueryCacheTest - Evict regions and run query

StandardQueryCache - Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache        
EhcacheGeneralDataRegion - Element for key sql: 
    select
       querycache0_.id as id1_1_,
       querycache0_.author_id as author_i4_1_,
       querycache0_.created_on as created_2_1_,
       querycache0_.name as name3_1_ 
    from
       Post querycache0_ 
    order by
       querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2 
is null
StandardQueryCache - Query results were not found in cache

select
   querycache0_.id as id1_1_,
   querycache0_.author_id as author_i4_1_,
   querycache0_.created_on as created_2_1_,
   querycache0_.name as name3_1_ 
from
   Post querycache0_ 
order by
   querycache0_.created_on desc limit 10
   
StandardQueryCache - Caching query results in region: org.hibernate.cache.internal.StandardQueryCache; timestamp=5872026465492992
EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2
value: [5872026465492992, 1]

JdbcTransaction - committed JDBC Connection

------------------------------------------------------------

QueryCacheTest - Check get entity is cached

JdbcTransaction - committed JDBC Connection

------------------------------------------------------------

QueryCacheTest - Check query is cached

StandardQueryCache - Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache
StandardQueryCache - Checking query spaces are up-to-date: [Post]

EhcacheGeneralDataRegion - key: Post
UpdateTimestampsCache - [Post] last update timestamp: 5872026465406976, result set timestamp: 5872026465492992
StandardQueryCache - Returning cached query results

JdbcTransaction - committed JDBC Connection

  • All cache regions are evicted, to make sure the Cache is empty
  • Upon running the Post query, the Query Cache checks for previously stored results
  • Because there is no Cache entry, the query goes to the database
  • Both the selected entities and the query result are being cached
  • We then verify that the Post entity was stored in the second-level cache
  • A subsequent query request will be resolved from the Cache, without hitting the database

Query parameters

Query parameters are embedded in the cache entry key as we can see in the following examples.

Basic types

First, we are going to use a basic type filtering:

private List<Post> getLatestPostsByAuthorId(Session session) {
    return (List<Post>) session.createQuery(
        "select p " +
        "from Post p " +
        "join p.author a " +
        "where a.id = :authorId " +
        "order by p.createdOn desc")
    .setParameter("authorId", 1L)
    .setMaxResults(10)
    .setCacheable(true)
    .list();
}

doInTransaction(session -> {
    LOGGER.info("Query cache with basic type parameter");
    List<Post> posts = getLatestPostsByAuthorId(session);
    assertEquals(1, posts.size());
});

The Query Cache entry looks like this:

EhcacheGeneralDataRegion - 
key: 
    sql: 
        select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        inner join
           Author querycache1_ 
              on querycache0_.author_id=querycache1_.id 
        where
           querycache1_.id=? 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {authorId=1}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2 
value: [5871781092679680, 1]

The parameter is stored in the cache entry key. The first element of the cache entry value is always the result set fetching timestamp; the following elements are the entity identifiers that were returned by this query.

Entity types

We can also use Entity types as query parameters:

private List<Post> getLatestPostsByAuthor(Session session) {
    Author author = (Author) session.get(Author.class, 1L);
    return (List<Post>) session.createQuery(
        "select p " +
        "from Post p " +
        "join p.author a " +
        "where a = :author " +
        "order by p.createdOn desc")
    .setParameter("author", author)
    .setMaxResults(10)
    .setCacheable(true)
    .list();
}

doInTransaction(session -> {
    LOGGER.info("Query cache with entity type parameter");
    List<Post> posts = getLatestPostsByAuthor(session);
    assertEquals(1, posts.size());
});

The cache entry is similar to the one in our previous example, since Hibernate only stores the entity identifier in the cache entry key. This makes sense, as Hibernate already caches the Author entity.

EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        inner join
           Author querycache1_ 
              on querycache0_.author_id=querycache1_.id 
        where
           querycache1_.id=? 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {author=1}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2 
value: [5871781092777984, 1]

Consistency

HQL/JPQL Query Invalidation

The Hibernate second-level cache favors strong consistency and the Query Cache is no different. Like with flushing, the Query Cache can invalidate its entries whenever the associated table space changes. Every time we persist, remove or update an Entity, all Query Cache entries using that particular table will get invalidated.

doInTransaction(session -> {
    Author author = (Author) 
        session.get(Author.class, 1L);
    assertEquals(1, getLatestPosts(session).size());

    LOGGER.info("Insert a new Post");
    Post newPost = new Post("Hibernate Book", author);
    session.persist(newPost);
    session.flush();

    LOGGER.info("Query cache is invalidated");
    assertEquals(2, getLatestPosts(session).size());
});

doInTransaction(session -> {
    LOGGER.info("Check Query cache");
    assertEquals(2, getLatestPosts(session).size());
});

This test will add a new Post and then rerun the cacheable query. Running this test gives the following output:

QueryCacheTest - Insert a new Post

insert 
into
   Post
   (id, author_id, created_on, name) 
values
   (default, 1, '2015-06-06 17:29:59.909', 'Hibernate Book')

UpdateTimestampsCache - Pre-invalidating space [Post], timestamp: 5872029941395456
EhcacheGeneralDataRegion - key: Post value: 5872029941395456

QueryCacheTest - Query cache is invalidated
StandardQueryCache - Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache
EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2
    
StandardQueryCache - Checking query spaces are up-to-date: [Post]
EhcacheGeneralDataRegion - key: Post
UpdateTimestampsCache - [Post] last update timestamp: 5872029941395456, result set timestamp: 5872029695619072
StandardQueryCache - Cached query results were not up-to-date

select
   querycache0_.id as id1_1_,
   querycache0_.author_id as author_i4_1_,
   querycache0_.created_on as created_2_1_,
   querycache0_.name as name3_1_ 
from
   Post querycache0_ 
order by
   querycache0_.created_on desc limit 10
   
StandardQueryCache - Caching query results in region: org.hibernate.cache.internal.StandardQueryCache; timestamp=5872029695668224
EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2 
value: [5872029695668224, 2, 1]

JdbcTransaction - committed JDBC Connection

UpdateTimestampsCache - Invalidating space [Post], timestamp: 5872029695680512
EhcacheGeneralDataRegion - key: Post value: 5872029695680512

------------------------------------------------------------

QueryCacheTest - Check Query cache

StandardQueryCache - Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache
EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2
        
StandardQueryCache - Checking query spaces are up-to-date: [Post]
EhcacheGeneralDataRegion - key: Post
UpdateTimestampsCache - [Post] last update timestamp: 5872029695680512, result set timestamp: 5872029695668224
StandardQueryCache - Cached query results were not up-to-date

select
   querycache0_.id as id1_1_,
   querycache0_.author_id as author_i4_1_,
   querycache0_.created_on as created_2_1_,
   querycache0_.name as name3_1_ 
from
   Post querycache0_ 
order by
   querycache0_.created_on desc limit 10

StandardQueryCache - Caching query results in region: org.hibernate.cache.internal.StandardQueryCache; timestamp=5872029695705088
EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2 
value: [5872029695705088, 2, 1]

JdbcTransaction - committed JDBC Connection

  • Once Hibernate detects an Entity state transition, it pre-invalidates the affected query cache regions
  • The Query Cache entry is not removed, but its associated timestamp is updated
  • The Query Cache always inspects an entry key timestamp, and it skips reading its value if the key timestamp is newer than the result set loading timestamp
  • If the current Session reruns this query, the result will be cached once more
  • The current database transaction commits and changes propagate from session-level isolation to general read consistency
  • The actual invalidation takes place and the cache entry timestamp is updated again

This approach can break the READ COMMITTED consistency guarantees: dirty reads are possible, since the current isolated changes are propagated to the Cache prior to committing the database transaction.

Native Query Invalidation

As I previously stated, native queries leave Hibernate in the dark, as it cannot know which tables a native query might eventually modify. In the following test, we are going to update the Author table, while checking the impact it has on the current Post Query Cache:

doInTransaction(session -> {
    assertEquals(1, getLatestPosts(session).size());

    LOGGER.info("Execute native query");
    assertEquals(1, session.createSQLQuery(
        "update Author set name = '\"'||name||'\"' "
    ).executeUpdate());

    LOGGER.info("Check query cache is invalidated");
    assertEquals(1, getLatestPosts(session).size());
});

The test generates the following output:

QueryCacheTest - Execute native query

UpdateTimestampsCache - Pre-invalidating space [Author], timestamp: 5872035446091776
EhcacheGeneralDataRegion - key: Author value: 5872035446091776
UpdateTimestampsCache - Pre-invalidating space [Post], timestamp: 5872035446091776
EhcacheGeneralDataRegion - key: Post value: 5872035446091776

update
   Author 
set
   name = '"'||name||'"'

QueryCacheTest - Check query cache is invalidated

StandardQueryCache - Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache
EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2
    
StandardQueryCache - Checking query spaces are up-to-date: [Post]
EhcacheGeneralDataRegion - key: Post
UpdateTimestampsCache - [Post] last update timestamp: 5872035446091776, result set timestamp: 5872035200290816
StandardQueryCache - Cached query results were not up-to-date

select
   querycache0_.id as id1_1_,
   querycache0_.author_id as author_i4_1_,
   querycache0_.created_on as created_2_1_,
   querycache0_.name as name3_1_ 
from
   Post querycache0_ 
order by
   querycache0_.created_on desc limit 10

StandardQueryCache - Caching query results in region: org.hibernate.cache.internal.StandardQueryCache; timestamp=5872035200364544
EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2 
value: [5872035200364544, 1]

JdbcTransaction - committed JDBC Connection

UpdateTimestampsCache - Invalidating space [Post], timestamp: 5872035200372736
EhcacheGeneralDataRegion - key: Post value: 5872035200372736
UpdateTimestampsCache - Invalidating space [Author], timestamp: 5872035200372736
EhcacheGeneralDataRegion - key: Author value: 5872035200372736

Both the Author and the Post cache regions were invalidated, even though only the Author table was modified. To fix this, we need to let Hibernate know which tables we are going to alter.

Native Query Cache Region Synchronization

Hibernate allows us to define the query table space through query synchronization hints. When supplying this info, Hibernate can invalidate the requested cache regions:

doInTransaction(session -> {
    assertEquals(1, getLatestPosts(session).size());

    LOGGER.info("Execute native query with synchronization");
    assertEquals(1, session.createSQLQuery(
            "update Author set name = '\"'||name||'\"' "
    ).addSynchronizedEntityClass(Author.class)
    .executeUpdate());

    LOGGER.info("Check query cache is not invalidated");
    assertEquals(1, getLatestPosts(session).size());
});

The following output is being generated:

QueryCacheTest - Execute native query with synchronization

UpdateTimestampsCache - Pre-invalidating space [Author], timestamp: 5872036893995008
EhcacheGeneralDataRegion - key: Author value: 5872036893995008

update
   Author 
set
   name = '"'||name||'"'

QueryCacheTest - Check query cache is not invalidated

StandardQueryCache - Checking cached query results in region: org.hibernate.cache.internal.StandardQueryCache
EhcacheGeneralDataRegion - 
key: 
    sql: select
           querycache0_.id as id1_1_,
           querycache0_.author_id as author_i4_1_,
           querycache0_.created_on as created_2_1_,
           querycache0_.name as name3_1_ 
        from
           Post querycache0_ 
        order by
           querycache0_.created_on desc;
    parameters: ; 
    named parameters: {}; 
    max rows: 10; 
    transformer: org.hibernate.transform.CacheableResultTransformer@110f2

StandardQueryCache - Checking query spaces are up-to-date: [Post]
EhcacheGeneralDataRegion - key: Post
UpdateTimestampsCache - [Post] last update timestamp: 5872036648169472, result set timestamp: 5872036648226816
StandardQueryCache - Returning cached query results

JdbcTransaction - committed JDBC Connection

UpdateTimestampsCache - Invalidating space [Author], timestamp: 5872036648263680
EhcacheGeneralDataRegion - key: Author value: 5872036648263680

Only the provided table space was invalidated, leaving the Post Query Cache untouched. Mixing native queries and Query Caching is possible, but it requires a little bit of diligence.

Conclusion

The Query Cache can boost the application performance for frequently executed entity queries, but it’s not a free ride. It’s susceptible to consistency issues and without a proper memory management control mechanism, it can easily grow quite large.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.

How does Hibernate TRANSACTIONAL CacheConcurrencyStrategy work

Introduction

In my previous post, I introduced the READ_WRITE second-level cache concurrency mechanism. In this article, I am going to continue this topic with the TRANSACTIONAL strategy.

Write-through caching

While the READ_WRITE CacheConcurrencyStrategy is an asynchronous write-through caching mechanism (since changes are propagated only after the current database transaction is completed), the TRANSACTIONAL CacheConcurrencyStrategy is synchronized with the current XA transaction.

To enlist two sources of data (the database and the second-level cache) in the same global transaction, we need to use the Java Transaction API and a JTA transaction manager must coordinate the participating XA resources.

In the following example, I’m going to use Bitronix Transaction Manager, since it’s automatically discovered by EhCache and it also supports the one-phase commit (1PC) optimization.
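
Bootstrapping Bitronix requires no dedicated XML configuration; a minimal sketch, assuming Bitronix is on the class-path:

import bitronix.tm.BitronixTransactionManager;
import bitronix.tm.TransactionManagerServices;

public class BitronixBootstrap {

    public static void main(String[] args) throws Exception {
        // obtain (and implicitly start) the Bitronix transaction manager
        BitronixTransactionManager transactionManager =
            TransactionManagerServices.getTransactionManager();
        try {
            transactionManager.begin();
            // enlisted XA resources (database, second-level cache) join here
            transactionManager.commit();
        } finally {
            transactionManager.shutdown();
        }
    }
}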

The EhCache second-level cache implementation offers two failure recovery options: xa_strict and xa.
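
The recovery option is selected per cache region, through the transactionalMode attribute in ehcache.xml (the cache name and sizing below are illustrative):

<cache name="com.vladmihalcea.hibernate.model.cache.Repository"
       maxEntriesLocalHeap="1000"
       transactionalMode="xa_strict"/>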

xa_strict

In this mode, the second-level cache exposes an XAResource interface, so it can participate in the two-phase commit (2PC) protocol.

[Diagram: the TRANSACTIONAL xa_strict write-through flow]

The entity state is modified both in the database and in the cache, but these changes are isolated from other concurrent transactions and they become visible once the current XA transaction gets committed.

The database and the cache remain consistent even in case of an application crash.

xa

If only one data source participates in a global transaction, the transaction manager can apply the one-phase commit optimization. The second-level cache is managed through a Synchronization transaction callback. The second-level cache doesn’t actively participate in deciding the transaction outcome, as it merely executes according to the current database transaction outcome:

[Diagram: the TRANSACTIONAL xa one-phase commit flow]

This mode trades durability for latency and in case of a server crash (happening in between the database transaction commit and the second-level cache transaction callback), the two data sources will drift apart. This issue can be mitigated if our entities employ an optimistic concurrency control mechanism, so even if we read stale data, we will not lose updates upon writing.
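
Enabling optimistic concurrency control comes down to adding a version attribute to the entity; a minimal sketch:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Repository {

    @Id
    private Long id;

    private String name;

    // Hibernate increments this column on every update and fails the
    // transaction on a version mismatch, so stale writes cannot silently win
    @Version
    private int version;
}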

Isolation level

To validate the TRANSACTIONAL concurrency strategy isolation level, we are going to use the following test case:

doInTransaction((entityManager) -> {
    Repository repository = entityManager.find(
        Repository.class, repositoryReference.getId());
        
    assertEquals("Hibernate-Master-Class", 
        repository.getName());
        
    executeSync(() -> {
        doInTransaction(_entityManager -> {
            Repository _repository = _entityManager.find(
                Repository.class, 
                repositoryReference.getId());
            
            _repository.setName(
                "High-Performance Hibernate");
                
            LOGGER.info("Updating repository name to {}", 
                _repository.getName());
        });
    });

    repository = entityManager.find(
        Repository.class, 
        repositoryReference.getId());
        
    assertEquals("Hibernate-Master-Class", 
        repository.getName());

    LOGGER.info("Detaching repository");
    entityManager.detach(repository);
    assertFalse(entityManager.contains(repository));

    repository = entityManager.find(
        Repository.class, repositoryReference.getId());

    assertEquals("High-Performance Hibernate", 
        repository.getName());
});

  • Alice loads a Repository entity into her current Persistence Context
  • Bob loads the same Repository and then modifies it
  • After Bob’s transaction is committed, Alice still sees the old Repository data, because the Persistence Context provides application-level repeatable reads
  • When Alice evicts the Repository from the first-level cache and fetches it anew, she will see Bob’s changes

The second-level cache doesn’t offer repeatable reads guarantees, since the first-level cache already does this anyway.

Next, we’ll investigate if dirty reads or lost updates are possible and for this we are going to use the following test:

final AtomicReference<Future<?>> 
    bobTransactionOutcomeHolder = new AtomicReference<>();
    
doInTransaction((entityManager) -> {
    Repository repository = entityManager.find(
        Repository.class, repositoryReference.getId());
    
    repository.setName("High-Performance Hibernate");
    entityManager.flush();
    
    Future<?> bobTransactionOutcome = executeAsync(() -> {
        doInTransaction((_entityManager) -> {
            Repository _repository = _entityManager.find(
                Repository.class, 
                repositoryReference.getId());
                
            _repository.setName(
                "High-Performance Hibernate Book");
            
            aliceLatch.countDown();
            awaitOnLatch(bobLatch);
        });
    });
    
    bobTransactionOutcomeHolder.set(
        bobTransactionOutcome);
    sleep(500);
    awaitOnLatch(aliceLatch);
});

doInTransaction((entityManager) -> {
    LOGGER.info("Reload entity after Alice's update");
    Repository repository = entityManager.find(
        Repository.class, repositoryReference.getId());
    assertEquals("High-Performance Hibernate", 
        repository.getName());
});

bobLatch.countDown();
bobTransactionOutcomeHolder.get().get();

doInTransaction((entityManager) -> {
    LOGGER.info("Reload entity after Bob's update");
    Repository repository = entityManager.find(
        Repository.class, repositoryReference.getId());
    assertEquals("High-Performance Hibernate Book", 
        repository.getName());
});

This test will emulate two concurrent transactions, trying to update the same Repository entity. This use case is run on PostgreSQL, using the default READ_COMMITTED transaction isolation level.

Running this test generates the following output:

  • Alice loads the Repository entity
    [Alice]: n.s.e.TransactionController - begun transaction 4
    [Alice]: n.s.e.t.l.LocalTransactionStore - get: cache [com.vladmihalcea.hibernate.model.cache.Repository] key [com.vladmihalcea.hibernate.model.cache.Repository#11] not soft locked, returning underlying element
    
  • Alice changes the Repository name
  • Alice flushes the current Persistent Context, so an UPDATE statement is executed. Because Alice’s transaction has not yet committed, a lock will prevent other concurrent transactions from modifying the same Repository row
    [Alice]: n.t.d.l.CommonsQueryLoggingListener - Name:, Time:1, Num:1, Query:{[update repository set name=? where id=?][High-Performance Hibernate,11]} 
    [Alice]: n.s.e.t.l.LocalTransactionStore - put: cache [com.vladmihalcea.hibernate.model.cache.Repository] key [com.vladmihalcea.hibernate.model.cache.Repository#11] was in, replaced with soft lock
    
  • Bob starts a new transaction and loads the same Repository entity
    [Bob]: n.s.e.TransactionController - begun transaction 5
    [Bob]: n.s.e.t.l.LocalTransactionStore - get: cache [com.vladmihalcea.hibernate.model.cache.Repository] key [com.vladmihalcea.hibernate.model.cache.Repository#11] soft locked, returning soft locked element
    
  • Bob also changes the Repository name.
  • The aliceLatch is used to demonstrate that Bob’s transaction is blocked, waiting for Alice’s to release the Repository row-level lock
    [Alice]: c.v.HibernateCacheTest - Wait 500 ms!
    
  • Alice’s thread wakes after having waited for 500 ms and her transaction is committed
    [Alice]: n.s.e.t.l.LocalTransactionContext - 1 participating cache(s), committing transaction 4
    [Alice]: n.s.e.t.l.LocalTransactionContext - committing soft locked values of cache com.vladmihalcea.hibernate.model.cache.Repository
    [Alice]: n.s.e.t.l.LocalTransactionStore - committing 1 soft lock(s) in cache com.vladmihalcea.hibernate.model.cache.Repository
    [Alice]: n.s.e.t.l.LocalTransactionContext - committed transaction 4
    [Alice]: n.s.e.t.l.LocalTransactionContext - unfreezing and unlocking 1 soft lock(s)
    [Alice]: n.s.e.t.l.LocalTransactionContext - unfroze Soft Lock [clustered: false, isolation: rc, key: com.vladmihalcea.hibernate.model.cache.Repository#11]
    [Alice]: n.s.e.t.l.LocalTransactionContext - unlocked Soft Lock [clustered: false, isolation: rc, key: com.vladmihalcea.hibernate.model.cache.Repository#11]
    
  • Alice starts a new transaction and checks that the Repository name is the one she’s just set
    [Alice]: c.v.HibernateCacheTest - Reload entity after Alice's update
    [Alice]: n.s.e.TransactionController - begun transaction 6
    [Alice]: n.s.e.t.l.LocalTransactionStore - get: cache [com.vladmihalcea.hibernate.model.cache.Repository] key [com.vladmihalcea.hibernate.model.cache.Repository#11] not soft locked, returning underlying element
    WARN  [Alice]: b.t.t.Preparer - executing transaction with 0 enlisted resource
    [Alice]: n.s.e.t.l.LocalTransactionContext - 0 participating cache(s), committing transaction 6
    [Alice]: n.s.e.t.l.LocalTransactionContext - committed transaction 6
    [Alice]: n.s.e.t.l.LocalTransactionContext - unfreezing and unlocking 0 soft lock(s)
    
  • Alice’s thread allows Bob’s thread to continue and she starts waiting on the bobLatch for Bob to finish his transaction
  • Bob can simply issue a database UPDATE and a second-level cache entry modification, without noticing that Alice has changed the Repository since he first loaded it
    [Bob]: n.t.d.l.CommonsQueryLoggingListener - Name:, Time:1, Num:1, Query:{[update repository set name=? where id=?][High-Performance Hibernate Book,11]} 
    [Bob]: n.s.e.t.l.LocalTransactionStore - put: cache [com.vladmihalcea.hibernate.model.cache.Repository] key [com.vladmihalcea.hibernate.model.cache.Repository#11] was in, replaced with soft lock
    [Bob]: n.s.e.t.l.LocalTransactionContext - 1 participating cache(s), committing transaction 5
    [Bob]: n.s.e.t.l.LocalTransactionContext - committing soft locked values of cache com.vladmihalcea.hibernate.model.cache.Repository
    [Bob]: n.s.e.t.l.LocalTransactionStore - committing 1 soft lock(s) in cache com.vladmihalcea.hibernate.model.cache.Repository
    [Bob]: n.s.e.t.l.LocalTransactionContext - committed transaction 5
    [Bob]: n.s.e.t.l.LocalTransactionContext - unfreezing and unlocking 1 soft lock(s)
    [Bob]: n.s.e.t.l.LocalTransactionContext - unfroze Soft Lock [clustered: false, isolation: rc, key: com.vladmihalcea.hibernate.model.cache.Repository#11]
    [Bob]: n.s.e.t.l.LocalTransactionContext - unlocked Soft Lock [clustered: false, isolation: rc, key: com.vladmihalcea.hibernate.model.cache.Repository#11]
    
  • After Bob manages to update the Repository database and cache records, Alice starts a new transaction and she can see Bob’s changes
    [Alice]: c.v.HibernateCacheTest - Reload entity after Bob's update
    [Alice]: o.h.e.t.i.TransactionCoordinatorImpl - Skipping JTA sync registration due to auto join checking
    [Alice]: o.h.e.t.i.TransactionCoordinatorImpl - successfully registered Synchronization
    [Alice]: n.s.e.TransactionController - begun transaction 7
    [Alice]: n.s.e.t.l.LocalTransactionStore - get: cache [com.vladmihalcea.hibernate.model.cache.Repository] key [com.vladmihalcea.hibernate.model.cache.Repository#11] not soft locked, returning underlying element
    WARN  [Alice]: b.t.t.Preparer - executing transaction with 0 enlisted resource
    [Alice]: n.s.e.t.l.LocalTransactionContext - 0 participating cache(s), committing transaction 7
    [Alice]: n.s.e.t.l.LocalTransactionContext - committed transaction 7
    [Alice]: n.s.e.t.l.LocalTransactionContext - unfreezing and unlocking 0 soft lock(s)
    

Conclusion

The TRANSACTIONAL CacheConcurrencyStrategy employs READ_COMMITTED transaction isolation, preventing dirty reads while still allowing the lost update phenomenon. Adding optimistic locking can eliminate the lost update anomaly, since the database transaction will roll back on version mismatches. Once the database transaction fails, the current XA transaction is rolled back, causing the cache to discard all uncommitted changes.

While the READ_WRITE concurrency strategy implies less overhead, the TRANSACTIONAL synchronization mechanism is appealing for higher write-read ratios (requiring fewer database hits than its READ_WRITE counterpart). The inherent performance penalty must be weighed against the READ_WRITE extra database access when deciding which mode is more suitable for a given data access pattern.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.

How does Hibernate READ_WRITE CacheConcurrencyStrategy work

Introduction

In my previous post, I introduced the NONSTRICT_READ_WRITE second-level cache concurrency mechanism. In this article, I am going to continue this topic with the READ_WRITE strategy.

Write-through caching

NONSTRICT_READ_WRITE is a read-through caching strategy, and updates end up invalidating cache entries. As simple as this strategy may be, its performance drops as write operations increase. A write-through cache strategy is a better choice for write-intensive applications, since cache entries can be updated rather than discarded.

Because the database is the system of record and database operations are wrapped inside physical transactions, the cache can be updated either synchronously (as is the case with the TRANSACTIONAL cache concurrency strategy) or asynchronously (right after the database transaction is committed).

The READ_WRITE strategy is an asynchronous cache concurrency mechanism and to prevent data integrity issues (e.g. stale cache entries), it uses a locking mechanism that provides unit-of-work isolation guarantees.

Inserting data

Because persisted entities are uniquely identified (each entity being assigned to a distinct database row), the newly created entities get cached right after the database transaction is committed:

@Override
public boolean afterInsert(
    Object key, Object value, Object version) 
        throws CacheException {
    region().writeLock( key );
    try {
        final Lockable item = 
            (Lockable) region().get( key );
        if ( item == null ) {
            region().put( key, 
                new Item( value, version, 
                    region().nextTimestamp() 
                ) 
            );
            return true;
        }
        else {
            return false;
        }
    }
    finally {
        region().writeUnlock( key );
    }
}

For an entity to be cached upon insertion, it must use a SEQUENCE generator, the cache being populated by the EntityInsertAction:

@Override
public void doAfterTransactionCompletion(boolean success, 
    SessionImplementor session) 
    throws HibernateException {

    final EntityPersister persister = getPersister();
    if ( success && isCachePutEnabled( persister, 
        getSession() ) ) {
            final CacheKey ck = getSession()
               .generateCacheKey( 
                    getId(), 
                    persister.getIdentifierType(), 
                    persister.getRootEntityName() );
                
            final boolean put = cacheAfterInsert( 
                persister, ck );
    }
    postCommitInsert( success );
}

The IDENTITY generator doesn’t play well with the transactional write-behind first-level cache design, so the associated EntityIdentityInsertAction doesn’t cache newly inserted entries (at least until HHH-7964 is fixed).
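
So, for the insert-time cache put to happen, the identifier should be sequence-generated; a minimal sketch (the generator and sequence names are illustrative):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Repository {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE,
                    generator = "repository_seq")
    @SequenceGenerator(name = "repository_seq",
                       sequenceName = "repository_sequence")
    private Long id;
}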

Theoretically, between the database transaction commit and the second-level cache insert, a concurrent transaction might load the newly created entity, thereby triggering a cache insert. Although this is possible, the cache synchronization lag is very short, and if a concurrent transaction is interleaved, it only makes the other transaction hit the database instead of loading the entity from the cache.

Updating data

While inserting entities is a rather simple operation, for updates, we need to synchronize both the database and the cache entry. The READ_WRITE concurrency strategy employs a locking mechanism to ensure data integrity:

[Sequence diagram: READ_WRITE update flow]

  1. The Hibernate Transaction commit procedure triggers a Session flush
  2. The EntityUpdateAction replaces the current cache entry with a Lock object
  3. The update method is used for synchronous cache updates, so it doesn’t do anything when using an asynchronous cache concurrency strategy, like READ_WRITE
  4. After the database transaction is committed, the after-transaction-completion callbacks are called
  5. The EntityUpdateAction calls the afterUpdate method of the EntityRegionAccessStrategy
  6. The ReadWriteEhcacheEntityRegionAccessStrategy replaces the Lock entry with an actual Item, encapsulating the entity dissembled state

Deleting data

Deleting entities is similar to the update process, as we can see from the following sequence diagram:

[Sequence diagram: READ_WRITE delete flow]

  • The Hibernate Transaction commit procedure triggers a Session flush
  • The EntityDeleteAction replaces the current cache entry with a Lock object
  • The remove method call doesn’t do anything, since READ_WRITE is an asynchronous cache concurrency strategy
  • After the database transaction is committed, the after-transaction-completion callbacks are called
  • The EntityDeleteAction calls the unlockItem method of the EntityRegionAccessStrategy
  • The ReadWriteEhcacheEntityRegionAccessStrategy replaces the Lock entry with another Lock object whose timeout period is increased
  • After an entity is deleted, its associated second-level cache entry is replaced by a Lock object, which makes any subsequent request read from the database instead of using the cache entry.

Locking constructs

Both the Item and the Lock classes inherit from the Lockable type and each of these two has a specific policy for allowing a cache entry to be read or written.

The READ_WRITE Lock object

The Lock class defines the following methods:

@Override
public boolean isReadable(long txTimestamp) {
    return false;
}

@Override
public boolean isWriteable(long txTimestamp, 
    Object newVersion, Comparator versionComparator) {
    if ( txTimestamp > timeout ) {
        // if timedout then allow write
        return true;
    }
    if ( multiplicity > 0 ) {
        // if still locked then disallow write
        return false;
    }
    return version == null
        ? txTimestamp > unlockTimestamp
        : versionComparator.compare( version, 
            newVersion ) < 0;
}
    
  • A Lock object doesn’t allow reading the cache entry, so any subsequent request must go to the database
  • If the current Session creation timestamp is greater than the Lock timeout threshold, the cache entry is allowed to be written
  • If at least one Session has managed to lock this entry, any write operation is forbidden
  • A Lock entry allows writing if the incoming entity state has incremented its version or the current Session creation timestamp is greater than the current entry unlocking timestamp

The READ_WRITE Item object

The Item class defines the following read/write access policy:

@Override
public boolean isReadable(long txTimestamp) {
    return txTimestamp > timestamp;
}

@Override
public boolean isWriteable(long txTimestamp, 
    Object newVersion, Comparator versionComparator) {
    return version != null && versionComparator
        .compare( version, newVersion ) < 0;
}

  • An Item is readable only from a Session that’s been started after the cache entry creation time
  • An Item entry allows writing only if the incoming entity state has incremented its version

Cache entry concurrency control

These concurrency control mechanisms are invoked when saving and reading the underlying cache entries.

The cache entry is read when the ReadWriteEhcacheEntityRegionAccessStrategy get method is called:

public final Object get(Object key, long txTimestamp) 
    throws CacheException {
    readLockIfNeeded( key );
    try {
        final Lockable item = 
            (Lockable) region().get( key );

        final boolean readable = 
            item != null && 
            item.isReadable( txTimestamp );

        if ( readable ) {
            return item.getValue();
        }
        else {
            return null;
        }
    }
    finally {
        readUnlockIfNeeded( key );
    }
}

The cache entry is written by the ReadWriteEhcacheEntityRegionAccessStrategy putFromLoad method:

public final boolean putFromLoad(
        Object key,
        Object value,
        long txTimestamp,
        Object version,
        boolean minimalPutOverride)
        throws CacheException {
    region().writeLock( key );
    try {
        final Lockable item = 
            (Lockable) region().get( key );

        final boolean writeable = 
            item == null || 
            item.isWriteable( 
                txTimestamp, 
                version, 
                versionComparator );

        if ( writeable ) {
            region().put( 
                key, 
                new Item( 
                    value, 
                    version, 
                    region().nextTimestamp() 
                ) 
            );
            return true;
        }
        else {
            return false;
        }
    }
    finally {
        region().writeUnlock( key );
    }
}

    Timing out

    If the database operation fails, the current cache entry holds a Lock object, and it cannot roll back to its previous Item state. For this reason, the Lock must time out to allow the cache entry to be replaced by an actual Item object. The EhcacheDataRegion defines the following timeout property:

    private static final String CACHE_LOCK_TIMEOUT_PROPERTY = 
        "net.sf.ehcache.hibernate.cache_lock_timeout";
    private static final int DEFAULT_CACHE_LOCK_TIMEOUT = 60000;
    

    Unless we override the net.sf.ehcache.hibernate.cache_lock_timeout property, the default timeout is 60 seconds:

    final String timeout = properties.getProperty(
        CACHE_LOCK_TIMEOUT_PROPERTY,
        Integer.toString( DEFAULT_CACHE_LOCK_TIMEOUT )
    );
    

    The following test will emulate a failing database transaction, so we can observe how the READ_WRITE cache only allows writing after the timeout threshold expires. First, we are going to lower the timeout value to reduce the cache freezing period:

    properties.put(
        "net.sf.ehcache.hibernate.cache_lock_timeout", 
        String.valueOf(250));
    

    We’ll use a custom interceptor to manually rollback the currently running transaction:

    @Override
    protected Interceptor interceptor() {
        return new EmptyInterceptor() {
            @Override
            public void beforeTransactionCompletion(
                Transaction tx) {
                if(applyInterceptor.get()) {
                    tx.rollback();
                }
            }
        };
    }
    

    The following routine will test the lock timeout behavior:

    try {
        doInTransaction(session -> {
            Repository repository = (Repository)
                session.get(Repository.class, 1L);
            repository.setName("High-Performance Hibernate");
            applyInterceptor.set(true);
        });
    } catch (Exception e) {
        LOGGER.info("Expected", e);
    }
    applyInterceptor.set(false);
    
    AtomicReference<Object> previousCacheEntryReference =
            new AtomicReference<>();
    AtomicBoolean cacheEntryChanged = new AtomicBoolean();
    
    while (!cacheEntryChanged.get()) {
        doInTransaction(session -> {
            boolean entryChange;
            session.get(Repository.class, 1L);
            
            try {
                Object previousCacheEntry = 
                    previousCacheEntryReference.get();
                Object cacheEntry = 
                    getCacheEntry(Repository.class, 1L);
                
                entryChange = previousCacheEntry != null &&
                    previousCacheEntry != cacheEntry;
                previousCacheEntryReference.set(cacheEntry);
                LOGGER.info("Cache entry {}", 
                    ToStringBuilder.reflectionToString(
                        cacheEntry));
                        
                if(!entryChange) {
                    sleep(100);
                } else {
                    cacheEntryChanged.set(true);
                }
            } catch (IllegalAccessException e) {
                LOGGER.error("Error accessing Cache", e);
            }
        });
    }
    

    Running this test generates the following output:

    select
       readwritec0_.id as id1_0_0_,
       readwritec0_.name as name2_0_0_,
       readwritec0_.version as version3_0_0_ 
    from
       repository readwritec0_ 
    where
       readwritec0_.id=1
       
    update
       repository 
    set
       name='High-Performance Hibernate',
       version=1 
    where
       id=1 
       and version=0
    
    JdbcTransaction - rolled JDBC Connection
    
    select
       readwritec0_.id as id1_0_0_,
       readwritec0_.name as name2_0_0_,
       readwritec0_.version as version3_0_0_ 
    from
       repository readwritec0_ 
    where
       readwritec0_.id = 1
    
    Cache entry net.sf.ehcache.Element@3f9a0805[
        key=ReadWriteCacheConcurrencyStrategyWithLockTimeoutTest$Repository#1,
        value=Lock Source-UUID:ac775350-3930-4042-84b8-362b64c47e4b Lock-ID:0,
            version=1,
            hitCount=3,
            timeToLive=120,
            timeToIdle=120,
            lastUpdateTime=1432280657865,
            cacheDefaultLifespan=true,id=0
    ]
    Wait 100 ms!
    JdbcTransaction - committed JDBC Connection
    
    select
       readwritec0_.id as id1_0_0_,
       readwritec0_.name as name2_0_0_,
       readwritec0_.version as version3_0_0_ 
    from
       repository readwritec0_ 
    where
       readwritec0_.id = 1
       
    Cache entry net.sf.ehcache.Element@3f9a0805[
        key=ReadWriteCacheConcurrencyStrategyWithLockTimeoutTest$Repository#1,
        value=Lock Source-UUID:ac775350-3930-4042-84b8-362b64c47e4b Lock-ID:0,
            version=1,
            hitCount=3,
            timeToLive=120,
            timeToIdle=120,
            lastUpdateTime=1432280657865,
            cacheDefaultLifespan=true,
            id=0
    ]
    Wait 100 ms!
    JdbcTransaction - committed JDBC Connection
    
    select
       readwritec0_.id as id1_0_0_,
       readwritec0_.name as name2_0_0_,
       readwritec0_.version as version3_0_0_ 
    from
       repository readwritec0_ 
    where
       readwritec0_.id = 1
    Cache entry net.sf.ehcache.Element@305f031[
        key=ReadWriteCacheConcurrencyStrategyWithLockTimeoutTest$Repository#1,
        value=org.hibernate.cache.ehcache.internal.strategy.AbstractReadWriteEhcacheAccessStrategy$Item@592e843a,
            version=1,
            hitCount=1,
            timeToLive=120,
            timeToIdle=120,
            lastUpdateTime=1432280658322,
            cacheDefaultLifespan=true,
            id=0
    ]
    JdbcTransaction - committed JDBC Connection
    
    • The first transaction tries to update an entity, so the associated second-level cache entry is locked prior to committing the transaction.
    • The first transaction fails and is rolled back
    • Because the Lock is still being held, the next two transactions go to the database without replacing the Lock entry with the currently loaded database entity state
    • After the Lock timeout period expires, the third transaction can finally replace the Lock with an Item cache entry (holding the entity disassembled hydrated state)

    Conclusion

    The READ_WRITE concurrency strategy offers the benefits of a write-through caching mechanism, but you need to understand its inner workings to decide if it's a good fit for your current project's data access requirements.

    For heavy write contention scenarios, the locking constructs will make other concurrent transactions hit the database, so you must decide if a synchronous cache concurrency strategy is better suited in this situation.

    Code available on GitHub.

    If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.

    How does Hibernate NONSTRICT_READ_WRITE CacheConcurrencyStrategy work

    Introduction

    In my previous post, I introduced the READ_ONLY CacheConcurrencyStrategy, which is the obvious choice for immutable entity graphs. When cached data is changeable, we need to use a read-write caching strategy and this post will describe how NONSTRICT_READ_WRITE second-level cache works.

    Inner workings

    When the Hibernate transaction is committed, the following sequence of operations is executed:

    [Diagram: NonStrictReadWriteCacheConcurrencyStrategy operation sequence]

    First, the cache is invalidated before the database transaction gets committed, during flush time:

    1. The current Hibernate Transaction (e.g. JdbcTransaction, JtaTransaction) is flushed
    2. The DefaultFlushEventListener executes the current ActionQueue
    3. The EntityUpdateAction calls the update method of the EntityRegionAccessStrategy
    4. The NonStrictReadWriteEhcacheEntityRegionAccessStrategy removes the cache entry from the underlying EhcacheEntityRegion

    After the database transaction is committed, the cache entry is removed once more:

    1. The current Hibernate Transaction after completion callback is called
    2. The current Session propagates this event to its internal ActionQueue
    3. The EntityUpdateAction calls the afterUpdate method on the EntityRegionAccessStrategy
    4. The NonStrictReadWriteEhcacheEntityRegionAccessStrategy calls the remove method on the underlying EhcacheEntityRegion (both callbacks are sketched below)
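
    Both invalidation callbacks can be condensed into the following sketch (a simplified illustration, not the actual hibernate-ehcache source; CacheRegion is a hypothetical stand-in for EhcacheEntityRegion):

    interface CacheRegion {
        void remove(Object key);
    }

    class NonStrictReadWriteInvalidationSketch {

        private final CacheRegion region;

        NonStrictReadWriteInvalidationSketch(CacheRegion region) {
            this.region = region;
        }

        // Invoked at flush time, before the database transaction commits:
        // the cache entry is invalidated rather than updated in place.
        // Returning false signals that no in-place cache update occurred.
        boolean update(Object key, Object value) {
            region.remove(key);
            return false;
        }

        // Invoked after the database transaction completes: the entry is
        // removed once more, covering any write that occurred in between.
        boolean afterUpdate(Object key, Object value) {
            region.remove(key);
            return false;
        }
    }
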

    Inconsistency warning

    The NONSTRICT_READ_WRITE mode is not a write-through caching strategy because cache entries are invalidated instead of being updated, and the cache invalidation is not synchronized with the current database transaction. Even though the associated Cache region entry gets invalidated twice (before and after transaction completion), there's still a tiny time window when the cache and the database might drift apart.

    The following test will demonstrate this issue. First, we are going to define Alice's transaction logic:

    doInTransaction(session -> {
        LOGGER.info("Load and modify Repository");
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        assertTrue(getSessionFactory().getCache()
            .containsEntity(Repository.class, 1L));
        repository.setName("High-Performance Hibernate");
        applyInterceptor.set(true);
    });
    
    endLatch.await();
    
    assertFalse(getSessionFactory().getCache()
        .containsEntity(Repository.class, 1L));
    
    doInTransaction(session -> {
        applyInterceptor.set(false);
        Repository repository = (Repository)
            session.get(Repository.class, 1L);
        LOGGER.info("Cached Repository {}", repository);
    });
    

    Alice loads a Repository entity and modifies it in her first database transaction.
    To spawn another concurrent transaction right when Alice prepares to commit, we are going to use the following Hibernate Interceptor:

    private AtomicBoolean applyInterceptor = 
        new AtomicBoolean();
    
    private final CountDownLatch endLatch = 
        new CountDownLatch(1);
    
    private class BobTransaction extends EmptyInterceptor {
        @Override
        public void beforeTransactionCompletion(Transaction tx) {
            if(applyInterceptor.get()) {
                LOGGER.info("Fetch Repository");
    
                assertFalse(getSessionFactory().getCache()
                    .containsEntity(Repository.class, 1L));
    
                executeSync(() -> {
                    Session _session = getSessionFactory()
                        .openSession();
                    Repository repository = (Repository) 
                        _session.get(Repository.class, 1L);
                    LOGGER.info("Cached Repository {}", 
                        repository);
                    _session.close();
                    endLatch.countDown();
                });
    
                assertTrue(getSessionFactory().getCache()
                    .containsEntity(Repository.class, 1L));
            }
        }
    }
    

    Running this code generates the following output:

    [Alice]: Load and modify Repository
    [Alice]: select nonstrictr0_.id as id1_0_0_, nonstrictr0_.name as name2_0_0_ from repository nonstrictr0_ where nonstrictr0_.id=1
    [Alice]: update repository set name='High-Performance Hibernate' where id=1
    
    [Alice]: Fetch Repository from another transaction
    [Bob]: select nonstrictr0_.id as id1_0_0_, nonstrictr0_.name as name2_0_0_ from repository nonstrictr0_ where nonstrictr0_.id=1
    [Bob]: Cached Repository from Bob's transaction Repository{id=1, name='Hibernate-Master-Class'}
    
    [Alice]: committed JDBC Connection
    
    [Alice]: select nonstrictr0_.id as id1_0_0_, nonstrictr0_.name as name2_0_0_ from repository nonstrictr0_ where nonstrictr0_.id=1
    [Alice]: Cached Repository Repository{id=1, name='High-Performance Hibernate'}
    
    1. Alice fetches a Repository and updates its name
    2. The custom Hibernate Interceptor is invoked and Bob’s transaction is started
    3. Because the Repository was evicted from the Cache, Bob loads the second-level cache with the current database snapshot
    4. Alice's transaction commits, but now the Cache contains the previous database snapshot that Bob has just loaded
    5. If a third user now fetches the Repository entity, they will also see a stale entity version that differs from the current database snapshot
    6. After Alice's transaction is committed, the Cache entry is evicted again, and any subsequent entity load request will populate the Cache with the current database snapshot

    Stale data vs lost updates

    The NONSTRICT_READ_WRITE concurrency strategy introduces a tiny window of inconsistency when the database and the second-level cache can go out of sync. While this might sound terrible, in reality we should always design our applications to cope with these situations, even if we don't use a second-level cache. Hibernate offers application-level repeatable reads through its transactional write-behind first-level cache, and all managed entities are subject to becoming stale. Right after an entity is loaded into the current Persistence Context, another concurrent transaction might update it, so we need to prevent stale data from escalating into lost updates.

    Optimistic concurrency control is an effective way of dealing with lost updates in long conversations and this technique can mitigate the NONSTRICT_READ_WRITE inconsistency issue as well.
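
    For instance, adding a JPA @Version attribute to the cached entity enables this optimistic locking safety net. The following is a minimal sketch (the mapping is an assumption, similar to the versioned Repository entity used in the READ_WRITE tests earlier):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Repository {

        @Id
        private Long id;

        private String name;

        // Hibernate increments this attribute on every update and appends
        // "and version = ?" to the UPDATE statement, so a transaction
        // acting on a stale snapshot fails instead of silently
        // overwriting a newer database row.
        @Version
        private int version;

        // getters and setters omitted for brevity
    }

    This way, even if a transaction reads a stale cache entry, any update based on it will fail the version check rather than overwrite newer data.
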

    Conclusion

    The NONSTRICT_READ_WRITE concurrency strategy is a good choice for read-mostly applications (if backed by the optimistic locking mechanism). For write-intensive scenarios, the cache invalidation mechanism would increase the cache miss rate, therefore rendering this technique inefficient.

    Code available on GitHub.

    If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.