Things to consider before jumping to enterprise caching

Introduction

Relational database transactions are ACID, and the strong consistency model simplifies application development. Because enabling Hibernate caching is just one configuration away, it’s very appealing to turn to caching whenever the data access layer starts showing performance issues. Adding a caching layer can indeed improve application performance, but it comes at a price you need to be aware of.

Database performance tuning

The database is the central part of any enterprise application, containing valuable business assets. A database server has limited resources, so it can only serve a finite number of connections. The shorter the database transactions, the more transactions can be accommodated. The first performance tuning action is therefore to reduce query execution times by indexing properly and optimizing queries.

When all queries and statements are optimized, we can either add more resources (scale up) or add more database nodes (scale out). Horizontal scaling requires database replication, which implies synchronizing nodes. Synchronous replication preserves strong consistency, while asynchronous master-slave replication leads to eventual consistency.

Analogous to database replication challenges, cache nodes induce data synchronization issues, especially for distributed enterprise applications.

Caching

Even if the database access patterns are properly optimized, higher loads might increase latency. To provide predictable and constant response times, we need to turn to caching. Caching allows us to reuse a database response for multiple user requests.

The cache can therefore:

  • reduce CPU/Memory/IO resource consumption on the database side
  • reduce network traffic between application nodes and the database tier
  • provide constant result fetch time, insensitive to traffic bursts
  • provide a read-only view when the application is in maintenance mode (e.g. when upgrading the database schema)

The downside of introducing a caching solution is that data is duplicated in two separate technologies that may easily desynchronise.

In the simplest use case you have one database server and one cache node:

[Diagram: SingleCacheNode – one database server and one cache node]

The caching abstraction layer is aware of the database server, but the database knows nothing of the application-level cache. If some external process updates the database without touching the cache, the two data sources will get out of sync. Because few database servers support application-level notifications, the cache may break the strong consistency guarantees.

To avoid eventual consistency, both the database and the cache need to be enrolled in a distributed XA transaction, so the affected cache entries are either updated or invalidated synchronously.
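As a rough illustration, a write path coordinated by a JTA transaction manager might look like the following sketch. This is a minimal sketch only: the ProductDao interface, the cache wiring and the price-update use case are hypothetical, and the actual two-phase commit is handled by the transaction manager, not by this code.

import java.math.BigDecimal;
import javax.cache.Cache;
import javax.transaction.UserTransaction;

public class ProductService {

    // Hypothetical DAO writing through an XA-enlisted DataSource
    interface ProductDao {
        void updatePrice(Long productId, BigDecimal newPrice);
    }

    private final UserTransaction transaction;        // JTA transaction handle
    private final Cache<Long, BigDecimal> priceCache; // XA-capable JCache cache
    private final ProductDao productDao;

    public ProductService(UserTransaction transaction,
                          Cache<Long, BigDecimal> priceCache,
                          ProductDao productDao) {
        this.transaction = transaction;
        this.priceCache = priceCache;
        this.productDao = productDao;
    }

    public void updatePrice(Long productId, BigDecimal newPrice) throws Exception {
        transaction.begin();
        try {
            // Both resources take part in the same global transaction, so the
            // database row and the cache entry change or roll back together.
            productDao.updatePrice(productId, newPrice);
            priceCache.remove(productId); // synchronous invalidation
            transaction.commit();
        } catch (Exception e) {
            transaction.rollback();
            throw e;
        }
    }
}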

Most often, there are more application nodes or multiple distinct applications (web-fronts, batch processors, schedulers) comprising the whole enterprise system:

[Diagram: MultipleCacheNodes – multiple application nodes, each with its own cache node]

If each node has its own isolated cache node, we need to be aware of possible data synchronisation issues. If one node updates the database and its own cache without notifying the rest, then other cache nodes get out of sync.

In a distributed environment, where multiple applications or application nodes use caching, we need a distributed caching solution, in which either:

  • cache nodes communicate in a peer-to-peer topology, or
  • cache nodes communicate in a client-server topology, and a central cache server takes care of data synchronization

[Diagram: DistributedCacheNodes – application nodes synchronized through a distributed cache]

Conclusion

Caching is a fine scaling technique but you have to be aware of possible consistency issues. Taking into consideration your current project data integrity requirements, you need to design your application to take advantage of caching without compromising critical data.

Caching is not a mere cross-cutting concern: it leaks into your application architecture and requires a well-thought-out plan for compensating for data integrity anomalies.


How does Hibernate store second-level cache entries

Introduction

The benefit of using a database access abstraction layer is that caching can be implemented transparently, without leaking into the business logic code. Hibernate Persistence Context acts as a transactional write-behind cache, translating entity state transitions into DML statements.

The Persistence Context acts as a logical transaction storage, and each Entity instance can have at most one managed reference. No matter how many times we try to load the same Entity, the Hibernate Session will always return the same object reference. This behavior is commonly referred to as the first-level cache.
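A minimal sketch of this guarantee, assuming a Post entity with identifier 1 has already been persisted (doInTransaction is the test helper used throughout these examples):

doInTransaction(session -> {
    Post post1 = (Post) session.get(Post.class, 1L);
    Post post2 = (Post) session.get(Post.class, 1L);

    // The second get() is resolved from the Persistence Context,
    // so both variables point to the very same managed instance.
    assertSame(post1, post2);
});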

The Hibernate Persistence Context is not a caching solution per se, as it serves a different purpose than boosting application read operation performance. Because the Hibernate Session is bound to the currently running logical transaction, once the transaction is over, the Session is destroyed.

The second-level cache

A proper caching solution would have to span multiple Hibernate Sessions and that’s why Hibernate supports an additional second-level cache as well. The second-level cache is bound to the SessionFactory life-cycle, so it’s destroyed only when the SessionFactory is closed (typically when the application is shutting down). The second-level cache is primarily entity-oriented, although it supports an optional query-caching solution as well.

By default, the second-level cache is disabled and to activate it, we have to set the following Hibernate properties:

properties.put("hibernate.cache.use_second_level_cache", 
    Boolean.TRUE.toString());
properties.put("hibernate.cache.region.factory_class", 
    "org.hibernate.cache.ehcache.EhCacheRegionFactory");

The RegionFactory defines the second-level cache implementation provider, and the hibernate.cache.region.factory_class configuration is mandatory, once the hibernate.cache.use_second_level_cache property is set to true.

To enable entity-level caching, we need to annotate our cacheable entities as follows:

@Entity
@org.hibernate.annotations.Cache(usage = 
    CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)

JPA also defines the @Cacheable annotation, but it doesn’t support setting the concurrency strategy on entity-level.
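For reference, the JPA-only mapping would look like the following sketch, which also requires the shared cache mode to be enabled; the concurrency strategy is then left to the provider defaults:

@Entity
@Cacheable
public class Post {
    //...
}

properties.put("javax.persistence.sharedCache.mode", 
    "ENABLE_SELECTIVE");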

The entity loading flow

Whenever an entity is to be loaded, a LoadEvent is fired and the DefaultLoadEventListener handles it as follows:

Object entity = loadFromSessionCache( event, 
    keyToLoad, options );
if ( entity == REMOVED_ENTITY_MARKER ) {
    LOG.debug( "Load request found matching entity "
        + "in context, but it is scheduled for removal; "
        + "returning null" );
    return null;
}
if ( entity == INCONSISTENT_RTN_CLASS_MARKER ) {
    LOG.debug( "Load request found matching entity "
        + "in context, but the matched entity was of "
        + "an inconsistent return type; returning null"
    );
    return null;
}
if ( entity != null ) {
    if ( traceEnabled ) {
        LOG.tracev("Resolved object in "
            + "session cache: {0}",
            MessageHelper.infoString( persister,
                event.getEntityId(),
                event.getSession().getFactory() )
        );
    }
    return entity;
}

entity = loadFromSecondLevelCache( event, 
    persister, options );
if ( entity != null ) {
    if ( traceEnabled ) {
        LOG.tracev("Resolved object in "
            + "second-level cache: {0}",
            MessageHelper.infoString( persister,
                event.getEntityId(),
                event.getSession().getFactory() )
        );
    }
}
else {
    if ( traceEnabled ) {
        LOG.tracev("Object not resolved in "
            + "any cache: {0}",
            MessageHelper.infoString( persister,
                event.getEntityId(),
                event.getSession().getFactory() )
        );
    }
    entity = loadFromDatasource( event, persister, 
        keyToLoad, options );
}

The Session is always inspected first, because it might already contain a managed entity instance. The second-level cache is verified before hitting the database, so its main purpose is to reduce the number of database accesses.

Second-level cache internals

Every entity is stored as a CacheEntry, and the entity hydrated state is used for creating the cache entry value.

Hydration

In Hibernate nomenclature, hydration is when a JDBC ResultSet is transformed to an array of raw values:

final Object[] values = persister.hydrate(
    rs, id, object, 
    rootPersister, cols, eagerPropertyFetch, session
);

The hydrated state is saved in the currently running Persistence Context as an EntityEntry object, which encapsulates the loading-time entity snapshot. The hydrated state is then used by:

  • the default dirty checking mechanism, which compares the current entity data against the loading-time snapshot
  • the second-level cache, whose cache entries are built from the loading-time entity snapshot

The inverse operation is called dehydration and it copies the entity state into an INSERT or UPDATE statement.

The second-level cache elements

Although Hibernate allows us to manipulate entity graphs, the second-level cache uses a disassembled hydrated state instead:

final CacheEntry entry = persister.buildCacheEntry( 
    entity, hydratedState, version, session );

The hydrated state is disassembled prior to being stored in the CacheEntry:

this.disassembledState = TypeHelper.disassemble(
    state, persister.getPropertyTypes(),
    persister.isLazyPropertiesCacheable() 
        ? null : persister.getPropertyLaziness(),
    session, owner
);

Starting from the following entity model diagram:

[Diagram: Post – Comment – PostDetails entity model]

We’ll insert the following entities:

Post post = new Post();
post.setName("Hibernate Master Class");

post.addDetails(new PostDetails());
post.addComment(new Comment("Good post!"));
post.addComment(new Comment("Nice post!"));

session.persist(post);

Now, we are going to check each individual entity cache element.

The Post entity cache element

The Post entity has a one-to-many association to the Comment entity and an inverse one-to-one association to a PostDetails:

@OneToMany(cascade = CascadeType.ALL, 
    mappedBy = "post")
private List<Comment> comments = new ArrayList<>();

@OneToOne(cascade = CascadeType.ALL, 
    mappedBy = "post", optional = true)
private PostDetails details;

When fetching a Post entity:

Post post = (Post) session.get(Post.class, 1L);

The associated cache element looks like this:

key = {org.hibernate.cache.spi.CacheKey@3855}
    key = {java.lang.Long@3860} "1"
    type = {org.hibernate.type.LongType@3861} 
    entityOrRoleName = {java.lang.String@3862} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.SecondLevelCacheTest$Post"
    tenantId = null
    hashCode = 31
value = {org.hibernate.cache.spi.entry.StandardCacheEntryImpl@3856}
    disassembledState = {java.io.Serializable[3]@3864} 
        0 = {java.lang.Long@3860} "1"
        1 = {java.lang.String@3865} "Hibernate Master Class"
    subclass = {java.lang.String@3862} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.SecondLevelCacheTest$Post"
    lazyPropertiesAreUnfetched = false
    version = null

The CacheKey contains the entity identifier and the CacheEntry contains the entity disassembled hydrated state.

The Post entry cache value consists of the name column and the id, which is set by the one-to-many Comment association.

Neither the one-to-many nor the inverse one-to-one associations are embedded in the Post CacheEntry.
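The comments collection can be cached as well, but it needs its own cache annotation, so that Hibernate stores the associated Comment identifiers in a separate collection cache region. A minimal sketch (not part of the original mapping above):

@org.hibernate.annotations.Cache(usage = 
    CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
@OneToMany(cascade = CascadeType.ALL, 
    mappedBy = "post")
private List<Comment> comments = new ArrayList<>();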

The PostDetails entity cache element

The PostDetails entity Primary Key is referencing the associated Post entity Primary Key, and it therefore has a one-to-one association with the Post entity.

@OneToOne
@JoinColumn(name = "id")
@MapsId
private Post post;

When fetching a PostDetails entity:

PostDetails postDetails = 
    (PostDetails) session.get(PostDetails.class, 1L);

The second-level cache generates the following cache element:

key = {org.hibernate.cache.spi.CacheKey@3927}
    key = {java.lang.Long@3897} "1"
    type = {org.hibernate.type.LongType@3898} 
    entityOrRoleName = {java.lang.String@3932} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.SecondLevelCacheTest$PostDetails"
    tenantId = null
    hashCode = 31
value = {org.hibernate.cache.spi.entry.StandardCacheEntryImpl@3928}
    disassembledState = {java.io.Serializable[2]@3933} 
        0 = {java.sql.Timestamp@3935} "2015-04-06 15:36:13.626"
    subclass = {java.lang.String@3932} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.SecondLevelCacheTest$PostDetails"
    lazyPropertiesAreUnfetched = false
    version = null

The disassembled state contains only the createdOn entity property, since the entity identifier is embedded in the CacheKey.

The Comment entity cache element

The Comment entity has a many-to-one association to a Post:

@ManyToOne
private Post post;

When we fetch a Comment entity:

Comment comment = 
    (Comment) session.get(Comment.class, 1L);

Hibernate generates the following second-level cache element:

key = {org.hibernate.cache.spi.CacheKey@3857}
    key = {java.lang.Long@3864} "2"
    type = {org.hibernate.type.LongType@3865} 
    entityOrRoleName = {java.lang.String@3863} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.SecondLevelCacheTest$Comment"
    tenantId = null
    hashCode = 62
value = {org.hibernate.cache.spi.entry.StandardCacheEntryImpl@3858}
    disassembledState = {java.io.Serializable[2]@3862} 
        0 = {java.lang.Long@3867} "1"
        1 = {java.lang.String@3868} "Good post!"
    subclass = {java.lang.String@3863} "com.vladmihalcea.hibernate.masterclass.laboratory.cache.SecondLevelCacheTest$Comment"
    lazyPropertiesAreUnfetched = false
    version = null

The disassembled state contains the Post.id Foreign Key reference and the review column, therefore mirroring the associated database table definition.

Conclusion

The second-level cache is a relational data cache, so it stores data in a normalized form and each entity update affects only one cache entry. Reading a whole entity graph is not possible, since the entity associations are not materialized in the second-level cache entries.

An aggregated entity graph yields better performance for read operations at the cost of complicating write operations. If the cached data is not normalized and scattered across various aggregated models, an entity update would have to modify multiple cache entries, therefore affecting the write operations performance.

Because it mirrors the underlying relational data, the second-level cache offers various concurrency strategy mechanisms, so we can balance read performance and strong consistency guarantees.

Code available on GitHub.


SELECT statements batch fetching with JDBC and Hibernate

Introduction

Now that I covered Hibernate batch support for INSERT, UPDATE and DELETE statements, it’s time to analyze SELECT statements result set batch fetching.

JDBC ResultSet fetching

The JDBC ResultSet offers a client-side Proxy cursor for fetching the current statement return data. When the statement gets executed, the result must be transferred from the database cursor to the client-side one. This operation can either be done at once or on demand.

There are three types of ResultSet cursors:

  • TYPE_FORWARD_ONLY – This is the default ResultSet cursor type. The result set can only be moved forward and the resulting data can either be fetched at once or retrieved while the cursor is being iterated. The database can decide to fetch the data as it was available at the time the query started or as it is upon fetching.
  • TYPE_SCROLL_INSENSITIVE – The result set can be scrolled both forward and backward and the resulting data is insensitive to concurrent changes occurring while the cursor is still open.
  • TYPE_SCROLL_SENSITIVE – The result set can be scrolled both forward and backward and the resulting data is sensitive to concurrent changes occurring while the cursor is still open. The data is therefore fetched on demand, as opposed to being retrieved from a database cursor cache.

Not all database drivers implement all cursor types and the batch fetching behavior is controlled through the JDBC Statement fetchSize property, which according to the Javadoc:

Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement. If the value specified is zero, then the hint is ignored. The default value is zero.
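In plain JDBC, the hint is set directly on the Statement. A minimal sketch, assuming an already opened Connection and a hypothetical process() callback:

try (PreparedStatement statement = connection.prepareStatement(
        "select id, title from post")) {
    // ask the driver to transfer at most 100 rows per database round-trip
    statement.setFetchSize(100);
    try (ResultSet resultSet = statement.executeQuery()) {
        while (resultSet.next()) {
            process(resultSet.getLong(1), resultSet.getString(2));
        }
    }
}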

The default fetching strategy is therefore database specific and from the application performance point of view, this aspect is very important when tuning the data access layer:

  • Oracle

    By default, when Oracle JDBC runs a query, it retrieves a result set of 10 rows at a time from the database cursor.

  • MySQL

    By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement.

  • SQL Server

    Normally, when the Microsoft JDBC Driver for SQL Server executes a query, the driver retrieves all of the results from the server into application memory. Although this approach minimizes resource consumption on the SQL Server, it can throw an OutOfMemoryError in the JDBC application for the queries that produce very large results.

  • PostgreSQL

    By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

  • DB2

    By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows. The fetchSize property differs from the queryDataSize property. fetchSize affects the number of rows that are returned, and queryDataSize affects the number of bytes that are returned.

    For example, if the result set size is 50 KB, and the value of queryDataSize is 32767 (32KB), two trips to the database server are required to retrieve the result set. However, if queryDataSize is set to 65535 (64 KB), only one trip to the data source is required to retrieve the result set.

Hibernate ResultSet traversing options

The Java Persistence Query interface offers only full-result retrievals, through the Query.getResultList() method call.

Hibernate also supports scrollable ResultSet cursors through its specific Query.scroll() API.
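A minimal sketch of the scrollable alternative, reusing the Post entity from the test below:

ScrollableResults scrollableResults = session.createQuery(
        "select p from Post p")
    .setReadOnly(true)
    .scroll(ScrollMode.FORWARD_ONLY);
try {
    while (scrollableResults.next()) {
        Post post = (Post) scrollableResults.get(0);
        // process one row at a time instead of
        // materializing the whole result list
    }
} finally {
    scrollableResults.close();
}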

The only apparent advantage of scrollable ResultSets is that we can avoid memory issues on the client-side, since data is being fetched on demand. This might sound like a natural choice, but in reality, you shouldn’t fetch large result sets for the following reasons:

  • Large result sets consume significant database server resources and, because a database is a highly concurrent environment, they might hinder availability and scalability
  • Tables tend to grow in size and a moderate result set might easily turn into a very large one. This kind of situation happens in production systems, long after the application code was shipped. Because users can only browse a relatively small portion of the whole result set, pagination is a more scalable data fetching alternative
  • The overly common offset paging is not suitable for large result sets (because the response time increases linearly with the page number) and you should consider keyset pagination when traversing large result sets. Keyset pagination offers a constant response time, insensitive to the relative position of the page being fetched (see the sketch right after this list)
  • Even for batch processing jobs, it’s always safer to restrict processing items to a moderate batch size. Large batches can lead to memory issues or cause long-running transactions, which increase the undo/redo transaction log size
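A minimal keyset pagination sketch, where lastProcessedId is assumed to hold the identifier of the last row from the previously fetched page:

List<Post> nextPage = session.createQuery(
        "select p " +
        "from Post p " +
        "where p.id > :lastProcessedId " +
        "order by p.id")
    .setParameter("lastProcessedId", lastProcessedId)
    .setMaxResults(50)
    .list();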

Testing time

Our domain entity model looks like this:

[Diagram: Post – Comment entity model]

The following test will be used for validating various result set fetching behaviors:

@Test
public void testFetchSize() {
    doInTransaction(session -> {
        int batchSize = batchSize();
        for(int i = 0; i < itemsCount(); i++) {
            Post post = new Post(String.format(
                "Post no. %d", i));
            int j = 0;
            post.addComment(new Comment(
                    String.format(
                "Post comment %d:%d", i, j++)));
            post.addComment(new Comment(
                    String.format(
                "Post comment %d:%d", i, j++)));
            session.persist(post);
            if(i % batchSize == 0 && i > 0) {
                session.flush();
                session.clear();
            }
        }
    });

    long startNanos = System.nanoTime();
    LOGGER.info("Test fetch size");
    doInTransaction(session -> {
        List posts = session.createQuery(
            "select p " +
            "from Post p " +
            "join fetch p.comments ")
        .list();
        LOGGER.info("{}.fetched {} entities",
            getClass().getSimpleName(),
            posts.size());

    });
    LOGGER.info("{}.testFetch took {} millis",
        getClass().getSimpleName(),
        TimeUnit.NANOSECONDS.toMillis(
            System.nanoTime() - startNanos
    ));
}

To configure Hibernate to use an explicit Statement fetchSize, we need to set the following Hibernate property:

properties.put("hibernate.jdbc.fetch_size", fetchSize());

Every test will insert 5000 Post entities, each one having 2 Comments.

Oracle

The first tests are run against Oracle XE 11.2.0.4 with the following results:

Fetch Size Duration [millis]
1 1190
10 640
100 481
1000 459
10000 449
Default (10) 545

The larger the fetch size, the fewer round trips are required for fetching the whole result set. If the returned rows contain many columns, a larger fetch size would require proportionally larger database buffers. According to the Oracle JDBC Driver documentation:

What is “reasonable” depends on the details of the application. Oracle suggests the fetchSize be no more than 100, although in some cases larger size may be appropriate. A fetchSize of 100 may be inappropriately large for some queries even when many rows are returned.

This guideline is validated by our test result, as response time doesn’t improve significantly for fetch sizes larger than 100.

PostgreSQL

The second test round is run against PostgreSQL 9.4 with the following results:

Fetch Size Duration [millis]
1 1181
10 572
100 485
1000 458
10000 437
Default (all) 396

The default fetch size yields the best result, even when the fetchSize is equal to the total number of rows being returned. Since there is no upper-bound buffer limit, the default fetch size can cause OutOfMemoryError issues when retrieving large result sets.

Conclusion

While most database servers don’t impose a default upper limit for the result set fetch size, it’s a good practice to limit the whole result set (if requirements allow it). A limited-size result set addresses the unbounded fetch size shortcoming, while ensuring predictable response times even when the queried data grows gradually. The shorter the queries, the quicker the row-level locks are released and the more scalable the data access layer becomes.
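With Hibernate, capping the result set is just a matter of setting the maximum number of rows, as in the following sketch:

List<Post> posts = session.createQuery(
        "select p " +
        "from Post p " +
        "order by p.id")
    .setMaxResults(100)
    .list();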

Code available on GitHub.


How to batch DELETE statements with Hibernate

Introduction

In my previous post, I explained the Hibernate configurations required for batching INSERT and UPDATE statements. This post will continue this topic with DELETE statements batching.

Domain model entities

We’ll start with the following entity model:

[Diagram: Post – Comment – PostDetails entity model]

The Post entity has a one-to-many association to a Comment and a one-to-one relationship with the PostDetails entity:

@OneToMany(cascade = CascadeType.ALL, mappedBy = "post",
        orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();

@OneToOne(cascade = CascadeType.ALL, mappedBy = "post",
        orphanRemoval = true, fetch = FetchType.LAZY)
private PostDetails details;

The upcoming tests will be run against the following data:

doInTransaction(session -> {
    int batchSize = batchSize();
    for(int i = 0; i < itemsCount(); i++) {
        int j = 0;

        Post post = new Post(String.format(
            "Post no. %d", i));        
        post.addComment(new Comment( String.format(
            "Post comment %d:%d", i, j++)));
        post.addComment(new Comment(String.format(
            "Post comment %d:%d", i, j++)));
        post.addDetails(new PostDetails());

        session.persist(post);
        if(i % batchSize == 0 && i > 0) {
            session.flush();
            session.clear();
        }
    }
});

Hibernate Configuration

As already explained, the following properties are required for batching INSERT and UPDATE statements:

properties.put("hibernate.jdbc.batch_size", 
    String.valueOf(batchSize()));
properties.put("hibernate.order_inserts", 
    "true");
properties.put("hibernate.order_updates", 
    "true");
properties.put("hibernate.jdbc.batch_versioned_data", 
    "true");

Next, we are going to check if DELETE statements are batched as well.

JPA Cascade Delete

Because cascading entity state transitions is convenient, I’m going to prove that CascadeType.DELETE and JDBC batching don’t mix well.

The following test is going to:

  • Select some Posts along with Comments and PostDetails
  • Delete the Posts, while propagating the delete event to Comments and PostDetails as well
@Test
public void testCascadeDelete() {
    LOGGER.info("Test batch delete with cascade");
    final AtomicReference<Long> startNanos = 
        new AtomicReference<>();
    addDeleteBatchingRows();
    doInTransaction(session -> {
        List<Post> posts = session.createQuery(
            "select distinct p " +
            "from Post p " +
            "join fetch p.details d " +
            "join fetch p.comments c")
        .list();
        startNanos.set(System.nanoTime());
        for (Post post : posts) {
            session.delete(post);
        }
    });
    LOGGER.info("{}.testCascadeDelete took {} millis",
        getClass().getSimpleName(),
        TimeUnit.NANOSECONDS.toMillis(
            System.nanoTime() - startNanos.get()
    ));
}

Running this test gives the following output:

Query:{[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][56,0]} 
Query:{[delete from PostDetails where id=?][3]} 
Query:{[delete from Post where id=? and version=?][3,0]} 
Query:{[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][53,0]} 
Query:{[delete from PostDetails where id=?][2]} 
Query:{[delete from Post where id=? and version=?][2,0]} 
Query:{[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,0]} 
Query:{[delete from PostDetails where id=?][1]} 
Query:{[delete from Post where id=? and version=?][1,0]}

Only the Comment DELETE statements were batched, the other entities being deleted in separate database round-trips.

The reason for this behaviour is given by the ActionQueue sorting implementation:

if ( session.getFactory().getSettings().isOrderUpdatesEnabled() ) {
    // sort the updates by pk
    updates.sort();
}
if ( session.getFactory().getSettings().isOrderInsertsEnabled() ) {
    insertions.sort();
}

While INSERTS and UPDATES are covered, DELETE statements are not sorted at all. A JDBC batch can only be reused when all statements belong to the same database table. When an incoming statement targets a different database table, the current batch has to be released, so that the new batch matches the current statement database table:

public Batch getBatch(BatchKey key) {
    if ( currentBatch != null ) {
        if ( currentBatch.getKey().equals( key ) ) {
            return currentBatch;
        }
        else {
            currentBatch.execute();
            currentBatch.release();
        }
    }
    currentBatch = batchBuilder().buildBatch(key, this);
    return currentBatch;
}

Orphan removal and manual flushing

A work-around is to dissociate all Child entities while manually flushing the Hibernate Session before advancing to a new Child association:

@Test
public void testOrphanRemoval() {
    LOGGER.info("Test batch delete with orphan removal");
    final AtomicReference<Long> startNanos = 
        new AtomicReference<>();

    addDeleteBatchingRows();

    doInTransaction(session -> {
        List<Post> posts = session.createQuery(
            "select distinct p " +
            "from Post p " +
            "join fetch p.details d " +
            "join fetch p.comments c")
        .list();

        startNanos.set(System.nanoTime());

        posts.forEach(Post::removeDetails);
        session.flush();

        posts.forEach(post -> {
            for (Iterator<Comment> commentIterator = 
                     post.getComments().iterator(); 
                     commentIterator.hasNext(); ) {
                Comment comment =  commentIterator.next();
                comment.post = null;
                commentIterator.remove();
            }
        });
        session.flush();

        posts.forEach(session::delete);
    });
    LOGGER.info("{}.testOrphanRemoval took {} millis",
        getClass().getSimpleName(),
        TimeUnit.NANOSECONDS.toMillis(
            System.nanoTime() - startNanos.get()
    ));
}

This time all DELETE statements are properly batched:

Query:{[delete from PostDetails where id=?][2]} {[delete from PostDetails where id=?][3]} {[delete from PostDetails where id=?][1]} 
Query:{[delete from Comment where id=? and version=?][53,0]} {[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][56,0]} {[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,
Query:{[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]} {[delete from Post where id=? and version=?][1,0]} 

SQL Cascade Delete

A better solution is to use SQL cascade deletion, instead of JPA entity state propagation mechanism. This way, we can also reduce the DML statements count. Because Hibernate Session acts as a transactional write-behind cache, we must be extra cautious when mixing entity state transitions with database-side automatic actions, as the Persistence Context might not reflect the latest database changes.

The Post entity one-to-many Comment association is marked with the Hibernate specific @OnDelete annotation, so that the auto-generated database schema includes the ON DELETE CASCADE directive:

@OneToMany(cascade = {
       CascadeType.PERSIST, 
       CascadeType.MERGE}, 
    mappedBy = "post")
@OnDelete(action = OnDeleteAction.CASCADE)
private List<Comment> comments = new ArrayList<>();

Generating the following DDL:

alter table Comment add constraint 
FK_apirq8ka64iidc18f3k6x5tc5 foreign key (post_id) 
references Post on delete cascade

The same is done with the PostDetails entity one-to-one Post association:

@OneToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "id")
@MapsId
@OnDelete(action = OnDeleteAction.CASCADE)
private Post post;

And the associated DDL:

alter table PostDetails add constraint 
FK_h14un5v94coafqonc6medfpv8 foreign key (id) 
references Post on delete cascade

The CascadeType.ALL and orphanRemoval were replaced with CascadeType.PERSIST and CascadeType.MERGE, because we no longer want Hibernate to propagate the entity removal event.

The test only deletes the Post entities:

doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select p from Post p")
    .list();
    startNanos.set(System.nanoTime());
    for (Post post : posts) {
        session.delete(post);
    }
});

The DELETE statements are properly batched as there’s only one target table.

Query:{[delete from Post where id=? and version=?][1,0]} {[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]}

Conclusion

If INSERT and UPDATE statements batching is just a matter of configuration, DELETE statements require some additional steps, which may increase the data access layer complexity.

Code available on GitHub.


How to batch INSERT and UPDATE statements with Hibernate

Introduction

JDBC has long been offering support for DML statement batching. By default, all statements are sent one after the other, each one in a separate network round-trip. Batching allows us to send multiple statements in one shot, saving unnecessary socket stream flushing.
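At the JDBC level, batching boils down to the addBatch()/executeBatch() API. A minimal sketch, assuming an already opened Connection with auto-commit disabled:

try (PreparedStatement statement = connection.prepareStatement(
        "insert into Post (title, version, id) values (?, ?, ?)")) {
    for (int i = 0; i < 3; i++) {
        statement.setString(1, String.format("Post no. %d", i));
        statement.setInt(2, 0);
        statement.setLong(3, i + 1);
        // queue the statement instead of executing it right away
        statement.addBatch();
    }
    // a single executeBatch() call sends all queued statements
    int[] updateCounts = statement.executeBatch();
}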

Hibernate hides the database statements behind a transactional write-behind abstraction layer. An intermediate layer allows us to hide the JDBC batching semantics from the persistence layer logic. This way, we can change the JDBC batching strategy without altering the data access code.

Configuring Hibernate to support JDBC batching is not as easy as it should be, so I’m going to explain everything you need to do in order to make it work.

Testing time

We’ll start with the following entity model:

[Diagram: Post – Comment entity model]

The Post has a one-to-many association with the Comment entity:

@OneToMany(
    cascade = CascadeType.ALL, 
    mappedBy = "post", 
    orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();

Our test scenario issues both INSERT and UPDATE statements, so we can validate if JDBC batching is being used:

LOGGER.info("Test batch insert");
long startNanos = System.nanoTime();
doInTransaction(session -> {
    int batchSize = batchSize();
    for(int i = 0; i < itemsCount(); i++) {
        Post post = new Post(
            String.format("Post no. %d", i)
        );
        int j = 0;
        post.addComment(new Comment(
                String.format(
                    "Post comment %d:%d", i, j++
        )));
        post.addComment(new Comment(
                String.format(
                     "Post comment %d:%d", i, j++
        )));
        session.persist(post);
        if(i % batchSize == 0 && i > 0) {
            session.flush();
            session.clear();
        }
    }
});
LOGGER.info("{}.testInsert took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(
        System.nanoTime() - startNanos
    ));

LOGGER.info("Test batch update");
startNanos = System.nanoTime();

doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select distinct p " +
        "from Post p " +
        "join fetch p.comments c")
    .list();

    for(Post post : posts) {
        post.title = "Blog " + post.title;
        for(Comment comment : post.comments) {
            comment.review = "Blog " + comment.review;
        }
    }
});

LOGGER.info("{}.testUpdate took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(
        System.nanoTime() - startNanos
    ));

This test will persist a configurable number of Post entities, each one containing two Comments. For the sake of brevity, we are going to persist 3 Posts, using the Dialect default batch size:

protected int itemsCount() {
    return 3;
}

protected int batchSize() {
    return Integer.valueOf(Dialect.DEFAULT_BATCH_SIZE);
}

Default batch support

Hibernate doesn’t implicitly employ JDBC batching and each INSERT and UPDATE statement is executed separately:

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} 
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} 
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]} 
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]} 
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]} 

Configuring hibernate.jdbc.batch_size

To enable JDBC batching, we have to configure the hibernate.jdbc.batch_size property:

A non-zero value enables use of JDBC2 batch updates by Hibernate (e.g. recommended values between 5 and 30)

We’ll set this property and rerun our test:

properties.put("hibernate.jdbc.batch_size", 
    String.valueOf(batchSize()));

This time, the Comment INSERT statements are batched, while the UPDATE statements are left untouched:

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} 
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} 
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}

A JDBC batch can target one table only, so every new DML statement targeting a different table ends the current batch and initiates a new one. Mixing different table statements is therefore undesirable when using SQL batch processing.

Ordering statements

Hibernate can sort INSERT and UPDATE statements using the following configuration options:

properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");

While the Post and Comment INSERT statements are batched accordingly, the UPDATE statements are still executed separately:

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}

Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} 
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} 
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]} 

Adding version data batch support

There’s the hibernate.jdbc.batch_versioned_data configuration property we need to set, in order to enable UPDATE batching:

Set this property to true if your JDBC driver returns correct row counts from executeBatch(). It is usually safe to turn this option on. Hibernate will then use batched DML for automatically versioned data. Defaults to false.

We will rerun our test with this property set too:

properties.put("hibernate.jdbc.batch_versioned_data", "true");

Now both the INSERT and the UPDATE statements are properly batched:

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}

Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]} 

Benchmark

Now that we managed to configure Hibernate for JDBC batching, we can benchmark the performance gain of statement grouping.

  • the test case uses a PostgreSQL database installed on the same machine with the currently running JVM
  • a batch size of 50 was chosen and each test iteration increases the statement count by an order of magnitude
  • all durations are expressed in milliseconds
Statement count | No batch Insert duration | No batch Update duration | Batch Insert duration | Batch Update duration
30 | 218 | 178 | 191 | 144
300 | 311 | 327 | 208 | 217
3000 | 1047 | 1089 | 556 | 478
30000 | 5889 | 6032 | 2640 | 2301
300000 | 51785 | 57869 | 16052 | 20954

The more rows we INSERT or UPDATE, the more we can benefit from JDBC batching. For write-mostly applications (e.g. enterprise batch processors), we should definitely enable JDBC batching, as the performance benefits might be staggering.

Code available on GitHub.


Hibernate CascadeType.LOCK gotchas

Introduction

Having introduced Hibernate explicit locking support, as well as Cascade Types, it’s time to analyze the CascadeType.LOCK behavior.

A Hibernate lock request triggers an internal LockEvent. The associated DefaultLockEventListener may cascade the lock request to the locking entity children.

Since CascadeType.ALL includes CascadeType.LOCK too, it’s worth understanding when a lock request propagates from a Parent to a Child entity.
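Besides inheriting the lock cascade from CascadeType.ALL, an association can also opt in selectively, using the Hibernate-specific @Cascade annotation. A minimal sketch (not part of the test mappings used below):

@OneToMany(mappedBy = "post")
@Cascade(org.hibernate.annotations.CascadeType.LOCK)
private List<Comment> comments = new ArrayList<>();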

Testing time

We’ll start with the following entity model:

[Diagram: Post – Comment – PostDetails entity model]

The Post is the Parent entity of both the PostDetails one-to-one association and the Comment one-to-many relationship, and these associations are marked with CascadeType.ALL:

@OneToMany(
    cascade = CascadeType.ALL, 
    mappedBy = "post", 
    orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();

@OneToOne(
    cascade = CascadeType.ALL, 
    mappedBy = "post",
    fetch = FetchType.LAZY)
private PostDetails details;

All the upcoming test cases will use the following entity model graph:

doInTransaction(session -> {
    Post post = new Post();
    post.setName("Hibernate Master Class");

    post.addDetails(new PostDetails());
    post.addComment(new Comment("Good post!"));
    post.addComment(new Comment("Nice post!"));

    session.persist(post);
});

Locking managed entities

A managed entity is loaded in the current running Persistence Context and all entity state changes are translated to DML statements.

When a managed Parent entity is being locked:

doInTransaction(session -> {
    Post post = (Post) session.createQuery(
        "select p " +
        "from Post p " +
        "join fetch p.details " +
        "where " +
        "   p.id = :id")
    .setParameter("id", 1L)
    .uniqueResult();
    session.buildLockRequest(
        new LockOptions(LockMode.PESSIMISTIC_WRITE))
    .lock(post);
});

Only the Parent entity gets locked, the cascade being therefore prevented:

select id from Post where id = 1 for update

Hibernate defines a scope LockOptions attribute, which (according to its Javadoc) should allow a lock request to be propagated to Child entities:

“scope” is a JPA defined term. It is basically a cascading of the lock to associations.

session.buildLockRequest(
    new LockOptions(LockMode.PESSIMISTIC_WRITE))
.setScope(true)
.lock(post);

Setting the scope flag doesn’t change anything, only the managed entity being locked:

select id from Post where id = 1 for update

Locking detached entities

Apart from entity locking, the lock request can reassociate detached entities too. To prove this, we are going to check the Post entity graph before and after the lock entity request:

void containsPost(Session session, 
    Post post, boolean expected) {
    assertEquals(expected, 
        session.contains(post));
    assertEquals(expected, 
        session.contains(post.getDetails()));
    for(Comment comment : post.getComments()) {
        assertEquals(expected, 
            session.contains(comment));
    }
}

The following test demonstrates how CascadeType.LOCK works for detached entities:

//Load the Post entity, which will become detached
Post post = doInTransaction(session -> 
   (Post) session.createQuery(
        "select p " +
        "from Post p " +
        "join fetch p.details " +
        "join fetch p.comments " +
        "where " +
        "   p.id = :id")
.setParameter("id", 1L)
.uniqueResult());

//Change the detached entity state
post.setName("Hibernate Training");
doInTransaction(session -> {
    //The Post entity graph is detached
    containsPost(session, post, false);

    //The Lock request associates 
    //the entity graph and locks the requested entity
    session.buildLockRequest(
        new LockOptions(LockMode.PESSIMISTIC_WRITE))
    .lock(post);
    
    //Hibernate doesn't know if the entity is dirty
    assertEquals("Hibernate Training", 
        post.getName());

    //The Post entity graph is attached
    containsPost(session, post, true);
});
doInTransaction(session -> {
    //The detached Post entity changes have been lost
    Post _post = (Post) session.get(Post.class, 1L);
    assertEquals("Hibernate Master Class", 
        _post.getName());
});

The lock request reassociates the entity graph, but the current running Hibernate Session is unaware the entity became dirty, while in detached state. The entity is just reattached without forcing an UPDATE, or selecting the current database state for further comparison.

Once the entity becomes managed, any further change will be detected by the dirty checking mechanism and the flush will propagate the changes made prior to reattachment as well. If no change happens while the entity is managed, the entity will not be scheduled for flushing.

If we want to make sure the detached entity state is always synchronized with the database, we need to use merge or update.
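A minimal sketch, reusing the detached post instance from the test above and forcing the state synchronization with update() (merge() would work just as well):

doInTransaction(session -> {
    // update() reattaches the detached instance and schedules an UPDATE,
    // so the detached changes are pushed to the database at flush time
    session.update(post);
});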

For detached entities, the lock request is propagated to the associated Child entities when the scope option is set to true:

session.buildLockRequest(
    new LockOptions(LockMode.PESSIMISTIC_WRITE))
.setScope(true)
.lock(post);

The Post entity lock event is propagated to all Child entities (since we are using CascadeType.ALL):

select id from Comment where id = 1 for update
select id from Comment where id = 2 for update
select id from PostDetails where id = 1 for update
select id from Post where id = 1 for update

Conclusion

Lock cascading is far from straightforward or intuitive. Explicit locking requires diligence (the more locks we acquire, the greater the chance of deadlocking), and you are better off retaining full control over Child entity lock propagation anyway. Analogous to concurrency programming best practices, manual locking is therefore preferred over automatic lock propagation.

Code available on GitHub.


A beginner’s guide to JPA and Hibernate Cascade Types

Introduction

JPA translates entity state transitions to database DML statements. Because it’s common to operate on entity graphs, JPA allows us to propagate entity state changes from Parents to Child entities.

This behavior is configured through the CascadeType mappings.

JPA vs Hibernate Cascade Types

Hibernate supports all JPA Cascade Types and some additional legacy cascading styles. The following table draws an association between JPA Cascade Types and their Hibernate native API equivalent:

JPA EntityManager action | JPA CascadeType | Hibernate native Session action | Hibernate native CascadeType | Event Listener
detach(entity) | DETACH | evict(entity) | DETACH or EVICT | Default Evict Event Listener
merge(entity) | MERGE | merge(entity) | MERGE | Default Merge Event Listener
persist(entity) | PERSIST | persist(entity) | PERSIST | Default Persist Event Listener
refresh(entity) | REFRESH | refresh(entity) | REFRESH | Default Refresh Event Listener
remove(entity) | REMOVE | delete(entity) | REMOVE or DELETE | Default Delete Event Listener
(none) | (none) | saveOrUpdate(entity) | SAVE_UPDATE | Default Save Or Update Event Listener
(none) | (none) | replicate(entity, replicationMode) | REPLICATE | Default Replicate Event Listener
lock(entity, lockModeType) | (none) | buildLockRequest(entity, lockOptions) | LOCK | Default Lock Event Listener
All the above EntityManager methods | ALL | All the above Hibernate Session methods | ALL | (all of the above)

From this table we can conclude that:

  • There’s no difference between calling persist, merge or refresh on the JPA EntityManager or the Hibernate Session.
  • The JPA remove and detach calls are delegated to Hibernate delete and evict native operations.
  • Only Hibernate supports replicate and saveOrUpdate. While replicate is useful for some very specific scenarios (when the exact entity state needs to be mirrored between two distinct DataSources), the persist and merge combo is always a better alternative than the native saveOrUpdate operation.

    As a rule of thumb, you should always use persist for TRANSIENT entities and merge for DETACHED ones (see the sketch right after this list).

    The saveOrUpdate shortcomings (when passing a detached entity snapshot to a Session already managing this entity) led to the merge operation’s predecessor: the now-extinct saveOrUpdateCopy operation.

  • The JPA lock method shares the same behaviour as the Hibernate lock request method.
  • The JPA CascadeType.ALL doesn’t only apply to EntityManager state change operations, but to all Hibernate CascadeTypes as well.

    So if you mapped your associations with CascadeType.ALL, you can still cascade Hibernate specific events. For example, you can cascade the JPA lock operation (although it behaves as reattaching, instead of an actual lock request propagation), even if JPA doesn’t define a CascadeType.LOCK.
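A minimal sketch of the persist/merge rule of thumb, assuming an injected EntityManager and a previously loaded detachedPost instance:

// TRANSIENT: a brand-new instance, unknown to any Persistence Context
Post newPost = new Post();
newPost.setName("Hibernate Master Class");
entityManager.persist(newPost);

// DETACHED: an instance loaded by a previous Persistence Context
detachedPost.setName("Hibernate Master Class Training Material");
Post managedPost = entityManager.merge(detachedPost);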

Cascading best practices

Cascading only makes sense for Parent – Child associations (the Parent entity state transition being cascaded to its Child entities). Cascading from Child to Parent is not very useful and is usually a mapping code smell.

Next, I’m going to analyse the cascading behaviour of all JPA Parent – Child associations.

One-To-One

The most common One-To-One bidirectional association looks like this:

@Entity
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String name;

    @OneToOne(mappedBy = "post",
        cascade = CascadeType.ALL, orphanRemoval = true)
    private PostDetails details;

    public Long getId() {
        return id;
    }

    public PostDetails getDetails() {
        return details;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void addDetails(PostDetails details) {
        this.details = details;
        details.setPost(this);
    }

    public void removeDetails() {
        if (details != null) {
            details.setPost(null);
        }
        this.details = null;
    }
}

@Entity
public class PostDetails {

    @Id
    private Long id;

    @Column(name = "created_on")
    @Temporal(TemporalType.TIMESTAMP)
    private Date createdOn = new Date();

    private boolean visible;

    @OneToOne
    @JoinColumn(name = "id")
    @MapsId
    private Post post;

    public Long getId() {
        return id;
    }

    public void setVisible(boolean visible) {
        this.visible = visible;
    }

    public void setPost(Post post) {
        this.post = post;
    }
}

The Post entity plays the Parent role and the PostDetails is the Child.

The bidirectional associations should always be updated on both sides, therefore the Parent side should contain the addChild and removeChild combo. These methods ensure we always synchronize both sides of the association, to avoid object or relational data corruption issues.

In this particular case, the CascadeType.ALL and orphan removal make sense because the PostDetails life-cycle is bound to that of its Post Parent entity.

Cascading the one-to-one persist operation

The CascadeType.PERSIST comes along with the CascadeType.ALL configuration, so we only have to persist the Post entity, and the associated PostDetails entity is persisted as well:

Post post = new Post();
post.setName("Hibernate Master Class");

PostDetails details = new PostDetails();

post.addDetails(details);

session.persist(post);

Generating the following output:

INSERT INTO post(id, NAME) 
VALUES (DEFAULT, 'Hibernate Master Class')

insert into PostDetails (id, created_on, visible) 
values (1, '2015-03-03 10:17:19.14', false)

Cascading the one-to-one merge operation

The CascadeType.MERGE is inherited from the CascadeType.ALL setting, so we only have to merge the Post entity and the associated PostDetails is merged as well:

Post post = newPost();
post.setName("Hibernate Master Class Training Material");
post.getDetails().setVisible(true);

doInTransaction(session -> {
    session.merge(post);
});

The merge operation generates the following output:

SELECT onetooneca0_.id     AS id1_3_1_,
   onetooneca0_.NAME       AS name2_3_1_,
   onetooneca1_.id         AS id1_4_0_,
   onetooneca1_.created_on AS created_2_4_0_,
   onetooneca1_.visible    AS visible3_4_0_
FROM   post onetooneca0_
LEFT OUTER JOIN postdetails onetooneca1_ 
    ON onetooneca0_.id = onetooneca1_.id
WHERE  onetooneca0_.id = 1

UPDATE postdetails SET 
    created_on = '2015-03-03 10:20:53.874', visible = true 
WHERE  id = 1

UPDATE post SET 
    NAME = 'Hibernate Master Class Training Material' 
WHERE  id = 1

Cascading the one-to-one delete operation

The CascadeType.REMOVE is also inherited from the CascadeType.ALL configuration, so the Post entity deletion triggers a PostDetails entity removal too:

Post post = newPost();

doInTransaction(session -> {
    session.delete(post);
});

Generating the following output:

delete from PostDetails where id = 1
delete from Post where id = 1

The one-to-one delete orphan cascading operation

If a Child entity is dissociated from its Parent, the Child Foreign Key is set to NULL. If we want to have the Child row deleted as well, we have to use the orphan removal support.

doInTransaction(session -> {
    Post post = (Post) session.get(Post.class, 1L);
    post.removeDetails();
});

The orphan removal generates this output:

SELECT onetooneca0_.id         AS id1_3_0_,
       onetooneca0_.NAME       AS name2_3_0_,
       onetooneca1_.id         AS id1_4_1_,
       onetooneca1_.created_on AS created_2_4_1_,
       onetooneca1_.visible    AS visible3_4_1_
FROM   post onetooneca0_
LEFT OUTER JOIN postdetails onetooneca1_
    ON onetooneca0_.id = onetooneca1_.id
WHERE  onetooneca0_.id = 1

delete from PostDetails where id = 1

Unidirectional one-to-one association

Most often, the Parent entity is the inverse side (e.g. mappedBy), with the Child controlling the association through its Foreign Key. But cascading is not limited to bidirectional associations; we can also use it for unidirectional relationships:

@Entity
public class Commit {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String comment;

    @OneToOne(cascade = CascadeType.ALL)
    @JoinTable(
        name = "Branch_Merge_Commit",
        joinColumns = @JoinColumn(
            name = "commit_id", 
            referencedColumnName = "id"),
        inverseJoinColumns = @JoinColumn(
            name = "branch_merge_id", 
            referencedColumnName = "id")
    )
    private BranchMerge branchMerge;

    public Commit() {
    }

    public Commit(String comment) {
        this.comment = comment;
    }

    public Long getId() {
        return id;
    }

    public void addBranchMerge(
        String fromBranch, String toBranch) {
        this.branchMerge = new BranchMerge(
             fromBranch, toBranch);
    }

    public void removeBranchMerge() {
        this.branchMerge = null;
    }
}

@Entity
public class BranchMerge {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String fromBranch;

    private String toBranch;

    public BranchMerge() {
    }

    public BranchMerge(
        String fromBranch, String toBranch) {
        this.fromBranch = fromBranch;
        this.toBranch = toBranch;
    }

    public Long getId() {
        return id;
    }
}

Cascading consists of propagating the Parent entity state transition to one or more Child entities, and it can be used for both unidirectional and bidirectional associations.
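
As a quick illustration (a sketch only, using the same doInTransaction helper as the other examples in this article), persisting a Commit is enough to cascade the persist operation to its BranchMerge:

doInTransaction(session -> {
    Commit commit = new Commit("Initial commit");
    commit.addBranchMerge("master", "feature/cascading");
    // CascadeType.ALL cascades the persist from the Commit to the BranchMerge,
    // so the BranchMerge row and the Branch_Merge_Commit join row are inserted as well
    session.persist(commit);
});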

One-To-Many

The most common Parent–Child association consists of a one-to-many and a many-to-one relationship, with the cascade being useful for the one-to-many side only:

@Entity
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String name;

    @OneToMany(cascade = CascadeType.ALL, 
        mappedBy = "post", orphanRemoval = true)
    private List<Comment> comments = new ArrayList<>();

    public void setName(String name) {
        this.name = name;
    }

    public List<Comment> getComments() {
        return comments;
    }

    public void addComment(Comment comment) {
        comments.add(comment);
        comment.setPost(this);
    }

    public void removeComment(Comment comment) {
        comment.setPost(null);
        this.comments.remove(comment);
    }
}

@Entity
public class Comment {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    @ManyToOne
    private Post post;

    private String review;

    public void setPost(Post post) {
        this.post = post;
    }

    public String getReview() {
        return review;
    }

    public void setReview(String review) {
        this.review = review;
    }
}

Like in the one-to-one example, the CascadeType.ALL and orphan removal are suitable because the Comment life-cycle is bound to that of its Post Parent entity.

Cascading the one-to-many persist operation

We only have to persist the Post entity and all the associated Comment entities are persisted as well:

Post post = new Post();
post.setName("Hibernate Master Class");

Comment comment1 = new Comment();
comment1.setReview("Good post!");
Comment comment2 = new Comment();
comment2.setReview("Nice post!");

post.addComment(comment1);
post.addComment(comment2);

session.persist(post);

The persist operation generates the following output:

insert into Post (id, name) 
values (default, 'Hibernate Master Class')

insert into Comment (id, post_id, review) 
values (default, 1, 'Good post!')

insert into Comment (id, post_id, review) 
values (default, 1, 'Nice post!')

Cascading the one-to-many merge operation

Merging the Post entity is going to merge all Comment entities as well:

Post post = newPost();
post.setName("Hibernate Master Class Training Material");

post.getComments()
    .stream()
    .filter(comment -> comment.getReview().toLowerCase()
         .contains("nice"))
    .findAny()
    .ifPresent(comment -> 
        comment.setReview("Keep up the good work!")
);

doInTransaction(session -> {
    session.merge(post);
});

Generating the following output:

SELECT onetomanyc0_.id    AS id1_1_1_,
       onetomanyc0_.NAME  AS name2_1_1_,
       comments1_.post_id AS post_id3_1_3_,
       comments1_.id      AS id1_0_3_,
       comments1_.id      AS id1_0_0_,
       comments1_.post_id AS post_id3_0_0_,
       comments1_.review  AS review2_0_0_
FROM   post onetomanyc0_
LEFT OUTER JOIN comment comments1_
    ON onetomanyc0_.id = comments1_.post_id
WHERE  onetomanyc0_.id = 1

update Post set 
    name = 'Hibernate Master Class Training Material' 
where id = 1

update Comment set 
    post_id = 1, 
    review='Keep up the good work!' 
where id = 2

Cascading the one-to-many delete operation

When the Post entity is deleted, the associated Comment entities are deleted as well:

Post post = newPost();

doInTransaction(session -> {
    session.delete(post);
});

Generating the following output:

delete from Comment where id = 1
delete from Comment where id = 2
delete from Post where id = 1

The one-to-many delete orphan cascading operation

Orphan removal allows us to remove the Child entity whenever it’s no longer referenced by its Parent:

newPost();

doInTransaction(session -> {
    Post post = (Post) session.createQuery(
        "select p " +
                "from Post p " +
                "join fetch p.comments " +
                "where p.id = :id")
        .setParameter("id", 1L)
        .uniqueResult();
    post.removeComment(post.getComments().get(0));
});

The Comment is deleted, as we can see in the following output:

SELECT onetomanyc0_.id    AS id1_1_0_,
       comments1_.id      AS id1_0_1_,
       onetomanyc0_.NAME  AS name2_1_0_,
       comments1_.post_id AS post_id3_0_1_,
       comments1_.review  AS review2_0_1_,
       comments1_.post_id AS post_id3_1_0__,
       comments1_.id      AS id1_0_0__
FROM   post onetomanyc0_
INNER JOIN comment comments1_
    ON onetomanyc0_.id = comments1_.post_id
WHERE  onetomanyc0_.id = 1

delete from Comment where id = 1

Many-To-Many

The many-to-many relationship is tricky because each side of this association plays both the Parent and the Child role. Still, we can identify one side from where we’d like to propagate the entity state changes.

We shouldn’t default to CascadeType.ALL, because the CascadeType.REMOVE might end up deleting more than we’re expecting (as you’ll soon find out):

@Entity
public class Author {

    @Id
    @GeneratedValue(strategy=GenerationType.AUTO)
    private Long id;

    @Column(name = "full_name", nullable = false)
    private String fullName;

    @ManyToMany(mappedBy = "authors", 
        cascade = {CascadeType.PERSIST, CascadeType.MERGE})
    private List<Book> books = new ArrayList<>();

    private Author() {}

    public Author(String fullName) {
        this.fullName = fullName;
    }

    public Long getId() {
        return id;
    }

    public List<Book> getBooks() {
        return books;
    }

    public void addBook(Book book) {
        books.add(book);
        book.getAuthors().add(this);
    }

    public void removeBook(Book book) {
        books.remove(book);
        book.getAuthors().remove(this);
    }

    public void remove() {
        for(Book book : new ArrayList<>(books)) {
            removeBook(book);
        }
    }
}

@Entity
public class Book {

    @Id
    @GeneratedValue(strategy=GenerationType.AUTO)
    private Long id;

    @Column(name = "title", nullable = false)
    private String title;

    @ManyToMany(cascade = 
        {CascadeType.PERSIST, CascadeType.MERGE})
    @JoinTable(name = "Book_Author",
        joinColumns = {
            @JoinColumn(
                name = "book_id", 
                referencedColumnName = "id"
            )
        },
        inverseJoinColumns = {
            @JoinColumn(
                name = "author_id", 
                referencedColumnName = "id"
            )
        }
    )
    private List<Author> authors = new ArrayList<>();

    private Book() {}

    public Book(String title) {
        this.title = title;
    }

    // exposed so the Author side can keep both ends of the association in sync
    public List<Author> getAuthors() {
        return authors;
    }
}

Cascading the many-to-many persist operation

Persisting the Author entities will persist the Books as well:

Author _John_Smith = new Author("John Smith");
Author _Michelle_Diangello = 
    new Author("Michelle Diangello");
Author _Mark_Armstrong = 
    new Author("Mark Armstrong");

Book _Day_Dreaming = new Book("Day Dreaming");
Book _Day_Dreaming_2nd = 
    new Book("Day Dreaming, Second Edition");

_John_Smith.addBook(_Day_Dreaming);
_Michelle_Diangello.addBook(_Day_Dreaming);

_John_Smith.addBook(_Day_Dreaming_2nd);
_Michelle_Diangello.addBook(_Day_Dreaming_2nd);
_Mark_Armstrong.addBook(_Day_Dreaming_2nd);

session.persist(_John_Smith);
session.persist(_Michelle_Diangello);
session.persist(_Mark_Armstrong);

The Book and the Book_Author rows are inserted along with the Authors:

insert into Author (id, full_name) 
values (default, 'John Smith')

insert into Book (id, title) 
values (default, 'Day Dreaming')

insert into Author (id, full_name) 
values (default, 'Michelle Diangello')

insert into Book (id, title) 
values (default, 'Day Dreaming, Second Edition')

insert into Author (id, full_name) 
values (default, 'Mark Armstrong')

insert into Book_Author (book_id, author_id) values (1, 1)
insert into Book_Author (book_id, author_id) values (1, 2)
insert into Book_Author (book_id, author_id) values (2, 1)
insert into Book_Author (book_id, author_id) values (2, 2)
insert into Book_Author (book_id, author_id) values (3, 1)

Dissociating one side of the many-to-many association

To delete an Author, we need to dissociate all Book_Author relations belonging to the removable entity:

doInTransaction(session -> {
    Author _Mark_Armstrong =
        getByName(session, "Mark Armstrong");
    _Mark_Armstrong.remove();
    session.delete(_Mark_Armstrong);
});

This use case generates the following output:

SELECT manytomany0_.id        AS id1_0_0_,
       manytomany2_.id        AS id1_1_1_,
       manytomany0_.full_name AS full_nam2_0_0_,
       manytomany2_.title     AS title2_1_1_,
       books1_.author_id      AS author_i2_0_0__,
       books1_.book_id        AS book_id1_2_0__
FROM   author manytomany0_
INNER JOIN book_author books1_
	ON manytomany0_.id = books1_.author_id
INNER JOIN book manytomany2_
	ON books1_.book_id = manytomany2_.id
WHERE  manytomany0_.full_name = 'Mark Armstrong'

SELECT books0_.author_id  AS author_i2_0_0_,
       books0_.book_id    AS book_id1_2_0_,
       manytomany1_.id    AS id1_1_1_,
       manytomany1_.title AS title2_1_1_
FROM   book_author books0_
INNER JOIN book manytomany1_
    ON books0_.book_id = manytomany1_.id
WHERE  books0_.author_id = 2

delete from Book_Author where book_id = 2

insert into Book_Author (book_id, author_id) values (2, 1)
insert into Book_Author (book_id, author_id) values (2, 2)

delete from Author where id = 3 

The many-to-many association generates way too many redundant SQL statements, and they are often very difficult to tune. Next, I’m going to demonstrate the hidden dangers of the many-to-many CascadeType.REMOVE.

The many-to-many CascadeType.REMOVE gotchas

The many-to-many CascadeType.ALL is another code smell I often bump into while reviewing code. The CascadeType.REMOVE is automatically inherited when using CascadeType.ALL, but the entity removal is not only applied to the link table, but to the other side of the association as well.

Let’s change the Author entity’s books many-to-many association to use CascadeType.ALL instead:

@ManyToMany(mappedBy = "authors", 
    cascade = CascadeType.ALL)
private List<Book> books = new ArrayList<>();

When deleting one Author:

doInTransaction(session -> {
    Author _Mark_Armstrong = 
        getByName(session, "Mark Armstrong");
    session.delete(_Mark_Armstrong);
    Author _John_Smith = 
        getByName(session, "John Smith");
    assertEquals(1, _John_Smith.getBooks().size());
});

All Books belonging to the deleted Author get deleted, even if other Authors are still associated with the deleted Books:

SELECT manytomany0_.id        AS id1_0_,
       manytomany0_.full_name AS full_nam2_0_
FROM   author manytomany0_
WHERE  manytomany0_.full_name = 'Mark Armstrong'  

SELECT books0_.author_id  AS author_i2_0_0_,
       books0_.book_id    AS book_id1_2_0_,
       manytomany1_.id    AS id1_1_1_,
       manytomany1_.title AS title2_1_1_
FROM   book_author books0_
INNER JOIN book manytomany1_ ON 
       books0_.book_id = manytomany1_.id
WHERE  books0_.author_id = 3  

delete from Book_Author where book_id=2
delete from Book where id=2
delete from Author where id=3

Most often, this behavior doesn’t match the business logic expectations, and it’s usually discovered only upon the first entity removal.

We can push this issue even further if we set CascadeType.ALL on the Book entity side as well:

@ManyToMany(cascade = CascadeType.ALL)
@JoinTable(name = "Book_Author",
    joinColumns = {
        @JoinColumn(
            name = "book_id", 
            referencedColumnName = "id"
        )
    },
    inverseJoinColumns = {
        @JoinColumn(
            name = "author_id", 
            referencedColumnName = "id"
        )
    }
)

This time, not only are the Books deleted, but the Authors are deleted as well:

doInTransaction(session -> {
    Author _Mark_Armstrong = 
        getByName(session, "Mark Armstrong");
    session.delete(_Mark_Armstrong);
    Author _John_Smith = 
        getByName(session, "John Smith");
    assertNull(_John_Smith);
});

The Author removal triggers the deletion of all associated Books, which further triggers the removal of all associated Authors. This is a very dangerous operation, resulting in a massive entity deletion that’s rarely the expected behavior.

SELECT manytomany0_.id        AS id1_0_,
       manytomany0_.full_name AS full_nam2_0_
FROM   author manytomany0_
WHERE  manytomany0_.full_name = 'Mark Armstrong'  

SELECT books0_.author_id  AS author_i2_0_0_,
       books0_.book_id    AS book_id1_2_0_,
       manytomany1_.id    AS id1_1_1_,
       manytomany1_.title AS title2_1_1_
FROM   book_author books0_
INNER JOIN book manytomany1_
   ON books0_.book_id = manytomany1_.id
WHERE  books0_.author_id = 3  

SELECT authors0_.book_id      AS book_id1_1_0_,
       authors0_.author_id    AS author_i2_2_0_,
       manytomany1_.id        AS id1_0_1_,
       manytomany1_.full_name AS full_nam2_0_1_
FROM   book_author authors0_
INNER JOIN author manytomany1_
   ON authors0_.author_id = manytomany1_.id
WHERE  authors0_.book_id = 2  

SELECT books0_.author_id  AS author_i2_0_0_,
       books0_.book_id    AS book_id1_2_0_,
       manytomany1_.id    AS id1_1_1_,
       manytomany1_.title AS title2_1_1_
FROM   book_author books0_
INNER JOIN book manytomany1_
   ON books0_.book_id = manytomany1_.id
WHERE  books0_.author_id = 1 

SELECT authors0_.book_id      AS book_id1_1_0_,
       authors0_.author_id    AS author_i2_2_0_,
       manytomany1_.id        AS id1_0_1_,
       manytomany1_.full_name AS full_nam2_0_1_
FROM   book_author authors0_
INNER JOIN author manytomany1_
   ON authors0_.author_id = manytomany1_.id
WHERE  authors0_.book_id = 1  

SELECT books0_.author_id  AS author_i2_0_0_,
       books0_.book_id    AS book_id1_2_0_,
       manytomany1_.id    AS id1_1_1_,
       manytomany1_.title AS title2_1_1_
FROM   book_author books0_
INNER JOIN book manytomany1_
   ON books0_.book_id = manytomany1_.id
WHERE  books0_.author_id = 2  

delete from Book_Author where book_id=2
delete from Book_Author where book_id=1
delete from Author where id=2
delete from Book where id=1
delete from Author where id=1 
delete from Book where id=2
delete from Author where id=3

This use case is wrong in so many ways. There is a plethora of unnecessary SELECT statements, and eventually we end up deleting all Authors and all their Books. That’s why CascadeType.ALL should raise your eyebrows whenever you spot it on a many-to-many association.

When it comes to Hibernate mappings, you should always strive for simplicity. The Hibernate documentation confirms this assumption as well:

Practical test cases for real many-to-many associations are rare. Most of the time you need additional information stored in the link table. In this case, it is much better to use two one-to-many associations to an intermediate link class. In fact, most associations are one-to-many and many-to-one. For this reason, you should proceed cautiously when using any other association style.
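
For reference, a minimal sketch of such an intermediate link entity could look like the following (the BookAuthor name and its createdOn column are illustrative only; Book and Author would then each map a @OneToMany association to it instead of the direct @ManyToMany):

@Entity
public class BookAuthor {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    // each link row points to exactly one Book and one Author
    @ManyToOne(fetch = FetchType.LAZY)
    private Book book;

    @ManyToOne(fetch = FetchType.LAZY)
    private Author author;

    // the extra information that usually justifies the link entity
    @Column(name = "created_on")
    @Temporal(TemporalType.TIMESTAMP)
    private Date createdOn = new Date();

    public BookAuthor() {
    }

    public BookAuthor(Book book, Author author) {
        this.book = book;
        this.author = author;
    }
}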

Conclusion

Cascading is a handy ORM feature, but it’s not free of issues. You should only cascade from Parent entities to Children and not the other way around. You should use only the cascade operations that are demanded by your business logic requirements, rather than turning CascadeType.ALL into the default entity state propagation configuration for every Parent–Child association.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, consider following my blog.