How to batch DELETE statements with Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In my previous post, I explained the Hibernate configurations required for batching INSERT and UPDATE statements. This post will continue this topic with DELETE statements batching.

Domain model entities

We’ll start with the following entity model:

PostCommentDetails Batch Delete

The Post entity has a one-to-many association to a Comment and a one-to-one relationship with the PostDetails entity:

@OneToMany(
    mappedBy = "post",
    cascade = CascadeType.ALL, 
    orphanRemoval = true
)
private List<Comment> comments = new ArrayList<>();

@OneToOne(
    mappedBy = "post",
    fetch = FetchType.LAZY,
    cascade = CascadeType.ALL, 
    orphanRemoval = true
)
private PostDetails details;

The upcoming tests will be run against the following data:

int batchSize = batchSize();

for(int i = 0; i < itemsCount(); i++) {
    int j = 0;

    Post post = new Post(
        String.format("Post no. %d", i)
    );        
    
    post.addComment(
        new Comment(
            String.format("Post comment %d:%d", i, j++)
        )
    );
    post.addComment(
        new Comment(
            String.format("Post comment %d:%d", i, j++)
        )
    );
    
    post.addDetails(new PostDetails());

    session.persist(post);
    
    if(i % batchSize == 0 && i > 0) {
        session.flush();
        session.clear();
    }
}

Hibernate Configuration

As already explained, the following properties are required for batching INSERT and UPDATE statements:

<property name="hibernate.jdbc.batch_size"
          value="25"
/>

<property name="hibernate.order_inserts"
          value="true"
/>

<property name="hibernate.order_updates"
          value="true"
/>

<property name="hibernate.jdbc.batch_versioned_data"
          value="true"
/>

Next, we are going to check if DELETE statements are batched as well.

JPA Cascade Delete

Because cascading entity state transitions is convenient, I’m going to prove that CascadeType.DELETE and JDBC batching don’t mix well.

The following tests is going to:

  • Select some Posts along with Comments and PostDetails
  • Delete the Posts, while propagating the delete event to Comments and PostDetails as well
LOGGER.info("Test batch delete with cascade");

final AtomicReference<Long> startNanos = new AtomicReference<>();

addDeleteBatchingRows();

doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select distinct p " +
        "from Post p " +
        "join fetch p.details d " +
        "join fetch p.comments c")
    .list();
    
    startNanos.set(System.nanoTime());
    
    for (Post post : posts) {
        session.delete(post);
    }
});

LOGGER.info("{}.testCascadeDelete took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(
        System.nanoTime() - startNanos.get()
));

Running this test gives the following output:

Query:{[delete from Comment where id=? and version=?][55,0]} 
      {[delete from Comment where id=? and version=?][56,0]} 
Query:{[delete from PostDetails where id=?][3]} 
Query:{[delete from Post where id=? and version=?][3,0]} 

Query:{[delete from Comment where id=? and version=?][54,0]} 
      {[delete from Comment where id=? and version=?][53,0]} 
Query:{[delete from PostDetails where id=?][2]} 
Query:{[delete from Post where id=? and version=?][2,0]} 

Query:{[delete from Comment where id=? and version=?][52,0]} 
      {[delete from Comment where id=? and version=?][51,0]} 
Query:{[delete from PostDetails where id=?][1]} 
Query:{[delete from Post where id=? and version=?][1,0]}

Only the Comment DELETE statements were batched, the other entities being deleted in separate database round-trips.

The reason for this behaviour is given by the ActionQueue sorting implementation:

if ( session.getFactory().getSettings().isOrderUpdatesEnabled() ) {
    // sort the updates by pk
    updates.sort();
}
if ( session.getFactory().getSettings().isOrderInsertsEnabled() ) {
    insertions.sort();
}

While INSERTS and UPDATES are covered, DELETE statements are not sorted at all. A JDBC batch can only be reused when all statements belong to the same database table. When an incoming statement targets a different database table, the current batch has to be released, so that the new batch matches the current statement database table:

public Batch getBatch(BatchKey key) {
    if ( currentBatch != null ) {
        if ( currentBatch.getKey().equals( key ) ) {
            return currentBatch;
        }
        else {
            currentBatch.execute();
            currentBatch.release();
        }
    }
    currentBatch = batchBuilder().buildBatch(key, this);
    return currentBatch;
}

Orphan removal and manual flushing

A work-around is to dissociate all Child entities while manually flushing the Hibernate Session before advancing to a new Child association:

LOGGER.info("Test batch delete with orphan removal");

final AtomicReference<Long> startNanos = new AtomicReference<>();

addDeleteBatchingRows();

doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select distinct p " +
        "from Post p " +
        "join fetch p.details d " +
        "join fetch p.comments c")
    .list();

    startNanos.set(System.nanoTime());

    posts.forEach(Post::removeDetails);
    session.flush();

    posts.forEach(post -> {
        for (Iterator<Comment> commentIterator = 
                post.getComments().iterator(); commentIterator.hasNext(); ) {
            Comment comment =  commentIterator.next();
            comment.post = null;
            commentIterator.remove();
        }
    });
    session.flush();

    posts.forEach(session::delete);
});

LOGGER.info("{}.testOrphanRemoval took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(
        System.nanoTime() - startNanos.get()
));

This time all DELETE statements are properly batched:

Query:{[delete from PostDetails where id=?][2]} 
      {[delete from PostDetails where id=?][3]} 
      {[delete from PostDetails where id=?][1]} 
      
Query:{[delete from Comment where id=? and version=?][53,0]} 
      {[delete from Comment where id=? and version=?][54,0]} 
      {[delete from Comment where id=? and version=?][56,0]} 
      {[delete from Comment where id=? and version=?][55,0]} 
      {[delete from Comment where id=? and version=?][52,0]} 
      {[delete from Comment where id=? and version=?][51,0]}
      
Query:{[delete from Post where id=? and version=?][2,0]} 
      {[delete from Post where id=? and version=?][3,0]} 
      {[delete from Post where id=? and version=?][1,0]} 

SQL Cascade Delete

A better solution is to use SQL cascade deletion, instead of JPA entity state propagation mechanism. This way, we can also reduce the DML statements count. Because Hibernate Session acts as a transactional write-behind cache, we must be extra cautious when mixing entity state transitions with database-side automatic actions, as the Persistence Context might not reflect the latest database changes.

The Post entity one-to-many Comment association is marked with the Hibernate specific @OnDelete annotation, so that the auto-generated database schema includes the ON DELETE CASCADE directive:

@OneToMany(
    mappedBy = "post"
    cascade = {
       CascadeType.PERSIST, 
       CascadeType.MERGE
    }, 
)
@OnDelete(action = OnDeleteAction.CASCADE)
private List<Comment> comments = new ArrayList<>();

Generating the following DDL:

ALTER TABLE Comment ADD CONSTRAINT FK_COMMENT_POST_ID 
FOREIGN KEY (post_id) REFERENCES Post ON DELETE CASCADE

The same is done with the PostDetails entity one-to-one Post association:

@OneToOne(fetch = FetchType.LAZY)
@MapsId
@OnDelete(action = OnDeleteAction.CASCADE)
private Post post;

And the associated DDL:

ALTER TABLE PostDetails ADD CONSTRAINT FK_POST_DETAILS_ID 

The CascadeType.ALL and orphanRemoval were replaced with CascadeType.PERSIST and CascadeType.MERGE, because we no longer want Hibernate to propagate the entity removal event.

The test only deletes the Post entities.

doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select p from Post p")
    .list();
    
    startNanos.set(System.nanoTime());
    
    for (Post post : posts) {
        session.delete(post);
    }
});

The DELETE statements are properly batched as there’s only one target table.

Query:{[delete from Post where id=? and version=?][1,0]} 
      {[delete from Post where id=? and version=?][2,0]} 
      {[delete from Post where id=? and version=?][3,0]}

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

If INSERT and UPDATE statements batching is just a matter of configuration, DELETE statements require some additional steps, which may increase the data access layer complexity.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.