How to batch DELETE statements with Hibernate

(Last Updated On: January 29, 2018)

Introduction

In my previous post, I explained the Hibernate configurations required for batching INSERT and UPDATE statements. This post will continue this topic with DELETE statements batching.

Domain model entities

We’ll start with the following entity model:

PostCommentDetailsBatchDelete

The Post entity has a one-to-many association to a Comment and a one-to-one relationship with the PostDetails entity:

@OneToMany(cascade = CascadeType.ALL, mappedBy = "post",
        orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();

@OneToOne(cascade = CascadeType.ALL, mappedBy = "post",
        orphanRemoval = true, fetch = FetchType.LAZY)
private PostDetails details;

The upcoming tests will be run against the following data:

doInTransaction(session -> {
    int batchSize = batchSize();
    for(int i = 0; i < itemsCount(); i++) {
        int j = 0;

        Post post = new Post(String.format(
            "Post no. %d", i));        
        post.addComment(new Comment( String.format(
            "Post comment %d:%d", i, j++)));
        post.addComment(new Comment(String.format(
            "Post comment %d:%d", i, j++)));
        post.addDetails(new PostDetails());

        session.persist(post);
        if(i % batchSize == 0 && i > 0) {
            session.flush();
            session.clear();
        }
    }
});

Hibernate Configuration

As already explained, the following properties are required for batching INSERT and UPDATE statements:

properties.put("hibernate.jdbc.batch_size", 
    String.valueOf(batchSize()));
properties.put("hibernate.order_inserts", 
    "true");
properties.put("hibernate.order_updates", 
    "true");
properties.put("hibernate.jdbc.batch_versioned_data", 
    "true");

Next, we are going to check if DELETE statements are batched as well.

JPA Cascade Delete

Because cascading entity state transitions is convenient, I’m going to prove that CascadeType.DELETE and JDBC batching don’t mix well.

The following tests is going to:

  • Select some Posts along with Comments and PostDetails
  • Delete the Posts, while propagating the delete event to Comments and PostDetails as well
@Test
public void testCascadeDelete() {
    LOGGER.info("Test batch delete with cascade");
    final AtomicReference<Long> startNanos = 
        new AtomicReference<>();
    addDeleteBatchingRows();
    doInTransaction(session -> {
        List<Post> posts = session.createQuery(
            "select distinct p " +
            "from Post p " +
            "join fetch p.details d " +
            "join fetch p.comments c")
        .list();
        startNanos.set(System.nanoTime());
        for (Post post : posts) {
            session.delete(post);
        }
    });
    LOGGER.info("{}.testCascadeDelete took {} millis",
        getClass().getSimpleName(),
        TimeUnit.NANOSECONDS.toMillis(
            System.nanoTime() - startNanos.get()
    ));
}

Running this test gives the following output:

Query:{[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][56,0]} 
Query:{[delete from PostDetails where id=?][3]} 
Query:{[delete from Post where id=? and version=?][3,0]} 
Query:{[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][53,0]} 
Query:{[delete from PostDetails where id=?][2]} 
Query:{[delete from Post where id=? and version=?][2,0]} 
Query:{[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,0]} 
Query:{[delete from PostDetails where id=?][1]} 
Query:{[delete from Post where id=? and version=?][1,0]}

Only the Comment DELETE statements were batched, the other entities being deleted in separate database round-trips.

The reason for this behaviour is given by the ActionQueue sorting implementation:

if ( session.getFactory().getSettings().isOrderUpdatesEnabled() ) {
    // sort the updates by pk
    updates.sort();
}
if ( session.getFactory().getSettings().isOrderInsertsEnabled() ) {
    insertions.sort();
}

While INSERTS and UPDATES are covered, DELETE statements are not sorted at all. A JDBC batch can only be reused when all statements belong to the same database table. When an incoming statement targets a different database table, the current batch has to be released, so that the new batch matches the current statement database table:

public Batch getBatch(BatchKey key) {
    if ( currentBatch != null ) {
        if ( currentBatch.getKey().equals( key ) ) {
            return currentBatch;
        }
        else {
            currentBatch.execute();
            currentBatch.release();
        }
    }
    currentBatch = batchBuilder().buildBatch(key, this);
    return currentBatch;
}

Orphan removal and manual flushing

A work-around is to dissociate all Child entities while manually flushing the Hibernate Session before advancing to a new Child association:

@Test
public void testOrphanRemoval() {
    LOGGER.info("Test batch delete with orphan removal");
    final AtomicReference<Long> startNanos = 
        new AtomicReference<>();

    addDeleteBatchingRows();

    doInTransaction(session -> {
        List<Post> posts = session.createQuery(
            "select distinct p " +
            "from Post p " +
            "join fetch p.details d " +
            "join fetch p.comments c")
        .list();

        startNanos.set(System.nanoTime());

        posts.forEach(Post::removeDetails);
        session.flush();

        posts.forEach(post -> {
            for (Iterator<Comment> commentIterator = 
                     post.getComments().iterator(); 
                     commentIterator.hasNext(); ) {
                Comment comment =  commentIterator.next();
                comment.post = null;
                commentIterator.remove();
            }
        });
        session.flush();

        posts.forEach(session::delete);
    });
    LOGGER.info("{}.testOrphanRemoval took {} millis",
        getClass().getSimpleName(),
        TimeUnit.NANOSECONDS.toMillis(
            System.nanoTime() - startNanos.get()
    ));
}

This time all DELETE statements are properly batched:

Query:{[delete from PostDetails where id=?][2]} {[delete from PostDetails where id=?][3]} {[delete from PostDetails where id=?][1]} 
Query:{[delete from Comment where id=? and version=?][53,0]} {[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][56,0]} {[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,
Query:{[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]} {[delete from Post where id=? and version=?][1,0]} 

SQL Cascade Delete

A better solution is to use SQL cascade deletion, instead of JPA entity state propagation mechanism. This way, we can also reduce the DML statements count. Because Hibernate Session acts as a transactional write-behind cache, we must be extra cautious when mixing entity state transitions with database-side automatic actions, as the Persistence Context might not reflect the latest database changes.

The Post entity one-to-many Comment association is marked with the Hibernate specific @OnDelete annotation, so that the auto-generated database schema includes the ON DELETE CASCADE directive:

@OneToMany(cascade = {
       CascadeType.PERSIST, 
       CascadeType.MERGE}, 
    mappedBy = "post")
@OnDelete(action = OnDeleteAction.CASCADE)
private List<Comment> comments = new ArrayList<>();

Generating the following DDL:

alter table Comment add constraint 
FK_apirq8ka64iidc18f3k6x5tc5 foreign key (post_id) 
references Post on delete cascade

The same is done with the PostDetails entity one-to-one Post association:

@OneToOne(fetch = FetchType.LAZY)
@MapsId
@OnDelete(action = OnDeleteAction.CASCADE)
private Post post;

And the associated DDL:

alter table PostDetails add constraint 
FK_h14un5v94coafqonc6medfpv8 foreign key (id) 
references Post on delete cascade

The CascadeType.ALL and orphanRemoval were replaced with CascadeType.PERSIST and CascadeType.MERGE, because we no longer want Hibernate to propagate the entity removal event.

The test only deletes the Post entities.

doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select p from Post p")
    .list();
    startNanos.set(System.nanoTime());
    for (Post post : posts) {
        session.delete(post);
    }
});

The DELETE statements are properly batched as there’s only one target table.

Query:{[delete from Post where id=? and version=?][1,0]} {[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]}

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

If INSERT and UPDATE statements batching is just a matter of configuration, DELETE statements require some additional steps, which may increase the data access layer complexity.

Code available on GitHub.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.