How to batch DELETE statements with Hibernate

Introduction

In my previous post, I explained the Hibernate configurations required for batching INSERT and UPDATE statements. This post will continue this topic with DELETE statements batching.

Domain model entities

We’ll start with the following entity model:

PostCommentDetailsBatchDelete

The Post entity has a one-to-many association to a Comment and a one-to-one relationship with the PostDetails entity:

@OneToMany(cascade = CascadeType.ALL, mappedBy = "post",
        orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();

@OneToOne(cascade = CascadeType.ALL, mappedBy = "post",
        orphanRemoval = true, fetch = FetchType.LAZY)
private PostDetails details;

The up-coming tests will be run against the following data:

doInTransaction(session -> {
    int batchSize = batchSize();
    for(int i = 0; i < itemsCount(); i++) {
        int j = 0;

        Post post = new Post(String.format(
            "Post no. %d", i));        
        post.addComment(new Comment( String.format(
            "Post comment %d:%d", i, j++)));
        post.addComment(new Comment(String.format(
            "Post comment %d:%d", i, j++)));
        post.addDetails(new PostDetails());

        session.persist(post);
        if(i % batchSize == 0 && i > 0) {
            session.flush();
            session.clear();
        }
    }
});

Hibernate Configuration

As already explained, the following properties are required for batching INSERT and UPDATE statements:

properties.put("hibernate.jdbc.batch_size", 
    String.valueOf(batchSize()));
properties.put("hibernate.order_inserts", 
    "true");
properties.put("hibernate.order_updates", 
    "true");
properties.put("hibernate.jdbc.batch_versioned_data", 
    "true");

Next, we are going to check if DELETE statements are batched as well.

JPA Cascade Delete

Because cascading entity state transitions is convenient, I’m going to prove that CascadeType.DELETE and JDBC batching don’t mix well.

The following tests is going to:

  • Select some Posts along with Comments and PostDetails
  • Delete the Posts, while propagating the delete event to Comments and PostDetails as well
@Test
public void testCascadeDelete() {
    LOGGER.info("Test batch delete with cascade");
    final AtomicReference<Long> startNanos = 
        new AtomicReference<>();
    addDeleteBatchingRows();
    doInTransaction(session -> {
        List<Post> posts = session.createQuery(
            "select distinct p " +
            "from Post p " +
            "join fetch p.details d " +
            "join fetch p.comments c")
        .list();
        startNanos.set(System.nanoTime());
        for (Post post : posts) {
            session.delete(post);
        }
    });
    LOGGER.info("{}.testCascadeDelete took {} millis",
        getClass().getSimpleName(),
        TimeUnit.NANOSECONDS.toMillis(
            System.nanoTime() - startNanos.get()
    ));
}

Running this test gives the following output:

Query:{[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][56,0]} 
Query:{[delete from PostDetails where id=?][3]} 
Query:{[delete from Post where id=? and version=?][3,0]} 
Query:{[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][53,0]} 
Query:{[delete from PostDetails where id=?][2]} 
Query:{[delete from Post where id=? and version=?][2,0]} 
Query:{[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,0]} 
Query:{[delete from PostDetails where id=?][1]} 
Query:{[delete from Post where id=? and version=?][1,0]}

Only the Comment DELETE statements were batched, the other entities being deleted in separate database round-trips.

The reason for this behaviour is given by the ActionQueue sorting implementation:

if ( session.getFactory().getSettings().isOrderUpdatesEnabled() ) {
    // sort the updates by pk
    updates.sort();
}
if ( session.getFactory().getSettings().isOrderInsertsEnabled() ) {
    insertions.sort();
}

While INSERTS and UPDATES are covered, DELETE statements are not sorted at all. A JDBC batch can only be reused when all statements belong to the same database table. When an incoming statement targets a different database table, the current batch has to be released, so that the new batch matches the current statement database table:

public Batch getBatch(BatchKey key) {
    if ( currentBatch != null ) {
        if ( currentBatch.getKey().equals( key ) ) {
            return currentBatch;
        }
        else {
            currentBatch.execute();
            currentBatch.release();
        }
    }
    currentBatch = batchBuilder().buildBatch(key, this);
    return currentBatch;
}

Orphan removal and manual flushing

A work-around is to dissociate all Child entities while manually flushing the Hibernate Session before advancing to a new Child association:

@Test
public void testOrphanRemoval() {
    LOGGER.info("Test batch delete with orphan removal");
    final AtomicReference<Long> startNanos = 
        new AtomicReference<>();

    addDeleteBatchingRows();

    doInTransaction(session -> {
        List<Post> posts = session.createQuery(
            "select distinct p " +
            "from Post p " +
            "join fetch p.details d " +
            "join fetch p.comments c")
        .list();

        startNanos.set(System.nanoTime());

        posts.forEach(Post::removeDetails);
        session.flush();

        posts.forEach(post -> {
            for (Iterator<Comment> commentIterator = 
                     post.getComments().iterator(); 
                     commentIterator.hasNext(); ) {
                Comment comment =  commentIterator.next();
                comment.post = null;
                commentIterator.remove();
            }
        });
        session.flush();

        posts.forEach(session::delete);
    });
    LOGGER.info("{}.testOrphanRemoval took {} millis",
        getClass().getSimpleName(),
        TimeUnit.NANOSECONDS.toMillis(
            System.nanoTime() - startNanos.get()
    ));
}

This time all DELETE statements are properly batched:

Query:{[delete from PostDetails where id=?][2]} {[delete from PostDetails where id=?][3]} {[delete from PostDetails where id=?][1]} 
Query:{[delete from Comment where id=? and version=?][53,0]} {[delete from Comment where id=? and version=?][54,0]} {[delete from Comment where id=? and version=?][56,0]} {[delete from Comment where id=? and version=?][55,0]} {[delete from Comment where id=? and version=?][52,0]} {[delete from Comment where id=? and version=?][51,
Query:{[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]} {[delete from Post where id=? and version=?][1,0]} 

SQL Cascade Delete

A better solution is to use SQL cascade deletion, instead of JPA entity state propagation mechanism. This way, we can also reduce the DML statements count. Because Hibernate Session acts as a transactional write-behind cache, we must be extra cautious when mixing entity state transitions with database-side automatic actions, as the Persistence Context might not reflect the latest database changes.

The Post entity one-to-many Comment association is marked with the Hibernate specific @OnDelete annotation, so that the auto-generated database schema includes the ON DELETE CASCADE directive:

@OneToMany(cascade = {
       CascadeType.PERSIST, 
       CascadeType.MERGE}, 
    mappedBy = "post")
@OnDelete(action = OnDeleteAction.CASCADE)
private List<Comment> comments = new ArrayList<>();

Generating the following DDL:

alter table Comment add constraint 
FK_apirq8ka64iidc18f3k6x5tc5 foreign key (post_id) 
references Post on delete cascade

The same is done with the PostDetails entity one-to-one Post association:

@OneToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "id")
@MapsId
@OnDelete(action = OnDeleteAction.CASCADE)
private Post post;

And the associated DDL:

alter table PostDetails add constraint 
FK_h14un5v94coafqonc6medfpv8 foreign key (id) 
references Post on delete cascade

The CascadeType.ALL and orphanRemoval were replaced with CascadeType.PERSIST and CascadeType.MERGE, because we no longer want Hibernate to propagate the entity removal event.

The test only deletes the Post entities.

doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select p from Post p")
    .list();
    startNanos.set(System.nanoTime());
    for (Post post : posts) {
        session.delete(post);
    }
});

The DELETE statements are properly batched as there’s only one target table.

Query:{[delete from Post where id=? and version=?][1,0]} {[delete from Post where id=? and version=?][2,0]} {[delete from Post where id=? and version=?][3,0]}

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

If INSERT and UPDATE statements batching is just a matter of configuration, DELETE statements require some additional steps, which may increase the data access layer complexity.

Code available on GitHub.

If you liked this article, you might want to subscribe to my newsletter too.

Advertisements

3 thoughts on “How to batch DELETE statements with Hibernate

  1. Hi Vlad,

    Is there any technical reason why deletes cannot be batched or sorted in hibernate? If not, wouldn’t it be worth adding that in hibernate may be with similar property like “hibernate.order_deletes” or any future plans?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s