How to optimize the merge operation using update while batching with JPA and Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

One of my readers has recently asked me about optimizing the merge entity state transition, and, because this is a great question, I decided to turn it into a blog post.

In this article, you are going to see a shortcoming of the merge entity state transition and how you can deal with it using Hibernate.

Domain Model

For the upcoming test cases, we are going to use the following entities:

Post and PostComment merge vs update

The Post entity has a bidirectional @OneToMany relationship with the PostComment entity.

The PostComment entity is the owner of the bidirectional association, and the @ManyToOne relationship is fetched lazily because EAGER fetching is a code smell.

@Entity(name = "PostComment")
@Table(name = "post_comment")
public class PostComment {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

    private String review;

    //Constructors, getters and setters omitted for brevity
}

So, the Post entity features a mappedBy @OneToMany association:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @OneToMany(
        mappedBy = "post",
        cascade = CascadeType.ALL,
        orphanRemoval = true
    )
    private List<PostComment> comments = new ArrayList<>();

    //Constructors, getters, and setters omitted for brevity

    public void addComment(PostComment comment) {
        comments.add(comment);
        comment.setPost(this);
    }
}

The addComment utility is needed so that you can make sure both sides of the bidirectional association stay in sync.

Test data

Let’s assume we persist the following entities in our database:

for (int i = 0; i < 3; i++) {
    Post post = new Post(
        String.format(
            "High-Performance Java Persistence, Part no. %d", 
            i
        )
    );
    post.addComment(
        new PostComment("Excellent")
    );
    entityManager.persist(post);
}

If we enable batch updates at the Hibernate configuration level:

properties.put("hibernate.jdbc.batch_size", "5");
properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");
properties.put("hibernate.jdbc.batch_versioned_data", "true");

Then, Hibernate issues the following SQL INSERT statements:

Query:[
    "insert into post (title, id) values (?, ?)"
], 
Params:[
    (High-Performance Java Persistence, Part no. 0, 1), 
    (High-Performance Java Persistence, Part no. 1, 3), 
    (High-Performance Java Persistence, Part no. 2, 5)
]

Query:[
    "insert into post_comment (post_id, review, id) values (?, ?, ?)"
], 
Params:[
    (1, Excellent, 2), 
    (3, Excellent, 4), 
    (5, Excellent, 6)
]

As you can see, we activated Hibernate batch updates which work for INSERT, UPDATE, and DELETE statements as well.

Fetching entities for processing

Now, let’s fetch our entities so that the user can modify them while the entities are in the detached state:

List<Post> posts = doInJPA(entityManager -> {
    return entityManager.createQuery(
        "select distinct p " +
        "from Post p " +
        "join fetch p.comments ", Post.class)
    .setHint( QueryHints.PASS_DISTINCT_THROUGH, false )
    .getResultList();
});

for ( Post post: posts ) {
    post.setTitle( 
        "Vlad Mihalcea's " + post.getTitle() 
    );

    for ( PostComment comment: post.getComments() ) {
        comment.setReview( 
            comment.getReview() + " read!" 
        );
    }
}

The PASS_DISTINCT_THROUGH query hint tells Hibernate to use the distinct JPQL keyword for entity reference deduplication while avoiding passing it to the actual SQL SELECT query:

SELECT  p.id AS id1_0_0_,
        c.id AS id1_1_1_,
        p.title AS title2_0_0_,
        c.post_id AS post_id3_1_1_,
        c.review AS review2_1_1_,
        c.post_id AS post_id3_1_0__,
        c.id AS id1_1_0__
FROM    post p
INNER JOIN 
        post_comment c ON p.id = c.post_id

Even if the Post and PostComment entities were modified, no SQL statement is issued unless the entities are reattached to an active Persistence Context. For this, we have to options:

  • We can call the JPA merge operation which selects the latest entity snapshot and copies the detached entity state onto the newly selected entity.
  • Or we can call the Hibernate-specific update operation which aims to reattach the entity without requiring a secondary SELECT query.

Using EntityManager merge operation

When trying the JPA merge operation:

doInJPA(entityManager -> {
    for ( Post post: posts ) {
        entityManager.merge( post );
    }
});

Hibernate generates the following statements:

SELECT  p.id AS id1_0_1_,
        p.title AS title2_0_1_,
        c.post_id AS post_id3_1_3_,
        c.id AS id1_1_3_,
        c.id AS id1_1_0_,
        c.post_id AS post_id3_1_0_,
        c.review AS review2_1_0_
FROM    post p
LEFT OUTER JOIN 
        post_comment c ON p.id = c.post_id
WHERE   p.id = 1

SELECT  p.id AS id1_0_1_,
        p.title AS title2_0_1_,
        c.post_id AS post_id3_1_3_,
        c.id AS id1_1_3_,
        c.id AS id1_1_0_,
        c.post_id AS post_id3_1_0_,
        c.review AS review2_1_0_
FROM    post p
LEFT OUTER JOIN 
        post_comment c ON p.id = c.post_id
WHERE   p.id = 3

SELECT  p.id AS id1_0_1_,
        p.title AS title2_0_1_,
        c.post_id AS post_id3_1_3_,
        c.id AS id1_1_3_,
        c.id AS id1_1_0_,
        c.post_id AS post_id3_1_0_,
        c.review AS review2_1_0_
FROM    post p
LEFT OUTER JOIN 
        post_comment c ON p.id = c.post_id
WHERE   p.id = 5

Query:[
    "update post set title=? where id=?"], 
Params:[
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 0, 1), 
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 1, 3), 
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 2, 5)
]

Query:[
    "update post_comment set post_id=?, review=? where id=?"
], 
Params:[
    (1, Excellent read!, 2), 
    (3, Excellent read!, 4), 
    (5, Excellent read!, 6)
]

Apart from the expected UPDATE statements that were batched properly, we can see 3 additional SELECT statements with a LEFT OUTER JOIN between Post and PostComment table rows.

This is undesirable since we could have hundreds of such entities, and each one would require a separate SQL query for the merge operation.

Using Session update operation

When using the Hibernate-specific Session update operation:

doInJPA(entityManager -> {
    Session session = entityManager.unwrap( Session.class );
    for ( Post post: posts ) {
        session.update( post );
    }
});

Hibernate generates only the UPDATE SQL statements:

Query:[
    "update post set title=? where id=?"], 
Params:[
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 0, 1), 
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 1, 3), 
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 2, 5)
]

Query:[
    "update post_comment set post_id=?, review=? where id=?"
], 
Params:[
    (1, Excellent read!, 2), 
    (3, Excellent read!, 4), 
    (5, Excellent read!, 6)
]

Much better!

I'm running an online workshop on the 11th of October about High-Performance SQL.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While the merge operation is safer since it introspects the 1st-level transactional write-behind cache and will work even if we already have that entity attached in the currently running Persistence Context, the update operation is much more efficient for entity batch processing.

One caveat where update is not suitable is when using versionelss optimistic locking since the SELECT statement is going to be issued anyway. This is because Hibernate needs the disassembled state in order to provide the load-time property values used as WHERE clause predicates.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.