Posted on August 2, 2017 by vladmihalcea

How to optimize the merge operation using update while batching with JPA and Hibernate

Last modified:

Are you struggling with performance issues in your Spring, Jakarta EE, or Java EE application?

What if there were a tool that could automatically detect what caused performance issues in your JPA and Hibernate data access layer?

Wouldn’t it be awesome to have such a tool to watch your application and prevent performance issues during development, long before they affect production systems?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, Micronaut, or Play Framework.

So, rather than fixing performance issues in your production system on a Saturday night, you are better off using Hypersistence Optimizer to help you prevent those issues so that you can spend your time on the things that you love!

Introduction

One of my readers has recently asked me about optimizing the merge entity state transition, and, because this is a great question, I decided to turn it into a blog post.

In this article, you are going to see a shortcoming of the merge entity state transition and how you can deal with it using Hibernate.

Domain Model

For the upcoming test cases, we are going to use the following entities:

The Post entity has a bidirectional @OneToMany relationship with the PostComment entity.

The PostComment entity is the owner of the bidirectional association, and the @ManyToOne relationship is fetched lazily because EAGER fetching is a code smell.

@Entity(name = "PostComment")
@Table(name = "post_comment")
public class PostComment {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

    private String review;

    //Constructors, getters and setters omitted for brevity
}

So, the Post entity features a mappedBy @OneToMany association:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @OneToMany(
        mappedBy = "post",
        cascade = CascadeType.ALL,
        orphanRemoval = true
    )
    private List<PostComment> comments = new ArrayList<>();

    //Constructors, getters, and setters omitted for brevity

    public void addComment(PostComment comment) {
        comments.add(comment);
        comment.setPost(this);
    }
}

The addComment utility is needed so that you can make sure both sides of the bidirectional association stay in sync.

Test data

Let’s assume we persist the following entities in our database:

for (int i = 0; i < 3; i++) {
    Post post = new Post(
        String.format(
            "High-Performance Java Persistence, Part no. %d", 
            i
        )
    );
    post.addComment(
        new PostComment("Excellent")
    );
    entityManager.persist(post);
}

If we enable batch updates at the Hibernate configuration level:

properties.put("hibernate.jdbc.batch_size", "5");
properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");
properties.put("hibernate.jdbc.batch_versioned_data", "true");

Then, Hibernate issues the following SQL INSERT statements:

Query:[
    "insert into post (title, id) values (?, ?)"
], 
Params:[
    (High-Performance Java Persistence, Part no. 0, 1), 
    (High-Performance Java Persistence, Part no. 1, 3), 
    (High-Performance Java Persistence, Part no. 2, 5)
]

Query:[
    "insert into post_comment (post_id, review, id) values (?, ?, ?)"
], 
Params:[
    (1, Excellent, 2), 
    (3, Excellent, 4), 
    (5, Excellent, 6)
]

As you can see, we activated Hibernate batch updates which work for INSERT, UPDATE, and DELETE statements as well.

Fetching entities for processing

Now, let’s fetch our entities so that the user can modify them while the entities are in the detached state:

List<Post> posts = doInJPA(entityManager -> {
    return entityManager.createQuery(
        "select distinct p " +
        "from Post p " +
        "join fetch p.comments ", Post.class)
    .setHint( QueryHints.PASS_DISTINCT_THROUGH, false )
    .getResultList();
});

for ( Post post: posts ) {
    post.setTitle( 
        "Vlad Mihalcea's " + post.getTitle() 
    );

    for ( PostComment comment: post.getComments() ) {
        comment.setReview( 
            comment.getReview() + " read!" 
        );
    }
}

The PASS_DISTINCT_THROUGH query hint tells Hibernate to use the distinct JPQL keyword for entity reference deduplication while avoiding passing it to the actual SQL SELECT query:

SELECT  p.id AS id1_0_0_,
        c.id AS id1_1_1_,
        p.title AS title2_0_0_,
        c.post_id AS post_id3_1_1_,
        c.review AS review2_1_1_,
        c.post_id AS post_id3_1_0__,
        c.id AS id1_1_0__
FROM    post p
INNER JOIN 
        post_comment c ON p.id = c.post_id

Even if the Post and PostComment entities were modified, no SQL statement is issued unless the entities are reattached to an active Persistence Context. For this, we have to options:

We can call the JPA merge operation which selects the latest entity snapshot and copies the detached entity state onto the newly selected entity.
Or we can call the Hibernate-specific update operation which aims to reattach the entity without requiring a secondary SELECT query.

Using EntityManager merge operation

When trying the JPA merge operation:

doInJPA(entityManager -> {
    for ( Post post: posts ) {
        entityManager.merge( post );
    }
});

Hibernate generates the following statements:

SELECT  p.id AS id1_0_1_,
        p.title AS title2_0_1_,
        c.post_id AS post_id3_1_3_,
        c.id AS id1_1_3_,
        c.id AS id1_1_0_,
        c.post_id AS post_id3_1_0_,
        c.review AS review2_1_0_
FROM    post p
LEFT OUTER JOIN 
        post_comment c ON p.id = c.post_id
WHERE   p.id = 1

SELECT  p.id AS id1_0_1_,
        p.title AS title2_0_1_,
        c.post_id AS post_id3_1_3_,
        c.id AS id1_1_3_,
        c.id AS id1_1_0_,
        c.post_id AS post_id3_1_0_,
        c.review AS review2_1_0_
FROM    post p
LEFT OUTER JOIN 
        post_comment c ON p.id = c.post_id
WHERE   p.id = 3

SELECT  p.id AS id1_0_1_,
        p.title AS title2_0_1_,
        c.post_id AS post_id3_1_3_,
        c.id AS id1_1_3_,
        c.id AS id1_1_0_,
        c.post_id AS post_id3_1_0_,
        c.review AS review2_1_0_
FROM    post p
LEFT OUTER JOIN 
        post_comment c ON p.id = c.post_id
WHERE   p.id = 5

Query:[
    "update post set title=? where id=?"], 
Params:[
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 0, 1), 
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 1, 3), 
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 2, 5)
]

Query:[
    "update post_comment set post_id=?, review=? where id=?"
], 
Params:[
    (1, Excellent read!, 2), 
    (3, Excellent read!, 4), 
    (5, Excellent read!, 6)
]

Apart from the expected UPDATE statements that were batched properly, we can see 3 additional SELECT statements with a LEFT OUTER JOIN between Post and PostComment table rows.

This is undesirable since we could have hundreds of such entities, and each one would require a separate SQL query for the merge operation.

Using Session update operation

When using the Hibernate-specific Session update operation:

doInJPA(entityManager -> {
    Session session = entityManager.unwrap( Session.class );
    for ( Post post: posts ) {
        session.update( post );
    }
});

Hibernate generates only the UPDATE SQL statements:

Query:[
    "update post set title=? where id=?"], 
Params:[
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 0, 1), 
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 1, 3), 
    (Vlad Mihalcea's High-Performance Java Persistence, Part no. 2, 5)
]

Query:[
    "update post_comment set post_id=?, review=? where id=?"
], 
Params:[
    (1, Excellent read!, 2), 
    (3, Excellent read!, 4), 
    (5, Excellent read!, 6)
]

Much better!

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While the merge operation is safer since it introspects the 1st-level transactional write-behind cache and will work even if we already have that entity attached in the currently running Persistence Context, the update operation is much more efficient for entity batch processing.

One caveat where update is not suitable is when using versionless optimistic locking since the SELECT statement is going to be issued anyway. This is because Hibernate needs the disassembled state in order to provide the load-time property values used as WHERE clause predicates.

Follow @vlad_mihalcea

Transactions and Concurrency Control eBook

High-Performance Java Persistence rocks!

Category: Hibernate Tags: batch, hibernate, merge, PASS_DISTINCT_THROUGH, tips, Training, Tutorial, update

6 Comments on “How to optimize the merge operation using update while batching with JPA and Hibernate”

Ciprian Brebu
January 27, 2024

Can be SessionStateless update used instead? With a more complex mechanism for obtaining transaction and rollback?

Reply
- vladmihalcea
  January 28, 2024
  
  That could be a solution since the StatelessSession has now support for batching.
  
  Check out this article for more details.
  
  Reply
Victor Rentea
January 26, 2024

session.update( post ); is deprecated in Hibernate 6. Any workarounds?

Reply
- vladmihalcea
  January 26, 2024
  
  It shouldn’t have been deprecated since it’s very useful and merge is not really an alternative as it works differently.
  
  I created HHH-15915 for this purpose, you may want to upvote it.
  
  Reply
  - Brebu Ciprian
    January 27, 2024
    
    Thanks guys! Let hope still exist a sollution for this!
  - vladmihalcea
    January 27, 2024
    
    You’re welcome.