How to optimize the merge operation using update while batching with JPA and Hibernate
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
One of my readers has recently asked me about optimizing the merge
entity state transition, and, because this is a great question, I decided to turn it into a blog post.
In this article, you are going to see a shortcoming of the merge
entity state transition and how you can deal with it using Hibernate.
Domain Model
For the upcoming test cases, we are going to use the following entities:
The Post
entity has a bidirectional @OneToMany
relationship with the PostComment
entity.
The PostComment
entity is the owner of the bidirectional association, and the @ManyToOne
relationship is fetched lazily because EAGER fetching is a code smell.
@Entity(name = "PostComment") @Table(name = "post_comment") public class PostComment { @Id @GeneratedValue private Long id; @ManyToOne(fetch = FetchType.LAZY) private Post post; private String review; //Constructors, getters and setters omitted for brevity }
So, the Post
entity features a mappedBy
@OneToMany
association:
@Entity(name = "Post") @Table(name = "post") public class Post { @Id @GeneratedValue private Long id; private String title; @OneToMany( mappedBy = "post", cascade = CascadeType.ALL, orphanRemoval = true ) private List<PostComment> comments = new ArrayList<>(); //Constructors, getters, and setters omitted for brevity public void addComment(PostComment comment) { comments.add(comment); comment.setPost(this); } }
The addComment
utility is needed so that you can make sure both sides of the bidirectional association stay in sync.
Test data
Let’s assume we persist the following entities in our database:
for (int i = 0; i < 3; i++) { Post post = new Post( String.format( "High-Performance Java Persistence, Part no. %d", i ) ); post.addComment( new PostComment("Excellent") ); entityManager.persist(post); }
If we enable batch updates at the Hibernate configuration level:
properties.put("hibernate.jdbc.batch_size", "5"); properties.put("hibernate.order_inserts", "true"); properties.put("hibernate.order_updates", "true"); properties.put("hibernate.jdbc.batch_versioned_data", "true");
Then, Hibernate issues the following SQL INSERT statements:
Query:[ "insert into post (title, id) values (?, ?)" ], Params:[ (High-Performance Java Persistence, Part no. 0, 1), (High-Performance Java Persistence, Part no. 1, 3), (High-Performance Java Persistence, Part no. 2, 5) ] Query:[ "insert into post_comment (post_id, review, id) values (?, ?, ?)" ], Params:[ (1, Excellent, 2), (3, Excellent, 4), (5, Excellent, 6) ]
As you can see, we activated Hibernate batch updates which work for INSERT, UPDATE, and DELETE statements as well.
Fetching entities for processing
Now, let’s fetch our entities so that the user can modify them while the entities are in the detached
state:
List<Post> posts = doInJPA(entityManager -> { return entityManager.createQuery( "select distinct p " + "from Post p " + "join fetch p.comments ", Post.class) .setHint( QueryHints.PASS_DISTINCT_THROUGH, false ) .getResultList(); }); for ( Post post: posts ) { post.setTitle( "Vlad Mihalcea's " + post.getTitle() ); for ( PostComment comment: post.getComments() ) { comment.setReview( comment.getReview() + " read!" ); } }
The PASS_DISTINCT_THROUGH
query hint tells Hibernate to use the distinct
JPQL keyword for entity reference deduplication while avoiding passing it to the actual SQL SELECT query:
SELECT p.id AS id1_0_0_, c.id AS id1_1_1_, p.title AS title2_0_0_, c.post_id AS post_id3_1_1_, c.review AS review2_1_1_, c.post_id AS post_id3_1_0__, c.id AS id1_1_0__ FROM post p INNER JOIN post_comment c ON p.id = c.post_id
Even if the Post
and PostComment
entities were modified, no SQL statement is issued unless the entities are reattached to an active Persistence Context. For this, we have to options:
- We can call the JPA
merge
operation which selects the latest entity snapshot and copies the detached entity state onto the newly selected entity. - Or we can call the Hibernate-specific
update
operation which aims to reattach the entity without requiring a secondary SELECT query.
Using EntityManager merge operation
When trying the JPA merge
operation:
doInJPA(entityManager -> { for ( Post post: posts ) { entityManager.merge( post ); } });
Hibernate generates the following statements:
SELECT p.id AS id1_0_1_, p.title AS title2_0_1_, c.post_id AS post_id3_1_3_, c.id AS id1_1_3_, c.id AS id1_1_0_, c.post_id AS post_id3_1_0_, c.review AS review2_1_0_ FROM post p LEFT OUTER JOIN post_comment c ON p.id = c.post_id WHERE p.id = 1 SELECT p.id AS id1_0_1_, p.title AS title2_0_1_, c.post_id AS post_id3_1_3_, c.id AS id1_1_3_, c.id AS id1_1_0_, c.post_id AS post_id3_1_0_, c.review AS review2_1_0_ FROM post p LEFT OUTER JOIN post_comment c ON p.id = c.post_id WHERE p.id = 3 SELECT p.id AS id1_0_1_, p.title AS title2_0_1_, c.post_id AS post_id3_1_3_, c.id AS id1_1_3_, c.id AS id1_1_0_, c.post_id AS post_id3_1_0_, c.review AS review2_1_0_ FROM post p LEFT OUTER JOIN post_comment c ON p.id = c.post_id WHERE p.id = 5 Query:[ "update post set title=? where id=?"], Params:[ (Vlad Mihalcea's High-Performance Java Persistence, Part no. 0, 1), (Vlad Mihalcea's High-Performance Java Persistence, Part no. 1, 3), (Vlad Mihalcea's High-Performance Java Persistence, Part no. 2, 5) ] Query:[ "update post_comment set post_id=?, review=? where id=?" ], Params:[ (1, Excellent read!, 2), (3, Excellent read!, 4), (5, Excellent read!, 6) ]
Apart from the expected UPDATE
statements that were batched properly, we can see 3 additional SELECT statements with a LEFT OUTER JOIN between Post
and PostComment
table rows.
This is undesirable since we could have hundreds of such entities, and each one would require a separate SQL query for the merge
operation.
Using Session update operation
When using the Hibernate-specific Session
update
operation:
doInJPA(entityManager -> { Session session = entityManager.unwrap( Session.class ); for ( Post post: posts ) { session.update( post ); } });
Hibernate generates only the UPDATE SQL statements:
Query:[ "update post set title=? where id=?"], Params:[ (Vlad Mihalcea's High-Performance Java Persistence, Part no. 0, 1), (Vlad Mihalcea's High-Performance Java Persistence, Part no. 1, 3), (Vlad Mihalcea's High-Performance Java Persistence, Part no. 2, 5) ] Query:[ "update post_comment set post_id=?, review=? where id=?" ], Params:[ (1, Excellent read!, 2), (3, Excellent read!, 4), (5, Excellent read!, 6) ]
Much better!
I'm running an online workshop on the 11th of October about High-Performance SQL.If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
While the merge
operation is safer since it introspects the 1st-level transactional write-behind cache and will work even if we already have that entity attached in the currently running Persistence Context, the update
operation is much more efficient for entity batch processing.
One caveat where update
is not suitable is when using versionelss optimistic locking since the SELECT
statement is going to be issued anyway. This is because Hibernate needs the disassembled state
in order to provide the load-time property values used as WHERE clause predicates.
