How to merge entity collections with JPA and Hibernate
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, you are going to learn why overwriting entity collections is an anti-pattern and how you can merge collections both effectively and efficiently with JPA and Hibernate.
The reason I decided to write this article is that I’ve been this question asked over and over and over again.
Why overwriting entity collections is an anti-pattern and how you can merge collections both effectively and efficiently with #JPA and #Hibernate.https://t.co/XV8PNJML8Q
— Java (@java) November 1, 2018
Domain Model
For the upcoming tests, we are going to use the following Post
and PostComment
entities which have a bidirectional @OneToMany
relationship:
As explained in this article, you should prefer bidirectional associations since they are more efficient than unidirectional ones in terms of SQL performance.
The Post
entity is mapped as follows:
@Entity(name = "Post") @Table(name = "post") public class Post { @Id private Long id; private String title; @OneToMany( mappedBy = "post", cascade = CascadeType.ALL, orphanRemoval = true ) private List<PostComment> comments = new ArrayList<>(); public Long getId() { return id; } public Post setId(Long id) { this.id = id; return this; } public String getTitle() { return title; } public Post setTitle(String title) { this.title = title; return this; } public List<PostComment> getComments() { return comments; } private Post setComments(List<PostComment> comments) { this.comments = comments; return this; } public Post addComment(PostComment comment) { comments.add(comment); comment.setPost(this); return this; } public Post removeComment(PostComment comment) { comments.remove(comment); comment.setPost(null); return this; } }
What’s worth noting is that we employed a Fluent API for our entities since Hibernate supports just fine.
The PostComment
looks this:
@Entity(name = "PostComment") @Table(name = "post_comment") public class PostComment { @Id @GeneratedValue private Long id; private String review; @ManyToOne(fetch = FetchType.LAZY) private Post post; public PostComment() { } public PostComment(String review) { this.review = review; } public Long getId() { return id; } public PostComment setId(Long id) { this.id = id; return this; } public String getReview() { return review; } public PostComment setReview(String review) { this.review = review; return this; } public Post getPost() { return post; } public PostComment setPost(Post post) { this.post = post; return this; } @Override public boolean equals(Object o) { if (this == o) return true; if (!(o instanceof PostComment)) return false; return id != null && id.equals(((PostComment) o).getId()); } @Override public int hashCode() { return getClass().hashCode(); } }
The
@ManyToOne
association uses lazy loading because the default EAGER fetching strategy is almost always a bad idea.While implementing
equals
andhashCode
based on a natural identifier is the best approach, in case your entity lacks a natural identifier, you can implementequals
andhashCode
based on the entity identifier but only if thehashCode
yields a constant value and the entity identifier is checked for equality only when it’s not null. For more details, check out this article.
Modifying a collection of entities
Now, let’s assume we have persisted the following entities in the database:
doInJPA(entityManager -> { entityManager.persist( new Post() .setId(1L) .setTitle("High-Performance Java Persistence") ); }); doInJPA(entityManager -> { entityManager .find(Post.class, 1L) .addComment( new PostComment() .setReview("JDBC section is a must read!") ) .addComment( new PostComment() .setReview("The book size is larger than usual.") ) .addComment( new PostComment() .setReview("Just half-way through.") ) .addComment( new PostComment() .setReview("The book has over 450 pages.") ); });
As you can see, the fluent API is really convenient and worth considering.
Testing time
Now, assuming the client needs to fetch a List
of PostComment
entities associated with a given Post
record:
List<PostComment> comments = fetchPostComments(1L);
The fetchPostComments
method looks like this:
public List<PostComment> fetchPostComments(Long postId) { return doInJPA(entityManager -> { return entityManager.createQuery(""" select pc from PostComment pc join pc.post p where p.id = :postId order by pc.id """, PostComment.class) .setParameter("postId", postId) .getResultList(); }); }
After fetching the comments
, the client needs to apply some modifications to the PostComment
entities:
comments.get(0).setReview("The JDBC part is a must-have!"); comments.remove(2); comments.add( new PostComment() .setReview( "The last part is about jOOQ and " + "how to get the most of your relational database." ) );
The review
property of the first PostComment
is going to be changed. The third comment is removed from the comments
collection, and a new PostComment
is added at the end of the List
.
Now, the user wants to push these changes to the database. However, the PostComment
entities in the comments
collection are in the detached state so we need to merge them on a managed Post
entity.
The collection overwrite anti-pattern
The first solution that comes to many developers is to overwrite the comments
property of the Post
entity with the detached one sent by the client:
Post post = entityManager.createQuery(""" select p from Post p join fetch p.comments where p.id = :id """, Post.class) .setParameter("id", 1L) .getSingleResult(); post.setComments(comments);
However, when running this test case above, Hibernate throws the following exception:
Caused by: org.hibernate.PersistentObjectException: detached entity passed to persist: com.vladmihalcea.book.hpjp.hibernate.association.BidirectionalOneToManyMergeTest$PostComment
When realizing that mixing managed entities (e.g. Post
) with detached ones (e.g. incoming comments
) is a bad idea, the developer will attempt something like this:
Post post = entityManager.createQuery(""" select p from Post p join fetch p.comments where p.id = :id """, Post.class) .setParameter("id", 1L) .getSingleResult(); entityManager.detach(post); post.setComments(comments); entityManager.merge(post);
And Hibernate is going to execute the following SQL INSERT statements:
SELECT p.id as id1_0_0_, pc.id as id1_1_1_, p.title as title2_0_0_, pc.post_id as post_id3_1_1_, pc.review as review2_1_1_, pc.post_id as post_id3_1_0__, pc.id as id1_1_0__ FROM post p INNER JOIN post_comment pc ON p.id = pc.post_id WHERE p.id = 1 SELECT p.id as id1_0_1_, p.title as title2_0_1_, pc.post_id as post_id3_1_3_, pc.id as id1_1_3_, pc.id as id1_1_0_, pc.post_id as post_id3_1_0_, pc.review as review2_1_0_ FROM post p LEFT OUTER JOIN post_comment pc ON p.id = pc.post_id WHERE p.id = 1 INSERT INTO post_comment ( post_id, review, id ) VALUES ( NULL(BIGINT), 'The last part is about jOOQ and how to get the most of your relational database.', 5 )] UPDATE post_comment SET post_id = 1, review = 'The JDBC part is a must-have!' WHERE id = 1 DELETE FROM post_comment WHERE id = 3
However, the verifyResults
method will fail because, instead of 4 PostComment
entities, only 3 will be returned from the database:
Post post = entityManager.createQuery(""" select p from Post p join fetch p.comments c where p.id = :id order by c.id """, Post.class) .setParameter("id", 1L) .getSingleResult(); assertEquals(4, post.getComments().size()); assertEquals( "The JDBC part is a must-have!", post.getComments().get(0).getReview() ); assertEquals( "The book size is larger than usual.", post.getComments().get(1).getReview() ); assertEquals( "The book has over 450 pages.", post.getComments().get(2).getReview() ); assertEquals( "The last part is about jOOQ and how to get the most of your relational database.", post.getComments().get(3).getReview() );
If you take a look on the previously executed SQL statement, you will notice that the post_id
was set to NULL when inserting the new post_comment
row, hence Hibernate is going to skip this record hen fetching the PostComment
entities associated to our parent Post
entity.
Unfortunately, without tests, this issue will be probably caught in production, and it will also require fixing production data.
A simple way to merge entity collections with JPA and Hibernate
The previous example issue came from not synchronizing both ends of the bidirectional association.
After acknowledging the problem, the merging method can be rewritten like this:
Post post = entityManager.createQuery(""" select p from Post p join fetch p.comments c where p.id = :id order by c.id """, Post.class) .setParameter("id", 1L) .getSingleResult(); entityManager.detach(post); post.getComments().clear(); for (PostComment comment : comments) { post.addComment(comment); } entityManager.merge(post);
Now, the checks will pass, but HIbernate generated the following SQL statements:
SELECT p.id as id1_0_0_, pc.id as id1_1_1_, p.title as title2_0_0_, pc.post_id as post_id3_1_1_, pc.review as review2_1_1_, pc.post_id as post_id3_1_0__, pc.id as id1_1_0__ FROM post p INNER JOIN post_comment pc ON p.id = pc.post_id WHERE p.id = 1 SELECT p.id as id1_0_1_, p.title as title2_0_1_, pc.post_id as post_id3_1_3_, pc.id as id1_1_3_, pc.id as id1_1_0_, pc.post_id as post_id3_1_0_, pc.review as review2_1_0_ FROM post p LEFT OUTER JOIN post_comment pc ON p.id = pc.post_id WHERE p.id = 1 INSERT INTO post_comment ( post_id, review, id ) VALUES ( 1, 'The last part is about jOOQ and how to get the most of your relational database.', 5 )] UPDATE post_comment SET post_id = 1, review = 'The JDBC part is a must-have!' WHERE id = 1 DELETE FROM post_comment WHERE id = 3
There are 2 SELECT statements executed instead of 1. While the former query was generated due to the JPQL query which we used to fetch the Post
entity along with its associated comments
collection, the latter is generated by the merge
method call which works as follows:
- A new entity snapshot is fetched from the database.
- The detached state is copied onto the newly fetched entity.
- The managed entity with its modified data can now be synchronized with the database when the Persistence Context is flushed.
The manual JPA entity collection merge strategy
If you want to avoid the secondary SQL query, you need to do a manual collection merging:
Post post = entityManager.createQuery(""" select p from Post p join fetch p.comments c where p.id = :id order by c.id """, Post.class) .setParameter("id", 1L) .getSingleResult(); List<PostComment> removedComments = new ArrayList<>( post.getComments() ); removedComments.removeAll(comments); for(PostComment removedComment : removedComments) { post.removeComment(removedComment); } List<PostComment> newComments = new ArrayList<>(comments); newComments.removeAll(post.getComments()); comments.removeAll(newComments); for(PostComment existingComment : comments) { existingComment.setPost(post); PostComment mergedComment = entityManager .merge(existingComment); post.getComments().set( post.getComments().indexOf(mergedComment), mergedComment ); } for(PostComment newComment : newComments) { post.addComment(newComment); }
And, this time, Hibernate is going to execute a single SELECT statement:
SELECT p.id as id1_0_0_, pc.id as id1_1_1_, p.title as title2_0_0_, pc.post_id as post_id3_1_1_, pc.review as review2_1_1_, pc.post_id as post_id3_1_0__, pc.id as id1_1_0__ FROM post p INNER JOIN post_comment pc ON p.id = pc.post_id WHERE p.id = 1 INSERT INTO post_comment ( post_id, review, id ) VALUES ( 1, 'The last part is about jOOQ and how to get the most of your relational database.', 5 )] UPDATE post_comment SET post_id = 1, review = 'The JDBC part is a must-have!' WHERE id = 1 DELETE FROM post_comment WHERE id = 3
The manual merging strategy might look complicated at first glance, but it’s not really complicated at all. Basically, the manual merging strategy must execute the following steps:
- It must remove the existing database records that are no longer found in the incoming collection.
- It must update the existing database records which can be found in the incoming collection.
- It must add the records found in the incoming collection, which cannot be found in the current database snapshot.
Merge collections via the parent JPA and Hibernate entity
The previous examples were more challenging because the child entity collection was processed independently of the parent entity.
Now, if we can fetch the parent entity along with its associated child entity collection, as illustrated by the following code snippet:
Post post = fetchPostWithComments(1L);
Where the fetchPostWithComments
method looks like this:
public Post fetchPostWithComments(Long postId) { return doInJPA(entityManager -> { return entityManager.createQuery(""" select p from Post p join fetch p.comments where p.id = :postId """, Post.class) .setHint(AvailableHints.HINT_READ_ONLY, true) .setParameter("postId", postId) .getSingleResult(); }); }
The READONLY
JPA query hint is used to ensure that the entities are fetched in read-only mode.
After we fetched the Post
entity, we can simply apply the same collection modification logic we used for the previous examples:
post.getComments().get(0).setReview("The JDBC part is a must-have!"); post.removeComment(post.getComments().get(2)); post.addComment( new PostComment() .setReview( "The last part is about jOOQ and " + "how to get the most of your relational database." ) );
At this point, the Post
entity and its PostComment
child entities are in the detached state, so we need to merge the entity aggregate back to an open Persistence Context.
To do so, we just have to merge the Post
entity, and, since the comments
association uses the CascadeType.ALL
entity state transition cascading strategy, the merge operation is going to be cascaded to the comments
association as well:
doInJPA(entityManager -> { entityManager.merge(post); });
When running this Post
entity merge example, we can see that Hibernate executes the proper SQL statements:
SELECT p.id as id1_0_0_, pc.id as id1_1_1_, p.title as title2_0_0_, pc.post_id as post_id3_1_1_, pc.review as review2_1_1_, pc.post_id as post_id3_1_0__, pc.id as id1_1_0__ FROM post p INNER JOIN post_comment pc ON p.id = pc.post_id WHERE p.id = 1 INSERT INTO post_comment ( post_id, review, id ) VALUES ( 1, 'The last part is about jOOQ and how to get the most of your relational database.', 5 )] UPDATE post_comment SET post_id = 1, review = 'The JDBC part is a must-have!' WHERE id = 1 DELETE FROM post_comment WHERE id = 3
Awesome, isn’t it!
I'm running an online workshop on the 11th of October about High-Performance SQL.If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
As you could see, merging entity collections requires you to understand how JPA and Hibernate work under the hood. While the automatic merging strategy offered by Hibernate is convenient, it’s not as efficient as the manual merging alternative, especially when merging multiple collections.
