How to merge entity collections with JPA and Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, you are going to learn why overwriting entity collections is an anti-pattern and how you can merge collections both effectively and efficiently with JPA and Hibernate.

The reason I decided to write this article is that I’ve been this question asked over and over and over again.

Domain Model

For the upcoming tests, we are going to use the following Post and PostComment entities which have a bidirectional @OneToMany relationship:

How to merge entity collections with JPA and Hibernate Domain Model

As explained in this article, you should prefer bidirectional associations since they are more efficient than unidirectional ones in terms of SQL performance.

The Post entity is mapped as follows:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    @OneToMany(
        mappedBy = "post", 
        cascade = CascadeType.ALL, 
        orphanRemoval = true
    )
    private List<PostComment> comments = new ArrayList<>();

    public Long getId() {
        return id;
    }

    public Post setId(Long id) {
        this.id = id;
        return this;
    }

    public String getTitle() {
        return title;
    }

    public Post setTitle(String title) {
        this.title = title;
        return this;
    }

    public List<PostComment> getComments() {
        return comments;
    }

    private Post setComments(List<PostComment> comments) {
        this.comments = comments;
        return this;
    }

    public Post addComment(PostComment comment) {
        comments.add(comment);
        comment.setPost(this);

        return this;
    }

    public Post removeComment(PostComment comment) {
        comments.remove(comment);
        comment.setPost(null);

        return this;
    }
}

What’s worth noting is that we employed a Fluent API for our entities since Hibernate supports just fine.

The PostComment looks this:

@Entity(name = "PostComment")
@Table(name = "post_comment")
public class PostComment {

    @Id
    @GeneratedValue
    private Long id;

    private String review;

    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

    public PostComment() {
    }

    public PostComment(String review) {
        this.review = review;
    }

    public Long getId() {
        return id;
    }

    public PostComment setId(Long id) {
        this.id = id;
        return this;
    }

    public String getReview() {
        return review;
    }

    public PostComment setReview(String review) {
        this.review = review;
        return this;
    }

    public Post getPost() {
        return post;
    }

    public PostComment setPost(Post post) {
        this.post = post;
        return this;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) 
            return true;
        if (!(o instanceof PostComment)) 
            return false;
        return 
            id != null && 
            id.equals(((PostComment) o).getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

The @ManyToOne association uses lazy loading because the default EAGER fetching strategy is almost always a bad idea.

While implementing equals and hashCode based on a natural identifier is the best approach, in case your entity lacks a natural identifier, you can implement equals and hashCode based on the entity identifier but only if the hashCode yields a constant value and the entity identifier is checked for equality only when it’s not null. For more details, check out this article.

Modifying a collection of entities

Now, let’s assume we have persisted the following entities in the database:

doInJPA(entityManager -> {
    entityManager.persist(
        new Post()
            .setId(1L)
            .setTitle("High-Performance Java Persistence")
    );
});

doInJPA(entityManager -> {
    entityManager
        .find(Post.class, 1L)
        .addComment(
            new PostComment()
            .setReview("JDBC section is a must read!")
        )
        .addComment(
            new PostComment()
            .setReview("The book size is larger than usual.")
        )
        .addComment(
            new PostComment()
            .setReview("Just half-way through.")
        )
        .addComment(
            new PostComment()
            .setReview("The book has over 450 pages.")
        );
});

As you can see, the fluent API is really convenient and worth considering.

Testing time

Now, assuming the client needs to fetch a List of PostComment entities associated with a given Post record:

List<PostComment> comments = fetchPostComments(1L);

The fetchPostComments method looks like this:

public List<PostComment> fetchPostComments(Long postId) {
    return doInJPA(entityManager -> {
        return entityManager.createQuery("""
            select pc
            from PostComment pc
            join pc.post p
            where p.id = :postId
            order by pc.id
            """, PostComment.class)
        .setParameter("postId", postId)
        .getResultList();
    });
}

After fetching the comments, the client needs to apply some modifications to the PostComment entities:

comments.get(0).setReview("The JDBC part is a must-have!");

comments.remove(2);

comments.add(
    new PostComment()
        .setReview(
            "The last part is about jOOQ and " +
            "how to get the most of your relational database."
        )
);

The review property of the first PostComment is going to be changed. The third comment is removed from the comments collection, and a new PostComment is added at the end of the List.

Now, the user wants to push these changes to the database. However, the PostComment entities in the comments collection are in the detached state so we need to merge them on a managed Post entity.

The collection overwrite anti-pattern

The first solution that comes to many developers is to overwrite the comments property of the Post entity with the detached one sent by the client:

Post post = entityManager.createQuery("""
    select p
    from Post p
    join fetch p.comments
    where p.id = :id
    """, Post.class)
.setParameter("id", 1L)
.getSingleResult();

post.setComments(comments);

However, when running this test case above, Hibernate throws the following exception:

Caused by: org.hibernate.PersistentObjectException: 
detached entity passed to persist: 
com.vladmihalcea.book.hpjp.hibernate.association.BidirectionalOneToManyMergeTest$PostComment

When realizing that mixing managed entities (e.g. Post) with detached ones (e.g. incoming comments) is a bad idea, the developer will attempt something like this:

Post post = entityManager.createQuery("""
    select p
    from Post p
    join fetch p.comments
    where p.id = :id
    """, Post.class)
.setParameter("id", 1L)
.getSingleResult();

entityManager.detach(post);
post.setComments(comments);
entityManager.merge(post);

And Hibernate is going to execute the following SQL INSERT statements:

SELECT 
    p.id as id1_0_0_, 
    pc.id as id1_1_1_, 
    p.title as title2_0_0_, 
    pc.post_id as post_id3_1_1_, 
    pc.review as review2_1_1_, 
    pc.post_id as post_id3_1_0__, 
    pc.id as id1_1_0__ 
FROM post p 
INNER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

SELECT 
    p.id as id1_0_1_, 
    p.title as title2_0_1_, 
    pc.post_id as post_id3_1_3_, 
    pc.id as id1_1_3_, 
    pc.id as id1_1_0_, 
    pc.post_id as post_id3_1_0_, 
    pc.review as review2_1_0_ 
FROM post p 
LEFT OUTER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    NULL(BIGINT), 
    'The last part is about jOOQ and how to get the most of your relational database.', 
    5
)]
UPDATE 
    post_comment 
SET 
    post_id = 1, 
    review = 'The JDBC part is a must-have!' 
WHERE 
    id = 1
    
DELETE FROM post_comment 
WHERE id = 3

However, the verifyResults method will fail because, instead of 4 PostComment entities, only 3 will be returned from the database:

Post post = entityManager.createQuery("""
    select p
    from Post p
    join fetch p.comments c
    where p.id = :id
    order by c.id
    """, Post.class)
.setParameter("id", 1L)
.getSingleResult();

assertEquals(4, post.getComments().size());

assertEquals(
    "The JDBC part is a must-have!",
    post.getComments().get(0).getReview()
);

assertEquals(
    "The book size is larger than usual.",
    post.getComments().get(1).getReview()
);

assertEquals(
    "The book has over 450 pages.",
    post.getComments().get(2).getReview()
);

assertEquals(
    "The last part is about jOOQ and how to get the most of your relational database.",
    post.getComments().get(3).getReview()
);

If you take a look on the previously executed SQL statement, you will notice that the post_id was set to NULL when inserting the new post_comment row, hence Hibernate is going to skip this record hen fetching the PostComment entities associated to our parent Post entity.

Unfortunately, without tests, this issue will be probably caught in production, and it will also require fixing production data.

A simple way to merge entity collections with JPA and Hibernate

The previous example issue came from not synchronizing both ends of the bidirectional association.

After acknowledging the problem, the merging method can be rewritten like this:

Post post = entityManager.createQuery("""
    select p
    from Post p
    join fetch p.comments c
    where p.id = :id
    order by c.id
    """, Post.class)
.setParameter("id", 1L)
.getSingleResult();

entityManager.detach(post);

post.getComments().clear();
for (PostComment comment : comments) {
    post.addComment(comment);
}

entityManager.merge(post);

Now, the checks will pass, but HIbernate generated the following SQL statements:

SELECT 
    p.id as id1_0_0_, 
    pc.id as id1_1_1_, 
    p.title as title2_0_0_, 
    pc.post_id as post_id3_1_1_, 
    pc.review as review2_1_1_, 
    pc.post_id as post_id3_1_0__, 
    pc.id as id1_1_0__ 
FROM post p 
INNER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

SELECT 
    p.id as id1_0_1_, 
    p.title as title2_0_1_, 
    pc.post_id as post_id3_1_3_, 
    pc.id as id1_1_3_, 
    pc.id as id1_1_0_, 
    pc.post_id as post_id3_1_0_, 
    pc.review as review2_1_0_ 
FROM post p 
LEFT OUTER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'The last part is about jOOQ and how to get the most of your relational database.', 
    5
)]
UPDATE 
    post_comment 
SET 
    post_id = 1, 
    review = 'The JDBC part is a must-have!' 
WHERE 
    id = 1
    
DELETE FROM post_comment 
WHERE id = 3

There are 2 SELECT statements executed instead of 1. While the former query was generated due to the JPQL query which we used to fetch the Post entity along with its associated comments collection, the latter is generated by the merge method call which works as follows:

  • A new entity snapshot is fetched from the database.
  • The detached state is copied onto the newly fetched entity.
  • The managed entity with its modified data can now be synchronized with the database when the Persistence Context is flushed.

The manual JPA entity collection merge strategy

If you want to avoid the secondary SQL query, you need to do a manual collection merging:

Post post = entityManager.createQuery("""
    select p
    from Post p
    join fetch p.comments c
    where p.id = :id
    order by c.id
    """, Post.class)
.setParameter("id", 1L)
.getSingleResult();

List<PostComment> removedComments = new ArrayList<>(
    post.getComments()
);
removedComments.removeAll(comments);

for(PostComment removedComment : removedComments) {
    post.removeComment(removedComment);
}

List<PostComment> newComments = new ArrayList<>(comments);
newComments.removeAll(post.getComments());

comments.removeAll(newComments);

for(PostComment existingComment : comments) {
    existingComment.setPost(post);
    
    PostComment mergedComment = entityManager
    .merge(existingComment);
    
    post.getComments().set(
        post.getComments().indexOf(mergedComment), 
        mergedComment
    );
}

for(PostComment newComment : newComments) {
    post.addComment(newComment);
}

And, this time, Hibernate is going to execute a single SELECT statement:

SELECT 
    p.id as id1_0_0_, 
    pc.id as id1_1_1_, 
    p.title as title2_0_0_, 
    pc.post_id as post_id3_1_1_, 
    pc.review as review2_1_1_, 
    pc.post_id as post_id3_1_0__, 
    pc.id as id1_1_0__ 
FROM post p 
INNER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'The last part is about jOOQ and how to get the most of your relational database.', 
    5
)]
UPDATE 
    post_comment 
SET 
    post_id = 1, 
    review = 'The JDBC part is a must-have!' 
WHERE 
    id = 1
    
DELETE FROM post_comment 
WHERE id = 3

The manual merging strategy might look complicated at first glance, but it’s not really complicated at all. Basically, the manual merging strategy must execute the following steps:

  • It must remove the existing database records that are no longer found in the incoming collection.
  • It must update the existing database records which can be found in the incoming collection.
  • It must add the records found in the incoming collection, which cannot be found in the current database snapshot.

Merge collections via the parent JPA and Hibernate entity

The previous examples were more challenging because the child entity collection was processed independently of the parent entity.

Now, if we can fetch the parent entity along with its associated child entity collection, as illustrated by the following code snippet:

Post post = fetchPostWithComments(1L);

Where the fetchPostWithComments method looks like this:

public Post fetchPostWithComments(Long postId) {
    return doInJPA(entityManager -> {
        return entityManager.createQuery("""
            select p
            from Post p
            join fetch p.comments
            where p.id = :postId
            """, Post.class)
        .setHint(AvailableHints.HINT_READ_ONLY, true)
        .setParameter("postId", postId)
        .getSingleResult();
    });
}

The READONLY JPA query hint is used to ensure that the entities are fetched in read-only mode.

After we fetched the Post entity, we can simply apply the same collection modification logic we used for the previous examples:

post.getComments().get(0).setReview("The JDBC part is a must-have!");

post.removeComment(post.getComments().get(2));

post.addComment(
    new PostComment()
    .setReview(
        "The last part is about jOOQ and " +
        "how to get the most of your relational database."
    )
);

At this point, the Post entity and its PostComment child entities are in the detached state, so we need to merge the entity aggregate back to an open Persistence Context.

To do so, we just have to merge the Post entity, and, since the comments association uses the CascadeType.ALL entity state transition cascading strategy, the merge operation is going to be cascaded to the comments association as well:

doInJPA(entityManager -> {
    entityManager.merge(post);
});

When running this Post entity merge example, we can see that Hibernate executes the proper SQL statements:

SELECT 
    p.id as id1_0_0_, 
    pc.id as id1_1_1_, 
    p.title as title2_0_0_, 
    pc.post_id as post_id3_1_1_, 
    pc.review as review2_1_1_, 
    pc.post_id as post_id3_1_0__, 
    pc.id as id1_1_0__ 
FROM post p 
INNER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'The last part is about jOOQ and how to get the most of your relational database.', 
    5
)]
UPDATE 
    post_comment 
SET 
    post_id = 1, 
    review = 'The JDBC part is a must-have!' 
WHERE 
    id = 1
    
DELETE FROM post_comment 
WHERE id = 3

Awesome, isn’t it!

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

As you could see, merging entity collections requires you to understand how JPA and Hibernate work under the hood. While the automatic merging strategy offered by Hibernate is convenient, it’s not as efficient as the manual merging alternative, especially when merging multiple collections.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.