How to merge entity collections with JPA and Hibernate

(Last Updated On: October 31, 2018)

Introduction

In this article, you are going to learn why overwriting entity collections is an anti-pattern and how you can merge collections both effectively and efficiently with JPA and Hibernate.

The reason I decided to write this article is that I’ve been this question asked over and over and over again.

Domain Model

For the upcoming tests, we are going to use the following Post and PostComment entities which have a bidirectional @OneToMany relationship:

How to merge entity collections with JPA and Hibernate Domain Model

As explained in this article, you should prefer bidirectional associations since they more efficient than unidirectional ones in terms of SQL performance.

The Post entity is mapped as follows:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    @OneToMany(
        mappedBy = "post", 
        cascade = CascadeType.ALL, 
        orphanRemoval = true
    )
    private List<PostComment> comments = new ArrayList<>();

    public Long getId() {
        return id;
    }

    public Post setId(Long id) {
        this.id = id;
        return this;
    }

    public String getTitle() {
        return title;
    }

    public Post setTitle(String title) {
        this.title = title;
        return this;
    }

    public List<PostComment> getComments() {
        return comments;
    }

    private Post setComments(List<PostComment> comments) {
        this.comments = comments;
        return this;
    }

    public Post addComment(PostComment comment) {
        comments.add(comment);
        comment.setPost(this);

        return this;
    }

    public Post removeComment(PostComment comment) {
        comments.remove(comment);
        comment.setPost(null);

        return this;
    }
}

What’s worth noting is that we employed a Fluent API for our entities since Hibernate supports just fine.

The PostComment looks this:

@Entity(name = "PostComment")
@Table(name = "post_comment")
public class PostComment {

	@Id
	@GeneratedValue
	private Long id;

	private String review;

	@ManyToOne(fetch = FetchType.LAZY)
	private Post post;

	public PostComment() {
	}

	public PostComment(String review) {
		this.review = review;
	}

	public Long getId() {
		return id;
	}

	public PostComment setId(Long id) {
		this.id = id;
		return this;
	}

	public String getReview() {
		return review;
	}

	public PostComment setReview(String review) {
		this.review = review;
		return this;
	}

	public Post getPost() {
		return post;
	}

	public PostComment setPost(Post post) {
		this.post = post;
		return this;
	}

	@Override
	public boolean equals(Object o) {
		if (this == o) 
			return true;
		if (!(o instanceof PostComment)) 
			return false;
		return 
			id != null && 
			id.equals(((PostComment) o).id);
	}

	@Override
	public int hashCode() {
		return 31;
	}
}

The @ManyToOne association uses lazy loading because the default EAGER fetching strategy is almost always a bad idea.

While implementing equals and hashCode based on a natural identifier is the best approach, in case your entity lacks a natural identifier, you can implement equals and hashCode based on the entity identifier but only if the hashCode yields a constant value and the entity identifier is checked for equality only when it’s not null. For more details, check out this article.

Modifying a collection of entities

Now, let’s assume we have persisted the following entities in the database:

doInJPA(entityManager -> {
    entityManager.persist(
        new Post()
        .setId(1L)
        .setTitle(
            "High-Performance Java Persistence
        ")
    );
});

doInJPA(entityManager -> {
    entityManager
    .find(Post.class, 1L)
    .addComment(
        new PostComment()
        .setReview("JDBC section is a must read!")
    )
    .addComment(
        new PostComment()
        .setReview("The book size is larger than usual.")
    )
    .addComment(
        new PostComment()
        .setReview("Just half-way through.")
    )
    .addComment(
        new PostComment()
        .setReview("The book has over 450 pages.")
    );
});

As you can see, the fluent API is really convenient and worth considering.

Testing time

Now, assuming the client needs to fetch a List of PostComment entities associated with a given Post record:

List<PostComment> comments = fetchPostComments(1L);

The fetchPostComments method looks like this:

public List<PostComment> fetchPostComments(Long postId) {
    return doInJPA(entityManager -> {
        return entityManager.createQuery(
            "select pc " +
            "from PostComment pc " +
            "join pc.post p " +
            "where p.id = :postId " +
            "order by pc.id", PostComment.class)
        .setParameter("postId", postId)
        .getResultList();
    });
}

After fetching the comments, the client needs to apply some modifications to the PostComment entities:

comments.get(0)
.setReview("The JDBC part is a must have!");

comments.remove(2);

comments.add(
    new PostComment()
    .setReview(
        "The last part is about jOOQ and " +
        "how to get the most of your relational database."
    )
);

The review property of the first PostComment is going to be changed. The third comment is removed from the comments collection, and a new PostComment is added at the end of the List.

Now, the user wants to push these changes to the database. However, the PostComment entities in the comments collection are in the detached state so we need to merge them on a managed Post entity.

The collection overwrite anti-pattern

The first solution that comes to many developers is to overwrite the comments property of the Post entity with the detached one sent by the client:

Post post = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "join fetch p.comments " +
    "where p.id = :id", Post.class)
.setParameter("id", 1L)
.getSingleResult();

post.setComments(comments);

However, when running this test case above, Hibernate throws the following exception:

Caused by: org.hibernate.PersistentObjectException: 
detached entity passed to persist: 
com.vladmihalcea.book.hpjp.hibernate.association.BidirectionalOneToManyMergeTest$PostComment

When realizing that mixing managed entities (e.g. Post) with detached ones (e.g. incoming comments) is a bad idea, the developer will attempt something like this:

Post post = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "join fetch p.comments " +
    "where p.id = :id", Post.class)
.setParameter("id", 1L)
.getSingleResult();

entityManager.detach(post);
post.setComments(comments);
entityManager.merge(post);

And Hibernate is going to execute the following SQL INSERT statements:

SELECT 
    p.id as id1_0_0_, 
    pc.id as id1_1_1_, 
    p.title as title2_0_0_, 
    pc.post_id as post_id3_1_1_, 
    pc.review as review2_1_1_, 
    pc.post_id as post_id3_1_0__, 
    pc.id as id1_1_0__ 
FROM post p 
INNER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

SELECT 
    p.id as id1_0_1_, 
    p.title as title2_0_1_, 
    pc.post_id as post_id3_1_3_, 
    pc.id as id1_1_3_, 
    pc.id as id1_1_0_, 
    pc.post_id as post_id3_1_0_, 
    pc.review as review2_1_0_ 
FROM post p 
LEFT OUTER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    NULL(BIGINT), 
    'The last part is about jOOQ and how to get the most of your relational database.', 
    5
)]
UPDATE 
    post_comment 
SET 
    post_id = 1, 
    review = 'The JDBC part is a must have!' 
WHERE 
    id = 1
    
DELETE FROM post_comment 
WHERE id = 3

However, the verifyResults method will fail because, instead of 4 PostComment entities, only 3 will be returned from the database:

Post post = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "join fetch p.comments c " +
    "where p.id = :id " +
    "order by c.id", Post.class)
.setParameter("id", 1L)
.getSingleResult();

assertEquals(4, post.getComments().size());

assertEquals(
    "The JDBC part is a must have!",
    post.getComments().get(0).getReview()
);

assertEquals(
    "The book size is larger than usual.",
    post.getComments().get(1).getReview()
);

assertEquals(
    "The book has over 450 pages.",
    post.getComments().get(2).getReview()
);

assertEquals(
    "The last part is about jOOQ and how to get the most of your relational database.",
    post.getComments().get(3).getReview()
);

If you take a look on the previously executed SQL statement, you will notice that the post_id was set to NULL when inserting the new post_comment row, hence Hibernate is going to skip this record hen fetching the PostComment entities associated to our parent Post entity.

Unfortunately, without tests, this issue will be probably caught in production, and it will also require fixing production data.

The collection overwrite merge fix

The previous example issue came from not synchronizing both ends of the bidirectional association.

After acknowledging the problem, the merging method can be rewritten like this:

Post post = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "join fetch p.comments " +
    "where p.id = :id", Post.class)
.setParameter("id", 1L)
.getSingleResult();

entityManager.detach(post);

post.getComments().clear();
for (PostComment comment : comments) {
    post.addComment(comment);
}

entityManager.merge(post);

Now, the checks will pass, but HIbernate generated the following SQL statements:

SELECT 
    p.id as id1_0_0_, 
    pc.id as id1_1_1_, 
    p.title as title2_0_0_, 
    pc.post_id as post_id3_1_1_, 
    pc.review as review2_1_1_, 
    pc.post_id as post_id3_1_0__, 
    pc.id as id1_1_0__ 
FROM post p 
INNER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

SELECT 
    p.id as id1_0_1_, 
    p.title as title2_0_1_, 
    pc.post_id as post_id3_1_3_, 
    pc.id as id1_1_3_, 
    pc.id as id1_1_0_, 
    pc.post_id as post_id3_1_0_, 
    pc.review as review2_1_0_ 
FROM post p 
LEFT OUTER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'The last part is about jOOQ and how to get the most of your relational database.', 
    5
)]
UPDATE 
    post_comment 
SET 
    post_id = 1, 
    review = 'The JDBC part is a must have!' 
WHERE 
    id = 1
    
DELETE FROM post_comment 
WHERE id = 3

There are 2 SELECT statements executed instead of 1. While the former query was generated due to the JPQL query which we used to fetch the Post entity along with its associated comments collection, the latter is generated by the merge method call which works as follows:

  • A new entity snapshot is fetched from the database.
  • The detached state is copied onto the newly fetched entity.
  • The managed entity with its modified data can now be synchronized with the database when the Persistence Context is flushed.

The manual collection merge strategy

If you want to avoid the secondary SQL query, you need to do a manual collection merging:

Post post = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "join fetch p.comments " +
    "where p.id = :id", Post.class)
.setParameter("id", 1L)
.getSingleResult();

List<PostComment> removedComments = new ArrayList<>(
    post.getComments()
);
removedComments.removeAll(comments);

for(PostComment removedComment : removedComments) {
    post.removeComment(removedComment);
}

List<PostComment> newComments = new ArrayList<>(comments);
newComments.removeAll(post.getComments());

comments.removeAll(newComments);

for(PostComment existingComment : comments) {
    existingComment.setPost(post);
    
    PostComment mergedComment = entityManager
    .merge(existingComment);
    
    post.getComments().set(
        post.getComments().indexOf(mergedComment), 
        mergedComment
    );
}

for(PostComment newComment : newComments) {
    post.addComment(newComment);
}

And, this time, Hibernate is going to execute a single SELECT statement:

SELECT 
    p.id as id1_0_0_, 
    pc.id as id1_1_1_, 
    p.title as title2_0_0_, 
    pc.post_id as post_id3_1_1_, 
    pc.review as review2_1_1_, 
    pc.post_id as post_id3_1_0__, 
    pc.id as id1_1_0__ 
FROM post p 
INNER JOIN post_comment pc ON p.id = pc.post_id 
WHERE p.id = 1

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'The last part is about jOOQ and how to get the most of your relational database.', 
    5
)]
UPDATE 
    post_comment 
SET 
    post_id = 1, 
    review = 'The JDBC part is a must have!' 
WHERE 
    id = 1
    
DELETE FROM post_comment 
WHERE id = 3

The manual merging strategy might look complicated at first glance, but it’s not really complicated at all. Basically, the manual merging strategy must execute the following steps:

  • It must remove the existing database records that are no longer found in the incoming collection.
  • It must update the existing database records which can be found in the incoming collection.
  • It must add the records found in the incoming collection, which cannot be found in the current database snapshot.

That’s it!

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

As you could see, merging entity collections requires you to understand how JPA works under the hood. While the automatic merging strategy offered by Hibernate is convenient, it’s not as efficient as the manual merging alternative, especially when merging multiple collections.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.