The best way to fix the Hibernate MultipleBagFetchException

Imagine having a tool that can automatically detect if you are using JPA and Hibernate properly. Hypersistence Optimizer is that tool!

Introduction

If you’ve been using Hibernate for some time, there is a good chance you bumped into a MultipleBagFetchException issue:

org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags

In this article, we are going to see the reason Hibernate throws the MultipleBagFetchException as well as the best way to solve this issue.

Domain Model

Let’s consider that our application defines three entities: Post, PostComment, and Tag, which are associated as in the following diagram:

Multiple List Eager Fetching

What we are mostly interested in this article is that the Post entity defines a bidirectional @OneToMany association with the PostComment child entity, as well as a unidirectional @ManyToMany association with the Tag entity.

@OneToMany(
    mappedBy = "post", 
    cascade = CascadeType.ALL, 
    orphanRemoval = true
)
private List<PostComment> comments = new ArrayList<>();

@ManyToMany(
    cascade = {
        CascadeType.PERSIST, 
        CascadeType.MERGE
    }
)
@JoinTable(
    name = "post_tag",
    joinColumns = @JoinColumn(name = "post_id"),
    inverseJoinColumns = @JoinColumn(name = "tag_id")
)
private List<Tag> tags = new ArrayList<>();

The reason why the @ManyToMany association cascades only the PERSIST and MERGE entity state transitions and not the REMOVE one is because the other side is not a child entity.

Since the Tag entity lifecycle is not tied to the Post entity, cascading REMOVE or enabling the orphanRemoval mechanism would be a mistake. For more details about this topic, check out this article.

Hibernate throwing MultipleBagFetchException

Now, if we want to fetch the Post entities with the identifier values between 1 and 50, along with all their associated PostComment and Tag entities, we would write a query like the following one:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "left join fetch p.tags " +
    "where p.id between :minId and :maxId", Post.class)
.setParameter("minId", 1L)
.setParameter("maxId", 50L)
.getResultList();

However, when running the entity query above, Hibernate throws a MultipleBagFetchException while compiling the JPQL query:

org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags [
  com.vladmihalcea.book.hpjp.hibernate.fetching.Post.comments, 
  com.vladmihalcea.book.hpjp.hibernate.fetching.Post.tags
]

So, no SQL query is executed by Hibernate. The reason why a MultipleBagFetchException is thrown by Hibernate is that duplicates can occur, and the unordered List, which is called a bag in Hibernate terminology, is not supposed to remove duplicates.

How NOT to “fix” the Hibernate MultipleBagFetchException

If you google the MultipleBagFetchException, you are going to see many wrong answers, like this one on StackOverflow, which, surprisingly, has over 280 upvotes.

StackOverflow MultipleBagFetchException Wrong Answer

So simple, yet so wrong!

Using Set instead of List

So, let’s change the association collection type from List to Set:

@OneToMany(
    mappedBy = "post", 
    cascade = CascadeType.ALL, 
    orphanRemoval = true
)
private Set<PostComment> comments = new HashSet<>();

@ManyToMany(
    cascade = {
        CascadeType.PERSIST, 
        CascadeType.MERGE
    }
)
@JoinTable(
    name = "post_tag",
    joinColumns = @JoinColumn(name = "post_id"),
    inverseJoinColumns = @JoinColumn(name = "tag_id")
)
private Set<Tag> tags = new HashSet<>();

And, now, when rerunning the previous entity query which fetched some Post entities along with their comments and tags associations, we can see that no MultipleBagFetchException is thrown.

However, this is SQL query that Hibernate executed for the aforementioned JPQL query:

SELECT 
    p.id AS id1_0_0_,
    pc.id AS id1_1_1_,
    t.id AS id1_3_2_,
    p.title AS title2_0_0_,
    pc.post_id AS post_id3_1_1_,
    pc.review AS review2_1_1_,
    t.name AS name2_3_2_,
    pt.post_id AS post_id1_2_1__,
    pt.tag_id AS tag_id2_2_1__
FROM 
    post p
LEFT OUTER JOIN 
    post_comment pc ON p.id = pc.post_id
LEFT OUTER JOIN 
    post_tag pt ON p.id = pt.post_id
LEFT OUTER JOIN 
    tag t ON pt.tag_id = t.id
WHERE 
    p.id BETWEEN 1 AND 50

So, what’s wrong with this SQL query?

The post and post_comment are associated via the post_id Foreign Key column, so the join produces a result set containing all post table rows with the Primary Key values between 1 and 50 along with their associated post_comment table rows.

The post and tag tables are also associated via the post_id and tag_id post_tag Foreign Key columns, so these two joins produce a result set containing all post table rows with the Primary Key values between 1 and 50 along with their associated tag table rows.

Now, to merge the two result sets, the database can only use a Cartesian Product, so the final result set contains 50 post rows multiplied by the associated post_comment and tag table rows.

So, if we have 50 post rows associated with 20 post_comment and 10 tag rows, the final result set will contain 10_000 records (e.g., 50 x 20 x 10), as illustrated by the following test case:

List<Post> posts = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "left join fetch p.tags " +
    "where p.id between :minId and :maxId", Post.class)
.setParameter("minId", 1L)
.setParameter("maxId", 50L)
.getResultList();

assertEquals(
    POST_COUNT * POST_COMMENT_COUNT * TAG_COUNT, 
    posts.size()
);

That’s so terrible from a performance perspective!

How to fix the Hibernate MultipleBagFetchException

To avoid a Cartesian Product, you can fetch at most one association at a time. So, instead of executing a single JPQL query that fetches two associations, we can execute two JPQL queries instead:

List<Post> posts = doInJPA(entityManager -> {
    List<Post> _posts = entityManager
    .createQuery(
        "select distinct p " +
        "from Post p " +
        "left join fetch p.comments " +
        "where p.id between :minId and :maxId ", Post.class)
    .setParameter("minId", 1L)
    .setParameter("maxId", 50L)
    .setHint(QueryHints.PASS_DISTINCT_THROUGH, false)
    .getResultList();

    _posts = entityManager
    .createQuery(
        "select distinct p " +
        "from Post p " +
        "left join fetch p.tags t " +
        "where p in :posts ", Post.class)
    .setParameter("posts", _posts)
    .setHint(QueryHints.PASS_DISTINCT_THROUGH, false)
    .getResultList();

    return _posts;
});

assertEquals(POST_COUNT, posts.size());

for(Post post : posts) {
    assertEquals(POST_COMMENT_COUNT, post.getComments().size());
    assertEquals(TAG_COUNT, post.getTags().size());
}

The first JPQL query defines the main filtering criteria and fetches the Post entities along with the associated PostComment records.

The PASS_DISTINCT_THROUGH query hint allows you to avoid passing the DISTINCT keyword to the SQL statement, and only use it to remove Java entity duplicates caused by the parent-child joined result set. For more details about the PASS_DISTINCT_THROUGH query hint, check out this article.

Now, we have to fetch the Post entities along with their associated Tag entities, and, thanks to the Persistence Context, Hibernate will set the tags collection of the previously fetched Post entities.

Cool, right?

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

There are so many blog posts, videos, books, and forum answers, providing the wrong solution to the MultipleBagFetchException Hibernate issues. All these resources tell you that using a Set instead of a List is the right way to avoid this exception.

However, the MultipleBagFetchException tells you that a Cartesian Product might be generated, and, most of the time, that’s undesirable when fetching entities as it can lead to terrible data access performance issues.

The best way to fetch multiple entity collections with JPA and Hibernate is to load at most one collection at a time while relying on the Hibernate Persistence Context guarantee that only a single entity object can be loading at a time in a given JPA EntityManager or Hibernate Session.

Transactions and Concurrency Control eBook

21 Comments on “The best way to fix the Hibernate MultipleBagFetchException

  1. Hi Vlad,

    thanks for the great article. Confronted with this exception for the first time, I ran into a suggestion to use @Fetch(value = FetchMode.SUBSELECT) at one of the collections. Now for 2-3 months working with this approach, I have not faced any problems, admitably in a stage where the application is not so complex, but what is your expereince on using this approach? Are there any pitfalls when using the @Fetch Annotation as a solution for fetching multiple bags?

    I really appreciate your opinion.

    Thanks in advance!

    • The FetchMode.SUBSELECT can also be a solution to this problem, but accessing the properties just to trigger the extra SELECT is not ideal and the JIT might even remove the call if you don’t use the returning value.

      • Thanks Vlad, this is important information to have in mind!

  2. Great article! But how to handle that if a List is contained in another list. In your example that would be the Tag contains a list of Posts and the Post contains a list of PostComments. If I want to fetch all Tags how can I split the query?

    Also is it possible to do that with Criteria API?

    • Yes, it’s possible.

      select t
      from Tags t
      join fetch t.posts 
      

      Afterward, you execute:

      select p
      from Post p
      join fetch p.comments
      where p in :posts
      

      And, the posts is the List of all posts fetched for all Tags in the first query.

      List<Post> posts = tags.stream().map(Tag::posts)..flatMap(List::stream).collect(Collectors.toList());
      

      That can also be done with the Criteria API.

      • Thanks Vlad! Unfortunately I might need to fetch a little more than 2000 Posts and I will likely hit the limitation of the IN clause in Oracle. In this case would you rather convert lists to sets and get the cartesian product or omit the join fetch and live with the n+1 queries? Which one will perform better?

      • You shouldn’t need to fetch more than 100 entries. What are you going to do with all that data? It doesn’t fit in the UI. If you fetch to aggregate it, better do that in the DB. if it’s for a batch processing task, then use batching. Anyway, fetching a lot of data is a code smell.

      • It’s a rest service that provides data based on search criteria. It will be used not only from UI. I’m given the interface and told to implement it. Nothing I can change. In reality, when filtering is applied, data should be much less but I must be prepared for the worst case scenario.

    • I was using that same approach in a previous project and was failing to reproducing on a new one. Searching over the net led me to your nice article.
      In my case, the problem was i didn’t precise that my method in the service was @Transactional.
      Consequence : new transaction around every query execution, persistence context cleared right after, and my two lists was then using totally different instances for the same rows in table.

      If anyone faces the same problem, take a look at that too.

      Thanks Vlad for your marvelous work.

      • huartgi: You are right, the same issue happened to me, I had two @OneToMany associations, so when I decided to go with two JPQL queries as suggested here, I got a LazyInitializationException since both queries were executed in separate SQL transactions and from what I understand, it means the JPA cache couldn’t reuse the result from the first query to merge it with the second. I think it’s a very important information to mention since the “doInJPA” is not an official method. In my case, I use the JpaRepository + the Custom so I have:

        @Transactional
        interface MyEntityRepository extends JpaRepository
        
        @Transactional(readOnly = true)
        interface MyEntityRepositoryCustom
        
        class MyEntityRepositoryCustomImpl
        

        This way the JpaRepository handles transaction with read & write, while the MyEntityRepositoryCustom is only for read.

        Vlad, I really appreciate your explanations, each articles give enough focus on a single topic, while going deep enough to understand the do & don’t. I think it would worth to talk a little bit about the best practices for JpaRepository and pitfalls, I see you mainly focus on your own home made solution of “doInJPA”, but many would instead use JpaRepository.

      • Thanks. JpaRepository is just a DAO. In the end, you have access to the JPA EntityManager so you can do whatever you want with it. In my articles, you can find explanations about Spring as well, like pitfalls.

        The doInJPA method is exactly what Spring does when executing a @Transcational block. If you execute methods in the context of two transactions, then it’s the same as if you executed multiple doInJPA methods. Since all the code is open-source, for both Spring and this repository, you can easily get a better understanding of what’s being done behind the scenes.

    • if i use FethType.Lazy the what will happen..? please tell me

  3. Thanks for your article, but what is the limit of item you can filter with the “in” statement ?

  4. Hi Vlad,

    That last part about doInJPA, along with the strategy for executing separate queries and combining the results was, for me, finding a needle in a haystack in terms of solutions for the Cartesian Product issue with an Entity that has multiple associations.

    It would be nice if there were some sort of platform support for this idea of segmenting the population of an entity. I ended up doing a similar thing (running a query for each association) and combining the results. But there is some risk of registering the Entity as having changed by doing this manually. It’s unexpected – to me anyway – that Hibernate would add the association data from the second query to Entities already in the session.

    Thanks for all of the great articles!

    • If you do this manually and assemble the results coming from secondary queries, you need to detach the entity, otherwise, Hibernate will think you are changing collections in case you overwrite the proxy collections with the one you fetched manually.

      However, my solution doesn’t have this issue as Hibernate knows that the entity we are fetching the second time is already managed, hence, it just populates its newly fetched collection.

  5. Thank you for this article!

    How about splitting up a query which is defined in an @Query-Annotation (Spring-Data)

    like

    @Query(“SELECT a FROM parent a LEFT JOIN FETCH a.sons LEFT JOIN FETCH a.daughters WHERE a.name=”Peter”)

    Is there a feasible way of implementing this?

    Thank you in advance.

    • The solution is the same one. You need to implement that method as K explained in this article. It will never work via annotations only.

    • Yes, of course. Use a query like this one:

      select p
      from Post p
      left join p.tags t
      where p.title like :title
      

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.