The best way to fix the Hibernate “HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!” warning message

(Last Updated On: January 4, 2018)

Introduction

If you’ve been using Hibernate long enough, then you surely must have seen this WARN log message when doing pagination while join-fetching multiple entities.

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

In this article, I’m going to show a very elegant solution to fix this issue.

Domain Model

Considering we have the following entities:

The Post entity has a bidirectional @OneToMany association with PostComment.

@OneToMany(mappedBy = "post", cascade = CascadeType.ALL)
private List<PostComment> comments = new ArrayList<>();

The problem

Now, consider the following JPQL query:

List<Post> posts = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title like :title " +
    "order by p.id", Post.class)
.setParameter("title", titlePattern)
.setMaxResults(maxResults)
.getResultList();

We want to fetch the Post entities along with their comments and limit the result set to a maximum number of entries.

When running the JPQL query above, Hibernate limits the number of Post entries, but it issues the aforementioned warning:

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

The reason why Hibernate does the pagination in memory as it cannot just truncate the result set using SQL-level pagination. If it did that, then the result set be truncated in the middle of the PostComments rows, therefore returning a Post entity with just a subset of comments.

Because Hibernate favors consistency, it fetches the entire result set and does the pagination in memory. However, that can be suboptimal, so what can we do about it?

The solution

As I already explained, Window Functions are the answer to many Query-related problems.

So, we just need to calculate the DENSE_RANK over the result set of post and post_comments that match our filtering criteria and restrict the output for the first N post entries only.

List<Post> posts = entityManager.createNativeQuery(
    "select p_pc_r.* " +
    "from (   " +
    "    select *, dense_rank() OVER (ORDER BY post_id) rank " +
    "    from (   " +
    "        select p.*, pc.* " +
    "        from post p  " +
    "        left join post_comment pc on p.id = pc.post_id  " +
    "        where p.title like :title " +
    "        order by p.id " +
    "    ) p_pc " +
    ") p_pc_r " +
    "where p_pc_r.rank <= :rank", Post.class)
.setParameter("title", titlePattern)
.setParameter("rank", maxResults)
.unwrap( NativeQuery.class )
.addEntity( "p", Post.class )
.addEntity( "pc", PostComment.class )
.setResultTransformer( DistinctPostResultTransformer.INSTANCE )
.getResultList();

Because this is a native SQL query, we need the DistinctPostResultTransformer to rebuild the entity graph from the tabular JDBC ResultSet:

public class DistinctPostResultTransformer 
        extends BasicTransformerAdapter {

    private static final DistinctPostResultTransformer INSTANCE  = 
            new DistinctPostResultTransformer();

    @Override
    public List transformList(List list) {
        Map<Serializable, Identifiable> identifiableMap = 
                new LinkedHashMap<>( list.size() );

        for ( Object entityArray : list ) {
            if ( Object[].class.isAssignableFrom( 
                    entityArray.getClass() ) ) {
                Post post = null;
                PostComment comment = null;

                Object[] tuples = (Object[]) entityArray;

                for ( Object tuple : tuples ) {
                    if(tuple instanceof Post) {
                        post = (Post) tuple;
                    }
                    else if(tuple instanceof PostComment) {
                        comment = (PostComment) tuple;
                    }
                    else {
                        throw new UnsupportedOperationException(
                            "Tuple " + tuple.getClass() + " is not supported!"
                        );
                    }
                }
                Objects.requireNonNull(post);
                Objects.requireNonNull(comment);

                if ( !identifiableMap.containsKey( post.getId() ) ) {
                    identifiableMap.put( post.getId(), post );
                    post.setComments( new ArrayList<>() );
                }
                post.addComment( comment );
            }
        }
        return new ArrayList<>( identifiableMap.values() );
    }
}

That’s it! This way, we can fetch both the Post and its PostComments with a single query.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

Although some might fix this issue with two queries: one for selecting the Post entity identifiers for the current page and another one which fetches Post along with its comments by the previously fetched identifiers, it’s way more efficient to use a single query filtered by DENSE_RANK.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

6 thoughts on “The best way to fix the Hibernate “HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!” warning message

  1. What’s the point of using a framework like Hibernate and then write your DB specific query again? Why doesn’t Hibernate do this for me in a generic way?

    Your example gets complicate pretty quickly if you want to do it in a generic way for various different entities…

    1. Hibernate is an alternative to JDBC, not to SQL. Otherwise, why do you think there’s a createNativeQuery? The example is fairly simple in terms of SQL, and you don’t need to make everything generic, especially since this kind of query is business use case-oriented anyway.

  2. As I understand this native query forces hibernate to clear 2nd level caches for Post and PostComment, so performance gain is controversial especially if this query is very frequently called.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.