The best way to fix the Hibernate “HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!” warning message


Introduction

If you’ve been using Hibernate long enough, then you surely must have seen this WARN log message when doing pagination while join-fetching multiple entities.

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

In this article, I'm going to show you two ways to fix this issue.

Domain Model

Considering we have the following entities:

[Diagram: the Post and PostComment entities used for the fetching and pagination examples]

The Post entity has a bidirectional @OneToMany association with the PostComment child entity.

Notice that both entities use the Fluent-style API. For more details about building entities using a Fluent-style API with JPA and Hibernate, check out this article.
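As a rough sketch of that pattern, each setter returns `this` so the calls can be chained. A trimmed-down version of the two entities (with the JPA annotations such as `@Entity`, `@Id`, `@OneToMany`, and `@ManyToOne` omitted, since only the fluent style matters here) might look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed-down sketch of the two entities. The JPA annotations are
// omitted so the fluent pattern itself stands out.
class PostComment {
    private Long id;
    private String review;
    private Post post;

    public PostComment setId(Long id) { this.id = id; return this; }
    public PostComment setReview(String review) { this.review = review; return this; }
    public PostComment setPost(Post post) { this.post = post; return this; }
    public String getReview() { return review; }
}

class Post {
    private Long id;
    private String title;
    private List<PostComment> comments = new ArrayList<>();

    public Post setId(Long id) { this.id = id; return this; }
    public Post setTitle(String title) { this.title = title; return this; }
    public String getTitle() { return title; }
    public List<PostComment> getComments() { return comments; }

    // addComment keeps both sides of the bidirectional association in sync
    public Post addComment(PostComment comment) {
        comments.add(comment);
        comment.setPost(this);
        return this;
    }
}

public class FluentEntityDemo {
    public static void main(String[] args) {
        // Every setter returns `this`, so entity creation chains naturally
        Post post = new Post()
            .setId(1L)
            .setTitle("High-Performance Java Persistence - Chapter 1")
            .addComment(new PostComment().setId(1L).setReview("A must read!"));

        System.out.println(post.getComments().size());
    }
}
```

Because every setter returns the entity itself, the creation code below can chain calls without intermediate variables.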

Now, let's assume we create a series of Post entities, each one with several PostComment child entities.

LocalDateTime timestamp = LocalDateTime
.of(
    2018, 10, 9, 12, 0, 0, 0
);

LongStream
.range(1, 10)
.forEach(postId -> {
    Post post = new Post()
    .setId(postId)
    .setTitle(
        String.format(
            "High-Performance Java Persistence - Chapter %d", 
            postId
        )
    )
    .setCreatedOn(
         Timestamp.valueOf(
            timestamp.plusMinutes(postId)
        )
    );

    LongStream
    .range(1, COMMENT_COUNT + 1)
    .forEach(commentOffset -> {
        long commentId = 
            ((postId - 1) * COMMENT_COUNT) + commentOffset;

        post.addComment(
            new PostComment()
            .setId(commentId)
            .setReview(
                String.format(
                    "Comment nr. %d - A must read!", 
                    commentId
                )
            )
            .setCreatedOn(
                Timestamp.valueOf(
                    timestamp.plusMinutes(commentId)
                )
            )
        );

    });

    entityManager.persist(post);
});

The problem

We want to fetch all Post entities whose titles match a given pattern. However, we want to fetch the associated PostComment entities as well.

As I explained in this article, you might be tempted to use a JPA pagination query to fetch the Post entities while also join fetching the PostComment entities as illustrated by the following JPQL query:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title like :titlePattern " +
    "order by p.createdOn", Post.class)
.setParameter(
    "titlePattern", 
    "High-Performance Java Persistence %"
)
.setMaxResults(5)
.getResultList();

assertEquals(5, posts.size());

We want to fetch the Post entities along with their comments and limit the result set to a maximum number of entries.

When running the JPQL query above, Hibernate limits the number of Post entries, but it issues the aforementioned warning while executing the following SQL query:

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

SELECT 
    p.id AS id1_0_0_,
    pc.id AS id1_1_1_,
    p.created_on AS created_2_0_0_,
    p.title AS title3_0_0_,
    pc.created_on AS created_2_1_1_,
    pc.post_id AS post_id4_1_1_,
    pc.review AS review3_1_1_,
    pc.post_id AS post_id4_1_0__,
    pc.id AS id1_1_0__
FROM 
     post p
LEFT OUTER JOIN 
     post_comment pc ON p.id=pc.post_id
WHERE 
     p.title LIKE 'High-Performance Java Persistence %'
ORDER BY 
     p.created_on

Notice that the SQL query uses no pagination whatsoever. As mentioned by the HHH000104 warning message, the pagination is done in memory, which is bad.

The reason why Hibernate does the pagination in memory is that it cannot simply truncate the result set using SQL-level pagination. If it did, the result set could get cut off in the middle of a Post entity's PostComment rows, returning a Post entity with only a subset of its comments.

Because Hibernate favors consistency, it fetches the entire result set and does the pagination in memory. However, that can be suboptimal, so what can we do about it?
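To make the problem concrete, here is a plain-Java simulation (hypothetical data, not Hibernate code) of what a SQL-level LIMIT would do to the joined rows: the last Post crossing the cut-off line would appear to own fewer comments than it really has.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NaiveLimitDemo {
    // Simulate what a SQL-level LIMIT would do to the joined result set:
    // group only the first `limit` rows by post id, discarding the rest.
    static Map<Long, Long> commentCountPerPost(List<long[]> joinedRows, int limit) {
        Map<Long, Long> counts = new LinkedHashMap<>();
        for (long[] row : joinedRows.subList(0, Math.min(limit, joinedRows.size()))) {
            counts.merge(row[0], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each row pairs (postId, commentId): post 1 has comments 1 and 2,
        // post 2 has comments 3 and 4
        List<long[]> rows = Arrays.asList(
            new long[] {1, 1}, new long[] {1, 2},
            new long[] {2, 3}, new long[] {2, 4}
        );
        // LIMIT 3 truncates post 2 mid-collection: it shows up with
        // 1 comment instead of its actual 2
        System.out.println(commentCountPerPost(rows, 3));
    }
}
```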

Fixing the issue with two SQL queries that can fetch entities in read-write mode

The easiest way to fix this issue is to execute two queries:

1. The first query fetches the Post entity identifiers matching the provided filtering criteria.
2. The second query uses the previously extracted Post entity identifiers to fetch the Post and PostComment entities.

This approach is very easy to implement and looks as follows:

List<Long> postIds = entityManager
.createQuery(
    "select p.id " +
    "from Post p " +
    "where p.title like :titlePattern " +
    "order by p.createdOn", Long.class)
.setParameter(
    "titlePattern", 
    "High-Performance Java Persistence %"
)
.setMaxResults(5)
.getResultList();

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.id in (:postIds)", Post.class)
.setParameter("postIds", postIds)
.setHint(
    "hibernate.query.passDistinctThrough", 
    false
)
.getResultList();

assertEquals(5, posts.size());

Post post1 = posts.get(0);

List<PostComment> comments = post1.getComments();

for (int i = 0; i < COMMENT_COUNT; i++) {
    PostComment postComment1 = comments.get(i);

    assertEquals(
        String.format(
            "Comment nr. %d - A must read!",
            i + 1
        ),
        postComment1.getReview()
    );
}

Notice the hibernate.query.passDistinctThrough JPA query hint we used to instruct Hibernate to prevent passing the JPQL DISTINCT keyword to the underlying SQL query. For more details about this query hint, check out this article.
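In other words, the JPQL-level DISTINCT is only there to deduplicate the parent entity references that the joined result set repeats once per child row; the SQL statement itself does not need it. In plain-Java terms, what DISTINCT achieves here amounts to this (an illustrative sketch, not Hibernate internals):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DistinctParentsDemo {
    // The JPQL DISTINCT only deduplicates parent references in memory;
    // it never needs to reach the SQL statement, which is why the
    // passDistinctThrough hint can safely suppress it there.
    static List<String> distinctParents(List<String> parentPerJoinedRow) {
        return new ArrayList<>(new LinkedHashSet<>(parentPerJoinedRow));
    }

    public static void main(String[] args) {
        // One entry per joined row: each post repeats once per comment
        List<String> rows = Arrays.asList("post1", "post1", "post2", "post2");
        System.out.println(distinctParents(rows)); // [post1, post2]
    }
}
```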

When executing the two JPQL queries above, Hibernate generates the following SQL queries:

Query:["
    SELECT 
        p.id AS col_0_0_
    FROM 
        post p
    WHERE 
        p.title LIKE ?
    ORDER BY 
        p.created_on
    LIMIT 
        ?
"], 
Params:[(
    'High-Performance Java Persistence %', 5
)]

Query:["
    SELECT 
        p.id AS id1_0_0_,
        pc.id AS id1_1_1_,
        p.created_on AS created_2_0_0_,
        p.title AS title3_0_0_,
        pc.created_on AS created_2_1_1_,
        pc.post_id AS post_id4_1_1_,
        pc.review AS review3_1_1_,
        pc.post_id AS post_id4_1_0__,
        pc.id AS id1_1_0__
    FROM 
        post p
    LEFT OUTER JOIN 
        post_comment pc ON p.id = pc.post_id
    WHERE 
        p.id IN (?, ?, ?, ?, ?)
"], 
Params:[(
    1, 2, 3, 4, 5
)]

That’s the easiest way to fix the issue causing the HHH000104 warning message.

Fixing the issue with one SQL query that can only fetch entities in read-only mode

As I already explained, Window Functions are the answer to many query-related problems.

So, we just need to calculate the DENSE_RANK over the result set of post and post_comment rows that match our filtering criteria, and then restrict the output to the first N post entries only.

For that, we need to define the following @NamedNativeQuery along with its associated @SqlResultSetMapping:

@NamedNativeQuery(
    name = "PostWithCommentByRank",
    query =
        "SELECT * " +
        "FROM (   " +
        "    SELECT *, dense_rank() OVER (ORDER BY \"p.created_on\", \"p.id\") rank " +
        "    FROM (   " +
        "        SELECT p.id AS \"p.id\", " +
        "               p.created_on AS \"p.created_on\", " +
        "               p.title AS \"p.title\", " +
        "               pc.id as \"pc.id\", " +
        "               pc.created_on AS \"pc.created_on\", " +
        "               pc.review AS \"pc.review\", " +
        "               pc.post_id AS \"pc.post_id\" " +
        "        FROM post p  " +
        "        LEFT JOIN post_comment pc ON p.id = pc.post_id " +
        "        WHERE p.title LIKE :titlePattern " +
        "        ORDER BY p.created_on " +
        "    ) p_pc " +
        ") p_pc_r " +
        "WHERE p_pc_r.rank <= :rank ",
    resultSetMapping = "PostWithCommentByRankMapping"
)
@SqlResultSetMapping(
    name = "PostWithCommentByRankMapping",
    entities = {
        @EntityResult(
            entityClass = Post.class,
            fields = {
                @FieldResult(name = "id", column = "p.id"),
                @FieldResult(name = "createdOn", column = "p.created_on"),
                @FieldResult(name = "title", column = "p.title"),
            }
        ),
        @EntityResult(
            entityClass = PostComment.class,
            fields = {
                @FieldResult(name = "id", column = "pc.id"),
                @FieldResult(name = "createdOn", column = "pc.created_on"),
                @FieldResult(name = "review", column = "pc.review"),
                @FieldResult(name = "post", column = "pc.post_id"),
            }
        )
    }
)

The @NamedNativeQuery fetches all Post entities matching the provided title along with their associated PostComment child entities. The DENSE_RANK Window Function is used to assign a rank to each Post and PostComment joined record so that we can later filter just the number of Post records we are interested in fetching.
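The DENSE_RANK semantics can be illustrated with a plain-Java sketch (hypothetical data, not Hibernate code): rows sharing the same post id receive the same rank, and the rank increases by exactly one for each new post id, so keeping `rank <= N` retains every comment row of the first N posts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DenseRankDemo {
    // Keep every joined row belonging to the first maxRank distinct post
    // ids, mirroring WHERE p_pc_r.rank <= :rank in the native query.
    static List<long[]> limitByDenseRank(List<long[]> sortedRows, int maxRank) {
        List<long[]> result = new ArrayList<>();
        long previousPostId = -1;
        int rank = 0;
        for (long[] row : sortedRows) {
            if (row[0] != previousPostId) {   // new post id => next rank
                rank++;
                previousPostId = row[0];
            }
            if (rank <= maxRank) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // (postId, commentId) rows, already ordered by post
        List<long[]> rows = Arrays.asList(
            new long[] {1, 1}, new long[] {1, 2},
            new long[] {2, 3}, new long[] {2, 4},
            new long[] {3, 5}
        );
        // rank <= 2 keeps posts 1 and 2 with ALL their comments: 4 rows
        System.out.println(limitByDenseRank(rows, 2).size());
    }
}
```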

The SqlResultSetMapping provides the mapping between the SQL-level column aliases and the JPA entity properties that need to be populated.

Now, we can execute the PostWithCommentByRank @NamedNativeQuery:

List<Post> posts = entityManager
.createNamedQuery("PostWithCommentByRank")
.setParameter(
    "titlePattern", 
    "High-Performance Java Persistence %"
)
.setParameter(
    "rank", 
    5
)
.setHint(QueryHints.HINT_READONLY, true)
.unwrap(NativeQuery.class)
.setResultTransformer(
    new DistinctPostResultTransformer(entityManager)
)
.getResultList();

assertEquals(5, posts.size());

Post post1 = posts.get(0);

List<PostComment> comments = post1.getComments();

for (int i = 0; i < COMMENT_COUNT; i++) {
    PostComment postComment1 = comments.get(i);

    assertEquals(
        String.format(
            "Comment nr. %d - A must read!",
            i + 1
        ),
        postComment1.getReview()
    );
}

We used the READONLY JPA query hint to instruct Hibernate to discard the underlying detached entity state. For more details about this optimization, check out this article.

Now, by default, a native SQL query like the PostWithCommentByRank one fetches the Post and the PostComment in the same JDBC row, so we would end up with an Object[] containing both entities.

However, we want to transform the tabular Object[] array into a tree of parent-child entities, and for this reason, we need to use the Hibernate ResultTransformer. For more details about the ResultTransformer, check out this article.

The DistinctPostResultTransformer looks as follows:

public class DistinctPostResultTransformer 
        extends BasicTransformerAdapter {

    private final EntityManager entityManager;

    public DistinctPostResultTransformer(
            EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    @Override
    public List transformList(
            List list) {
            
        Map<Serializable, Identifiable> identifiableMap = 
            new LinkedHashMap<>(list.size());
            
        for (Object entityArray : list) {
            if (Object[].class.isAssignableFrom(entityArray.getClass())) {
                Post post = null;
                PostComment comment = null;

                Object[] tuples = (Object[]) entityArray;

                for (Object tuple : tuples) {
                    if(tuple instanceof Identifiable) {
                        entityManager.detach(tuple);

                        if (tuple instanceof Post) {
                            post = (Post) tuple;
                        } 
                        else if (tuple instanceof PostComment) {
                            comment = (PostComment) tuple;
                        } 
                        else {
                            throw new UnsupportedOperationException(
                                "Tuple " + tuple.getClass() + " is not supported!"
                            );
                        }
                    }
                }

                if (post != null) {
                    if (!identifiableMap.containsKey(post.getId())) {
                        identifiableMap.put(post.getId(), post);
                        post.setComments(new ArrayList<>());
                    }
                    if (comment != null) {
                        post.addComment(comment);
                    }
                }
            }
        }
        return new ArrayList<>(identifiableMap.values());
    }
}

The DistinctPostResultTransformer must detach the entities being fetched because we are overwriting the child collection and we don’t want that to be propagated as an entity state transition:

post.setComments(new ArrayList<>());

Now, not only can we fetch both the Post and its PostComment entities with a single query, but we can also modify these entities later and merge them back in a subsequent read-write transaction:

List<Post> posts = doInJPA(entityManager -> {
    return entityManager
    .createNamedQuery("PostWithCommentByRank")
    .setParameter(
        "titlePattern",
        "High-Performance Java Persistence %"
    )
    .setParameter(
        "rank",
        2
    )
    .unwrap(NativeQuery.class)
    .setResultTransformer(
        new DistinctPostResultTransformer(entityManager)
    )
    .getResultList();
});

assertEquals(2, posts.size());

Post post1 = posts.get(0);

post1.addComment(
    new PostComment()
    .setId((post1.getId() - 1) * COMMENT_COUNT)
    .setReview("Awesome!")
    .setCreatedOn(
        Timestamp.valueOf(LocalDateTime.now())
    )
);

Post post2 = posts.get(1);
post2.removeComment(post2.getComments().get(0));

doInJPA(entityManager -> {
    entityManager.merge(post1);
    entityManager.merge(post2);
});

And Hibernate will properly propagate the changes to the database:

INSERT INTO post_comment (
    created_on, 
    post_id, 
    review, 
    id
) 
VALUES (
    '2019-01-09 10:47:32.134', 
    1, 
    'Awesome!', 
    0
)

DELETE FROM post_comment 
WHERE id = 6

Awesome, right?

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

So, to fix the HHH000104 issue, you have two options. Either you execute two queries and fetch the entities in either read-write or read-only mode, or you use a single query with Window Functions to fetch the entities in read-only mode.


27 Comments on “The best way to fix the Hibernate “HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!” warning message”

  1. Hello,

    Is there any way to do this using the Criteria API instead of a native query? These solutions don't seem to work with the Criteria API.

    Thanks

    • Of course, there is a way. Just take the first approach, which uses two queries, and write those two queries using the Criteria API instead of JPQL. It's as simple as that.

  2. Hi, thanks for article again

    However, I must point out a nasty behavior of the first approach (fetching the ids first), which has to do with the ORM impedance mismatch.

    In my project, the requirement is to create a query that join-fetches the related lazy @OneToMany objects AND possibly sorts by a related object's attribute. The thing is that I then get as many result rows per AggregateRoot as there are related objects in that AggregateRoot. Thus, I set maxResults to 100 but might get fewer than 100 ids.

    Any ideas about this one? Thanks

    • I have no idea what you are talking about. Send me the link to the StackOverflow question and provide more context about your problem.

      • Yes, sorry I have to be quite vague about the specific details, however in the end this kind of sql is generated
        select
        distinct aggregate0_.id as col_0_0_,
        relatedEntity_.name as col_6_0_
        from
        AggregateRoot aggregate0_
        inner join
        related_entity relatedEntity_
        on aggregate0_.id=relatedEntity_.aggregate_id
        order by
        col_6_0_ DESC,
        col_0_0_ offset 0 rows fetch next 3 rows only

        and the results will be
        col_0_0_ | col_6_0_
        1 | CCC
        1 | BBB
        2 | AAA

        So, what I wanted is to get 3 unique ids (to be used as parameter when actually querying the data), but ended up with only 2. I assume this is just the basic problem with ORM?

        Thanks!

      • Thank you very much, however I stumbled upon a very confusing exception when querying the actual data (the entity's id is of type Long):

        List myIds = idQuery.getResultList();
        System.out.println("myIds " + myIds); // prints OK

        Query dataQry = entityManager.createQuery("SELECT distinct a from AggregateRoot a WHERE a.id in (:myIds)");
        dataQry.setParameter("myIds", myIds); // throws IllegalArgumentException: Parameter value element [4] did not match expected type [java.lang.Long (n/a)]

        any idea about this one?

      • Maybe, the ids are fetched as BigInteger if you use PostgreSQL. A good old debug is what you need here.
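For instance, a defensive conversion of the fetched ids avoids the type mismatch regardless of which Number subtype the driver returns (a small sketch, independent of the JPA provider; the class and method names are made up):

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IdConversionDemo {
    // Normalize whatever Number subtype the driver returned (e.g.
    // BigInteger on some PostgreSQL setups) to Long, so the :postIds
    // parameter matches the Long-typed entity identifier.
    static List<Long> toLongIds(List<? extends Number> rawIds) {
        return rawIds.stream()
            .map(Number::longValue)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Number> rawIds = Arrays.asList(
            BigInteger.valueOf(1), Long.valueOf(2), Integer.valueOf(3)
        );
        System.out.println(toLongIds(rawIds)); // [1, 2, 3]
    }
}
```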

    • Yes, thanks 🙂

      In fact, it seems that the IDs are BigIntegers; I just relied too much on the apparent type of List myIds = idQuery.getResultList(); when in fact it is just a raw List.

  3. Hi Vlad,

    I am trying to improve my query performance by using technique #1 suggested by you (splitting it into two queries), but my filter conditions are not limited to the columns of the parent table. People can also query based on columns of the child table. Let's say I need to filter based on the review column in PostComment; would this still be part of query #1? If yes, will it look like this?

    "select p.id " +
    "from Post p join p.comments c " +
    "where p.title like :titlePattern"

    Please advise

    • Then, you just include the conditions in the first query and make sure you use JOIN and not JOIN FETCH.

      • I also encountered the same exception.
        I want to use your approach with two queries, but there's an issue with the IN statement: I can only have 2100 IDs in the IN clause due to a SQL Server limitation. Any suggestion?

      • Fetching more than 2000 entities doesn't sound right. What do you need all that data for? It doesn't fit the UI. And for data processing, you are better off using batching anyway.

      • We are using Spring for pagination.
        I am not sure why it is not just returning the first records and then the count; it wants to select all the IN ids. I guess it is done for the total count.

      • Maybe it’s a Spring issue. You need to investigate it.

  4. Very good article, Vlad.

    Although you have said you used the READONLY JPA query hint, I think you forgot to use the QueryHints.HINT_READONLY in your code example.

    One question: I didn’t understand why you had to detach the entities in Result Transformer since you’re using Native Query. Is it really necessary?

    Thanks!

      • Got it, Vlad!

        Although I knew about the @SqlResultSetMapping annotation, I had never used it before; that's why I didn't know it was mainly used for mapping a native SQL query to managed entities. I thought it had been created to deal with custom SQL projections and non-entity classes.

        Now I understand why we may sometimes need to detach the entities before using them when using @SqlResultSetMapping.

        Thanks again!

  5. Hello, this is a very good blog post :).

    I can see that you detach the fetched entities, but I don't get why.

  6. I have a doubt: does this solution work fine only if all Post entity instances contain at least one PostComment?

    • The LEFT JOIN, and the fact that the child table is not used in the WHERE clause, mean that it should work just fine even when no child is found.

      • But in your example there is an ambiguous column: the id column exists in both tables. According to my tests, this influences the PostComment mapping: the comments end up with the POST.id as their id instead of the correct comment.id value.
        I'm also getting comments even for posts that should not have any comments.

        😦

      • That's the Primary Key, and each table needs one. As for the SQL query, the test case is on GitHub and works just fine. Just fork my High-Performance Java Persistence GitHub repository and run the test in question.
