Query pagination with JPA and Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

Inspired by this StackOverflow answer I gave recently, I decided it’s time to write an article about query pagination when using JPA and Hibernate.

In this article, you are going to see how to use query pagination to restrict the JDBC ResultSet size and avoid fetching more data than necessary.

Domain Model

Now, let’s assume we defined the following Post and PostComment entity classes in our application:

Post and PostComment entities

The Post class is the parent entity while the PostComment is the child as it has a @ManyToOne association with the Post entity. Both entities implement the Identifiable interface which provides a contract for accessing the underlying entity identifier.

Next, we are going to save the following Post and PostComment entities in the database:

LocalDateTime timestamp = LocalDateTime.of(
    2018, 10, 9, 12, 0, 0, 0
);
 
LongStream.rangeClosed(1, 50)
.forEach(postId -> {
    Post post = new Post()
    .setId(postId)
    .setTitle(
        String.format("High-Performance Java Persistence - Chapter %d",
        postId)
    )
    .setCreatedOn(
        Timestamp.valueOf(timestamp.plusMinutes(postId))
    );
 
    LongStream.rangeClosed(1, COMMENT_COUNT)
    .forEach(commentOffset -> {
        long commentId = ((postId - 1) * COMMENT_COUNT) + commentOffset;
 
        post.addComment(
            new PostComment()
            .setId(commentId)
            .setReview(
                String.format("Comment nr. %d - A must read!", commentId)
            )
            .setCreatedOn(
                Timestamp.valueOf(timestamp.plusMinutes(commentId))
            )
        );
 
    });
 
    entityManager.persist(post);
});

Limiting the result set size

To limit the underlying query ResultSet size, the JPA Query interface provides the setMaxResults method.

Therefore, when executing the following JPQL query:

List<Post> posts = entityManager.createQuery("""
    select p
    from Post p
    order by p.createdOn
    """, Post.class)
.setMaxResults(10)
.getResultList();
 
assertEquals(10, posts.size());
assertEquals("Post nr. 1", posts.get(0).getTitle());
assertEquals("Post nr. 10", posts.get(9).getTitle());

Hibernate generates the following SQL statement on PostgreSQL:

SELECT p.id, p.created_on, p.title
FROM post p
ORDER BY p.created_on
LIMIT 10

On SQL Server 2012 (or newer), Hibernate will execute the following SQL query:

SELECT p.id, p.created_on, p.title
FROM post p
ORDER BY p.created_on
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

Therefore, the SQL pagination query is adapted to the underlying database engine capabilities.

The use of ORDER BY is mandatory when using query pagination because SQL does not guarantee any particular order unless we provide one via the ORDER BY clause.

Using offset to position the result set

If the previous query was typical for the first page of a given pagination query, navigating the following page requires positioning the result set where the last page ended. For this purpose, the JPA Query interface provides the setFirstResult method.

List<Post> posts = entityManager.createQuery("""
    select p
    from Post p
    order by p.createdOn
    """, Post.class)
.setFirstResult(10)
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());
assertEquals("Post nr. 11", posts.get(0).getTitle());
assertEquals("Post nr. 20", posts.get(9).getTitle());

When running the previous JPQL query on PostgreSQL, Hibernate executes the following SQL SELECT statements:

SELECT p.id, p.created_on, p.title
FROM post p
ORDER BY p.created_on
LIMIT 10
OFFSET 10

and on SQL Server 2012 (or newer), Hibernate would generate this SQL query:

SELECT p.id, p.created_on, p.title
FROM post p
ORDER BY p.created_on
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY

DTO projection queries

The JPA query pagination is not limited to entity queries that return entities only. You can use it for DTO projections as well.

Assuming we have the following PostCommentSummary DTO:

public class PostCommentSummary {
 
    private Number id;
    private String title;
    private String review;
 
    public PostCommentSummary(
            Number id, 
            String title, 
            String review) {
        this.id = id;
        this.title = title;
        this.review = review;
    }
 
    public PostCommentSummary() {}
 
    //Getters omitted for brevity
}

When running the following DTO projection query:

List<PostCommentSummary> summaries = entityManager.createQuery("""
    select new
       com.vladmihalcea.hpjp.hibernate.fetching.PostCommentSummary(
           p.id, p.title, c.review
       )
    from PostComment c
    join c.post p
    order by c.createdOn
    """)
.setMaxResults(10)
.getResultList();

assertEquals(10, summaries.size());
assertEquals("Post nr. 1", summaries.get(0).getTitle());
assertEquals("Comment nr. 1", summaries.get(0).getReview());

assertEquals("Post nr. 2", summaries.get(9).getTitle());
assertEquals("Comment nr. 10", summaries.get(9).getReview());

Hibernate appends the pagination clause to the underlying SQL query:

SELECT p.id, p.title, c.review
FROM post_comment c
INNER JOIN post p ON c.post_id=p.id
ORDER BY c.created_on
LIMIT 10

For more details about DTO projection with JPA and Hibernate, check out this article.

Native SQL queries

The JPA query pagination is not limited to entity queries, such as JPQL or Criteria API. You can use it for native SQL queries as well.

List<Tuple> posts = entityManager.createNativeQuery("""
    SELECT
       p.id AS id,
       p.created_on AS created_on,
       p.title AS title
    FROM post p
    ORDER BY p.created_on
    """, Tuple.class)
.setFirstResult(10)
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());
assertEquals("Post nr. 11", posts.get(0).get("title"));
assertEquals("Post nr. 20", posts.get(9).get("title"));

When running the SQL query above, Hibernate appends the DB-specific pagination clause:

SELECT
    p.id AS id,
    p.created_on AS created_on,
    p.title AS title
FROM post p
ORDER BY p.created_on
LIMIT 10
OFFSET 10

JOIN FETCH and pagination

However, if we try to use the JOIN FETCH clause in the entity query while also using JPA pagination:

List<Post> posts = entityManager.createQuery("""
    select p
    from Post p
    left join fetch p.comments
    order by p.createdOn
    """, Post.class)
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());

Hibernate will issue the following warning message:

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

And the executed SQL query will lack the pagination clause:

SELECT 
    p.id, p.created_on, p.title,
    pc.id, pc.post_id, pc.created_on, pc.review
FROM post p
LEFT OUTER JOIN post_comment pc ON p.id = pc.post_id
ORDER BY p.created_on

This is because Hibernate wants to fetch entities fully along with their collections as indicated by the JOIN FETCH clause while the SQL-level pagination could truncate the ResultSet possibly leaving a parent Post entity with fewer elements in the comments collection.

The problem with the HHH000104 warning is that Hibernate will fetch the product of Post and PostComment entities, and due to the result set size, the query response time is going to be significant.

In order to work around this limitation, you have to use a Window Function query:

List<Post> posts = entityManager.createQuery("""
    select p
    from Post p
    left join fetch p.comments pc
    where p.id in (
        select id
        from (
            select 
                id as id,
                dense_rank() over (
                    order by createdOn ASC
                ) as ranking
            from Post
            where title like :titlePattern 
        ) pr
        where ranking <= :rank
    )
    """, Post.class)
.setParameter(
    "titlePattern", 
    "High-Performance Java Persistence %"
)
.setParameter("rank", 10)
.getResultList();

For more details about using Window Functions to fix the HHH000104 issue, check out this article.

Why not use query streaming instead?

JPA 2.2 added the getResultStream Query method, which you might think it’s a valid alternative to pagination. However, the stream result will not provide the result set size to the query planner, hence a suboptimal execution plan might be picked. For this reason, it’s much more efficient to use pagination than streaming when it comes to fetching small amounts of data.

For more details about why pagination is more efficient than streaming, check out this article.

Keyset pagination

Although Offset pagination is an SQL standard feature, there are two reasons why you would prefer Keyset pagination:

  • performance (the index must be scanned up to the offset while, for keyset pagination, we can go directly to the first index entry that matches our order by predicate and filtering criteria)
  • correctness (if elements are being added in between, offset pagination will not provide consistent reads)

For more details about how you can use Keyset Pagination with JPA and Hibernate, check out this article.

I cover this topic in my High-Performance SQL training.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

Fetching just as much data as you need is one of the most important tips when it comes to data access performance. When fetching data, pagination allows you to control the result set size so that performance is stable even if the underlying dataset grows with time.

While keyset pagination offers better performance for large result sets, if you can narrow the scanned data set using the right filtering predicates, then offset pagination is going to perform rather well. To get consistent reads, you have to make sure that the scanned data set is always sorted in such a way that new entries are appended at the end of the set, rather than being mixed in between old entries.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.