Query pagination with JPA and Hibernate

(Last Updated On: October 9, 2018)

Introduction

Inspired by this StackOverflow answer I gave recently, I decided it’s time to write an article about query pagination when using JPA and Hibernate.

In this article, you are going to see how to use query pagination to restrict the JDBC ResultSet size and avoid fetching more data than necessary.

Domain Model

Now, let’s assume we defined the following Post and PostComment entity classes in our application:

Post and PostComment entities

The Post class is the parent entity while the PostComment is the child as it has a @ManyToOne association with the Post entity. Both entities implement the Identifiable interface which provides a contract for accessing the underlying entity identifier.

Next, we are going to save the following Post and PostComment entities in the database:

LocalDateTime timestamp = LocalDateTime.of(
    2018, 10, 9, 12, 0, 0, 0
);

int commentsSize = 5;

LongStream.range(1, 50).forEach(postId -> {
    Post post = new Post();
    post.setId(postId);
    post.setTitle(
        String.format("Post nr. %d", postId)
    );
    post.setCreatedOn(
         Timestamp.valueOf(
            timestamp.plusMinutes(postId)
        )
    );

    LongStream.range(1, commentsSize + 1).forEach(commentOffset -> {
        long commentId = ((postId - 1) * commentsSize) + commentOffset;

        PostComment comment = new PostComment();        
        comment.setId(commentId);
        comment.setReview(
            String.format("Comment nr. %d", comment.getId())
        );
        comment.setCreatedOn(
            Timestamp.valueOf(
                timestamp.plusMinutes(commentId)
            )
        );

        post.addComment(comment);

    });
    
    entityManager.persist(post);
});

Limiting the result set size

To limit the underlying query ResultSet size, the JPA Query interface provides the setMaxResults method.

Therefore, when executing the following JPQL query:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "order by p.createdOn ")
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());
assertEquals("Post nr. 1", posts.get(0).getTitle());
assertEquals("Post nr. 10", posts.get(9).getTitle());

Hibernate generates the following SQL statement on PostgreSQL:

SELECT p.id AS id1_0_,
       p.created_on AS created_2_0_,
       p.title AS title3_0_
FROM post p
ORDER BY p.created_on
LIMIT 10

On SQL Server 2012 (or newer), Hibernate will execute the following SQL query:

SELECT p.id AS id1_0_,
       p.created_on AS created_2_0_,
       p.title AS title3_0_
FROM post p
ORDER BY p.created_on
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

Therefore, the SQL pagination query is adapted to the underlying database engine capabilities.

The use of ORDER BY is mandatory when using query pagination because SQL does not guarantee any particular order unless we provide one via the ORDER BY clause.

Using offset to position the result set

If the previous query was typical for the first page of a given pagination query, navigating the following page requires positioning the result set where the last page ended. For this purpose, the JPA Query interface provides the setFirstResult method.

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "order by p.createdOn ")
.setFirstResult(10)
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());
assertEquals("Post nr. 11", posts.get(0).getTitle());
assertEquals("Post nr. 20", posts.get(9).getTitle());

When running the previous JPQL query on PostgreSQL, Hibernate executes the following SQL SELECT statements:

SELECT p.id AS id1_0_,
       p.created_on AS created_2_0_,
       p.title AS title3_0_
FROM post p
ORDER BY p.created_on
LIMIT 10
OFFSET 10

and on SQL Server 2012 (or newer), Hibernate would generate this SQL query:

SELECT p.id AS id1_0_,
       p.created_on AS created_2_0_,
       p.title AS title3_0_
FROM post p
ORDER BY p.created_on
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY

DTO projection queries

The JPA query pagination is not limited to entity queries that return entities only. You can use it for DTO projections as well.

Assuming we have the following PostCommentSummary DTO:

public class PostCommentSummary {

    private Number id;
    private String title;
    private String review;

    public PostCommentSummary(
            Number id, 
            String title, 
            String review) {
        this.id = id;
        this.title = title;
        this.review = review;
    }

    public PostCommentSummary() {}

    //Getters omitted for brevity
}

When running the following DTO projection query:

List<PostCommentSummary> summaries = entityManager
.createQuery(
    "select new " +
    "   com.vladmihalcea.book.hpjp.hibernate.fetching.PostCommentSummary( " +
    "       p.id, p.title, c.review " +
    "   ) " +
    "from PostComment c " +
    "join c.post p " +
    "order by c.createdOn")
.setMaxResults(10)
.getResultList();

assertEquals(10, summaries.size());
assertEquals("Post nr. 1", summaries.get(0).getTitle());
assertEquals("Comment nr. 1", summaries.get(0).getReview());

assertEquals("Post nr. 2", summaries.get(9).getTitle());
assertEquals("Comment nr. 10", summaries.get(9).getReview());

Hibernate appends the pagination clause to the underlying SQL query:

SELECT p.id AS col_0_0_,
       p.title AS col_1_0_,
       c.review AS col_2_0_
FROM post_comment c
INNER JOIN post p ON c.post_id=p.id
ORDER BY c.created_on
LIMIT 10

For more details about DTO projection with JPA and Hibernate, check out this article.

Native SQL queries

The JPA query pagination is not limited to entity queries, such as JPQL or Criteria API. You can use it for native SQL queries as well.

List<Tuple> posts = entityManager
.createNativeQuery(
    "select p.id as id, p.title as title " +
    "from post p " +
    "order by p.created_on", Tuple.class)
.setFirstResult(10)
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());
assertEquals("Post nr. 11", posts.get(0).get("title"));
assertEquals("Post nr. 20", posts.get(9).get("title"));

When running the SQL query above, Hibernate appends the DB-specific pagination clause:

SELECT p.id AS id,
       p.title AS title
FROM post p
ORDER BY p.created_on
LIMIT 10
OFFSET 10

JOIN FETCH and pagination

However, if we try to use the JOIN FETCH clause in the entity query while also using JPA pagination:

List<Post> posts = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "order by p.createdOn", Post.class)
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());

Hibernate will issue the following warning message:

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

And the executed SQL query will lack the pagination clause:

SELECT p.id AS id1_0_0_,
       c.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       c.created_on AS created_2_1_1_,
       c.post_id AS post_id4_1_1_,
       c.review AS review3_1_1_,
       c.post_id AS post_id4_1_0__,
       c.id AS id1_1_0__
FROM post p
LEFT OUTER JOIN post_comment c ON p.id=c.post_id
ORDER BY p.created_on

This is because Hibernate wants to fetch entities fully along with their collections as indicated by the JOIN FETCH clause while the SQL-level pagination could truncate the ResultSet possibly leaving a parent Post entity with fewer elements in the comments collection.

The problem with the HHH000104 warning is that Hibernate will fetch the product of Post and PostComment entities, and due to the result set size, the query response time is going to be significant.

In order to work around this limitation, you have to use a Window Function query:

List<Post> posts = entityManager
.createNativeQuery(
    "select * " +
    "from (   " +
    "    select *, dense_rank() OVER (ORDER BY post_id) rank " +
    "    from (   " +
    "        select p.*, pc.* " +
    "        from post p  " +
    "        left join post_comment pc on p.id = pc.post_id  " +
    "        order by p.created_on " +
    "    ) p_pc " +
    ") p_pc_r " +
    "where p_pc_r.rank <= :rank", Post.class)
.setParameter("rank", 10)
.unwrap(NativeQuery.class)
.addEntity("p", Post.class)
.addEntity("pc", PostComment.class)
.setResultTransformer(DistinctPostResultTransformer.INSTANCE)
.getResultList();

For more details about using Window Functions to fix the HHH000104 issue as well as the code for DistinctPostResultTransformer, check out this article.

Why not use query streaming instead?

JPA 2.2 added the getResultStream Query method, which you might think it’s a valid alternative to pagination. However, the stream result will not provide the result set size to the query planner, hence a suboptimal execution plan might be picked. For this reason, it’s much more efficient to use pagination than streaming when it comes to fetching small amounts of data.

For more details about why pagination is more efficient than streaming, check out this article.

Keyset pagination

Markus Winand, who wrote the SQL Performance Explained book, advocates for Keyset pagination instead of Offset. Although Offset pagination is an SQL standard feature, there are two reasons why you would prefer Keyset pagination:

  • performance (the index must be scanned up to the offset while, for keyset pagination, we can go directly to the first index entry that matches our order by predicate and filtering criteria)
  • correctness (if elements are being added in between, offset pagination will not provide consistent reads)

Even if Hibernate does not support keyset pagination, you can use Blaze Persistence for this purpose.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

Fetching just as much data as you need is one of the most important tips when it comes to data access performance. When fetching data, pagination allows you to control the result set size so that performance is stable even if the underlying dataset grows with time.

While keyset pagination offers better performance for large result sets, if you can narrow the scanned data set using the right filtering predicates, then offset pagination is going to perform rather well. To get consistent reads, you have to make sure that the scanned data set is always sorted in such a way that new entries are appended at the end of the set, rather than being mixed in between old entries.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

2 thoughts on “Query pagination with JPA and Hibernate

  1. From my experience using the setMaxResults(10) method still selects the whole dataset and then only returns the amount requested after wards. For MS SQL I would have expected for the query to be ‘select top(10) from table’ but that was not the case, which ended up being a performance hit. Does the driver still operate that way?

    1. Unless you had the JOIN FETCH plus pagination, that’s not true. The setMaxResults will use either TOP for older SQL Server Dialects or the newer SQL standard syntax for SQL Server 2012 and beyond.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.