Query pagination with JPA and Hibernate
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
Inspired by this StackOverflow answer I gave recently, I decided it’s time to write an article about query pagination when using JPA and Hibernate.
In this article, you are going to see how to use query pagination to restrict the JDBC ResultSet
size and avoid fetching more data than necessary.
How to use query pagination in #Hibernate to restrict the JDBC ResultSet size and avoid fetching more data than necessary. @vlad_mihalcea https://t.co/fkd8ne1mYj pic.twitter.com/Ca78OhlIP1
— Java (@java) October 12, 2018
Domain Model
Now, let’s assume we defined the following Post
and PostComment
entity classes in our application:
The Post
class is the parent entity while the PostComment
is the child as it has a @ManyToOne
association with the Post
entity. Both entities implement the Identifiable
interface which provides a contract for accessing the underlying entity identifier.
Next, we are going to save the following Post
and PostComment
entities in the database:
LocalDateTime timestamp = LocalDateTime.of( 2018, 10, 9, 12, 0, 0, 0 ); int commentsSize = 5; LongStream.range(1, 50).forEach(postId -> { Post post = new Post(); post.setId(postId); post.setTitle( String.format("Post nr. %d", postId) ); post.setCreatedOn( Timestamp.valueOf( timestamp.plusMinutes(postId) ) ); LongStream.range(1, commentsSize + 1).forEach(commentOffset -> { long commentId = ((postId - 1) * commentsSize) + commentOffset; PostComment comment = new PostComment(); comment.setId(commentId); comment.setReview( String.format("Comment nr. %d", comment.getId()) ); comment.setCreatedOn( Timestamp.valueOf( timestamp.plusMinutes(commentId) ) ); post.addComment(comment); }); entityManager.persist(post); });
Limiting the result set size
To limit the underlying query ResultSet
size, the JPA Query
interface provides the setMaxResults
method.
Therefore, when executing the following JPQL query:
List<Post> posts = entityManager .createQuery( "select p " + "from Post p " + "order by p.createdOn ") .setMaxResults(10) .getResultList(); assertEquals(10, posts.size()); assertEquals("Post nr. 1", posts.get(0).getTitle()); assertEquals("Post nr. 10", posts.get(9).getTitle());
Hibernate generates the following SQL statement on PostgreSQL:
SELECT p.id AS id1_0_, p.created_on AS created_2_0_, p.title AS title3_0_ FROM post p ORDER BY p.created_on LIMIT 10
On SQL Server 2012 (or newer), Hibernate will execute the following SQL query:
SELECT p.id AS id1_0_, p.created_on AS created_2_0_, p.title AS title3_0_ FROM post p ORDER BY p.created_on OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
Therefore, the SQL pagination query is adapted to the underlying database engine capabilities.
The use of
ORDER BY
is mandatory when using query pagination because SQL does not guarantee any particular order unless we provide one via theORDER BY
clause.
Using offset to position the result set
If the previous query was typical for the first page of a given pagination query, navigating the following page requires positioning the result set where the last page ended. For this purpose, the JPA Query
interface provides the setFirstResult
method.
List<Post> posts = entityManager .createQuery( "select p " + "from Post p " + "order by p.createdOn ") .setFirstResult(10) .setMaxResults(10) .getResultList(); assertEquals(10, posts.size()); assertEquals("Post nr. 11", posts.get(0).getTitle()); assertEquals("Post nr. 20", posts.get(9).getTitle());
When running the previous JPQL query on PostgreSQL, Hibernate executes the following SQL SELECT statements:
SELECT p.id AS id1_0_, p.created_on AS created_2_0_, p.title AS title3_0_ FROM post p ORDER BY p.created_on LIMIT 10 OFFSET 10
and on SQL Server 2012 (or newer), Hibernate would generate this SQL query:
SELECT p.id AS id1_0_, p.created_on AS created_2_0_, p.title AS title3_0_ FROM post p ORDER BY p.created_on OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
DTO projection queries
The JPA query pagination is not limited to entity queries that return entities only. You can use it for DTO projections as well.
Assuming we have the following PostCommentSummary
DTO:
public class PostCommentSummary { private Number id; private String title; private String review; public PostCommentSummary( Number id, String title, String review) { this.id = id; this.title = title; this.review = review; } public PostCommentSummary() {} //Getters omitted for brevity }
When running the following DTO projection query:
List<PostCommentSummary> summaries = entityManager .createQuery( "select new " + " com.vladmihalcea.book.hpjp.hibernate.fetching.PostCommentSummary( " + " p.id, p.title, c.review " + " ) " + "from PostComment c " + "join c.post p " + "order by c.createdOn") .setMaxResults(10) .getResultList(); assertEquals(10, summaries.size()); assertEquals("Post nr. 1", summaries.get(0).getTitle()); assertEquals("Comment nr. 1", summaries.get(0).getReview()); assertEquals("Post nr. 2", summaries.get(9).getTitle()); assertEquals("Comment nr. 10", summaries.get(9).getReview());
Hibernate appends the pagination clause to the underlying SQL query:
SELECT p.id AS col_0_0_, p.title AS col_1_0_, c.review AS col_2_0_ FROM post_comment c INNER JOIN post p ON c.post_id=p.id ORDER BY c.created_on LIMIT 10
For more details about DTO projection with JPA and Hibernate, check out this article.
Native SQL queries
The JPA query pagination is not limited to entity queries, such as JPQL or Criteria API. You can use it for native SQL queries as well.
List<Tuple> posts = entityManager .createNativeQuery( "select p.id as id, p.title as title " + "from post p " + "order by p.created_on", Tuple.class) .setFirstResult(10) .setMaxResults(10) .getResultList(); assertEquals(10, posts.size()); assertEquals("Post nr. 11", posts.get(0).get("title")); assertEquals("Post nr. 20", posts.get(9).get("title"));
When running the SQL query above, Hibernate appends the DB-specific pagination clause:
SELECT p.id AS id, p.title AS title FROM post p ORDER BY p.created_on LIMIT 10 OFFSET 10
JOIN FETCH and pagination
However, if we try to use the JOIN FETCH
clause in the entity query while also using JPA pagination:
List<Post> posts = entityManager.createQuery( "select p " + "from Post p " + "left join fetch p.comments " + "order by p.createdOn", Post.class) .setMaxResults(10) .getResultList(); assertEquals(10, posts.size());
Hibernate will issue the following warning message:
HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!
And the executed SQL query will lack the pagination clause:
SELECT p.id AS id1_0_0_, c.id AS id1_1_1_, p.created_on AS created_2_0_0_, p.title AS title3_0_0_, c.created_on AS created_2_1_1_, c.post_id AS post_id4_1_1_, c.review AS review3_1_1_, c.post_id AS post_id4_1_0__, c.id AS id1_1_0__ FROM post p LEFT OUTER JOIN post_comment c ON p.id=c.post_id ORDER BY p.created_on
This is because Hibernate wants to fetch entities fully along with their collections as indicated by the JOIN FETCH
clause while the SQL-level pagination could truncate the ResultSet
possibly leaving a parent Post
entity with fewer elements in the comments
collection.
The problem with the HHH000104
warning is that Hibernate will fetch the product of Post
and PostComment
entities, and due to the result set size, the query response time is going to be significant.
In order to work around this limitation, you have to use a Window Function query:
List<Post> posts = entityManager .createNativeQuery( "select * " + "from ( " + " select *, dense_rank() OVER (ORDER BY post_id) rank " + " from ( " + " select p.*, pc.* " + " from post p " + " left join post_comment pc on p.id = pc.post_id " + " order by p.created_on " + " ) p_pc " + ") p_pc_r " + "where p_pc_r.rank <= :rank", Post.class) .setParameter("rank", 10) .unwrap(NativeQuery.class) .addEntity("p", Post.class) .addEntity("pc", PostComment.class) .setResultTransformer(DistinctPostResultTransformer.INSTANCE) .getResultList();
For more details about using Window Functions to fix the
HHH000104
issue as well as the code forDistinctPostResultTransformer
, check out this article.
Why not use query streaming instead?
JPA 2.2 added the getResultStream
Query
method, which you might think it’s a valid alternative to pagination. However, the stream result will not provide the result set size to the query planner, hence a suboptimal execution plan might be picked. For this reason, it’s much more efficient to use pagination than streaming when it comes to fetching small amounts of data.
For more details about why pagination is more efficient than streaming, check out this article.
Keyset pagination
Markus Winand, who wrote the SQL Performance Explained book, advocates for Keyset pagination instead of Offset. Although Offset pagination is an SQL standard feature, there are two reasons why you would prefer Keyset pagination:
- performance (the index must be scanned up to the offset while, for keyset pagination, we can go directly to the first index entry that matches our order by predicate and filtering criteria)
- correctness (if elements are being added in between, offset pagination will not provide consistent reads)
Even if Hibernate does not support keyset pagination, you can use a native SQL query for this purpose. I cover this topic in my High-Performance SQL training.
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
Fetching just as much data as you need is one of the most important tips when it comes to data access performance. When fetching data, pagination allows you to control the result set size so that performance is stable even if the underlying dataset grows with time.
While keyset pagination offers better performance for large result sets, if you can narrow the scanned data set using the right filtering predicates, then offset pagination is going to perform rather well. To get consistent reads, you have to make sure that the scanned data set is always sorted in such a way that new entries are appended at the end of the set, rather than being mixed in between old entries.
