SQL Seek Method or Keyset Pagination
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, we are going to see what the SQL Seek Method or Keyset Pagination is and why you should consider it when navigating over large results sets.
The goal of pagination is to avoid fetching large volumes of data since the UI has a limited viewport that could be used to display data.
OFFSET Pagination
Before discussing Keyset Pagination, let’s see how the default OFFSET pagination works in SQL.
Although relational database systems have long been providing specific ways of restricting a query result set, since SQL:2008, there is a standard pagination syntax.
Therefore, a TOP-N query that limits the number of records of a given result set can use the FETCH FIRST N ROWS ONLY
directive, as illustrated by the following example:
SELECT id FROM post ORDER BY created_on DESC FETCH FIRST 50 ROWS ONLY
And, a NEXT-N query that skips over the first M records and fetches the next N records looks like this:
SELECT id FROM post ORDER BY created_on DESC OFFSET 150 ROWS FETCH NEXT 50 ROWS ONLY
OFFSET Pagination Indexing
Since pagination requires an ORDER BY
clause in order to guarantee a consistent sort order, it’s common to index the sorting criteria.
In our case, we need to create the following index on the created_on
column:
CREATE INDEX idx_post_created_on ON post (created_on DESC)
When executing the TOP-N query, we can see that the idx_post_created_on
is being used, and only 50 records are being scanned:
SELECT id FROM post ORDER BY created_on DESC FETCH FIRST 50 ROWS ONLY Limit (cost=0.28..2.51 rows=50 width=16) (actual time=0.013..0.022 rows=50 loops=1) -> Index Scan using idx_post_created_on on post p (cost=0.28..223.28 rows=5000 width=16) (actual time=0.013..0.019 rows=50 loops=1) Planning time: 0.113 ms Execution time: 0.055 ms
For the second page, we can see that the idx_post_created_on
has to scan 100 records because it needs to skip over the first 50 rows contained on the first page in order to load the next 50 records that are needed to be returned by this query:
SELECT id FROM post ORDER BY created_on DESC OFFSET 50 ROWS FETCH NEXT 50 ROWS ONLY Limit (cost=2.51..4.74 rows=50 width=16) (actual time=0.032..0.044 rows=50 loops=1) -> Index Scan using idx_post_created_on on post p (cost=0.28..223.28 rows=5000 width=16) (actual time=0.022..0.040 rows=100 loops=1) Planning time: 0.198 ms Execution time: 0.071 ms
The further away we go from the first page, the more records will need to be scanned by the idx_post_created_on
index in order to skip over the records indicated by the OFFSET
clause:
SELECT id FROM post ORDER BY created_on DESC OFFSET 4950 ROWS FETCH NEXT 50 ROWS ONLY Limit (cost=221.05..223.28 rows=50 width=16) (actual time=1.154..1.166 rows=50 loops=1) -> Index Scan using idx_post_created_on on post p (cost=0.28..223.28 rows=5000 width=16) (actual time=0.079..1.033 rows=5000 loops=1) Planning time: 1.629 ms Execution time: 1.190 ms
Note that scanning the entire idx_post_created_on
index takes 20 times more than scanning a single page, which was the case for the initial TOP-N query.
SQL Seek Method or Keyset Pagination
To cope with this index scanning issue that’s inherent in the OFFSET pagination, we can use the Seek Method or Keyset Pagination technique.
The TOP-N Keyset Pagination query looks as follows:
SELECT id, created_on FROM post ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY
Notice that we need to include the id
in the ORDER BY clause since the created_on
column values are not unique. Hence, we will need to pass both the last processed created_on
and id
when loading the next page. Therefore, this time, the query projection needs to load the created_on
column as well.
The Next-N query will use the previously processed created_on
and id
column values to locate the next page of records that need to be loaded.
SELECT id, created_on FROM post WHERE (created_on, id) < ('2019-10-02 21:00:00.0', 4951) ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY
The (created_on, id) < ('2019-10-02 21:00:00.0', 4951)
row value expression is equivalent to:
created_on < '2019-10-02 21:00:00.0' OR ( (created_on = '2019-10-02 21:00:00.0') AND (id < 4951) )
SQL Seek Method or Keyset Pagination Indexing
Because the Seek Method uses both the created_on
and the id
columns in the ORDER BY
clause, we can create the idx_post_created_on
index on both these two columns:
CREATE INDEX idx_post_created_on ON post (created_on DESC, id DESC)
Now, when executing the TOP-N Keyset Pagination query, we can see that it uses the idx_post_created_on
index, and just 50 records are scanned:
SELECT id, created_on FROM post ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY Limit (cost=0.28..1.91 rows=50 width=16) (actual time=0.104..0.110 rows=50 loops=1) -> Index Only Scan using idx_post_created_on_id on post (cost=0.28..163.28 rows=5000 width=16) (actual time=0.102..0.107 rows=50 loops=1) Heap Fetches: 0 Planning Time: 0.882 ms Execution Time: 0.129 ms
The Next-N Keyset Pagination query also uses the idx_post_created_on
index and, unlike the OFFSET Pagination, only 50 rows are scanned this time:
SELECT id, created_on FROM post WHERE (created_on, id) < ('2019-10-02 21:00:00.0', 4951) ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY Limit (cost=0.28..3.40 rows=50 width=32) (actual time=0.029..0.063 rows=50 loops=1) -> Index Scan using idx_post_created_on_id on post (cost=0.28..308.58 rows=4950 width=32) (actual time=0.027..0.057 rows=50 loops=1) Index Cond: ( created_on <= '2020-04-24 06:00:00'::timestamp without time zone ) Filter: ( ROW(created_on, (id)::numeric) < ROW('2020-04-24 06:00:00'::timestamp without time zone, '4951'::numeric) ) Rows Removed by Filter: 2 Heap Fetches: 52 Planning Time: 0.806 ms Execution Time: 0.158 ms
And, loading the last page will be fast as well since Keyset Pagination doesn’t need to scan the entire index in order to skip over the OFFSET records:
SELECT id, created_on FROM post WHERE (created_on, id) < ('2019-10-03 02:00:00.0', 51) ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY Limit (cost=48.82..48.83 rows=1 width=16) (actual time=0.168..0.175 rows=50 loops=1) -> Sort (cost=48.82..48.83 rows=1 width=16) (actual time=0.166..0.170 rows=50 loops=1) Sort Key: created_on DESC, id DESC Sort Method: quicksort Memory: 27kB -> Bitmap Heap Scan on post (cost=4.76..48.81 rows=1 width=16) (actual time=0.071..0.085 rows=50 loops=1) Recheck Cond: (created_on <= '2019-10-03 02:00:00'::timestamp without time zone) Filter: ( (created_on < '2019-10-03 02:00:00'::timestamp without time zone) OR ( (created_on = '2019-10-03 02:00:00'::timestamp without time zone) AND (id < '51'::bigint) ) ) Rows Removed by Filter: 2 Heap Blocks: exact=1 -> Bitmap Index Scan on idx_post_created_on_id (cost=0.00..4.75 rows=63 width=0) (actual time=0.061..0.062 rows=52 loops=1) Index Cond: (created_on <= '2019-10-03 02:00:00'::timestamp without time zone) Planning Time: 0.676 ms Execution Time: 0.279 ms
Cool, right?
I'm running an online workshop on the 11th of October about High-Performance SQL.If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
The Keyset Pagination allows you to use an index to locate the first record of any page that needs to be navigated, and, for this reason, the SQL query can scan fewer records than when using the default OFFSET pagination.
