Pagination best practices


Introduction

In this article, we are going to discuss several data pagination best and worst practices.

Data pagination is omnipresent in enterprise applications. Yet, most solutions not only offer a bad user experience, but they are also inefficient.

The problem pagination solves

If you only had a dozen entries in your database, then you could simply fetch all the data and display it to the user. However, this is almost never the case. Most often, database tables range from tens of rows to billions of records.

Fetching a large amount of data takes a significant amount of time. That’s because a lot of work needs to be done to move large volumes of data from the database server to the user interface:

  • the data has to be scanned from the disk and loaded into the database server buffer pool
  • the loaded data is sent over the network
  • the application server receives the data in a tabular fashion (e.g., the JDBC ResultSet)
  • the application transforms the tabular-based data into tree-based structures (e.g., entities, DTOs)
  • the tree-based structure is transformed to JSON and sent over the network to the browser
  • the browser needs to load the entire JSON and use it to build the UI

Not only is fetching large volumes of data extremely inefficient, but it also hurts the user experience. Imagine having to load a list of tens of thousands of entries on your mobile phone over a metered connection. Loading this large list will be slow, expensive, and impossible to navigate on a device with a very limited viewport.

So, for all these reasons, pagination is used so that only a small subset of the entire dataset is fetched and displayed at once.
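The classic way to fetch such a subset is OFFSET pagination: the client asks for page `p` of size `n`, and the database skips `p × n` rows and returns the next `n`. Here is a minimal in-memory sketch of the page arithmetic (the `fetchPage` helper and sample data are illustrative; in a real application the slicing is pushed into the database via `LIMIT`/`OFFSET`, or via `setFirstResult`/`setMaxResults` in JPA):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OffsetPagination {

    // Returns page `pageNumber` (zero-based) of `pageSize` elements.
    // A real application pushes this into the SQL query instead, e.g.:
    //   SELECT ... ORDER BY id LIMIT :pageSize OFFSET :pageNumber * :pageSize
    static <T> List<T> fetchPage(List<T> all, int pageNumber, int pageSize) {
        int offset = pageNumber * pageSize;
        if (offset >= all.size()) {
            return List.of();
        }
        return all.subList(offset, Math.min(offset + pageSize, all.size()));
    }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.rangeClosed(1, 95).boxed().collect(Collectors.toList());

        System.out.println(fetchPage(rows, 0, 10)); // first page: rows 1..10
        System.out.println(fetchPage(rows, 9, 10)); // last, partial page: rows 91..95
    }
}
```

Note how the last page may be shorter than `pageSize`, which is why the upper bound is clamped to the result set size.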

Classic pagination layout

Now, the most common way of sampling a result set is to split it into multiple pages or subsets of data. One such example can be seen on the old Hibernate forum:

Hibernate forum pagination

There are over 66k posts, which are split across 2600 pages. While you can technically navigate to any of those pages, in reality, this is not very useful.

Why would I want to go to page number 1758? Or, how easily could I find the exact topic I’m interested in by jumping thousands of times from one page to the next?

Page number limit

When searching for a less-selective term on Google, you might end up with a very large result set of pages matching the searched keywords.

Searching Hibernate on Google - first page

So, when searching for “Hibernate”, Google says it has 22 million results. However, Google only provides access to the most relevant 16 or 17 pages.

The thing is, few users ever navigate to the second or third page. Most users don’t even need to go from one page to another, because there is a much better way to find exactly what you are interested in: more selective search terms.

So, when refining the search terms, we get a much better chance of finding what we are interested in:

High-selective search terms

Better ways of filtering

Pagination is good, especially because it allows you to fetch only a small subset of data at a time. However, page navigation is only useful if the number of pages is small. If you have tens or hundreds of pages, then it’s going to be very difficult for the users to find what they are interested in.

What the user really wants is a better filtering tool. Instead of having to scan each page manually, it would be much better if the application could do that automatically, as long as the user provides highly-selective search terms.

For application developers, this means providing a way of refining the result set, either via more filtering criteria or via a more detailed search term.

This way, the application can fetch the exact subset of data the user is interested in, and the user will quickly find the right entry from the narrowed result set.
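The filter-then-paginate idea can be sketched as follows (an illustrative in-memory model; the `search` helper and sample topics are made up, and in a real application the filter would become extra `WHERE` predicates in the SQL query so that the database itself narrows the result set):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilteredSearch {

    // Narrow the result set with a selective filter *before* paginating.
    // In SQL, the predicate becomes part of the WHERE clause, so far fewer
    // rows remain to be paged through.
    static List<String> search(List<String> titles, Predicate<String> filter) {
        return titles.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> topics = List.of(
                "Hibernate identity generator",
                "Hibernate SEQUENCE generator",
                "Spring Boot configuration",
                "Hibernate batch processing");

        // A vague term matches almost everything...
        System.out.println(search(topics, t -> t.contains("Hibernate")).size()); // 3
        // ...while a highly-selective one narrows it down to a handful of entries.
        System.out.println(search(topics, t -> t.contains("batch"))); // [Hibernate batch processing]
    }
}
```

The more selective the predicate, the fewer pages remain, and the faster the user lands on the right entry.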


Conclusion

Although data pagination is a very widespread requirement, many implementations treat the problem only from a mathematical perspective instead of solving it from the user perspective.

So, limiting the number of pages is actually a very useful trick, since it nudges the user towards providing more selective search terms. From the database perspective, limiting the number of pages is also a safety measure, as a single query fetching billions of records can end up consuming all database resources (CPU, memory, I/O bandwidth), therefore affecting application availability.
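Capping the page range, as search engines do, boils down to a small clamping step. A minimal sketch (the `clampPage` helper and the limit of 17 pages are illustrative, echoing the Google example above):

```java
public class PageLimit {

    // Cap the navigable page range: even if the total result count implies
    // millions of pages, only the first `maxPages` are ever exposed.
    static int clampPage(int requestedPage, long totalResults, int pageSize, int maxPages) {
        long totalPages = (totalResults + pageSize - 1) / pageSize; // ceiling division
        long lastPage = Math.min(totalPages, maxPages) - 1;         // zero-based
        return (int) Math.max(0, Math.min(requestedPage, lastPage));
    }

    public static void main(String[] args) {
        // 22 million results at 10 per page would mean 2.2 million pages,
        // but we only ever serve the first 17.
        System.out.println(clampPage(1758, 22_000_000L, 10, 17)); // 16 (the last allowed page)
        System.out.println(clampPage(3, 22_000_000L, 10, 17));    // 3
    }
}
```

Requests for a page beyond the cap simply land on the last allowed page, bounding the worst-case query cost.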


6 Comments on “Pagination best practices”

    • What if you are using UUIDs as table identifiers? How would Keyset Pagination work then?

      • I am not sure what problem you specifically have in mind. It’s possible you are suggesting that it may not be practical to build a clustered index over UUIDs (unless they are generated via something like https://docs.microsoft.com/en-us/sql/t-sql/functions/newsequentialid-transact-sql?view=sql-server-2017), which in turn means that keyset pagination over rows ordered by UUID will have to access row data in a random pattern, which is understandably slow. However, the same thing happens to offset pagination if a UUID (or a “name”) is used to order rows.

        So probably you have something else in mind. What is it?

      • Keyset Pagination takes the PK into consideration when ordering the result set. However, UUIDs are randomized, so when a new element is added, it might land on a previous page, therefore breaking your page ordering, which you said is specific to OFFSET pagination. Some DBs, like SQL Server or MySQL, offer a non-standard UUID type that uses the current timestamp in the very first bytes, but what about other DBs, or if you want to use a standard UUID? So, Keyset Pagination can also suffer from non-stable pages if rows are being added.

      • So, Keyset Pagination can also suffer from non-stable pages if rows are being added

        True, it can. I value the fact that this approach does not suffer from a linearly growing amount of work when retrieving pages with higher numbers, while offset pagination has this drawback.

      • Keyset pagination is advertised for two things: predictable response time across any page request and stable result sets.

        The first one is indeed a problem with Offset pagination, but if you limit the number of pages, there won’t be a significant performance difference between the two pagination approaches.

        The second one is also provided by Keyset pagination as long as the PK uses monotonically increasing numeric values (e.g., IDENTITY, SEQUENCE). However, a stable result set is not always a good thing. By hiding what other users are doing, it actually breaks Linearizability, since other concurrent transactions can still insert or update the records that are currently displayed in some user’s UI.
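To make the keyset pagination approach debated in this thread concrete, here is a minimal in-memory sketch (the `nextPage` helper and sample ids are illustrative; in SQL the equivalent is a `WHERE id > :lastSeenId ORDER BY id LIMIT :pageSize` query):

```java
import java.util.List;
import java.util.stream.Collectors;

public class KeysetPagination {

    // Keyset (a.k.a. seek) pagination: instead of skipping `offset` rows,
    // remember the last key of the previous page and fetch the rows after it.
    // In SQL:
    //   SELECT ... WHERE id > :lastSeenId ORDER BY id LIMIT :pageSize
    static List<Long> nextPage(List<Long> sortedIds, long lastSeenId, int pageSize) {
        return sortedIds.stream()
                .filter(id -> id > lastSeenId)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 3L, 4L, 7L, 9L, 12L, 15L);

        List<Long> firstPage = nextPage(ids, 0L, 3);    // [1, 3, 4]
        long lastSeen = firstPage.get(firstPage.size() - 1);
        System.out.println(nextPage(ids, lastSeen, 3)); // [7, 9, 12]
    }
}
```

Because each page is anchored on the last seen key rather than a row offset, the cost of fetching page N does not grow with N, which is the property discussed above.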
