How to limit the SQL query result set to Top-N rows only

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see how we can limit the SQL query result set to the Top-N rows only.

Limiting the SQL result set is very important when the underlying query could end up fetching a very large number of records, which can have a significant impact on application performance.

Why limit the number of rows of a SQL query?

Fetching more data than necessary is the number one cause of data access performance issues. When a given business use case is developed, the amount of data available in both the development and QA environment is rather small, hence not all SQL queries are written so that the result set is limited to a fixed number of records.

After the application is deployed into production, data starts accumulating, and queries that were once blazing fast start to run slower and slower. Even if indexes are applied to the SQL query filtering and sorting criteria unless the index covers the entire query (e.g., index-only scan), the table records will have to be inspected using a random-access read pattern.

If the result set size is small and the database could use an index for the filtering and sorting criteria, then the cost associated with reading the table records is still smaller than scanning the entire table. On the other hand, if the result set size is very large and the database needs to access a very large percentage of a given table, then using an index would be less efficient than scanning the entire table.

To prove it, consider we have the following post table in our database which contains 5000 records:

The post database table

So, if we don’t restrict the result set to the Top-N records:

EXPLAIN ANALYZE 
SELECT 
    title
FROM 
    post
ORDER BY 
    id DESC

The execution plan for the SQL query above looks as follows:

| QUERY PLAN                                           |
|------------------------------------------------------|
| Sort                                                 | 
|  (cost=63.66..64.92 rows=504 width=524)              |
|  (actual time=4.999..5.808 rows=5000 loops=1)        |
| Sort Key: id DESC                                    |
| Sort Method: quicksort  Memory: 583kB                |
| ->  Seq Scan on post                                 |
|       (cost=0.00..41.04 rows=504 width=524)          |
|       (actual time=0.059..1.753 rows=5000 loops=1)   |
|                                                      |
| Planning time: 0.833 ms                              |
| Execution time: 6.660 ms                             |

Notice the sequential scan done on all 5000 rows of the post table.

Now, when adding the LIMIT clause which restricts the result set to 5 records only:

EXPLAIN ANALYZE 
SELECT 
    title
FROM 
    post
ORDER BY 
    id DESC
LIMIT 5

The execution plan for the Top-N SQL query looks like this:

| QUERY PLAN                                      |
|-------------------------------------------------|
| Limit                                           |
|   (cost=0.28..0.46 rows=5 width=24)             |
|   (actual time=0.019..0.021 rows=5 loops=1)     |
| ->  Index Scan Backward using post_pkey on post |
|       (cost=0.28..178.28 rows=5000 width=24)    |
|       (actual time=0.017..0.019 rows=5 loops=1) |
|                                                 |
| Planning time: 0.854 ms                         |
| Execution time: 0.046 ms                        |

Notice that an index scan was used this time, and only 5 records were scanned and fetched as well. More, the execution time is hundreds of times lower than the previous execution.

The SQL query result set size can impact the execution plan as the database might choose a full-table scan even if an index is available for the query filtering and sorting criteria.

Not only the execution plan can be less efficient, but fetching more data than necessary will consume a significant amount of resources both on the database, server, and client-side too.

First, the records need to be fetched into the database buffer pool.

Afterward, the records are sent over the network to the server. On the server, the JDBC Driver will allocate all the necessary Java Objects to represent the query result set.

However, since the JDBC ResultSet is not passed to the client, the records must be transformed into entities or DTOs.

The resulting entities or DTOs might be transformed into JSON and passed again over the network to the client, where the JSON objects need to be loaded into the browser memory prior to being used to render the UI.

Fetching large volumes of data requires a significant amount of resources on multiple layers (e.g., database, server, client).

Fetching the Top-N rows only

So, since the UI display has a limited size, it makes no sense to fetch more data than can be displayed at once. Now, depending on the underlying relational database system you are using, the SQL clause that allows you to limit the query result set size, might differ.

SQL:2008 Standard

Up until SQL:2008, there was no standard way of fetching the Top-N records from a given result set. The standard syntax looks as follows:

SELECT 
    title
FROM 
    post
ORDER BY 
    id DESC
FETCH FIRST 5 ROWS ONLY

Notice the FETCH FIRST 5 ROWS ONLY clause which tells the database that we are interested in fetching just the first 5 records only. Another thing to notice is that we are using an ORDER BY clause since, otherwise, there is no guarantee which are the first records to be included in the returning result set.

The SQL:2008 Top-N records clause is supported in Oracle since 12c, SQL Server since 2012, and PostgreSQL since 8.4.

SQL Server

While SQL Server supports the SQL:2008 Top-N standard syntax, you need to provide the OFFSET clause as well:

SELECT
    title
FROM 
    post
ORDER BY 
    id DESC
OFFSET 0 ROWS
FETCH FIRST 5 ROWS ONLY

Since we are interested in the Top-N records only, the OFFSET is 0 in our case.

Prior to SQL Server 2012, you had to use TOP to limit the result set size:

SELECT TOP 5
    title
FROM post
ORDER BY id DESC

Oracle 11g and older versions

Prior to version 12c, to fetch the Top-N records, you had to use a Derived Table and the ROWNUM pseudocolumn:

SELECT *
FROM (
    SELECT 
        title
    FROM 
        post
    ORDER BY 
        id DESC
)
WHERE ROWNUM <= 5

The reason why the outer Derived Table is used is that the ROWNUM pseudocolumn value is assigned prior to the ORDER BY clause execution. Using a Derived Table, we can make sure that the ROWNUM pseudocolumn, we use for filtering the Top-N records, is assigned after sorting the underlying result set.

MySQL and PostgreSQL 8.3 or older

Traditionally, MySQL and PostgreSQL use the LIMIT clause to restrict the result set to the Top-N records:

SELECT 
    title
FROM 
    post
ORDER BY 
    id DESC
LIMIT 5

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

Fetching the right amount of data is very important for application performance. Luckily, SQL allows us to limit a given query to the Top-N records using either the SQL:2008 standard syntax or database-specific alternatives.

Follow @vlad_mihalcea