JDBC Statement fetchSize property

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see how the JDBC Statement fetchSize property works when using Oracle, SQL Server, PostgreSQL, or MySQL.

JDBC ResultSet fetching

The JDBC ResultSet offers a client-side cursor for fetching the current statement return data.

When the statement gets executed, the result must be transferred from the database cursor to the client-side one. This operation can either be done at once or on demand.

There are three types of ResultSet cursors:

  • TYPE_FORWARD_ONLY – This is the default ResultSet cursor type. The result set can only be moved forward, and the resulting data can either be fetched at once or retrieved while the cursor is being iterated. The database can decide to fetch the data as it was available at the time the query started or as it is upon fetching.
  • TYPE_SCROLL_INSENSITIVE – The result set can be scrolled both forward and backward, and the resulting data is insensitive to concurrent changes occurring while the cursor is still open.
  • TYPE_SCROLL_SENSITIVE – The result set can be scrolled both forward and backward, and the resulting data is sensitive to concurrent changes occurring while the cursor is still open. The data is therefore fetched on demand as opposed to being retrieved from a database cursor cache.

Not all database drivers implement all cursor types, and the batch fetching behavior is controlled via the JDBC Statement fetchSize property, which according to the Javadoc:

Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement.

If the value specified is zero, then the hint is ignored. The default value is zero.

The default fetching strategy is, therefore database-specific, and from the application performance point of view, this aspect is very important when tuning the data access layer:

  • Oracle

    By default, when Oracle JDBC runs a query, it retrieves a result set of 10 rows at a time from the database cursor.

    According to Oracle JDBC Driver documentation:

    What is “reasonable” depends on the details of the application. Oracle suggests the fetchSize be no more than 100, although in some cases larger size may be appropriate. A fetchSize of 100 may be inappropriately large for some queries even when many rows are returned.

  • MySQL

    By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement.

  • SQL Server

    Normally, when the Microsoft JDBC Driver for SQL Server executes a query, the driver retrieves all of the results from the server into application memory. Although this approach minimizes resource consumption on the SQL Server, it can throw an OutOfMemoryError in the JDBC application for the queries that produce very large results.

  • PostgreSQL

    By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

  • DB2

    By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows. The fetchSize property differs from the queryDataSize property. fetchSize affects the number of rows that are returned, and queryDataSize affects the number of bytes that are returned.

    For example, if the result set size is 50 KB, and the value of queryDataSize is 32767 (32KB), two trips to the database server are required to retrieve the result set. However, if queryDataSize is set to 65535 (64 KB), only one trip to the data source is required to retrieve the result set.

Hibernate ResultSet traversing options

The JPA Query interface offers only full-result retrievals through the getResultList method call.

Hibernate also supports scrollable ResultSet cursors through its specific Query.scroll() API.

The only apparent advantage of scrollable ResultSets is that we can avoid memory issues on the client-side since data is being fetched on demand. This might sound like a natural choice, but in reality, you shouldn’t fetch large result sets for the following reasons:

  • Large result sets impose significant database server resources and because a database is a highly concurrent environment, it might hinder availability and scalability
  • Tables tend to grow in size and a moderate result set might easily turn into a very large one. This kind of situation happens in production systems, long after the application code is shipped. Because users can only browse a relatively small portion of the whole result set, pagination is a more scalable data fetching alternative
  • The overly common offset paging is not suitable for large result sets (because the response time increases linearly with the page number) and you should consider keyset pagination when traversing large result sets. The keyset pagination offers a constant response time insensitive to the relative position of the page being fetched
  • Even for batch processing jobs, it’s always safer to restrict processing items to a moderate batch size. Large batches can lead to memory issues or cause long-running transactions, which increase the undo/redo transaction log size

Testing time

Our domain entity model looks like this:

PostComment Fetch Size

The following test will be used for validating various result set fetching behaviors:

@Test
public void testFetchSize() {
    doInTransaction(session -> {
        int batchSize = batchSize();
        for(int i = 0; i < itemsCount(); i++) {
            Post post = new Post(String.format("Post no. %d", i));
            int j = 0;

            post.addComment(
                new Comment(
                    String.format("Post comment %d:%d", i, j++)
                )
            );
            post.addComment(
                new Comment(
                    String.format("Post comment %d:%d", i, j++)
                )
            );
            session.persist(post);

            if(i % batchSize == 0 && i > 0) {
                session.flush();
                session.clear();
            }
        }
    });

    long startNanos = System.nanoTime();
    LOGGER.info("Test fetch size");
    doInTransaction(session -> {
        List posts = session.createQuery("""
            select p
            from Post p
            join fetch p.comments
            """)
        .list();

        LOGGER.info("{}.fetched {} entities",
            getClass().getSimpleName(),
            posts.size()
        );

    });
    LOGGER.info("{}.testFetch took {} millis",
        getClass().getSimpleName(),
        TimeUnit.NANOSECONDS.toMillis(
            System.nanoTime() - startNanos
        )
    );
}

To configure Hibernate to use an explicit Statement fetchSize property, we need to set the following Hibernate property:

properties.put("hibernate.jdbc.fetch_size", fetchSize());

Every test will insert 5000 Post entities, each one having 2 Comments.

Tetsing on a commercial database system

The first tests are run against a commercial database with the following results:

| Fetch Size   | Duration [millis] |
|--------------|-------------------|
| 1            | 1190              |
| 10           | 640               |
| 100          | 481               |
| 1000         | 459               |
| 10000        | 449               |
| Default (10) | 545               |

The larger the fetch size, the fewer round trips are required for fetching the whole result set. If the returned rows contain many columns, a larger fetch size would require proportionally larger database buffers.

YouTube Video

I also published a YouTube video about the JDBC Statement fetchSize, so enjoy watching it if you’re interested in this topic.

PostgreSQL

The second test round is run against PostgreSQL with the following results:

| Fetch Size    | Duration [millis] |
|---------------|-------------------|
| 1             | 1181              |
| 10            | 572               |
| 100           | 485               |
| 1000          | 458               |
| 10000         | 437               |
| Default (all) | 396               |

The default fetchSize yields the best result, even when the fetchSize is equal to the total number of rows being returned. Since there is no upper-bound buffer limit, the default fetch size can cause OutOfMemoryError issues when retrieving large result sets.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While most database serves don’t impose a default upper limit on the result set fetch size, it’s a good practice to limit the whole result set (if requirements allow it).

A limited size result set should address the unbounded fetch size shortcoming while ensuring predictable response times even when the queried data grows gradually.

The shorter the queries, the quicker the row-level locks are released, and the more scalable the data access layer becomes.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.