The best way to use Spring Data JPA Stream methods

Introduction

In this article, we are going to see the best way to use Spring Data JPA Stream query methods.

When fetching a large result set, the advantage of using a Java Stream is that the query results can be fetched progressively instead of all at once.

JPA Stream methods

As I explained in this article, since JPA 2.2, you can fetch a Stream using the getResultStream method of the JPA Query interface.

The getResultStream method then uses the JDBC ResultSet to stream over the records returned by a given query. This approach can be useful when you have to process a large volume of data: instead of fetching all the data at once and putting pressure on the application memory, the result set is fetched and processed gradually.

Spring Data JPA Stream query methods

If you want to stream over a query result set, then you need to use the Java Stream return type for your Spring Data JPA query method, as illustrated by the following example:

@Repository
public interface PostRepository extends BaseJpaRepository<Post, Long> {

    @Query("""
        select p
        from Post p
        where date(p.createdOn) >= :sinceDate
        """
    )
    @QueryHints(
        @QueryHint(name = AvailableHints.HINT_FETCH_SIZE, value = "25")
    )
    Stream<Post> streamByCreatedOnSince(@Param("sinceDate") LocalDate sinceDate);
}

The FETCH_SIZE JPA query hint is necessary for PostgreSQL and MySQL because it instructs the JDBC Driver to fetch at most 25 records at a time. Otherwise, the PostgreSQL and MySQL JDBC Drivers would fetch the entire query result before the underlying ResultSet is traversed.

For more details about JPA query hints, check out this article.
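To make the effect of the fetch size more tangible, here is a minimal simulation written in plain Java. It does not use JDBC or JPA at all: the FetchSizeDemo and BatchedCursor names are hypothetical, and the cursor only mimics a driver that loads rows in batches. The sketch compares a fetch size of 25 with a fetch size equal to the whole result set, which stands in for a driver that loads everything up front.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical simulation of a driver-side cursor; not a JDBC or JPA API.
final class FetchSizeDemo {

    /** Loads row ids in batches of fetchSize and records what that costs. */
    static final class BatchedCursor implements Iterator<Integer> {
        private final int totalRows;
        private final int fetchSize;
        private final List<Integer> buffer = new ArrayList<>();
        private int nextRow = 0;     // next row to load from the simulated database
        private int bufferIndex = 0; // read position inside the current batch
        int roundTrips = 0;          // number of batches loaded so far
        int maxBuffered = 0;         // largest number of rows held in memory at once

        BatchedCursor(int totalRows, int fetchSize) {
            if (fetchSize <= 0) {
                throw new IllegalArgumentException("fetchSize must be positive");
            }
            this.totalRows = totalRows;
            this.fetchSize = fetchSize;
        }

        @Override
        public boolean hasNext() {
            if (bufferIndex < buffer.size()) return true;
            if (nextRow >= totalRows) return false;
            loadNextBatch();
            return true;
        }

        @Override
        public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            return buffer.get(bufferIndex++);
        }

        private void loadNextBatch() {
            buffer.clear();
            bufferIndex = 0;
            int batchSize = Math.min(fetchSize, totalRows - nextRow);
            for (int i = 0; i < batchSize; i++) {
                buffer.add(nextRow++);
            }
            roundTrips++;
            maxBuffered = Math.max(maxBuffered, buffer.size());
        }
    }

    /** Exposes the cursor as a sequential Stream that pulls rows on demand. */
    static Stream<Integer> stream(BatchedCursor cursor) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(cursor, Spliterator.ORDERED), false);
    }

    /** Traverses all rows and returns the cursor so its counters can be inspected. */
    static BatchedCursor traverse(int totalRows, int fetchSize) {
        BatchedCursor cursor = new BatchedCursor(totalRows, fetchSize);
        try (Stream<Integer> rows = stream(cursor)) {
            rows.forEach(row -> { /* process one row at a time */ });
        }
        return cursor;
    }

    public static void main(String[] args) {
        BatchedCursor batched = traverse(100, 25);
        BatchedCursor allAtOnce = traverse(100, 100);
        // prints "fetchSize=25: roundTrips=4, maxBuffered=25"
        System.out.println("fetchSize=25: roundTrips=" + batched.roundTrips
            + ", maxBuffered=" + batched.maxBuffered);
        // prints "fetchSize=100: roundTrips=1, maxBuffered=100"
        System.out.println("fetchSize=100: roundTrips=" + allAtOnce.roundTrips
            + ", maxBuffered=" + allAtOnce.maxBuffered);
    }
}
```

The trade-off is visible in the two counters: a smaller fetch size keeps fewer rows in memory at the cost of more round trips, while loading everything at once needs a single round trip but holds the entire result set in memory.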

With the streamByCreatedOnSince method in place, we can now implement the updatePostCache service method that will fetch the latest Post entities and update the in-memory cache.

@Transactional(readOnly = true)
public void updatePostCache() {
    LocalDate yesterday = LocalDate.now().minusDays(1);

    // try-with-resources closes the Stream once the traversal is done
    try (Stream<Post> postStream = postRepository.streamByCreatedOnSince(yesterday)) {
        postStream.forEach(
            // Each cache update is submitted to the executor service
            post -> executorService.submit(() ->
                postCache.put(post.getId(), post)
            )
        );
    }
}

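Notice that the updatePostCache service method is annotated with @Transactional(readOnly = true) because the database connection must stay open for the entire duration of the Stream traversal. The try-with-resources block then closes the Stream, and the database resources it holds, as soon as the traversal is done.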
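The following plain Java sketch illustrates why both the open connection and the closing of the Stream matter. It is only a simulation under stated assumptions: StreamLifecycleDemo and openStream are hypothetical names, and an AtomicBoolean flag stands in for the database connection that the transaction would keep open.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical simulation; the flag stands in for a real database connection.
final class StreamLifecycleDemo {

    /** Returns a Stream whose rows can only be read while the connection is open. */
    static Stream<String> openStream(List<String> rows, AtomicBoolean connectionOpen) {
        Iterator<String> source = rows.iterator();
        Iterator<String> guarded = new Iterator<String>() {
            @Override
            public boolean hasNext() {
                if (!connectionOpen.get()) {
                    throw new IllegalStateException("connection is closed");
                }
                return source.hasNext();
            }

            @Override
            public String next() {
                if (!connectionOpen.get()) {
                    throw new IllegalStateException("connection is closed");
                }
                return source.next();
            }
        };
        return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(guarded, Spliterator.ORDERED), false)
            // Closing the Stream releases the simulated connection
            .onClose(() -> connectionOpen.set(false));
    }

    /** Traverses the Stream while the connection is open, then closes it. */
    static List<String> consumeInsideScope(List<String> rows, AtomicBoolean connectionOpen) {
        try (Stream<String> stream = openStream(rows, connectionOpen)) {
            return stream.collect(Collectors.toList());
        }
    }

    /** Traverses the Stream after the connection was released; returns true if that fails. */
    static boolean failsAfterConnectionClosed(List<String> rows) {
        AtomicBoolean connectionOpen = new AtomicBoolean(true);
        Stream<String> stream = openStream(rows, connectionOpen);
        connectionOpen.set(false); // simulates the transaction ending before the traversal
        try {
            stream.collect(Collectors.toList());
            return false;
        } catch (IllegalStateException expected) {
            return true;
        } finally {
            stream.close();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean connectionOpen = new AtomicBoolean(true);
        List<String> consumed = consumeInsideScope(List.of("a", "b"), connectionOpen);
        System.out.println(consumed);             // prints "[a, b]"
        System.out.println(connectionOpen.get()); // prints "false"
        System.out.println(failsAfterConnectionClosed(List.of("a", "b"))); // prints "true"
    }
}
```

In the first case, the rows are read while the simulated connection is still open, and leaving the try-with-resources block releases it. In the second case, the connection is gone before the traversal starts, so the traversal fails, which mirrors what would happen if the Stream were consumed outside the transactional service method.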

Conclusion

Spring Data JPA provides support for Stream query methods, but there are several things to keep in mind prior to using this feature:

First, you need to make sure you don’t prefetch all the data, as is the case by default with PostgreSQL and MySQL.
Second, you need to make sure you don’t release the database connection prior to traversing the Stream.
