The Spring Data findAll Anti-Pattern
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, we are going to see why inheriting the default Spring Data findAll method in every data access Repository is a terrible Anti-Pattern.
A picture is worth 1000 words:
You’d be surprised how many times I’ve seen this issue where tons of data was fetched from the DB just to filter it in Java and throw away the vast majority of records that were not needed for that particular business requirement.
Anti-Pattern
First, let’s start with the Anti-Pattern definition, as given by Wikipedia:
An anti-pattern in software engineering, project management, and business processes is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive.
That’s exactly what you get when you offer a findAll
method in the Spring Data Repository
base interface that all Data Access Objects will automatically inherit.
The Spring Data Repository hierarchy
Usually, when you create a Spring Data JPA Repository
, your custom interface extends the JpaRepository
:
@Repository public interface PostRepository extends JpaRepository<Post, Long> { }
However, the JpaRepository
provides a series of methods that it either declared directly or inherited from the underlying base interfaces, as illustrated by the following diagram:
Even if the JpaReository
didn’t provide a findAll
method explicitly, it would still inherit one from the CrudRepository
that’s shared by all Spring Data modules.
The Spring Data findAll Anti-Pattern
The problem with the findAll
method is that, as an API designer, you should not really offer your clients a way to fetch an entire database table because, if you do, some developers are actually going to do that, even for large database tables.
Let’s assume we have the following Post
and Tag
entities in our system:
List<Tag> tags = List.of( new Tag() .setId(1L) .setName("JDBC"), new Tag() .setId(2L) .setName("JPA"), new Tag() .setId(3L) .setName("Hibernate") ); tags.forEach(tag -> entityManager.persist(tag)); for (long i = 1; i <= POST_SIZE; i++) { entityManager.persist( new Post() .setId(i) .setTitle( String.format( "High-Performance Java Persistence, Part %d", i ) ) .addTag(tags.get((int) i % 3)) ); }
We want to get the titles of all the JDBC or Hibernate posts, so how can we do that?
The easiest way is, of course, to use a single SQL query that looks as follows:
public List<String> findPostTitleByTags(List<String> tags) { return entityManager.createNativeQuery(""" select p.title from post p where exists ( select 1 from post_tag pt join tag t on pt.tag_id = t.id and pt.post_id = p.id where t.name in (:tags) ) order by p.id """) .setParameter("tags", tags) .getResultList(); }
But, not every Java developer enjoys solving this problem using a simple SQL query. Some of them will do whatever is humanly possible to avoid writing any SQL query.
And, since we are just one findAll
method call away from “solving” this issue, it won’t take long until someone provides the following solution:
List<String> postTitlesStreamRecords = postRepository.findAll() .stream() .filter( post -> post.getTags() .stream() .map(Tag::getName) .anyMatch(matchingTags::contains) ) .sorted(Comparator.comparing(Post::getId)) .map(Post::getTitle) .collect(Collectors.toList());
When executing the findAll
workaround, Hibernate will execute the following queries:
Query:["select post0_.id as id1_1_, post0_.title as title2_1_ from post post0_"], Params:[()] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(1)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(2)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(3)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(4)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(5)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(6)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(7)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(8)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(9)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(10)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(11)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(12)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(13)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(14)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(15)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(16)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(17)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(18)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(19)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(20)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(21)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(22)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(23)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(24)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(25)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(26)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(27)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(28)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(29)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(30)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(31)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(32)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(33)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(34)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(35)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(36)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(37)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(38)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(39)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(40)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(41)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(42)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(43)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(44)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(45)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(46)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(47)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(48)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(49)] Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(50)]
Not nice!
The reason why you see so many secondary queries is due to filter
predicate that needs to fetch the tags
collection in order to match the provided tags.
So, is calling findAll
always going to be an issue? Of course, not! However, just because you might have a few tables that have a small number of records, most of the time, a table will have enough records to incur a noticeable overhead if you fetched the entire table just to do the filtering in the application.
It’s all about probabilities. So, before allowing the findAll
method to be inherited automatically by all your data access layer Repository
instances, you should answer the following questions:
- What is the probability that, if you provide a
findAll
method, everybody in your team is going to reason about whether it’s efficient to call it or not? - What if we don’t know at development time how large the tables will end up being three years after the product is launched into production?
- Are all the code changes reviewed by peers to ensure that no one will call
findAll
and filter it into memory without considering the table size in production? - Are you automating those checks using a tool like Hypersistence Optimizer that can auto-detect when you are misusing JPA or Hibernate?
If you cannot be sure about the future outcome of allowing a potentially dangerous method to be misused, then it’s a safer bet to avoid inheriting it by default in every single Repository
class.
Preventing the Spring Data fetchAll Anti-Pattern
If you don’t want to inherit the findAll
method by default, one very simple option is to use the HibernateRepository
from the Hypersistence Utils project, which chooses to deprecate the findAll
method.
More, if you happen to call it by mistake, HibernateRepository
will throw an UnsupportedOperationException
instead of risking allowing you to fetch more data than necessary.
To use the HibernateRepository
, just make your Repository
classes extend it before the default JpaRepository
, as in the following example:
@Repository public interface PostRepository extends HibernateRepository<Post>, JpaRepository<Post, Long>, CustomPostRepository { }
The
HibernateRepository
needs to precede theJpaRepository
as otherwise, IntelliJ IDEA will not strike through thefindAll
method call.We also have to provide the
io.hypersistence.utils.spring.repository
package to the@EnableJpaRepositories
annotation:@EnableJpaRepositories( basePackages = { "io.hypersistence.utils.spring.repository", ... } )
This way, you will instantly see that the method is a risky bet since it’s struck through now:findAll
However, if you know for sure that for specific use cases, it’s fine to have a findAll
method, then you can simply override it as follows:
@Repository public interface PostRepository extends HibernateRepository<Post>, JpaRepository<Post, Long>, CustomPostRepository { @Query("from Post") @Override List<Post> findAll(); }
This way, you make your data access code explicit. By providing a findAll
declaration, you make it clear that you are really going to need it. And this way, you will never call findAll
on an entity that has many records since you made an informed decision when you decided to add an explicit findAll
method definition.
For more details about the
HibernateRepository
, check out this article.
Preventing the Spring Data fetchAll Anti-Pattern using a custom Repository
Another alternative is to define a new base interface by extending the Repository
interface instead of the JpaRepository
and declaring explicitly which methods are allowed to be inherited by your custom repositories:
public interface BaseJpaRepository<T, ID> extends Repository<T, ID> { Optional<T> findById(ID id); T getReferenceById(ID id); boolean existsById(ID id); void deleteById(ID id); void delete(T entity); long count(); }
And, the entities that have a large number of records can now define their specific Repository
interface by extending the BaseJpaRepository
:
@Repository public interface PostRepository extends BaseJpaRepository<Post, Long>, CustomPostRepository { }
And, the entities with a small number of records, like the Country
entity, can extend the default Spring Data JpaRepository
:
@Repository public interface CountryRepository extends JpaRepository<Country, Short> { }
As simple as that!
Both the HibernateRepository
and the BaseJpaRepository
options will allow you to prevent using the findAll
method on entities that have lots of records in their associated database table.
For entities that have few records, you can still use findAll
as you have the option of choosing between the default JpaRepository
and the BaseJpaRepository
or HibernateRepository
. This way, you can have fine-grained control when the findAll
method is allowed to be called.
The Hibernate implicit polymorphic query Anti-Pattern
The findAll
Anti-Pattern reminds me of a lesser-known “feature” of Hibernate that allows you to select the entire database with a single query:
//Don't try this at home! List<Object> objects = entityManager.createQuery(""" select o from java.lang.Object o """) .getResultList();
The reason why this query works is because of the implicit polymorphic query feature offered by Hibernate. The reason why you haven’t seen this Anti-Pattern used in practice is that the Hibernate developers didn’t think it was a good idea to encourage fetching the entire database at once, so they omitted it from the documentation.
Internally, Hibernate team actually used such a query during testing:
protected void cleanupTestData() { doInHibernate(this::sessionFactory, session -> { session.createQuery( "delete from java.lang.Object" ).executeUpdate(); }); }
But, since this was limited to the Hibernate integration tests, there was no problem using such a dangerous query. And, what’s interesting is that not all Hibernate tests used it. Some of Hibernate integration tests were simply dropping and recreating the DB schema in between tests, so there was no point in deleting all records.
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
Going back to the definition of an Anti-Pattern:
An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive.
We can conclude that the findAll
method is ineffective and risks being highly counterproductive, and if you offer it by default, then it risks becoming a common response to a recurring problem.
So, for this reason, you are better off avoiding having a findAll
method in every single data access Repository
instance. Better get back in control over your API and decide for yourself when you really need to have such a method.
