The Spring Data JPA findById Anti-Pattern
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, we are going to see how the Spring Data JPA findById method can become an Anti-Pattern when using it to reference parent entity associations.
Domain Model
Let’s consider we have a PostComment
child entity that is associated with the parent Post
entity via the post
reference in the PostComment
entity:
The Post
entity is mapped as follows:
@Entity @Table(name = "post") public class Post { @Id private Long id; private String title; @NaturalId private String slug; }
And, the PostComment
entity maps the Foreign Key column using the @ManyToOne
annotation:
@Entity @Table(name = "post_comment") public class PostComment { @Id @GeneratedValue private Long id; private String review; @ManyToOne(fetch = FetchType.LAZY) private Post post; }
Spring Data JPA findById Anti-Pattern
Let’s assume we want to provide an addNewPostComment
method with the following signature:
PostComment addNewPostComment(String review, Long postId);
The most typical way the addNewPostComment
method is usually implemented is this:
@Transactional(readOnly = true) public class PostServiceImpl implements PostService { @Autowired private PostRepository postRepository; @Autowired private PostCommentRepository postCommentRepository; @Transactional public PostComment addNewPostComment(String review, Long postId) { PostComment comment = new PostComment() .setReview(review) .setPost(postRepository.findById(postId) .orElseThrow( ()-> new EntityNotFoundException( String.format( "Post with id [%d] was not found!", postId ) ) ) ); postCommentRepository.save(comment); return comment; } }
However, when calling the addNewPostComment
method:
postService.addNewPostComment( "Best book on JPA and Hibernate!", postId );
We’ll see that Spring Data JPA generates the following SQL statements:
SELECT post0_.id AS id1_0_0_, post0_.slug AS slug2_0_0_, post0_.title AS title3_0_0_ FROM post post0_ WHERE post0_.id = 1 SELECT nextval ('hibernate_sequence') INSERT INTO post_comment ( post_id, review, id ) VALUES ( 1, 'Best book on JPA and Hibernate!', 1 )
Every time I run this example during my High-Performance Java Persistence training, I ask my students where does the first SQL query come from:
SELECT post0_.id AS id1_0_0_, post0_.slug AS slug2_0_0_, post0_.title AS title3_0_0_ FROM post post0_ WHERE post0_.id = 1
This query was generated by the findById
method call, which is meant to load the entity in the current Persistence Context. However, in our case, we don’t need that. We just want to save a new PostComment
entity and set the post_id
Foreign Key column to a value that we already know.
But, since the only way to set the underlying post_id
column value is to provide a Post
entity reference, that’s why many developers end up calling the findById
method.
In our case, running this SQL query is unnecessary because we don’t need to fetch the parent Post
entity. But how can we get rid of this extra SQL query?
How to fix the Spring Data JPA findById Anti-Pattern
The fix is actually very easy. Instead of using findById
, we need to use the getReferenceById
method that’s inherited automatically from the JpaRepository
:
@Transactional public PostComment addNewPostComment(String review, Long postId) { PostComment comment = new PostComment() .setReview(review) .setPost(postRepository.getReferenceById(postId)); postCommentRepository.save(comment); return comment; }
That’s it!
When calling the same addNewPostComment
method now, we see that the Post
entity is no longer fetched:
SELECT nextval ('hibernate_sequence') INSERT INTO post_comment ( post_id, review, id ) VALUES ( 1, 'Best book on JPA and Hibernate!', 1 )
The reason why the Post
entity is no longer fetched is that the getReferenceById
method calls the getReference
method of the underlying EntityManager
, giving you an entity Proxy instead:
public T getReferenceById(ID id) { Assert.notNull(id, "The given id must not be null!"); return this.em.getReference(this.getDomainClass(), id); }
An entity Proxy is sufficient for our use case because HIbernate will only call the getId
method on it in order to set the underlying post_id
column value before executing the SQL INSERT statement.
And since the Proxy
already has the id
value as we provided it to the getReferenceById
method, there is no need for Hibernate to initialize it with a secondary SQL query as long as only the getId
method is called on the Proxy entity.
On the other hand, if we called any other method on the Proxy object, Hibernate would have to trigger the Proxy initialization, and an SQL SELECT statement would be expected to load the entity object from the database.
For more details about the difference between the
find
and thegetReference
methods of the JPAEntityManager
, check out this article as well.
Third time’s a charm!
Now, how many times have you seen anyone use the getReferenceById
method in any Spring Data JPA project you’ve ever worked on?
If you haven’t seen it very often, it’s understandable. Naming things is hard.
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton.
Initially, Spring Data JPA offered a getOne
method that we should call in order to get an entity Proxy. But we can all agree that getOne
is not very intuitive.
So, in the 2.5 version, the getOne
method got deprecated in favor of the getById
method alternative, that’s just as unintuitive as its previous version.
Neither getOne
nor getById
is self-explanatory. Without reading the underlying Spring source code or the underlying Javadoc, would you know that these are the Spring Data JPA methods to call when you need to get an entity Proxy?
Therefore, in the 2.7 version, the getById
method was also deprecated, and now we have the getReferenceById
method instead, as illustrated by the Spring Data SimpleJpaRepository
implementation:
@Deprecated public T getOne(ID id) { return this.getReferenceById(id); } @Deprecated public T getById(ID id) { return this.getReferenceById(id); }
In my opinion, even the JPA getReference
method name was badly chosen. I’d have called it getProxy
, as that’s exactly what it returns, an entity proxy.
If the JPA method were called getProxy
and the JpaRepository
offered a getProxyById
method, I think we’d have seen this method being used much more often.
YouTube Video
I also published a YouTube video about what is the best way to fetch entities when using JPA and Hibernate, so if you enjoyed this article, you are going to love that video episode as well:
The findById race condition
Way too often, developers think that calling findById
would provide them a way to validate the parent record existence, but this is just an illusion, as illustrated by the following test case:
@Transactional public PostComment addNewPostComment(String review, Long postId) { Post post = postRepository.findById(postId).orElseThrow( ()-> new EntityNotFoundException( String.format("Post with id [%d] was not found!", postId) ) ); try { Integer updateCount = executorService.submit( () -> transactionTemplate.execute(status -> entityManager.createQuery(""" delete from Post where id = :id """) .setParameter("id", postId) .executeUpdate() ) ).get(); assertEquals(1, updateCount.intValue()); } catch (Exception e) { throw new IllegalStateException(e); } PostComment comment = new PostComment() .setReview(review) .setPost(post); postCommentRepository.save(comment); return comment; }
This time, we are emulating the use case when someone deletes the Post
parent record just after we have loaded it from the database.
When executing this version of the addNewPostComment
method, we get the following output:
[Alice]: -- Save PostComment [Alice]: o.h.e.t.i.TransactionImpl - begin [Alice]: SELECT p.id, p.slug, p.title FROM post p where p.id = 1 [Bob]: o.h.e.t.i.TransactionImpl - begin [Bob]: DELETE FROM post WHERE id = 1 [Bob]: o.h.e.t.i.TransactionImpl - committing [Alice]: SELECT nextval('post_comment_SEQ' [Alice]: o.h.e.t.i.TransactionImpl - committing [Alice]: INSERT INTO post_comment ( post_id, review, id ) VALUES ( 1, 'Best book on JPA and Hibernate!', 1 ) [Alice]: o.h.e.j.s.SqlExceptionHelper - SQL Error: 0, SQLState: 23503 [Alice]: o.h.e.j.s.SqlExceptionHelper - ERROR: insert or update on table "post_comment" violates foreign key constraint "fk_post_comment_post_id"
So, calling findById
is an Anti-Pattern as not only that it introduces the overhead of executing a SELECT that fetches the parent entity, but it’s also ineffective as the record can be deleted anyway by a concurrent transaction, right after our findById
call has executed.
So, in the end, you still have to provide the error handling logic for the Foreign Key constraint violation failure unless the SELECT query locks the parent entity, as explained in this article.
But then, the reason why most relational database systems have migrated to MVCC (Multi-Version Concurrency Control) from the 2PL (Two-Phase Locking) concurrency control strategy that was very popular in the ’80s is to avoid taking more locks than necessary. So, if you always lock the parent record just to avoid its deletion, you will affect your application scalability, as demonstrated by the Universal Scalability Law.
I'm running an online workshop on the 11th of October about High-Performance SQL.If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
While the findById
method is very useful when you really need to fetch an entity, to create a child entity, you don’t need to fetch the parent entity just to set the Foreign Key column value.
Especially for batch processing tasks that need to create many such entities, the findById
Anti-Pattern can easily lead to N+1 query issues, so better avoid it.

Awesome! Thank you a lot!
You’re welcome