The Spring Data JPA findById Anti-Pattern

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see how the Spring Data JPA findById method can become an Anti-Pattern when using it to reference parent entity associations.

Domain Model

Let’s consider we have a PostComment child entity that is associated with the parent Post entity via the post reference in the PostComment entity:

Post and PostComment entities

The Post entity is mapped as follows:

@Entity
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    @NaturalId
    private String slug;
}

And, the PostComment entity maps the Foreign Key column using the @ManyToOne annotation:

@Entity
@Table(name = "post_comment")
public class PostComment {

    @Id
    @GeneratedValue
    private Long id;

    private String review;

    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

}

Spring Data JPA findById Anti-Pattern

Let’s assume we want to provide an addNewPostComment method with the following signature:

PostComment addNewPostComment(String review, Long postId);

The most typical way the addNewPostComment method is usually implemented is this:

@Transactional(readOnly = true)
public class PostServiceImpl implements PostService {

    @Autowired
    private PostRepository postRepository;

    @Autowired
    private PostCommentRepository postCommentRepository;

    @Transactional
    public PostComment addNewPostComment(String review, Long postId) {            
        PostComment comment = new PostComment()
            .setReview(review)
            .setPost(postRepository.findById(postId)
                .orElseThrow(
                    ()-> new EntityNotFoundException(
                        String.format(
                            "Post with id [%d] was not found!", 
                            postId
                        )
                    )
                )
            );

        postCommentRepository.save(comment);

        return comment;
    }
}

However, when calling the addNewPostComment method:

postService.addNewPostComment(
    "Best book on JPA and Hibernate!", 
    postId
);

We’ll see that Spring Data JPA generates the following SQL statements:

SELECT 
    post0_.id AS id1_0_0_,
    post0_.slug AS slug2_0_0_,
    post0_.title AS title3_0_0_
FROM 
    post post0_
WHERE 
    post0_.id = 1

SELECT nextval ('hibernate_sequence')

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'Best book on JPA and Hibernate!', 
    1
)

Every time I run this example during my High-Performance Java Persistence training, I ask my students where does the first SQL query come from:

SELECT 
    post0_.id AS id1_0_0_,
    post0_.slug AS slug2_0_0_,
    post0_.title AS title3_0_0_
FROM 
    post post0_
WHERE 
    post0_.id = 1

This query was generated by the findById method call, which is meant to load the entity in the current Persistence Context. However, in our case, we don’t need that. We just want to save a new PostComment entity and set the post_id Foreign Key column to a value that we already know.

But, since the only way to set the underlying post_id column value is to provide a Post entity reference, that’s why many developers end up calling the findById method.

In our case, running this SQL query is unnecessary because we don’t need to fetch the parent Post entity. But how can we get rid of this extra SQL query?

How to fix the Spring Data JPA findById Anti-Pattern

The fix is actually very easy. Instead of using findById, we need to use the getReferenceById method that’s inherited automatically from the JpaRepository:

@Transactional
public PostComment addNewPostComment(String review, Long postId) {
    PostComment comment = new PostComment()
        .setReview(review)
        .setPost(postRepository.getReferenceById(postId));

    postCommentRepository.save(comment);

    return comment;
}

That’s it!

When calling the same addNewPostComment method now, we see that the Post entity is no longer fetched:

SELECT nextval ('hibernate_sequence')

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'Best book on JPA and Hibernate!', 
    1
)

The reason why the Post entity is no longer fetched is that the getReferenceById method calls the getReference method of the underlying EntityManager, giving you an entity Proxy instead:

public T getReferenceById(ID id) {
    Assert.notNull(id, "The given id must not be null!");
    return this.em.getReference(this.getDomainClass(), id);
}

An entity Proxy is sufficient for our use case because HIbernate will only call the getId method on it in order to set the underlying post_id column value before executing the SQL INSERT statement.

And since the Proxy already has the id value as we provided it to the getReferenceById method, there is no need for Hibernate to initialize it with a secondary SQL query as long as only the getId method is called on the Proxy entity.

On the other hand, if we called any other method on the Proxy object, Hibernate would have to trigger the Proxy initialization, and an SQL SELECT statement would be expected to load the entity object from the database.

For more details about the difference between the find and the getReference methods of the JPA EntityManager, check out this article as well.

Third time’s a charm!

Now, how many times have you seen anyone use the getReferenceById method in any Spring Data JPA project you’ve ever worked on?

If you haven’t seen it very often, it’s understandable. Naming things is hard.

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton.

Initially, Spring Data JPA offered a getOne method that we should call in order to get an entity Proxy. But we can all agree that getOne is not very intuitive.

So, in the 2.5 version, the getOne method got deprecated in favor of the getById method alternative, that’s just as unintuitive as its previous version.

Neither getOne nor getById is self-explanatory. Without reading the underlying Spring source code or the underlying Javadoc, would you know that these are the Spring Data JPA methods to call when you need to get an entity Proxy?

Therefore, in the 2.7 version, the getById method was also deprecated, and now we have the getReferenceById method instead, as illustrated by the Spring Data SimpleJpaRepository implementation:

@Deprecated
public T getOne(ID id) {
    return this.getReferenceById(id);
}

@Deprecated
public T getById(ID id) {
    return this.getReferenceById(id);
}

In my opinion, even the JPA getReference method name was badly chosen. I’d have called it getProxy, as that’s exactly what it returns, an entity proxy.

If the JPA method were called getProxy and the JpaRepository offered a getProxyById method, I think we’d have seen this method being used much more often.

YouTube Video

I also published a YouTube video about what is the best way to fetch entities when using JPA and Hibernate, so if you enjoyed this article, you are going to love that video episode as well:

The findById race condition

Way too often, developers think that calling findById would provide them a way to validate the parent record existence, but this is just an illusion, as illustrated by the following test case:

@Transactional
public PostComment addNewPostComment(String review, Long postId) {

    Post post = postRepository.findById(postId).orElseThrow(
        ()-> new EntityNotFoundException(
            String.format("Post with id [%d] was not found!", postId)
        )
    );

    try {
        Integer updateCount = executorService.submit(
            () -> transactionTemplate.execute(status ->
                entityManager.createQuery("""
                    delete from Post
                    where id = :id
                    """)
                .setParameter("id", postId)
                .executeUpdate()
            )
        ).get();

        assertEquals(1, updateCount.intValue());
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }

    PostComment comment = new PostComment()
        .setReview(review)
        .setPost(post);

    postCommentRepository.save(comment);

    return comment;
}

This time, we are emulating the use case when someone deletes the Post parent record just after we have loaded it from the database.

When executing this version of the addNewPostComment method, we get the following output:

[Alice]: -- Save PostComment
[Alice]: o.h.e.t.i.TransactionImpl - begin
[Alice]: 
SELECT 
    p.id,
    p.slug,
    p.title 
FROM 
    post p 
where 
    p.id = 1
    
[Bob]: o.h.e.t.i.TransactionImpl - begin
[Bob]: 
DELETE FROM 
    post 
WHERE 
    id = 1
[Bob]: o.h.e.t.i.TransactionImpl - committing

[Alice]: 
SELECT nextval('post_comment_SEQ'
[Alice]: o.h.e.t.i.TransactionImpl - committing
[Alice]: 
INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'Best book on JPA and Hibernate!', 
    1
)
[Alice]: o.h.e.j.s.SqlExceptionHelper - SQL Error: 0, SQLState: 23503
[Alice]: o.h.e.j.s.SqlExceptionHelper - ERROR: insert or update on 
         table "post_comment" violates foreign key constraint 
         "fk_post_comment_post_id"

So, calling findById is an Anti-Pattern as not only that it introduces the overhead of executing a SELECT that fetches the parent entity, but it’s also ineffective as the record can be deleted anyway by a concurrent transaction, right after our findById call has executed.

So, in the end, you still have to provide the error handling logic for the Foreign Key constraint violation failure unless the SELECT query locks the parent entity, as explained in this article.

But then, the reason why most relational database systems have migrated to MVCC (Multi-Version Concurrency Control) from the 2PL (Two-Phase Locking) concurrency control strategy that was very popular in the ’80s is to avoid taking more locks than necessary. So, if you always lock the parent record just to avoid its deletion, you will affect your application scalability, as demonstrated by the Universal Scalability Law.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While the findById method is very useful when you really need to fetch an entity, to create a child entity, you don’t need to fetch the parent entity just to set the Foreign Key column value.

Especially for batch processing tasks that need to create many such entities, the findById Anti-Pattern can easily lead to N+1 query issues, so better avoid it.

Follow @vlad_mihalcea