The best Spring Data JpaRepository

Introduction

In this article, I’m going to show you the best way to use the Spring Data JpaRepository, which, most often, is used the wrong way.

The biggest issue with the default Spring Data JpaRepository is the fact that it extends the generic CrudRepository, which is not really compatible with the JPA specification.

The JpaRepository save method paradox

There’s no such thing as a save method in JPA because JPA implements the ORM paradigm, not the Active Record pattern.

JPA is basically an entity state machine, as illustrated by the following diagram:

[Figure: JPA Entity State Machine]

As you can clearly see, there’s no save method in JPA.
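To make the state machine idea concrete, here is a minimal plain-Java sketch of the entity lifecycle. It is a deliberately reduced model: the class, enum, and method names are my own illustrations, not JPA API types, and it covers only the main transitions (for instance, merge actually returns a managed copy, which the sketch folds into a single step).

```java
import java.util.EnumMap;
import java.util.Map;

// Reduced model of the JPA entity lifecycle. All type names here are
// illustrative assumptions; none of them is part of the JPA API.
final class EntityLifecycleSketch {

    enum State { NEW, MANAGED, DETACHED, REMOVED }

    enum Transition { PERSIST, MERGE, REMOVE, DETACH }

    private static final Map<State, Map<Transition, State>> RULES =
        new EnumMap<>(State.class);

    static {
        rule(State.NEW, Transition.PERSIST, State.MANAGED);
        rule(State.MANAGED, Transition.REMOVE, State.REMOVED);
        rule(State.MANAGED, Transition.DETACH, State.DETACHED);
        // merge really returns a managed copy; the sketch treats it as one step
        rule(State.DETACHED, Transition.MERGE, State.MANAGED);
    }

    private static void rule(State from, Transition via, State to) {
        RULES.computeIfAbsent(from, key -> new EnumMap<>(Transition.class))
             .put(via, to);
    }

    // Returns the next state, or fails if the sketch does not model the transition
    static State apply(State current, Transition transition) {
        Map<Transition, State> allowed = RULES.get(current);
        State next = (allowed == null) ? null : allowed.get(transition);
        if (next == null) {
            throw new IllegalStateException(
                transition + " is not modeled for state " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        State state = State.NEW;
        state = apply(state, Transition.PERSIST);
        state = apply(state, Transition.DETACH);
        state = apply(state, Transition.MERGE);
        state = apply(state, Transition.REMOVE);
        System.out.println(state);
    }
}
```

Notice that the Transition enum has no SAVE entry. Every change goes through an explicit state transition, which is the point the diagram makes.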

Now, Hibernate was created before JPA, hence besides implementing the JPA specification, it also provides its own specific methods, such as the update one.

While there are two methods called save and saveOrUpdate in the Hibernate Session, as I explained in this article, they are just an alias for update.

In fact, starting with Hibernate 6, the save and saveOrUpdate methods are deprecated and will be removed in a future version, as they are just a mistake that got carried over from Hibernate 1.

If you create a new entity, you have to call persist so that the entity becomes managed, and the flush will generate the INSERT statement.

If the entity becomes detached and you changed it, you have to propagate the changes back to the database, in which case you can use either merge or update. The former method, merge, copies the detached entity state onto a new entity that has been loaded by the current Persistence Context and lets the flush figure out whether an UPDATE is even necessary. The latter method, update, forces the flush to trigger an UPDATE with the current entity state.

The remove method schedules the removal, and the flush will trigger the DELETE statement.
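The difference between merge and update described above can be simulated without any JPA dependency. The following is only a sketch under stated assumptions: the class, the in-memory map standing in for the post table, and the statement log are all hypothetical, and the real Hibernate behavior involves far more machinery (cascades, versioning, the Persistence Context).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Plain-Java simulation of merge vs. update semantics; not Hibernate code.
final class MergeVsUpdateSketch {

    // Hypothetical stand-in for the post table: id -> title
    private final Map<Long, String> table = new HashMap<>();

    // Statements the simulated flush would send to the database
    final List<String> statements = new ArrayList<>();

    MergeVsUpdateSketch(Map<Long, String> rows) {
        table.putAll(rows);
    }

    // merge-like: load the current row, copy the detached state onto it,
    // and issue an UPDATE only if something actually changed
    void merge(long id, String detachedTitle) {
        statements.add("SELECT post " + id);
        String loadedTitle = table.get(id);
        if (!Objects.equals(loadedTitle, detachedTitle)) {
            table.put(id, detachedTitle);
            statements.add("UPDATE post " + id);
        }
    }

    // update-like: reattach the detached state and always issue an UPDATE,
    // without loading the row first
    void update(long id, String detachedTitle) {
        table.put(id, detachedTitle);
        statements.add("UPDATE post " + id);
    }

    public static void main(String[] args) {
        MergeVsUpdateSketch merged = new MergeVsUpdateSketch(Map.of(1L, "JPA"));
        merged.merge(1L, "JPA rocks!");
        System.out.println(merged.statements);

        MergeVsUpdateSketch updated = new MergeVsUpdateSketch(Map.of(1L, "JPA"));
        updated.update(1L, "JPA rocks!");
        System.out.println(updated.statements);
    }
}
```

In this model, merge pays for a SELECT but can skip the UPDATE when nothing changed, while update skips the SELECT but always writes. Which one is cheaper depends on whether you already know the detached entity was modified.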

But, the JpaRepository inherits a save method from the CrudRepository, just like MongoRepository or SimpleJdbcRepository.

However, the MongoRepository and SimpleJdbcRepository take the Active Record approach, while JPA does not.

In fact, the save method of the JpaRepository is implemented like this:

@Transactional
public <S extends T> S save(S entity) {
    if (this.entityInformation.isNew(entity)) {
        this.em.persist(entity);
        return entity;
    } else {
        return this.em.merge(entity);
    }
}

There’s no magic behind the scenes. In reality, it’s just a call to either persist or merge.

The save method anti-pattern

Because the JpaRepository features a save method, the vast majority of software developers treat it as such, and you end up bumping into the following anti-pattern:

@Transactional
public void saveAntiPattern(Long postId, String postTitle) {
        
    Post post = postRepository.findById(postId).orElseThrow();

    post.setTitle(postTitle);

    postRepository.save(post);
}

How familiar is that? How many times did you see this “pattern” being employed?

The problem is the save line, which, while unnecessary, is not cost-free. Calling merge on a managed entity burns CPU cycles by triggering a MergeEvent, which can be cascaded further down the entity hierarchy only to end up in a code block that does this:

protected void entityIsPersistent(MergeEvent event, Map copyCache) {
    LOG.trace( "Ignoring persistent instance" );

    final Object entity = event.getEntity();
    final EventSource source = event.getSession();
    
    final EntityPersister persister = source.getEntityPersister( 
        event.getEntityName(), 
        entity 
    );

    //before cascade!
    ( (MergeContext) copyCache ).put( entity, entity, true );  
    
    cascadeOnMerge( source, persister, entity, copyCache );
    copyValues( persister, entity, entity, source, copyCache );

    event.setResult( entity );
}

Not only does the merge call provide no benefit, but it also adds overhead to your response time and makes the cloud provider wealthier with every such call.

And, that’s not all. As I explained in this article, the generic save method is not always able to determine whether an entity is new. For instance, if the entity has an assigned identifier, Spring Data JPA will call merge instead of persist, therefore triggering a useless SELECT query. If this happens in the context of a batch processing task, it’s even worse, because you can generate lots of such useless SELECT queries.
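The assigned-identifier problem can be illustrated with a small simulation. This is a hedged sketch, not the Spring Data JPA source: it assumes the simplest case, an entity with no version attribute that does not implement Persistable, where "is new" boils down to a null check on the identifier. The Post class, the isNew helper, and the call log are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates how a save-style dispatch picks persist or merge.
final class SaveDispatchSketch {

    // Hypothetical entity whose identifier may be assigned by the application
    static final class Post {
        final Long id;
        final String title;

        Post(Long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Records which entity state transition the dispatch chose
    final List<String> calls = new ArrayList<>();

    // Assumption: with no version attribute, a null id is the only hint of newness
    static boolean isNew(Post post) {
        return post.id == null;
    }

    void save(Post post) {
        if (isNew(post)) {
            calls.add("persist");
        } else {
            // merge has to load the row first, even if it does not exist yet
            calls.add("merge (SELECT first)");
        }
    }

    public static void main(String[] args) {
        SaveDispatchSketch repository = new SaveDispatchSketch();
        // Brand-new post, but the application already assigned the id
        repository.save(new Post(1L, "High-Performance Java Persistence"));
        System.out.println(repository.calls);
    }
}
```

Even though the post in main has never been stored, the non-null identifier routes it to merge, so each such entity costs an extra SELECT before its INSERT.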

So, don’t do that! You can do way better.

The best Spring Data JpaRepository alternative

If the save method is there, people will misuse it. That’s why it’s best not to have it at all and provide the developer with better JPA-friendly alternatives.

The following solution uses the custom Spring Data JPA Repository idiom.

So, we start with the custom HibernateRepository interface that defines the new contract for propagating entity state changes:

public interface HibernateRepository<T> {

    //Save methods will trigger an UnsupportedOperationException
    
    @Deprecated
    <S extends T> S save(S entity);

    @Deprecated
    <S extends T> List<S> saveAll(Iterable<S> entities);

    @Deprecated
    <S extends T> S saveAndFlush(S entity);

    @Deprecated
    <S extends T> List<S> saveAllAndFlush(Iterable<S> entities);

    //Persist methods are meant to save newly created entities

    <S extends T> S persist(S entity);

    <S extends T> S persistAndFlush(S entity);

    <S extends T> List<S> persistAll(Iterable<S> entities);

    <S extends T> List<S> peristAllAndFlush(Iterable<S> entities);

    //Merge methods are meant to propagate detached entity state changes
    //if they are really needed
    
    <S extends T> S merge(S entity);

    <S extends T> S mergeAndFlush(S entity);

    <S extends T> List<S> mergeAll(Iterable<S> entities);

    <S extends T> List<S> mergeAllAndFlush(Iterable<S> entities);

    //Update methods are meant to force the detached entity state changes

    <S extends T> S update(S entity);

    <S extends T> S updateAndFlush(S entity);

    <S extends T> List<S> updateAll(Iterable<S> entities);

    <S extends T> List<S> updateAllAndFlush(Iterable<S> entities);

}

The methods in the HibernateRepository interface are implemented by the HibernateRepositoryImpl class, as follows:

public class HibernateRepositoryImpl<T> implements HibernateRepository<T> {

    @PersistenceContext
    private EntityManager entityManager;

    public <S extends T> S save(S entity) {
        return unsupported();
    }

    public <S extends T> List<S> saveAll(Iterable<S> entities) {
        return unsupported();
    }

    public <S extends T> S saveAndFlush(S entity) {
        return unsupported();
    }

    public <S extends T> List<S> saveAllAndFlush(Iterable<S> entities) {
        return unsupported();
    }

    public <S extends T> S persist(S entity) {
        entityManager.persist(entity);
        return entity;
    }

    public <S extends T> S persistAndFlush(S entity) {
        persist(entity);
        entityManager.flush();
        return entity;
    }

    public <S extends T> List<S> persistAll(Iterable<S> entities) {
        List<S> result = new ArrayList<>();
        for(S entity : entities) {
            result.add(persist(entity));
        }
        return result;
    }

    public <S extends T> List<S> peristAllAndFlush(Iterable<S> entities) {
        return executeBatch(() -> {
            List<S> result = new ArrayList<>();
            for(S entity : entities) {
                result.add(persist(entity));
            }
            entityManager.flush();
            return result;
        });
    }

    public <S extends T> S merge(S entity) {
        return entityManager.merge(entity);
    }

    public <S extends T> S mergeAndFlush(S entity) {
        S result = merge(entity);
        entityManager.flush();
        return result;
    }

    public <S extends T> List<S> mergeAll(Iterable<S> entities) {
        List<S> result = new ArrayList<>();
        for(S entity : entities) {
            result.add(merge(entity));
        }
        return result;
    }

    public <S extends T> List<S> mergeAllAndFlush(Iterable<S> entities) {
        return executeBatch(() -> {
            List<S> result = new ArrayList<>();
            for(S entity : entities) {
                result.add(merge(entity));
            }
            entityManager.flush();
            return result;
        });
    }

    public <S extends T> S update(S entity) {
        session().update(entity);
        return entity;
    }

    public <S extends T> S updateAndFlush(S entity) {
        update(entity);
        entityManager.flush();
        return entity;
    }

    public <S extends T> List<S> updateAll(Iterable<S> entities) {
        List<S> result = new ArrayList<>();
        for(S entity : entities) {
            result.add(update(entity));
        }
        return result;
    }

    public <S extends T> List<S> updateAllAndFlush(Iterable<S> entities) {
        return executeBatch(() -> {
            List<S> result = new ArrayList<>();
            for(S entity : entities) {
                result.add(update(entity));
            }
            entityManager.flush();
            return result;
        });
    }

    protected Integer getBatchSize(Session session) {
        SessionFactoryImplementor sessionFactory = session
            .getSessionFactory()
            .unwrap(SessionFactoryImplementor.class);
            
        final JdbcServices jdbcServices = sessionFactory
            .getServiceRegistry()
            .getService(JdbcServices.class);
            
        if(!jdbcServices.getExtractedMetaDataSupport().supportsBatchUpdates()) {
            return Integer.MIN_VALUE;
        }
        return session
            .unwrap(AbstractSharedSessionContract.class)
            .getConfiguredJdbcBatchSize();
    }

    protected <R> R executeBatch(Supplier<R> callback) {
        Session session = session();
        Integer jdbcBatchSize = getBatchSize(session);
        Integer originalSessionBatchSize = session.getJdbcBatchSize();
        try {
            if (jdbcBatchSize == null) {
                session.setJdbcBatchSize(10);
            }
            return callback.get();
        } finally {
            session.setJdbcBatchSize(originalSessionBatchSize);
        }
    }

    protected Session session() {
        return entityManager.unwrap(Session.class);
    }

    protected <S extends T> S unsupported() {
        throw new UnsupportedOperationException(
            "There's no such thing as a save method in JPA, so don't use this hack!"
        );
    }
}

First, all the save methods trigger an UnsupportedOperationException, forcing you to evaluate which entity state transition you are actually supposed to call instead.

Unlike the dummy saveAllAndFlush, the persistAllAndFlush, mergeAllAndFlush, and updateAllAndFlush can benefit from the automatic batching mechanism even if you forgot to configure it previously, as explained in this article.
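The executeBatch helper relies on a common idiom: apply a setting only if none is in place, run the callback, and restore the original value in a finally block. Here is a simplified, dependency-free sketch of that idiom. FakeSession is a hypothetical stand-in for the Hibernate Session, and the null check is done directly on the session value, which is a simplification of the listing above.

```java
import java.util.function.Supplier;

// Sketch of the "temporarily override, then restore" idiom behind executeBatch.
final class TemporarySettingSketch {

    // Hypothetical stand-in for a session; null means "no batch size configured"
    static final class FakeSession {
        private Integer jdbcBatchSize;

        Integer getJdbcBatchSize() {
            return jdbcBatchSize;
        }

        void setJdbcBatchSize(Integer jdbcBatchSize) {
            this.jdbcBatchSize = jdbcBatchSize;
        }
    }

    static <R> R executeBatch(FakeSession session, int fallbackBatchSize,
                              Supplier<R> callback) {
        Integer original = session.getJdbcBatchSize();
        try {
            if (original == null) {
                // Only fill in a batch size when none was configured
                session.setJdbcBatchSize(fallbackBatchSize);
            }
            return callback.get();
        } finally {
            // Restore the original value even if the callback throws
            session.setJdbcBatchSize(original);
        }
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        Integer duringCallback = executeBatch(session, 10, session::getJdbcBatchSize);
        System.out.println("during: " + duringCallback);
        System.out.println("after: " + session.getJdbcBatchSize());
    }
}
```

Because the restore step lives in finally, a failing flush cannot leave the session with a batch size the caller never asked for.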

Testing time

To use the HibernateRepository, all you have to do is extend it alongside the standard JpaRepository, like this:

@Repository
public interface PostRepository 
    extends JpaRepository<Post, Long>, HibernateRepository<Post> {

}

That’s it!

This time, there’s no way you can ever bump into the infamous save call anti-pattern:

try {
    transactionTemplate.execute(
            (TransactionCallback<Void>) transactionStatus -> {
        postRepository.save(
            new Post()
                .setId(1L)
                .setTitle("High-Performance Java Persistence")
                .setSlug("high-performance-java-persistence")
        );
        
        return null;
    });

    fail("Should throw UnsupportedOperationException!");
} catch (UnsupportedOperationException expected) {
    LOGGER.warn("You shouldn't call the JpaRepository save method!");
}

Instead, you can use the persist, merge, or update method. So, if I want to persist some new entities, I can do it like this:

postRepository.persist(
    new Post()
        .setId(1L)
        .setTitle("High-Performance Java Persistence")
        .setSlug("high-performance-java-persistence")
);

postRepository.persistAndFlush(
    new Post()
        .setId(2L)
        .setTitle("Hypersistence Optimizer")
        .setSlug("hypersistence-optimizer")
);

postRepository.peristAllAndFlush(
    LongStream.range(3, 1000)
        .mapToObj(i -> new Post()
            .setId(i)
            .setTitle(String.format("Post %d", i))
            .setSlug(String.format("post-%d", i))
        )
        .collect(Collectors.toList())
);

And, pushing the changes from some detached entities back to the database is done as follows:

List<Post> posts = transactionTemplate.execute(transactionStatus ->
    entityManager.createQuery("""
        select p
        from Post p
        where p.id < 10
        """, Post.class)
    .getResultList()
);

posts.forEach(post -> 
    post.setTitle(post.getTitle() + " rocks!")
);

transactionTemplate.execute(transactionStatus ->
    postRepository.updateAll(posts)
);

And, unlike merge, update allows us to avoid some unnecessary SELECT statements, and there’s just a single batched UPDATE statement being executed:

Query:["
update 
  post 
set 
  slug=?, 
  title=? 
where 
  id=?"
], 
Params:[
  (high-performance-java-persistence, High-Performance Java Persistence rocks!, 1), 
  (hypersistence-optimizer, Hypersistence Optimizer rocks!, 2), 
  (post-3, Post 3 rocks!, 3), 
  (post-4, Post 4 rocks!, 4), 
  (post-5, Post 5 rocks!, 5), 
  (post-6, Post 6 rocks!, 6), 
  (post-7, Post 7 rocks!, 7), 
  (post-8, Post 8 rocks!, 8), 
  (post-9, Post 9 rocks!, 9)
]

Awesome, right?

Conclusion

JPA has no such thing as a save method. It’s just a hack that had to be implemented in the JpaRepository because the method is inherited from the CrudRepository, which is a base interface shared by almost all Spring Data projects.

With the HibernateRepository, not only can you better reason about which method you need to call, but you can also benefit from the update method, which provides better performance for batch processing tasks.

10 Comments on “The best Spring Data JpaRepository”

  1. Hi Vlad. Thank you for sharing this approach.

    I tried to reproduce the example with post and post_comment entities. I also assigned values to the IDs (primary keys), and, unsurprisingly, we can see that save issues a SELECT before inserting the values.
    On the other hand, I also tried to do the same thing with a merge, I also have a select before insertion. So what is the difference if both do exactly the same thing?
    With the update nothing happens (no update or addition)!
    And so, if I have a post with a list of comments (all have a given ID) that I save in the database then after I modify the list of comments and then I try to save again. What is the best way to update the post entity? because between save(spring data) and merge I have absolutely no difference. and the update even less.

    • On the other hand, I also tried to do the same thing with a merge, I also have a select before insertion. So what is the difference if both do exactly the same thing?

      That’s where the problem is. Merge is for pushing entity changes from a detached entity to the DB. Merge is not for inserting new data. Persist is for that.

      So, you should use persist for new entities, not merge.

      And so, if I have a post with a list of comments (all have a given ID) that I save in the database then after I modify the list of comments and then I try to save it again.

      It depends on whether the entity graph is attached to the Persistence Context or not. If it is, then it will synchronize automatically. If you know already that you did the modifications, then use the Session.update to force the UPDATE. If you have no idea whether the incoming entity graph has been modified (because this is what the client sends you), then use EntityManager.merge.

  2. Hi Vlad,

    I would like to chime in on this one as you are a well-respected member of the community and usually blogs of yours are well-crafted and insightful. This one here however seems different and apparently does not consider some ideas and concepts of repositories in general, which are important to the Spring Data programming model around those in particular. As the post is going to reach a wide audience, I would like to elaborate on the suggestions you present in the context of those ideas, so that Spring Data users can come to their own conclusions.

    I know you are deeply rooted in JPA and you are an authority on that. I am still surprised that you apparently completely misunderstand the architectural purpose of the Spring Data repository abstraction. It does not exist to make JPA work, or to make it work as efficiently as possible. It is an abstraction of a collection of aggregate roots that can be accessed, added, removed, or accessed in a filtered way. You interact with aggregates as follows: you find the ones you are interested in, you change their state, you mark that operation as complete.

    As you point out correctly, the latter is obsolete with JPA (hint: it is not for any other persistence technology, which already indicates that it is more the exception than the rule) due to the assumed managed state of the entity instances. That, however, cannot make any difference to the way you interact with the repositories, as there is functionality tied to the interaction pattern described above. For example, domain events held in the aggregates are only published on the call to save(…) etc.

    Another aspect of repositories is shielding clients from implementation details of the persistence technology underneath them as much as possible. Client code accesses aggregate instances – plain Java objects – and hands them back to confirm state changes. It must not know about some JPA-specific specialty and must not be implemented with the implicit assumption that the changed state would be persisted by some magic, implementation technology-specific trick. It is the worst kind of leaky abstraction, as it is not even leaking explicitly. That “transparent change persistence” feature of JPA, which the post here is centered around, is the source of so many questions on Stack Overflow, so much frustration about JPA as “it is doing opaque things”. That is one reason we see more and more folks consider Spring Data JDBC: despite being definitely less powerful than full-blown JPA, it provides a far less complex aggregate and database interaction model.

    A final word on your HibernateRepository suggestion. It is not a repository really, as it forwards the underlying persistence technology’s API directly. If you really want that – which is just fine –, just use EntityManager or Session in your client code directly. Less indirection, less magic. Only use the abstractions you really need. Otherwise, you just end up implementing repositories on top of repositories, which is pointless.

    If client code needs to be aware of all the details of a persistence technology, the repository abstraction serves no purpose. If you think that – and it is completely fine to do so –, it is worthwhile to optimize for CPU cycles on that level of abstraction (some argue it is futile if you decided to work with an ORM technology like JPA in the first place), by all means go ahead, ditch the abstraction level and implement your optimization. I would just make sure that this kind of optimization actually has significant enough impact and whether the broken abstraction is not creating a much bigger cost. Furthermore, this kind of “optimization” should by no means be the default way developers think about architectural concepts like repositories in the first place.

    It is fine to think SpringData is too much abstraction. I just do not understand why – if you think that – you still want to force some even lower level type of access into that programming model and poke so many holes into the latter for it to become completely deprived of its original purpose.

    • I’ve been using Spring since 2004, so I do understand it very well. I know why the Spring Data abstraction was designed like that. It’s just that it doesn’t fit perfectly with JPA, hence this article.

      Nevertheless, a custom JPA repository, such as the one presented here, provides an extension that allows devs to use the current abstraction along with additional methods that can deliver better performance and help them understand how things work behind the scenes.

      Now, projects are not created equal. Some can do just fine with a standard approach while others require more consideration. That’s the whole purpose why you created the custom repositories too, to allow devs to provide extensions to the existing abstraction. That’s what the HibernateRepository does too.

      Do all Spring devs have to use this approach? Of course, not! But, when you do need it, it can surely help to provide finer-grained control over what happens behind the scenes.

  3. I support the opinion that save is wrong.

    But why are you using the Hibernate Session and the update method?

    There are also other JPA implementations than Hibernate.

    • But why are you using the Hibernate Session and the update method?

      Because there’s no replacement for the update method in JPA, and, for a batch processing task, update is more efficient than merge. Check out this article for more details.

      There are also other JPA implementations than Hibernate.

      I started my career in 2005, and I’ve only had to use Hibernate ever since. I also doubt I’ll ever use EclipseLink or OpenJPA. I specialized in Hibernate only because performance is all about implementation details, not standard abstractions. Also, the vast majority of Spring projects use Hibernate anyway, so the time investment in studying other JPA providers would not yield a good ROI.

  4. Thank you for the excellent article. Can you estimate when this should land in hibernate-types?

  5. Is this something that Spring Data JPA should do instead of the existing behaviour? Based on the article above, this is what I infer. Maybe this can be raised as an issue in the Spring Data JPA repo?

    • If you take a look at the PostRepository in my example, you’ll see it extends both the JpaRepository and my own custom abstraction.

      I think of this abstraction as a plugin that’s useful if you use Spring Data JPA with Hibernate. In the case when you have to use EclipseLink, you’d not need the HibernateRepository.

      For this reason, I don’t see how this could be provided by Spring Data JPA, which is meant to be abstract. I’ll release this abstraction in the Hibernate Types project, so no worries.
