Posted on June 7, 2022 by vladmihalcea

The best Spring Data JpaRepository

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, I’m going to show you the best way to use the Spring Data JpaRepository, which, most often, is used the wrong way.

The biggest issue with the default Spring Data JpaRepository is the fact that it extends the generic CrudRepository, which is not really compatible with the JPA specification.

The JpaRepository save method paradox

There’s no such thing as a save method in JPA because JPA implements the ORM paradigm, not the Active Record pattern.

JPA is basically an entity state machine, as illustrated by the following diagram:

JPA Entity State Machine

As you can clearly see, there’s no save method in JPA.

Now, Hibernate was created before JPA, hence besides implementing the JPA specification, it also provides its own specific methods, such as the update one.

While there are two methods called save and saveOrUpdate in the Hibernate Session, as I explained in this article, they are just an alias for update.

In fact, starting with Hibernate 6, the save and saveOrUpdate methods are now deprecated and will be removed in a future version as they are just a mistake that got carried away from Hibernate 1.

If you create a new entity, you have to call persist so that the entity becomes managed, and the flush will generate the INSERT statement.

If the entity becomes detached and you changed it, you have to propagate the changes back to the database, in which case you can use either merge or update. The former method, merge, copies the detached entity state onto a new entity that has been loaded by the current Persistence Context and lets the flush figure out whether an UPDATE is even necessary. The latter method, update, forces the flush to trigger an UPDATE with the current entity state.

The remove method schedules the removal, and the flush will trigger the DELETE statement.

But, the JpaRepository inherits a save method from the CrudRepository, just like MongoRepository or SimpleJdbcRepository.

However, the MongoRepository and SimpleJdbcRepository take the Active Record approach, while JPA does not.

In fact, the save method of the JpaRepository is implemented like this:

@Transactional
public <S extends T> S save(S entity) {
    if (this.entityInformation.isNew(entity)) {
        this.em.persist(entity);
        return entity;
    } else {
        return this.em.merge(entity);
    }
}

There’s no magic behind the scenes. It’s just either a call to persist or merge in reality.

The save method anti-pattern

Because the JpaRepository features a save method, the vast majority of software developers treat it as such, and you end up bumping into the following anti-pattern:

@Transactional
public void saveAntiPattern(Long postId, String postTitle) {
        
    Post post = postRepository.findById(postId).orElseThrow();

    post.setTitle(postTitle);

    postRepository.save(post);
}

How familiar is that? How many times did you see this “pattern” being employed?

The problem is the save line, which, while unnecessary, it’s not cost-free. Calling merge on a managed entity burns CPU cycles by triggering a MergeEvent, which can be cascaded further down the entity hierarchy only to end up in a code block that does this:

protected void entityIsPersistent(MergeEvent event, Map copyCache) {
    LOG.trace( "Ignoring persistent instance" );

    final Object entity = event.getEntity();
    final EventSource source = event.getSession();
    
    final EntityPersister persister = source.getEntityPersister( 
        event.getEntityName(), 
        entity 
    );

    //before cascade!
    ( (MergeContext) copyCache ).put( entity, entity, true );  
    
    cascadeOnMerge( source, persister, entity, copyCache );
    copyValues( persister, entity, entity, source, copyCache );

    event.setResult( entity );
}

Not only that the merge call doesn’t provide anything beneficial, but it actually adds extra overhead to your response time and makes the cloud provider wealthier with every such call.

And that’s not all. As I explained in this article, the generic save method is not always able to determine whether an entity is new. For instance, if the entity has an assigned identifier, Spring Data JPA will call merge instead of persist, therefore triggering a useless SELECT query. If this happens in the context of a batch processing task, then it’s even worse, you can generate lots of such useless SELECT queries.

So, don’t do that! You can do way better.

The best Spring Data JpaRepository alternative

If the save method is there, people will misuse it. That’s why it’s best not to have it at all and provide the developer with better JPA-friendly alternatives.

The following solution uses the custom Spring Data JPA Repository idiom.

So, we start with the custom HibernateRepository interface that defines the new contract for propagating entity state changes:

public interface HibernateRepository<T> {

    //The findAll method will trigger an UnsupportedOperationException

    @Deprecated
    List<T> findAll();

    //Save methods will trigger an UnsupportedOperationException
    
    @Deprecated
    <S extends T> S save(S entity);

    @Deprecated
    <S extends T> List<S> saveAll(Iterable<S> entities);

    @Deprecated
    <S extends T> S saveAndFlush(S entity);

    @Deprecated
    <S extends T> List<S> saveAllAndFlush(Iterable<S> entities);

    //Persist methods are meant to save newly created entities

    <S extends T> S persist(S entity);

    <S extends T> S persistAndFlush(S entity);

    <S extends T> List<S> persistAll(Iterable<S> entities);

    <S extends T> List<S> persistAllAndFlush(Iterable<S> entities);

    //Merge methods are meant to propagate detached entity state changes
    //if they are really needed
    
    <S extends T> S merge(S entity);

    <S extends T> S mergeAndFlush(S entity);

    <S extends T> List<S> mergeAll(Iterable<S> entities);

    <S extends T> List<S> mergeAllAndFlush(Iterable<S> entities);

    //Update methods are meant to force the detached entity state changes

    <S extends T> S update(S entity);

    <S extends T> S updateAndFlush(S entity);

    <S extends T> List<S> updateAll(Iterable<S> entities);

    <S extends T> List<S> updateAllAndFlush(Iterable<S> entities);

}

The methods in the HibernateRepository interface are implemented by the HibernateRepositoryImpl class, as follows:

public class HibernateRepositoryImpl<T> implements HibernateRepository<T> {

    @PersistenceContext
    private EntityManager entityManager;

    public List<T> findAll() {
        throw new UnsupportedOperationException("Fetching all records from a given database table is a terrible idea!");
    }

    public <S extends T> S save(S entity) {
        return unsupportedSave();
    }

    public <S extends T> List<S> saveAll(Iterable<S> entities) {
        return unsupportedSave();
    }

    public <S extends T> S saveAndFlush(S entity) {
        return unsupportedSave();
    }

    public <S extends T> List<S> saveAllAndFlush(Iterable<S> entities) {
        return unsupportedSave();
    }

    public <S extends T> S persist(S entity) {
        entityManager.persist(entity);
        return entity;
    }

    public <S extends T> S persistAndFlush(S entity) {
        persist(entity);
        entityManager.flush();
        return entity;
    }

    public <S extends T> List<S> persistAll(Iterable<S> entities) {
        List<S> result = new ArrayList<>();
        for(S entity : entities) {
            result.add(persist(entity));
        }
        return result;
    }

    public <S extends T> List<S> persistAllAndFlush(Iterable<S> entities) {
        return executeBatch(() -> {
            List<S> result = new ArrayList<>();
            for(S entity : entities) {
                result.add(persist(entity));
            }
            entityManager.flush();
            return result;
        });
    }

    public <S extends T> S merge(S entity) {
        return entityManager.merge(entity);
    }

    public <S extends T> S mergeAndFlush(S entity) {
        S result = merge(entity);
        entityManager.flush();
        return result;
    }

    public <S extends T> List<S> mergeAll(Iterable<S> entities) {
        List<S> result = new ArrayList<>();
        for(S entity : entities) {
            result.add(merge(entity));
        }
        return result;
    }

    public <S extends T> List<S> mergeAllAndFlush(Iterable<S> entities) {
        return executeBatch(() -> {
            List<S> result = new ArrayList<>();
            for(S entity : entities) {
                result.add(merge(entity));
            }
            entityManager.flush();
            return result;
        });
    }

    public <S extends T> S update(S entity) {
        session().update(entity);
        return entity;
    }

    public <S extends T> S updateAndFlush(S entity) {
        update(entity);
        entityManager.flush();
        return entity;
    }

    public <S extends T> List<S> updateAll(Iterable<S> entities) {
        List<S> result = new ArrayList<>();
        for(S entity : entities) {
            result.add(update(entity));
        }
        return result;
    }

    public <S extends T> List<S> updateAllAndFlush(Iterable<S> entities) {
        return executeBatch(() -> {
            List<S> result = new ArrayList<>();
            for(S entity : entities) {
                result.add(update(entity));
            }
            entityManager.flush();
            return result;
        });
    }

    protected Integer getBatchSize(Session session) {
        SessionFactoryImplementor sessionFactory = session
            .getSessionFactory()
            .unwrap(SessionFactoryImplementor.class);
            
        final JdbcServices jdbcServices = sessionFactory
            .getServiceRegistry()
            .getService(JdbcServices.class);
            
        if(!jdbcServices.getExtractedMetaDataSupport().supportsBatchUpdates()) {
            return Integer.MIN_VALUE;
        }
        return session
            .unwrap(AbstractSharedSessionContract.class)
            .getConfiguredJdbcBatchSize();
    }

    protected <R> R executeBatch(Supplier<R> callback) {
        Session session = session();
        Integer jdbcBatchSize = getBatchSize(session);
        Integer originalSessionBatchSize = session.getJdbcBatchSize();
        try {
            if (jdbcBatchSize == null) {
                session.setJdbcBatchSize(10);
            }
            return callback.get();
        } finally {
            session.setJdbcBatchSize(originalSessionBatchSize);
        }
    }

    protected Session session() {
        return entityManager.unwrap(Session.class);
    }

    protected <S extends T> S unsupportedSave() {
        throw new UnsupportedOperationException(
            "There's no such thing as a save method in JPA, so don't use this hack!"
        );
    }
}

First, all the save methods trigger an UnsupportedOperationException, forcing you to evaluate which entity state transition you are actually supposed to call instead.

Unlike the dummy saveAllAndFlush, the persistAllAndFlush, mergeAllAndFlush, and updateAllAndFlush can benefit from the automatic batching mechanism even if you forgot to configure it previously, as explained in this article.

Testing time

To use the HibernateRepository, all you have to do is extend it beside the standard JpaRepository, like this:

@Repository
public interface PostRepository 
    extends HibernateRepository<Post>, JpaRepository<Post, Long> {

}

Notice that we extend the HibernateRepository first and then the JpaRepository. The order is important as, this way, the finaAll method will be deprecated by your IDE.

That’s it!

This time, there’s no way you can ever bump into the infamous save call anti-pattern:

try {
    transactionTemplate.execute(
            (TransactionCallback<Void>) transactionStatus -> {
        postRepository.save(
            new Post()
                .setId(1L)
                .setTitle("High-Performance Java Persistence")
                .setSlug("high-performance-java-persistence")
        );
        
        return null;
    });

    fail("Should throw UnsupportedOperationException!");
} catch (UnsupportedOperationException expected) {
    LOGGER.warn("You shouldn't call the JpaRepository save method!");
}

Instead, you can use the persist, merge, or update method. So, if I want to persist some new entities, I can do it like this:

postRepository.persist(
    new Post()
        .setId(1L)
        .setTitle("High-Performance Java Persistence")
        .setSlug("high-performance-java-persistence")
);

postRepository.persistAndFlush(
    new Post()
        .setId(2L)
        .setTitle("Hypersistence Optimizer")
        .setSlug("hypersistence-optimizer")
);

postRepository.persistAllAndFlush(
    LongStream.range(3, 1000)
        .mapToObj(i -> new Post()
            .setId(i)
            .setTitle(String.format("Post %d", i))
            .setSlug(String.format("post-%d", i))
        )
        .collect(Collectors.toList())
);

And, pushing the changes from some detached entities back to the database is done as follows:

List<Post> posts = transactionTemplate.execute(transactionStatus ->
    entityManager.createQuery("""
        select p
        from Post p
        where p.id < 10
        """, Post.class)
    .getResultList()
);

posts.forEach(post -> 
    post.setTitle(post.getTitle() + " rocks!")
);

transactionTemplate.execute(transactionStatus ->
    postRepository.updateAll(posts)
);

And, unlike merge, update allows us to avoid some unnecessary SELECT statements, and there’s just a single UPDATE being executed:

Query:["
update 
  post 
set 
  slug=?, 
  title=? 
where 
  id=?"
], 
Params:[
  (high-performance-java-persistence, High-Performance Java Persistence rocks!, 1), 
  (hypersistence-optimizer, Hypersistence Optimizer rocks!, 2), 
  (post-3, Post 3 rocks!, 3), 
  (post-4, Post 4 rocks!, 4), 
  (post-5, Post 5 rocks!, 5), 
  (post-6, Post 6 rocks!, 6), 
  (post-7, Post 7 rocks!, 7), 
  (post-8, Post 8 rocks!, 8), 
  (post-9, Post 9 rocks!, 9)
]

Awesome right?

The BaseJpaRepository alternative

As I explained in this article, a more straightforward approach is to replace the JpaRepository with the BaseJpaRepository from the Hypersistence Utils project.

The BaseJpaRepository is available on Maven Central, so the first thing we need to do is add the Hypersistence Utils dependency.

For instance, if you are using Maven, then you need to add the following dependency to your project pom.xml configuration file.

For Hibernate 6:

<dependency>
    <groupId>io.hypersistence</groupId>
    <artifactId>hypersistence-utils-hibernate-60</artifactId>
    <version>${hypersistence-utils.version}</version>
</dependency>

For Hibernate 5.5 and 5.6:

<dependency>
    <groupId>io.hypersistence</groupId>
    <artifactId>hypersistence-utils-hibernate-55</artifactId>
    <version>${hypersistence-utils.version}</version>
</dependency>

And, for Hibernate 5.4 and 5.3, and 5.2:

<dependency>
    <groupId>io.hypersistence</groupId>
    <artifactId>hypersistence-utils-hibernate-52</artifactId>
    <version>${hypersistence-utils.version}</version>
</dependency>

Afterward, you need to include the BaseJpaRepositoryImpl in the @EnableJpaRepositories configuration like this:

@EnableJpaRepositories(
    value = "your.repository.package",
    repositoryBaseClass = BaseJpaRepositoryImpl.class
)
public class JpaConfiguration {
    ...
}

And that’s it!

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

JPA has no such thing as a save method. It’s just a hack that had to be implemented in the JpaRepository because the method is inherited from the CrudRepository, which is a base interface shared by almost Spring Data projects.

Using the HibernateRepository, not only that you can better reason which method you need to call, but you can also benefit from the update method that provides better performance for batch processing tasks.

Follow @vlad_mihalcea