How do persist and merge work in JPA

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, I’m going to explain how the persist and merge entity operations work when using JPA and Hibernate.

When using JPA, entity state transitions are translated automatically to SQL statements. This post is going to explain when to use persist and when to use merge.

Persist

The persist operation must be used only for new entities. From JPA perspective, an entity is new when it has never been associated with a database row, meaning that there is no table record in the database to match the entity in question.

For instance, when executing the following test case:

Post post = new Post();
post.setTitle("High-Performance Java Persistence");

entityManager.persist(post);
LOGGER.info("The post entity identifier is {}", post.getId());

LOGGER.info("Flush Persistence Context");
entityManager.flush();

Hibernate is going to attach the Post entity to the currently running Persistence Context.
The INSERT SQL statement can either be executed directly or postponed until flush time.

IDENTITY

If the entity uses an IDENTITY generator:

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;

The INSERT is executed the right away, and Hibernate generates the following output:

INSERT INTO post (id, title) 
VALUES (DEFAULT, 'High-Performance Java Persistence')

-- The Post entity identifier is 1

-- Flush Persistence Context

Whenever an entity is persisted, Hibernate must attach it to the currently running Persistence Context which acts as a Map of entities. The Map key is formed of the entity type (its Java Class) and the entity identifier.

For IDENTITY columns, the only way to know the identifier value is to execute the SQL INSERT. Hence, the INSERT is executed when the persist method is called and cannot be disabled until flush time.

For this reason, Hibernate disables JDBC batch inserts for entities using the IDENTITY generator strategy.

SEQUENCE

When using a SEQUENCE identifier strategy, and rerunning the same example, Hibernate generates the following output:

CALL NEXT VALUE FOR 'hibernate_sequence'

-- The post entity identifier is 1

-- Flush Persistence Context

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence', 1)

This time, the INSERT statement can be delayed until flush-time, and Hibernate can apply batch insert optimizations if you set the batch size configuration property.

The TABLE strategy behaves like SEQUENCE, but you should avoid it at any cost because it uses a separate transaction to generate the entity identifier, therefore putting pressure on the underlying connection pool and the database transaction log.

Even worse, row-level locks are used to coordinate multiple concurrent requests, and, just like Amdhal’s Law tells us, introducing a Serializability execution can affect scalability.

For more details about why you should avoid the TABLE strategy, check out this article.

Merge

Merging is required only for detached entities.

Assuming we have the following entity:

Post post = doInJPA(entityManager -> {
    Post _post = new Post();
    _post.setTitle("High-Performance Java Persistence");

    entityManager.persist(_post);
    return _post;
});

Because the EntityManager which loaded the Post entity has been closed, the Post becomes detached, and Hibernate can no longer track any changes. The detached entity can be modified, and, to propagate these changes, the entity needs to be reattached to a new Persistence Context:

post.setTitle("High-Performance Java Persistence Rocks!");

doInJPA(entityManager -> {
    LOGGER.info("Merging the Post entity");
    Post post_ = entityManager.merge(post);
});

When running the test case above, Hibernate is going to execute the following statements:

-- Merging the Post entity

SELECT p.id AS id1_0_0_ ,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

UPDATE post 
SET title='High-Performance Java Persistence Rocks!' 
WHERE id=1

Hibernate generates a SELECT statement first to fetch the latest state of the underlying database record, and afterward, it copies the detached entity state onto the newly fetched managed entity. This way, the dirty checking mechanism can detect any state change and propagate it to the database.

While for IDENTITY and SEQUENCE generator strategies, you can practically use merge to persist an entity, for the assigned generator, this would be less efficient.

Considering that the Post entity requires that identifiers are manually assigned:

@Id
private Long id;

When using merge instead of persist:

doInJPA(entityManager -> {
    Post post = new Post();
    post.setId(1L);
    post.setTitle("High-Performance Java Persistence");

    entityManager.merge(post);
});

Hibernate is going to issue a SELECT statement to make sure that there is no record in the database having the same identifier:

SELECT p.id AS id1_0_0_,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence', 1)

You can actually fix this issue by adding a version property to your entity which is actually a good thing to do since you can also prevent lost updates in multi-request transactions:

@Version
private Long version; 

If you use the assigned identifier generator, it’s important to use the Java Wrapper (e.g. java.lang.Long) for which Hibernate can check for nullability, instead of a primitive (e.g. long) for the @Version property.

The reason why I wanted to show you this example is that you might happen to use a save method like this one offered by Spring Data SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {

    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

The same rules apply to the Spring Data save method as well. If you ever use an assigned identifier generator, you have to remember to add a Java Wrapper @Version property, otherwise, a redundant SELECT statement is going to be generated.

The redundant save anti-pattern

By now, it’s clear that new entities must go through persist, whereas detached entities must be reattached using merge. However, while reviewing lots of projects, I came to realize that the following anti-pattern is rather widespread:

@Transactional
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
    postRepository.save(post);
}

The save method serves no purpose. Even if we remove it, Hibernate will still issue the UPDATE statement since the entity is managed and any state change is propagated as long as the currently running EntityManager is open.

This is an anti-pattern because the save call fires a MergeEvent which is handled by the DefaultMergeEventListener which does the following operations:

protected void entityIsPersistent(MergeEvent event, Map copyCache) {
    LOG.trace( "Ignoring persistent instance" );

    final Object entity = event.getEntity();
    final EventSource source = event.getSession();
    final EntityPersister persister = source
        .getEntityPersister( event.getEntityName(), entity );

    ( (MergeContext) copyCache ).put( entity, entity, true );

    cascadeOnMerge( source, persister, entity, copyCache );
    copyValues( persister, entity, entity, source, copyCache );

    event.setResult( entity );
}

In the copyValues method call, the hydrated state is copied again, so a new array is redundantly created, therefore wasting CPU cycles. If the entity has child associations and the merge operation is also cascaded from parent to child entities, the overhead is even greater because each child entity will propagate a MergeEvent and the cycle continues.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While a save method might be convenient in some situations, in practice, you should never call merge for entities that are either new or already managed. As a rule of thumb, you shouldn’t be using save with JPA. For new entities, you should always use persist, while for detached entities you need to call merge. For managed entities, you don’t need any save method because Hibernate automatically synchronizes the entity state with the underlying database record.

Transactions and Concurrency Control eBook

3 Comments on “How do persist and merge work in JPA

  1. This is a really good and informative article. Thank you.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.