How do persist and merge work in JPA

Introduction

In this article, I’m going to explain how the persist and merge entity operations work when using JPA and Hibernate.

When using JPA, entity state transitions are translated automatically to SQL statements. This post is going to explain when to use persist and when to use merge.

Persist

The persist operation must be used only for new entities. From the JPA perspective, an entity is new when it has never been associated with a database row, meaning that no table record matches the entity in question.

For instance, when executing the following test case:

Post post = new Post();
post.setTitle("High-Performance Java Persistence");

entityManager.persist(post);
LOGGER.info("The post entity identifier is {}", post.getId());

LOGGER.info("Flush Persistence Context");
entityManager.flush();

Hibernate is going to attach the Post entity to the currently running Persistence Context.
The INSERT SQL statement can either be executed directly or postponed until flush time.

IDENTITY

If the entity uses an IDENTITY generator:

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;

The INSERT is executed right away, and Hibernate generates the following output:

INSERT INTO post (id, title) 
VALUES (DEFAULT, 'High-Performance Java Persistence')

-- The Post entity identifier is 1

-- Flush Persistence Context

Whenever an entity is persisted, Hibernate must attach it to the currently running Persistence Context which acts as a Map of entities. The Map key is formed of the entity type (its Java Class) and the entity identifier.

For IDENTITY columns, the only way to know the identifier value is to execute the SQL INSERT. Hence, the INSERT is executed when the persist method is called and cannot be postponed until flush time.

For this reason, Hibernate disables JDBC batch inserts for entities using the IDENTITY generator strategy.
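
To make the Map analogy more concrete, here is a minimal sketch in plain Java. It is a toy model, not Hibernate's actual implementation, and the PersistenceContextSketch, EntityKey, attach, and find names are made up for illustration. It only shows why an entity cannot be attached before its identifier is known:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model only: Hibernate's real Persistence Context is far more elaborate.
class PersistenceContextSketch {

    // Hypothetical key type: the entity class plus the entity identifier.
    static final class EntityKey {
        final Class<?> type;
        final Object id;

        EntityKey(Class<?> type, Object id) {
            this.type = Objects.requireNonNull(type, "entity type is required");
            // A key cannot be built without an identifier value.
            this.id = Objects.requireNonNull(id, "identifier must be known");
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof EntityKey)) {
                return false;
            }
            EntityKey other = (EntityKey) o;
            return type.equals(other.type) && id.equals(other.id);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, id);
        }
    }

    // Simplified stand-in for the Post entity used in this article.
    static final class Post {
        Long id;
        String title;
    }

    private final Map<EntityKey, Object> managed = new HashMap<>();

    // Attaching an entity requires its identifier up front.
    void attach(Object entity, Object id) {
        managed.put(new EntityKey(entity.getClass(), id), entity);
    }

    // Looks up a managed entity by type and identifier; null if absent.
    Object find(Class<?> type, Object id) {
        return managed.get(new EntityKey(type, id));
    }

    public static void main(String[] args) {
        PersistenceContextSketch context = new PersistenceContextSketch();
        Post post = new Post();
        post.id = 1L; // with IDENTITY, this value only exists after the INSERT
        context.attach(post, post.id);
        System.out.println(context.find(Post.class, 1L) == post);
    }
}
```

Calling attach with a null identifier fails in this sketch, which mirrors the reason the IDENTITY INSERT has to run during persist.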

SEQUENCE

When using a SEQUENCE identifier strategy, and rerunning the same example, Hibernate generates the following output:

CALL NEXT VALUE FOR 'hibernate_sequence'

-- The post entity identifier is 1

-- Flush Persistence Context

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence', 1)

This time, the INSERT statement can be delayed until flush-time, and Hibernate can apply batch insert optimizations if you set the batch size configuration property.
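
The difference can be illustrated with a small simulation. The following is a toy model in plain Java, not Hibernate's real action queue, and the DeferredInsertSketch class with its persist and flush methods is hypothetical. It assumes the identifier comes from a counter standing in for a database sequence, so the INSERT can be queued and later sent in groups:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Hibernate's actual action queue.
class DeferredInsertSketch {

    private long sequence = 0; // stands in for a database sequence
    private final List<String> pending = new ArrayList<>();
    final List<List<String>> executedBatches = new ArrayList<>();

    // persist: obtain the identifier now, queue the INSERT for later.
    long persist(String title) {
        long id = ++sequence;
        pending.add(
            "INSERT INTO post (title, id) VALUES ('" + title + "', " + id + ")"
        );
        return id;
    }

    // flush: send the queued statements in groups of at most batchSize.
    void flush(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int i = 0; i < pending.size(); i += batchSize) {
            int end = Math.min(i + batchSize, pending.size());
            executedBatches.add(new ArrayList<>(pending.subList(i, end)));
        }
        pending.clear();
    }

    public static void main(String[] args) {
        DeferredInsertSketch session = new DeferredInsertSketch();
        for (int i = 1; i <= 5; i++) {
            session.persist("Post " + i);
        }
        System.out.println("Batches before flush: " + session.executedBatches.size());
        session.flush(2);
        System.out.println("Batches after flush: " + session.executedBatches.size());
    }
}
```

In a real application, the grouping is done by the JDBC driver and controlled by the hibernate.jdbc.batch_size setting rather than by a method argument.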

The TABLE strategy behaves like SEQUENCE, but you should avoid it at all costs because it uses a separate transaction to generate the entity identifier, therefore putting pressure on the underlying connection pool and the database transaction log.

Even worse, row-level locks are used to coordinate multiple concurrent requests, and, just like Amdahl’s Law tells us, introducing a serialized execution step can affect scalability.
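
To put a number on that remark, here is a small sketch of the Amdahl's Law formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can run in parallel and n is the number of workers. The 95% figure below is an assumption chosen purely for illustration, not a measurement of the TABLE generator:

```java
// Illustrative arithmetic only; the fractions used here are assumptions.
class AmdahlSketch {

    // speedup = 1 / ((1 - p) + p / n)
    static double speedup(double parallelFraction, int workers) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / workers);
    }

    public static void main(String[] args) {
        // Assume 5% of every request is serialized, e.g. waiting on a row-level lock.
        double parallelFraction = 0.95;
        for (int workers : new int[] {1, 4, 16, 64, 256}) {
            System.out.printf(
                "%d workers -> speedup %.2f%n",
                workers, speedup(parallelFraction, workers)
            );
        }
    }
}
```

With a 5% serialized portion, the speedup can never exceed 20, no matter how many workers are added.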

For more details about why you should avoid the TABLE strategy, check out this article.

Merge

Merging is required only for detached entities.

Assuming we have the following entity:

Post post = doInJPA(entityManager -> {
    Post _post = new Post();
    _post.setTitle("High-Performance Java Persistence");

    entityManager.persist(_post);
    return _post;
});

Because the EntityManager which persisted the Post entity has been closed, the Post becomes detached, and Hibernate can no longer track any changes. The detached entity can still be modified, and, to propagate these changes, the entity needs to be reattached to a new Persistence Context:

post.setTitle("High-Performance Java Persistence Rocks!");

doInJPA(entityManager -> {
    LOGGER.info("Merging the Post entity");
    Post post_ = entityManager.merge(post);
});

When running the test case above, Hibernate is going to execute the following statements:

-- Merging the Post entity

SELECT p.id AS id1_0_0_ ,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

UPDATE post 
SET title='High-Performance Java Persistence Rocks!' 
WHERE id=1

Hibernate generates a SELECT statement first to fetch the latest state of the underlying database record, and afterward, it copies the detached entity state onto the newly fetched managed entity. This way, the dirty checking mechanism can detect any state change and propagate it to the database.
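
The fetch-then-copy behavior can be sketched in plain Java. This is a toy model and not Hibernate's merge listener: the MergeSketch class, its table map standing in for the post table, and the abbreviated statement log are all assumptions made for illustration. Note that the instance returned by merge is the managed one, while the instance passed in stays detached:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model of merge; the real logic lives inside Hibernate.
class MergeSketch {

    // Simplified stand-in for the Post entity.
    static final class Post {
        final Long id;
        String title;

        Post(Long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    final Map<Long, String> table = new HashMap<>();   // stands in for the post table
    final Map<Long, Post> managed = new HashMap<>();   // the Persistence Context
    final List<String> statements = new ArrayList<>(); // log of simulated SQL

    Post merge(Post detached) {
        Post managedCopy = managed.get(detached.id);
        if (managedCopy == null) {
            // Not in the context yet, so the current row state has to be fetched.
            statements.add("SELECT ... FROM post WHERE id = " + detached.id);
            managedCopy = new Post(detached.id, table.get(detached.id));
            managed.put(detached.id, managedCopy);
        }
        // Copy the detached state onto the managed instance.
        managedCopy.title = detached.title;
        return managedCopy; // the argument itself stays detached
    }

    // Flush compares each managed entity with the table and writes the difference.
    void flush() {
        for (Post post : managed.values()) {
            if (!table.containsKey(post.id)) {
                statements.add("INSERT INTO post ... VALUES (..., " + post.id + ")");
            } else if (!Objects.equals(post.title, table.get(post.id))) {
                statements.add("UPDATE post SET title = ... WHERE id = " + post.id);
            } else {
                continue; // nothing changed, nothing to write
            }
            table.put(post.id, post.title);
        }
    }

    public static void main(String[] args) {
        MergeSketch session = new MergeSketch();
        session.table.put(1L, "High-Performance Java Persistence");

        Post detached = new Post(1L, "High-Performance Java Persistence Rocks!");
        Post merged = session.merge(detached);
        session.flush();

        System.out.println("Same instance: " + (merged == detached));
        session.statements.forEach(System.out::println);
    }
}
```

Any change made to the detached instance after the merge call is not tracked; only changes to the returned managed instance are picked up at flush time.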

For the IDENTITY and SEQUENCE generator strategies, you can, in practice, use merge to persist a new entity. For the assigned generator, however, this is less efficient.

Considering that the Post entity requires that identifiers are manually assigned:

@Id
private Long id;

When using merge instead of persist:

doInJPA(entityManager -> {
    Post post = new Post();
    post.setId(1L);
    post.setTitle("High-Performance Java Persistence");

    entityManager.merge(post);
});

Hibernate is going to issue a SELECT statement to make sure that there is no record in the database having the same identifier:

SELECT p.id AS id1_0_0_,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence', 1)

You can fix this issue by adding a version property to your entity, which is a good thing to do anyway since it also prevents lost updates in multi-request transactions:

@Version
private Long version; 

If you use the assigned identifier generator, it’s important to declare the @Version property using the Java Wrapper (e.g. java.lang.Long), which Hibernate can check for nullability, instead of a primitive (e.g. long).

The reason why I wanted to show you this example is that you might happen to use a save method like this one offered by Spring Data SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {

    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

The same rules apply to the Spring Data save method as well. If you ever use an assigned identifier generator, you have to remember to add a Java Wrapper @Version property. Otherwise, a redundant SELECT statement is going to be generated.
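
The reason the wrapper type matters can be shown with a deliberately simplified sketch. The SaveRouteSketch class and its isNew and save methods are hypothetical and only mimic the shape of the save method above. The sketch assumes that a null version is what marks an entity as new, and it ignores every other check a real implementation performs:

```java
// Simplified, hypothetical stand-in for an isNew-style decision.
class SaveRouteSketch {

    // Assumption: a null version means the entity was never persisted.
    static boolean isNew(Long version) {
        return version == null;
    }

    // Mirrors the shape of the save method: persist if new, merge otherwise.
    static String save(Long version) {
        return isNew(version) ? "persist" : "merge";
    }

    public static void main(String[] args) {
        // A wrapper field of a freshly constructed entity is null.
        Long wrapperVersion = null;
        // A primitive field of a freshly constructed entity defaults to 0.
        long primitiveVersion = 0L;

        System.out.println("wrapper version   -> " + save(wrapperVersion));
        // The primitive is boxed to 0L, so "never set" cannot be told apart from 0.
        System.out.println("primitive version -> " + save(primitiveVersion));
    }
}
```

With a primitive, the new entity is indistinguishable from one whose version is 0, so the merge path, and its extra SELECT, is taken.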

The redundant save anti-pattern

By now, it’s clear that new entities must go through persist, whereas detached entities must be reattached using merge. However, while reviewing lots of projects, I came to realize that the following anti-pattern is rather widespread:

@Transactional
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
    postRepository.save(post);
}

The save method serves no purpose. Even if we remove it, Hibernate will still issue the UPDATE statement since the entity is managed and any state change is propagated as long as the currently running EntityManager is open.
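
Why the UPDATE is issued without any explicit call can be illustrated with a toy dirty checking model. This is a plain Java sketch under simplifying assumptions, not Hibernate's actual mechanism, and the DirtyCheckingSketch class with its load and flush methods is made up for this example. The idea is that a snapshot is taken at load time and compared with the current state at flush time:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy dirty checking model; Hibernate's real mechanism is more involved.
class DirtyCheckingSketch {

    // Simplified stand-in for the Post entity.
    static final class Post {
        final Long id;
        String title;

        Post(Long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    private final Map<Long, Post> managed = new HashMap<>();
    private final Map<Long, String> loadedTitles = new HashMap<>(); // load-time snapshot
    final List<String> statements = new ArrayList<>();              // simulated SQL log

    // Simulates loading a row: the entity becomes managed and its state is snapshotted.
    Post load(Long id, String titleInDatabase) {
        Post post = new Post(id, titleInDatabase);
        managed.put(id, post);
        loadedTitles.put(id, titleInDatabase);
        return post;
    }

    // At flush time, every managed entity is compared against its snapshot.
    void flush() {
        for (Post post : managed.values()) {
            if (!Objects.equals(post.title, loadedTitles.get(post.id))) {
                statements.add(
                    "UPDATE post SET title='" + post.title + "' WHERE id=" + post.id
                );
                loadedTitles.put(post.id, post.title); // refresh the snapshot
            }
        }
    }

    public static void main(String[] args) {
        DirtyCheckingSketch session = new DirtyCheckingSketch();
        Post post = session.load(1L, "High-Performance Java Persistence");

        post.title = "High-Performance Java Persistence Rocks!";
        // No save, persist, or merge call is made here.
        session.flush();

        session.statements.forEach(System.out::println);
    }
}
```

The changed title is written at flush time purely because the entity is managed, which is why the extra save call adds nothing in the service method above.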

This is an anti-pattern because the save call fires a MergeEvent, which is handled by the DefaultMergeEventListener as follows:

protected void entityIsPersistent(MergeEvent event, Map copyCache) {
    LOG.trace( "Ignoring persistent instance" );

    final Object entity = event.getEntity();
    final EventSource source = event.getSession();
    final EntityPersister persister = source
        .getEntityPersister( event.getEntityName(), entity );

    ( (MergeContext) copyCache ).put( entity, entity, true );

    cascadeOnMerge( source, persister, entity, copyCache );
    copyValues( persister, entity, entity, source, copyCache );

    event.setResult( entity );
}

In the copyValues method call, the hydrated state is copied again, so a new array is redundantly created, therefore wasting CPU cycles. If the entity has child associations and the merge operation is also cascaded from parent to child entities, the overhead is even greater because each child entity will propagate a MergeEvent and the cycle continues.

Conclusion

While a save method might be convenient in some situations, in practice, you should never call merge for entities that are either new or already managed. As a rule of thumb, you shouldn’t be using save with JPA. For new entities, you should always use persist, while for detached entities you need to call merge. For managed entities, you don’t need any save method because Hibernate automatically synchronizes the entity state with the underlying database record.

16 Comments on “How do persist and merge work in JPA”

  1. We found a weird issue in our application because of using save(). We saved an entity object in a service method. The whole service class was made @Transactional.
    After the save, we were setting some new values on the entity to use it for another method call. These got “merged” into the DB entity. Until seeing your post, I had no idea why this was getting merged even after save. Your post is very helpful in explaining the impact of using save.

  2. It’s not wrong. There’s functionality tied to the invocation of the save(…) method on the repository, e.g. the publication of domain events accumulated in the aggregate. JPA being the odd one out here doesn’t change the fundamental principles in the way repositories work (the concept in general, not the Spring Data JPA specific implementation).

    • For Hibernate, event listeners could be registered to fire when a given entity is persisted or merged. I think that’s a much better approach to this problem if the application uses Hibernate.

      • My original reply was supposed to reply to this comment:

        https://vladmihalcea.com/jpa-persist-and-merge/#comment-75714

        While that’s technically correct from a JPA implementation point of view, it’s completely backwards from an architectural and conceptual point of view. The lack of the need to explicitly mark state changes as “done” in JPA is a JPA oddity. By no means should service layer code have to know about such an implementation detail.

        The save method serves no purpose.

        This statement is completely overreaching as you cannot even remotely judge whether that is the case. You can argue that calling methods on JPA API is superfluous. But from that, you cannot derive that a call on user provided API is. There could be other functionality tied to the invocation of the save(…) method (hint: there is). And just because some persistence technology decides it doesn’t need that call, it’s quite a stretch to derive from that, that the call to save(…) on a much higher abstraction level is not needed. Not calling save(…) is leaky abstraction by definition. If such a call is obsolete, and the reason for that being some intricacy of the persistence technology implementation at hand, it’s their task to detect superfluous calls and simply ignore them.

      • Well, calling merge on entities with no auto-generated id will issue an extra Select. Doing that in a batch process might have a performance impact (hundreds of entities being inserted) while the developer has little idea where the problem originated.

        Some CRUD operations are easy to abstract, like JDBC or document stores. Others, like JPA, are not. Hence, the problem.

      • I don’t disagree on the problem. I disagree on the consequence being to tell people not to call methods on higher level abstractions anymore and by that potentially creating code that is broken. You rightfully point at an optimization problem, but that needs to be solved in the implementation, not on the call side. It’s better to have less optimal code (some people argue, that you already made that decision when choosing JPA) than broken code.

      • There’s no broken code in what I suggest people to do. After all, calling persist or merge on the EntityManager is just as easy as calling save on a Repository.

      • We already decided to roll that back, just as you commented over there 🙃

  3. Hi Vlad,

    About the “The redundant save anti-pattern”: I agree the save operation has no effect in this case, but it seems what you describe as an anti-pattern is the recommended approach according to the official Spring Data docs. See the sample code in the “Transactionality” section: https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#transactions

    Maybe it is a good thing to always call “save” after making updates so the code will always work whatever the context is (transactional or not).

    What are your thoughts about this?

    • The save method makes sense for JDBC, Mongo, or other stores, but not for JPA. So, it depends.

      • This is the Spring Data JPA documentation, hence why I’m asking.

      • That example is wrong. Save is not necessary since the User is already managed. You should open a Jira issue for that so they fix it. That’s a fine example of the redundant-save Anti Pattern.
