How do persist and merge work in JPA

(Last Updated On: January 29, 2018)

Introduction

When using JPA, entity state transitions are translated automatically to SQL statements. This post is going to explain when to use persist and when to use merge.

Persist

The persist operation must be used only for new entities. From JPA perspective, an entity is new when it has never been associated with a database row, meaning that there is no table record in the database to match the entity in question.

For instance, when executing the following test case:

Post post = new Post();
post.setTitle("High-Performance Java Persistence");

entityManager.persist(post);
LOGGER.info("The post entity identifier is {}", post.getId());

LOGGER.info("Flush Persistence Context");
entityManager.flush();

Hibernate is going to attach the Post entity to the currently running Persistence Context.
The INSERT SQL statement can either be executed directly or postponed until flush time.

IDENTITY

If the entity uses an IDENTITY generator:

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;

The INSERT is executed the right away, and Hibernate generates the following output:

INSERT INTO post (id, title) 
VALUES (DEFAULT, 'High-Performance Java Persistence')

-- The Post entity identifier is 1

-- Flush Persistence Context

Whenever an entity is persisted, Hibernate must attach it to the currently running Persistence Context which acts as a Map of entities. The Map key is formed of the entity type (its class) and the entity identifier.

For IDENTITY columns, Hibernate cannot delay the INSERT statement until flush time because the identifier value can only be generated by executing the statement.
For this reason, Hibernate disables JDBC batch inserts for entities using the IDENTITY generator strategy.

SEQUENCE

When using a SEQUENCE identifier strategy, and rerunning the same example, Hibernate generates the following output:

CALL NEXT VALUE FOR 'hibernate_sequence'

-- The post entity identifier is 1

-- Flush Persistence Context

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence', 1)

This time, the INSERT statement can be delayed until flush-time, and Hibernate can apply batch insert optimizations if you set the batch size configuration property.

The TABLE strategy behaves like SEQUENCE, but you should avoid it at any cost because it uses a separate transaction to generate the entity identifier, therefore putting pressure on the underlying connection pool and the database transaction log.

Even worse, row-level locks are used to coordinate multiple concurrent requests, and, just like Amdhal’s Law tells us, introducing a serializability execution can affect scalability.

Merge

Merging is required only for detached entities.

Assuming we have the following entity:

Post post = doInJPA(entityManager -> {
    Post _post = new Post();
    _post.setTitle("High-Performance Java Persistence");

    entityManager.persist(_post);
    return _post;
});

Because the EntityManager which loaded the Post entity has been closed, the Post becomes detached, and Hibernate can no longer track any changes. The detached entity can be modified, and, to propagate these changes, the entity needs to be reattached to a new Persistence Context:

post.setTitle("High-Performance Java Persistence Rocks!");

doInJPA(entityManager -> {
    LOGGER.info("Merging the Post entity");
    Post post_ = entityManager.merge(post);
});

When running the test case above, Hibernate is going to execute the following statements:

-- Merging the Post entity

SELECT p.id AS id1_0_0_ ,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

UPDATE post 
SET title='High-Performance Java Persistence Rocks!' 
WHERE id=1

Hibernate generates a SELECT statement first to fetch the latest state of the underlying database record, and afterward, it copies the detached entity state onto the newly fetched managed entity. This way, the dirty checking mechanism can detect any state change and propagate it to the database.

While for IDENTITY and SEQUENCE generator strategies, you can practically use merge to persist an entity, for the assigned generator, this would be less efficient.

Considering that the Post entity requires that identifiers are manually assigned:

@Id
private Long id;

When using merge instead of persist:

doInJPA(entityManager -> {
    Post post = new Post();
    post.setId(1L);
    post.setTitle("High-Performance Java Persistence");

    entityManager.merge(post);
});

Hibernate is going to issue a SELECT statement to make sure that there is no record in the database having the same identifier:

SELECT p.id AS id1_0_0_,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence', 1)

You can actually fix this issue by adding a version property to your entity which is actually a good thing to do since you can also prevent lost updates in multi-request transactions:

@Version
private Long version; 

If you use the assigned identifier generator, it’s important to use the Java Wrapper (e.g. java.lang.Long) for which Hibernate can check for nullability, instead of a primitive (e.g. long) for the @Version property.

The reason why I wanted to show you this example is that you might happen to use a save method like this one offered by Spring Data SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {

    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

The same rules apply to the Spring Data save method as well. If you ever use an assigned identifier generator, you have to remember to add a Java Wrapper @Version property, otherwise, a redundant SELECT statement is going to be generated.

The redundant save anti-pattern

By now, it’s clear that new entities must go through persist, whereas detached entities must be reattached using merge. However, while reviewing lots of projects, I came to realize that the following anti-pattern is rather widespread:

@Transactional
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
    postRepository.save(post);
}

The save method serves no purpose. Even if we remove it, Hibernate will still issue the UPDATE statement since the entity is managed and any state change is propagated as long as the currently running EntityManager is open.

This is an anti-pattern because the save call fires a MergeEvent which is handled by the DefaultMergeEventListener which does the following operations:

protected void entityIsPersistent(MergeEvent event, Map copyCache) {
    LOG.trace( "Ignoring persistent instance" );

    final Object entity = event.getEntity();
    final EventSource source = event.getSession();
    final EntityPersister persister = source
        .getEntityPersister( event.getEntityName(), entity );

    ( (MergeContext) copyCache ).put( entity, entity, true );

    cascadeOnMerge( source, persister, entity, copyCache );
    copyValues( persister, entity, entity, source, copyCache );

    event.setResult( entity );
}

In the copyValues method call, the hydrated state is copied again, so a new array is redundantly created, therefore wasting CPU cycles. If the entity has child associations and the merge operation is also cascaded from parent to child entities, the overhead is even greater because each child entity will propagate a MergeEvent and the cycle continues.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While a save method might be convenient in some situations, in practice, you should never call merge for entities that are either new or already managed. As a rule of thumb, you shouldn’t be using save with JPA. For new entities, you should always use persist, while for detached entities you need to call merge. For managed entities, you don’t need any save method because Hibernate automatically synchronizes the entity state with the underlying database record.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

22 thoughts on “How do persist and merge work in JPA

  1. Hi, Vlad! Regarding the save antipattern: I use Spring Data CrudRepository, and it seems I actually have to use that antipattern because the saved entity does not get updated if I use setters afterward, I found that the underlying save method implementation has a @Transactional annotation on it, which I think closes persistent context immediately. For example:

    void doStuff(){
        Child child1 = new Child();
        childRepo.save(child1);
    
    Parent parent = new Parent(child1);
    parentRepo.save(parent);
    
    Child child2 = new Child();
    childRepo.save(child2);
    
    parent.setChild(child2);
    
    parentRepo.findOne(parent.getId()).getChild(); // returns child1
    
    } //parent still has child1 in the DB and the changes from the setter are lost
    

    I create a child entity, save it, then create a parent entity using the child, save the parent and then I create another child entity, save it and call setter on the saved parent entity. Then I call parentEntity.findOne(parent.getId()) and it returns the parent but with the first child, not the one I set using the setter. And when the method ends, changes are lost.

    1. You need the doStuff Service method to be annotated with @Transactional. You want all entities to be saved in the same Persistence Context and database transaction. More, you can use cascading from Parent to Child which makes it redundant to call save on the ChildRepisitory.

      1. Thank you! I figured out how to do that. In particular, the method that I annotate with @Transactional must be inside a @Service class, also I need to annotate @Configuration class with @EnableTransactionManagement, and I may need to use @org.springframework.transaction.annotation.Transactional instead of @javax.transaction.Transactional. Now when method body goes out of scope, the transaction gets committed and it’s all good

Leave a Reply

Your email address will not be published. Required fields are marked *