How do persist and merge work in JPA

Introduction

When using JPA, entity state transitions are translated automatically to SQL statements. This post is going to explain when to use persist and when to use merge.

Persist

The persist operation must be used only for new entities. From a JPA perspective, an entity is new when it has never been associated with a database row, meaning that there is no table record in the database matching the entity in question.
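The examples that follow assume a simple Post entity, mapped roughly like this (a sketch only; the identifier generation strategy changes from one section to the next):

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    // getters and setters omitted for brevity
}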

For instance, when executing the following test case:

Post post = new Post();
post.setTitle("High-Performance Java Persistence");

entityManager.persist(post);
LOGGER.info("The post entity identifier is {}", post.getId());

LOGGER.info("Flush Persistence Context");
entityManager.flush();

Hibernate is going to attach the Post entity to the currently running Persistence Context.
The INSERT SQL statement can either be executed directly or postponed until flush time.

IDENTITY

If the entity uses an IDENTITY generator:

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;

The INSERT is executed right away, and Hibernate generates the following output:

INSERT INTO post (id, title) 
VALUES (DEFAULT, 'High-Performance Java Persistence')

-- The Post entity identifier is 1

-- Flush Persistence Context

Whenever an entity is persisted, Hibernate must attach it to the currently running Persistence Context, which acts as a Map of entities. The Map key consists of the entity type (its class) and the entity identifier.
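Conceptually, and only as an illustration (this is not Hibernate's actual internal API), the first-level cache behaves somewhat like this:

// purely illustrative; EntityKey here is a hypothetical pairing of
// the entity class and its identifier, not Hibernate's internal type
record EntityKey(Class<?> entityType, Object identifier) {}

Map<EntityKey, Object> persistenceContext = new HashMap<>();
persistenceContext.put(new EntityKey(Post.class, post.getId()), post);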

For IDENTITY columns, Hibernate cannot delay the INSERT statement until flush time because the identifier value can only be generated by executing the statement.
For this reason, Hibernate disables JDBC batch inserts for entities using the IDENTITY generator strategy.

SEQUENCE
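For reference, the identifier mapping would look along these lines (a minimal sketch; with no explicit generator configured, Hibernate falls back to its default hibernate_sequence, which is consistent with the output below):

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private Long id;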

When using a SEQUENCE identifier strategy, and rerunning the same example, Hibernate generates the following output:

CALL NEXT VALUE FOR 'hibernate_sequence'

-- The post entity identifier is 1

-- Flush Persistence Context

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence', 1)

This time, the INSERT statement can be delayed until flush-time, and Hibernate can apply batch insert optimizations if you set the batch size configuration property.
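The setting in question is the hibernate.jdbc.batch_size Hibernate property. A minimal sketch of configuring it programmatically (the persistence unit name is just a placeholder, and the batch size value is arbitrary):

Map<String, Object> properties = new HashMap<>();
properties.put("hibernate.jdbc.batch_size", "5");

// "pu-name" stands in for your actual persistence unit name
EntityManagerFactory entityManagerFactory = Persistence
    .createEntityManagerFactory("pu-name", properties);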

The TABLE strategy behaves like SEQUENCE, but you should avoid it at all costs because it uses a separate transaction to generate the entity identifier, therefore putting pressure on the underlying connection pool and the database transaction log.

Even worse, row-level locks are used to coordinate multiple concurrent requests, and, just like Amdahl’s Law tells us, introducing a serial portion of execution can affect scalability.

Merge

Merging is required only for detached entities.

Assuming we have the following entity:

Post post = doInJPA(entityManager -> {
    Post _post = new Post();
    _post.setTitle("High-Performance Java Persistence");

    entityManager.persist(_post);
    return _post;
});

Because the EntityManager which persisted the Post entity has been closed, the Post becomes detached, and Hibernate can no longer track any changes. The detached entity can still be modified, and, to propagate these changes, the entity needs to be reattached to a new Persistence Context:

post.setTitle("High-Performance Java Persistence Rocks!");

doInJPA(entityManager -> {
    LOGGER.info("Merging the Post entity");
    Post post_ = entityManager.merge(post);
});

When running the test case above, Hibernate is going to execute the following statements:

-- Merging the Post entity

SELECT p.id AS id1_0_0_ ,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

UPDATE post 
SET title='High-Performance Java Persistence Rocks!' 
WHERE id=1

Hibernate generates a SELECT statement first to fetch the latest state of the underlying database record, and afterward, it copies the detached entity state onto the newly fetched managed entity. This way, the dirty checking mechanism can detect any state change and propagate it to the database.
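Note that merge does not reattach the instance you pass in; it returns the managed copy, so any further changes should be applied to the returned reference. A quick illustration (the new title value is just made up for this example):

doInJPA(entityManager -> {
    Post managedPost = entityManager.merge(post);

    // the detached instance passed to merge stays detached;
    // the returned managed copy is a different object
    assert managedPost != post;

    // subsequent changes must be applied to the managed copy
    managedPost.setTitle("High-Performance Java Persistence Rocks Even More!");
});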

While for the IDENTITY and SEQUENCE generator strategies you can practically use merge to persist an entity, for the assigned generator this is less efficient.

Considering that the Post entity requires that identifiers are manually assigned:

@Id
private Long id;

When using merge instead of persist:

doInJPA(entityManager -> {
    Post post = new Post();
    post.setId(1L);
    post.setTitle("High-Performance Java Persistence");

    entityManager.merge(post);
});

Hibernate is going to issue a SELECT statement to make sure that there is no record in the database having the same identifier:

SELECT p.id AS id1_0_0_ ,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence', 1)

You can fix this issue by adding a version property to your entity, which is a good idea anyway since it also prevents lost updates in multi-request transactions:

@Version
private Long version; 

For the @Version property, it’s important to use a Java wrapper type (e.g. java.lang.Long), whose nullability Hibernate can check, instead of a primitive (e.g. long).
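Put together, an entity using an assigned identifier and a wrapper @Version property might look roughly like this (a sketch; getters and setters omitted):

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    @Version
    private Long version;

    // getters and setters omitted for brevity
}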

The reason why I wanted to show you this example is that you might happen to use a save method like this one offered by Spring Data SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {

    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

The same rules apply to the Spring Data save method. If you ever use an assigned identifier generator, remember to add a Java wrapper @Version property; otherwise, a redundant SELECT statement is going to be issued.
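Assuming postRepository is a Spring Data JpaRepository<Post, Long> for the versioned entity sketched above, saving a new instance should then go through persist (an illustration, not a verbatim test case):

Post post = new Post();
post.setId(1L);
post.setTitle("High-Performance Java Persistence");

// version is null, so entityInformation.isNew(entity) returns true
// and save delegates to persist, skipping the extra SELECT
postRepository.save(post);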

The redundant save anti-pattern

By now, it’s clear that new entities must go through persist, whereas detached entities must be reattached using merge. However, while reviewing lots of projects, I came to realize that the following anti-pattern is rather widespread:

@Transactional
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
    postRepository.save(post);
}

The save method serves no purpose. Even if we remove it, Hibernate will still issue the UPDATE statement since the entity is managed and any state change is propagated as long as the currently running EntityManager is open.
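The same method, minus the redundant save call, relies on dirty checking alone:

@Transactional
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);

    // the entity is managed, so the title change is detected by dirty checking
    // and flushed as an UPDATE when the transaction completes
    post.setTitle(title);
}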

This is an anti-pattern because the save call fires a MergeEvent, which is handled by the DefaultMergeEventListener, which performs the following operations:

protected void entityIsPersistent(MergeEvent event, Map copyCache) {
    LOG.trace( "Ignoring persistent instance" );

    final Object entity = event.getEntity();
    final EventSource source = event.getSession();
    final EntityPersister persister = source
        .getEntityPersister( event.getEntityName(), entity );

    ( (MergeContext) copyCache ).put( entity, entity, true );

    cascadeOnMerge( source, persister, entity, copyCache );
    copyValues( persister, entity, entity, source, copyCache );

    event.setResult( entity );
}

In the copyValues method call, the hydrated state is copied again, so a new array is redundantly created, thereby wasting CPU cycles. If the entity has child associations and the merge operation is also cascaded from parent to child entities, the overhead is even greater because each child entity will propagate a MergeEvent and the cycle continues.

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

While a save method might be convenient in some situations, in practice, you should never call merge for entities that are either new or already managed. As a rule of thumb, you shouldn’t be using save with JPA. For new entities, you should always use persist, while for detached entities you need to call merge. For managed entities, you don’t need any save method because Hibernate automatically synchronizes the entity state with the underlying database record.

If you liked this article, you might want to subscribe to my newsletter too.


14 thoughts on “How do persist and merge work in JPA”

  1. Great article, Vlad!

    I didn’t know about this performance issue when using merge with a managed entity. I assumed Hibernate was smart enough to know whether it needed to fire events or not.

    Well, without merge Hibernate doesn’t fire any event?

      1. Yeah, I know this! The problem is I thought Hibernate fired MergeEvent when the state of an entity changed.

        So can’t we consider state-changing as a “simple” merge operation?

      2. State-changing is not related to merge. The dirty checking mechanism is what propagates state changes. Merge is just for reattaching an entity that was fetched in some other Persistence Context.

      3. Hmm… this subtle difference makes it hard to understand. But you’re right, I already had problems with state changes not firing a MergeEvent for my child associations.

        So I just have to use merge if my intent is to reattach an entity to a Persistence Context or if I want to fire MergeEvents?

      4. You should be merging an entity along with its child entities only if the entity tree got detached. Otherwise, if all entities in that tree are managed, Hibernate will simply trigger the update for you.

        I guess you had problems when you added new child entities to a managed graph, and you probably wanted to call merge on the root entity.
        In this case, it’s much simpler if you just call persist on the newly created subentities and associate them with the graph. Cascading makes more sense when you really want to persist/remove/merge a whole graph, not just a single entry.

  2. Hey, Vlad. A great article again (as expected 😉). I am kind of wondering if we can do something about the anti-pattern you describe. TBCH, I don’t think it’s an anti-pattern at all, mostly because a repository is supposed to abstract the persistence technology, and removing the explicit save(…) call basically means the service assumes an automatic write at the end of the UOW, which is JPA specific. Also, I’d argue it’s not very understandable code: you load an entity, change it, and then what? You have to understand the mechanics of two layers below to actually make sense of that. The service layer code shouldn’t have to have that knowledge.

    That said, I wonder whether we can be smarter in the implementation of save(…). In the scenario you described the entity is still attached to the current persistence context, which means we should be allowed to skip the call to merge(…) in that case, right? We could check EntityManager.contains(…) before the call to merge (actually, we already do something similar in the implementation of delete(…)).

    That way, the client would still issue the call to save (preserving the descriptiveness of the client code), but we’d actually sort of drop it as we know the persistence provider is going to write the changes eventually.

    Do you think that makes sense?

    1. Hi, Oliver. Thanks for posting your feedback on this topic.

      From a Spring Data perspective, the save method is an abstraction that needs to be accommodated for any persistence framework (JPA, Mongo, Redis, etc.), so checking the entity to see if it’s contained in the currently running Persistence Context and skipping the merge call is actually a good idea.

      My point was to familiarize the reader with the inner workings of JPA, because I have the feeling that the entity state management paradigm is not always well understood. Unfortunately, I’ve seen save methods in custom-made Generic DAOs where no such optimization was made, unlike in Spring Data. In a way, the smarter the save method gets, the less a Spring Data user will have to think about how stuff works in JPA.

      But then, even if we use an abstraction over an API, we still need to know how the underlying API works, right? The same goes for JPA and SQL. Just because we use JPA, we still have to know what an execution plan is, how the Extractor fetches data, why selecting too much data might lead to a full-table scan rather than using an index. Details always matter.

      1. Right, totally agree here. My point was not so much about the user having or not having to understand what’s going on, but about the effects on the code based on the level of abstraction. Removing the save(…) call from the service method basically brings the code into a state in which it only runs in very special scenarios, and might outright fail in others. That’s a bad thing, I’d argue.

        On the other hand I am all with you that every layer of abstraction has to be understood to be properly used.

  3. Hi,
    after reading your article above, I added the following to my entity class to avoid the unnecessary SELECT statement before inserting a row into the table:
    @Version
    private Long version;

    However, I get the error below:
    21:54:44.725 [main] DEBUG o.h.e.jdbc.spi.SqlExceptionHelper – could not execute statement [n/a]
    com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column ‘version’ in ‘field list’

    Does it mean I need to have a column named VERSION in my table? If that is the case, is there any other way I can skip the SELECT statement, since I know the entries are new before inserting them into the table?

      1. Is that the only option? What about when I insert a new record: what value will it have, or do I need to provide a value (though I can keep that as default null)?

        Is there any other way I can implement this (not executing a SELECT statement while inserting records when I know those are new records)?

        Can I override the save method in a way where entityInformation.isNew(entity) always returns true?
