Introduction
When using JPA, entity state transitions are translated automatically to SQL statements. This post explains when to use persist and when to use merge.
Persist
The persist operation must be used only for new entities. From a JPA perspective, an entity is new when it has never been associated with a database row, meaning that there is no table record in the database to match the entity in question.
For instance, when executing the following test case:
```java
Post post = new Post();
post.setTitle("High-Performance Java Persistence");

entityManager.persist(post);

LOGGER.info("The post entity identifier is {}", post.getId());

LOGGER.info("Flush Persistence Context");

entityManager.flush();
```
Hibernate is going to attach the Post entity to the currently running Persistence Context. The INSERT SQL statement can either be executed directly or postponed until flush time.
IDENTITY
If the entity uses an IDENTITY generator:
```java
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
```
The INSERT is executed right away, and Hibernate generates the following output:
```sql
INSERT INTO post (id, title)
VALUES (DEFAULT, 'High-Performance Java Persistence')

-- The Post entity identifier is 1
-- Flush Persistence Context
```
Whenever an entity is persisted, Hibernate must attach it to the currently running Persistence Context which acts as a Map of entities. The Map key is formed of the entity type (its class) and the entity identifier.
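Conceptually, that Map can be sketched as follows. This is a simplified illustration of the idea, not Hibernate's actual internals (the class names here are made up for the sketch, although Hibernate does use a similar EntityKey abstraction internally):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: the key combines the entity type and identifier.
class EntityKey {
    final Class<?> type;
    final Object id;

    EntityKey(Class<?> type, Object id) {
        this.type = type;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EntityKey)) return false;
        EntityKey that = (EntityKey) o;
        return type.equals(that.type) && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, id);
    }
}

// Hypothetical sketch: the Persistence Context as a Map of managed entities.
class PersistenceContextSketch {
    private final Map<EntityKey, Object> managedEntities = new HashMap<>();

    void attach(Class<?> type, Object id, Object entity) {
        managedEntities.put(new EntityKey(type, id), entity);
    }

    Object find(Class<?> type, Object id) {
        return managedEntities.get(new EntityKey(type, id));
    }
}
```

This is also why, within a single Persistence Context, looking up the same identifier twice yields the same object instance.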
For IDENTITY columns, Hibernate cannot delay the INSERT statement until flush time because the identifier value can only be generated by executing the statement. For this reason, Hibernate disables JDBC batch inserts for entities using the IDENTITY generator strategy.
SEQUENCE
When using the SEQUENCE identifier strategy and rerunning the same example, Hibernate generates the following output:
```sql
CALL NEXT VALUE FOR 'hibernate_sequence'

-- The post entity identifier is 1
-- Flush Persistence Context

INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence', 1)
```
This time, the INSERT statement can be delayed until flush time, and Hibernate can apply batch insert optimizations if you set the batch size configuration property.
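For instance, assuming the Post mapping from this article, switching to the SEQUENCE strategy is just a change to the identifier annotation (the batch size of 30 below is an arbitrary illustrative value):

```java
// Identifier mapped with the SEQUENCE strategy, so the INSERT
// can be delayed until flush time and grouped into JDBC batches.
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private Long id;
```

Combined with the hibernate.jdbc.batch_size configuration property (e.g. `hibernate.jdbc.batch_size=30`), Hibernate can then group the delayed INSERT statements into JDBC batches.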
The TABLE strategy behaves like SEQUENCE, but you should avoid it at all costs because it uses a separate transaction to generate the entity identifier, therefore putting pressure on the underlying connection pool and the database transaction log.
Even worse, row-level locks are used to coordinate multiple concurrent requests, and, just as Amdahl's Law tells us, introducing a serial fraction of execution can limit scalability.
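Amdahl's Law, in its standard formulation (not specific to this article), makes that cost explicit. With a parallelizable fraction p of the workload and n concurrent workers, the maximum speedup is:

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
```

As the serialized identifier generation grows, the serial fraction (1 - p) grows, and the speedup is capped at 1 / (1 - p) no matter how many connections you add.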
Merge
Merging is required only for detached entities.
Assuming we have the following entity:
```java
Post post = doInJPA(entityManager -> {
    Post _post = new Post();
    _post.setTitle("High-Performance Java Persistence");
    entityManager.persist(_post);
    return _post;
});
```
Because the EntityManager which loaded the Post entity has been closed, the Post becomes detached, and Hibernate can no longer track any changes. The detached entity can be modified, and, to propagate these changes, the entity needs to be reattached to a new Persistence Context:
```java
post.setTitle("High-Performance Java Persistence Rocks!");

doInJPA(entityManager -> {
    LOGGER.info("Merging the Post entity");
    Post post_ = entityManager.merge(post);
});
```
When running the test case above, Hibernate is going to execute the following statements:
```sql
-- Merging the Post entity

SELECT p.id AS id1_0_0_,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

UPDATE post
SET    title = 'High-Performance Java Persistence Rocks!'
WHERE  id = 1
```
Hibernate generates a SELECT statement first to fetch the latest state of the underlying database record, and afterward, it copies the detached entity state onto the newly fetched managed entity. This way, the dirty checking mechanism can detect any state change and propagate it to the database.
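One detail worth keeping in mind is that merge does not attach the instance you pass to it; it returns the managed copy. A minimal sketch, reusing the detached post from the example above:

```java
Post merged = entityManager.merge(post);

// merge returns a different, managed instance,
// while the original reference stays detached
assert merged != post;
assert entityManager.contains(merged);
assert !entityManager.contains(post);
```

Any further changes should therefore be applied to the returned instance, not to the original detached one.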
While for the IDENTITY and SEQUENCE generator strategies you can practically use merge to persist an entity, for the assigned generator, this would be less efficient.
Considering that the Post entity requires manually assigned identifiers:
```java
@Id
private Long id;
```
When using merge instead of persist:
```java
doInJPA(entityManager -> {
    Post post = new Post();
    post.setId(1L);
    post.setTitle("High-Performance Java Persistence");
    entityManager.merge(post);
});
```
Hibernate is going to issue a SELECT statement to make sure that there is no record in the database having the same identifier:
```sql
SELECT p.id AS id1_0_0_,
       p.title AS title2_0_0_
FROM   post p
WHERE  p.id = 1

INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence', 1)
```
You can fix this issue by adding a version property to your entity, which is a good idea anyway since it also prevents lost updates in multi-request transactions:
```java
@Version
private Long version;
```
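Putting it together, a Post entity using both the assigned identifier and the version property might look like this (getters and setters elided; field names follow the article's examples):

```java
@Entity
public class Post {

    @Id
    private Long id;

    private String title;

    @Version
    private Long version;
}
```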
If you use the assigned identifier generator, it's important to use a Java wrapper type (e.g. java.lang.Long), for which Hibernate can check nullability, instead of a primitive (e.g. long) for the @Version property.
The reason why I wanted to show you this example is that you might happen to use a save method like the one offered by the Spring Data SimpleJpaRepository:
```java
@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    }
    else {
        return em.merge(entity);
    }
}
```
The same rules apply to the Spring Data save method as well. If you ever use an assigned identifier generator, you have to remember to add a Java wrapper @Version property; otherwise, a redundant SELECT statement is going to be generated.
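The reason the wrapper type matters boils down to a plain null check. The following is a simplified, hypothetical illustration of a @Version-based is-new check, not Spring Data's actual implementation:

```java
// Hypothetical sketch: with a wrapper @Version field, a null version
// means the entity was never persisted, so persist() can be chosen
// over merge() without querying the database first.
class Post {
    Long id;
    Long version; // wrapper type, so it can be null

    Post(Long id, Long version) {
        this.id = id;
        this.version = version;
    }
}

class VersionBasedEntityState {
    static boolean isNew(Post post) {
        return post.version == null;
    }
}
```

A primitive long version could never be null, so this check would be impossible, which is why the wrapper type is required.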
The redundant save anti-pattern
By now, it's clear that new entities must go through persist, whereas detached entities must be reattached using merge. However, while reviewing lots of projects, I came to realize that the following anti-pattern is rather widespread:
```java
@Transactional
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
    postRepository.save(post);
}
```
The save method serves no purpose. Even if we remove it, Hibernate will still issue the UPDATE statement, since the entity is managed and any state change is propagated as long as the currently running EntityManager is open.
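The fix is simply to drop the redundant call and let dirty checking do its job; the same method as above, minus the save:

```java
@Transactional
public void savePostTitle(Long postId, String title) {
    Post post = postRepository.findOne(postId);
    post.setTitle(title);
    // no explicit save needed: the entity is managed, and the
    // UPDATE is issued automatically at flush/commit time
}
```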
This is an anti-pattern because the save call fires a MergeEvent, which is handled by the DefaultMergeEventListener, which performs the following operations:
```java
protected void entityIsPersistent(MergeEvent event, Map copyCache) {
    LOG.trace("Ignoring persistent instance");

    final Object entity = event.getEntity();
    final EventSource source = event.getSession();
    final EntityPersister persister = source
        .getEntityPersister(event.getEntityName(), entity);

    ((MergeContext) copyCache).put(entity, entity, true);

    cascadeOnMerge(source, persister, entity, copyCache);
    copyValues(persister, entity, entity, source, copyCache);

    event.setResult(entity);
}
```
In the copyValues method call, the hydrated state is copied again, so a new array is redundantly created, wasting CPU cycles. If the entity has child associations and the merge operation is also cascaded from parent to child entities, the overhead is even greater because each child entity will propagate a MergeEvent, and the cycle continues.
Conclusion
While a save method might be convenient in some situations, in practice, you should never call merge for entities that are either new or already managed. As a rule of thumb, you shouldn't be using save with JPA. For new entities, you should always use persist, while for detached entities you need to call merge. For managed entities, you don't need any save method because Hibernate automatically synchronizes the entity state with the underlying database record.
Hi, Vlad! Regarding the save anti-pattern: I use the Spring Data CrudRepository, and it seems I actually have to use that anti-pattern because the saved entity does not get updated if I use setters afterward. I found that the underlying save method implementation has a @Transactional annotation on it, which I think closes the Persistence Context immediately. For example:
I create a child entity and save it, then create a parent entity using the child and save the parent, and then I create another child entity, save it, and call a setter on the saved parent entity. Then I call parentRepository.findOne(parent.getId()), and it returns the parent, but with the first child, not the one I set using the setter. And when the method ends, the changes are lost.
You need the doStuff service method to be annotated with @Transactional. You want all entities to be saved in the same Persistence Context and database transaction. Moreover, you can use cascading from parent to child, which makes it redundant to call save on the ChildRepository.
Thank you! I figured out how to do that. In particular, the method that I annotate with @Transactional must be inside a @Service class; I also need to annotate the @Configuration class with @EnableTransactionManagement, and I may need to use @org.springframework.transaction.annotation.Transactional instead of @javax.transaction.Transactional. Now, when the method body goes out of scope, the transaction gets committed, and it's all good.