I decided to turn it into a dedicated article and explain how UPSERT and MERGE work in the top 4 most common relational database systems: Oracle, SQL Server, PostgreSQL, and MySQL.
For this article, let’s assume we have the following post and post_details tables which have a one-to-one table relationship.
The original StackOverflow question asks for a find-or-insert mechanism in Hibernate. Basically, we want to retrieve a record from the database, and if the record is not found, we want to safely insert it. However, the solution must work in a multi-threaded environment, so we need a solid concurrency control mechanisms.
On duplicate key, ignore!
Although there are several answers based on Hibernate pessimistic locking mechanism, I came up with a different approach. Instead of finding the record, we can try to insert it first. For this to work, we need a way to skip the INSERT when there’s already a row that was inserted previously.
Now, Hibernate does not support UPSERT or MERGE, but, even if we use Hibernate in our project, it does not mean we cannot combine it with other data access frameworks as well. Many times, only a native SQL query could satisfy a given requirement.
Therefore, you should take advantage of everything the underlying RDBMS has to offer.
For our example, we are going to use jOOQ because it exposes a unified API that translates to the proper UPSERT or MERGE syntax supported by the underlying database.
When using jOOQ, the aforementioned INSERT then SELECT solution can be expressed as follows:
MERGE INTO "ORACLE"."POST_DETAILS"
(SELECT 1 "one" FROM dual)
("ORACLE"."POST_DETAILS"."ID" = 1)
WHEN NOT matched THEN
TIMESTAMP '2017-11-06 16:12:18.407'
Just like its name implies, MERGE is meant to select records from one or more tables so that we can insert or update a given table or view.
In our case, the INSERT is executed only when there is no post_details record with the given identifier.
MERGE INTO [high_performance_java_persistence].[dbo].[post_details]
(SELECT 1 [one]) AS dummy_82901439([one])
[high_performance_java_persistence].[dbo].[post_details].[id] = 1
WHEN NOT matched THEN
Just like Oracle, the SQL Server MERGE statement is used to execute INSERT, UPDATE or DELETE statements on a target table based on the result set generated from a source table. A typical scenario for using MERGE would be when you have to synchronize two tables having the same structure but potentially different data sets. The MERGE statement would then allow you to generate a diff between these two tables.
In our example, the INSERT is executed only when there is no post_details record with the given identifier.
Unlike Oracle and SQL Server which implement the SQL:2003 MERGE statement, PostgreSQL does not, and they offer the UPSERT alternative via the ON CONFLICT DO UPDATE / NOTHING SQL clause.
So, when running our example on PostgreSQL, jOOQ generates the following SQL statement:
INSERT INTO "public"."post_details" (
TIMESTAMP '2017-11-06 16:42:37.692')
ON CONFLICT DO NOTHING
Because of the ON CONFLICT DO NOTHING clause, the INSERT will not fail if there is already a record satisfying the same filtering criteria, which, in our case, means that it has the same Primary Key value.
Looks scary, but it’s actually pretty simple! The test executes the following steps:
First, we delete everything from the post_details and post tables.
Alice, who executes the main thread, is going to insert a post row and fetch the newly inserted record identifier.
Alice executes a MERGE or UPSERT for the post_details record that has the same Primary Key as the newly inserted post record.
After Alixa has executed the MERGE or UPSERT, Bob will start a new transaction and attempt to execute a MERGE or UPSERT for the same post_details record, meaning we are going to use the same Primary Key value.
Bob’s MERGE or UPSERT will be blocked by Alice lock and a timeout exception will be thrown.
Once we catch the timeout exception, we count down the latch on which Alice awaits.
Alice thread can resume and generate the SELECT statement.
The process can be better visualized in the following diagram:
It’s as easy as that!
All tests are available on GitHub, so feel free to fork my GitHub repository and run them yourself:
The way a relational database implements its concurrency control mechanism might not be always obvious for a data access developer. However, it’s very important to understand it, especially in regards to transaction isolation level.