How to batch INSERT and UPDATE statements with Hibernate

Imagine having a tool that can automatically detect if you are using JPA and Hibernate properly. Hypersistence Optimizer is that tool!

Introduction

JDBC has long been offering support for DML statement batching. By default, all statements are sent one after the other, each one in a separate network round-trip. Batching allows us to send multiple statements in one-shot, saving unnecessary socket stream flushing.

Hibernate hides the database statements behind a transactional write-behind abstraction layer. An intermediate layer allows us to hide the JDBC batching semantics from the persistence layer logic. This way, we can change the JDBC batching strategy without altering the data access code.

Configuring Hibernate to support JDBC batching is not as easy as it should be, so I’m going to explain everything you need to do in order to make it work.

Testing time

We’ll start with the following entity model:

PostComment Jdbc Batch

The Post has a one-to-many association with the Comment entity:

@OneToMany(
    cascade = CascadeType.ALL, 
    mappedBy = "post", 
    orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();

Or test scenario issues both INSERT and UPDATE statements, so we can validate if JDBC batching is being used:

LOGGER.info("Test batch insert");
long startNanos = System.nanoTime();
doInTransaction(session -> {
    int batchSize = batchSize();
    for(int i = 0; i < itemsCount(); i++) {
        if(i % batchSize == 0 && i > 0) {
            session.flush();
            session.clear();
        }

        Post post = new Post(
            String.format("Post no. %d", i)
        );
        int j = 0;
        post.addComment(new Comment(
                String.format(
                    "Post comment %d:%d", i, j++
        )));
        post.addComment(new Comment(
                String.format(
                     "Post comment %d:%d", i, j++
        )));
        session.persist(post);
    }
});

LOGGER.info("{}.testInsert took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(
        System.nanoTime() - startNanos
    ));

LOGGER.info("Test batch update");
startNanos = System.nanoTime();

doInTransaction(session -> {
    List<Post> posts = session.createQuery(
        "select distinct p " +
        "from Post p " +
        "join fetch p.comments c")
    .list();

    for(Post post : posts) {
        post.title = "Blog " + post.title;
        for(Comment comment : post.comments) {
            comment.review = "Blog " + comment.review;
        }
    }
});

LOGGER.info("{}.testUpdate took {} millis",
    getClass().getSimpleName(),
    TimeUnit.NANOSECONDS.toMillis(
        System.nanoTime() - startNanos
    ));

This test will persist a configurable number of Post entities, each one containing two Comments. For the sake of brevity, we are going to persist 3 Posts and the Dialect default batch size:

protected int itemsCount() {
    return 3;
}

protected int batchSize() {
    return Integer.valueOf(Dialect.DEFAULT_BATCH_SIZE);
}

Default batch support

Hibernate doesn’t implicitly employ JDBC batching and each INSERT and UPDATE statement is executed separately:

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]}

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} 

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]} 

Configuring hibernate.jdbc.batch_size

To enable JDBC batching, we have to configure the hibernate.jdbc.batch_size property:

A non-zero value enables use of JDBC2 batch updates by Hibernate (e.g. recommended values between 5 and 30)

We’ll set this property and rerun our test:

properties.put("hibernate.jdbc.batch_size", 
    String.valueOf(batchSize()));

This time, the Comment INSERT statements are batched, while the UPDATE statements are left untouched:

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} 

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} 

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]} 
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]} 
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}

A JDBC batch can target one table only, so every new DML statement targeting a different table ends up the current batch and initiates a new one. Mixing different table statements is therefore undesirable when using SQL batch processing.

Ordering statements

Hibernate can sort INSERT and UPDATE statements using the following configuration options:

properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");

While the Post and Comment INSERT statements are batched accordingly, the UPDATE statements are still executed separately:

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]} 

Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}

Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}

Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} 
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} 
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]} 

Adding version data batch support

There’s the hibernate.jdbc.batch_versioned_data configuration property we need to set, in order to enable UPDATE batching:

Set this property to true if your JDBC driver returns correct row counts from executeBatch(). It is usually safe to turn this option on. Hibernate will then use batched DML for automatically versioned data. Defaults to false.

We will rerun our test with this property set too:

properties.put("hibernate.jdbc.batch_versioned_data", "true");

Now both the INSERT and the UPDATE statements are properly batched:

Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]} 

Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}

Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]} 

Benchmark

Now that we managed to configure Hibernate for JDBC batching, we can benchmark the performance gain of statement grouping.

  • the test case uses a PostgreSQL database installed on the same machine with the currently running JVM
  • a batch size of 50 was chosen and each test iteration increases the statement count by an order of magnitude
  • all durations are expressed in milliseconds

The results are:

Statement count No batch Insert duration No batch Update duration Batch Insert duration Batch Update duration
30 218 178 191 144
300 311 327 208 217
3000 1047 1089 556 478
30000 5889 6032 2640 2301
300000 51785 57869 16052 20954

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

The more rows we INSERT or UPDATE, the more we can benefit from JDBC batching. For write-most applications (e.g enterprise enterprise batch processors), we should definitely enable JDBC batching as the performance benefits might be staggering.

Code available on GitHub.

FREE EBOOK

14 Comments on “How to batch INSERT and UPDATE statements with Hibernate

  1. Hello Vlad,

    how is batch inserting possible for many to many associations with a join table? We have two Entities Version and Source. Version has a many to many relationship to the sources:

    @ManyToMany
    @JoinTable(name = “version_source”, joinColumns = {@JoinColumn(name = “version_id”)}, inverseJoinColumns = {@JoinColumn(name = “source_id”)})
    private Set sources = new HashSet();

    When I add a lot of sources to a version, there is a single insert statement for each source in the join table.
    The batch insert doesn’t work, because there is no entity insert. All the added source were inserted before adding to the version.

    Is a batch insert possible for this case, too?

    Regards
    Peter

      • Sorry, but I can’t find an answer in this book. I read the Batch Updates and Batching chapter multiple times. The Batch insert works, when I create the entities. Just like creating posts and comments in your example.
        But in my example there is no entity creation. Only the associations of the existing entities are created. But the insert statements for the join tables are not batched.

      • Send me a Pull Request with a replicating test case so I can take a look at it.

  2. Thanks for the information Vlad
    I’m currently using Hibernate 3.6.5 with Oracle 11g (with an Oracle 10 driver for unfathomable reasons…)
    I’ve got the settings as described above as confirmed by log statements. However inserts are not being batched. Updates are – but not inserts. There are multiple inserts to multiple tables – my understanding was that inserts to multiple tables could be batched by adding the order inserts property. A high low sequence is being used to generate primary keys so batching should not be getting switched off (and is being used for updates – just not inserts…)

    See Settings
    2019-07-06 11:05:25,754 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,754 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,755 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,755 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,755 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,756 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,756 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,756 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,756 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –
    2019-07-06 11:05:25,756 INFO [ORB.thread.pool : 0][org.hibernate.cfg.SettingsFactory] –

    See batch size of 1 with inserts
    2019-07-06 11:09:49,936 DEBUG [schedulerFileFactoryBean_Worker-1][org.hibernate.jdbc.AbstractBatcher] –
    2019-07-06 11:09:49,938 DEBUG [schedulerFileFactoryBean_Worker-1][org.hibernate.jdbc.Expectations] –

    See batch size of 20 with updates
    2019-07-06 11:09:48,003 DEBUG [schedulerFactoryBean_Worker-1][org.hibernate.jdbc.AbstractBatcher] –
    2019-07-06 11:09:48,007 DEBUG [schedulerFactoryBean_Worker-1][org.hibernate.jdbc.Expectations] –
    2019-07-06 11:09:48,007 DEBUG [schedulerFactoryBean_Worker-1][org.hibernate.jdbc.Expectations] –
    2019-07-06 11:09:48,007 DEBUG [schedulerFactoryBean_Worker-1][org.hibernate.jdbc.Expectations] –

    • The batch insert had many issues, and it got fixed in 5.4. Therefore, you should consider upgrading if you want to get this and many other fixes as well.

  3. How about session.saveOrUpdate(entity).it can be used for common service right?

    • Yes. It does the same thing as persist, but you should favor the latter.

  4. Hello

    Is it possible to use JDBC Batching on Criteria Updates as well? From what I can see it only works for whole Entities.

    • CriteriaUpdate is for bulk processing which is an alternative to JDBC batching. Usually, you use either one or the other, but you don’t combine them.

  5. I am using Mysql, Spring Data JPA. I am trying to create POC with only 1 table e.g. Customer (ID, FIRST_NAME, LAST_NAME)
    What I am trying to achieve is to update in a batch where update statements are a group as shown above in the example (to reduce database round trips).

    I have set all the above-mentioned properties
    hibernate.order_inserts: true
    hibernate.order_updates: true
    hibernate.jdbc.batch_versioned_data: true

    But the result is (update statements are not getting grouped)

    2018-10-28T03:18:32.545233Z 1711 Query update CUSTOMER set FIRST_NAME=’499997′, LAST_NAME=’499998′ where id=499996;
    2018-10-28T03:18:32.545488Z 1711 Query update CUSTOMER set FIRST_NAME=’499998′, LAST_NAME=’499999′ where id=499997;
    2018-10-28T03:18:32.545809Z 1711 Query update CUSTOMER set FIRST_NAME=’499999′, LAST_NAME=’500000′ where id=499998;

    Desired Result: (updates are grouped as a single query, thus reducing the DB roundtrips)

    2018-10-28T03:18:32.545233Z 1711 Query update CUSTOMER set FIRST_NAME=’499997′, LAST_NAME=’499998′ where id=499996; update CUSTOMER set FIRST_NAME=’499998′, LAST_NAME=’499999′ where id=499997; update CUSTOMER set FIRST_NAME=’499999′, LAST_NAME=’500000′ where id=499998;

      • Thanks. Purchased book from Amazon.

        Please provide some hint, what to study exactly or how to approach this problem.
        Need to solve this urgently before tomorrow. Really facing performance bottleneck in the application. My application needs to update 100 million records.

        Any help/hint is really appreciable.

      • It’s explained in the Batching chapter why it does not work for MySQL, and there’s a solution in the jOOQ chapter which can be implemented fine even without jOOQ.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.