How to enable multi-row inserts with the PostgreSQL reWriteBatchedInserts configuration property

Imagine having a tool that can automatically detect if you are using JPA and Hibernate properly. Hypersistence Optimizer is that tool!

Introduction

Vladimir Sitnikov has been working on many optimizations to the PostgreSQL JDBC Driver, and one of these is the reWriteBatchedInserts configuration property which he recently told me about.

In this article, you will see how the reWriteBatchedInserts JDBC configuration property works in PostgreSQL, and how it allows you to rewrite INSERT statements into a multi-VALUE INSERT.

Default PostgreSQL batching behavior

Assuming we have the following Post entity:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue(
        strategy = GenerationType.SEQUENCE
    )
    private Long id;

    private String title;

    public Post() {}

    public Post(String title) {
        this.title = title;
    }

    //Getters and setters omitted for brevity
}

And, that we have enabled JDBC batching by setting the following Hibernate configuration property:

<property
    name="hibernate.jdbc.batch_size"
    value="10"
/>

Let’s persist 10 Post entities:

for (int i = 0; i < 10; i++) {
    entityManager.persist(
        new Post(
            String.format("Post no. %d", i + 1)
        )
    );
}

Hibernate will execute the following SQL INSERT statement:

Query: ["insert into post (title, id) values (?, ?)"], 
Params:[(Post no. 1, 1), (Post no. 2, 2), (Post no. 3, 3), 
        (Post no. 4, 4), (Post no. 5, 5), (Post no. 6, 6), 
        (Post no. 7, 7), (Post no. 8, 8), (Post no. 9, 9), 
        (Post no. 10, 10)
]

This is a JDBC-level log taken by datasource-proxy, but to be really sure what happens behind the scenes we need to activate the PostgreSQL log as well. For this, we need to open the postgresql.conf file and set the log_statements property to all:

log_statement = 'all'

Upon inspecting the PostgreSQL log, we can see that, although there was a single database roundtrip, the database server has to process each EXECUTE call individually, and that can take time:

LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 1', $2 = '1'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 2', $2 = '2'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 3', $2 = '3'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 4', $2 = '4'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 5', $2 = '5'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 6', $2 = '6'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 7', $2 = '7'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 8', $2 = '8'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 9', $2 = '9'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 10', $2 = '10'

Activating reWriteBatchedInserts

Now, let’s activate the reWriteBatchedInserts configuration and rerun our test case. To set the reWriteBatchedInserts property, we can use the PGSimpleDataSource as follows:

PGSimpleDataSource dataSource = 
    (PGSimpleDataSource) super.dataSource();
    
dataSource.setReWriteBatchedInserts(true);

Now, when rerunning our test case that inserts 10 post records, PostgreSQL logs the following entries:

LOG:  execute <unnamed>: insert into post (title, id) values ($1, $2),($3, $4),($5, $6),($7, $8),($9, $10),($11, $12),($13, $14),($15, $16)
DETAIL:  parameters: $1 = 'Post no. 1', $2 = '1', $3 = 'Post no. 2', $4 = '2', $5 = 'Post no. 3', $6 = '3', $7 = 'Post no. 4', $8 = '4', $9 = 'Post no. 5', $10 = '5', $11 = 'Post no. 6', $12 = '6', $13 = 'Post no. 7', $14 = '7', $15 = 'Post no. 8', $16 = '8'
LOG:  execute <unnamed>: insert into post (title, id) values ($1, $2),($3, $4)
DETAIL:  parameters: $1 = 'Post no. 9', $2 = '9', $3 = 'Post no. 10', $4 = '10'

This time, we only have 2 executions instead of 10 which can speed up the batch processing on the database side.

In this StackOverflow answer, Vladimir has given another reason why you should consider the setting. If the batch is fairly large, the PostgreSQL driver will split it into smaller ones to address TCP deadlock issues.

I'm running an online workshopk on the 14th of May about The Best Way to Fetch Data with Java Persistence and Hibernate.

Conclusion

The reWriteBatchedInserts configuration property was added in the PostgreSQL JDBC version 9.4.1209 (2016-07-15), so if you are still using the same JDBC Driver version set up when your project was started, you might want to consider upgrading it if you want to speed up batch inserts.

For more details about this configuration property and some benchmarks Vladimir run, check out this GitHub issue.

Transactions and Concurrency Control eBook

10 Comments on “How to enable multi-row inserts with the PostgreSQL reWriteBatchedInserts configuration property

  1. hmm.I set this property . And check the how many query executed. In both case, with reWriteBatchedInserts true or false hibernate log insert query 100 times.

    • Check the JDBC dialect version as well. Maybe it’s an old version you are using. Also, check out my High-Performance Java Persistence GitHub repository as there are tests showing both default batching and PostgreSQL rewrite.

      • I read all of them and tested them. but did not affect. all libs are fresh)). Postgres jdbc driver 42.2.9 and hibernate 5.4.10.Final. But I saw that I used this class and this is the old one.

      • Try to send me a Pull Request with a replicating test case in the GitHub repository so I can take a look next week.

      • Thank you very much Vlad. But I want to solve the problem by myself. And apply what we learned in Oslo. If I can not find any solution then I will do it.

      • Sure thing. Try to debug it and see whether addBatch is called on the JDBC PreparedStatement.

  2. Yep. On hibernate version 5.4.10-Final without reWriteBatchedInserts does not affect. You should enable both of them otherwise you could not benefit from the batching process.

    • Batching should work just fine even without reWriteBatchedInserts. The only effect of reWriteBatchedInserts is that the SQL query is rewritten so that the execution is more efficient on the DB side. Without it, the default batching will only save the network roundtrips.

      • Yesterday I tested. But does not work. I set hibernate.jdbc.batch_size=50 and insert 100 rows. but every time when I execute the test hibernate generates 100 inserts. And I also check the database log. 100 insert send to the database. But when I set reWriteBatchedInserts database execute less inserts

      • It’s important to pay attention to how you log statements. As I explained you during the training in Oslo, Hibernate logs statement at prepared time, nor execution time. Better use datasource-proxy for logging or check the DB logs.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.