How to enable multi-row inserts with the PostgreSQL reWriteBatchedInserts configuration property

(Last Updated On: April 12, 2018)

Introduction

Vladimir Sitnikov has been working on many optimizations to the PostgreSQL JDBC Driver, and one of these is the reWriteBatchedInserts configuration property which he recently told me about.

In this article, you will see how the reWriteBatchedInserts JDBC configuration property works in PostgreSQL, and how it allows you to rewrite INSERT statements into a multi-VALUE INSERT.

Default PostgreSQL batching behavior

Assuming we have the following Post entity:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue(
        strategy = GenerationType.SEQUENCE
    )
    private Long id;

    private String title;

    public Post() {}

    public Post(String title) {
        this.title = title;
    }

    //Getters and setters omitted for brevity
}

And, that we have enabled JDBC batching by setting the following Hibernate configuration property:

<property
    name="hibernate.jdbc.batch_size"
    value="10"
/>

Let’s persist 10 Post entities:

for (int i = 0; i < 10; i++) {
    entityManager.persist(
        new Post(
            String.format("Post no. %d", i + 1)
        )
    );
}

Hibernate will execute the following SQL INSERT statement:

Query: ["insert into post (title, id) values (?, ?)"], 
Params:[(Post no. 1, 1), (Post no. 2, 2), (Post no. 3, 3), 
        (Post no. 4, 4), (Post no. 5, 5), (Post no. 6, 6), 
        (Post no. 7, 7), (Post no. 8, 8), (Post no. 9, 9), 
        (Post no. 10, 10)
]

This is a JDBC-level log taken by datasource-proxy, but to be really sure what happens behind the scenes we need to activate the PostgreSQL log as well. For this, we need to open the postgresql.conf file and set the log_statements property to all:

log_statement = 'all'

Upon inspecting the PostgreSQL log, we can see that, although there was a single database roundtrip, the database server has to process each EXECUTE call individually, and that can take time:

LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 1', $2 = '1'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 2', $2 = '2'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 3', $2 = '3'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 4', $2 = '4'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 5', $2 = '5'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 6', $2 = '6'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 7', $2 = '7'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 8', $2 = '8'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 9', $2 = '9'
LOG:  execute S_2: insert into post (title, id) values ($1, $2)
DETAIL:  parameters: $1 = 'Post no. 10', $2 = '10'

Activating reWriteBatchedInserts

Now, let’s activate the reWriteBatchedInserts configuration and rerun our test case. To set the reWriteBatchedInserts property, we can use the PGSimpleDataSource as follows:

PGSimpleDataSource dataSource = 
    (PGSimpleDataSource) super.dataSource();
    
dataSource.setReWriteBatchedInserts(true);

Now, when rerunning our test case that inserts 10 post records, PostgreSQL logs the following entries:

LOG:  execute <unnamed>: insert into post (title, id) values ($1, $2),($3, $4),($5, $6),($7, $8),($9, $10),($11, $12),($13, $14),($15, $16)
DETAIL:  parameters: $1 = 'Post no. 1', $2 = '1', $3 = 'Post no. 2', $4 = '2', $5 = 'Post no. 3', $6 = '3', $7 = 'Post no. 4', $8 = '4', $9 = 'Post no. 5', $10 = '5', $11 = 'Post no. 6', $12 = '6', $13 = 'Post no. 7', $14 = '7', $15 = 'Post no. 8', $16 = '8'
LOG:  execute <unnamed>: insert into post (title, id) values ($1, $2),($3, $4)
DETAIL:  parameters: $1 = 'Post no. 9', $2 = '9', $3 = 'Post no. 10', $4 = '10'

This time, we only have 2 executions instead of 10 which can speed up the batch processing on the database side.

In this StackOverflow answer, Vladimir has given another reason why you should consider the setting. If the batch is fairly large, the PostgreSQL driver will split it into smaller ones to address TCP deadlock issues.

Conclusion

The reWriteBatchedInserts configuration property was added in the PostgreSQL JDBC version 9.4.1209 (2016-07-15), so if you are still using the same JDBC Driver version set up when your project was started, you might want to consider upgrading it if you want to speed up batch inserts.

For more details about this configuration property and some benchmarks Vladimir run, check out this GitHub issue.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

4 thoughts on “How to enable multi-row inserts with the PostgreSQL reWriteBatchedInserts configuration property

  1. Really thanks for the article, very interesting.
    PS
    I see a missing for-loop in inserting the posts.

    1. I realized that this setting is just for INSERT statements while optimistic locking is for UPDATE and DELETE.

Leave a Reply

Your email address will not be published. Required fields are marked *