MySQL rewriteBatchedStatements configuration property

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see how MySQL rewriteBatchedStatements works when using JDBC, JPA, or Hibernate.

I first researched this MySQL configuration property when I was writing the batching chapter of my High-Performance Java Persistence book, and, back then, I discovered that this setting allows batching plain Statement by rewriting the SQL string that is sent to the database.

However, the MySQL 6 Connector/J documentation mentioned that:

for prepared statements, server-side prepared statements can not currently take advantage of this rewrite option

So, for a long time, I wrongly assumed this feature was not meant for batching JDBC prepared statements.

It was when I read the MySQL 8.0.30 Connector/J release notes that I realized the documentation had been misleading us:

The description for the connection property rewriteBatchedStatements has been corrected, removing the limitation that server-sided prepared statements could not take advantage of the rewrite option. (Bug #34022110)

So, apparently, the rewriteBatchedStatements was working with JDBC PreparedStatements, and, for this reason, I decided to test this functionality and write down my findings in this article.

Using rewriteBatchedStatements with JDBC Statement batching

Most Java developers use the executeUpdate method of the Statement interface when having to execute INSERT, UPDATE, and DELETE statements.

However, since Java 1.2, the Statement interface has been providing the addBatch that we can use to batch multiple statements so that they are sent on a single request when calling the executeBatch method, as illustrated by the following example:

String INSERT = "insert into post (id, title) values (%1$d, 'Post no. %1$d')";

try(Statement statement = connection.createStatement()) {
    for (long id = 1; id <= 10; id++) {
        statement.addBatch(
            String.format(INSERT, id)
        );
    }
    statement.executeBatch();
}

Now, you’d assume that the above example will execute the INSERT statements in a single database roundtrip, but if you debug through the MySQL JDBC Driver, you’ll find the following code block:

if (this.rewriteBatchedStatements.getValue() && nbrCommands > 4) {
    return executeBatchUsingMultiQueries(
        multiQueriesEnabled, 
        nbrCommands, 
        individualStatementTimeout
    );
}

updateCounts = new long[nbrCommands];

for (int i = 0; i < nbrCommands; i++) {
    updateCounts[i] = -3;
}

int commandIndex = 0;

for (commandIndex = 0; commandIndex < nbrCommands; commandIndex++) {
    try {
        String sql = (String) batchedArgs.get(commandIndex);
        updateCounts[commandIndex] = executeUpdateInternal(sql, true, true);
        
        ...
    } catch (SQLException ex) {
        updateCounts[commandIndex] = EXECUTE_FAILED;

        ...
    }
}

Because the rewriteBatchedStatements is false, each INSERT statement will be executed individually using the executeUpdateInternal method call.

So, even if we used addBatch and executeBatch, by default, MySQL still executes the INSERT statements individually when using the plain JDBC Statement object.

However, if we enable the rewriteBatchedStatements JDBC configuration property:

MysqlDataSource dataSource = new MysqlDataSource();

String url = "jdbc:mysql://localhost/high_performance_java_persistence?useSSL=false";

dataSource.setURL(url);
dataSource.setUser(username());
dataSource.setPassword(password());

dataSource.setRewriteBatchedStatements(true);

And debug the executeBatch method execution, you will see that, now, the executeBatchUsingMultiQueries is called instead:

if (this.rewriteBatchedStatements.getValue() && nbrCommands > 4) {
    return executeBatchUsingMultiQueries(
        multiQueriesEnabled, 
        nbrCommands, 
        individualStatementTimeout
    );
}

And the executeBatchUsingMultiQueries method is going to concatenate the individual INSERT statements into a StringBuilder and run a single execute call instead:

StringBuilder queryBuf = new StringBuilder();

batchStmt = locallyScopedConn.createStatement();
JdbcStatement jdbcBatchedStmt = (JdbcStatement) batchStmt;

...

int argumentSetsInBatchSoFar = 0;

for (commandIndex = 0; commandIndex < nbrCommands; commandIndex++) {
    String nextQuery = (String) this.query.getBatchedArgs().get(commandIndex);

    ...

    queryBuf.append(nextQuery);
    queryBuf.append(";");
    argumentSetsInBatchSoFar++;
}

if (queryBuf.length() > 0) {
    try {
        batchStmt.execute(queryBuf.toString(), java.sql.Statement.RETURN_GENERATED_KEYS);
    } catch (SQLException ex) {
        sqlEx = handleExceptionForBatch(
            commandIndex - 1, argumentSetsInBatchSoFar, updateCounts, ex
        );
    }

    ...
}

So, for the plain JDBC Statement batching, the MySQL rewriteBatchedStatements configuration property is going to append the currently batched statements and execute them in a single database roundtrip.

Using rewriteBatchedStatements with JDBC PreparedStatement batching

When using JPA and Hibernate, all your SQL statements are going to be executed using the JDBC PreparedStatement, and that’s for very good reasons:

  • prepared statements allow you to increase the likelihood of statement caching
  • prepared statements allow you to avoid SQL injection attacks because you bind parameter values instead of injecting them as we did with the previous String.format call.

However, since Hibernate doesn’t enable JDBC batching by default, we need to provide the following configuration properties to activate the automatic batching mechanism:

spring.jpa.properties.hibernate.jdbc.batch_size=10
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true

Therefore, when persisting 10 Post entities:

for (long i = 1; i <= 10; i++) {
    entityManager.persist(
        new Post()
            .setId(i)
            .setTitle(String.format("Post no. %d", i))
    );
}

Hibernate is going to execute a single JDBC INSERT, as illustrated by the datasource-proxy log entry:

Type:Prepared, Batch:True, QuerySize:1, BatchSize:10, 
Query:["
    insert into post (title, id) values (?, ?)
"], 
Params:[
    (Post no. 1, 1), (Post no. 2, 2), (Post no. 3, 3), 
    (Post no. 4, 4), (Post no. 5, 5), (Post no. 6, 6), 
    (Post no. 7, 7), (Post no. 8, 8), (Post no. 9, 9), 
    (Post no. 10, 10)
]

If you are using the IDENTITY entity identifier strategy, Hibernate won’t be able to batch the insert statement automatically. Check out this article.

So, using the default MySQL JDBC Driver settings, a single statement was sent to the MySQL database server. However, if you inspect the database server logs, we can see that after the statement arrives, MySQL executes each statement as if they are run in a for-loop:

Query	insert into post (title, id) values ('Post no. 1', 1)
Query	insert into post (title, id) values ('Post no. 2', 2)
Query	insert into post (title, id) values ('Post no. 3', 3)
Query	insert into post (title, id) values ('Post no. 4', 4)
Query	insert into post (title, id) values ('Post no. 5', 5)
Query	insert into post (title, id) values ('Post no. 6', 6)
Query	insert into post (title, id) values ('Post no. 7', 7)
Query	insert into post (title, id) values ('Post no. 8', 8)
Query	insert into post (title, id) values ('Post no. 9', 9)
Query	insert into post (title, id) values ('Post no. 10', 10)
Query	commit

So, after enabling the rewriteBatchedStatements MySQL JDBC Driver setting:

dataSource.setRewriteBatchedStatements(true);

When we rerun the previous test case that inserts 10 Post entities, we can see that the following INSERT statement is executed on the database side:

Query   insert into post (title, id) 
        values ('Post no. 1', 1),('Post no. 2', 2),('Post no. 3', 3),
               ('Post no. 4', 4),('Post no. 5', 5),('Post no. 6', 6),
               ('Post no. 7', 7),('Post no. 8', 8),('Post no. 9', 9),
               ('Post no. 10', 10)
Query   commit

The reason why the statement has changed is that the MySQL JDBC Driver now calls the executeBatchWithMultiValuesClause method that rewrites the batched INSERT statements to a single multi-value INSERT.

if (!this.batchHasPlainStatements && 
	this.rewriteBatchedStatements.getValue()) {

	if (getQueryInfo().isRewritableWithMultiValuesClause()) {
		return executeBatchWithMultiValuesClause(batchTimeout);
	}

	...
}

Testing Time

For plain statements, there’s no need to test the rewriteBatchedStatements optimization because most of the SQL statements you will execute using JDBC, JPA, Hibernate, or jOOQ are done using the JDBC PreparedStatement interface.

So, when running a test that inserts 5000 post records using a batch size of 100 for a duration of 60 seconds, we get the following results:

MySQL rewriteBatchedStatements

And here are the Dropwizard Metrics for both scenarios:

Test MySQL batch insert with rewriteBatchedStatements=false
type=TIMER, name=batchInsertTimer, count=55, min=909.9544999999999, max=1743.0735, 
mean=1072.3787996947426, stddev=128.4560649360703, median=1049.4146, 
p75=1106.231, p95=1224.2176, p98=1649.8706, p99=1743.0735, p999=1743.0735, 
mean_rate=0.8612772397894758, m1=0.6330960191792878, m5=0.3192705968508436, 
m15=0.24209506781664528, rate_unit=events/second, duration_unit=milliseconds

Test MySQL batch insert with rewriteBatchedStatements=true
type=TIMER, name=batchInsertTimer, count=441, min=80.09599999999999, max=565.4343, 
mean=112.20623474996226, stddev=29.01211110828766, median=103.52319999999999, 
p75=120.9807, p95=161.3664, p98=173.9123, p99=182.2464, p999=565.4343, 
mean_rate=7.263224298238385, m1=6.872524588278418, m5=6.547662085190082, 
m15=6.453339001683109, rate_unit=events/second, duration_unit=milliseconds

Clearly, the MySQL rewriteBatchedStatements setting provides an advantage as the total batch execution time is way shorter when activating this property.

As explained in the MySQL documentation, there are some caveats you should be aware of:

  • Statement.getGeneratedKeys() works only when the rewritten statement consists of INSERT or REPLACE statements only. That’s not really a problem when using JPA and Hibernate since only INSERT will be batched during flush.
  • Rewriting INSERT ... ON DUPLICATE KEY UPDATE statements might not work as expected, but, again, this is not a problem for JPA and Hibernate since the default INSERT does not use the ON DUPLICATE KEY UPDATE clause.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While the MySQL JDBC Driver has been providing the rewriteBatchedStatements setting for a long time, because the documentation was rather misleading, it was not clear that this property applies to PreparedStatement batching.

Therefore, if your batch processing task is running on MySQL, enabling the rewriteBatchedStatements setting may provide better performance.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.