MySQL rewriteBatchedStatements configuration property
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, we are going to see how MySQL rewriteBatchedStatements works when using JDBC, JPA, or Hibernate.
I first researched this MySQL configuration property when I was writing the batching chapter of my High-Performance Java Persistence book, and, back then, I discovered that this setting allows batching plain Statement
by rewriting the SQL string that is sent to the database.
However, the MySQL 6 Connector/J documentation mentioned that:
for prepared statements, server-side prepared statements can not currently take advantage of this rewrite option
So, for a long time, I wrongly assumed this feature was not meant for batching JDBC prepared statements.
It was when I read the MySQL 8.0.30 Connector/J release notes that I realized the documentation had been misleading us:
The description for the connection property
rewriteBatchedStatements
has been corrected, removing the limitation that server-sided prepared statements could not take advantage of the rewrite option. (Bug #34022110)
So, apparently, the rewriteBatchedStatements
was working with JDBC PreparedStatement
s, and, for this reason, I decided to test this functionality and write down my findings in this article.
Using rewriteBatchedStatements with JDBC Statement batching
Most Java developers use the executeUpdate
method of the Statement
interface when having to execute INSERT, UPDATE, and DELETE statements.
However, since Java 1.2, the Statement
interface has been providing the addBatch
that we can use to batch multiple statements so that they are sent on a single request when calling the executeBatch
method, as illustrated by the following example:
String INSERT = "insert into post (id, title) values (%1$d, 'Post no. %1$d')"; try(Statement statement = connection.createStatement()) { for (long id = 1; id <= 10; id++) { statement.addBatch( String.format(INSERT, id) ); } statement.executeBatch(); }
Now, you’d assume that the above example will execute the INSERT statements in a single database roundtrip, but if you debug through the MySQL JDBC Driver, you’ll find the following code block:
if (this.rewriteBatchedStatements.getValue() && nbrCommands > 4) { return executeBatchUsingMultiQueries( multiQueriesEnabled, nbrCommands, individualStatementTimeout ); } updateCounts = new long[nbrCommands]; for (int i = 0; i < nbrCommands; i++) { updateCounts[i] = -3; } int commandIndex = 0; for (commandIndex = 0; commandIndex < nbrCommands; commandIndex++) { try { String sql = (String) batchedArgs.get(commandIndex); updateCounts[commandIndex] = executeUpdateInternal(sql, true, true); ... } catch (SQLException ex) { updateCounts[commandIndex] = EXECUTE_FAILED; ... } }
Because the rewriteBatchedStatements
is false
, each INSERT statement will be executed individually using the executeUpdateInternal
method call.
So, even if we used addBatch
and executeBatch
, by default, MySQL still executes the INSERT statements individually when using the plain JDBC Statement
object.
However, if we enable the rewriteBatchedStatements
JDBC configuration property:
MysqlDataSource dataSource = new MysqlDataSource(); String url = "jdbc:mysql://localhost/high_performance_java_persistence?useSSL=false"; dataSource.setURL(url); dataSource.setUser(username()); dataSource.setPassword(password()); dataSource.setRewriteBatchedStatements(true);
And debug the executeBatch
method execution, you will see that, now, the executeBatchUsingMultiQueries
is called instead:
if (this.rewriteBatchedStatements.getValue() && nbrCommands > 4) { return executeBatchUsingMultiQueries( multiQueriesEnabled, nbrCommands, individualStatementTimeout ); }
And the executeBatchUsingMultiQueries
method is going to concatenate the individual INSERT statements into a StringBuilder
and run a single execute
call instead:
StringBuilder queryBuf = new StringBuilder(); batchStmt = locallyScopedConn.createStatement(); JdbcStatement jdbcBatchedStmt = (JdbcStatement) batchStmt; ... int argumentSetsInBatchSoFar = 0; for (commandIndex = 0; commandIndex < nbrCommands; commandIndex++) { String nextQuery = (String) this.query.getBatchedArgs().get(commandIndex); ... queryBuf.append(nextQuery); queryBuf.append(";"); argumentSetsInBatchSoFar++; } if (queryBuf.length() > 0) { try { batchStmt.execute(queryBuf.toString(), java.sql.Statement.RETURN_GENERATED_KEYS); } catch (SQLException ex) { sqlEx = handleExceptionForBatch( commandIndex - 1, argumentSetsInBatchSoFar, updateCounts, ex ); } ... }
So, for the plain JDBC Statement
batching, the MySQL rewriteBatchedStatements
configuration property is going to append the currently batched statements and execute them in a single database roundtrip.
Using rewriteBatchedStatements with JDBC PreparedStatement batching
When using JPA and Hibernate, all your SQL statements are going to be executed using the JDBC PreparedStatement
, and that’s for very good reasons:
- prepared statements allow you to increase the likelihood of statement caching
- prepared statements allow you to avoid SQL injection attacks because you bind parameter values instead of injecting them as we did with the previous
String.format
call.
However, since Hibernate doesn’t enable JDBC batching by default, we need to provide the following configuration properties to activate the automatic batching mechanism:
spring.jpa.properties.hibernate.jdbc.batch_size=10 spring.jpa.properties.hibernate.order_inserts=true spring.jpa.properties.hibernate.order_updates=true
Therefore, when persisting 10 Post
entities:
for (long i = 1; i <= 10; i++) { entityManager.persist( new Post() .setId(i) .setTitle(String.format("Post no. %d", i)) ); }
Hibernate is going to execute a single JDBC INSERT, as illustrated by the datasource-proxy log entry:
Type:Prepared, Batch:True, QuerySize:1, BatchSize:10, Query:[" insert into post (title, id) values (?, ?) "], Params:[ (Post no. 1, 1), (Post no. 2, 2), (Post no. 3, 3), (Post no. 4, 4), (Post no. 5, 5), (Post no. 6, 6), (Post no. 7, 7), (Post no. 8, 8), (Post no. 9, 9), (Post no. 10, 10) ]
If you are using the
IDENTITY
entity identifier strategy, Hibernate won’t be able to batch the insert statement automatically. Check out this article.
So, using the default MySQL JDBC Driver settings, a single statement was sent to the MySQL database server. However, if you inspect the database server logs, we can see that after the statement arrives, MySQL executes each statement as if they are run in a for-loop:
Query insert into post (title, id) values ('Post no. 1', 1) Query insert into post (title, id) values ('Post no. 2', 2) Query insert into post (title, id) values ('Post no. 3', 3) Query insert into post (title, id) values ('Post no. 4', 4) Query insert into post (title, id) values ('Post no. 5', 5) Query insert into post (title, id) values ('Post no. 6', 6) Query insert into post (title, id) values ('Post no. 7', 7) Query insert into post (title, id) values ('Post no. 8', 8) Query insert into post (title, id) values ('Post no. 9', 9) Query insert into post (title, id) values ('Post no. 10', 10) Query commit
So, after enabling the rewriteBatchedStatements
MySQL JDBC Driver setting:
dataSource.setRewriteBatchedStatements(true);
When we rerun the previous test case that inserts 10 Post
entities, we can see that the following INSERT statement is executed on the database side:
Query insert into post (title, id) values ('Post no. 1', 1),('Post no. 2', 2),('Post no. 3', 3), ('Post no. 4', 4),('Post no. 5', 5),('Post no. 6', 6), ('Post no. 7', 7),('Post no. 8', 8),('Post no. 9', 9), ('Post no. 10', 10) Query commit
The reason why the statement has changed is that the MySQL JDBC Driver now calls the executeBatchWithMultiValuesClause
method that rewrites the batched INSERT statements to a single multi-value INSERT.
if (!this.batchHasPlainStatements && this.rewriteBatchedStatements.getValue()) { if (getQueryInfo().isRewritableWithMultiValuesClause()) { return executeBatchWithMultiValuesClause(batchTimeout); } ... }
Testing Time
For plain statements, there’s no need to test the rewriteBatchedStatements
optimization because most of the SQL statements you will execute using JDBC, JPA, Hibernate, or jOOQ are done using the JDBC PreparedStatement
interface.
So, when running a test that inserts 5000 post
records using a batch size of 100
for a duration of 60 seconds, we get the following results:
And here are the Dropwizard Metrics for both scenarios:
Test MySQL batch insert with rewriteBatchedStatements=false type=TIMER, name=batchInsertTimer, count=55, min=909.9544999999999, max=1743.0735, mean=1072.3787996947426, stddev=128.4560649360703, median=1049.4146, p75=1106.231, p95=1224.2176, p98=1649.8706, p99=1743.0735, p999=1743.0735, mean_rate=0.8612772397894758, m1=0.6330960191792878, m5=0.3192705968508436, m15=0.24209506781664528, rate_unit=events/second, duration_unit=milliseconds Test MySQL batch insert with rewriteBatchedStatements=true type=TIMER, name=batchInsertTimer, count=441, min=80.09599999999999, max=565.4343, mean=112.20623474996226, stddev=29.01211110828766, median=103.52319999999999, p75=120.9807, p95=161.3664, p98=173.9123, p99=182.2464, p999=565.4343, mean_rate=7.263224298238385, m1=6.872524588278418, m5=6.547662085190082, m15=6.453339001683109, rate_unit=events/second, duration_unit=milliseconds
Clearly, the MySQL rewriteBatchedStatements
setting provides an advantage as the total batch execution time is way shorter when activating this property.
As explained in the MySQL documentation, there are some caveats you should be aware of:
Statement.getGeneratedKeys()
works only when the rewritten statement consists of INSERT or REPLACE statements only. That’s not really a problem when using JPA and Hibernate since only INSERT will be batched during flush.- Rewriting
INSERT ... ON DUPLICATE KEY UPDATE
statements might not work as expected, but, again, this is not a problem for JPA and Hibernate since the default INSERT does not use theON DUPLICATE KEY UPDATE
clause.
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
While the MySQL JDBC Driver has been providing the rewriteBatchedStatements
setting for a long time, because the documentation was rather misleading, it was not clear that this property applies to PreparedStatement
batching.
Therefore, if your batch processing task is running on MySQL, enabling the rewriteBatchedStatements
setting may provide better performance.
