How to find which statement failed in a JDBC Batch Update

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

Yesterday, my Danish friend, Flemming Harms, asked my a very interesting question related to when a JDBC batch update fails.

Basically, considering we are going to group several DML statements in a batch, we need a way to tell which statement is the cause of the failure. This post is going to answer this question in more detail.

Simulating a failure

Considering we have a Post entity, whose identifiers are manually assigned:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    //getters and setters omitted for brevity
}

Now, we are going to use JDBC batching for grouping several INSERT statements. For this purpose, we will use a PreparedStatement since it responds better to JDBC batching than a simple java.sql.Statement.

In the following example, to simulate the failure, we are going to assign the same Primary Key to multiple records so that the database can raise a ConstraintViolationException:

Session session = entityManager.unwrap(Session.class);
session.doWork(connection -> {

    try (PreparedStatement st = connection.prepareStatement(
            "INSERT INTO post (id, title) " +
            "VALUES (?, ?)")) {
        for (long i = 0; i < 5; i++) {
            st.setLong(
                1, 
                i % 2
            );
            st.setString(
                2, 
                String.format(
                    "High-Performance Java Persistence, Part %d", 
                    i
                )
            );
            st.addBatch();
        }
        st.executeBatch();
    } catch (BatchUpdateException e) {
        LOGGER.info(
            "Batch has managed to process {} entries", 
            e.getUpdateCounts().length
        );
    }
});

When running the test case above, Hibernate generates the following output:

c.v.b.h.h.b.BatchExceptionTest - testInsertPosts
n.t.d.l.SLF4JQueryLoggingListener - Name:DATA_SOURCE_PROXY, Time:0, 
Success:False, 
Type:Prepared, 
Batch:True, 
QuerySize:1, 
BatchSize:5, 
Query:[
    "INSERT INTO post (id, title) VALUES (?, ?)"], 
    Params:[
        (0, High-Performance Java Persistence, Part 0), 
        (1, High-Performance Java Persistence, Part 1), 
        (0, High-Performance Java Persistence, Part 2), 
        (1, High-Performance Java Persistence, Part 3), 
        (0, High-Performance Java Persistence, Part 4)
    ]
c.v.b.h.h.b.BatchExceptionTest - Batch has managed to process 2 entries

So, from the logged JDBC output, we can see that the third statement is going to conflict with the first one. However, we can pinpoint the failing statements by introspecting the result of the getUpdateCounts method on the java.sql.BatchUpdateException that is thrown by the JDBC Driver.

Because the getUpdateCounts method returns an int[] array with two entries, we know that only two statements were successfully processed. So, the third statement was the one causing the failure.

I'm running an online workshop on the 11th of October about High-Performance SQL.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

To wrap it up, we need to things:

  1. A good way to log JDBC statements, and datasource-proxy is one of the best ways of logging JDBC PreparedStatement(s)
  2. We also need to catch the BatchUpdateException and find the number of statements that were sucessfully executed by calling the getUpdateCounts method

This way, you will be able to locate which batched statement is causing an issue, and this technique is applicable to any production system. You just have to make sure you collect all logs properly (e.g. Logstash) so that you can query them when a problem occurs.

Code available on GitHub.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.