How to find which statement failed in a JDBC Batch Update

Imagine having a tool that can automatically detect if you are using JPA and Hibernate properly. Hypersistence Optimizer is that tool!

Introduction

Yesterday, my Danish friend, Flemming Harms, asked my a very interesting question related to when a JDBC batch update fails.

Basically, considering we are going to group several DML statements in a batch, we need a way to tell which statement is the cause of the failure. This post is going to answer this question in more detail.

Simulating a failure

Considering we have a Post entity, whose identifiers are manually assigned:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    //getters and setters omitted for brevity
}

Now, we are going to use JDBC batching for grouping several INSERT statements. For this purpose, we will use a PreparedStatement since it responds better to JDBC batching than a simple java.sql.Statement.

In the following example, to simulate the failure, we are going to assign the same Primary Key to multiple records so that the database can raise a ConstraintViolationException:

Session session = entityManager.unwrap(Session.class);
session.doWork(connection -> {

    try (PreparedStatement st = connection.prepareStatement(
            "INSERT INTO post (id, title) " +
            "VALUES (?, ?)")) {
        for (long i = 0; i < 5; i++) {
            st.setLong(
                1, 
                i % 2
            );
            st.setString(
                2, 
                String.format(
                    "High-Performance Java Persistence, Part %d", 
                    i
                )
            );
            st.addBatch();
        }
        st.executeBatch();
    } catch (BatchUpdateException e) {
        LOGGER.info(
            "Batch has managed to process {} entries", 
            e.getUpdateCounts().length
        );
    }
});

When running the test case above, Hibernate generates the following output:

c.v.b.h.h.b.BatchExceptionTest - testInsertPosts
n.t.d.l.SLF4JQueryLoggingListener - Name:DATA_SOURCE_PROXY, Time:0, 
Success:False, 
Type:Prepared, 
Batch:True, 
QuerySize:1, 
BatchSize:5, 
Query:[
    "INSERT INTO post (id, title) VALUES (?, ?)"], 
    Params:[
        (0, High-Performance Java Persistence, Part 0), 
        (1, High-Performance Java Persistence, Part 1), 
        (0, High-Performance Java Persistence, Part 2), 
        (1, High-Performance Java Persistence, Part 3), 
        (0, High-Performance Java Persistence, Part 4)
    ]
c.v.b.h.h.b.BatchExceptionTest - Batch has managed to process 2 entries

So, from the logged JDBC output, we can see that the third statement is going to conflict with the first one. However, we can pinpoint the failing statements by introspecting the result of the getUpdateCounts method on the java.sql.BatchUpdateException that is thrown by the JDBC Driver.

Because the getUpdateCounts method returns an int[] array with two entries, we know that only two statements were successfully processed. So, the third statement was the one causing the failure.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

To wrap it up, we need to things:

  1. A good way to log JDBC statements, and datasource-proxy is one of the best ways of logging JDBC PreparedStatement(s)
  2. We also need to catch the BatchUpdateException and find the number of statements that were sucessfully executed by calling the getUpdateCounts method

This way, you will be able to locate which batched statement is causing an issue, and this technique is applicable to any production system. You just have to make sure you collect all logs properly (e.g. Logstash) so that you can query them when a problem occurs.

Code available on GitHub.

FREE EBOOK
Newsletter logo
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence,
  • a 10% discount coupon for my book.

8 Comments on “How to find which statement failed in a JDBC Batch Update

  1. How can I get the getUpdateCounts after calling persist in Hibernate 5.4.4?

    I can observe the BatchUpdateException logged in ERROR log, but I only catch a
    org.hibernate.exception.GenericJDBCException: could not execute batch exception.

    Thanks!

  2. Also, is datasourceProxy a production-grade library. I mean can this be enabled in production or is it meant only for non prod? Will, there be any performance issues, by using a proxydatasource, instead of actual datasource.

    • It can be used in production and the extra overhead is negligible. In reality, Spring already adds tens of proxies anyway, and that wasn’t an issue, right?

  3. I followed the example provided above and it helps me to identify the record causing failure in that particular batch operation.

    But If I have a jpa batch size = 10, but I collect 100 records and call saveAll(), if the failure happens during batch, updateCount does not exactly give the records that were persisted. For eg, if the updateCount = 5, it means 5 records persisted and 6th record failed. But this 6th record could be any 6th record in any one of 10 record iterations until all 100 are completed.

    I’m trying to see how to achieve this. Any pointers are appreciated

    • If you log the items that are saver and the number of processed banches, then you can determine the actual item that failed.

      • But this would be helpful for analytics and to find more details from logs. This does not help in transaction itself. I’m trying to see, if the exact record can be returned back to the users as an exception, when they made a call using webservice. Does Hibernate/BatchUpdatException or any other library help in finding that information.

      • Yes, this is good for log analysis. Sending back the record to the user sounds like the validation was not done on the client or server side.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.