How to batch INSERT statements with MySQL and Hibernate

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see how we can batch INSERT statements when using MySQL and Hibernate.

While Hibernate has long supported automated JDBC batch inserts, this feature doesn’t work when using the IDENTITY identifier generator strategy. Unfortunately, MySQL doesn’t support SEQUENCE objects, so using IDENTITY is the only reasonable option.

Therefore, I’m going to show you a technique you can use to get Hibernate batch INSERT statements for entities that use the IDENTITY generator.

Default batch INSERT with MySQL and Hibernate

Let’s say we have created the following Post entity into our application.

Post entity

The Post entity is mapped as follows:

@Entity
@Table(name = "post")
public class Post extends AbstractPost<Post> {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    public Long getId() {
        return id;
    }

    public Post setId(Long id) {
        this.id = id;
        return this;
    }
}

The AbstractPost is the base class of the Post entity and maps the rest of the post table columns:

@MappedSuperclass
public abstract class AbstractPost<T extends AbstractPost> {

    private String title;

    @Column(name = "created_on")
    private LocalDateTime createdOn = LocalDateTime.now();

    @Column(name = "created_by")
    private String createdBy;

    @Column(name = "updated_on")
    private LocalDateTime updatedOn = LocalDateTime.now();

    @Column(name = "updated_by")
    private String updatedBy;

    @Version
    private Short version;

    public String getTitle() {
        return title;
    }

    public T setTitle(String title) {
        this.title = title;
        return (T) this;
    }

    public LocalDateTime getCreatedOn() {
        return createdOn;
    }

    //Getters and setters omitted for brevity
}

The Post entity has an associated PostRepository:

@Repository
public interface PostRepository extends BaseJpaRepository<Post, Long> {

}

The BaseJpaRepository is a replacement for the default Spring Data JPA JpaRepository that removes the dangerous findAll and replaces the annoying save method with persist and merge.

For more details about the BaseJpaRepository, check out this article.

We als ohave a ForumService that provides a createPosts method that splits the List of Post entities into batches and calls the transactional persistAll method on the PostRepository:

@Service
@Transactional(readOnly = true)
public class ForumService {

    private static final Logger LOGGER = LoggerFactory.getLogger(
        ForumService.class
    );

    private static final ExecutorService executorService = Executors
        .newFixedThreadPool(
            Runtime.getRuntime().availableProcessors()
        );

    private final PostRepository postRepository;

    private final TransactionTemplate transactionTemplate;

    private final int batchProcessingSize;

    public ForumService(
        @Autowired PostRepository postRepository,
        @Autowired TransactionTemplate transactionTemplate,
        @Autowired int batchProcessingSize) {
        this.postRepository = postRepository;
        this.transactionTemplate = transactionTemplate;
        this.batchProcessingSize = batchProcessingSize;
    }
    
    public List<Post> findByIds(List<Long> ids) {
        return postRepository.findAllById(ids);
    }

    public Post findById(Long id) {
        return postRepository.findById(id).orElse(null);
    }

    @Transactional(propagation = Propagation.NEVER)
    public void createPosts(List<Post> posts) {
        CollectionUtils.spitInBatches(posts, batchProcessingSize)
            .map(postBatch -> executorService.submit(() -> {
                try {
                    transactionTemplate.execute(
                        (status) -> postRepository.persistAll(postBatch)
                    );
                } catch (TransactionException e) {
                    LOGGER.error("Batch transaction failure", e);
                }
            }))
            .forEach(future -> {
                try {
                    future.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (ExecutionException e) {
                    LOGGER.error("Batch execution failure", e);
                }
            });
    }
}

To enable the automatic batching feature, we are going to set the following Hibernate configuration properties:

properties.setProperty(
    AvailableSettings.STATEMENT_BATCH_SIZE,
    String.valueOf(batchProcessingSize())
);
properties.setProperty(
    AvailableSettings.ORDER_INSERTS,
    Boolean.TRUE.toString()
);
properties.setProperty(
    AvailableSettings.ORDER_UPDATES,
    Boolean.TRUE.toString()
);

Now, when persisting several Post entities:

List<Post> posts = LongStream.rangeClosed(1, POST_COUNT)
    .mapToObj(postId -> new Post()
        .setId(postId)
        .setTitle(
            String.format(
                "High-Performance Java Persistence - Page %d",
                postId
            )
        )
        .setCreatedBy("Vlad Mihalcea")
        .setUpdatedBy("Vlad Mihalcea")
    )
    .toList();

forumService.createPosts(posts);

Hibernate will generate the following SQL INSERT statements:

[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 1, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 2, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 3, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 4, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 5, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 6, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 7, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 8, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 9, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7560986, High-Performance Java Persistence - Page 10, Vlad Mihalcea, 2023-10-18 18:25:57.7560986, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7560986, High-Performance Java Persistence - Page 11, Vlad Mihalcea, 2023-10-18 18:25:57.7560986, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7560986, High-Performance Java Persistence - Page 12, Vlad Mihalcea, 2023-10-18 18:25:57.7561532, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7561532, High-Performance Java Persistence - Page 13, Vlad Mihalcea, 2023-10-18 18:25:57.7561532, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7561532, High-Performance Java Persistence - Page 14, Vlad Mihalcea, 2023-10-18 18:25:57.7561532, 0)]
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7561532, High-Performance Java Persistence - Page 15, Vlad Mihalcea, 2023-10-18 18:25:57.7561532, 0)]

So, while we enabled JDBC batching, it obviously doesn’t work since our Post entity uses the IDENTITY identifier generator.

The problem with the IDENTITY generator is that the entity identifier is assigned when calling persist by firing the INSERT statement right away, so when the Persistence Context is flushed and the ActionQueue attempts to batch the pending statements, there won’t be any INSERT statement to be batched.

Automatic batch INSERT with MySQL and Hibernate

To fix the aforementioned issue, we will create the following BatchInsertPost entity that maps the post table, just like the Post entity:

@Entity
@Table(name = "post")
@SQLInsert(sql = """
INSERT INTO post (
    created_by, created_on, title,
    updated_by, updated_on, version
)
VALUES (
    ?, ?, ?,
    ?, ?, ?
)
""")
public class BatchInsertPost extends AbstractPost<BatchInsertPost> {

    @Id
    @Column(insertable = false)
    @GeneratedValue(generator = "noop_generator")
    @GenericGenerator(
        name = "noop_generator",
        strategy = "com.vladmihalcea.hpjp.spring.stateless.domain.NoOpGenerator"
    )
    private Long id;

    public Long getId() {
        return id;
    }

    public BatchInsertPost setId(Long id) {
        this.id = id;
        return this;
    }

    public static BatchInsertPost valueOf(Post post) {
        return new BatchInsertPost()
            .setId(post.getId())
            .setTitle(post.getTitle())
            .setCreatedBy(post.getCreatedBy())
            .setCreatedOn(post.getCreatedOn())
            .setUpdatedBy(post.getUpdatedBy())
            .setUpdatedOn(post.getUpdatedOn())
            .setVersion(post.getVersion());
    }
}

We use the @SQLInsert annotation to define an explicit INSERT statement so that Hibernate doesn’t try to set the null identifier provided by the NoOpGenerator, which looks as follows:

public class NoOpGenerator implements StandardGenerator {
    @Override
    public Object generate(
            SharedSessionContractImplementor session, 
            Object obj) {
        return null;
    }
}

The NoOpGenerator returns a null value, as we will not really use the identifier generator at all to provide the entity identifiers.

Next, we need to change the PostRepository so that we extend a CustomPostRepository interface that allows us to override the default BaseJpaRepository implementation methods:

@Repository
public interface PostRepository extends BaseJpaRepository<Post, Long>, 
    CustomPostRepository<Post> {

}

The CustomPostRepository defines the persistAll method that we wish to override from the BaseJpaRepositoryImpl:

public interface CustomPostRepository<T> {
    <S extends T> List<S> persistAll(Iterable<S> entities);
}

The CustomPostRepositoryImpl class implements the CustomPostRepository interface like this:

public class CustomPostRepositoryImpl implements CustomPostRepository<Post> {

    private final EntityManager entityManager;

    private final StatelessSessionBuilder statelessSessionBuilder;

    private final Integer batchProcessingSize;

    public CustomPostRepositoryImpl(
            EntityManager entityManager,
            StatelessSessionBuilder statelessSessionBuilder,
            Integer batchProcessingSize) {
        this.entityManager = entityManager;
        this.statelessSessionBuilder = statelessSessionBuilder;
        this.batchProcessingSize = batchProcessingSize;
    }

    @Override
    public <S extends Post> List<S> persistAll(Iterable<S> entities) {
        final StatelessSession statelessSession = statelessSessionBuilder
            .connection(
                entityManager
                    .unwrap(Session.class)
                    .doReturningWork(connection -> connection)
            )
            .openStatelessSession();
        try {
            statelessSession.setJdbcBatchSize(batchProcessingSize);

            return StreamSupport.stream(entities.spliterator(), false)
                .peek(entity -> {
                    statelessSession.insert(BatchInsertPost.valueOf(entity));
                })
                .collect(Collectors.toList());
        } catch (Exception e) {
            throw new HibernateException(e);
        } finally {
            JdbcSessionOwner jdbcSessionOwner = ((JdbcSessionOwner) statelessSession);
            jdbcSessionOwner.flushBeforeTransactionCompletion();
            statelessSession.close();
        }
    }
}

The persistAll method iterates over the provided Post entities and calls insert on a Hibernate StatelessSession, providing a BatchInsertPost instance instead of the original Post entity.

And, when rerunning our test case, we can see that Hibernate now generates the batch INSERT statement instead:

[pool-1-thread-1]: Batch:True, QuerySize:1, BatchSize:15, 
Query:["
	INSERT INTO post (
		created_by, created_on, title,
		updated_by, updated_on, version
	)
	VALUES (
		?, ?, ?,
		?, ?, ?
	)
"], 
Params:[
	(Vlad Mihalcea, 2023-10-18 18:57:02.9516365, High-Performance Java Persistence - Page 1, Vlad Mihalcea, 2023-10-18 18:57:02.9516365, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9516365, High-Performance Java Persistence - Page 2, Vlad Mihalcea, 2023-10-18 18:57:02.9516365, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9516365, High-Performance Java Persistence - Page 3, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 4, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 5, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 6, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 7, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 8, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 9, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 10, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 11, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 12, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 13, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 14, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), 
	(Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 15, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0)
]

Awesome, right?

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While Hibernate can automatically batch INSERT statements when you are using the SEQUENCE identifier generator, this is not the case for the IDENTITY strategy.

Since MySQL doesn’t have SEQUENCE objects, like MariaDB, you are stuck with the IDENTITY generator that cannot benefit from automatic batch inserts.

Luckily, there is a way for you to overcome this issue, so you can have Hibernate batch INSERT statements even for MySQL.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.