How to batch INSERT statements with MySQL and Hibernate
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, we are going to see how we can batch INSERT statements when using MySQL and Hibernate.
While Hibernate has long supported automated JDBC batch inserts, this feature doesn’t work when using the IDENTITY identifier generator strategy. Unfortunately, MySQL doesn’t support SEQUENCE objects, so using IDENTITY is the only reasonable option.
Therefore, I’m going to show you a technique you can use to get Hibernate batch INSERT statements for entities that use the IDENTITY generator.
Default batch INSERT with MySQL and Hibernate
Let’s say we have created the following Post
entity into our application.
The Post
entity is mapped as follows:
@Entity @Table(name = "post") public class Post extends AbstractPost<Post> { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Long id; public Long getId() { return id; } public Post setId(Long id) { this.id = id; return this; } }
The AbstractPost
is the base class of the Post
entity and maps the rest of the post
table columns:
@MappedSuperclass public abstract class AbstractPost<T extends AbstractPost> { private String title; @Column(name = "created_on") private LocalDateTime createdOn = LocalDateTime.now(); @Column(name = "created_by") private String createdBy; @Column(name = "updated_on") private LocalDateTime updatedOn = LocalDateTime.now(); @Column(name = "updated_by") private String updatedBy; @Version private Short version; public String getTitle() { return title; } public T setTitle(String title) { this.title = title; return (T) this; } public LocalDateTime getCreatedOn() { return createdOn; } //Getters and setters omitted for brevity }
The Post
entity has an associated PostRepository
:
@Repository public interface PostRepository extends BaseJpaRepository<Post, Long> { }
The
BaseJpaRepository
is a replacement for the default Spring Data JPAJpaRepository
that removes the dangerousfindAll
and replaces the annoyingsave
method withpersist
andmerge
.For more details about the
BaseJpaRepository
, check out this article.
We als ohave a ForumService
that provides a createPosts
method that splits the List
of Post
entities into batches and calls the transactional persistAll
method on the PostRepository
:
@Service @Transactional(readOnly = true) public class ForumService { private static final Logger LOGGER = LoggerFactory.getLogger( ForumService.class ); private static final ExecutorService executorService = Executors .newFixedThreadPool( Runtime.getRuntime().availableProcessors() ); private final PostRepository postRepository; private final TransactionTemplate transactionTemplate; private final int batchProcessingSize; public ForumService( @Autowired PostRepository postRepository, @Autowired TransactionTemplate transactionTemplate, @Autowired int batchProcessingSize) { this.postRepository = postRepository; this.transactionTemplate = transactionTemplate; this.batchProcessingSize = batchProcessingSize; } public List<Post> findByIds(List<Long> ids) { return postRepository.findAllById(ids); } public Post findById(Long id) { return postRepository.findById(id).orElse(null); } @Transactional(propagation = Propagation.NEVER) public void createPosts(List<Post> posts) { CollectionUtils.spitInBatches(posts, batchProcessingSize) .map(postBatch -> executorService.submit(() -> { try { transactionTemplate.execute( (status) -> postRepository.persistAll(postBatch) ); } catch (TransactionException e) { LOGGER.error("Batch transaction failure", e); } })) .forEach(future -> { try { future.get(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } catch (ExecutionException e) { LOGGER.error("Batch execution failure", e); } }); } }
To enable the automatic batching feature, we are going to set the following Hibernate configuration properties:
properties.setProperty( AvailableSettings.STATEMENT_BATCH_SIZE, String.valueOf(batchProcessingSize()) ); properties.setProperty( AvailableSettings.ORDER_INSERTS, Boolean.TRUE.toString() ); properties.setProperty( AvailableSettings.ORDER_UPDATES, Boolean.TRUE.toString() );
Now, when persisting several Post
entities:
List<Post> posts = LongStream.rangeClosed(1, POST_COUNT) .mapToObj(postId -> new Post() .setId(postId) .setTitle( String.format( "High-Performance Java Persistence - Page %d", postId ) ) .setCreatedBy("Vlad Mihalcea") .setUpdatedBy("Vlad Mihalcea") ) .toList(); forumService.createPosts(posts);
Hibernate will generate the following SQL INSERT statements:
[pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 1, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 2, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 3, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 4, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 5, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 6, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 7, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 8, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7550989, High-Performance Java Persistence - Page 9, Vlad Mihalcea, 2023-10-18 18:25:57.7550989, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7560986, High-Performance Java Persistence - Page 10, Vlad Mihalcea, 2023-10-18 18:25:57.7560986, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7560986, High-Performance Java Persistence - Page 11, Vlad Mihalcea, 2023-10-18 18:25:57.7560986, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7560986, High-Performance Java Persistence - Page 12, Vlad Mihalcea, 2023-10-18 18:25:57.7561532, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7561532, High-Performance Java Persistence - Page 13, Vlad Mihalcea, 2023-10-18 18:25:57.7561532, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7561532, High-Performance Java Persistence - Page 14, Vlad Mihalcea, 2023-10-18 18:25:57.7561532, 0)] [pool-1-thread-1]: Batch:False, QuerySize:1, BatchSize:0, Query:["insert into post (created_by, created_on, title, updated_by, updated_on, version) values (?, ?, ?, ?, ?, ?)"], Params:[(Vlad Mihalcea, 2023-10-18 18:25:57.7561532, High-Performance Java Persistence - Page 15, Vlad Mihalcea, 2023-10-18 18:25:57.7561532, 0)]
So, while we enabled JDBC batching, it obviously doesn’t work since our Post
entity uses the IDENTITY
identifier generator.
The problem with the IDENTITY
generator is that the entity identifier is assigned when calling persist
by firing the INSERT
statement right away, so when the Persistence Context is flushed and the ActionQueue
attempts to batch the pending statements, there won’t be any INSERT statement to be batched.
Automatic batch INSERT with MySQL and Hibernate
To fix the aforementioned issue, we will create the following BatchInsertPost
entity that maps the post
table, just like the Post
entity:
@Entity @Table(name = "post") @SQLInsert(sql = """ INSERT INTO post ( created_by, created_on, title, updated_by, updated_on, version ) VALUES ( ?, ?, ?, ?, ?, ? ) """) public class BatchInsertPost extends AbstractPost<BatchInsertPost> { @Id @Column(insertable = false) @GeneratedValue(generator = "noop_generator") @GenericGenerator( name = "noop_generator", strategy = "com.vladmihalcea.hpjp.spring.stateless.domain.NoOpGenerator" ) private Long id; public Long getId() { return id; } public BatchInsertPost setId(Long id) { this.id = id; return this; } public static BatchInsertPost valueOf(Post post) { return new BatchInsertPost() .setId(post.getId()) .setTitle(post.getTitle()) .setCreatedBy(post.getCreatedBy()) .setCreatedOn(post.getCreatedOn()) .setUpdatedBy(post.getUpdatedBy()) .setUpdatedOn(post.getUpdatedOn()) .setVersion(post.getVersion()); } }
We use the @SQLInsert
annotation to define an explicit INSERT
statement so that Hibernate doesn’t try to set the null
identifier provided by the NoOpGenerator
, which looks as follows:
public class NoOpGenerator implements StandardGenerator { @Override public Object generate( SharedSessionContractImplementor session, Object obj) { return null; } }
The NoOpGenerator
returns a null
value, as we will not really use the identifier generator at all to provide the entity identifiers.
Next, we need to change the PostRepository
so that we extend a CustomPostRepository
interface that allows us to override the default BaseJpaRepository
implementation methods:
@Repository public interface PostRepository extends BaseJpaRepository<Post, Long>, CustomPostRepository<Post> { }
The CustomPostRepository
defines the persistAll
method that we wish to override from the BaseJpaRepositoryImpl
:
public interface CustomPostRepository<T> { <S extends T> List<S> persistAll(Iterable<S> entities); }
The CustomPostRepositoryImpl
class implements the CustomPostRepository
interface like this:
public class CustomPostRepositoryImpl implements CustomPostRepository<Post> { private final EntityManager entityManager; private final StatelessSessionBuilder statelessSessionBuilder; private final Integer batchProcessingSize; public CustomPostRepositoryImpl( EntityManager entityManager, StatelessSessionBuilder statelessSessionBuilder, Integer batchProcessingSize) { this.entityManager = entityManager; this.statelessSessionBuilder = statelessSessionBuilder; this.batchProcessingSize = batchProcessingSize; } @Override public <S extends Post> List<S> persistAll(Iterable<S> entities) { final StatelessSession statelessSession = statelessSessionBuilder .connection( entityManager .unwrap(Session.class) .doReturningWork(connection -> connection) ) .openStatelessSession(); try { statelessSession.setJdbcBatchSize(batchProcessingSize); return StreamSupport.stream(entities.spliterator(), false) .peek(entity -> { statelessSession.insert(BatchInsertPost.valueOf(entity)); }) .collect(Collectors.toList()); } catch (Exception e) { throw new HibernateException(e); } finally { JdbcSessionOwner jdbcSessionOwner = ((JdbcSessionOwner) statelessSession); jdbcSessionOwner.flushBeforeTransactionCompletion(); statelessSession.close(); } } }
The persistAll
method iterates over the provided Post
entities and calls insert
on a Hibernate StatelessSession
, providing a BatchInsertPost
instance instead of the original Post
entity.
And, when rerunning our test case, we can see that Hibernate now generates the batch INSERT statement instead:
[pool-1-thread-1]: Batch:True, QuerySize:1, BatchSize:15, Query:[" INSERT INTO post ( created_by, created_on, title, updated_by, updated_on, version ) VALUES ( ?, ?, ?, ?, ?, ? ) "], Params:[ (Vlad Mihalcea, 2023-10-18 18:57:02.9516365, High-Performance Java Persistence - Page 1, Vlad Mihalcea, 2023-10-18 18:57:02.9516365, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9516365, High-Performance Java Persistence - Page 2, Vlad Mihalcea, 2023-10-18 18:57:02.9516365, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9516365, High-Performance Java Persistence - Page 3, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 4, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 5, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 6, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 7, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 8, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 9, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 10, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 11, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 12, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 13, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 14, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0), (Vlad Mihalcea, 2023-10-18 18:57:02.9526331, High-Performance Java Persistence - Page 15, Vlad Mihalcea, 2023-10-18 18:57:02.9526331, 0) ]
Awesome, right?
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
While Hibernate can automatically batch INSERT statements when you are using the SEQUENCE
identifier generator, this is not the case for the IDENTITY
strategy.
Since MySQL doesn’t have SEQUENCE objects, like MariaDB, you are stuck with the IDENTITY
generator that cannot benefit from automatic batch inserts.
Luckily, there is a way for you to overcome this issue, so you can have Hibernate batch INSERT statements even for MySQL.
