The best way to clean up test data with Spring and Hibernate

Introduction

In this article, I’m going to show you the best way to clean up test data when using Spring and Hibernate.

While it’s very common to use the @DataJpaTest annotation when implementing integration tests, in this blog post, you will see why you are better off avoiding this annotation.

Why I don’t use @DataJpaTest

As explained by the manual, Spring Boot provides the @DataJpaTest annotation for Spring Data JPA integration testing.

Once the @DataJpaTest annotation is added to an integration test, Spring Boot will make the following changes to the test configuration:

  • It auto-configures Spring beans, such as JpaRepository, DataSource, Flyway, JPA EntityManagerFactory, Spring JpaTransactionManager, etc.
  • It logs SQL statements to the console using the spring.jpa.show-sql property.
  • It adds @Transactional to every test and rolls back the transaction after each test execution, as illustrated in the sketch below.
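
For reference, here’s a minimal sketch of what such a test looks like (the repository and its query method are the ones used later in this article; the assertion itself is illustrative):

import static org.junit.jupiter.api.Assertions.assertNull;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.orm.jpa.DataJpaTest;

@DataJpaTest
public class AccountRepositoryTest {

    @Autowired
    private AccountRepository accountRepository;

    @Test
    public void testFindByIban() {
        // Runs inside the test-managed transaction that Spring
        // rolls back once the test method returns.
        assertNull(accountRepository.findByIbanWithHolder("Missing-000"));
    }
}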

While autoconfiguring the JPA-related Spring beans is useful, enabling the spring.jpa.show-sql property and wrapping each test in a @Transactional context are not.

Logging best practices

As I explained in this article, the spring.jpa.show-sql property is very limited. Not only is it restricted to printing the SQL statements to the console, but it cannot print the bind parameter values or indicate whether batching is used.

Therefore, you are better off using a more capable statement logging framework, like DataSource-Proxy, which can also help you detect N+1 query issues automatically during testing.
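
For instance, here’s a minimal sketch of how you could wrap the auto-configured DataSource with DataSource-Proxy so that every statement is logged along with its bind parameter values (the configuration class and logger name are illustrative):

import javax.sql.DataSource;

import net.ttddyy.dsproxy.listener.logging.SLF4JLogLevel;
import net.ttddyy.dsproxy.support.ProxyDataSourceBuilder;
import org.springframework.boot.autoconfigure.jdbc.DataSourceProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DataSourceProxyConfiguration {

    @Bean
    public DataSource dataSource(DataSourceProperties properties) {
        // The actual connection pool, built from the standard
        // spring.datasource.* properties.
        DataSource actualDataSource = properties
            .initializeDataSourceBuilder()
            .build();

        // Wrap it in a proxy that logs every statement along with
        // its bind parameter values.
        return ProxyDataSourceBuilder
            .create(actualDataSource)
            .name("SQL-LOGGER")
            .logQueryBySlf4j(SLF4JLogLevel.INFO)
            .multiline()
            .build();
    }
}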

Transaction management best practices

While wrapping the test execution in a @Transactional context that is rolled back after the test run may seem like a very convenient strategy, in reality, this can be a very problematic setup.

The closer you get to testing the actual logic that will run in production, the more resilient the test suite becomes. For the data access logic, we are interested in testing business methods that start in the service layer and that may call multiple Repositories.

In production, the EntityManager is closed at the end of the service method call, so any Proxy left uninitialized triggers a LazyInitializationException as soon as it is accessed afterwards. By replicating that behavior in our tests, we can catch such problems before they reach production. If we wrap the entire test method in a @Transactional block instead, the transaction context gets propagated to the service layer, which inherits the EntityManager from the test, and this EntityManager can still initialize Proxies after the service method call, hiding the bug.
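
To make this concrete, here’s a hypothetical snippet that assumes a findAccount service method returning an Account whose holder association is mapped as lazy:

// Hypothetical service method that returns a now-detached Account
// without fetching its lazy holder association.
Account account = transferService.findAccount("Alice-123");

// Without a test-level @Transactional, the next line fails with a
// LazyInitializationException, exactly as it would in production.
// With @Transactional on the test, the EntityManager inherited by
// the service layer is still open here, so the Proxy initializes
// silently and the bug goes undetected.
String holderName = account.getHolder().getFirstName();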

Moreover, the default FlushModeType.AUTO triggers a flush right before the transaction commit. When the test owns the transaction, that flush no longer happens at the end of the service method call, leading to very different behavior during testing compared to what happens at runtime in the production environment. By changing the transaction boundaries, some statements might not get executed, and database consistency might not be properly covered by the integration test.

Therefore, we should not wrap the test execution in a @Transactional block at all; transaction management belongs to the service layer. However, without the test-level rollback, every transaction commits, and the data gets changed by the currently executing test, so we need to clean it up after the test execution.
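
In practice, this means the test class carries no transactional plumbing at all. Here’s a minimal sketch, using the components introduced in the next section:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
public class TransferServiceTest {

    @Autowired
    private TransferService transferService;

    @Autowired
    private AccountRepository accountRepository;

    // Note: no class-level @Transactional. Every transferService.transfer
    // call starts and commits its own transaction, just like in production.
}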

The best way to clean up test data with Spring and Hibernate

Let’s say that we want to test the transfer method from the following TransferService Spring component:

@Service
public class TransferService {

    private final AccountRepository accountRepository;

    public TransferService(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    @Transactional
    public long transfer(String fromIban, String toIban, long cents) {
        Account fromAccount = accountRepository.findByIbanWithHolder(fromIban);

        long fromBalance = fromAccount.getBalance();

        if (fromBalance >= cents) {
            accountRepository.setBalance(fromIban, Math.negateExact(cents));
            accountRepository.setBalance(toIban, cents);
        }

        return accountRepository.getBalance(fromIban);
    }
}

We want to create two test scenarios in the TransferServiceTest class, one for the serial execution and another one for the parallel execution, which will look as follows:

@Test
public void testSerialExecution() {
    assertEquals(10L, accountRepository.getBalance("Alice-123"));
    assertEquals(0L, accountRepository.getBalance("Bob-456"));

    transferService.transfer("Alice-123", "Bob-456", 5L);

    assertEquals(5L, accountRepository.getBalance("Alice-123"));
    assertEquals(5L, accountRepository.getBalance("Bob-456"));

    transferService.transfer("Alice-123", "Bob-456", 5L);

    assertEquals(0L, accountRepository.getBalance("Alice-123"));
    assertEquals(10L, accountRepository.getBalance("Bob-456"));

    transferService.transfer("Alice-123", "Bob-456", 5L);

    assertEquals(0L, accountRepository.getBalance("Alice-123"));
    assertEquals(10L, accountRepository.getBalance("Bob-456"));
}

// The maximum connection count is limited to 64 by the Hikari maximum pool size
private int threadCount = 16;

@Test
public void testParallelExecution() throws InterruptedException {
    assertEquals(10L, accountRepository.getBalance("Alice-123"));
    assertEquals(0L, accountRepository.getBalance("Bob-456"));

    CountDownLatch startLatch = new CountDownLatch(1);
    CountDownLatch endLatch = new CountDownLatch(threadCount);

    for (int i = 0; i < threadCount; i++) {
        new Thread(() -> {
            try {
                startLatch.await();

                transferService.transfer("Alice-123", "Bob-456", 5L);
            } catch (Exception e) {
                LOGGER.error("Transfer failed", e);
            } finally {
                endLatch.countDown();
            }
        }).start();
    }
    LOGGER.info("Starting threads");
    startLatch.countDown();
    endLatch.await();

    LOGGER.info("Alice's balance {}", accountRepository.getBalance("Alice-123"));
    LOGGER.info("Bob's balance {}", accountRepository.getBalance("Bob-456"));
}

As you can see, both tests need two Account records, one for Alice, who has a balance value of 10, and another one for Bob, who has a balance value of 0.

Since both tests modify the initial data and commit the changes after the transfer method is called, we need a way to clean up the test data, and that could be done like this:

@BeforeEach
public void init() {
    entityManagerFactory
        .unwrap(SessionFactoryImplementor.class)
        .getSchemaManager()
        .truncateMappedObjects();

    transactionTemplate.execute((TransactionCallback<Void>)
            transactionStatus -> {
        AccountHolder alice = new AccountHolder()
            .setId(1L).setCountry(Country.ROMANIA)
            .setFirstName("Alice").setLastName("Smith");

        AccountHolder bob = new AccountHolder()
            .setId(2L).setCountry(Country.US)
            .setFirstName("Bob").setLastName("Johnson");

        entityManager.persist(alice);
        entityManager.persist(bob);

        entityManager.persist(
            new Account()
                .setIban("Alice-123").setHolder(alice).setBalance(10)
        );

        entityManager.persist(
            new Account()
                .setIban("Bob-456").setHolder(bob).setBalance(0)
        );

        return null;
    });
}
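
For completeness, the init method above assumes that the following fields are injected into the test class; a minimal sketch:

// EntityManagerFactory and EntityManager come from jakarta.persistence,
// TransactionTemplate from org.springframework.transaction.support.
@Autowired
private EntityManagerFactory entityManagerFactory;

@PersistenceContext
private EntityManager entityManager;

@Autowired
private TransactionTemplate transactionTemplate;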

Since version 6.2, Hibernate ORM provides the SchemaManager utility, which allows us to interact with the database schema programmatically. For example, as illustrated in the init method above, we can use the truncateMappedObjects method to truncate all the database tables associated with the JPA entities mapped in our project.

While we could execute the cleanup in an @AfterEach method, this is not recommended: in case of a failure, or if we stop the test execution while debugging, the cleanup would never run. So, to make sure that the cleanup is executed reliably, it’s much better to run it at the beginning of each test execution, prior to adding the test data.

If you want to check out this example, you can find it on GitHub.

Conclusion

As I explained in this article, integration testing is the only reasonable way to validate the data access logic. Unit tests that mock the database or the Repositories are not very helpful since you want to test the query execution and the way in which the query response is constructed from the JDBC ResultSet.

And, to validate the transaction management logic, integration tests need to emulate the same behavior that would run in production, meaning that we need the service layer to commit transactions. To clean up the test data after every test execution, it’s best to truncate the underlying database tables and reinsert the initial data prior to every test run.

While you could also drop the database schema and recreate it for every test execution, this would slow down the test execution significantly and would also interfere with the Spring ApplicationContext caching, so it’s better to create the schema only once using an automatic schema migration tool like Flyway and clean up only the data instead.
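
In a Spring Boot application, that setup could boil down to something like the following properties (values are illustrative; the Flyway migration scripts are assumed to live in the default db/migration location):

# Create the schema once via Flyway migrations
spring.flyway.enabled=true

# Let Hibernate validate the existing schema instead of recreating it
spring.jpa.hibernate.ddl-auto=validate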

2 Comments on “The best way to clean up test data with Spring and Hibernate”

  1. Hello, very interesting.

    It enabled us to avoid having to track all the objects created within our IT tests and delete them individually through our APIs. This greatly simplified our before/after test logic and spared us from producing a lot of logs regarding the deletion process.

    BUT we went down a big rabbit hole because a couple of our tests were annotated with @Transactional, which caused the Maven Surefire process to hang silently. It was also due to the fact that we were doing the cleaning in the afterEach instead of the beforeEach callback.

    Also, PostgreSQL’s TRUNCATE can be less efficient than simple DELETE statements when you consider tables with a small amount of data in them (cf. https://stackoverflow.com/questions/11419536/postgresql-truncation-speed/11423886#11423886).

    Without optimization, the truncate strategy was nearly doubling our full test pass duration.

    So we managed to get the same performance as our previous test cleaning implementation by using some PostgreSQL fine-tuning to deactivate the crash-safety options (at the risk of data corruption on a crash), which boils down to: fsync=off, full_page_writes=false, synchronous_commit=off (also, PostgreSQL 18 seems to offer the ability to deactivate the vacuum-like pass of a TRUNCATE call).

    For the record, the PostgreSQL Testcontainers implementation already sets the fsync=off option.

    • Thanks for the tips about the PostgreSQL test optimizations. You can also map the data folder on tmpfs in Testcontainers. That might speed up TRUNCATE enough that you won’t need the rest of the settings, since all changes are done in RAM, not on disk.
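
    For example, a sketch of such a Testcontainers setup (the image tag and tmpfs mapping are illustrative):

    import java.util.Map;

    import org.testcontainers.containers.PostgreSQLContainer;
    import org.testcontainers.utility.DockerImageName;

    // The PostgreSQL data directory is mounted on tmpfs, so all writes,
    // including the TRUNCATE bookkeeping, happen in RAM rather than on disk.
    PostgreSQLContainer<?> postgres =
        new PostgreSQLContainer<>(DockerImageName.parse("postgres:16"))
            .withTmpFs(Map.of("/var/lib/postgresql/data", "rw"));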
