The best way to test the data access layer

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see what is the best way to test the data access layer when using a relational database system.

First, we will talk about the pros and cons of unit testing. While unit testing is fine for validating independent logic units, when it comes to testing the data access layer, integration testing is more useful.

Next, we will discuss about the difference between integration testing and system integration testing and see the benefits of using a data platform like Aiven. With Aiven, not only can we replicate the actual behavior of our data access layer in a production-like environment, but we can also simplify our architecture since we would be using the same database platform for both system integration testing and the production system.

Unit Testing

Unit testing is the process of testing an individual functionality.

For instance, let’s assume we have the following String join method:

public static String join(
        CharSequence delimiter, 
        CharSequence... elements) {
    StringBuilder builder = new StringBuilder();
⠀
    boolean first = true;
⠀
    for (CharSequence element : elements) {
        if(first) {
            first = false;
        } else {
            builder.append(delimiter);
        }
        builder.append(element);
    }
⠀
    return builder.toString();
}

The join method is from the StringUtils utility provided by the Hypersistence Utils project, which allows us to join several String objects.

A unit test can check the correctness of this join method since this particular method doesn’t have any external dependency we need to set in order for the method to be executed.

Therefore, for the join method, it’s fine to use the following unit test:

@Test
public void testJoin(){
    assertEquals(
        "Oracle,PostgreSQL,MySQL,SQL Server",
        StringUtils.join(
            ",", 
            "Oracle", 
            "PostgreSQL", 
            "MySQL", 
            "SQL Server"
        )
    );
}

The unit test above checks if the return value of the join method call matches the expected value for the provided arguments.

While unit testing is extremely useful when validating the correctness of individual logic units, it falls short when it comes to testing your data access layer. That’s because the data access layer cannot function without a database system, and a unit test will require us to mock any external dependency. Therefore, we would simply be testing the way we called the underlying data access framework API, not the way the overall flow of the code will be executed at runtime.

For example, let’s consider we have the following firstLatestPosts method in a ForumService Spring bean:

@Service
@Transactional(readOnly = true)
public class ForumService {
⠀
    private final PostRepository postRepository;
⠀
    public ForumService(
            PostRepository postRepository) {
        this.postRepository = postRepository;
    }
⠀
    public PagedList<Post> firstLatestPosts(
            int pageSize) {
        return postRepository.findTopN(
            Sort
                .by(Post_.CREATED_ON)
                .descending()
                .and(
                    Sort
                        .by(Post_.ID)
                        .descending()
                ),
            pageSize
        );
    }
}

The firstLatestPosts method calls the findTopN method from the PostRepository using a given sorting criteria and a maximum size for the returning result set.

A unit test for the firstLatestPosts method might look like this:

@Test
public void testFirstLatestPosts() {
    final int PAGE_SIZE = 25;
⠀
    PostRepository postRepository = Mockito.mock(PostRepository.class);
    ForumService forumService = new ForumService(postRepository);
⠀
    PagedList<Post> pagedList = Mockito.mock(PagedList.class);
    when(
        postRepository.findTopN(
            any(Sort.class),
            eq(PAGE_SIZE)
        )
    ).thenReturn(pagedList);
⠀
    PagedList<Post> topPage = forumService.firstLatestPosts(PAGE_SIZE);
⠀
    assertSame(pagedList, topPage);
}

In the above test case, we used Mockito to create Mocks for our dependencies. The Mocks allow us to control the behavior of the dependency so that it executes the logic demanded by our unit test. For example, the when call allows us to specify that we want the findTopN method call to return the pagedList Object reference.

However, using Mocks is not very useful since we are not really testing the query execution and whether the database engine really applies the desired filtering and sorting criteria.

So, integration testing is much more valuable when we have to test the data access layer.

Integration Testing

Integration Testing allows us to verify the data exchange between the data access layer and the database system. Therefore, when using integration testing, we can validate:

The correctness of the generated SQL statements.
The number of generated SQL statements to validate batching or to prevent N+1 query issues.
The database transaction and concurrency control guarantees. Otherwise, we risk race conditions, as explained in this article.

To write an integration test for our firstLatestPosts, we can use Spring to define the DataSource that will be used by the Spring Data JPA PostRepository in order to get a database connection on which we send the SQL queries to the database server:

@Bean
public Database database() {
    return Database.POSTGRESQL;
}

@Bean
public DataSourceProvider dataSourceProvider() {
    return database().dataSourceProvider();
}

@Bean(destroyMethod = "close")
public HikariDataSource actualDataSource() {
    HikariConfig hikariConfig = new HikariConfig();
    hikariConfig.setMaximumPoolSize(64);
    hikariConfig.setAutoCommit(false);
    hikariConfig.setDataSource(dataSourceProvider().dataSource());
    return new HikariDataSource(hikariConfig);
}

Now, when calling the firstLatestPosts on the ForumService bean that the Spring Application Context has injected, the underlying PostRepository will execute the SQL query against the previously configured database system:

@Autowired
private ForumService forumService;
⠀
@Test
public void testFirstLatestPosts() {
    PagedList<Post> topPage = forumService.firstLatestPosts(PAGE_SIZE);
⠀
    assertEquals(POST_COUNT, topPage.getTotalSize());
    assertEquals(POST_COUNT / PAGE_SIZE, topPage.getTotalPages());
    assertEquals(1, topPage.getPage());
    List<Long> topIds = topPage.stream()
        .map(Post::getId)
        .toList();
    assertEquals(Long.valueOf(50), topIds.get(0));
    assertEquals(Long.valueOf(49), topIds.get(1));
}

When running this integration test on PostgreSQL, we can see that Hibernate generates the following SQL query:

Query:["
    SELECT 
        p1_0.id,
        p1_0.created_on,
        p1_0.title,
        (
            SELECT count(*) 
            FROM post p2_0
        ) 
    FROM 
        post p1_0 
    ORDER BY 
        p1_0.created_on DESC NULLS LAST,
        p1_0.id DESC 
    OFFSET ? ROWS 
    FETCH FIRST ? ROWS ONLY
"], 
Params:[(0, 25)]

When it comes to using a relational database for testing, there are several common options:

an in-memory database, like H2 or HSQLDB
a local DB or a local Docker container
Testcontainers

Although it’s rather common to use in-memory databases for integration testing, as I explained in this article, this is a rather bad choice since you won’t be able to test database-specific queries or stored procedures that you need to run in production.

It’s very important to use the same database engine that you also run in production to validate the very same behavior that the users are going to experience.

While using an in-memory database system could help you test your portable queries, in reality, you will still need to execute your integration tests on the actual database engine that the production system uses.

One option is to install a local database or use a Docker container that runs the database system your integration tests could use.

Another option is to use Testcontainers, which helps you start and stop the Docker container programmatically, as I explained in this article.

In the High-Performance Java Persistence GitHub repository, the AbstractContainerDataSourceProvider tries to configure a JDBC DataSource if there is a local database matching the JDBC connection configuration settings or bootstrap a Testcontainers database if no local database was found:

public DataSource dataSource() {
    DataSource dataSource = newDataSource();
    try(Connection connection = dataSource.getConnection()) {
        return dataSource;
    } catch (SQLException e) {
        Database database = database();
        if(database.getContainer() == null) {
            database.initContainer(username(), password());
        }
        return newDataSource();
    }
}

The downside of testing against a local database or a local Docker container is that we cannot include the impact of networking since both the integration test and the database system run on the same machine.

If you want to check the networking impact, check out this website, which measures the roundtrip time between your current location and various AWS data centers.

Without batching enabled, inserting data is significantly faster if the database is hosted locally, compared to when a networking call is involved in every statement execution. On the other hand, if you are using an external database system, then most likely you will want to enable JDBC batching, as explained in this article.

System Integration Testing

System integration testing is the next level of testing and complements the existing unit and integration testing suites.

A system integration test involves all the product layers, and it’s usually done in a QA environment that needs to use a database system that’s very close to the one used in production. Unlike an integration test that uses a local database, a system integration test will use an external database system, therefore including the cost of networking in our tests.

To simplify the management and hosting of your database system, a platform like Aiven can really help you achieve your goal.

Aiven is a data platform that allows you to run relational databases (e.g., PostgreSQL and MySQL), NoSQL databases (e.g., Redis, Cassandra), event streaming services (e.g., Kafka, Flink), or analytics services (e.g., ClickHouse, InfluxDB).

Not only is it easy to host and manage your database system when using Aiven, but you can use the same platform for both the production and the QA environments.

The closer you are to the production system, the more relevant the QA environment gets. While it’s common to use Docker in a QA environment, unless you are using the very same engine with the same database settings and statistics, then you are not going to test the same behavior the clients will experience.

As illustrated by this AivenPostgreSQLDataSourceProvider configuration class, using Aiven is the same as using any other relational database system.

If we configure our system integration test to use the AivenPostgreSQLDataSourceProvider:

@Bean
public DataSourceProvider dataSourceProvider() {
    return new AivenPostgreSQLDataSourceProvider();
}

When running the previous test case on Aiven, we can see that the query execution takes just 42 milliseconds even if the database server is hosted in the cloud:

Time:42, 
Success:True,
Query:["
    SELECT 
        p1_0.id,
        p1_0.created_on,
        p1_0.title,
        (
            SELECT count(*) 
            FROM post p2_0
        ) 
    FROM 
        post p1_0 
    ORDER BY 
        p1_0.created_on DESC NULLS LAST,
        p1_0.id DESC 
    OFFSET ? ROWS 
    FETCH FIRST ? ROWS ONLY
"], 
Params:[(0, 25)]

And that’s not all. A lesser-known fact is that database systems, like PostgreSQL or MySQL, come with default settings that are meant for desktops or notebooks, not for dedicated database server systems. A platform like Aiven can help you avoid this issue since you would be using the same database service for both your QA and production environments.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

When it comes to software development, a well-thought-out testing suite can help you provide a high level of quality that your users expect and demand. The better the user experience, the lower the churn rate, and the greater the business revenue.

To test your data access layer, it’s very important to use both integration testing and system integration testing. Integration tests can be run locally by developers prior to submitting a Pull Request, while the system integration tests can run in a QA environment that validates the overall functionality of the system.

If you want to give Aiven a try, then you can use this link to signup for free.

This research was funded by Aiven and conducted in accordance with the blog ethics policy.

While the article was written independently and reflects entirely my opinions and conclusions, the amount of work involved in making this article happen was compensated by Aiven.