How to optimize JPQL and Criteria API query plans with Hibernate Statistics

(Last Updated On: November 12, 2018)

Introduction

Every entity query, be it JPQL or Criteria API, needs to be parsed and compiled to an AST (Abstract Syntax Tree) in order to generate the associated SQL query. The entity query compilation takes time, as explained in this article so Hibernate provides a QueryPlanCache to store already-compiled plans.

Starting with Hibernate 5.4, the Hibernate Statistics object allows you to monitor the Query Plan Cache and this article will show you how to take advantage of this feature to speed up IN query performance.

Domain Model

Assuming we have the following Post entity:

Post entity with an identifier and a title basic property

Which is mapped like this:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Integer id;

    private String title;

    //Getters and setters omitted for brevity

}

JPQL IN query

Now, let’s see what happens when Hibernate executes a JPQL query which contains an IN clause:

SessionFactory sessionFactory = entityManagerFactory()
    .unwrap(SessionFactory.class);
    
Statistics statistics = sessionFactory.getStatistics();
statistics.clear();

doInJPA(entityManager -> {
    List<Post> posts = entityManager.createQuery(
        "select p " +
        "from Post p " +
        "where p.id in :ids", Post.class)
    .setParameter("ids", Arrays.asList(1, 2, 3))
    .getResultList();
});

for (String query : statistics.getQueries()) {
    LOGGER.info("Executed query: {}", query);
}

To check the executed entity queries, we can use the Statistics#getQueries method. However, by default, Hibernate does not collect any statistics, so we need to enable this feature by setting the hibernate.generate_statistics configuration property to true.

<property name="hibernate.generate_statistics" value="true"/>

Now, when executing the JPQL query above, the following queries are going to be logged:

-- Executed query: 
select p 
from Post p 
where p.id in :ids

-- Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2)

The first query is the one we supplied, while the second one is the expanded query since JDBC does not support binding an array of values as individual IN query parameter values.

Criteria API IN query

Now, let’s translate the previous JPQL query to its Criteria API equivalent:

SessionFactory sessionFactory = entityManagerFactory()
    .unwrap(SessionFactory.class);

Statistics statistics = sessionFactory.getStatistics();
statistics.clear();

doInJPA(entityManager -> {
    CriteriaBuilder builder = entityManager.getCriteriaBuilder();

    CriteriaQuery<Post> criteria = builder.createQuery(Post.class);
    Root<Post> fromPost = criteria.from(Post.class);

    criteria.where(
        builder.in(
            fromPost.get("id")).value(Arrays.asList(1, 2, 3)
        )
    );

    List<Post> posts = entityManager
    .createQuery(criteria)
    .getResultList();
});

for (String query : statistics.getQueries()) {
    LOGGER.info("Executed query: {}", query);
}

When executing the Criteria API query, Hibernate logs the following executing entity queries:

-- Executed query: 
select generatedAlias0 from Post as generatedAlias0 where generatedAlias0.id in (:param0)

-- Executed query: 
select generatedAlias0 from Post as generatedAlias0 where generatedAlias0.id in (:param0_0, :param0_1, :param0_2)

So, every entity query, be it JPQL or Criteria API, will have to be expanded if it contains an IN clause.

Query plan cache and IN clause expansion

Now, if the number of IN clause parameter varies, Hibernate will have to compile a significant number of entity queries, as illustrated by the following example:

SessionFactory sessionFactory = entityManagerFactory()
    .unwrap(SessionFactory.class);
    
Statistics statistics = sessionFactory.getStatistics();
statistics.clear();

doInJPA(entityManager -> {
    for (int i = 1; i < 16; i++) {
        getPostByIds(
            entityManager,
            IntStream
                .range(1, i + 1)
                .boxed()
                .toArray(Integer[]::new)
        );
    }
});

assertEquals(
    16L, 
    statistics.getQueryPlanCacheMissCount()
);

for (String query : statistics.getQueries()) {
    LOGGER.info("Executed query: {}", query);
}

When executing the test case above, Hibernate generates the following output:

--Executed query: 
select p 
from Post p 
where p.id in :ids

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7, :ids_8)

--Executed query: 
select p from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7, :ids_8, :ids_9)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7, :ids_8, :ids_9, :ids_10)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7, :ids_8, :ids_9, :ids_10, :ids_11)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7, :ids_8, :ids_9, :ids_10, :ids_11, :ids_12)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7, :ids_8, :ids_9, :ids_10, :ids_11, :ids_12, :ids_13)

--Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7, :ids_8, :ids_9, :ids_10, :ids_11, :ids_12, :ids_13, :ids_14)

So, although the same JPQL is executed multiple times because of IN clause expansion,
Hibernate generates 15 JPQL queries that need to be parsed and compiled separately.

The statistics.getQueryPlanCacheMissCount() returns a value of 16 which includes the original JPQL query and the 15 JPQL queries resulted after IN clause expansion. For this reason, the Query Plan Cache does not help in this situation.

IN clause parameter padding

As already explained in this article, Hibernate offers the IN clause parameter padding feature which can help us reduce the Query Plan Cache miss count we saw previously.

To enable IN clause parameter padding, you need to enable the following configuration property:

<property name="hibernate.query.in_clause_parameter_padding" value="true"/>

After enabling the hibernate.query.in_clause_parameter_padding property,
we can expect just 6 IN clause query expansions:

Statistics statistics = sessionFactory.getStatistics();
statistics.clear();

doInJPA(entityManager -> {
    for (int i = 1; i < 16; i++) {
        getPostByIds(
            entityManager,
            IntStream
                .range(1, i + 1)
                .boxed()
                .toArray(Integer[]::new)
        );
    }
});

assertEquals(
    6L, 
    statistics.getQueryPlanCacheMissCount()
);

for (String query : statistics.getQueries()) {
    LOGGER.info("Executed query: {}", query);
}

When running the test case above, we can indeed see that only 6 query expansions took place:

-- Executed query: 
select p 
from Post p 
where p.id in :ids

-- Executed query: 
select p 
from Post p 
where p.id in (:ids_0)

-- Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1)

-- Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3)

-- Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7)

-- Executed query: 
select p 
from Post p 
where p.id in (:ids_0, :ids_1, :ids_2, :ids_3, :ids_4, :ids_5, :ids_6, :ids_7, :ids_8, :ids_9, :ids_10, :ids_11, :ids_12, :ids_13, :ids_14, :ids_15)

Cool, right?

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

So, when JPQL or Criteria API entity queries are executed with variable numbers of IN clause bind parameter values, you can improve application performance by enabling the hibernate.query.in_clause_parameter_padding configuration property.

Also, thanks to the Query Plan Cache statistics included in Hibernate 5.4 Statistics, you can get a better understanding of what Hibernate does behind the scenes.

Subscribe to our Newsletter

* indicates required
10 000 readers have found this blog worth following!

If you subscribe to my newsletter, you'll get:
  • A free sample of my Video Course about running Integration tests at warp-speed using Docker and tmpfs
  • 3 chapters from my book, High-Performance Java Persistence, 
  • a 10% discount coupon for my book. 
Get the most out of your persistence layer!

Advertisements

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.