JPA Criteria API can generate unexpected SQL statements

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to learn why you should always check the SQL statements generated by JPA Criteria API queries when using Hibernate.

Criteria API is very useful for dynamically building queries, but that’s the only use case where I’d use it. Whenever you have a UI with N filters that may arrive in any M combinations, it makes sense to have an API to construct queries dynamically since concatenating strings is always a path I’m running away from.

The question is, are you aware of the SQL queries your Criteria API generates behind the scenes? I’ve been reviewing many such queries lately, and I’ve been struck by how easy it is to get it wrong.

Domain Model

Let’s start from the following entities diagram:

Product

So we have a Product with a ToOne association to a WareHouseProductInfo and a ToMany association to an Image entity.

JPA Criteria API SQL statements

Now let’s start with this Criteria API query:

CriteriaBuilder cb = entityManager
    .getCriteriaBuilder();

CriteriaQuery<Product> query = cb
    .createQuery(Product.class);

Root<Product> productRoot = query
    .from(Product.class);

query
.select(productRoot)
.where(
    cb.and(
        cb.equal(
            productRoot.get(Product_.code), 
            "tvCode"
        ), 
        cb.gt(
            productRoot
                .get(Product_.warehouseProductInfo)
                .get(WarehouseProductInfo_.quantity), 
            50
        )
    )
);

Product product = entityManager
    .createQuery(query)
    .getSingleResult();

Can you spot any issue with this previous query? Let’s check the generated SQL:

SELECT product0_.id AS id1_14_,
       product0_.code AS code2_14_,
       product0_.company_id AS company_5_14_,
       product0_.importer_id AS importer6_14_,
       product0_.name AS name3_14_,
       product0_.version AS version4_14_
FROM product product0_
CROSS JOIN warehouseproductinfo warehousep1_
WHERE product0_.id = warehousep1_.id
  AND product0_.code = ?
  AND warehousep1_.quantity > 50

I was expecting an INNER JOIN, and I got a CROSS JOIN that gets filtered in the WHERE clause instead (e.g., product0_.id = warehousep1_.id). Even if the filtered CROSS JOIN is basically equivalent to an INNER JOIN, it still breaks the Principle of least astonishment.

So, writing Criteria API is not a walk in the park after all.

Luckily, this example can be fixed as follows:

CriteriaBuilder cb = entityManager
    .getCriteriaBuilder();

CriteriaQuery<Product> query = cb
    .createQuery(Product.class);

Root<Product> productRoot = query
    .from(Product.class);

Join<Product, WarehouseProductInfo> warehouseProductInfoJoin = productRoot
    .join(Product_.warehouseProductInfo);

query
.select(productRoot)
.where(
    cb.and(
        cb.equal(
            productRoot.get(Product_.code), 
            "tvCode"
        ),
        cb.gt(
            warehouseProductInfoJoin
                .get(WarehouseProductInfo_.quantity), 
            50
        )
    )
);

Product product = entityManager
    .createQuery(query)
    .getSingleResult();

which yields the expected SQL query:

SELECT product0_.id AS id1_14_,
       product0_.code AS code2_14_,
       product0_.company_id AS company_5_14_,
       product0_.importer_id AS importer6_14_,
       product0_.name AS name3_14_,
       product0_.version AS version4_14_
FROM product product0_
INNER JOIN warehouseproductinfo warehousep1_ ON product0_.id = warehousep1_.id
WHERE product0_.code = ?
  AND warehousep1_.quantity > 50

Writing JPA Criteria API queries is not very easy. The Codota IDE plugin can guide you on how to write such queries, therefore increasing your productivity.

For more details about how you can use Codota to speed up the process of writing Criteria API queries, check out this article.

So, be careful with how you define your joins in Criteria API, as you might get some unexpected SQL queries. Now, let’s compare the previous Criteria API query to its JPQL counterpart:

Product product = entityManager.createQuery(
	"select p " +
	"from Product p " +
	"inner join p.warehouseProductInfo w " +
	"where " +
	"   p.code = :code and " +
	"   w.quantity > :quantity ", Product.class)
.setParameter("code", "tvCode")
.setParameter("quantity", 50)
.getSingleResult();

I’ve always found the JPQL more descriptive than Criteria API, but there are projects where the Criteria API is the default JPA querying mechanism, so it’s not only employed for dynamic filter queries but even for the ones with fixed where clauses.

Well, you can eventually achieve the same results, but while I can predict the SQL query out of a JPQL one when it comes to Criteria API, I am completely clueless. Whenever I review a Criteria query, I always have to run an Integration Test to check the outputted SQL, for small changes can really make big differences.

Even if Criteria API usage is imposed, you can still work around it, considering you are extra careful and you review all your queries.

Now let’s get back to one of the most exotic joining subterfuge (yet sub-optimal) Criteria query I happened to bump into recently. If you work on a large project with many developers, you’ll inevitably run into this type of construct. That’s one extra reason why I prefer JPQL over Criteria API. With JPQL, you can’t get it so off-the-rails like in the following example:

CriteriaBuilder cb = entityManager
    .getCriteriaBuilder();

CriteriaQuery<Product> query = cb
    .createQuery(Product.class);

Root<Product> product = query
    .from(Product.class);

query.select(product);
query.distinct(true);

List<Predicate> criteria = new ArrayList<Predicate>();

criteria.add(
    cb.like(
        cb.lower(product.get(Product_.name)), 
        "%tv%"
    )
);

Subquery<Long> subQuery = query
    .subquery(Long.class);

Root<Image> infoRoot = subQuery.from(Image.class);

Join<Image, Product> productJoin = infoRoot
    .join("product");

subQuery
    .select(productJoin.<Long>get(Product_.id));

subQuery
    .where(cb.gt(infoRoot.get(Image_.index), 0));

criteria.add(
    cb.in(
        product.get(Product_.id)
    )
    .value(subQuery)
);

query.where(
    cb.and(
        criteria.toArray(new Predicate[criteria.size()])
    )
);

List<Product> products = entityManager.createQuery(query)
    .getResultList();

I find these types of queries too difficult to parse through viewing them only, but there is a sub-select smelling like trouble, so let’s see the generated SQL query:

SELECT DISTINCT 
    product0_.id AS id1_14_,
    product0_.code AS code2_14_,
    product0_.company_id AS company_5_14_,
    product0_.importer_id AS importer6_14_,
    product0_.name AS name3_14_,
    product0_.version AS version4_14_
FROM product product0_
WHERE 
    (Lower(product0_.name) LIKE ?) 
    AND ( 
        product0_.id IN (
            SELECT product2_.id
            FROM image image1_
            INNER JOIN product product2_ 
                ON image1_.product_id = product2_.id
            WHERE image1_.index > 0
        )
    ) 

While some use-cases call for a SQL subquery, here it’s just completely unnecessary, and it only slows your query down. But this time we were actually requiring a dynamic filtering query, so JPQL was out of the question. The only way to fix it was to write the proper Criteria query.

So here it is, after a refactoring:

CriteriaBuilder cb = entityManager
    .getCriteriaBuilder();
    
CriteriaQuery<Product> query = cb
    .createQuery(Product.class);
    
Root<Image> imageRoot = query.from(Image.class);

Join<Image, Product> productJoin = imageRoot.join("product");

query.select(productJoin);
query.distinct(true);

List<Predicate> criteria = new ArrayList<Predicate>();

criteria.add(
    cb.like(
        cb.lower(productJoin.get(Product_.name)), 
        "%tv%"
    )
);

criteria.add(
    cb.gt(
        imageRoot.get(Image_.index), 
        0
    )
);

query.where(
    cb.and(
        criteria.toArray(new Predicate[criteria.size()])
    )
);

List<Product> products = entityManager
    .createQuery(query)
    .getResultList();

And now our SQL query looks much better:

SELECT DISTINCT product1_.id AS id1_14_,
                product1_.code AS code2_14_,
                product1_.company_id AS company_5_14_,
                product1_.importer_id AS importer6_14_,
                product1_.name AS name3_14_,
                product1_.version AS version4_14_
FROM image image0_
INNER JOIN product product1_ ON image0_.product_id = product1_.id
WHERE (Lower(product1_.name) LIKE ?)
  AND image0_.index > 0

I’d reasoned on why the developer would have chosen the sub-query in this particular context, and I believed it was because he hadn’t known that he could project a different entity than the Root one, in a similar fashion with a JPQL query.

Now let’s do a DTO projection, since there are times when we don’t need fetching entire Entities, but just enough info to serve our business needs. This time we will create the following query:

CriteriaBuilder cb = entityManager
    .getCriteriaBuilder();
    
CriteriaQuery<ImageProductDTO> query = cb
    .createQuery(ImageProductDTO.class);
    
Root<Image> imageRoot = query.from(Image.class);

Join<Image, Product> productJoin = imageRoot.join(Image_.product);

query.distinct(true);

List<Predicate> criteria = new ArrayList<Predicate>();

criteria.add(
    cb.like(
        cb.lower(
            productJoin.get(Product_.name)
        ), 
        "%tv%"
    )
);

criteria.add(
    cb.gt(
        imageRoot.get(Image_.index), 
        0
    )
);

query.where(
    cb.and(
        criteria.toArray(new Predicate[criteria.size()])
    )
);

query.select(
    cb.construct(
        ImageProductDTO.class, 
        imageRoot.get(Image_.name), 
        productJoin.get(Product_.name)
    )
)
.orderBy(
    cb.asc(
        imageRoot.get(Image_.name)
    )
);

List<ImageProductDTO> productImages = entityManager
    .createQuery(query)
    .getResultList();

Generating a clean SQL:

SELECT DISTINCT image0_.name AS col_0_0_,
                product1_.name AS col_1_0_
FROM image image0_
INNER JOIN product product1_ ON image0_.product_id = product1_.id
WHERE (Lower(product1_.name) LIKE ?)
  AND image0_.index > 0
ORDER  BY image0_.name ASC

Well, this query does not even need Criteria API, as it can be easily be expressed using the following JPQL:

List<ImageProductDTO> productImages = entityManager.createQuery("""
    select 
        new ImageProductDTO(
            i.name,
            p.name
        )
    from Image i
    join i.product p
    where 
        lower(p.name) like :productName and 
        i.index = 0
    order by i.name
    """, ImageProductDTO.class)
.setParameter("productName", "%tv%")
.getResultList();

If you need to build native SQL queries dynamically, then you can use JOOQ. The JPQL query above can be rewritten as follows:

List<ImageProductDTO> productImages = jooqContext
.select(IMAGE.NAME, PRODUCT.NAME)
.from(IMAGE)
.join(PRODUCT).on(IMAGE.PRODUCT_ID.equal(PRODUCT.ID))
.where(PRODUCT.NAME.likeIgnoreCase("%tv%"))
	.and(IMAGE.INDEX.greaterThan(0))
.orderBy(IMAGE.NAME.asc())
.fetch().into(ImageProductDTO.class);

This is way more readable, you don’t really have to guess what’s the output SQL query, and it even generates binding parameters, that I found extremely valuable:

SELECT "PUBLIC"."image"."name",
       "PUBLIC"."product"."name"
FROM   "PUBLIC"."image"
       JOIN "PUBLIC"."product"
         ON "PUBLIC"."image"."product_id" = "PUBLIC"."product"."id"
WHERE  ( Lower("PUBLIC"."product"."name") LIKE Lower('%tv%')
         AND "PUBLIC"."image"."index" > 0 )
ORDER  BY "PUBLIC"."image"."name" ASC  

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

When using Hibernate, it’s very important to validate the SQL statements generated by JPA Criteria API queries.

The first case I showed you is one of the very first mistakes I did while trying to learn the Criteria API. I discovered that I have to be extra cautious when writing such queries since you can easily get unexpected SQL queries.

If you’ve chosen to use Criteria API for all your queries, then you might be interested in checking JOOQ too. Even if you choose JPQL, whenever you want to build advanced dynamic filtered queries, JOOQ can better help you with that.

You will still be using a fluent API, you won’t write any String, and you will get more SQL features than what Hibernate currently offers. So, whenever your use cases don’t require querying for managed Entities, you can use JOOQ instead. I like it because I can predict the generated SQL much better than with Criteria API, and when an API is easier to use, there are fewer “surprises” waiting to “wow” you.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.