How to query parent rows when all children must match the filtering criteria with SQL and Hibernate
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
The Hibernate forum is a never-ending source of inspiration when it comes to solving real-life problems you might bump into when developing an enterprise application.
For instance, this post asks about a JPQL query which should fetch a given parent entity when all its child entities match the given filtering criteria.
How to query parent rows when all children must match the filtering criteria with SQL and Hibernate @vlad_mihalcea https://t.co/lXIDCJXnL0 pic.twitter.com/SL4N0hvjkF
— Java (@java) July 29, 2018
Domain Model
Consider our database contains the following tables which form a many-to-many relationship:
Both the cluster
and the tag
tables are independent relations. For this purpose, they are associated through the cluster_tag
join table.
Our database tables contain the following entries:
The cluster table
id | name |
---|---|
1 | Cluster 1 |
2 | Cluster 2 |
3 | Cluster 3 |
The tag table
id | tag_name | tag_value |
---|---|---|
1 | Spark | 2.2 |
2 | Hadoop | 2.7 |
3 | Spark | 2.3 |
4 | Hadoop | 2.6 |
The cluster_tag table
cluster_id | tag_id |
---|---|
1 | 1 |
1 | 2 |
2 | 1 |
2 | 4 |
3 | 3 |
3 | 4 |
JPA entities
As I explained in this article, a very efficient way of mapping the many-to-many table relationship is to map the join table as a JPA entity.
The Tag
entity looks as follows:
@Entity(name = "Tag") @Table( name = "tag", uniqueConstraints = @UniqueConstraint( columnNames = { "tag_name", "tag_value" } ) ) public class Tag { @Id private Long id; @Column(name = "tag_name") private String name; @Column(name = "tag_value") private String value; //Getters and setters omitted for brevity }
The Cluster
entity is mapped like this:
@Entity(name = "Cluster") @Table(name = "cluster") public class Cluster { @Id private Long id; private String name; @OneToMany( mappedBy = "cluster", cascade = CascadeType.ALL, orphanRemoval = true ) private List<ClusterTag> tags = new ArrayList<>(); //Getters and setters omitted for brevity public void addTag(Tag tag) { tags.add(new ClusterTag(this, tag)); } }
The ClusterTag
entity is mapped as follows:
@Entity(name = "ClusterTag") @Table(name = "cluster_tag") public class ClusterTag { @EmbeddedId private ClusterTagId id; @ManyToOne @MapsId("clusterId") private Cluster cluster; @ManyToOne @MapsId("tagId") private Tag tag; private ClusterTag() {} public ClusterTag(Cluster cluster, Tag tag) { this.cluster = cluster; this.tag = tag; this.id = new ClusterTagId( cluster.getId(), tag.getId() ); } //Getters and setters omitted for brevity }
As explained in this article, because the ClusterTag
entity has a composite identifier, we are using the ClusterTagId
embeddable which looks as follows:
@Embeddable public class ClusterTagId implements Serializable { @Column(name = "cluster_id") private Long clusterId; @Column(name = "tag_id") private Long tagId; public ClusterTagId() {} public ClusterTagId( Long clusterId, Long tagId) { this.clusterId = clusterId; this.tagId = tagId; } //Getters omitted for brevity @Override public boolean equals(Object o) { if (this == o) return true; if (o == null || getClass() != o.getClass()) return false; ClusterTagId that = (ClusterTagId) o; return Objects.equals(clusterId, that.clusterId) && Objects.equals(tagId, that.tagId); } @Override public int hashCode() { return Objects.hash(clusterId, tagId); } }
That’s it.
The problem
We now want to fetch cluster
records having tag
entries that match the following two conditions:
- The
tag
name
isSpark
and thevalue
is2.2
or - The
tag
name
isHadoop
and thevalue
is2.7
Now, let’s try various ways to solve this problem.
A broken first attempt
One of the first queries that might come to your mind is to simply join all associations and filter by the required criteria:
List<Cluster> clusters = entityManager .createQuery( "select distinct c " + "from ClusterTag ct " + "join ct.cluster c " + "join ct.tag t " + "where " + " (t.name = :tagName1 and t.value = :tagValue1) or " + " (t.name = :tagName2 and t.value = :tagValue2) " , Cluster.class) .setParameter("tagName1", "Spark") .setParameter("tagValue1", "2.2") .setParameter("tagName2", "Hadoop") .setParameter("tagValue2", "2.7") .getResultList();
However, this query returns 2 results: Cluster1
and Cluster2
because both of them have a tag
row matching one of the two conditions.
But, we don’t want that! We want the cluster
records for which all associated tag
match either the first or the second predicate.
Native SQL – JOIN solution
Before figuring out how to solve this problem in JPQL, it’s better to try it with plain SQL.
One solution would be to select the cluster_tag
entries whose associated tag
rows match the filtering criteria and since we expect 2 matches, count the number of matches so that we filter out the tag
not matching all conditions.
By joining the cluster
table with the table result of the cluster_tag
inner query, we can get the desired result:
List<Cluster> clusters = entityManager .createNativeQuery( "SELECT * " + "FROM cluster c " + "JOIN (" + " SELECT ct.cluster_id AS c_id " + " FROM cluster_tag ct " + " JOIN tag t ON ct.tag_id = t.id " + " WHERE " + " (t.tag_name = :tagName1 AND t.tag_value = :tagValue1) OR " + " (t.tag_name = :tagName2 AND t.tag_value = :tagValue2) " + " GROUP BY ct.cluster_id " + " HAVING COUNT(*) = 2" + ") ct1 on c.id = ct1.c_id ", Cluster.class) .setParameter("tagName1", "Spark") .setParameter("tagValue1", "2.2") .setParameter("tagName2", "Hadoop") .setParameter("tagValue2", "2.7") .getResultList();
However, this query has one big disadvantage. We just want the Cluster
entity, so the topmost join will generate more work on the database side (especially if the underlying database only supports Nested Loops) which could be avoided if we rewrite the query to use a semi-join instead. More, the SQL query above cannot be expressed in JPQL, which might be a problem if we want to build the query dynamically using Criteria API.
Native SQL – SEMI-JOIN solution
The SEMI-JOIN query looks like this:
List<Cluster> clusters = entityManager .createNativeQuery( "SELECT * " + "FROM cluster c " + "WHERE EXISTS (" + " SELECT ct.cluster_id as c_id " + " FROM cluster_tag ct " + " JOIN tag t ON ct.tag_id = t.id " + " WHERE " + " c.id = ct.cluster_id AND ( " + " (t.tag_name = :tagName1 AND t.tag_value = :tagValue1) OR " + " (t.tag_name = :tagName2 AND t.tag_value = :tagValue2) " + " )" + " GROUP BY ct.cluster_id " + " HAVING COUNT(*) = 2 " + ") ", Cluster.class) .setParameter("tagName1", "Spark") .setParameter("tagValue1", "2.2") .setParameter("tagName2", "Hadoop") .setParameter("tagValue2", "2.7") .getResultList();
Not only that this is more efficient since, ultimately, we are only selecting and projecting the cluster
records, but the query is even easier to read and can be adapted to JPQL or Criteria API as well.
JPQL – SEMI-JOIN solution with explicit association joining
As already stated, the SEMI-JOIN query can be rewritten to JPQL as follows:
List<Cluster> clusters = entityManager.createQuery( "select c " + "from Cluster c " + "where exists (" + " select ctc.id " + " from ClusterTag ct " + " join ct.cluster ctc " + " join ct.tag ctt " + " where " + " c.id = ctc.id and ( " + " (ctt.name = :tagName1 and ctt.value = :tagValue1) or " + " (ctt.name = :tagName2 and ctt.value = :tagValue2) " + " )" + " group by ctc.id " + " having count(*) = 2" + ") ", Cluster.class) .setParameter("tagName1", "Spark") .setParameter("tagValue1", "2.2") .setParameter("tagName2", "Hadoop") .setParameter("tagValue2", "2.7") .getResultList();
Although explicit joins are usually preferred when writing JPQL queries, this time, it looks like Hibernate issues a useless JOIN between cluster_tag
and cluster
in the inner query:
SELECT c.id AS id1_0_, c.NAME AS name2_0_ FROM cluster c WHERE EXISTS ( SELECT ctc.id FROM cluster_tag ct INNER JOIN cluster ctc ON ct.cluster_id = ctc.id INNER JOIN tag ctt ON ct.tag_id = ctt.id WHERE c.id = ctc.id AND ( ctt.tag_name = ? AND ctt.tag_value = ? OR ctt.tag_name = ? AND ctt.tag_value = ? ) GROUP BY ctc.id HAVING COUNT(*) = 2 )
Notice the INNER JOIN cluster ctc ON ct.cluster_id = ctc.id
redundant join which we’d like to avoid.
JPQL – SEMI-JOIN solution with implicit association joining
Rewriting the previous query to use an implicit join for ClusterTag.cluster
can be done as follows:
List<Cluster> clusters = entityManager .createQuery( "select c " + "from Cluster c " + "where exists (" + " select ct.cluster.id " + " from ClusterTag ct " + " join ct.tag ctt " + " where " + " c.id = ct.cluster.id and ( " + " (ctt.name = :tagName1 and ctt.value = :tagValue1) or " + " (ctt.name = :tagName2 and ctt.value = :tagValue2) " + " )" + " group by ct.cluster.id " + " having count(*) = 2" + ") ", Cluster.class) .setParameter("tagName1", "Spark") .setParameter("tagValue1", "2.2") .setParameter("tagName2", "Hadoop") .setParameter("tagValue2", "2.7") .getResultList();
When running the JPQL query above, Hibernate generates the following SQL query:
SELECT c.id AS id1_0_, c.NAME AS name2_0_ FROM cluster c WHERE EXISTS ( SELECT ct.cluster_id FROM cluster_tag ct INNER JOIN tag ctt ON ct.tag_id = ctt.id WHERE c.id = ct.cluster_id AND ( ctt.tag_name = ? AND ctt.tag_value = ? OR ctt.tag_name = ? AND ctt.tag_value = ? ) GROUP BY ct.cluster_id HAVING COUNT(*) = 2 )
Great!
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
When it comes to fetching data, it’s best to envision the SQL query first, and only afterward, use a JPQL or Criteria API query to fetch entities that we plan on modifying.
