How to query parent rows when all children must match the filtering criteria with SQL and Hibernate
Are you struggling with performance issues in your Spring, Jakarta EE, or Java EE application?
What if there were a tool that could automatically detect what caused performance issues in your JPA and Hibernate data access layer?
Wouldn’t it be awesome to have such a tool to watch your application and prevent performance issues during development, long before they affect production systems?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, Micronaut, or Play Framework.
So, rather than fixing performance issues in your production system on a Saturday night, you are better off using Hypersistence Optimizer to help you prevent those issues so that you can spend your time on the things that you love!
Introduction
The Hibernate forum is a never-ending source of inspiration when it comes to solving real-life problems you might bump into when developing an enterprise application.
For instance, this post asks about a JPQL query which should fetch a given parent entity when all its child entities match the given filtering criteria.
How to query parent rows when all children must match the filtering criteria with SQL and Hibernate @vlad_mihalcea https://t.co/lXIDCJXnL0 pic.twitter.com/SL4N0hvjkF
— Java (@java) July 29, 2018
Domain Model
Consider our database contains the following tables which form a many-to-many relationship:
Both the cluster and the tag tables are independent relations. For this purpose, they are associated through the cluster_tag join table.
Our database tables contain the following entries:
The cluster table
| id | name |
|---|---|
| 1 | Cluster 1 |
| 2 | Cluster 2 |
| 3 | Cluster 3 |
The tag table
| id | tag_name | tag_value |
|---|---|---|
| 1 | Spark | 2.2 |
| 2 | Hadoop | 2.7 |
| 3 | Spark | 2.3 |
| 4 | Hadoop | 2.6 |
The cluster_tag table
| cluster_id | tag_id |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 4 |
| 3 | 3 |
| 3 | 4 |
JPA entities
As I explained in this article, a very efficient way of mapping the many-to-many table relationship is to map the join table as a JPA entity.
The Tag entity looks as follows:
@Entity(name = "Tag")
@Table(
name = "tag",
uniqueConstraints = @UniqueConstraint(
columnNames = {
"tag_name",
"tag_value"
}
)
)
public class Tag {
@Id
private Long id;
@Column(name = "tag_name")
private String name;
@Column(name = "tag_value")
private String value;
//Getters and setters omitted for brevity
}
The Cluster entity is mapped like this:
@Entity(name = "Cluster")
@Table(name = "cluster")
public class Cluster {
@Id
private Long id;
private String name;
@OneToMany(
mappedBy = "cluster",
cascade = CascadeType.ALL,
orphanRemoval = true
)
private List<ClusterTag> tags = new ArrayList<>();
//Getters and setters omitted for brevity
public void addTag(Tag tag) {
tags.add(new ClusterTag(this, tag));
}
}
The ClusterTag entity is mapped as follows:
@Entity(name = "ClusterTag")
@Table(name = "cluster_tag")
public class ClusterTag {
@EmbeddedId
private ClusterTagId id;
@ManyToOne
@MapsId("clusterId")
private Cluster cluster;
@ManyToOne
@MapsId("tagId")
private Tag tag;
private ClusterTag() {}
public ClusterTag(Cluster cluster, Tag tag) {
this.cluster = cluster;
this.tag = tag;
this.id = new ClusterTagId(
cluster.getId(),
tag.getId()
);
}
//Getters and setters omitted for brevity
}
As explained in this article, because the ClusterTag entity has a composite identifier, we are using the ClusterTagId embeddable which looks as follows:
@Embeddable
public class ClusterTagId
implements Serializable {
@Column(name = "cluster_id")
private Long clusterId;
@Column(name = "tag_id")
private Long tagId;
public ClusterTagId() {}
public ClusterTagId(
Long clusterId,
Long tagId) {
this.clusterId = clusterId;
this.tagId = tagId;
}
//Getters omitted for brevity
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass())
return false;
ClusterTagId that = (ClusterTagId) o;
return Objects.equals(clusterId, that.clusterId) &&
Objects.equals(tagId, that.tagId);
}
@Override
public int hashCode() {
return Objects.hash(clusterId, tagId);
}
}
That’s it.
The problem
We now want to fetch cluster records having tag entries that match the following two conditions:
- The
tagnameisSparkand thevalueis2.2or - The
tagnameisHadoopand thevalueis2.7
Now, let’s try various ways to solve this problem.
A broken first attempt
One of the first queries that might come to your mind is to simply join all associations and filter by the required criteria:
List<Cluster> clusters = entityManager
.createQuery(
"select distinct c " +
"from ClusterTag ct " +
"join ct.cluster c " +
"join ct.tag t " +
"where " +
" (t.name = :tagName1 and t.value = :tagValue1) or " +
" (t.name = :tagName2 and t.value = :tagValue2) "
, Cluster.class)
.setParameter("tagName1", "Spark")
.setParameter("tagValue1", "2.2")
.setParameter("tagName2", "Hadoop")
.setParameter("tagValue2", "2.7")
.getResultList();
However, this query returns 2 results: Cluster1 and Cluster2 because both of them have a tag row matching one of the two conditions.
But, we don’t want that! We want the cluster records for which all associated tag match either the first or the second predicate.
Native SQL – JOIN solution
Before figuring out how to solve this problem in JPQL, it’s better to try it with plain SQL.
One solution would be to select the cluster_tag entries whose associated tag rows match the filtering criteria and since we expect 2 matches, count the number of matches so that we filter out the tag not matching all conditions.
By joining the cluster table with the table result of the cluster_tag inner query, we can get the desired result:
List<Cluster> clusters = entityManager
.createNativeQuery(
"SELECT * " +
"FROM cluster c " +
"JOIN (" +
" SELECT ct.cluster_id AS c_id " +
" FROM cluster_tag ct " +
" JOIN tag t ON ct.tag_id = t.id " +
" WHERE " +
" (t.tag_name = :tagName1 AND t.tag_value = :tagValue1) OR " +
" (t.tag_name = :tagName2 AND t.tag_value = :tagValue2) " +
" GROUP BY ct.cluster_id " +
" HAVING COUNT(*) = 2" +
") ct1 on c.id = ct1.c_id ", Cluster.class)
.setParameter("tagName1", "Spark")
.setParameter("tagValue1", "2.2")
.setParameter("tagName2", "Hadoop")
.setParameter("tagValue2", "2.7")
.getResultList();
However, this query has one big disadvantage. We just want the Cluster entity, so the topmost join will generate more work on the database side (especially if the underlying database only supports Nested Loops) which could be avoided if we rewrite the query to use a semi-join instead. More, the SQL query above cannot be expressed in JPQL, which might be a problem if we want to build the query dynamically using Criteria API.
Native SQL – SEMI-JOIN solution
The SEMI-JOIN query looks like this:
List<Cluster> clusters = entityManager
.createNativeQuery(
"SELECT * " +
"FROM cluster c " +
"WHERE EXISTS (" +
" SELECT ct.cluster_id as c_id " +
" FROM cluster_tag ct " +
" JOIN tag t ON ct.tag_id = t.id " +
" WHERE " +
" c.id = ct.cluster_id AND ( " +
" (t.tag_name = :tagName1 AND t.tag_value = :tagValue1) OR " +
" (t.tag_name = :tagName2 AND t.tag_value = :tagValue2) " +
" )" +
" GROUP BY ct.cluster_id " +
" HAVING COUNT(*) = 2 " +
") ", Cluster.class)
.setParameter("tagName1", "Spark")
.setParameter("tagValue1", "2.2")
.setParameter("tagName2", "Hadoop")
.setParameter("tagValue2", "2.7")
.getResultList();
Not only that this is more efficient since, ultimately, we are only selecting and projecting the cluster records, but the query is even easier to read and can be adapted to JPQL or Criteria API as well.
JPQL – SEMI-JOIN solution with explicit association joining
As already stated, the SEMI-JOIN query can be rewritten to JPQL as follows:
List<Cluster> clusters = entityManager.createQuery(
"select c " +
"from Cluster c " +
"where exists (" +
" select ctc.id " +
" from ClusterTag ct " +
" join ct.cluster ctc " +
" join ct.tag ctt " +
" where " +
" c.id = ctc.id and ( " +
" (ctt.name = :tagName1 and ctt.value = :tagValue1) or " +
" (ctt.name = :tagName2 and ctt.value = :tagValue2) " +
" )" +
" group by ctc.id " +
" having count(*) = 2" +
") ", Cluster.class)
.setParameter("tagName1", "Spark")
.setParameter("tagValue1", "2.2")
.setParameter("tagName2", "Hadoop")
.setParameter("tagValue2", "2.7")
.getResultList();
Although explicit joins are usually preferred when writing JPQL queries, this time, it looks like Hibernate issues a useless JOIN between cluster_tag and cluster in the inner query:
SELECT c.id AS id1_0_,
c.NAME AS name2_0_
FROM cluster c
WHERE EXISTS (
SELECT ctc.id
FROM cluster_tag ct
INNER JOIN cluster ctc ON ct.cluster_id = ctc.id
INNER JOIN tag ctt ON ct.tag_id = ctt.id
WHERE c.id = ctc.id AND (
ctt.tag_name = ? AND ctt.tag_value = ? OR
ctt.tag_name = ? AND ctt.tag_value = ?
)
GROUP BY ctc.id
HAVING COUNT(*) = 2
)
Notice the INNER JOIN cluster ctc ON ct.cluster_id = ctc.id redundant join which we’d like to avoid.
JPQL – SEMI-JOIN solution with implicit association joining
Rewriting the previous query to use an implicit join for ClusterTag.cluster can be done as follows:
List<Cluster> clusters = entityManager
.createQuery(
"select c " +
"from Cluster c " +
"where exists (" +
" select ct.cluster.id " +
" from ClusterTag ct " +
" join ct.tag ctt " +
" where " +
" c.id = ct.cluster.id and ( " +
" (ctt.name = :tagName1 and ctt.value = :tagValue1) or " +
" (ctt.name = :tagName2 and ctt.value = :tagValue2) " +
" )" +
" group by ct.cluster.id " +
" having count(*) = 2" +
") ", Cluster.class)
.setParameter("tagName1", "Spark")
.setParameter("tagValue1", "2.2")
.setParameter("tagName2", "Hadoop")
.setParameter("tagValue2", "2.7")
.getResultList();
When running the JPQL query above, Hibernate generates the following SQL query:
SELECT c.id AS id1_0_,
c.NAME AS name2_0_
FROM cluster c
WHERE EXISTS (
SELECT ct.cluster_id
FROM cluster_tag ct
INNER JOIN tag ctt ON ct.tag_id = ctt.id
WHERE
c.id = ct.cluster_id AND (
ctt.tag_name = ? AND ctt.tag_value = ? OR
ctt.tag_name = ? AND ctt.tag_value = ?
)
GROUP BY ct.cluster_id
HAVING COUNT(*) = 2
)
Great!
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
When it comes to fetching data, it’s best to envision the SQL query first, and only afterward, use a JPQL or Criteria API query to fetch entities that we plan on modifying.







