SQL CTE – Common Table Expression
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, we are going to see what an SQL CTE (Common Table Expression) is and how you can use it to reuse subquery result sets or inline views.
Database tables
Let’s assume we have the following post
and post_comment
tables that form a one-to-many table relationship:
The parent post
table contains the following entries:
| id | title | |----|----------| | 1 | SQL:2016 | | 2 | SQL:2011 | | 3 | SQL:2008 | | 4 | JPA 3.0 | | 5 | JPA 2.2 | | 6 | JPA 2.1 | | 7 | JPA 2.0 | | 8 | JPA 1.0 |
And the post_comment
child table contains the following data:
| id | review | post_id | |----|------------------------|---------| | 1 | SQL:2016 is great! | 1 | | 2 | SQL:2016 is excellent! | 1 | | 3 | SQL:2016 is awesome! | 1 | | 4 | SQL:2011 is great! | 2 | | 5 | SQL:2011 is excellent! | 2 | | 6 | SQL:2008 is great! | 3 |
SQL CTE (Common Table Expression)
We now want to get the first two most-commented posts that start with the SQL
prefix along with all their associated comments.
We cannot use a simple Top-N SQL query that defines a FETCH FIRST 2 ROWS ONLY
clause because that would bring the first 2 records of the joined post
and post_comment
result set, and that’s not what we want.
In order to fetch the desired result set, we need to:
- JOIN the
post
andpost_comment
and filter by theSQL
prefix - Use a COUNT Window Function to determine how many comments are associated with each post
- Use the
DENSE_RANK
Window Function to rank the posts by their number of comments - Extract just the first two most-commented posts
Since there are multiple steps involved, we could either use Derived Tables (Inline Views) or a CTE (Common Table Expression) query.
The CTE query uses the WITH
clause to define an alias to a given subquery. For this reason, we can reference the subquery in a subsequent WITH
subquery or in the outer query that builds the final result set.
Our CTE query looks as follows:
WITH p_pc AS ( SELECT p.id AS post_id, p.title AS post_title, pc.id AS comment_id, pc.review AS comment_review, COUNT(post_id) OVER(PARTITION BY post_id) AS comment_count FROM post p LEFT JOIN post_comment pc ON p.id = pc.post_id WHERE p.title LIKE 'SQL%' ), p_pc_r AS ( SELECT post_id, post_title, comment_id, comment_review, DENSE_RANK() OVER (ORDER BY p_pc.comment_count DESC) AS ranking FROM p_pc ) SELECT * FROM p_pc_r WHERE p_pc_r.ranking <= 2 ORDER BY post_id, comment_id
The first subquery is p_pc
and defines the JOIN and the WHERE clause that filters the post
records. The p_pc
query projection includes a COUNT
Window Function that provides the number of comments for each post.
The second subquery is p_pc_r
and, as you can see, it references the p_pc
subquery we have previously defined. The goal of the p_pc_r
subquery is to add a new column that ranks the posts by the number of associated comments. For this purpose, we have used the DENSE_RANK
Window Function.
The last query filters the p_pc_r
query result set and extracts just the top 2 most-commented posts along with their associated comments.
The reason why the WHERE clause in the final query cannot be moved to the p_pc_r
query is that the WHERE clause evaluates before Window Functions. It’s the same argument we’ve been using for writing legacy Top-N queries using Oracle.
And the CTE query result set looks like this:
| post_id | post_title | comment_id | comment_review | ranking | |---------|------------|------------|------------------------|---------| | 1 | SQL:2016 | 1 | SQL:2016 is great! | 1 | | 1 | SQL:2016 | 2 | SQL:2016 is excellent! | 1 | | 1 | SQL:2016 | 3 | SQL:2016 is awesome! | 1 | | 2 | SQL:2011 | 4 | SQL:2011 is great! | 2 | | 2 | SQL:2011 | 5 | SQL:2011 is excellent! | 2 |
Awesome, right?
SQL CTE (Common Table Expression) Gotcha
Not all relational database systems are able to push down predicates from the outer query to the WITH queries. For instance, in PostgreSQL, the Common Table Expression queries are optimization fences.
Since a CTE query can be translated to a Derived Table Or Inline View query, it’s a good idea to compare the Execution Plans of the CTE and the Derived Table queries and see which one performs best.
I'm running an online workshop on the 11th of October about High-Performance SQL.If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
The SQL CTE (Common Table Expression) allows us to decompose a complex query into a series of smaller subqueries that can be referenced afterward.
The CTE (Common Table Expression) WITH
clause is supported by all top relational database systems starting from the following versions:
- Oracle 9i R2
- SQL Server 2005
- PostgreSQL 8.4
- MySQL 8.0.1
- MariaDB 10.2.1
