SQL CTE – Common Table Expression

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see what an SQL CTE (Common Table Expression) is and how you can use it to reuse subquery result sets or inline views.

Database tables

Let’s assume we have the following post and post_comment tables that form a one-to-many table relationship:

SQL CTE (Common Table Expression) Tables

The parent post table contains the following entries:

| id | title    |
|----|----------|
| 1  | SQL:2016 |
| 2  | SQL:2011 |
| 3  | SQL:2008 |
| 4  | JPA 3.0  |
| 5  | JPA 2.2  |
| 6  | JPA 2.1  |
| 7  | JPA 2.0  |
| 8  | JPA 1.0  |

And the post_comment child table contains the following data:

| id | review                 | post_id |
|----|------------------------|---------|
| 1  | SQL:2016 is great!     | 1       |
| 2  | SQL:2016 is excellent! | 1       |
| 3  | SQL:2016 is awesome!   | 1       |
| 4  | SQL:2011 is great!     | 2       |
| 5  | SQL:2011 is excellent! | 2       |
| 6  | SQL:2008 is great!     | 3       |

SQL CTE (Common Table Expression)

We now want to get the first two most-commented posts that start with the SQL prefix along with all their associated comments.

We cannot use a simple Top-N SQL query that defines a FETCH FIRST 2 ROWS ONLY clause because that would bring the first 2 records of the joined post and post_comment result set, and that’s not what we want.

In order to fetch the desired result set, we need to:

  • JOIN the post and post_comment and filter by the SQL prefix
  • Use a COUNT Window Function to determine how many comments are associated with each post
  • Use the DENSE_RANK Window Function to rank the posts by their number of comments
  • Extract just the first two most-commented posts

Since there are multiple steps involved, we could either use Derived Tables (Inline Views) or a CTE (Common Table Expression) query.

The CTE query uses the WITH clause to define an alias to a given subquery. For this reason, we can reference the subquery in a subsequent WITH subquery or in the outer query that builds the final result set.

Our CTE query looks as follows:

WITH 
p_pc AS (
    SELECT
        p.id AS post_id,
        p.title AS post_title,
        pc.id AS comment_id,
        pc.review AS comment_review,
        COUNT(post_id) OVER(PARTITION BY post_id) AS comment_count
    FROM post p
    LEFT JOIN post_comment pc ON p.id = pc.post_id
    WHERE p.title LIKE 'SQL%'
),
p_pc_r AS (
    SELECT
        post_id,
        post_title,
        comment_id,
        comment_review,
        DENSE_RANK() OVER (ORDER BY p_pc.comment_count DESC) AS ranking
    FROM p_pc
)
SELECT *
FROM p_pc_r
WHERE p_pc_r.ranking <= 2
ORDER BY post_id, comment_id

The first subquery is p_pc and defines the JOIN and the WHERE clause that filters the post records. The p_pc query projection includes a COUNT Window Function that provides the number of comments for each post.

The second subquery is p_pc_r and, as you can see, it references the p_pc subquery we have previously defined. The goal of the p_pc_r subquery is to add a new column that ranks the posts by the number of associated comments. For this purpose, we have used the DENSE_RANK Window Function.

The last query filters the p_pc_r query result set and extracts just the top 2 most-commented posts along with their associated comments.

The reason why the WHERE clause in the final query cannot be moved to the p_pc_r query is that the WHERE clause evaluates before Window Functions. It’s the same argument we’ve been using for writing legacy Top-N queries using Oracle.

And the CTE query result set looks like this:

| post_id | post_title | comment_id | comment_review         | ranking |
|---------|------------|------------|------------------------|---------|
| 1       | SQL:2016   | 1          | SQL:2016 is great!     | 1       |
| 1       | SQL:2016   | 2          | SQL:2016 is excellent! | 1       |
| 1       | SQL:2016   | 3          | SQL:2016 is awesome!   | 1       |
| 2       | SQL:2011   | 4          | SQL:2011 is great!     | 2       |
| 2       | SQL:2011   | 5          | SQL:2011 is excellent! | 2       |

Awesome, right?

SQL CTE (Common Table Expression) Gotcha

Not all relational database systems are able to push down predicates from the outer query to the WITH queries. For instance, in PostgreSQL, the Common Table Expression queries are optimization fences.

Since a CTE query can be translated to a Derived Table Or Inline View query, it’s a good idea to compare the Execution Plans of the CTE and the Derived Table queries and see which one performs best.

I'm running an online workshop on the 11th of October about High-Performance SQL.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

The SQL CTE (Common Table Expression) allows us to decompose a complex query into a series of smaller subqueries that can be referenced afterward.

The CTE (Common Table Expression) WITH clause is supported by all top relational database systems starting from the following versions:

  • Oracle 9i R2
  • SQL Server 2005
  • PostgreSQL 8.4
  • MySQL 8.0.1
  • MariaDB 10.2.1
Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.