SQL LEFT JOIN – A Beginner’s Guide
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, we are going to see how LEFT JOIN works in SQL, and how we can use it to associate rows belonging to different tables and build compound result sets.
Unlike JOIN or INNER JOIN, LEFT JOIN is an OUTER JOIN. So, LEFT JOIN and LEFT OUTER JOIN are equivalent. Other OUTER JOIN types are RIGHT OUTER JOIN and FULL OUTER JOIN. However, since the SQL OUTER keyword is superfluous, it’s usually omitted.
Database tables
To demonstrate how LEFT JOIN works, we are going to use the following two tables, post
and post_comment
, which form a one-to-many table relationship via the post_id
Foreign Key column in the post_comment
table that references the id
Primary Key column in the post
table:
The post
table has the following 3 rows:
| id | title | |----|-----------| | 1 | Java | | 2 | Hibernate | | 3 | JPA |
and the post_comment
table has the following 3 records:
| id | review | post_id | |----|-----------|---------| | 1 | Good | 1 | | 2 | Excellent | 1 | | 3 | Awesome | 2 |
So, the first record in the post
table has two associated child records in the post_comment
. The second record in the post
table has one child record in the post_comment
while the third record in the post
table has no associated child.
SQL LEFT JOIN
In SQL, the JOIN clause allows us to associate rows that belong to different tables. For example, CROSS JOIN creates a Cartesian Product that contains all possible combinations of rows between the two joining tables.
Even if CROSS JOIN is useful in certain situations, we usually want to associate tables based on a specific condition. Just like JOIN or INNER JOIN, LEFT JOIN provides an ON clause that defines how the two tables are to be associated.
For instance, if we execute the following SQL LEFT JOIN query:
SELECT p.id AS post_id, p.title AS post_title, pc.review AS review FROM post p LEFT JOIN post_comment pc ON pc.post_id = p.id ORDER BY p.id, pc.id
We get the following result set:
| post_id | post_title | review | |---------|------------|-----------| | 1 | Java | Good | | 1 | Java | Excellent | | 2 | Hibernate | Awesome | | 3 | JPA | |
The LEFT JOIN clause takes all rows on the left side of the JOIN condition (e.g., post
table in our case) and tries to find any matching record on the right side of the JOIN condition.
The first row of the post
table has two associated post_comment
rows, so the LEFT JOIN clause will combine the first post
record with the two associated child post_comment
rows, as illustrated by the highlighted records in the query result set:
| post_id | post_title | review | |---------|------------|-----------| | 1 | Java | Good | | 1 | Java | Excellent | | 2 | Hibernate | Awesome | | 3 | JPA | |
The second row of the post
table has only one associated post_comment
row, so the LEFT JOIN clause will combine the second post
record with its associated child post_comment
row, as illustrated by the highlighted record in the query result set:
| post_id | post_title | review | |---------|------------|-----------| | 1 | Java | Good | | 1 | Java | Excellent | | 2 | Hibernate | Awesome | | 3 | JPA | |
The third row of the post
table has no associated post_comment
row, so the LEFT JOIN clause will combine the third post
record with a virtual row where all post_comment
columns are NULL
, as illustrated by the highlighted record in the query result set:
| post_id | post_title | review | |---------|------------|-----------| | 1 | Java | Good | | 1 | Java | Excellent | | 2 | Hibernate | Awesome | | 3 | JPA | |
SQL ANTI JOIN
In relational algebra, a SEMI JOIN (⋉) between two relations, L and R, is defined as the set of all tuples in L for which there is a tuple in R that is equal based o the common attributes of the two relations.
And, the ANTI JOIN between two relations, L and R, is defined like this:
L ▷ R = L − (L ⋉ R)
The proper way of implementing a SEMI JOIN is via the EXISTS or IN clauses, while the ANTI JOIN can be implemented using NOT EXISTS or NOT IN.
So, to get all post
rows that have no associated post_comment
record, we can use the following SQL query:
SELECT p.id AS post_id, p.title AS post_title FROM post p WHERE NOT EXISTS ( SELECT 1 FROM post_comment WHERE post_id = p.id ) ORDER BY p.id
which renders the expected result set:
| post_id | post_title | |---------|------------| | 3 | JPA |
For more details about the EXISTS and NOT EXISTS clauses, check out this article.
However, there are many devs who try to emulate the ANTI JOIN operator using LEFT JOIN, like this:
SELECT p.id AS post_id, p.title AS post_title FROM post p LEFT JOIN post_comment pc ON pc.post_id = p.id WHERE pc.id IS NULL ORDER BY p.id
However, even if the two queries generate the same result set, depending on the database engine, the LEFT JOIN alternative might be less efficient than the NOT EXISTS query.
Line INNER JOIN, the OUTER JOIN is useful when the query projection is built out of columns belonging to both tables that are joined. ON the other hand, SEMI JOIN and ANTI JOIN return only the left-side table, so there is no need to combine the records of the left and right-side tables.
I'm running an online workshop on the 11th of October about High-Performance SQL.If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
LEFT JOIN allows us to fetch records from the left side table even if there is no associated record on the right side table that match the join condition.
However, when there is a match between the left and right-side tables, LEFT JOIN allows us to build a compound projection that includes columns from the two joining tables.
