Nested Loop Join Algorithm

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Hypersistence Optimizer is that tool!

Introduction

In this article, we are going to see how the Nested Loop Join Algorithm works and when it’s suitable for a relational database system to employ it in order to execute an SQL JOIN query.

Data Sets

Let’s consider we have two relations, Post and PostComment, that look as follows:

Nested Loop Join Relations

The two relations form a one-to-many relationship since the postId attribute in the PostComment relation references the id attribute in the parent Post relation:

The parent Post relation has 1000 records that look as follows:

| id   | title         |
|------|---------------|
| 1    | Post no. 1    |
| 2    | Post no. 2    |
| ..   | ..            |
| 999  | Post no. 999  |
| 1000 | Post no. 1000 |

And, the child PostComment relation has 10000 rows that are associated to the 1000 Post records:

| id    | review            | postId  |
|-------|-------------------|---------|
| 1     | Comment no. 1     | 1       |
| 2     | Comment no. 2     | 1       |
| ..    | ..                | ..      |
| 9999  | Comment no. 9999  | 1000    |
| 10000 | Comment no. 10000 | 1000    |

Now, we are interested in joining the Post and PostComment records by matching the id and postId attributes and building a projection that contains the following attributes:

  • the Post identifier
  • the Post title
  • the PostComment review

So, in our case, this is how the report should look like this:

| post_id | post_title | review         |
|---------|------------|----------------|
| 1       | Post no. 1 | Comment no. 1  |
| 1       | Post no. 1 | Comment no. 2  |
| 1       | Post no. 1 | Comment no. 3  |
| 1       | Post no. 1 | Comment no. 4  |
| 1       | Post no. 1 | Comment no. 5  |
| 1       | Post no. 1 | Comment no. 6  |
| 1       | Post no. 1 | Comment no. 7  |
| 1       | Post no. 1 | Comment no. 8  |
| 1       | Post no. 1 | Comment no. 9  |
| ..      |..          | ..             |
| 2       | Post no. 2 | Comment no. 14 |
| 2       | Post no. 2 | Comment no. 15 |

Nested Loop Join Algorithm

The Nested Loop Join Algorithm is based on two for loops that iterate both relations in search for records that are matching the joining condition:

List<Tuple> tuples = new ArrayList<>();

for (Post post : posts) {
    for (PostComment postComment : postComments) {
        if(post.getId().equals(postComment.getPostId())) {
            tuples.add(
                new Tuple()
                    .add("post_id", postComment.getPostId())
                    .add("post_title", post.getTitle())
                    .add("review", postComment.getReview())
            );
        }
    }
}

While the algorithm is simple to implement, its complexity is quadratic (e.g., O(n²)), and the larger the size of the relations, the more processing will be needed to find all matching records, as illustrated by the following graph:

Nested Loops Join Algorithm Complexity

The Nested Loops algorithm may be used by relational database systems when joining relations that have a very low number of records.

For instance, running this SQL query on PostgreSQL when joining the very same post and post_comment tables:

SELECT
   p.id AS post_id,
   p.title AS post_title,
   pc.review  AS review
FROM post p
INNER JOIN post_comment pc ON pc.post_id = p.id
ORDER BY pc.id
LIMIT 15

produces a Nested Loops Join, as illustrated by the underlying execution plan:

Limit  
  (cost=0.56..5.92 rows=15 width=1048) 
  (actual time=0.229..0.272 rows=15 loops=1)
  ->  Nested Loop  
        (cost=0.56..3575.06 rows=10000 width=1048) 
        (actual time=0.227..0.269 rows=15 loops=1)
        ->  Index Scan using idx_post_comment_id on post_comment pc  
              (cost=0.29..602.28 rows=10000 width=532) 
              (actual time=0.112..0.120 rows=15 loops=1)
        ->  Index Scan using idx_post_id on post p  
              (cost=0.28..0.30 rows=1 width=524) 
              (actual time=0.009..0.009 rows=1 loops=15)
              Index Cond: (id = pc.post_id)

I'm running an online workshop on the 27th of May about High-Performance SQL.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

The Nested Loops Join algorithm is very simple to understand, and relational database systems can use it when the number of records to be joined is relatively low.

When the joined relations have many entries, then the Nested Loops Join Algorithm is no longer a viable option, and relational database systems will use a Hash Join or Merge Joi algorithm instead.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

High-Performance SQL Online Workshop - 27th of May