Nested Loop Join Algorithm

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see how the Nested Loop Join Algorithm works and when it’s suitable for a relational database system to employ it in order to execute an SQL JOIN query.

Data Sets

Let’s consider we have two relations, Post and PostComment, that look as follows:

Nested Loop Join Relations

The two relations form a one-to-many relationship since the postId attribute in the PostComment relation references the id attribute in the parent Post relation:

The parent Post relation has 1000 records that look as follows:

| id   | title         |
|------|---------------|
| 1    | Post no. 1    |
| 2    | Post no. 2    |
| ..   | ..            |
| 999  | Post no. 999  |
| 1000 | Post no. 1000 |

And, the child PostComment relation has 10000 rows that are associated to the 1000 Post records:

| id    | review            | postId  |
|-------|-------------------|---------|
| 1     | Comment no. 1     | 1       |
| 2     | Comment no. 2     | 1       |
| ..    | ..                | ..      |
| 9999  | Comment no. 9999  | 1000    |
| 10000 | Comment no. 10000 | 1000    |

Now, we are interested in joining the Post and PostComment records by matching the id and postId attributes and building a projection that contains the following attributes:

  • the Post identifier
  • the Post title
  • the PostComment review

So, in our case, this is how the report should look like this:

| post_id | post_title | review         |
|---------|------------|----------------|
| 1       | Post no. 1 | Comment no. 1  |
| 1       | Post no. 1 | Comment no. 2  |
| 1       | Post no. 1 | Comment no. 3  |
| 1       | Post no. 1 | Comment no. 4  |
| 1       | Post no. 1 | Comment no. 5  |
| 1       | Post no. 1 | Comment no. 6  |
| 1       | Post no. 1 | Comment no. 7  |
| 1       | Post no. 1 | Comment no. 8  |
| 1       | Post no. 1 | Comment no. 9  |
| ..      |..          | ..             |
| 2       | Post no. 2 | Comment no. 14 |
| 2       | Post no. 2 | Comment no. 15 |

Nested Loop Join Algorithm

The Nested Loop Join Algorithm is based on two for loops that iterate both relations in search for records that are matching the joining condition:

List<Tuple> tuples = new ArrayList<>();

for (Post post : posts) {
    for (PostComment postComment : postComments) {
        if(post.getId().equals(postComment.getPostId())) {
            tuples.add(
                new Tuple()
                    .add("post_id", postComment.getPostId())
                    .add("post_title", post.getTitle())
                    .add("review", postComment.getReview())
            );
        }
    }
}

While the algorithm is simple to implement, its complexity is quadratic (e.g., O(n²)), and the larger the size of the relations, the more processing will be needed to find all matching records, as illustrated by the following graph:

Nested Loops Join Algorithm Complexity

The Nested Loops algorithm may be used by relational database systems when joining relations that have a very low number of records.

For instance, running this SQL query on PostgreSQL when joining the very same post and post_comment tables:

SELECT
   p.id AS post_id,
   p.title AS post_title,
   pc.review  AS review
FROM post p
INNER JOIN post_comment pc ON pc.post_id = p.id
WHERE p.id BETWEEN 1 AND 10

produces a Nested Loops Join, as illustrated by the underlying execution plan:

Nested Loop  
  (cost=0.56..86.08 rows=100 width=36) 
  (actual time=0.035..0.069 rows=100 loops=1)
  ->  Index Scan using idx_post_id on post p  
        (cost=0.28..8.47 rows=10 width=20) 
        (actual time=0.027..0.029 rows=10 loops=1)
        Index Cond: ((id >= 1) AND (id <= 10))
  ->  Index Scan using idx_post_comment_post_id on post_comment pc  
        (cost=0.29..7.66 rows=10 width=24) 
        (actual time=0.001..0.003 rows=10 loops=10)
        Index Cond: (post_id = p.id)

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

The Nested Loops Join algorithm is very simple to understand, and relational database systems can use it when the number of records to be joined is relatively low.

When the joined relations have many entries, then the Nested Loops Join Algorithm is no longer a viable option, and relational database systems will use a Hash Join or Merge Joi algorithm instead.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.