Why you should avoid EXTRA Lazy Collections with Hibernate

(Last Updated On: April 15, 2019)

Introduction

In this article, you are going to learn why using EXTRA Lazy Collections with Hibernate is a bad idea since it can lead to N+1 query issues and cause performance problems.

The reason I wanted to write this article is that I keep seeing EXTRA lazy collections mentioned on Stack Overflow and the Hibernate forum.

Domain Model

Let’s assume our application uses a parent Post entity which can have multiple PostComment child entities.

[Diagram: Post and PostComment entities]

The Post entity is mapped as follows:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    @OneToMany(
        mappedBy = "post", 
        cascade = CascadeType.ALL, 
        orphanRemoval = true
    )
    @LazyCollection(
        LazyCollectionOption.EXTRA
    )
    @OrderColumn(name = "order_id")
    private List<PostComment> comments = new ArrayList<>();

    public Long getId() {
        return id;
    }

    public Post setId(Long id) {
        this.id = id;
        return this;
    }

    public String getTitle() {
        return title;
    }

    public Post setTitle(String title) {
        this.title = title;
        return this;
    }

    public List<PostComment> getComments() {
        return comments;
    }

    public Post addComment(
            PostComment comment) {
        comments.add(comment);
        comment.setPost(this);
        return this;
    }

    public Post removeComment(
            PostComment comment) {
        comments.remove(comment);
        comment.setPost(null);
        return this;
    }
}

The first thing you can notice is that the setters use a Fluent API style.

The second thing to notice is that the bidirectional comments collection uses the @LazyCollection annotation with the EXTRA LazyCollectionOption. The LazyCollectionOption.EXTRA option is taken into consideration only for indexed List collections, hence we need to use the @OrderColumn annotation.

The third thing to notice is that we have defined the addComment and removeComment methods because we want to make sure that both sides of the bidirectional association are in sync. For more details about why you should always synchronize both sides of a bidirectional JPA relationship, check out this article.

The PostComment entity is mapped like this:

@Entity(name = "PostComment")
@Table(name = "post_comment")
public class PostComment {

    @Id
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

    private String review;

    public Long getId() {
        return id;
    }

    public PostComment setId(Long id) {
        this.id = id;
        return this;
    }

    public Post getPost() {
        return post;
    }

    public PostComment setPost(Post post) {
        this.post = post;
        return this;
    }

    public String getReview() {
        return review;
    }

    public PostComment setReview(String review) {
        this.review = review;
        return this;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) 
            return true;
        if (o == null || getClass() != o.getClass()) 
            return false;
        return id != null && 
               id.equals(((PostComment) o).getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }
}

Just like the Post entity, the PostComment uses a fluent-style API which simplifies the entity instantiation process.

The @ManyToOne association uses the FetchType.LAZY fetch strategy because the default FetchType.EAGER is a very bad idea from a performance perspective.

Notice that the hashCode uses a constant value and the equals implementation considers the entity identifier only if it’s not null. The reason why the hashCode and equals methods are implemented like this is that, otherwise, the equality would not be consistent across all entity state transitions. For more details about using the entity identifier for equality, check out this article.
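
To see why the constant hashCode matters, here is a minimal, self-contained sketch. The Comment and IdHashedComment classes are hypothetical stand-ins (plain Java, no JPA mapping), and calling setId simulates the identifier being assigned after the object was already added to a HashSet:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for PostComment: same equals/hashCode, no JPA mapping.
class Comment {
    private Long id;

    Long getId() { return id; }

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id != null && id.equals(((Comment) o).getId());
    }

    @Override
    public int hashCode() {
        return 31; // constant, so the hash bucket never changes
    }
}

// Hypothetical counter-example: hashCode is derived from the identifier.
class IdHashedComment {
    private Long id;

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id != null && id.equals(((IdHashedComment) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 while id is null, changes once id is set
    }
}

class EqualityDemo {

    // Adds a Comment with a null id to a HashSet, then assigns the id.
    static boolean constantHashStillFound() {
        Set<Comment> comments = new HashSet<>();
        Comment comment = new Comment();
        comments.add(comment);
        comment.setId(1L); // simulates the identifier being assigned later
        return comments.contains(comment);
    }

    // Same steps, but with the identifier-based hashCode.
    static boolean idHashStillFound() {
        Set<IdHashedComment> comments = new HashSet<>();
        IdHashedComment comment = new IdHashedComment();
        comments.add(comment);
        comment.setId(1L); // the hash code changes from 0 to 1
        return comments.contains(comment);
    }

    public static void main(String[] args) {
        System.out.println("constant hashCode: " + constantHashStillFound());
        System.out.println("id-based hashCode: " + idHashStillFound());
    }
}
```

With the constant hashCode, the object is still found after the identifier is assigned. With the identifier-based hashCode, the lookup goes to a different bucket and the very same object is no longer found in the Set.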

Now, when persisting one Post entity with three associated PostComment child entities:

entityManager.persist(
    new Post()
    .setId(1L)
    .setTitle(
        "High-Performance Java Persistence"
    )
    .addComment(
        new PostComment()
        .setId(1L)
        .setReview(
            "Excellent book to understand Java persistence"
        )
    )
    .addComment(
        new PostComment()
        .setId(2L)
        .setReview(
            "The best JPA ORM book out there"
        )
    )
    .addComment(
        new PostComment()
        .setId(3L)
        .setReview(
            "Must-read for Java developers"
        )
    )
);

Hibernate executes the following SQL INSERT and UPDATE statements:

INSERT INTO post (
    title, 
    id
) 
VALUES (
    'High-Performance Java Persistence', 
    1
)

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'Excellent book to understand Java persistence', 
    1
)

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'The best JPA ORM book out there', 
    2
)

INSERT INTO post_comment (
    post_id, 
    review, 
    id
) 
VALUES (
    1, 
    'Must-read for Java developers', 
    3
)

UPDATE post_comment 
SET 
    order_id = 0 
WHERE 
    id = 1
    
UPDATE post_comment 
SET 
    order_id = 1 
WHERE 
    id = 2

UPDATE post_comment 
SET 
    order_id = 2 
WHERE 
    id = 3

The UPDATE statements are executed in order to set the List entry index. The reason why the UPDATE is executed separately is that the INSERT action is executed first and the Collection-based actions are executed at a later flush stage. For more details about the flush operation order, check out this article.

Iterating the EXTRA @LazyCollection using a for-each loop

Assuming we have a Post entity associated with the currently running Persistence Context, we can access its PostComment child entities using a for-each loop, as illustrated by the following code snippet:

for (PostComment comment: post.getComments()) {
    LOGGER.info("{} book review: {}",
        post.getTitle(),
        comment.getReview()
    );
}

Hibernate is going to execute one SELECT statement:

SELECT 
    pc.post_id as post_id3_1_0_, 
    pc.id as id1_1_0_, 
    pc.order_id as order_id4_0_,
    pc.review as review2_1_1_ 
FROM 
    post_comment pc 
WHERE 
    pc.post_id = 1

-- High-Performance Java Persistence book review: Excellent book to understand Java persistence
-- High-Performance Java Persistence book review: The best JPA ORM book out there
-- High-Performance Java Persistence book review: Must-read for Java developers

Iterating the EXTRA @LazyCollection using a for loop

However, if we iterate the PostComment collection using a for loop:

int commentCount = post.getComments().size();

for (int i = 0; i < commentCount; i++) {
    PostComment comment = post.getComments().get(i);
    
    LOGGER.info("{} book review: {}",
        post.getTitle(),
        comment.getReview()
    );
}

Hibernate will generate 4 SELECT queries:

SELECT 
    MAX(order_id) + 1 
FROM 
    post_comment 
WHERE 
    post_id = 1

SELECT 
    pc.id as id1_1_0_, 
    pc.post_id as post_id3_1_0_, 
    pc.review as review2_1_0_ 
FROM 
    post_comment pc 
WHERE 
    pc.post_id = 1 AND 
    pc.order_id = 0

-- High-Performance Java Persistence book review: Excellent book to understand Java persistence

SELECT 
    pc.id as id1_1_0_, 
    pc.post_id as post_id3_1_0_, 
    pc.review as review2_1_0_ 
FROM 
    post_comment pc 
WHERE 
    pc.post_id = 1 AND 
    pc.order_id = 1

-- High-Performance Java Persistence book review: The best JPA ORM book out there

SELECT 
    pc.id as id1_1_0_, 
    pc.post_id as post_id3_1_0_, 
    pc.review as review2_1_0_ 
FROM 
    post_comment pc 
WHERE 
    pc.post_id = 1 AND 
    pc.order_id = 2
    
-- High-Performance Java Persistence book review: Must-read for Java developers

The first SELECT query is for the collection size while the remaining SELECT queries are going to fetch each individual List entry.
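
The 1 + N arithmetic can be made explicit with a toy model. The sketch below is not Hibernate code: ExtraLazyListModel is a hypothetical class whose methods only increment a counter where the EXTRA lazy collection would run a SELECT, based on the statements logged above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, not Hibernate code: every method that would hit the database
// increments a counter instead of running SQL.
class ExtraLazyListModel {
    private final List<String> rows;
    private int selectCount;

    ExtraLazyListModel(List<String> rows) {
        this.rows = new ArrayList<>(rows);
    }

    // Models: SELECT MAX(order_id) + 1 ... WHERE post_id = ?
    int size() {
        selectCount++;
        return rows.size();
    }

    // Models: SELECT ... WHERE post_id = ? AND order_id = ?
    String get(int index) {
        selectCount++;
        return rows.get(index);
    }

    // Models: SELECT ... WHERE post_id = ? (full collection initialization)
    List<String> loadAll() {
        selectCount++;
        return new ArrayList<>(rows);
    }

    int getSelectCount() {
        return selectCount;
    }
}

class QueryCountDemo {
    static final List<String> REVIEWS = Arrays.asList(
        "Excellent book to understand Java persistence",
        "The best JPA ORM book out there",
        "Must-read for Java developers"
    );

    // Positional access: one size query plus one query per entry.
    static int selectsForIndexedLoop() {
        ExtraLazyListModel comments = new ExtraLazyListModel(REVIEWS);
        int commentCount = comments.size();
        for (int i = 0; i < commentCount; i++) {
            comments.get(i);
        }
        return comments.getSelectCount();
    }

    // Iteration: the whole collection is loaded by a single query.
    static int selectsForForEachLoop() {
        ExtraLazyListModel comments = new ExtraLazyListModel(REVIEWS);
        for (String review : comments.loadAll()) {
            // the entries are already in memory, no further queries
        }
        return comments.getSelectCount();
    }

    public static void main(String[] args) {
        System.out.println("indexed loop: " + selectsForIndexedLoop());
        System.out.println("for-each loop: " + selectsForForEachLoop());
    }
}
```

For the three comments used in this article, the model counts 4 statements for the indexed loop and 1 for the for-each loop, and the gap grows linearly with the collection size.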


Conclusion

Accessing a List that uses both @OrderColumn and the EXTRA @LazyCollection by the entry position can lead to N+1 query issues, which, in turn, can cause performance problems.

Therefore, it is better to avoid ordered List collections altogether because the entry order is set using secondary UPDATE statements. Moreover, the default FetchType.LAZY collection fetching strategy is sufficient, so you don’t need the EXTRA lazy feature.

If your collection is too big and you are considering EXTRA lazy fetching, then you are better off replacing the collection with a JPQL query which can use pagination. For more details about the best way to use a @OneToMany association, check out this article.
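
As a rough sketch of the JPQL alternative, the hypothetical PostCommentQueries helper below fetches one page of comments for a given post. It assumes the javax.persistence API (newer stacks use the jakarta.persistence package instead) and the PostComment entity mapped earlier in this article. Ordering by id is an illustrative choice, and any stable ordering would do:

```java
import java.util.List;
import javax.persistence.EntityManager;

// Hypothetical helper, not part of the mapping shown in this article.
public class PostCommentQueries {

    // Returns one page of comments for the given post; pageNumber is zero-based.
    public static List<PostComment> findComments(
            EntityManager entityManager,
            Long postId,
            int pageNumber,
            int pageSize) {
        return entityManager.createQuery(
                "select pc " +
                "from PostComment pc " +
                "where pc.post.id = :postId " +
                "order by pc.id", PostComment.class)
            .setParameter("postId", postId)
            // skip the entries that belong to the previous pages
            .setFirstResult(pageNumber * pageSize)
            // cap the number of rows fetched by this query
            .setMaxResults(pageSize)
            .getResultList();
    }
}
```

Each call runs a single SELECT limited to pageSize rows, so the number of statements no longer depends on how many comments a post has.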
