How do Set and List collections behave with JPA and Hibernate

Introduction

Hibernate is a great ORM tool, and it eases development considerably, but it has a lot of gotchas you must be aware of if you want to use it properly.

On medium to large projects, it’s very common to have bidirectional parent-child associations, which allow us to navigate both ends of a given relationship.

When it comes to controlling the persist/merge part of the association, there are two options available. One is to have the @OneToMany end in charge of synchronizing the collection changes, but this approach is inefficient.

The most common approach is when the @ManyToOne side controls the association and the @OneToMany end is using the “mappedBy” option.

I will discuss the latter approach since it’s the most common and the most efficient one in terms of the number of executed queries.

Bidirectional associations

So, for bidirectional collections, we could use a java.util.List or a java.util.Set.

According to the Hibernate documentation, lists and bags are more efficient than sets.

But I still get anxious when I see the following code:

@Entity
public class Parent {

    @OneToMany(cascade = CascadeType.ALL, 
        mappedBy = "parent", orphanRemoval = true)
    private List<Child> children = new ArrayList<>();

    public List<Child> getChildren() {
        return children;
    }

    public void addChild(Child child) {
        children.add(child);
        child.setParent(this);
    }

    public void removeChild(Child child) {
        children.remove(child);
        child.setParent(null);
    }
}

@Entity
public class Child {

    @ManyToOne
    private Parent parent;

    public Parent getParent() {
        return parent;
    }

    public void setParent(Parent parent) {
        this.parent = parent;
    }
}

Parent parent = loadParent(parentId);
Child child1 = new Child();
child1.setName("child1");
Child child2 = new Child();
child2.setName("child2");
parent.addChild(child1);
parent.addChild(child2);
entityManager.merge(parent);

This is because, for the last five years, I’ve been getting duplicate children inserted when the merge operation is called on the parent entity. This happens because of the HHH-5855 issue.

The HHH-5855 issue was fixed in Hibernate 5.0.8, which is one more reason to upgrade.

I’ve been testing some Hibernate versions lately, and this still replicates on versions 3.5.6, 3.6.10, and 4.2.6. So, after five years of seeing this on many projects, you can understand why I’m skeptical about using Lists instead of Sets.

Running a test case that replicates this issue and adds two children produces four INSERT statements instead of two:

select parent0_.id as id1_2_0_ from Parent parent0_ where parent0_.id=?
insert into Child (id, name, parent_id) values (default, ?, ?)
insert into Child (id, name, parent_id) values (default, ?, ?)
insert into Child (id, name, parent_id) values (default, ?, ?)
insert into Child (id, name, parent_id) values (default, ?, ?)

This issue only replicates if a merge operation is cascaded from parent to children, and there are workarounds like:

  • merging the child instead of the parent
  • persisting the children prior to merging the parent
  • removing CascadeType.ALL or CascadeType.MERGE from the parent, since the issue only affects the merge operation and not the persist one.

But all of those are hacks and are very difficult to follow on a large-scale project, with many developers working on the same code base.

So, until you migrate to Hibernate 5.0.8 which fixes HHH-5855, the preferred way is to use Sets.

When it comes to these types of problems, it’s good to have code conventions, as they are easy to add to a project development guideline and are also easier to remember and adopt.

One advantage of using Sets is that they force you to define a proper equals/hashCode strategy, which should always be based on the entity’s business key. A business key is a field combination that’s unique, or unique among a parent’s children, and that stays consistent both before and after the entity is persisted into the database.
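
To make the business key idea concrete, here is a minimal plain-Java sketch. The NamedChild class and its name field are hypothetical, and the JPA annotations are left out so the example runs without a persistence provider. It assumes the name is the business key and shows that a child added to a HashSet before it gets an identifier can still be found after the identifier is assigned:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-Java stand-in for a child entity (hypothetical, no JPA annotations).
class NamedChild {
    private Long id;            // would be generated by the database; null until persisted
    private final String name;  // assumed business key, unique among a parent's children

    NamedChild(String name) {
        this.name = name;
    }

    void setId(Long id) {
        this.id = id;
    }

    Long getId() {
        return id;
    }

    // Equality relies only on the business key, never on the generated id.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NamedChild)) return false;
        return Objects.equals(name, ((NamedChild) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }
}

class BusinessKeyDemo {
    // Simulates a persist: the child joins the set while transient,
    // then receives its id. The hash code must not change in between.
    static boolean stillFoundAfterIdAssignment() {
        Set<NamedChild> children = new HashSet<>();
        NamedChild child = new NamedChild("child1");
        children.add(child);
        child.setId(1L);
        return children.contains(child);
    }

    // A transient instance and a "persisted" one with the same key are equal.
    static boolean equalAcrossEntityStates() {
        NamedChild persisted = new NamedChild("child1");
        persisted.setId(1L);
        return persisted.equals(new NamedChild("child1"));
    }

    public static void main(String[] args) {
        System.out.println(stillFoundAfterIdAssignment());
        System.out.println(equalAcrossEntityStates());
    }
}
```

Had hashCode been based on the generated id, the child would have landed in a different bucket once the id was assigned, and the contains call could have failed.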

If you are worried about losing the List’s ability to keep the children in the order you added them, you can still emulate this with Sets.

By default, Sets are unordered and unsorted, but even if you can’t order them, you can still have them sorted by a given column when they are loaded, by using the @OrderBy JPA annotation like this:

@Entity
public class LinkedParent {

    @OneToMany(cascade = CascadeType.ALL, 
        mappedBy = "parent", orphanRemoval = true)
    @OrderBy("id")
    private Set<LinkedChild> children = new LinkedHashSet<>();

    public Set<LinkedChild> getChildren() {
        return children;
    }

    public void addChild(LinkedChild child) {
        children.add(child);
        child.setParent(this);
    }

    public void removeChild(LinkedChild child) {
        children.remove(child);
        child.setParent(null);
    }
}

When the parent’s children are loaded, the generated SQL is like:

select
   children0_.parent_id as parent_i3_3_1_,
   children0_.id as id1_2_1_,
   children0_.id as id1_2_0_,
   children0_.name as name2_2_0_,
   children0_.parent_id as parent_i3_2_0_ 
from
   LinkedChild children0_ 
where
   children0_.parent_id=? 
order by
   children0_.id
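
The order by clause only sorts the rows in the result set. For that order to survive in memory, the Set has to iterate in insertion order, which is what LinkedHashSet guarantees and a plain HashSet does not. The following plain-Java sketch illustrates that property only; the OrderedSetDemo class and the id values are made up, and it does not reproduce how Hibernate populates the collection:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderedSetDemo {
    // Adds ids in the order a hypothetical "order by id" query would return them
    // and reports the order in which the set iterates over them.
    static List<Long> iterationOrder(List<Long> idsInResultOrder) {
        Set<Long> children = new LinkedHashSet<>();
        for (Long id : idsInResultOrder) {
            children.add(id);
        }
        return new ArrayList<>(children);
    }

    public static void main(String[] args) {
        // Made-up ids, already sorted as the database would return them.
        List<Long> sortedIds = Arrays.asList(3L, 11L, 27L, 105L);
        System.out.println(iterationOrder(sortedIds)); // prints [3, 11, 27, 105]
    }
}
```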

Conclusion

If your domain model requires a List, then a Set will break that requirement, because it disallows duplicates. But if you need duplicates, you can still use an indexed List. A Bag is said to be unsorted and “unordered” (even if it retrieves the children in the order they were added to the database table). So an indexed List would also be a good candidate, right?
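
As a quick reminder of the difference in duplicate handling, here is a small plain-Java sketch. The DuplicateDemo class is hypothetical and uses strings in place of entities:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;

class DuplicateDemo {
    // Adds the same element twice and returns the resulting collection size.
    static int sizeAfterAddingTwice(Collection<String> children) {
        children.add("child1");
        children.add("child1");
        return children.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingTwice(new ArrayList<>()));     // prints 2
        System.out.println(sizeAfterAddingTwice(new LinkedHashSet<>())); // prints 1
    }
}
```

With entities, what counts as "the same element" for the Set is decided by equals and hashCode, which is why the business key matters.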

I also wanted to draw attention to a five-year-old bug affecting multiple Hibernate versions, one that I replicated on multiple projects. There are workarounds, of course, like removing CascadeType.MERGE or merging the children instead of the parent, but many developers are unaware of this issue and its workarounds.

Moreover, Sets are the recommended way to represent collections of basic and embeddable types as well, since they perform better than Lists.

Code available on GitHub.

8 Comments on “How do Set and List collections behave with JPA and Hibernate”

  1. Hi, thanks for your awesome blog!

    In almost all my code I use this verbose notation:

    @OneToMany(cascade = CascadeType.ALL)
    private Set<Child> children = new LinkedHashSet<>();
    
    public Set<Child> getReadOnlyChildren() {
        return Collections.unmodifiableSet(children);
    }
    
    public void addChild(Child child) {
      Objects.requireNonNull(child, "Persistent collection cannot work with null elements");
      if (child.getId() != null) {
        throw new IllegalArgumentException("Trying to persist an already managed Child");
      }
      child.setParent(this); //package-private setter
      boolean added = children.add(child); // Set.add returns false if the element is already present
      if (!added) {
        throw new IllegalArgumentException("Child already exists");
      }
    }
    
    /*actually I use a custom utility, so, it looks like that:
    public void addChild(Child child) {
      addDetachedEntity(child).withCode(() -> child.setParent(this)).to(children);
    }*/
    

    I’ve used that approach for about 2 years because of what I learned from your blogs. But I don’t think I remember any of those exceptions being ever thrown, so I have a couple of questions:
    1. Is that approach legit or unnecessary?
    2. Do you think my custom util could be useful to anyone? Not sure what the right place to contribute would be. Here it is https://gist.github.com/Sam-Kruglov/cf46c96392b7a2c4cddf5450174b6715

    • I noticed the lack of mappedBy attribute on @OneToMany, which is not a good idea.

      The extra checks for alreadyExists would not be needed if using mappedBy as the collection simply delegates the persistence to the child entity. I suppose this would only be needed for unidirectional bags.

      I’m not sure why you have that utility. I have managed to get it working with simple add/remove utilities. Do you have some specific use cases where you need that utility?

      • There were a couple of other annotations and I actually accidentally omitted a @JoinColumn(name = "parent_id"), so there’s no join table. Sorry for the confusion.

        There are no other use cases, I just use that validation for every persistence collection “just in case” and, as I said, have not seen any of those exceptions I defined ever being thrown.

        Given all that, I think you agree that I should get rid of that utility class and simply do this:

        public void addChild(Child child) {
          child.setParent(this); //package-private setter
          children.add(child);
        }
        

        Hibernate will probably throw something for null or already-managed cases and the Set will take care of any duplicates anyway.

      • Even if you supply @JoinColumn, the unidirectional one-to-many association will be inefficient compared to the bidirectional one. That’s because the INSERT statements for child entities are executed without the FK, only to UPDATE those later during flush.

        I don’t know whether you should get rid of that utility. I don’t know the use cases you had that required you to write it. For me, I managed to get it working with that simple add/remove. However, I don’t exclude the possibility that there might be situations where this would not be sufficient. So, if you manage to remember some use cases that led to writing that advanced add/remove utility, I’d be curious to read more about it.

      • Oh, okay, thanks, I will look into @JoinColumn closer.

        As disappointing as it may sound, there are no special use cases. I created that utility class when I was learning about JPA and the only purpose of it is to add more validations. I remember getting a bunch of “detached entity passed to persist” exceptions, so, perhaps, that might have pushed me to add those validations. I guess, these errors are more about wrong JPA usage rather than catching some runtime exceptions, which makes them somewhat unreachable code when you get the mapping right. I’ll keep the Gist for now.

      • That error is triggered when persist is cascaded to a detached entity.

  2. It is usually a good practice to store only immutable objects in set, because changing an element can break set’s internal structure. Entities are not immutable, so this can be another argument in favor of Lists. Would you agree?

    • Actually, it is only the subset of data used for equals and hashCode that needs to be immutable, not the entire entity. Check out this article for more details.
