The best way to map a many-to-many association with extra columns when using JPA and Hibernate

(Last Updated On: December 5, 2018)

Introduction

For a simple many-to-many database relationship, you can use the @ManyToMany JPA annotation and, therefore, hide the join table.

However, sometimes you need more than the two Foreign Key columns in the join table. For this purpose, you need to replace the @ManyToMany association with two bidirectional @OneToMany associations. Unlike the unidirectional @OneToMany, the bidirectional relationship is the best way to map a one-to-many database relationship that requires a collection of child elements on the parent side.

In this article, we are going to see how you can map a many-to-many database relationship using an intermediary entity for the join table. This way, we can map additional columns that would be otherwise impossible to persist using the @ManyToMany JPA annotation.

Domain Model

Assuming we have the following database tables: post (id, title), tag (id, name), and the post_tag join table (post_id, tag_id, created_on).

The first thing we need is to map the composite Primary Key which belongs to the intermediary join table. As explained in this article, we need an @Embeddable type to hold the composite entity identifier:

@Embeddable
public class PostTagId
    implements Serializable {

    @Column(name = "post_id")
    private Long postId;

    @Column(name = "tag_id")
    private Long tagId;

    private PostTagId() {}

    public PostTagId(
        Long postId, 
        Long tagId) {
        this.postId = postId;
        this.tagId = tagId;
    }

    //Getters omitted for brevity

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (o == null || getClass() != o.getClass()) 
            return false;

        PostTagId that = (PostTagId) o;
        return Objects.equals(postId, that.postId) && 
               Objects.equals(tagId, that.tagId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(postId, tagId);
    }
}

There are two very important aspects to take into consideration when mapping an @Embeddable composite identifier:

  1. You need the @Embeddable type to be Serializable
  2. The @Embeddable type must override the default equals and hashCode methods based on the two Primary Key identifier values.
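To see why the second point matters, here is a minimal plain-Java sketch with the JPA annotations left out. The class names CompositeKey, IdentityKey and CompositeKeyDemo are hypothetical and are not part of the article's model. It only illustrates that two key objects holding the same post and tag identifiers are treated as the same key when equals and hashCode are value-based, which identifier lookups rely on:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CompositeKeyDemo {

    // Stand-in for PostTagId: equality is based on the two identifier values.
    static final class CompositeKey {
        private final Long postId;
        private final Long tagId;

        CompositeKey(Long postId, Long tagId) {
            this.postId = postId;
            this.tagId = tagId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            CompositeKey that = (CompositeKey) o;
            return Objects.equals(postId, that.postId)
                && Objects.equals(tagId, that.tagId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(postId, tagId);
        }
    }

    // Same fields, but equals and hashCode are inherited from Object.
    static final class IdentityKey {
        private final Long postId;
        private final Long tagId;

        IdentityKey(Long postId, Long tagId) {
            this.postId = postId;
            this.tagId = tagId;
        }
    }

    // A second key instance with the same values finds the stored row.
    static boolean foundByValue() {
        Map<CompositeKey, String> rows = new HashMap<>();
        rows.put(new CompositeKey(1L, 2L), "post 1 / tag 2");
        return rows.containsKey(new CompositeKey(1L, 2L));
    }

    // Without value-based equality, an equal-looking key is a different key.
    static boolean foundByIdentity() {
        Map<IdentityKey, String> rows = new HashMap<>();
        rows.put(new IdentityKey(1L, 2L), "post 1 / tag 2");
        return rows.containsKey(new IdentityKey(1L, 2L));
    }

    public static void main(String[] args) {
        System.out.println(foundByValue());
        System.out.println(foundByIdentity());
    }
}
```

The first lookup succeeds and the second one does not, even though both keys carry the same two values.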

Next, we need to map the join table using a dedicated entity:

@Entity(name = "PostTag")
@Table(name = "post_tag")
public class PostTag {

    @EmbeddedId
    private PostTagId id;

    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("postId")
    private Post post;

    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("tagId")
    private Tag tag;

    @Column(name = "created_on")
    private Date createdOn = new Date();

    private PostTag() {}

    public PostTag(Post post, Tag tag) {
        this.post = post;
        this.tag = tag;
        this.id = new PostTagId(post.getId(), tag.getId());
    }

    //Getters and setters omitted for brevity

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (o == null || getClass() != o.getClass())
            return false;

        PostTag that = (PostTag) o;
        return Objects.equals(post, that.post) &&
               Objects.equals(tag, that.tag);
    }

    @Override
    public int hashCode() {
        return Objects.hash(post, tag);
    }
}

The Tag entity is going to map the @OneToMany side for the tag attribute in the PostTag join entity:

@Entity(name = "Tag")
@Table(name = "tag")
@NaturalIdCache
@Cache(
    usage = CacheConcurrencyStrategy.READ_WRITE
)
public class Tag {

    @Id
    @GeneratedValue
    private Long id;

    @NaturalId
    private String name;

    @OneToMany(
        mappedBy = "tag",
        cascade = CascadeType.ALL,
        orphanRemoval = true
    )
    private List<PostTag> posts = new ArrayList<>();

    public Tag() {
    }

    public Tag(String name) {
        this.name = name;
    }

    //Getters and setters omitted for brevity

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Tag tag = (Tag) o;
        return Objects.equals(name, tag.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

The Tag entity is marked with the following Hibernate-specific annotations:

  1. The @NaturalId annotation allows us to fetch the Tag entity by its business key.
  2. The @Cache annotation marks the cache concurrency strategy.
  3. The @NaturalIdCache tells Hibernate to cache the entity identifier associated with a given business key.

For more details about the @NaturalId and @NaturalIdCache annotations, check out this article.

With these annotations in place, we can fetch the Tag entity without needing to hit the database.
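As a rough mental model only, and not Hibernate's actual implementation, the lookup can be pictured as two maps: one resolving the business key to the identifier, and one resolving the identifier to the entity. Everything in the sketch below (NaturalIdLookupSketch, queryDatabase, the hit counter) is a hypothetical name introduced for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class NaturalIdLookupSketch {

    static final class Tag {
        final Long id;
        final String name;

        Tag(Long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Plays the role of the natural-id cache: business key -> identifier.
    private final Map<String, Long> idByName = new HashMap<>();
    // Plays the role of the entity cache: identifier -> entity.
    private final Map<Long, Tag> tagById = new HashMap<>();
    private long databaseHits = 0;

    // Hypothetical stand-in for a SELECT by name; counts how often it runs.
    private Tag queryDatabase(String name) {
        databaseHits++;
        return new Tag(databaseHits, name);
    }

    Tag loadByName(String name) {
        Long id = idByName.get(name);
        if (id != null) {
            Tag cached = tagById.get(id);
            if (cached != null) {
                return cached; // resolved from both caches, no query needed
            }
        }
        Tag loaded = queryDatabase(name);
        idByName.put(name, loaded.id);
        tagById.put(loaded.id, loaded);
        return loaded;
    }

    long databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        NaturalIdLookupSketch sketch = new NaturalIdLookupSketch();
        sketch.loadByName("JDBC");
        sketch.loadByName("JDBC");
        System.out.println(sketch.databaseHits());
    }
}
```

In this toy model, the second lookup for the same name is answered from the two maps, so the simulated query runs only once.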

And the Post entity is going to map the @OneToMany side for the post attribute in the PostTag join entity:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id; // assigned manually via setId in the test below

    private String title;

    @OneToMany(
        mappedBy = "post", 
        cascade = CascadeType.ALL, 
        orphanRemoval = true
    )
    private List<PostTag> tags = new ArrayList<>();

    public Post() {
    }

    public Post(String title) {
        this.title = title;
    }

    //Getters and setters omitted for brevity

    public void addTag(Tag tag) {
        PostTag postTag = new PostTag(this, tag);
        tags.add(postTag);
        tag.getPosts().add(postTag);
    }

    public void removeTag(Tag tag) {
        for (Iterator<PostTag> iterator = tags.iterator(); 
             iterator.hasNext(); ) {
            PostTag postTag = iterator.next();

            if (postTag.getPost().equals(this) &&
                    postTag.getTag().equals(tag)) {
                iterator.remove();
                postTag.getTag().getPosts().remove(postTag);
                postTag.setPost(null);
                postTag.setTag(null);
            }
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (o == null || getClass() != o.getClass()) 
            return false;

        Post post = (Post) o;
        return Objects.equals(title, post.title);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title);
    }
}

Notice that the Post entity features the addTag and removeTag utility methods which are needed by every bidirectional association so that all sides of the association stay in sync.

While we could have added the same add/remove methods to the Tag entity, it’s unlikely that these associations will be set from the Tag entity because the users operate with Post entities.
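The effect of these utility methods can be observed without a database. The following is a simplified in-memory sketch, assuming plain classes with no JPA annotations and direct field access instead of getters (SyncDemo and its nested classes are hypothetical names). It mirrors the addTag and removeTag logic above and shows that both collections change together:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SyncDemo {

    static class Tag {
        final String name;
        final List<PostTag> posts = new ArrayList<>();

        Tag(String name) {
            this.name = name;
        }
    }

    static class PostTag {
        Post post;
        Tag tag;

        PostTag(Post post, Tag tag) {
            this.post = post;
            this.tag = tag;
        }
    }

    static class Post {
        final String title;
        final List<PostTag> tags = new ArrayList<>();

        Post(String title) {
            this.title = title;
        }

        // Registers the link on both the Post side and the Tag side.
        void addTag(Tag tag) {
            PostTag link = new PostTag(this, tag);
            tags.add(link);
            tag.posts.add(link);
        }

        // Removes the link from both sides, then clears its references.
        void removeTag(Tag tag) {
            for (Iterator<PostTag> it = tags.iterator(); it.hasNext(); ) {
                PostTag link = it.next();
                if (link.post == this && link.tag == tag) {
                    it.remove();
                    tag.posts.remove(link);
                    link.post = null;
                    link.tag = null;
                }
            }
        }
    }

    public static void main(String[] args) {
        Post post = new Post("High-Performance Java Persistence");
        Tag jdbc = new Tag("JDBC");

        post.addTag(jdbc);
        System.out.println(post.tags.size() + " / " + jdbc.posts.size());

        post.removeTag(jdbc);
        System.out.println(post.tags.size() + " / " + jdbc.posts.size());
    }
}
```

If addTag only touched the Post side, the Tag would keep reporting an empty posts collection until it was reloaded, which is the kind of inconsistency the utility methods are meant to prevent.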

To better visualize the entity relationships, check out the following diagram:

Testing time

First, let’s persist some Tag entities which we’ll later associate to a Post:

Tag misc = new Tag("Misc");
Tag jdbc = new Tag("JDBC");
Tag hibernate = new Tag("Hibernate");
Tag jooq = new Tag("jOOQ");

doInJPA(entityManager -> {
    entityManager.persist( misc );
    entityManager.persist( jdbc );
    entityManager.persist( hibernate );
    entityManager.persist( jooq );
});

Now, when we persist two Post entities:

Session session = entityManager
    .unwrap( Session.class );

Tag misc = session
    .bySimpleNaturalId(Tag.class)
    .load( "Misc" );

Tag jdbc = session
    .bySimpleNaturalId(Tag.class)
    .load( "JDBC" );

Tag hibernate = session
    .bySimpleNaturalId(Tag.class)
    .load( "Hibernate" );

Tag jooq = session
    .bySimpleNaturalId(Tag.class)
    .load( "jOOQ" );

Post hpjp1 = new Post(
    "High-Performance Java Persistence 1st edition"
);
hpjp1.setId(1L);

hpjp1.addTag(jdbc);
hpjp1.addTag(hibernate);
hpjp1.addTag(jooq);
hpjp1.addTag(misc);

entityManager.persist(hpjp1);

Post hpjp2 = new Post(
    "High-Performance Java Persistence 2nd edition"
);
hpjp2.setId(2L);

hpjp2.addTag(jdbc);
hpjp2.addTag(hibernate);
hpjp2.addTag(jooq);

entityManager.persist(hpjp2);

Hibernate generates the following SQL statements:

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence 1st edition', 1)

INSERT INTO post_tag (created_on, post_id, tag_id) 
VALUES ('2017-07-26 13:14:08.988', 1, 2)

INSERT INTO post_tag (created_on, post_id, tag_id) 
VALUES ('2017-07-26 13:14:08.989', 1, 3)

INSERT INTO post_tag (created_on, post_id, tag_id) 
VALUES ('2017-07-26 13:14:08.99', 1, 4)

INSERT INTO post_tag (created_on, post_id, tag_id) 
VALUES ('2017-07-26 13:14:08.99', 1, 1)

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence 2nd edition', 2)

INSERT INTO post_tag (created_on, post_id, tag_id) 
VALUES ('2017-07-26 13:14:08.992', 2, 3)

INSERT INTO post_tag (created_on, post_id, tag_id) 
VALUES ('2017-07-26 13:14:08.992', 2, 4)

INSERT INTO post_tag (created_on, post_id, tag_id) 
VALUES ('2017-07-26 13:14:08.992', 2, 2)

Now, since the Misc Tag entity was added by mistake, we can remove it as follows:

Tag misc = entityManager.unwrap( Session.class )
    .bySimpleNaturalId(Tag.class)
    .load( "Misc" );

Post post = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "join fetch p.tags pt " +
    "join fetch pt.tag " +
    "where p.id = :postId", Post.class)
.setParameter( "postId", 1L )
.getSingleResult();

post.removeTag( misc );

Hibernate generates the following SQL statements:

SELECT p.id AS id1_0_0_,
       p_t.created_on AS created_1_1_1_,
       p_t.post_id AS post_id2_1_1_,
       p_t.tag_id AS tag_id3_1_1_,
       t.id AS id1_2_2_,
       p.title AS title2_0_0_,
       p_t.post_id AS post_id2_1_0__,
       p_t.created_on AS created_1_1_0__,
       p_t.tag_id AS tag_id3_1_0__,
       t.name AS name2_2_2_
FROM   post p
INNER JOIN 
       post_tag p_t ON p.id = p_t.post_id
INNER JOIN 
       tag t ON p_t.tag_id = t.id
WHERE  p.id = 1

SELECT p_t.tag_id AS tag_id3_1_0_,
       p_t.created_on AS created_1_1_0_,
       p_t.post_id AS post_id2_1_0_,
       p_t.created_on AS created_1_1_1_,
       p_t.post_id AS post_id2_1_1_,
       p_t.tag_id AS tag_id3_1_1_
FROM   post_tag p_t
WHERE  p_t.tag_id = 1

DELETE 
FROM   post_tag 
WHERE  post_id = 1 AND tag_id = 1

The second SELECT query is needed by this line in the removeTag utility method:

postTag.getTag().getPosts().remove(postTag);

However, if you don’t need to navigate all Post entities associated to a Tag, you can remove the posts collection from the Tag entity and this secondary SELECT statement will not be executed anymore.

Using a single-side bidirectional association

The Tag entity will not map the PostTag @OneToMany bidirectional association anymore.

@Entity(name = "Tag")
@Table(name = "tag")
@NaturalIdCache
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Tag {

    @Id
    @GeneratedValue
    private Long id;

    @NaturalId
    private String name;

    public Tag() {
    }

    public Tag(String name) {
        this.name = name;
    }

    //Getters omitted for brevity

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (o == null || getClass() != o.getClass()) 
            return false;

        Tag tag = (Tag) o;
        return Objects.equals(name, tag.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

The PostTag entity and its PostTagId @Embeddable are identical to the ones in the previous example.

However, the Post entity addTag and removeTag are simplified as follows:

public void addTag(Tag tag) {
    PostTag postTag = new PostTag(this, tag);
    tags.add(postTag);
}

public void removeTag(Tag tag) {
    for (Iterator<PostTag> iterator = tags.iterator(); 
         iterator.hasNext(); ) {
        PostTag postTag = iterator.next();

        if (postTag.getPost().equals(this) &&
                postTag.getTag().equals(tag)) {
            iterator.remove();
            postTag.setPost(null);
            postTag.setTag(null);
        }
    }
}
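One likely reason removeTag walks the collection with an explicit Iterator is that removing an element from a list while a for-each loop is traversing it fails. The sketch below uses a plain ArrayList of strings rather than entities, and IteratorRemovalDemo is a hypothetical name; how Hibernate's own collection wrappers behave is outside what this example demonstrates:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IteratorRemovalDemo {

    // Removing through the list while a for-each loop is running
    // invalidates the loop's hidden iterator.
    static boolean forEachRemovalFails() {
        List<String> tags = new ArrayList<>(
            Arrays.asList("Misc", "JDBC", "Hibernate"));
        try {
            for (String tag : tags) {
                if (tag.equals("Misc")) {
                    tags.remove(tag);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator itself is safe.
    static List<String> iteratorRemoval() {
        List<String> tags = new ArrayList<>(
            Arrays.asList("Misc", "JDBC", "Hibernate"));
        for (Iterator<String> it = tags.iterator(); it.hasNext(); ) {
            if (it.next().equals("Misc")) {
                it.remove();
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(forEachRemovalFails());
        System.out.println(iteratorRemoval());
    }
}
```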

The rest of the Post entity is the same as with the previous example as seen in the following diagram:

Inserting the PostTag entities is going to render the same SQL statements as seen before.

But when removing the PostTag entity, Hibernate is going to execute a single SELECT query as well as a single DELETE statement:

SELECT p.id AS id1_0_0_,
       p_t.created_on AS created_1_1_1_,
       p_t.post_id AS post_id2_1_1_,
       p_t.tag_id AS tag_id3_1_1_,
       t.id AS id1_2_2_,
       p.title AS title2_0_0_,
       p_t.post_id AS post_id2_1_0__,
       p_t.created_on AS created_1_1_0__,
       p_t.tag_id AS tag_id3_1_0__,
       t.name AS name2_2_2_
FROM   post p
INNER JOIN 
       post_tag p_t ON p.id = p_t.post_id
INNER JOIN 
       tag t ON p_t.tag_id = t.id
WHERE  p.id = 1

DELETE 
FROM   post_tag 
WHERE  post_id = 1 AND tag_id = 1


Conclusion

While mapping the many-to-many database relationship using the @ManyToMany annotation is undoubtedly simpler, when you need to persist extra columns in the join table, you need to map the join table as a dedicated entity.

Although it takes a little more work, the association works just like its @ManyToMany counterpart, and this time we can use List collections without worrying about SQL statement performance issues.

When mapping the intermediary join table, it’s better to map only one side as a bidirectional @OneToMany association since otherwise a second SELECT statement will be issued while removing the intermediary join entity.


40 thoughts on “The best way to map a many-to-many association with extra columns when using JPA and Hibernate”

  1. Excellent article.

    I have only one question. Is there a way to avoid having to manually set the IDs for Post entities before persisting them? I.e., is it possible to use a @GeneratedValue for Post IDs and have the association table use that value?

    Many thanks,
    George

    1. Yes. You can use generated identifiers for parent entities: Post or Tag. Only the PostTag child entity requires a composite identifier which cannot be auto-generated since the ids are taken from the Post and Tag entities.

      1. Thank you, it works.

        However, I ran into another problem.

        Following your example and creating a single-side bidirectional association, I am unable to run integration tests on the Post entity if I try to add any tags to it. Hibernate won’t even persist the Posts containing tags, complaining about detached entities, throwing a PersistentObjectException:

        org.springframework.dao.InvalidDataAccessApiUsageException: detached entity passed to persist: com.api.domain.tag.Tag; nested exception is org.hibernate.PersistentObjectException: detached entity passed to persist: com.api.domain.tag.Tag

        or if I try to make a post request (with a Post object containing one or more Tags) it throws the following exception:

        org.springframework.http.converter.HttpMessageNotReadableException: JSON parse error: null; nested exception is com.fasterxml.jackson.databind.JsonMappingException: N/A at [Source: (org.apache.catalina.connector.CoyoteInputStream); line: 8, column: 5] (through reference chain: com.api.domain.post.Post["tags"])
        ...
        Caused by: java.lang.NullPointerException: null
        at com.api.domain.post.Post.addTag(Post.java:86)

        Any ideas why this happens and how to solve it?

      2. All the test cases in this article are on GitHub and they run like a charm. Try to compare my test cases to yours and see where they differ.

      3. The only differences in the relevant tests are:

        1) You are assigning IDs to Post entities manually through the setter method while I’m using @GeneratedValue(s).

        2) You are persisting Post entities by calling EntityManager’s persist() method, while I’m doing the same by calling the save() method of my @RepositoryRestResource interface which extends CrudRepository.

        The models are pretty much identical.

      4. The save method should call persist if the id is null. You can change my examples to use @GeneratedValue and they will still work.

        You also need to do a comparison debug to see why your example is not working.

  2. Hi,

    thanks for a helpful post; I’ve now (finally) got my program working as intended. One question though:

    For the class PostTagId you define it as public static,

    public static class PostTagId

    The static part didn’t work with my compiler and when I looked more it seems like a static class only can be nested inside another class, i.e. not top level.

    Should the PostTagId be inside something else, like PostTag? Or have I misunderstood?

    1. The classes are static in my unit tests as they are confined to the test class. You don’t need to use that in your application.

  3. Hey Vlad,

    Could you please explain if this self reference in the PostTag mapping entity is not causing issues? I tried serializing the Post class directly and what you get when you see the JSON for tags is a long descending nest of Post -> Tag -> Post -> Tag, etc.

    Does JPA handle this in the background and not recognize this or will this potentially cause serious memory/performance issues? Is this resolved by the FetchType.lazy? What if I just want to call a FindAll() and get back all the Tags? Do I need to make a second query somehow?

    public void addTag(Tag tag) {
    PostTag postTag = new PostTag(this, tag);
    tags.add(postTag);
    tag.getPosts().add(postTag);
    }

    1. JPA is about persisting data into the database. It has nothing to do with JSON which is handled by Jackson. You need the @JsonIgnore annotation on one side of a bidirectional association, and that applies to any hierarchical model, not just to JPA entities.

  4. First of all,
    I’d like to thank you for your blog. There is a plenty of great material about persistence.

    @vladmihalcea, I followed the tutorial described above to implement a many-to-many with extra attribute (in my case a list of integers). Nevertheless, it is not working as expected. The code successfully saves the data into the database since I can check the data directly on the DB console. But I’m unable to load a domain class from the other endpoint.
    On the other hand, if I create a JPARepository to access the intermediary entity I’m able to retrieve the data. Do you have any tip about how to fix it?

    Thanks in advance!

    Domain classes
    Vacina class => https://pastebin.com/KJgSkgiN (equivalent to Post class)
    CalendarioVacinal class => https://pastebin.com/6FjBWJW0 (equivalent to Tag class)

    Intermediary entities
    VacinaCalendarioVacinal class => https://pastebin.com/hJvALVt0 (equivalent to PostTag class)
    VacinaCalendarioVacinalId class => https://pastebin.com/MxpmbcTr (equivalent to PostTagId class)

    JPA repositories
    VacinaRepository class => https://pastebin.com/csHHzuNf
    CalendarioVacinalRepository class => https://pastebin.com/qB5i6Kiu
    VacinaCalendarioVacinalRepository class => https://pastebin.com/SA5fgPDu

    Test class
    JUnit Test class => https://pastebin.com/GCBSQhpg

  5. Hi, what about doing this with Kotlin? It’s kind of a nightmare handling immutability and secondary constructors and so on… Thanks!

  6. Is it possible when saving a Post, to use cascade to also persist new Tags that were added to the Post? As I see it, this solution works only when both Post and Tags have already been persisted (because only then you know their ids to form the @EmbeddedId).

  7. Hi,

    we have some issues working with this kind of table configuration. We are using Hibernate version 5.3.6.

    We are not able to load data into the mapped class even if the query generated by hibernate looks ok.
    We use session.get(class, id)

    In order to have the query running properly we had to add the @JoinColumn annotation in @MapsId as by default hibernate add an _ID at the end of the map class.
    Note that we have a maximum of 4 rows per table.

    We have a JVM error when trying to access the collection object. We also have these Hibernate errors.

    [10-10-2018 18:37:00.517] [Test worker] WARN org.hibernate.engine.loading.internal.LoadContexts – HHH000100: Fail-safe cleanup (collections) : org.hibernate.engine.loading.internal.CollectionLoadContext@6f4e6518<rs=com.mchange.v2.c3p0.impl.NewProxyResultSet@6178010c [wrapping: null]>
    [10-10-2018 18:37:00.526] [Test worker] WARN org.hibernate.engine.loading.internal.CollectionLoadContext – HHH000160: On CollectionLoadContext#cleanup, localLoadingCollectionKeys contained [1] entries
    [10-10-2018 18:37:00.527] [Test worker] WARN org.hibernate.engine.loading.internal.LoadContexts – HHH000100: Fail-safe cleanup (collections) : org.hibernate.engine.loading.internal.CollectionLoadContext@3ba2288a<rs=com.mchange.v2.c3p0.impl.NewProxyResultSet@191f363d [wrapping: null]>

    Thanks in advance.

      1. I purchased your book and read through the Relationship chapter completely. I reconfigured my code to match your example in 10.5.3. It works fine like that, but if I make one a OneToOne relationship, I get an error of “Referenced property not a (One|Many)ToOne: domain.workorders.WorkOrderInspection.inspection in mappedBy of domain.inspections.Inspection.workOrderInspection”. So I still can’t find an answer to whether a OneToOne can be used. Any thoughts? Thanks!

      2. Send me a Pull Request to the high-performance-java-persistence GitHub repository where you replicate this issue so I can better investigate it.

  8. Hi @Vlad,

    Thanks for sharing this example. I have followed exact same example, but my hql with join was failing when I added JoinColumn on PostTag like given below, it was working then.

    @JoinColumn(name = "post_id", referencedColumnName = "id")
    @JoinColumn(name = "tag_id", referencedColumnName = "id")

    If you can update the article accordingly, it might be helpful to someone. Thank you again for sharing this.

      1. Dear Vlad,
        Thanks for these great resources, but I have to agree with Vishal’s suggestion given above. I was constantly getting EntityManagerFactory exception, until I introduced these two annotations to the joined table. It is probably caused by some versioning issues (or possibly naming Hibernate’s conventions conflict). If you want, I can provide you with current versions of my stack.

  9. Hello Vlad,

    thanks for posting, your posts are very helpful. I’m reading your posts often.

    I’m building a spring-boot application, and in the dev profile I set Hibernate to autocreate database tables and it works fine.
    I used your post to map the relation between company and its socialMediaAccount and indeed it works fine, except that when Hibernate creates the middle table it creates columns: company_id, sm_id, _path (that’s OK) and two additional columns: company_company_id and social_media_acount_id.
    I tested and realized that the first two FKs are from the composite key class (CompanySocialMedia in my case) and the last two FKs are actually PKs from the Company and SocialMedia classes.
    I googled for solution, but I can’t understand how to get rid last two columns in table.
    Do you have some advice, please.

    Thank you very much!

  10. Hi Vlad, great article.
    Suppose the extra column be another entity. This new association could be an unidirectional ManyToOne?

    Thanks a lot.

  11. Hi, thank you for the post. I just didn’t understand why the date is the primary key in the join table. Wouldn’t it be better to just have an extra id attribute? Or it doesn’t matter?

  12. Hello Vlad,
    Nice post! I’m using JPA Criteria API, and I’m trying to apply the above in my implementation with no luck. Specifically, I try to implement the join fetch using an EntityGraph, and get the error :
    “org.hibernate.QueryException: query specified join fetching, but the owner of the fetched association was not present in the select list [FromElement{explicit,not a collection join,fetch join,…”
    Could you provide an example of how your case could be implemented using JPA Criteria?

    1. The problem you have is because you issued a JOIN FETCH while the select clause does not contain the root entity, but something else, like a projection.

  13. Hi Vlad:

    Is there a better way to reference a subId of an EmbeddedId in JPQL? I mean:

    select pt
    from post_tag pt
    where pt.id.post_id = …

    vs

    select pt
    from post_tag pt
    where pt.post.id = …

    or even

    select pt
    from post_tag pt
    where pt.post = …

    Is there any performance implications when choosing one or another? Does it make any difference?

    Thank you

    1. They all render the same SQL, so probably there’s no difference. Better benchmark it to be sure.
