How to implement equals and hashCode using the JPA entity identifier (Primary Key)

Introduction

As previously explained, using the JPA entity business key for equals and hashCode is always the best choice. However, not all entities feature a unique business key, so we need to fall back on another database column that is also unique: the primary key.

But using the entity identifier for equality is very challenging, and this post is going to show you how you can use it without issues.

Test harness

When it comes to implementing equals and hashCode, there is one and only one rule you should have in mind:

Equals and hashCode must behave consistently across all entity state transitions.

To test the effectiveness of an equals and hashCode implementation, the following test can be used:

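// T is assumed to be the type parameter of the enclosing test class,
// bounded by Identifiable so that getId() is available.
protected void assertEqualityConsistency(
        Class<T> clazz,
        T entity) {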

    Set<T> tuples = new HashSet<>();

    assertFalse(tuples.contains(entity));
    tuples.add(entity);
    assertTrue(tuples.contains(entity));

    doInJPA(entityManager -> {
        entityManager.persist(entity);
        entityManager.flush();
        assertTrue(
            "The entity is not found in the Set after it's persisted.",
            tuples.contains(entity)
        );
    });

    assertTrue(tuples.contains(entity));

    doInJPA(entityManager -> {
        T entityProxy = entityManager.getReference(
            clazz,
            entity.getId()
        );
        assertTrue(
            "The entity proxy is not equal with the entity.",
            entityProxy.equals(entity)
        );
    });

    doInJPA(entityManager -> {
        T entityProxy = entityManager.getReference(
            clazz,
            entity.getId()
        );
        assertTrue(
            "The entity is not equal with the entity proxy.",
            entity.equals(entityProxy));
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.merge(entity);
        assertTrue(
            "The entity is not found in the Set after it's merged.",
            tuples.contains(_entity)
        );
    });

    doInJPA(entityManager -> {
        entityManager.unwrap(Session.class).update(entity);
        assertTrue(
            "The entity is not found in the Set after it's reattached.",
            tuples.contains(entity)
        );
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.find(clazz, entity.getId());
        assertTrue(
            "The entity is not found in the Set after it's loaded in a different Persistence Context.",
            tuples.contains(_entity)
        );
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.getReference(clazz, entity.getId());
        assertTrue(
            "The entity is not found in the Set after it's loaded as a proxy in a different Persistence Context.",
            tuples.contains(_entity)
        );
    });

    T deletedEntity = doInJPA(entityManager -> {
        T _entity = entityManager.getReference(
            clazz,
            entity.getId()
        );
        entityManager.remove(_entity);
        return _entity;
    });

    assertTrue(
        "The entity is not found in the Set even after it's deleted.",
        tuples.contains(deletedEntity)
    );
}

Natural id

The first use case to test is the natural id mapping. Consider the following entity:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}

The isbn property is annotated with @NaturalId, so it should be unique and not nullable. Both equals and hashCode use the isbn property in their implementations.

For more details about the @NaturalId annotation, check out this article.

When running the following test case:

Book book = new Book();
book.setTitle("High-Performance Java Persistence");
book.setIsbn("123-456-7890");

assertEqualityConsistency(Book.class, book);

Everything works fine, as expected.

Default java.lang.Object equals and hashCode

What if our entity does not have any column that can be used as a @NaturalId? The first urge is to not define your own implementations of equals and hashCode, like in the following example:

@Entity(name = "Book")
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    //Getters and setters omitted for brevity
}

However, when testing this implementation:

Book book = new Book();
book.setTitle("High-Performance Java Persistence");

assertEqualityConsistency(Book.class, book);

The test fails with the following assertion error:

java.lang.AssertionError: The entity is not found in the Set after it's merged.

The original entity is not equal to the one returned by the merge method because they are two distinct Object instances, and the default equals implementation only compares references.
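The same effect can be reproduced without JPA. The following is a minimal sketch, not the test from this article: PlainBook and copyOf are hypothetical names, and copyOf only imitates the fact that merge hands back a different instance carrying the same state.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an entity that does not override equals/hashCode.
final class PlainBook {
    private final Long id;
    private final String title;

    PlainBook(Long id, String title) {
        this.id = id;
        this.title = title;
    }

    Long getId() { return id; }
    String getTitle() { return title; }
}

final class DefaultEqualityDemo {

    // Imitates merge on a detached entity: a different instance with the same state.
    // This is an assumption made for illustration, not Hibernate's actual code path.
    static PlainBook copyOf(PlainBook book) {
        return new PlainBook(book.getId(), book.getTitle());
    }

    public static void main(String[] args) {
        PlainBook detached = new PlainBook(1L, "High-Performance Java Persistence");

        Set<PlainBook> tuples = new HashSet<>();
        tuples.add(detached);

        PlainBook managed = copyOf(detached);

        // The very same reference is found.
        System.out.println(tuples.contains(detached)); // prints "true"
        // A different instance with the same identifier is not found,
        // because Object.equals compares references only.
        System.out.println(tuples.contains(managed)); // prints "false"
        System.out.println(detached.getId().equals(managed.getId())); // prints "true"
    }
}
```

Both instances represent the same table row, yet the Set treats them as unrelated objects.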

Using the entity identifier for equals and hashCode

So if the default equals and hashCode is no good either, then let’s use the entity identifier for our custom implementation. Let’s just use our IDE to generate the equals and hashCode and see how it works:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }

    //Getters and setters omitted for brevity
}

When running the previous test case, the test fails with the following assertion error:

java.lang.AssertionError: The entity is not found in the Set after it's persisted.

When the entity was first stored in the Set, the identifier was null. After the entity was persisted, the identifier was assigned an automatically generated value, hence the hashCode differs. For this reason, the entity cannot be found in the Set after it got persisted.
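To see the hashCode change in isolation, here is a small plain-Java sketch. IdHashedBook is a hypothetical stand-in for the entity above, and calling setId by hand only simulates what identifier generation does at persist time.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the Book entity with IDE-generated equals/hashCode.
final class IdHashedBook {
    private Long id; // null until the entity is "persisted"

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdHashedBook)) return false;
        return Objects.equals(getId(), ((IdHashedBook) o).getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }
}

final class ChangingHashCodeDemo {

    // Adds a transient book to a Set, assigns an identifier, then looks it up again.
    static boolean foundAfterIdAssignment() {
        IdHashedBook book = new IdHashedBook();
        Set<IdHashedBook> tuples = new HashSet<>();
        tuples.add(book);  // stored under the hash computed for a null id
        book.setId(1L);    // simulates the identifier being generated
        return tuples.contains(book);
    }

    public static void main(String[] args) {
        IdHashedBook book = new IdHashedBook();
        System.out.println(book.hashCode()); // prints "31"
        book.setId(1L);
        System.out.println(book.hashCode()); // prints "32"

        System.out.println(foundAfterIdAssignment()); // prints "false"
    }
}
```

The Set remembers the hash that was computed when the element was added, so a lookup with the new hash never reaches the stored entry, even though it is the very same object.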

Fixing the entity identifier equals and hashCode

To address the previous issue, there is only one solution. The hashCode method should always return the same value:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (!(o instanceof Book))
            return false;

        Book other = (Book) o;

        return id != null && 
               id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }

    //Getters and setters omitted for brevity
}

Also, when the entity identifier is null, we can guarantee equality only for the same object references. Otherwise, no transient object is equal to any other transient or persisted object. That’s why the identifier equality check is done only if the current Object identifier is not null.

With this implementation, the equals and hashCode test runs fine for all entity state transitions. The reason why it works is that the hashCode value does not change, hence, we can rely on the java.lang.Object reference equality as long as the identifier is null.
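The behavior described above can be checked with plain Java collections. This is only a sketch under simplifying assumptions: ConstantHashBook is a hypothetical copy of the equals and hashCode logic shown earlier, and setId stands in for identifier generation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Book entity using the constant hashCode approach.
final class ConstantHashBook {
    private Long id; // null while the object is transient

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConstantHashBook)) return false;
        ConstantHashBook other = (ConstantHashBook) o;
        // Transient objects (null id) are only equal to themselves.
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }
}

final class ConstantHashCodeDemo {
    public static void main(String[] args) {
        ConstantHashBook book = new ConstantHashBook();
        Set<ConstantHashBook> tuples = new HashSet<>();
        tuples.add(book);

        book.setId(1L); // simulates the identifier being generated
        System.out.println(tuples.contains(book)); // prints "true"

        // Simulates the same row loaded in another Persistence Context.
        ConstantHashBook reloaded = new ConstantHashBook();
        reloaded.setId(1L);
        System.out.println(tuples.contains(reloaded)); // prints "true"

        // Two distinct transient objects are never equal.
        System.out.println(new ConstantHashBook().equals(new ConstantHashBook())); // prints "false"
    }
}
```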

Conclusion

The entity identifier can be used for equals and hashCode, but only if the hashCode returns the same value all the time. This might sound like a terrible thing to do since it defeats the purpose of using multiple buckets in a HashSet or HashMap.

However, for performance reasons, you should always limit the number of entities that are stored in a collection. You should never fetch thousands of entities in a @OneToMany Set because the performance penalty on the database side is multiple orders of magnitude higher than using a single hashed bucket.
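A constant hashCode affects only how much work a lookup does, not whether it succeeds. The sketch below illustrates this with a few hundred elements. SingleBucketBook and the equalsCalls counter are hypothetical helpers added for illustration; the exact number of equals calls depends on HashMap internals, so it is printed but not interpreted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an entity whose hashCode is constant.
final class SingleBucketBook {
    static int equalsCalls = 0; // counts equals invocations, for illustration only

    private final Long id;

    SingleBucketBook(Long id) { this.id = id; }

    Long getId() { return id; }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        if (this == o) return true;
        if (!(o instanceof SingleBucketBook)) return false;
        SingleBucketBook other = (SingleBucketBook) o;
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return 31; // every instance lands in the same bucket
    }
}

final class SingleBucketDemo {

    // Builds a Set holding books with identifiers 1..count.
    static Set<SingleBucketBook> booksWithIds(int count) {
        Set<SingleBucketBook> books = new HashSet<>();
        for (long id = 1; id <= count; id++) {
            books.add(new SingleBucketBook(id));
        }
        return books;
    }

    // Looks up a fresh instance for every identifier 1..count.
    static boolean containsAll(Set<SingleBucketBook> books, int count) {
        for (long id = 1; id <= count; id++) {
            if (!books.contains(new SingleBucketBook(id))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int count = 200;
        Set<SingleBucketBook> books = booksWithIds(count);

        System.out.println(books.size()); // prints "200"
        System.out.println(containsAll(books, count)); // prints "true"

        // Each lookup had to compare against elements sharing the single bucket.
        System.out.println("equals calls: " + SingleBucketBook.equalsCalls);
    }
}
```

For collections of this size, the extra comparisons are cheap compared to the cost of fetching the rows from the database in the first place.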

All tests are available on GitHub.

12 Comments on “How to implement equals and hashCode using the JPA entity identifier (Primary Key)”

  1. Hi Vlad,

    I have a question about implementing equals and hashCode when I use a @MappedSuperclass abstract BaseEntity with a generated @Id. Is it enough to implement equals and hashCode in the BaseEntity, or should they be overridden by each concrete subclass? Is it OK that different entities return the same constant from hashCode?

    • I never do that because some entities have a business key while others don’t.

  2. Hey Vlad, Thanks for this great article!
    I’m a bit confused about why we shouldn’t use mutable properties in equals().
    I mean, if we use this:
    public boolean equals(Object o) {
    …some equality checks
    return id != null &&
    id.equals(other.getId()) || name.equals(other.getName());
    }

    public int hashCode() {
    return 31;
    }
    This could fix the “no transient object is equal to any other transient or persisted object” limitation, right?
    In which case is this implementation problematic?

      • Thanks, your tests still pass with this modification, and we can add a test that checks equality between two transient objects with the same title.
        (Sorry, no PR; I’ll take the time to send you one if you think it’s worth it.)

        Also, the title check should be cleaned up a little bit with a null check first.

        To summarize: for objects with no natural key, we could generalize the equals() method to something like this:
        return id != null && id.equals(other.getId()) ||

        What do you think?

      • I don’t see any reason to add any property other than the id if you don’t have a natural key. If you have a natural id, then use that one instead.

      • My previous answer was truncated. I meant:

        return id != null && id.equals(other.getId()) || (equals check on non generated properties)

        One reason I can find is when you need the equals() method to work on transient objects.
        Let’s say I have a lot of transient Books imported from a third-party data source, and some of them are duplicates (same title, same authors, …): I may want to deduplicate them before sending them to the persistence layer.
        Anyway, thanks for your time and your help.

      • Using equals for multiple purposes is not a good idea. For finding duplicates, you can use findAny with a lambda that defines the business equivalence criteria.

  3. Hello! Very helpful article, thank you!

    I’m curious if and how this changes if you have “version” field in the entity. Should the version also be checked as part of the equals?

    • It’s not a good idea to use mutable attributes for entity identity checks.

  4. Hello, Vlad.
    My previous comment may have been deleted.
    We have a Book class, and suppose the field ‘title’ is required but not unique,
    so we can’t use that field as a NaturalId.
    With this suggestion, we can implement the following equals() and hashCode():

    public boolean equals(Object o) {
    ...some equality checks
    return id != null &&
    id.equals(other.getId());
    }

    public int hashCode() {
    return Objects.hash(getTitle());
    }

    This implementation passes all your tests, and we don’t lose much performance in hash-based collections.
    What do you think about this solution?

    • Unless the title is immutable, this will not work. If you load the entity, store it in a Set, and then modify the title, you won’t be able to locate your entity. Hence, use either an immutable property or a fixed hash code value. The performance impact is almost negligible, as entities should not be fetched in large numbers. If you fetch too much data, you will have far bigger performance issues than the constant hashCode.
