How to implement equals and hashCode using the JPA entity identifier (Primary Key)

Introduction

As previously explained, using the JPA entity business key for equals and hashCode is always the best choice. However, not all entities feature a unique business key, so we need to use another database column that is also unique, such as the primary key.

However, using the entity identifier for equality is very challenging, and this post shows how you can use it without issues.

Test harness

When it comes to implementing equals and hashCode, there is one and only one rule you should have in mind:

Equals and hashCode must behave consistently across all entity state transitions.

To test the effectiveness of an equals and hashCode implementation, the following test can be used:

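import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

import org.hibernate.Session;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

// doInJPA is a test base class helper that (judging by its usage here)
// runs each callback in a new EntityManager and transaction
protected <T extends Identifiable<? extends Serializable>> void assertEqualityConsistency(
        Class<T> clazz,
        T entity) {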

    Set<T> tuples = new HashSet<>();

    assertFalse(tuples.contains(entity));
    tuples.add(entity);
    assertTrue(tuples.contains(entity));

    doInJPA(entityManager -> {
        entityManager.persist(entity);
        entityManager.flush();
        assertTrue(
            "The entity is not found in the Set after it's persisted.",
            tuples.contains(entity)
        );
    });

    assertTrue(tuples.contains(entity));

    doInJPA(entityManager -> {
        T entityProxy = entityManager.getReference(
            clazz,
            entity.getId()
        );
        assertTrue(
            "The entity proxy is not equal with the entity.",
            entityProxy.equals(entity)
        );
    });

    doInJPA(entityManager -> {
        T entityProxy = entityManager.getReference(
            clazz,
            entity.getId()
        );
        assertTrue(
            "The entity is not equal with the entity proxy.",
            entity.equals(entityProxy));
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.merge(entity);
        assertTrue(
            "The entity is not found in the Set after it's merged.",
            tuples.contains(_entity)
        );
    });

    doInJPA(entityManager -> {
        entityManager.unwrap(Session.class).update(entity);
        assertTrue(
            "The entity is not found in the Set after it's reattached.",
            tuples.contains(entity)
        );
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.find(clazz, entity.getId());
        assertTrue(
            "The entity is not found in the Set after it's loaded in a subsequent Persistence Context.",
            tuples.contains(_entity)
        );
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.getReference(clazz, entity.getId());
        assertTrue(
            "The entity is not found in the Set after it's loaded as a proxy in another Persistence Context.",
            tuples.contains(_entity)
        );
    });

    T deletedEntity = doInJPA(entityManager -> {
        T _entity = entityManager.getReference(
            clazz,
            entity.getId()
        );
        entityManager.remove(_entity);
        return _entity;
    });

    assertTrue(
        "The entity is not found in the Set even after it's deleted.",
        tuples.contains(deletedEntity)
    );
}
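The harness calls entity.getId() on every entity, so the entities implement an Identifiable interface, which Book implements but the post does not show. A minimal sketch, assuming the interface only needs to expose the identifier (SampleEntity is a hypothetical class added for illustration):

```java
import java.io.Serializable;

// Assumed shape of the Identifiable contract used by the harness:
// it only exposes the identifier, which is null while the entity is transient.
interface Identifiable<T extends Serializable> {
    T getId();
}

// Hypothetical minimal implementation (no JPA annotations),
// showing how an entity would satisfy the contract.
class SampleEntity implements Identifiable<Long> {

    private Long id;

    @Override
    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }
}
```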

Natural id

The first use case to test is the natural id mapping. Consider the following entity:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}

The isbn property is annotated with @NaturalId; therefore, it should be unique and not nullable. Both equals and hashCode use only the isbn property in their implementations.

For more details about the @NaturalId annotation, check out this article.

When running the following test case:

Book book = new Book();
book.setTitle("High-Performance Java Persistence");
book.setIsbn("123-456-7890");

assertEqualityConsistency(Book.class, book);

Everything works fine, as expected.
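The natural id version passes because the isbn is known before the entity is persisted, so neither identifier generation nor loading a distinct copy changes equals or hashCode. The following plain-Java sketch illustrates this without JPA; NaturalIdBook and NaturalIdDemo are hypothetical names, and setId merely simulates what the identifier generator would do:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-Java stand-in for the @NaturalId Book: equality relies only on
// the isbn, which is set before the entity is ever persisted.
class NaturalIdBook {

    private Long id;            // assigned later, e.g. by the identifier generator
    private final String isbn;  // business key, unique and not null

    NaturalIdBook(String isbn) {
        this.isbn = isbn;
    }

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
    String getIsbn() { return isbn; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NaturalIdBook)) return false;
        NaturalIdBook book = (NaturalIdBook) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }
}

class NaturalIdDemo {

    // Simulates persist (id assignment) and a later load (a distinct copy).
    static boolean foundAfterIdAssignmentAndReload() {
        Set<NaturalIdBook> books = new HashSet<>();
        NaturalIdBook book = new NaturalIdBook("123-456-7890");
        books.add(book);

        book.setId(1L); // hashCode ignores the id, so the bucket is unchanged

        NaturalIdBook loadedCopy = new NaturalIdBook("123-456-7890");
        loadedCopy.setId(1L);

        return books.contains(book) && books.contains(loadedCopy);
    }
}
```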

Default java.lang.Object equals and hashCode

What if our entity does not have any column that can be used as a @NaturalId? The first urge might be not to define custom equals and hashCode implementations at all, like in the following example:

@Entity(name = "Book")
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    //Getters and setters omitted for brevity
}

However, when testing this implementation:

Book book = new Book();
book.setTitle("High-Performance Java Persistence");

assertEqualityConsistency(Book.class, book);

The test fails with the following assertion error:

java.lang.AssertionError: The entity is found after it's merged

The original entity is not equal to the one returned by the merge method because merge returns a different object instance, and the default java.lang.Object equals considers two objects equal only if they share the same reference.
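The same failure can be reproduced without a database. In this sketch, simulateMerge is a hypothetical stand-in for EntityManager.merge, assuming only that it returns a different instance carrying the same state:

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java stand-in for the Book entity that does NOT override
// equals/hashCode, so java.lang.Object identity semantics apply.
class DefaultEqualsBook {
    Long id;
    String title;
}

class DefaultEqualsDemo {

    // Hypothetical simulation of merge: a different instance with the same state.
    static DefaultEqualsBook simulateMerge(DefaultEqualsBook detached) {
        DefaultEqualsBook managed = new DefaultEqualsBook();
        managed.id = detached.id;
        managed.title = detached.title;
        return managed;
    }

    static boolean mergedCopyFound() {
        Set<DefaultEqualsBook> books = new HashSet<>();
        DefaultEqualsBook detached = new DefaultEqualsBook();
        detached.id = 1L;
        detached.title = "High-Performance Java Persistence";
        books.add(detached);

        DefaultEqualsBook merged = simulateMerge(detached);
        // Same id and title, but a different reference, so Object.equals is false
        return books.contains(merged);
    }
}
```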

Using the entity identifier for equals and hashCode

So, if the default equals and hashCode are no good either, let's use the entity identifier for our custom implementation. Let's use our IDE to generate equals and hashCode and see how it works:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }

    //Getters and setters omitted for brevity
}

When running the previous test case, the test fails with the following assertion error:

java.lang.AssertionError: The entity is found after it's persisted

When the entity was first stored in the Set, the identifier was null. After the entity was persisted, the identifier was assigned an automatically generated value, hence the hashCode changed. For this reason, the entity cannot be found in the Set after it's persisted.
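The lost-entry problem is easy to reproduce with a plain HashSet. In this sketch, IdHashBook mirrors the IDE-generated equals and hashCode, and setId(1L) is a hypothetical stand-in for the identifier Hibernate assigns during persist:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-Java stand-in for the IDE-generated, id-based equals/hashCode.
class IdHashBook {
    private Long id;

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdHashBook)) return false;
        IdHashBook book = (IdHashBook) o;
        return Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }
}

class IdHashDemo {

    static boolean foundAfterIdAssigned() {
        Set<IdHashBook> books = new HashSet<>();
        IdHashBook book = new IdHashBook();
        books.add(book);    // stored under the hash of a null id

        book.setId(1L);     // simulates the id generated at persist time
        // The lookup now uses the new hash, which no longer matches the stored entry
        return books.contains(book);
    }
}
```

The entry is still in the Set (its size is 1), but it is unreachable through contains or remove.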

Fixing the entity identifier equals and hashCode

To address the previous issue, there is only one solution: the hashCode should always return the same value:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (!(o instanceof Book))
            return false;

        Book other = (Book) o;

        return id != null && 
               id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }

    //Getters and setters omitted for brevity
}

Also, when the entity identifier is null, we can guarantee equality only for the same object references. Otherwise, no transient object is equal to any other transient or persisted object. That’s why the identifier equality check is done only if the current Object identifier is not null.

With this implementation, the equals and hashCode test runs fine for all entity state transitions. The reason why it works is that the hashCode value does not change, hence, we can rely on the java.lang.Object reference equality as long as the identifier is null.
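The three properties this implementation relies on can be checked with plain Java. This is a sketch without JPA, where FixedBook mirrors the Book above and setId stands in for the generated identifier (FixedBook and FixedBookDemo are hypothetical names):

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java stand-in for the fixed Book: constant hashCode, id-based equals.
class FixedBook {
    private Long id;

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FixedBook)) return false;
        FixedBook other = (FixedBook) o;
        // A transient (null id) instance is only equal to itself
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return 31; // never changes, whatever the entity state
    }
}

class FixedBookDemo {

    // 1. The entity is still found after its id is assigned.
    static boolean foundAfterIdAssigned() {
        Set<FixedBook> books = new HashSet<>();
        FixedBook book = new FixedBook();
        books.add(book);
        book.setId(1L); // simulates persist: the hash bucket stays the same
        return books.contains(book);
    }

    // 2. A distinct instance with the same id (e.g. returned by merge or find) is found.
    static boolean copyWithSameIdFound() {
        Set<FixedBook> books = new HashSet<>();
        FixedBook book = new FixedBook();
        book.setId(1L);
        books.add(book);
        FixedBook loaded = new FixedBook();
        loaded.setId(1L);
        return books.contains(loaded);
    }

    // 3. Two distinct transient instances are never equal.
    static boolean twoTransientBooksEqual() {
        return new FixedBook().equals(new FixedBook());
    }
}
```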

Conclusion

The entity identifier can be used for equals and hashCode, but only if the hashCode returns the same value all the time. This might sound like a terrible thing to do since it defeats the purpose of using multiple buckets in a HashSet or HashMap.

However, for performance reasons, you should always limit the number of entities that are stored in a collection. You should never fetch thousands of entities in a @OneToMany Set because the performance penalty on the database side is multiple orders of magnitude higher than using a single hashed bucket.

All tests are available on GitHub.

30 Comments on “How to implement equals and hashCode using the JPA entity identifier (Primary Key)”

  1. Hi, Vlad!
    Thank you for this nice post.
    But I have a question about this line:

    return id != null && id.equals(book.id);

    What if the argument of equals is an uninitialized proxy? Shouldn’t you get the id of the compared object using the getter?

    return id != null && id.equals(book.getId());

  2. All the complexity with equals/hashCode goes away if the assigned identifiers are used for entities. I see no drawbacks of this approach, and so I don’t understand why many (maybe even the majority) of people keep using other ID generation strategies.

    Vlad, do you see drawbacks of assigned identifiers? (The necessity to have a high-performance ID generator, which may work similarly to Hibernate’s hi-lo/pooling generators, and to explicitly supply IDs via the constructor of an entity are not drawbacks 🙂)

    • And how do you generate those unique values in the application?

      • The same way Hibernate does this: by generating random UUID values; or if that’s not applicable, by using DB sequence with pooling implemented on top of it in the application; or by using a different approach suitable to the applications.

      • That’s a bad idea. It takes 128 bits, and the values are random, hence they don’t work nicely with clustered indexes. And the bloat is amplified by every Foreign Key.

      • “That’s a bad idea”: that was only one of the approaches I mentioned. And it is not always a bad idea. But if you don’t like it, I also mentioned (twice already :) ) DB sequence + pooling.

      • The only time you’d want to use it is for multi-master replication. Otherwise, a numeric id is a much better approach.

      • Somehow we deviated from the original question. But since you did not mention the drawbacks of using assigned identifiers, I am guessing you also don’t see any? (the possibility of picking a non-optimal way of generating the identifiers is the same for both assigned IDs and other Hibernate ID generation strategies).

      • I just gave two reasons: bloat and randomness.

      • But your mentions of bloat and randomness were about UUID identifiers. The application is free to choose any way of generating IDs for the assigned identifier approach. Which means that the app can easily use the DB sequence with pooling on top of it (or with hi-lo if the DB sequence does not allow advancing the sequence value for more than 1). It’s done in Hibernate, and can be done anywhere (it’s not hard to do at all).

      • Yes, that can be done, but it sounds like more work than just providing the right equals and hashCode.

      • Yes, there is this trade-off, “implement your ID generator VS implement only hashCode/equals in an unexpected way”, when choosing the assigned ID strategy. But the additional benefit we get is that the ID of an entity is never null. And this is a noticeable simplification for real-life applications, where you can’t always easily say whether the entity you have is new or not.

      • Good point. I’ll keep that in mind for my new Hypersistence Optimizer tool. I will check the equals/hashCode validity too.

  3. Thanks for the explanations Vlad!

    Would you say it’s a requirement to (almost) always implement the equals/hashCode methods for every @Entity?

    I’m asking this because I’ve seen that many, if not most of the Entities in the project high-performance-java-persistence don’t have custom equals/hashCode methods. And I think people learn best practices by reading good, consistent code (even if it’s longer).

    • If you don’t have to store the entities in a collection or compare them, then you won’t need equals or hashCode. There is no universal rule that applies to any use case.

  4. I do not see how your implementation would work when merging a new entity. The new entity would not have an id, while the object returned by the merge operation would. So your implementation would consider them different.

    In fact, your test harness only works because you first persist the entity, providing it with an id, and then you detach and merge it later on. At that time, id generation is no longer an issue, and your implementation considers them equal.

    In the case of a direct merge of new entities, this will be no better than object references.

    • There is no such thing as a direct merge. Merge is for integrating changes of detached entities, which have been persisted previously, because that’s how JPA works. You can find all the tests in my High-Performance Java Persistence GitHub repository, so feel free to provide a test case that proves this implementation does not work.

  5. What do you think of the default implementation provided by Spring Data JPA in org.springframework.data.jpa.domain.AbstractPersistable?

    They use a slightly optimized version of hashCode(), if id is not null:

    public int hashCode() {
        int hashCode = 17;
        hashCode += null == getId() ? 0 : getId().hashCode() * 31;
        return hashCode;
    }

    https://github.com/spring-projects/spring-data-jpa/blob/master/src/main/java/org/springframework/data/jpa/domain/AbstractPersistable.java
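Whatever the micro-optimization, the hashCode quoted in the last comment still depends on the identifier: it returns 17 while the id is null and a different value once the id is assigned. A plain-Java sketch of the consequence, copying the quoted hashCode into a hypothetical SpringStyleBook class (its equals mirrors the fixed Book from the post, not Spring’s implementation):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity stand-in reusing the hashCode quoted in the comment.
class SpringStyleBook {
    private Long id;

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SpringStyleBook)) return false;
        SpringStyleBook other = (SpringStyleBook) o;
        return getId() != null && getId().equals(other.getId());
    }

    @Override
    public int hashCode() {
        int hashCode = 17;
        hashCode += null == getId() ? 0 : getId().hashCode() * 31;
        return hashCode;
    }
}

class SpringStyleDemo {

    static boolean foundAfterIdAssigned() {
        Set<SpringStyleBook> books = new HashSet<>();
        SpringStyleBook book = new SpringStyleBook();
        books.add(book);   // hashCode() is 17 while the id is null
        book.setId(1L);    // hashCode() becomes 17 + 1 * 31 = 48
        return books.contains(book);
    }
}
```

As with the IDE-generated version, the entity stored while transient cannot be found after the id is assigned.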
