How to implement equals and hashCode using the JPA entity identifier (primary key)

Introduction

As previously explained, using the JPA entity business key for equals and hashCode is always the best choice. However, not all entities feature a unique business key, so we need to use another database column that is also unique, like the primary key.

But using the entity identifier for equality is very challenging, and this post is going to show you how you can use it without issues.

Test harness

When it comes to implementing equals and hashCode, there is one and only one rule you should have in mind:

Equals and hashCode must behave consistently across all entity state transitions.

To test the effectiveness of an equals and hashCode implementation, the following test can be used:

protected <T extends Identifiable<? extends Serializable>> 
    void assertEqualityConstraints(Class<T> clazz, T entity) {
    
    Set<T> tuples = new HashSet<>();

    assertFalse(tuples.contains(entity));
    tuples.add(entity);
    assertTrue(tuples.contains(entity));

    doInJPA(entityManager -> {
        entityManager.persist(entity);
        entityManager.flush();
        assertTrue("The entity is found after it's persisted",
            tuples.contains(entity));
    });

    //The entity is found after the entity is detached
    assertTrue(tuples.contains(entity));

    doInJPA(entityManager -> {
        T _entity = entityManager.merge(entity);
        assertTrue("The entity is found after it's merged",
            tuples.contains(_entity));
    });

    doInJPA(entityManager -> {
        entityManager.unwrap(Session.class).update(entity);
        assertTrue("The entity is found after it's reattached",
            tuples.contains(entity));
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.find(clazz, entity.getId());
        assertTrue("The entity is found after it's loaded " +
                   "in another Persistence Context",
            tuples.contains(_entity));
    });

    executeSync(() -> {
        doInJPA(entityManager -> {
            T _entity = entityManager.find(clazz, entity.getId());
            assertTrue("The entity is found after it's loaded " +
                       "in another Persistence Context and " +
                       "in another thread",
                tuples.contains(_entity));
        });
    });
}
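
The test relies on a few helpers that are not shown in this post: the Identifiable interface, doInJPA, and executeSync. The following is only a sketch of what two of them might look like, assuming that Identifiable exposes nothing but getId() and that executeSync runs a task on another thread and waits for it to finish. The TestSupport class name is hypothetical, and the real helpers may differ.

```java
import java.io.Serializable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Assumed shape of the contract: the test above only needs getId().
interface Identifiable<T extends Serializable> {
    T getId();
}

// Hypothetical holder for the helper; the name is not taken from the original test suite.
class TestSupport {

    // Assumed behavior: run the task on a separate thread and block until it completes.
    static void executeSync(Runnable task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // Surface failures (including assertion errors) raised on the worker thread.
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

Running the last lookup on a separate thread makes sure the entity is loaded by a Persistence Context that shares nothing with the one that persisted it.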

Natural id

The first use case to test is the natural id mapping. Consider the following entity:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}

The isbn property is annotated with @NaturalId, so it should be unique and not nullable. Both equals and hashCode use the isbn property in their implementations. When running the following test case:

Book book = new Book();
book.setTitle("High-Performance Java Persistence");
book.setIsbn("123-456-7890");

assertEqualityConstraints(Book.class, book);

Everything works fine, as expected.

Default java.lang.Object equals and hashCode

What if our entity does not have any column that can be used as a @NaturalId? The first urge is to not define your own implementations of equals and hashCode, like in the following example:

@Entity(name = "Book")
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    //Getters and setters omitted for brevity
}

However, when testing this implementation:

Book book = new Book();
book.setTitle("High-Performance Java Persistence");

assertEqualityConstraints(Book.class, book);

the test fails with the following assertion error:

java.lang.AssertionError: The entity is found after it's merged

The original entity is not equal to the one returned by the merge method because the two are distinct objects, and the default equals implementation only compares references.
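
The same failure can be reproduced without Hibernate. Below is a minimal sketch, assuming a hypothetical PlainBook class that stands in for the entity, and a second instance that stands in for the object handed back by merge or find:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Book entity; equals and hashCode are inherited from java.lang.Object.
class PlainBook {
    final Long id;
    final String title;

    PlainBook(Long id, String title) {
        this.id = id;
        this.title = title;
    }
}

class DefaultEqualityDemo {

    // Returns true if a second instance with the same state is found in a Set holding the first one.
    static boolean copyIsFound() {
        PlainBook original = new PlainBook(1L, "High-Performance Java Persistence");
        // Simulates merge or find returning a different object for the same table row.
        PlainBook copy = new PlainBook(original.id, original.title);

        Set<PlainBook> tuples = new HashSet<>();
        tuples.add(original);
        return tuples.contains(copy);
    }

    public static void main(String[] args) {
        System.out.println("copy found: " + copyIsFound()); // prints "copy found: false"
    }
}
```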

Using the entity identifier for equals and hashCode

So if the default equals and hashCode are no good either, then let’s use the entity identifier for our custom implementation. Let’s just use our IDE to generate the equals and hashCode and see how it works:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }

    //Getters and setters omitted for brevity
}

When running the previous test case, it fails with the following assertion error:

java.lang.AssertionError: The entity is found after it's persisted

When the entity was first stored in the Set, the identifier was null. After the entity was persisted, the identifier was assigned an automatically generated value, so the hashCode changed. For this reason, the entity cannot be found in the Set after it got persisted.
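
The effect is easy to observe in plain Java. The following sketch assumes a hypothetical IdHashedBook class with the same id-based equals and hashCode, and simulates the identifier assignment by setting the field directly:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the Book entity with IDE-generated, id-based equals and hashCode.
class IdHashedBook {
    Long id;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdHashedBook)) return false;
        return Objects.equals(id, ((IdHashedBook) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}

class MutableHashDemo {

    // Returns true if the book is still found in the Set after its id has been assigned.
    static boolean foundAfterIdAssignment() {
        IdHashedBook book = new IdHashedBook();
        Set<IdHashedBook> tuples = new HashSet<>();
        tuples.add(book); // stored under the hash computed for a null id
        book.id = 1L;     // simulates the identifier being generated at persist time
        return tuples.contains(book);
    }

    public static void main(String[] args) {
        // prints "found after id assignment: false"
        System.out.println("found after id assignment: " + foundAfterIdAssignment());
    }
}
```

The lookup fails even though the very same object is in the Set, because the hash stored at insertion time no longer matches the hash computed at lookup time.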

Fixing the entity identifier equals and hashCode

To address the previous issue, there is only one solution, namely that hashCode always returns the same value:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return getId() != null && Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }

    //Getters and setters omitted for brevity
}

Also, when the entity identifier is null, we can guarantee equality only for the same object references. Otherwise, no transient object is equal to any other transient or persisted object. That’s why the identifier equality check is done only if the current Object identifier is not null.

With this implementation, the equals and hashCode test passes for all entity state transitions. It works because the hashCode value never changes, so the entity is always looked up in the same bucket, and we can rely on the java.lang.Object reference equality as long as the identifier is null.
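
To see why this holds outside of a Persistence Context, here is a minimal plain Java sketch. The ConstantHashBook class is a hypothetical stand-in for the entity above, and the identifier assignment is simulated by setting the field directly:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the fixed Book entity: id-based equals, constant hashCode.
class ConstantHashBook {
    Long id;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConstantHashBook)) return false;
        ConstantHashBook other = (ConstantHashBook) o;
        return id != null && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        return 31;
    }
}

class ConstantHashDemo {

    // The same instance is found both before and after the id is assigned.
    static boolean foundAfterIdAssignment() {
        ConstantHashBook book = new ConstantHashBook();
        Set<ConstantHashBook> tuples = new HashSet<>();
        tuples.add(book);
        book.id = 1L; // simulates the identifier being generated at persist time
        return tuples.contains(book);
    }

    // A different instance representing the same row is found through the id check.
    static boolean copyFoundById() {
        ConstantHashBook original = new ConstantHashBook();
        Set<ConstantHashBook> tuples = new HashSet<>();
        tuples.add(original);
        original.id = 1L;

        ConstantHashBook copy = new ConstantHashBook();
        copy.id = 1L; // simulates loading the row in another Persistence Context
        return tuples.contains(copy);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssignment()); // prints "true"
        System.out.println(copyFoundById());          // prints "true"
        // Two distinct transient objects are never equal; prints "false"
        System.out.println(new ConstantHashBook().equals(new ConstantHashBook()));
    }
}
```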

Conclusion

The entity identifier can be used for equals and hashCode, but only if the hashCode returns the same value all the time. This might sound like a terrible thing to do since it defeats the purpose of using multiple buckets in a HashSet or HashMap.

However, for performance reasons, you should always limit the number of entities that are stored in a collection. You should never fetch thousands of entities in a @OneToMany Set because the performance penalty on the database side is multiple orders of magnitude higher than using a single hashed bucket.

All tests are available on GitHub.

14 thoughts on “How to implement equals and hashCode using the JPA entity identifier (primary key)”

  1. Great article, Vlad.

    Very interesting your advice about keeping the same identifier through all the entity transition states. I’ve never thought about that before! I really liked your integration test to assert that, very smart!

    I’m not sure if I’m lucky, but most of the time I use the entity identifier (attribute with @Id) for my entities and I can’t remember any time I had an issue with it.

    One thing I find important is how to define the entities’ ID, because in the end every entity has an identity: it may be the @Id attribute or @NaturalId attribute or even two or more attributes together depending on your domain. So it has to do more with your domain model than your database schema.

    Let me ask you a question: what’re the benefits of using @NaturalId?

    1. Thanks. @NaturalId allows you to fetch an entity using the business key, just like when you fetch a User by its email property. This is also integrated with the second-level cache, so you can actually do that without even hitting the DB.

  2. Oh, if you use a UUID that is generated in Java, you don’t have a state with a null identifier.
    Just don’t ask the db for an id.
    Why would you wanna miss out on bucket usage?

    1. But a UUID is clunky, taking a lot of space on the DB side. The more space it takes, the greater the pressure on storing the index in memory, especially for tables with hundreds of millions of rows. That’s why numerical identifiers are much more common than UUIDs.
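
For reference, the application-assigned identifier approach discussed in this thread could look roughly like the sketch below. UuidBook is a hypothetical plain Java stand-in, not a mapped entity, and the sketch assumes the UUID is assigned in the constructor so that the identifier is never null:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for an entity whose identifier is assigned in Java at construction time.
class UuidBook {
    private final UUID id;

    UuidBook() {
        this(UUID.randomUUID());
    }

    // Used here to simulate loading an existing row.
    UuidBook(UUID id) {
        this.id = Objects.requireNonNull(id);
    }

    UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UuidBook)) return false;
        return id.equals(((UuidBook) o).id);
    }

    @Override
    public int hashCode() {
        // Safe to hash on the id because it never changes after construction.
        return id.hashCode();
    }
}

class UuidDemo {

    // Returns true if another instance carrying the same UUID is found in the Set.
    static boolean copyFoundById() {
        UuidBook original = new UuidBook();
        Set<UuidBook> tuples = new HashSet<>();
        tuples.add(original);
        UuidBook copy = new UuidBook(original.getId());
        return tuples.contains(copy);
    }

    public static void main(String[] args) {
        System.out.println(copyFoundById()); // prints "true"
    }
}
```

This keeps proper bucket distribution, at the cost of the larger key and index size mentioned in the reply above.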

  3. Excerpt from Effective Java:

    So what should a hashCode method look like? It’s trivial to write one that is legal but not good. This one, for example, is always legal, but it should never be used:

    // The worst possible legal hash function – never use!
    public int hashCode() { return 42; }

    It’s legal because it ensures that equal objects have the same hash code. It’s atrocious because it ensures that every object has the same hash code. Therefore every object hashes to the same bucket, and hash tables degenerate to linked lists. Programs that should run in linear time run instead in quadratic time. For large hash tables, this is the difference between working and not working.

    1. In general, yes. This is what we should aim for. However, this is a particular use case.

      In practice, you don’t need to store tons of entities in a Set (be it a mapped association or a collection in some DTO) because the fetching step is going to have the biggest impact on application performance.

      In-memory data processing is fast. In one millisecond, you can find a single entity among tens of thousands of siblings even if the hash value was a constant value, and you have only one bucket. On the other hand, fetching thousands of entities to store them in a Set is a terrible thing to do. Fetching so much data is going to take one or two orders of magnitude more time than the in-memory processing.

      So, as long as the collections are small enough and all result sets are limited in size, this is not an issue. In the end, you are more likely to bump into an issue because equals and hashCode are not properly implemented across all entity state transitions than to pay the price for a slight performance penalty when you operate with limited sets of data.

  4. That is actually quite clever.
    I guess if you have attributes that are immutable, but aren’t a key candidate (like, a creation date/time that isn’t unique) you could use that in the hashCode() implementation, to get a somewhat better bucket utilization.
    Thanks for the insight!

    1. A creation timestamp that is not set by the database is a good addition to the identifier in the hashCode implementation. The creation timestamp alone is not because two entities might be created at the same microsecond (or even second for old MySQL versions).

      1. Sorry I wasn’t clear, I was referring to the case where you return a constant in the hashCode() function.
        The id shouldn’t (or mustn’t) be used as part of hashCode() in that case.

      2. It was my mistake to write “an addition to the identifier”. Yes, the hashCode could use the creation time if that does not change when we persist the entity. Even if we omit it in equals, the entity can still be located since either the Object reference or the id will cover the equality just fine.
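
The idea from this thread can be sketched as follows. TimestampedBook is a hypothetical plain Java stand-in, and the sketch assumes the creation timestamp is assigned in Java, never modified, and truncated to the precision the database column stores (milliseconds here), so that a reloaded instance yields the same hash:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for an entity with an immutable, Java-assigned creation timestamp.
class TimestampedBook {
    Long id;
    private final Instant createdOn;

    TimestampedBook(Instant createdOn) {
        this.createdOn = Objects.requireNonNull(createdOn);
    }

    Instant getCreatedOn() {
        return createdOn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TimestampedBook)) return false;
        TimestampedBook other = (TimestampedBook) o;
        return id != null && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        // Fixed at construction, so the bucket does not move when the id is assigned.
        return createdOn.hashCode();
    }
}

class TimestampHashDemo {

    // Returns true if both the original and a reloaded copy are found after the id is assigned.
    static boolean foundAfterIdAssignment() {
        // Assumption: the column keeps millisecond precision, so truncate before hashing.
        Instant createdOn = Instant.now().truncatedTo(ChronoUnit.MILLIS);
        TimestampedBook original = new TimestampedBook(createdOn);
        Set<TimestampedBook> tuples = new HashSet<>();
        tuples.add(original);
        original.id = 1L; // simulates the identifier being generated at persist time

        // Simulates the same row loaded in another Persistence Context.
        TimestampedBook copy = new TimestampedBook(original.getCreatedOn());
        copy.id = 1L;
        return tuples.contains(original) && tuples.contains(copy);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssignment()); // prints "true"
    }
}
```

If the database rounds or truncates the timestamp differently than the Java side does, the reloaded copy would hash differently and the lookup would break, so the precision has to match.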

  5. Because of “Objects.equals(getId(), book.getId());” two new (transient) book entities are always equal, but as soon as one (or both) of them are persisted they are not equal anymore, right? I would not call this a “consistent behaviour”. Better let equals() return false if one (or both) entities are transient…

    1. Only if the two Object references point to the same Object on Heap (e.g. if (this == o) return true;), we can say that two transient objects are consistently equal even after persistence.

      Two disparate objects that are transient should probably not be equal. Therefore, I should make that check if and only if the id is not null. Otherwise, I should just return false:

      @Override
      public boolean equals(Object o) {
          if (this == o) return true;
          if (!(o instanceof Book)) return false;
          Book book = (Book) o;
          return getId() != null && Objects.equals(getId(), book.getId());
      }

  6. Great tutorials and examples. I think it will help if enterprises ensure new hires go through all of your tutorials and understand them before touching Hibernate; that will help a great deal. I actually have a dislike for Hibernate, though I work with it weekly and have been using it for years. Your resources have been immensely beneficial. I am beginning to think I can re-evaluate Hibernate for personal projects. It’s more like I am forced to use it for so-called enterprise projects. I think Red Hat should have “Vlad JPA/Hibernate Certification”. I am serious. You have made a difference. Your examples have been really helpful. Can’t think of anything close to your resources in the mid 2000s. Most so-called architects proclaim JPA/Hibernate, yet VERY VERY FEW have a good understanding of its workings. Thanks again for the effort you put into your work. God bless you.
