The best way to implement equals, hashCode, and toString with JPA and Hibernate

(Last Updated On: April 18, 2018)

Bytecode enhancement and toString

Last week, Mark Struberg, who is an Apache Software Foundation member and OpenJPA contributor, made the following statement:

Basically, he says that implementing toString is bad from a performance perspective. Well, that might be the case in OpenJPA, but in Hibernate things are a little bit different. Hibernate does not use bytecode enhancement by default.

Therefore, the toString method can use any basic entity attributes (that are needed to identify a certain entity in logs) as long as the basic attributes are fetched when the entity is loaded from the database.

Nevertheless, Hibernate allows attributes to be lazy loaded, but even then, bytecode enhancement is not necessarily the best approach. Using subentities might be a better alternative, and it does not even require bytecode enhancement.

Equals and hashCode

Unfortunately, Mark continues this discussion with this very misleading statement about equals and hashCode:

This statement is wrong, as this post will demonstrate in great detail.

Equality contract

According to the Java specification (the java.lang.Object#equals contract), a good equals implementation must have the following properties:

  1. reflexive
  2. symmetric
  3. transitive
  4. consistent

The first three are rather intuitive, but ensuring consistency in the context of JPA and Hibernate entities is usually the biggest challenge for developers.

As already explained, equals and hashCode must behave consistently across all entity state transitions.

Identifier types

From an equality contract perspective, the identifiers can be split into two categories:

  • Assigned identifiers
  • Database-generated identifiers

Assigned identifiers

Assigned identifiers are allocated prior to flushing the Persistence Context, and we can further split them into two subcategories:

  • Natural identifiers
  • Database-agnostic UUIDs

Natural identifiers are assigned by a third-party authority, like a book ISBN.

Database-agnostic UUIDs are generated outside of the database, for example by calling the java.util.UUID#randomUUID method.

Both natural identifiers and database-agnostic UUIDs have the luxury of being known when the entity gets persisted. For this reason, it is safe to use them in the equals and hashCode implementation:

@Entity(name = "Book")
@Table(name = "book")
public class Book 
    implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}

For more details about the @NaturalId annotation, check out this article.
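The same pattern applies to the second subcategory, the database-agnostic UUID. What follows is a minimal sketch and not code from the original example: the Tag class, its fields, and the AssignedUuidDemo helper are hypothetical, and the JPA annotations are left out so that the snippet runs as plain Java. Because the identifier is assigned when the object is created, equals and hashCode see the same value before and after the entity is persisted.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

// Hypothetical entity. In a real mapping the class would carry @Entity and
// the id field @Id; both are omitted so the sketch runs as plain Java.
class Tag {

    // Assigned by the application, so it is known before the first flush.
    private UUID id = UUID.randomUUID();

    private String name;

    UUID getId() {
        return id;
    }

    void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Tag)) return false;
        Tag other = (Tag) o;
        return Objects.equals(getId(), other.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }
}

class AssignedUuidDemo {

    // Returns true if the entity is still found in a HashSet after a
    // non-identifier attribute has changed.
    static boolean foundAfterStateChange() {
        Tag tag = new Tag();
        Set<Tag> tags = new HashSet<>();
        tags.add(tag);
        tag.setName("hibernate");
        return tags.contains(tag);
    }
}
```

Only the identifier takes part in the comparison, so changing the name does not move the entity to a different hash bucket.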

Database-generated identifiers

The database-generated identifiers are a different story. Because the identifier is assigned by the database at flush time, the consistency guarantee would break if we implemented equals and hashCode based on the identifier, just like for assigned identifiers.

This issue was detailed in my article, How to implement equals and hashCode using the entity identifier (primary key).
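To make the problem concrete, here is a minimal plain-Java sketch of what goes wrong. The IdHashedBook class and the IdHashDemo helper are hypothetical, and calling setId by hand only stands in for the database assigning the key at flush time; no persistence provider is involved.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity that hashes on a database-generated identifier.
class IdHashedBook {

    // Stays null until the (simulated) flush assigns it.
    private Long id;

    Long getId() {
        return id;
    }

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdHashedBook)) return false;
        return Objects.equals(getId(), ((IdHashedBook) o).getId());
    }

    @Override
    public int hashCode() {
        // The value changes as soon as the id goes from null to non-null.
        return Objects.hash(getId());
    }
}

class IdHashDemo {

    // Returns whether the set still finds the entity after the id is assigned.
    static boolean foundAfterFlush() {
        IdHashedBook book = new IdHashedBook();
        Set<IdHashedBook> books = new HashSet<>();
        books.add(book);   // stored under the hash computed from a null id
        book.setId(1L);    // stands in for the key assigned at flush time
        return books.contains(book);
    }
}
```

The lookup fails because the set recorded the hash computed while the id was still null, and the entity now reports a different one.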

Therefore, whenever you have a database-generated identifier, a synthetic key (be it a numeric identifier or a database UUID type), you have to use the following equals and hashCode implementation:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return id != null && id.equals(book.getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }

    //Getters and setters omitted for brevity
}

So, the hashCode yields the same value across all entity state transitions, and the equals method is going to use the identifier check only for non-transient entities.
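The behavior described above can be checked with a small plain-Java sketch. The GeneratedIdBook class and the ConstantHashDemo helper are hypothetical names, the JPA annotations are omitted, and setId only simulates the identifier being assigned at flush time.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an entity with a database-generated identifier.
class GeneratedIdBook {

    private Long id;

    Long getId() {
        return id;
    }

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GeneratedIdBook)) return false;
        GeneratedIdBook other = (GeneratedIdBook) o;
        // Transient entities (null id) are only equal to themselves.
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        // Constant, so the bucket never changes across state transitions.
        return 31;
    }
}

class ConstantHashDemo {

    // The entity is added while transient and looked up after the id is set.
    static boolean foundAfterFlush() {
        GeneratedIdBook book = new GeneratedIdBook();
        Set<GeneratedIdBook> books = new HashSet<>();
        books.add(book);
        book.setId(1L);
        return books.contains(book);
    }

    // Two different transient instances can coexist in the same Set.
    static int sizeWithTwoTransientInstances() {
        Set<GeneratedIdBook> books = new HashSet<>();
        books.add(new GeneratedIdBook());
        books.add(new GeneratedIdBook());
        return books.size();
    }

    // A second instance carrying the same id is treated as the same entity.
    static boolean copyWithSameIdIsFound() {
        GeneratedIdBook managed = new GeneratedIdBook();
        managed.setId(1L);
        Set<GeneratedIdBook> books = new HashSet<>();
        books.add(managed);
        GeneratedIdBook detached = new GeneratedIdBook();
        detached.setId(1L);
        return books.contains(detached);
    }
}
```

The second helper also shows that transient entities are not collapsed into a single Set element: with a null id, equals falls back to reference identity.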

That’s it!

The only time when you’ll see a performance bottleneck due to a single hash bucket is if you have a large collection of tens of thousands of entries.

But then, it implies that you fetched that large collection from the database. The performance penalty of fetching such a collection from the database is multiple orders of magnitude higher than the single bucket overhead.

That’s why you never map large collections with Hibernate. You use queries for those instead. For small collections, the single-bucket overhead is negligible.

Also, most of the time you don’t even need to use a Set or a Map. For bidirectional associations, List(s) perform better anyway.

More misconceptions

Mark has written a blog post to justify his beliefs.

In his article, Mark says that the database-generated identifier equality implementation does not work for merge or getReference():

Even Vlad’s advanced version does have holes. E.g. if you use em.getReference() or em.merge().

My article, How to implement equals and hashCode using the JPA entity identifier (primary key), demonstrates that this equals implementation works for detached objects. That was the whole point of coming up with such an implementation. We want it to work across all entity state transitions.

As for getReference(), there’s a check for that as well. It’s all on GitHub.

There’s one argument which I agree with, and that’s about making sure that the equality check is using only entity attributes that are immutable. That’s why the entity identifier sequence number is very appealing. And with the equality implementation method that I offer you, you can use it safely.

Unfortunately, Mark continues with more misconceptions, like:

Why do you need equals() and hashCode() at all?

This is a good question. And my answer is: “you don’t !”

Well, you DO!

If you don’t implement equals and hashCode, then the merge test will fail, therefore breaking the consistency guarantee. It’s all explained in my How to implement equals and hashCode using the entity identifier (primary key) article, by the way.
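The reason is that merging a detached entity hands back a different, managed instance, so the default reference comparison inherited from Object no longer matches. Here is a rough plain-Java sketch of that effect. The NoEqualsBook class and the simulateMerge helper are hypothetical, and simulateMerge merely copies the identifier to imitate getting a new instance back; it is not the real EntityManager#merge.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity that relies on the default Object equals and hashCode.
class NoEqualsBook {

    private final Long id;

    NoEqualsBook(Long id) {
        this.id = id;
    }

    Long getId() {
        return id;
    }
}

class MergeDemo {

    // Stand-in for a merge call: returns a new instance with the same id.
    static NoEqualsBook simulateMerge(NoEqualsBook detached) {
        return new NoEqualsBook(detached.getId());
    }

    // Returns whether the instance returned by the simulated merge is
    // recognized as the entity that was stored in the set earlier.
    static boolean mergedInstanceFoundInSet() {
        NoEqualsBook detached = new NoEqualsBook(1L);
        Set<NoEqualsBook> books = new HashSet<>();
        books.add(detached);
        NoEqualsBook managed = simulateMerge(detached);
        return books.contains(managed);
    }
}
```

Both objects represent the same database row, yet the set treats them as unrelated because only the references are compared.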

And another misconception, from a Hibernate point of view:

Why you shouldn’t store managed and detached entities in the same Collection

Not only should you NOT avoid mixing detached and managed entities, but doing so is actually a great feature that allows you to hold on to detached objects and therefore prevent lost updates in long conversations.

And yet another misconception, from a Hibernate implementation perspective:

So, having a cache is really a great idea, but *please* do not store JPA entities in the cache. At least not as long as they are managed.

Hibernate strives to deliver strong consistency. That’s why the READ_WRITE and TRANSACTIONAL cache concurrency strategies allow you to not worry about such inconsistencies. It’s the second-level cache provider that guarantees this isolation level. Just like a relational database system.

Only NONSTRICT_READ_WRITE offers a weaker isolation level, but the non-strict naming choice is self-descriptive after all.

Conclusion

The best advice I can give you is that you should always question every statement that you read on the Internet. You should always check every piece of advice against your current JPA provider implementation because details make a very big difference.

10 thoughts on “The best way to implement equals, hashCode, and toString with JPA and Hibernate”

  1. Hi!
    Thanks for this post.

    I have a couple of questions to you.

    In this post https://vladmihalcea.com/hibernate-facts-equals-and-hashcode/
    you said that:

    “Using a combination of fields that are unique among Entities is probably the best choice for implementing equals and hashCode methods.”

    And in this post I can see that you suggest to compare ID inside equals() for database-generated identifiers.

    It seems to me that “compare ID” approach deprives you of the opportunity to use transient entities inside Set ( because of duplicates inside Set). Correct me if I am wrong.

    Could you please clarify what approach is more preferable nowadays for database-generated identifiers?

    1. This article already answers your questions. In my book GitHub repository, you can find the tests that prove the answers to the question about Set too.

  2. Hi!
    Just wanted to say that getId() != null && Objects.equals(getId(), book.getId()); gets evaluated into getId() != null && (getId() == book.getId()) || (getId() != null && getId().equals(book.getId()));. So, you don’t need to check for null.

  3. Hi Vlad,

    Great post! I’ve always used the equals right, but you do have a different implementation for hashcode, and I sure haven’t put much thought about the transition from new to managed entities.

    I have a question though, with all entities having a constant hashcode, I can imagine that hash dependent collections such as hashmaps and hashsets start to perform pretty poorly.

    I am not entirely sure whether Hibernate internally uses the hashcode at all, or when hashCode is typically invoked for the first time, but would it, for example, work to check on the first invocation of hashCode whether the id value is already present, and if it is, always use the hash of the id as hash? (And only fall back to using the constant value if the id was not available on the first hashcode access.) I can imagine this has quite some performance improvements for “typical” (i.e. managed) use.

    Another approach I’ve seen is using UUID natural id’s, but it would seem a bit overkill to start adding a UUID column to all my entities just to get the hashcode right (approach from: https://youtu.be/EZwpOLCfuq4?t=37m23s )

    Thanks again for your very useful posts!

    1. That will not work because you need to have the hashCode consistent. The UUID is a solution but, as you said, it adds an unnecessary yet bulky column.

      1. Yes, plus that whilst the UUID would require the entity to be fully loaded, whereas access to the primary key is optimised AFAIK (in Hibernate, although this is against the JPA spec).

        Nevertheless, why exactly would the hashcode be inconsistent? Managed entities will have the same hashcode (based on the identifier), new entities will have the same hashcode (constant value), and persisted new entities will have the same hashcode (constant value, as hashcode was accessed prior to persisting). Within the same persistence context, always the same entity should be reused. So the only thing I can imagine where this breaks is when a issue gets removed from the persistence context, correct?

        Given all the complications with equals / hashcode (either having to add an additional column and enforce loading of entities, or losing performance on all hash based data structures) I’m starting to move towards the no-equals/hashcode-for-entities camp after all…

      2. Because if you add it to a HashSet with a constant hash, then persist it, and the hash switches to id, you won’t be able to locate your entity anymore.

        No-equals is not an option because it breaks consistency too. Comparing just Object references does not work for the merge operation.

      3. Just to be sure that you understand my suggested approach correctly. I am thinking of something along the lines of:

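        @Id Long id;
        @Transient Integer hashcode; // cached on the first hashCode() call

        @Override
        public int hashCode() {
            if (hashcode == null) {
                // use the id hash only if the id is already known
                hashcode = (id == null) ? 31 : id.hashCode();
            }
            return hashcode;
        }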

        That hopefully uses a “stronger” hash iff the id is available in time. This, as far as I can see, would not break persisting. Yet it may have some unexpected issues with detached entities perhaps?

      4. You can create a new test and send me a Pull Request on my GitHub repository. Just extend the AbstractEqualityTest.
