The best way to implement equals, hashCode, and toString with JPA and Hibernate

(Last Updated On: January 4, 2018)

Bytecode enhancement and toString

Last week, Mark Struberg, who is an Apache Software Foundation member and OpenJPA contributor, made the following statement:

Basically, he says that implementing toString is bad from a performance perspective. Well, that might be the case in OpenJPA, but in Hibernate things are a little bit different. Hibernate does not use bytecode enhancement by default.

Therefore, the toString method can use any basic entity attributes (that are needed to identify a certain entity in logs) as long as the basic attributes are fetched when the entity is loaded from the database.

Nevertheless, Hibernate allows attributes to be lazily loaded, but even then, bytecode enhancement is not necessarily the best approach. Using subentities might be a better alternative, and it does not even require bytecode enhancement.

Equals and hashCode

Unfortunately, Mark continues this discussion with this very misleading statement about equals and hashCode:

This statement is wrong, as this post will demonstrate in great detail.

Equality contract

According to the Java specification, a good equals implementation must have the following properties:

  1. reflexive
  2. symmetric
  3. transitive
  4. consistent

The first three are rather intuitive, but ensuring consistency in the context of JPA and Hibernate entities is usually the biggest challenge for developers.

As already explained, equals and hashCode must behave consistently across all entity state transitions.

Identifier types

From an equality contract perspective, identifiers can be split into two categories:

  • Assigned identifiers
  • Database-generated identifiers

Assigned identifiers

Assigned identifiers are allocated prior to flushing the Persistence Context, and we can further split them into two subcategories:

  • Natural identifiers
  • Database-agnostic UUIDs

Natural identifiers are assigned by a third-party authority, like a book ISBN.

Database-agnostic UUID numbers are generated outside of the database, for example by calling the java.util.UUID#randomUUID method.

Both natural identifiers and database-agnostic UUIDs have the luxury of being known when the entity gets persisted. For this reason, it is safe to use them in the equals and hashCode implementation:

@Entity(name = "Book")
@Table(name = "book")
public class Book 
    implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}

For more details about the @NaturalId annotation, check out this article.
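The database-agnostic UUID case looks similar: the identifier is assigned before the entity is ever persisted, so equals and hashCode can safely rely on it. Here is a minimal sketch; the Event class is hypothetical, and the @Entity/@Id annotations are omitted so the snippet compiles without a JPA provider on the classpath (in a real mapping, the id field would simply carry @Id):

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical entity using a database-agnostic UUID identifier.
// JPA mapping annotations are left out so the sketch is self-contained.
class Event {

    // Assigned at construction time, before the entity is persisted,
    // so it is safe to base equals and hashCode on it.
    private final UUID id = UUID.randomUUID();

    private String title;

    public UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Event)) return false;
        Event event = (Event) o;
        return Objects.equals(getId(), event.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }
}
```

Because the UUID never changes across entity state transitions, both equals and hashCode stay consistent from the moment the object is instantiated.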

Database-generated identifiers

The database-generated identifiers are a different story. Because the identifier is assigned by the database at flush time, the consistency guarantee would break if we implemented equals and hashCode based on the identifier, just as we did for assigned identifiers.

This issue was detailed in my article, How to implement equals and hashCode using the entity identifier (primary key).

Therefore, whenever you have a database-generated synthetic identifier (be it a numeric value or a database UUID type), you have to use the following equals and hashCode implementation:

@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return getId() != null && Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }

    //Getters and setters omitted for brevity
}

So, hashCode yields the same value across all entity state transitions, and the equals method uses the identifier check only for non-transient entities.
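The consistency guarantee can be verified with a plain-Java simulation of the transient-to-managed transition. The Book class below is a stand-in for the entity above, with the JPA annotations stripped so the sketch runs without a persistence provider:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Stand-in for the Book entity above, without the JPA annotations,
// so that the sketch runs on its own.
class Book {

    private Long id;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return getId() != null && Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }
}

class ConsistencyDemo {
    public static void main(String[] args) {
        Set<Book> books = new HashSet<>();

        Book book = new Book();
        books.add(book);  // transient: the id is still null

        book.setId(1L);   // simulate the identifier assigned at flush time

        // The constant hashCode keeps the entity in the same bucket,
        // so the lookup still succeeds after the state transition.
        System.out.println(books.contains(book)); // prints "true"
    }
}
```

Had hashCode been based on the id, the entry would have silently moved to a different bucket once setId was called, and contains would have returned false.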

That’s it!

The only time when you’ll see a performance bottleneck due to a single hash bucket is if you have a large collection of tens of thousands of entries.

But then, it implies that you fetched that large collection from the database. The performance penalty of fetching such a collection from the database is multiple orders of magnitude higher than the single bucket overhead.

That’s why you never map large collections with Hibernate; you use queries for those instead and reserve collection mappings for small collections.

Also, most of the time you don’t even need to use a Set or a Map. For bidirectional associations, List(s) perform better anyway.
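As a sketch of what the List-based bidirectional mapping looks like, here is a hypothetical Post/Comment pair (not code from this article); the @OneToMany/@ManyToOne annotations are omitted so the snippet compiles standalone, and addComment/removeComment are the usual helpers that keep both sides of the association in sync:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bidirectional one-to-many association mapped with a
// List; JPA annotations are omitted so the sketch runs without a
// persistence provider.
class Post {

    private final List<Comment> comments = new ArrayList<>();

    public List<Comment> getComments() { return comments; }

    // Helpers that keep both sides of the association in sync.
    public void addComment(Comment comment) {
        comments.add(comment);
        comment.setPost(this);
    }

    public void removeComment(Comment comment) {
        comments.remove(comment);
        comment.setPost(null);
    }
}

class Comment {

    private Post post;

    public Post getPost() { return post; }
    public void setPost(Post post) { this.post = post; }
}
```

Note that a List, unlike a HashSet, never consults hashCode at all, which is one reason it sidesteps the whole single-bucket discussion.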

More misconceptions

Mark has written a blog post to justify his beliefs.

In his article, Mark says that the database-generated identifier equality implementation does not work for merge or getReference().

Even Vlad’s advanced version does have holes. E.g. if you use em.getReference() or em.merge().

The How to implement equals and hashCode using the JPA entity identifier (primary key) article demonstrates that this equals implementation works for detached objects. That was the whole point of coming up with such an implementation: we want it to work across all entity state transitions.

As for getReference(), there’s a check for that as well. It’s all on GitHub.

There’s one argument I agree with, and that’s about making sure the equality check uses only entity attributes that are immutable. That’s why the entity identifier sequence number is so appealing, and with the equality implementation I offer above, you can use it safely.

Unfortunately, Mark continues with more misconceptions, like:

Why do you need equals() and hashCode() at all?

This is a good question. And my answer is: “you don’t!”

Well, you DO!

If you don’t implement equals and hashCode, then the merge test will fail, thereby breaking the consistency guarantee. It’s all explained in my How to implement equals and hashCode using the entity identifier (primary key) article, by the way.
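The merge scenario is easy to simulate outside of JPA: em.merge returns a different managed instance carrying the same identifier, so a containment check breaks unless equality is id-based. A sketch follows, in which the merge copy is done by hand and the Tag class is a hypothetical entity stand-in:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity stand-in, using the id-based equals and
// constant hashCode recommended above.
class Tag {

    private Long id;

    Tag(Long id) { this.id = id; }

    public Long getId() { return id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Tag)) return false;
        return getId() != null && Objects.equals(getId(), ((Tag) o).getId());
    }

    @Override
    public int hashCode() {
        return 31;
    }
}

class MergeDemo {
    public static void main(String[] args) {
        Tag detached = new Tag(1L);

        Set<Tag> tags = new HashSet<>();
        tags.add(detached);

        // em.merge(detached) returns a *different* managed instance
        // with the same identifier; we simulate that copy by hand.
        Tag merged = new Tag(detached.getId());

        // With the default reference-based equals, this check would
        // fail; the id-based equals keeps it consistent.
        System.out.println(tags.contains(merged)); // prints "true"
    }
}
```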

And another misconception, from a Hibernate point of view:

Why you shouldn’t store managed and detached entities in the same Collection

Not only should you not avoid mixing detached and managed entities, but this is actually a great feature that allows you to hold on to detached objects, and therefore prevent lost updates in long conversations.

And yet another misconception, from a Hibernate implementation perspective:

So, having a cache is really a great idea, but *please* do not store JPA entities in the cache. At least not as long as they are managed.

Hibernate strives to deliver strong consistency. That’s why the READ_WRITE and TRANSACTIONAL cache concurrency strategies allow you to not worry about such inconsistencies. It’s the second-level cache provider that guarantees this isolation level. Just like a relational database system.

Only NONSTRICT_READ_WRITE offers a weaker isolation level, but then the non-strict naming choice is self-descriptive after all.

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

The best advice I can give you is to always question every statement you read on the Internet. You should check every piece of advice against your current JPA provider implementation, because details make a very big difference.


29 thoughts on “The best way to implement equals, hashCode, and toString with JPA and Hibernate”

  1. I just had an idea about the hashCode. I don’t really like it being a hardcoded value. What can we do about it? What if we call Objects.hash(...) on all the other fields that are not @Id?

    1. Then, it will break the consistency hash test whenever the underlying properties change, which is natural. The equals and hashCode should be based on attributes that are immutable and render a unique key (either simple or composite).

      Grab the GitHub repository and test it yourself if you like.

      Just because the hashCode is hardcoded, it does not mean it is wrong. Entities are not simple POJOs that you store in a HashSet by the millions.

  2. Nice article!
    As I understand, one should only implement equals & hashcode if they use hash structures like HashSet, HashMap, etc, or if they compare instances of this particular entity somewhere in the code. Otherwise one should not implement these methods so the entity definition looks cleaner without the “unused” methods. Do you agree with that?

    1. Thanks.

      As I understand, one should only implement equals & hashcode if they use hash structures like HashSet, HashMap, etc, or if they compare instances of this particular entity somewhere in the code. Otherwise one should not implement these methods so the entity definition looks cleaner without the “unused” methods. Do you agree with that?

      I think you have missed the:

      Well, you DO!

      part in my article, which draws the opposite conclusion from the one you have just drawn.

      1. I saw that but didn’t give much of a thought, though.
        Let’s say, I have a User entity, that has generated id and a name. I select the user with the name John and rename him to Mark and call Spring Data save. Default hashcode now generates a different value, thus breaking the spec (if equals then hashcode must also be the same). But I don’t think that matters since JPA only looks for @Id. Specifically, Spring Data sends merge just because the @Id is not null and persist otherwise. And the database does not care at all. But if the user has a unique username (business key) then it would be significant in case we add users to a set, for example.

    1. No, it does not. That’s why you need to use explicit optimistic locking requests. And even so, you won’t prevent phantom read skew if the DB does not support index range locks or predicate locks.

      1. Yeah, by using explicit optimistic locking and making good use of OPTIMISTIC_FORCE_INCREMENT, we have repeatable read, right?

        We can still have phantom reads; REPEATABLE READ allows them.

  3. Hey Vlad,
    I’ve come up with another doubt:

    Suppose we set the Read Committed isolation level on the db
    and use JPA optimistic locking to raise this level to repeatable read and to
    enable long conversations.

    What about triggers and stored procedures that can run on the db?
    For example, should we check and update the version field manually?

    1. JPA optimistic locking does not raise the isolation level to Repeatable Read. It simply allows you to overcome lost updates across multiple DB transactions.

      As for stored procedures and triggers, that’s a totally different discussion. Basically, it depends on your data integrity requirements and what the stored procedures do.
