The best way to implement equals, hashCode, and toString with JPA and Hibernate

Bytecode enhancement and toString

Last week, Mark Struberg, who is an Apache Software Foundation member and OpenJPA contributor, made the following statement:

Basically, he says that implementing toString is bad from a performance perspective. Well, that might be the case in OpenJPA, but in Hibernate things are a little bit different. Hibernate does not use bytecode enhancement by default.

Therefore, the toString method can use any basic entity attributes (that are needed to identify a certain entity in logs) as long as the basic attributes are fetched when the entity is loaded from the database.
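
As a minimal sketch of this idea, the hypothetical LoggablePost class below builds toString from the identifier and one basic attribute only. It is a plain Java stand-in for an entity: the JPA annotations are left out so that the example compiles on its own, and the comments field merely plays the role of an association that should not be touched while logging.

```java
import java.util.ArrayList;
import java.util.List;

// Plain Java stand-in for a JPA entity (assumption: annotations omitted
// so the sketch compiles without any persistence dependency).
class LoggablePost {

    private final Long id;
    private final String title;

    // Plays the role of a lazily fetched association; toString never reads it
    private final List<String> comments = new ArrayList<>();

    LoggablePost(Long id, String title) {
        this.id = id;
        this.title = title;
    }

    List<String> getComments() {
        return comments;
    }

    @Override
    public String toString() {
        // Only the identifier and a basic attribute are used
        return "LoggablePost{id=" + id + ", title='" + title + "'}";
    }
}
```

Since toString reads nothing but fields that are already in memory, logging such an object cannot trigger any additional loading.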

Hibernate does allow attributes to be lazily loaded, but even then, bytecode enhancement is not necessarily the best approach. Using subentities might be a better alternative, and it does not require bytecode enhancement at all.

Equals and hashCode

Unfortunately, Mark continues this discussion with this very misleading statement about equals and hashCode:

This statement is wrong, as this post will demonstrate in great detail.

Equality contract

According to the Java specification (the java.lang.Object#equals contract), a good equals implementation must have the following properties:

  1. reflexive
  2. symmetric
  3. transitive
  4. consistent

The first three are rather intuitive, but ensuring consistency in the context of JPA and Hibernate entities is usually the biggest challenge for developers.

As already explained, equals and hashCode must behave consistently across all entity state transitions.

Identifier types

From an equality contract perspective, the identifiers can be split into two categories:

  • Assigned identifiers
  • Database-generated identifiers

Assigned identifiers

Assigned identifiers are allocated prior to flushing the Persistence Context, and we can further split them into two subcategories:

  • Natural identifiers
  • Database-agnostic UUIDs

Natural identifiers are assigned by a third-party authority, like a book ISBN.

Database-agnostic UUIDs are generated outside of the database, for example, by calling the java.util.UUID#randomUUID method.

Both natural identifiers and database-agnostic UUIDs have the luxury of being known when the entity gets persisted. For this reason, it is safe to use them in the equals and hashCode implementation:

@Entity(name = "Book")
@Table(name = "book")
public class Book 
    implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}

For more details about the @NaturalId annotation, check out this article.
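
The behavior of this natural identifier equality can be checked outside of any Persistence Context. The sketch below is an assumption-laden stand-in, not the mapping above: IsbnBook is a hypothetical plain Java class without JPA annotations, the ISBN string is an arbitrary sample value, and calling setId only simulates the identifier being assigned at flush time.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain Java stand-in for the Book entity (assumption: no JPA annotations)
class IsbnBook {

    private Long id;
    private final String isbn;

    IsbnBook(String isbn) {
        this.isbn = isbn;
    }

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
    String getIsbn() { return isbn; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IsbnBook)) return false;
        IsbnBook other = (IsbnBook) o;
        return Objects.equals(getIsbn(), other.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }
}

class NaturalIdEqualityDemo {

    // True if the book can still be found after its id has been assigned
    static boolean survivesIdAssignment() {
        String isbn = "123-4567890123"; // arbitrary sample value
        IsbnBook book = new IsbnBook(isbn);

        Set<IsbnBook> books = new HashSet<>();
        books.add(book);

        book.setId(1L); // simulates the identifier assignment at flush time

        // Lookup by the same instance and by another instance with the same ISBN
        return books.contains(book) && books.contains(new IsbnBook(isbn));
    }

    public static void main(String[] args) {
        System.out.println(survivesIdAssignment());
    }
}
```

Because neither equals nor hashCode reads the id field, assigning the identifier later has no effect on hash-based lookups.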

Database-generated identifiers

Database-generated identifiers are a different story. Because the identifier is assigned by the database at flush time, the consistency guarantee would break if we implemented equals and hashCode based on the identifier, as we did for assigned identifiers.

This issue was detailed in my article, How to implement equals and hashCode using the entity identifier (primary key).
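
One way the problem can show up is sketched below, under stated assumptions: NaiveIdPost is a hypothetical plain Java class (no JPA annotations) whose equals and hashCode rely directly on the id field, and calling setId stands in for the database assigning the identifier at flush time.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class that naively bases equality on a generated id
class NaiveIdPost {

    private Long id;

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NaiveIdPost)) return false;
        return Objects.equals(id, ((NaiveIdPost) o).getId());
    }

    @Override
    public int hashCode() {
        // Changes as soon as the id switches from null to a real value
        return Objects.hashCode(id);
    }
}

class NaiveIdEqualityDemo {

    // Reports whether the set still finds the post after the id is assigned
    static boolean foundAfterIdAssignment() {
        NaiveIdPost post = new NaiveIdPost();

        Set<NaiveIdPost> posts = new HashSet<>();
        posts.add(post);   // stored under the hash code of a null id

        post.setId(1L);    // simulates the identifier assignment at flush time

        return posts.contains(post);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssignment());
    }
}
```

The lookup fails even though the very same instance is still in the Set, because the hash code computed after the id assignment no longer matches the one under which the entry was stored.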

Therefore, whenever you have a database-generated identifier, a synthetic key (be it a numeric identifier or a database UUID type), you have to use the following equals and hashCode implementation:

@Entity(name = "Post")
@Table(name = "post")
public class Post implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    public Post() {}

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (!(o instanceof Post))
            return false;

        Post other = (Post) o;

        return id != null && 
               id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
 
    //Getters and setters omitted for brevity
}

So, the hashCode yields the same value across all entity state transitions, and the equals method is going to use the identifier check only for non-transient entities.
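
The following sketch exercises the same equals and hashCode logic without a database. StablePost is a hypothetical plain Java stand-in for the Post entity above (JPA annotations omitted so it compiles on its own), and setId again only simulates the identifier assignment at flush time.

```java
import java.util.HashSet;
import java.util.Set;

// Plain Java stand-in for the Post entity (assumption: no JPA annotations)
class StablePost {

    private Long id;

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StablePost)) return false;
        StablePost other = (StablePost) o;
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        // Same value for every instance, before and after id assignment
        return getClass().hashCode();
    }
}

class StableEqualityDemo {

    // The instance added while transient is still found once it has an id
    static boolean foundAfterIdAssignment() {
        StablePost post = new StablePost();
        Set<StablePost> posts = new HashSet<>();
        posts.add(post);
        post.setId(1L); // simulates the identifier assignment at flush time
        return posts.contains(post);
    }

    // A second instance with the same id (think: a detached copy) matches
    static boolean detachedCopyMatches() {
        StablePost managed = new StablePost();
        managed.setId(1L);
        Set<StablePost> posts = new HashSet<>();
        posts.add(managed);

        StablePost detachedCopy = new StablePost();
        detachedCopy.setId(1L);
        return posts.contains(detachedCopy);
    }

    // Two distinct instances without an id are never equal
    static boolean transientInstancesDiffer() {
        return !new StablePost().equals(new StablePost());
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssignment());
        System.out.println(detachedCopyMatches());
        System.out.println(transientInstancesDiffer());
    }
}
```

All three checks hold in this sketch: the hash code never changes, and the identifier comparison only takes part in equals once an id is present.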

That’s it!

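Because hashCode returns the same value for every Post instance, all entries of a hash-based collection end up in a single bucket. The only time you’ll see a performance bottleneck because of that is when you have a large collection of tens of thousands of entries.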

But then, it implies that you fetched that large collection from the database. The performance penalty of fetching such a collection from the database is multiple orders of magnitude higher than the single bucket overhead.

That’s why you never map large collections with Hibernate. You use queries for those instead. And for small collections, the single bucket overhead is negligible.

Also, most of the time you don’t even need to use a Set or a Map. For bidirectional associations, a List performs better anyway.

More misconceptions

Mark has written a blog post to justify his beliefs.

In his article, Mark says that the database-generated identifier equality implementation does not work for merge or getReference().

Even Vlad’s advanced version does have holes. E.g. if you use em.getReference() or em.merge().

The article How to implement equals and hashCode using the JPA entity identifier (primary key) demonstrates that this equals implementation works for detached objects. That was the whole point of coming up with such an implementation: we want it to work across all entity state transitions.

As for getReference(), there’s a check for that as well. It’s all on GitHub.

There’s one argument which I agree with, and that’s about making sure that the equality check is using only entity attributes that are immutable. That’s why the entity identifier sequence number is very appealing. And with the equality implementation method that I offer you, you can use it safely.

Unfortunately, Mark continues with more misconceptions, like:

Why do you need equals() and hashCode() at all?

This is a good question. And my answer is: “you don’t !”

Well, you DO!

If you don’t implement equals and hashCode then the merge test will fail, therefore breaking the consistency guarantee. It’s all explained in my How to implement equals and hashCode using the entity identifier (primary key) article, by the way.

And another misconception, from a Hibernate point of view:

Why you shouldn’t store managed and detached entities in the same Collection

Not only should you NOT avoid mixing detached and managed entities, but this is actually a great feature that allows you to hold on to detached objects, and therefore prevent lost updates in long conversations.

And yet another misconception, from a Hibernate implementation perspective:

So, having a cache is really a great idea, but *please* do not store JPA entities in the cache. At least not as long as they are managed.

Hibernate strives to deliver strong consistency. That’s why the READ_WRITE and TRANSACTIONAL cache concurrency strategies allow you to not worry about such inconsistencies. It’s the second-level cache provider that guarantees this isolation level. Just like a relational database system.

Only NONSTRICT_READ_WRITE offers a weaker isolation level, but the non-strict naming choice is self-descriptive after all.

Conclusion

The best advice I can give you is to always question every statement that you read on the Internet. You should always check every piece of advice against your current JPA provider implementation because details make a very big difference.
