The best way to implement equals, hashCode, and toString with JPA and Hibernate
Bytecode enhancement and toString
Last week, Mark Struberg, who is an Apache Software Foundation member and OpenJPA contributor, made the following statement:
People, PLEASE do _not_ write toString() methods in #JPA entities! This will implicitly trigger lazy loading on all fields...
— Mark Struberg (@struberg) October 13, 2016
Basically, he says that implementing toString is bad from a performance perspective. Well, that might be the case in OpenJPA, but in Hibernate things are a little bit different. Hibernate does not use bytecode enhancement by default.
Therefore, the toString method can use any basic entity attributes (that are needed to identify a certain entity in logs) as long as the basic attributes are fetched when the entity is loaded from the database.
Nevertheless, Hibernate allows attributes to be lazily loaded, but even then, the bytecode enhancement is not necessarily the best approach. Using subentities might be a better alternative, and it does not even require bytecode enhancement.
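As a minimal sketch of such a toString, assuming a hypothetical LoggablePost class (JPA annotations are left out so the snippet runs on the JDK alone, and the comments list merely stands in for a lazy association), only the eagerly available basic attributes are printed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an entity; not taken from the original article.
class LoggablePost {

    private Long id;
    private String title;

    // Stands in for a lazily loaded association; deliberately NOT used in toString().
    private final List<String> comments = new ArrayList<>();

    LoggablePost(Long id, String title) {
        this.id = id;
        this.title = title;
    }

    List<String> getComments() {
        return comments;
    }

    @Override
    public String toString() {
        // Only basic attributes that are fetched together with the entity itself.
        return "LoggablePost{id=" + id + ", title='" + title + "'}";
    }
}
```

Because the association never appears in the output, logging the object cannot trigger any additional loading, no matter how the association is mapped.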
Equals and hashCode
Unfortunately, Mark continues this discussion with this very misleading statement about equals and hashCode:
oh, and the same applies for hashCode() and equals() in #JPA entities: also almost always unnecessary and creating damage.
— Mark Struberg (@struberg) October 13, 2016
This statement is wrong, as this post will demonstrate in great detail.
Equality contract
According to the Java specification of Object#equals, a good equals implementation must have the following properties:
- reflexive
- symmetric
- transitive
- consistent
The first three are rather intuitive, but ensuring consistency in the context of JPA and Hibernate entities is usually the biggest challenge for developers.
For entities, this means that equals and hashCode must behave consistently across all entity state transitions (transient, managed, detached, and removed).
Identifier types
From an equality contract perspective, the identifiers can be split into two categories:
- Assigned identifiers
- Database-generated identifiers
Assigned identifiers
Assigned identifiers are allocated prior to flushing the Persistence Context, and we can further split them into two subcategories:
- Natural identifiers
- Database-agnostic UUIDs
Natural identifiers are assigned by a third-party authority, like a book ISBN.
Database-agnostic UUID numbers are generated outside of the database, like calling the java.util.UUID#randomUUID method.
Both natural identifiers and database-agnostic UUIDs have the luxury of being known when the entity gets persisted. For this reason, it is safe to use them in the equals and hashCode implementation:
@Entity(name = "Book")
@Table(name = "book")
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}
For more details about the @NaturalId annotation, check out this article.
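The same reasoning applies to a database-agnostic UUID. The following is only a sketch under stated assumptions: UuidTag is a hypothetical class, JPA annotations are omitted so that it runs on the JDK alone, and the second constructor exists only to imitate a detached copy carrying the same identifier.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical entity; not taken from the original article.
class UuidTag {

    // Assigned by the application when the object is created,
    // so it never changes from null to a value at flush time.
    private UUID id = UUID.randomUUID();

    private String name;

    UuidTag(String name) {
        this.name = name;
    }

    // Imitates a detached copy or a reloaded row with the same identifier.
    UuidTag(UUID id, String name) {
        this.id = id;
        this.name = name;
    }

    UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UuidTag)) return false;
        UuidTag other = (UuidTag) o;
        return Objects.equals(getId(), other.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }
}
```

Since the identifier is known from the very first moment, the hash code computed before persisting is the same one computed afterward.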
Database-generated identifiers
The database-generated identifiers are a different story. Because the identifier is assigned by the database at flush time, the consistency guarantee would break if we implemented equals and hashCode based on the identifier in the same way as for assigned identifiers.
This issue was detailed in my article, How to implement equals and hashCode using the entity identifier (primary key).
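To see the problem without a database, here is a small simulation. NaivePost is a hypothetical class, and calling setId by hand stands in for the identifier assignment that Hibernate would do at flush time:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity whose hashCode depends on a database-generated identifier.
class NaivePost {

    private Long id; // null until the (simulated) flush

    Long getId() { return id; }

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NaivePost)) return false;
        return Objects.equals(getId(), ((NaivePost) o).getId());
    }

    @Override
    public int hashCode() {
        // Changes as soon as the identifier is assigned.
        return Objects.hashCode(getId());
    }
}

class NaivePostDemo {
    public static void main(String[] args) {
        Set<NaivePost> posts = new HashSet<>();
        NaivePost post = new NaivePost();

        posts.add(post);   // stored under the hash of a null id
        post.setId(1L);    // simulates the identifier assignment at flush time

        // The very same object can no longer be found in the Set.
        System.out.println(posts.contains(post)); // prints false
    }
}
```

The entity is still physically inside the Set, but it sits in the bucket computed for the old hash code, so the lookup misses it.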
Therefore, whenever you have a database-generated identifier, a synthetic key (be it a numeric identifier or a database UUID type), you have to use the following equals and hashCode implementation:
@Entity(name = "Post")
@Table(name = "post")
public class Post implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    public Post() {}

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Post)) return false;
        Post other = (Post) o;
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }

    //Getters and setters omitted for brevity
}
So, the hashCode yields the same value across all entity state transitions, and the equals method is going to use the identifier check only for non-transient entities.
That’s it!
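A quick way to convince yourself is the following plain-Java simulation. StablePost is a hypothetical copy of the equality logic above without JPA annotations, setId stands in for the flush, and a second instance with the same identifier stands in for a detached or merged copy:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class reusing the equals/hashCode shown above.
class StablePost {

    private Long id;

    Long getId() { return id; }

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StablePost)) return false;
        StablePost other = (StablePost) o;
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

class StablePostDemo {
    public static void main(String[] args) {
        Set<StablePost> posts = new HashSet<>();
        StablePost post = new StablePost();

        posts.add(post);   // added while still transient
        post.setId(1L);    // simulates the identifier assignment at flush time
        System.out.println(posts.contains(post)); // prints true

        StablePost copy = new StablePost(); // simulates a detached or merged copy
        copy.setId(1L);
        System.out.println(posts.contains(copy)); // prints true

        // Two different transient objects are never considered equal.
        System.out.println(new StablePost().equals(new StablePost())); // prints false
    }
}
```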
The only time when you’ll see a performance bottleneck due to a single hash bucket is if you have a large collection of tens of thousands of entries.
But then, it implies that you fetched that large collection from the database. The performance penalty of fetching such a collection from the database is multiple orders of magnitude higher than the single bucket overhead.
That’s why you never map large collections with Hibernate; you use queries for those instead. And for small collections, the single-bucket overhead is negligible.
Also, most of the time you don’t even need to use a Set or a Map. For bidirectional associations, List(s) perform better anyway.
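With a List, the hash code is not consulted at all, since lookups and removals rely on equals alone. The following sketch assumes two hypothetical classes, Topic and Reply, with the usual add and remove helper methods and no JPA annotations, so it runs on the JDK alone:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical child side of a bidirectional association.
class Reply {
    private Long id;
    private Topic topic;

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
    Topic getTopic() { return topic; }
    void setTopic(Topic topic) { this.topic = topic; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Reply)) return false;
        Reply other = (Reply) o;
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

// Hypothetical parent side holding the children in a List.
class Topic {
    private final List<Reply> replies = new ArrayList<>();

    List<Reply> getReplies() { return replies; }

    void addReply(Reply reply) {
        replies.add(reply);
        reply.setTopic(this);
    }

    void removeReply(Reply reply) {
        // ArrayList.remove(Object) locates the element via equals() only.
        replies.remove(reply);
        reply.setTopic(null);
    }
}
```

Removing a child through a different instance that carries the same identifier works here because of the identifier-based equals; the constant hash code plays no part in it.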
More misconceptions
Mark has written a blog post to justify his beliefs.
In his article, Mark says that the database-generated identifier equality implementation does not work for merge or getReference().
Even Vlad’s advanced version does have holes. E.g. if you use em.getReference() or em.merge().
The article How to implement equals and hashCode using the JPA entity identifier (primary key) demonstrates that this equals implementation works for detached objects. That was the whole point of coming up with such an implementation. We want it to work across all entity state transitions.
As for getReference(), there’s a check for that as well. It’s all on GitHub.
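The actual check lives in the GitHub tests and needs a running Hibernate. What follows is only a hand-written imitation, assuming a hypothetical Article class and an ArticleProxyStandIn subclass that delegates to a target the way a proxy does. It is not Hibernate's proxy mechanism, but it illustrates why the equality method uses instanceof and calls getId() on the other object instead of comparing classes or reading the field directly:

```java
// Hypothetical class using the same equality logic as the Post entity.
class Article {

    private Long id;

    Long getId() { return id; }

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof accepts subclasses, which is what a proxy is.
        if (!(o instanceof Article)) return false;
        Article other = (Article) o;
        // The getter is used because a proxy's own field stays null.
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

// Hand-written imitation of a proxy: a subclass that forwards calls to a target.
class ArticleProxyStandIn extends Article {

    private final Article target;

    ArticleProxyStandIn(Article target) {
        this.target = target;
    }

    @Override
    Long getId() { return target.getId(); }

    @Override
    public boolean equals(Object o) { return target.equals(o); }

    @Override
    public int hashCode() { return target.hashCode(); }
}
```

In this imitation, an Article with identifier 1 and a stand-in wrapping another Article with identifier 1 are equal in both directions and share the same hash code, even though their classes differ.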
There’s one argument which I agree with, and that’s about making sure that the equality check is using only entity attributes that are immutable. That’s why the entity identifier sequence number is very appealing. And with the equality implementation method that I offer you, you can use it safely.
Unfortunately, Mark continues with more misconceptions, like:
Why do you need equals() and hashCode() at all?
This is a good question. And my answer is: “you don’t !”
Well, you DO!
If you don’t implement equals and hashCode then the merge test will fail, therefore breaking the consistency guarantee. It’s all explained in my How to implement equals and hashCode using the entity identifier (primary key) article, by the way.
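The effect can be imitated in plain Java. In this sketch, PlainPost is a hypothetical class that inherits equals and hashCode from Object, and the second instance with the same identifier stands in for the different object that merge hands back for a detached entity:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity relying on the default Object equality (reference comparison).
class PlainPost {

    private Long id;

    Long getId() { return id; }

    void setId(Long id) { this.id = id; }
}

class PlainPostDemo {
    public static void main(String[] args) {
        PlainPost detached = new PlainPost();
        detached.setId(1L);

        Set<PlainPost> posts = new HashSet<>();
        posts.add(detached);

        // Simulates the managed instance returned by merge: same row, different object.
        PlainPost merged = new PlainPost();
        merged.setId(1L);

        System.out.println(posts.contains(merged)); // prints false
    }
}
```

Both objects represent the same database row, yet the Set treats them as two unrelated elements.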
And another misconception, from a Hibernate point of view:
Why you shouldn’t store managed and detached entities in the same Collection
Not only should you not avoid mixing detached and managed entities, but doing so is actually a great feature that allows you to hold on to detached objects and, therefore, prevent lost updates in long conversations.
And yet another misconception, from a Hibernate implementation perspective:
So, having a cache is really a great idea, but *please* do not store JPA entities in the cache. At least not as long as they are managed.
Hibernate strives to deliver strong consistency. That’s why the READ_WRITE and TRANSACTIONAL cache concurrency strategies allow you to not worry about such inconsistencies. It’s the second-level cache provider that guarantees this isolation level. Just like a relational database system.
Only NONSTRICT_READ_WRITE offers a weaker isolation level, but the non strict naming choice is self-descriptive after all.
Conclusion
The best advice I can give you is that you should always question every statement that you read on the Internet. You should always check every piece of advice against your current JPA provider implementation because details make a very big difference.
