How to implement equals and hashCode using the JPA entity identifier (Primary Key)
Introduction
As previously explained, using the JPA entity business key for equals and hashCode is always the best choice. However, not all entities feature a unique business key, so we need to fall back on another database column that is also unique: the primary key.
But using the entity identifier for equality is very challenging, and this post is going to show you how you can use it without issues.
Test harness
When it comes to implementing equals and hashCode, there is one and only one rule you should have in mind:
Equals and hashCode must behave consistently across all entity state transitions.
To test the effectiveness of an equals and hashCode implementation, the following test can be used:
protected void assertEqualityConsistency(
        Class<T> clazz,
        T entity) {

    Set<T> tuples = new HashSet<>();

    assertFalse(tuples.contains(entity));
    tuples.add(entity);
    assertTrue(tuples.contains(entity));

    doInJPA(entityManager -> {
        entityManager.persist(entity);
        entityManager.flush();
        assertTrue(
            "The entity is not found in the Set after it's persisted.",
            tuples.contains(entity)
        );
    });

    assertTrue(tuples.contains(entity));

    doInJPA(entityManager -> {
        T entityProxy = entityManager.getReference(
            clazz,
            entity.getId()
        );
        assertTrue(
            "The entity proxy is not equal with the entity.",
            entityProxy.equals(entity)
        );
    });

    doInJPA(entityManager -> {
        T entityProxy = entityManager.getReference(
            clazz,
            entity.getId()
        );
        assertTrue(
            "The entity is not equal with the entity proxy.",
            entity.equals(entityProxy)
        );
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.merge(entity);
        assertTrue(
            "The entity is not found in the Set after it's merged.",
            tuples.contains(_entity)
        );
    });

    doInJPA(entityManager -> {
        entityManager.unwrap(Session.class).update(entity);
        assertTrue(
            "The entity is not found in the Set after it's reattached.",
            tuples.contains(entity)
        );
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.find(clazz, entity.getId());
        assertTrue(
            "The entity is not found in the Set after it's loaded in a different Persistence Context.",
            tuples.contains(_entity)
        );
    });

    doInJPA(entityManager -> {
        T _entity = entityManager.getReference(clazz, entity.getId());
        assertTrue(
            "The entity is not found in the Set after it's loaded as a proxy in a different Persistence Context.",
            tuples.contains(_entity)
        );
    });

    T deletedEntity = doInJPA(entityManager -> {
        T _entity = entityManager.getReference(
            clazz,
            entity.getId()
        );
        entityManager.remove(_entity);
        return _entity;
    });

    assertTrue(
        "The entity is not found in the Set even after it's deleted.",
        tuples.contains(deletedEntity)
    );
}
Natural id
The first use case to test is the natural id mapping. Consider the following entity:
@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}
The isbn property is also a @NaturalId; therefore, it should be unique and not nullable. Both equals and hashCode use the isbn property in their implementations.
For more details about the @NaturalId annotation, check out this article.
When running the following test case:
Book book = new Book();
book.setTitle("High-Performance Java Persistence");
book.setIsbn("123-456-7890");

assertEqualityConsistency(Book.class, book);
Everything works fine, as expected.
Default java.lang.Object equals and hashCode
What if our entity does not have any column that can be used as a @NaturalId? The first urge is to not define your own implementations of equals and hashCode, like in the following example:
@Entity(name = "Book")
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    //Getters and setters omitted for brevity
}
However, when testing this implementation:
Book book = new Book();
book.setTitle("High-Performance Java Persistence");

assertEqualityConsistency(Book.class, book);
The test fails with the following assertion error:
java.lang.AssertionError: The entity is not found after it's merged
The original entity is not equal with the one returned by the merge method because two distinct Object(s) do not share the same reference.
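The same failure can be reproduced without a database. The following is only a minimal sketch in plain Java: the PlainBook class and its copyOf helper are hypothetical names introduced here, and copyOf merely mimics the fact that merge hands back a different object instance for the same row. It is not how Hibernate implements merge.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an entity that inherits equals and hashCode from java.lang.Object.
class PlainBook {
    final Long id;
    final String title;

    PlainBook(Long id, String title) {
        this.id = id;
        this.title = title;
    }

    // Assumption: imitates merge by returning a second instance that represents the same row.
    static PlainBook copyOf(PlainBook source) {
        return new PlainBook(source.id, source.title);
    }
}

class DefaultEqualityDemo {

    static boolean copyFoundInSet() {
        PlainBook book = new PlainBook(1L, "High-Performance Java Persistence");

        Set<PlainBook> tuples = new HashSet<>();
        tuples.add(book);

        PlainBook merged = PlainBook.copyOf(book);

        // Object.equals compares references, so the copy is treated as a different element.
        return tuples.contains(merged);
    }

    public static void main(String[] args) {
        System.out.println(copyFoundInSet()); // prints "false"
    }
}
```

Even though both instances carry the same identifier, the Set lookup fails, which is the situation the merge assertion in the test harness detects.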
Using the entity identifier for equals and hashCode
So if the default equals and hashCode are no good either, then let’s use the entity identifier for our custom implementation. Let’s just use our IDE to generate the equals and hashCode and see how it works:
@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }

    //Getters and setters omitted for brevity
}
When running the previous test case, the test fails with the following assertion error:
java.lang.AssertionError: The entity is not found after it's persisted
When the entity was first stored in the Set, the identifier was null. After the entity was persisted, the identifier was assigned an automatically generated value, hence the hashCode differs. For this reason, the entity cannot be found in the Set after it got persisted.
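This effect does not depend on JPA at all. As a minimal sketch, the hypothetical IdHashedBook class below uses the same IDE-style equals and hashCode, and a plain setId call stands in for the identifier being generated at persist time (an assumption made only to keep the example self-contained):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical non-JPA class using the identifier in both equals and hashCode.
class IdHashedBook {
    private Long id;

    Long getId() {
        return id;
    }

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdHashedBook)) return false;
        IdHashedBook other = (IdHashedBook) o;
        return Objects.equals(getId(), other.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }
}

class MutableHashDemo {

    static boolean foundAfterIdAssigned() {
        IdHashedBook book = new IdHashedBook();

        Set<IdHashedBook> tuples = new HashSet<>();
        tuples.add(book); // stored using the hash computed from a null id

        book.setId(1L);   // assumption: simulates the id being generated on persist

        // The lookup now uses the hash computed from id = 1, which no longer matches.
        return tuples.contains(book);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssigned()); // prints "false"
    }
}
```

The very same object reference is still in the Set, yet contains cannot locate it because the hash it was stored under no longer matches the hash it is looked up with.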
Fixing the entity identifier equals and hashCode
To address the previous issue, there is only one solution: the hashCode should always return the same value:
@Entity
public class Book implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book other = (Book) o;
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }

    //Getters and setters omitted for brevity
}
Also, when the entity identifier is null, we can guarantee equality only for the same object references. Otherwise, no transient object is equal to any other transient or persisted object. That’s why the identifier equality check is done only if the current Object identifier is not null.
With this implementation, the equals and hashCode test runs fine for all entity state transitions. The reason why it works is that the hashCode value does not change, hence, we can rely on the java.lang.Object reference equality as long as the identifier is null.
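The three properties described above can be checked in isolation. The sketch below is plain Java with a hypothetical StableBook class that copies the equals and hashCode from the entity; as before, setId is only an assumed stand-in for identifier generation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical non-JPA class: identifier-based equals, constant per-class hashCode.
class StableBook {
    private Long id;

    Long getId() {
        return id;
    }

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StableBook)) return false;
        StableBook other = (StableBook) o;
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

class StableHashDemo {

    // The entity stays findable after its identifier is assigned.
    static boolean foundAfterIdAssigned() {
        StableBook book = new StableBook();
        Set<StableBook> tuples = new HashSet<>();
        tuples.add(book);
        book.setId(1L); // assumption: simulates the id being generated on persist
        return tuples.contains(book);
    }

    // Two transient instances (null ids) are never equal to each other.
    static boolean transientInstancesEqual() {
        return new StableBook().equals(new StableBook());
    }

    // Two distinct instances with the same id are equal and share a hash code.
    static boolean sameIdInstancesEqual() {
        StableBook first = new StableBook();
        StableBook second = new StableBook();
        first.setId(1L);
        second.setId(1L);
        return first.equals(second) && first.hashCode() == second.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssigned());    // prints "true"
        System.out.println(transientInstancesEqual()); // prints "false"
        System.out.println(sameIdInstancesEqual());    // prints "true"
    }
}
```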
Conclusion
The entity identifier can be used for equals and hashCode, but only if the hashCode returns the same value all the time. This might sound like a terrible thing to do since it defeats the purpose of using multiple buckets in a HashSet or HashMap.
However, for performance reasons, you should always limit the number of entities that are stored in a collection. You should never fetch thousands of entities in a @OneToMany Set because the performance penalty on the database side is multiple orders of magnitude higher than using a single hashed bucket.
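A constant hashCode costs speed, not correctness. As a rough sketch with a hypothetical ConstantHashItem class (not part of the original tests), a HashSet whose elements all share one hash code still answers every lookup correctly for a collection of the size discussed here:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity-like class: equality by id, one hash code for every instance.
final class ConstantHashItem {
    private final Long id;

    ConstantHashItem(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConstantHashItem)) return false;
        ConstantHashItem other = (ConstantHashItem) o;
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

class SingleBucketDemo {

    // Adds `size` items, then looks each one up through a fresh, equal instance.
    static int countFound(int size) {
        Set<ConstantHashItem> items = new HashSet<>();
        for (long i = 1; i <= size; i++) {
            items.add(new ConstantHashItem(i));
        }

        int found = 0;
        for (long i = 1; i <= size; i++) {
            if (items.contains(new ConstantHashItem(i))) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(countFound(100)); // prints "100"
    }
}
```

Every lookup has to compare against the other colliding elements, so the cost grows with the collection size, which is one more reason to keep entity collections small.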
All tests are available on GitHub.

Just curious: does anything besides DB space usage speak against a “surrogate business key” (yeah, sounds a bit like “a black white”), for example a UUID set via new (or when loading it from the DB)? This would allow the usage of the business key best practices without having to deal with the problems of a UUID as a primary key. In the end, a unique UUID shouldn’t be worse than a unique String as a business key, should it?
This would get around the fixed hashCode() (which somehow doesn’t feel right, even as I see that it should be ok for reasonable Sets).
Let’s assume you have a DB with 100 million records across several tables. By adding this extra UUID business key, you need 1.6 GB of RAM wasted in the Buffer Pool just for this column.
Moreover, if it’s a business key, you’d start using it for queries, meaning you’d add indexes to it, which have a poor fill factor due to the randomness of the values and the ordered nature of B+Tree indexing.
So, why not fix the problem at equals/hashCode level without adding extra columns and indexing that will hurt performance in the long run?
To be honest, it doesn’t feel like “fixing” it at this level, more like sabotaging the hashCode() method until it’s weak enough to not hinder you any more. Of course, I realize that this is not completely rational, since there’s literally nothing in the hashCode() contract that actually prevents you from having a static one, so, it’s not the most ideal implementation, but not an illegal one.
The problems I fear are also less with persistent Sets/Maps (those shouldn’t be too large, completely agree), but more when processing the entity in another layer. But of course, one could, for example, use a wrapper there that gives a nicer hashCode() for this purpose (just a random thought).
Thanks, will probably simply have to live with the uneasiness there and hope that, most of the time, a real business key will present itself. And in any case, it’s good to understand the advantages and drawbacks of each option, to choose the least bad one for a given situation.
Great work, btw, enjoy reading through the blog so far (found already some quite nice pieces of advice) and will very likely also enjoy the book (and the video course) in the near future.
The Java Object equality is specific to the object itself, so what works for JPA entities will not work for objects that are supposed to reside only in memory to be processed in collections of hundreds of millions of entries.
In this case here, the HashSet performance will be similar to that of an ArrayList since you are going to inspect elements in a single bucket. However, calling contains on an ArrayList is not slow at all.
In fact, there’s this article that shows the performance comparison between HashSet and ArrayList.
For a collection of 10k elements, calling contains on the ArrayList takes 57 microseconds while the HashSet takes 11 nanoseconds. You may say that the relative difference is huge, but for a data access layer, 50 microseconds is nothing compared to how much time it would take to fetch those 10k entities in the first place, which would take at least 250-500 milliseconds.
In reality, collections will have at most 100 elements, as, otherwise, you are better off using a paginated query. For an ArrayList of 100 elements, contains will take less than 1 microsecond, so this single bucket penalty is not going to make any difference in terms of performance. However, it will make a lot of difference in terms of code quality, as you will avoid some very hard-to-track bugs.