The best way to implement equals, hashCode, and toString with JPA and Hibernate

Bytecode enhancement and toString

Last week, Mark Struberg, who is an Apache Software Foundation member and OpenJPA contributor, made the following statement:

Basically, he says that implementing toString is bad from a performance perspective. Well, that might be the case in OpenJPA, but in Hibernate things are a little bit different. Hibernate does not use bytecode enhancement by default.

Therefore, the toString method can use any basic entity attributes (that are needed to identify a certain entity in logs) as long as the basic attributes are fetched when the entity is loaded from the database.

Nevertheless, Hibernate allows attributes to be lazy loaded, but even then, bytecode enhancement is not necessarily the best approach. Using subentities might be a better alternative, and it does not even require bytecode enhancement.
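As a sketch of the safe approach (plain Java, with the JPA annotations omitted for brevity; the field names mirror the Book entity shown later in this article), a toString that touches only basic attributes cannot trigger lazy loading:

```java
// Sketch: a toString built only from basic attributes that are fetched
// together with the entity, so logging the entity can never trigger
// lazy loading of associations.
class Book {

    private Long id;
    private String title;
    private String isbn;

    Book(Long id, String title, String isbn) {
        this.id = id;
        this.title = title;
        this.isbn = isbn;
    }

    @Override
    public String toString() {
        return "Book{id=" + id + ", title='" + title + "', isbn='" + isbn + "'}";
    }
}
```

Associations such as @OneToMany collections are deliberately left out of the output, which is exactly why this implementation stays cheap.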

Equals and hashCode

Unfortunately, Mark continues this discussion with this very misleading statement about equals and hashCode:

This statement is wrong, as this post will demonstrate in great detail.

Equality contract

According to the Java specification, a good equals implementation must have the following properties:

  1. reflexive
  2. symmetric
  3. transitive
  4. consistent

The first three are rather intuitive, but ensuring consistency in the context of JPA and Hibernate entities is usually the biggest challenge for developers.

As already explained, equals and hashCode must behave consistently across all entity state transitions.

Identifier types

From an equals contract perspective, the identifiers can be split into two categories:

  • Assigned identifiers
  • Database-generated identifiers

Assigned identifiers

Assigned identifiers are allocated prior to flushing the Persistence Context, and we can further split them into two subcategories:

  • Natural identifiers
  • Database-agnostic UUIDs

Natural identifiers are assigned by a third-party authority, like a book ISBN.

Database-agnostic UUIDs are generated outside of the database, for example by calling the java.util.UUID#randomUUID method.

Both natural identifiers and database-agnostic UUIDs have the luxury of being known when the entity gets persisted. For this reason, it is safe to use them in the equals and hashCode implementation:

@Entity(name = "Book")
@Table(name = "book")
public class Book 
    implements Identifiable<Long> {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String isbn;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return Objects.equals(getIsbn(), book.getIsbn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getIsbn());
    }

    //Getters and setters omitted for brevity
}
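The same idea applies to database-agnostic UUIDs; a minimal sketch (plain Java, JPA annotations omitted; the Account class is a hypothetical example, not from the original post):

```java
import java.util.Objects;
import java.util.UUID;

// Sketch: an entity whose identifier is generated in the application,
// so equals and hashCode can rely on it even before the entity is
// persisted, and they never change across entity state transitions.
class Account {

    private final UUID id = UUID.randomUUID(); // known from construction

    UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        return Objects.equals(getId(), ((Account) o).getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getId());
    }
}
```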

Database-generated identifiers

The database-generated identifiers are a different story. Because the identifier is assigned by the database at flush time, the consistency guarantee breaks if we implement equals and hashCode based on the identifier just as we did for assigned identifiers.

This issue was detailed in my article, How to implement equals and hashCode using the entity identifier (primary key).

Therefore, whenever you have a database-generated identifier (a synthetic key, be it a numeric identifier or a database UUID type), you have to use the following equals and hashCode implementation:

@Entity
public class Book implements Identifiable<Long> {
 
    @Id
    @GeneratedValue
    private Long id;
 
    private String title;
 
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return getId() != null && Objects.equals(getId(), book.getId());
    }
 
    @Override
    public int hashCode() {
        return 31;
    }
 
    //Getters and setters omitted for brevity
}

So, the hashCode yields the same value across all entity state transitions, and the equals method is going to use the identifier check only for non-transient entities.
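To see the consistency guarantee in action, here is a small simulation in plain Java (no database involved; setId stands in for the identifier that Hibernate assigns at flush time, and BookEntity mirrors the implementation above):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch: the equals/hashCode shown above, exercised across the
// transient -> persisted transition, which is simulated with setId.
class BookEntity {

    private Long id;

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BookEntity)) return false;
        BookEntity other = (BookEntity) o;
        return getId() != null && Objects.equals(getId(), other.getId());
    }

    @Override
    public int hashCode() {
        return 31; // constant, so the bucket never changes once the id is set
    }

    public static void main(String[] args) {
        Set<BookEntity> books = new HashSet<>();
        BookEntity book = new BookEntity();
        books.add(book);  // transient: id is still null
        book.setId(1L);   // simulate the flush assigning the identifier
        System.out.println(books.contains(book)); // prints true
    }
}
```

Had hashCode been derived from the id, the entity would have moved to a different bucket after the flush, and contains would have returned false.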

That’s it!

The only time when you’ll see a performance bottleneck due to a single hash bucket is if you have a large collection with tens of thousands of entries.

But then, it implies that you fetched that large collection from the database. The performance penalty of fetching such a collection from the database is multiple orders of magnitude higher than the single bucket overhead.

That’s why you never map large collections with Hibernate; you use queries for those instead. And for small collections, the single hash bucket is not a performance concern.

Also, most of the time you don’t even need to use a Set or a Map. For bidirectional associations, List(s) perform better anyway.

More misconceptions

Mark has written a blog post to justify his beliefs.

In his article, Mark says that the database-generated identifier equality implementation does not work for merge or getReference().

Even Vlad’s advanced version does have holes. E.g. if you use em.getReference() or em.merge().

The How to implement equals and hashCode using the JPA entity identifier (primary key) article demonstrates that this equals implementation works for detached objects. That was the whole point of coming up with such an implementation: we want it to work across all entity state transitions.

As for getReference(), there’s a check for that as well. It’s all on GitHub.

There’s one argument which I agree with, and that’s about making sure that the equality check uses only entity attributes that are immutable. That’s why the entity identifier sequence number is very appealing. And with the equality implementation that I propose, you can use it safely.

Unfortunately, Mark continues with more misconceptions, like:

Why do you need equals() and hashCode() at all?

This is a good question. And my answer is: “you don’t !”

Well, you DO!

If you don’t implement equals and hashCode then the merge test will fail, therefore breaking the consistency guarantee. It’s all explained in my How to implement equals and hashCode using the entity identifier (primary key) article, by the way.

And another misconception, from a Hibernate point of view:

Why you shouldn’t store managed and detached entities in the same Collection

Not only is it fine to mix detached and managed entities, but this is actually a great feature that allows you to hold on to detached objects, and therefore prevent lost updates in long conversations.
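Long conversations prevent lost updates through optimistic locking. As a plain-Java sketch of the version check behind it (the Row class and method names are illustrative, not Hibernate internals):

```java
// Sketch: the "UPDATE ... WHERE id = ? AND version = ?" check that
// optimistic locking relies on, simulated without a database.
class OptimisticLock {

    static class Row {
        long version;
        String title;
    }

    // Returns false when the expected version is stale, meaning another
    // transaction already updated the row: the lost update is prevented.
    static boolean update(Row row, String newTitle, long expectedVersion) {
        if (row.version != expectedVersion) {
            return false;
        }
        row.title = newTitle;
        row.version++;
        return true;
    }
}
```

A detached entity carries its version across requests, so a conflicting write made in the meantime is detected when the entity is reattached and flushed.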

And yet another misconception, from a Hibernate implementation perspective:

So, having a cache is really a great idea, but *please* do not store JPA entities in the cache. At least not as long as they are managed.

Hibernate strives for delivering strong consistency. That’s why the READ_WRITE and TRANSACTIONAL cache concurrency strategies allow you to not worry about such inconsistencies. It’s the second-level cache provider that guarantees this isolation level. Just like a relational database system.

Only NONSTRICT_READ_WRITE offers a weaker isolation level, but the non-strict naming choice is self-descriptive after all.
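As an illustration of how a strategy is selected, here is a minimal mapping sketch using Hibernate’s @Cache annotation (the Post entity is illustrative, and a configured second-level cache provider is assumed):

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Sketch: opting a single entity into the second-level cache with the
// READ_WRITE concurrency strategy (illustrative mapping, not from the post).
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Post {

    @Id
    private Long id;

    private String title;
}
```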

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

The best advice I can give you is to always question every statement that you read on the Internet. You should check every piece of advice against your current JPA provider implementation, because details make a very big difference.

If you liked this article, you might want to subscribe to my newsletter too.


23 thoughts on “The best way to implement equals, hashCode, and toString with JPA and Hibernate”

  1. Hi Vlad!

    The 140-char limit on Twitter isn’t really helpful for making to-the-point statements. At least it seems you totally missed my 2nd tweet. But since the context between 2 tweets often gets lost, I added a longer explanation as a blog post https://struberg.wordpress.com/2016/10/15/tostring-equals-and-hashcode-in-jpa-entities/

    If you have read my blog post carefully, then you might notice that your first sentence is quite off from what I expressed. You missed that I am focusing on PORTABLE behaviour (which you as a Hibernate maintainer obviously don’t care about) and on DEFAULT IDE-generated behaviour (which will hit Hibernate as well).

    To get things straight: most IDEs do a bad job in GENERATING a good toString(), equals() and hashCode() method. They usually just pick all the attributes. Those generated methods usually also include attributes which are @ElementCollections or @OneToMany fields. I guess you agree with me that touching those fields in e.g. toString() will also trigger lazyLoading even in Hibernate (if not set to fetch EAGER anyway)? Right?

    2nd point: I explicitly showed a toString() one should use for Hibernate! But please people, only print the class name and the primary key. What’s wrong with that? Please explain, I don’t get what you think is wrong with that.
    I also noted that you don’t need to implement any toString() for OpenJPA because here the SimpleClassName + PK is already the default which gets printed out.

    Same applies to equals() and hashCode(). The generated ones (with all the fields) are almost always BS. Don’t you agree?

    Regarding equals() and hashCode(): I already pointed to your solution in my blog post. It is very close to a working solution but still does not cover all edge cases.
    Oh, and to be honest, I’ve rarely seen an entity with at least your version of equals() and hashCode()…
    Example: https://github.com/vladmihalcea/high-performance-java-persistence/blob/ac1f8af5de04dcbc7cae225f05573e5ba278f207/core/src/test/java/com/vladmihalcea/book/hpjp/hibernate/association/BidirectionalManyAsOneToManyWithoutEmbeddedIdTest.java#L148
    What if someone changes the title?

    Post p;..
    p.setTitle("A");
    Set posts = new HashSet();
    posts.add(p);
    p.setTitle("B");
    posts.add(p);
    posts.size() // -> whoops, will be 2 but should be 1
    

    Just google for >java “@Entity” “public boolean equals” site:github.com< and tell me how many correct impls you did find…

    My conclusion was that it is often better to just omit any own equals() and hashCode() and fall back to Object.equals, because it works more often than the other things I’ve seen in most existing code.

    Regarding the parallel access to managed entities: it’s OK to mention specific Hibernate hacks to solve this. But at the top of my blog post I explicitly noted that I’m talking about portable solutions. Of course I perfectly understand that YOU don’t care much about portability. But some users do.

  2. Portability is a feature. Performance is also a feature. Unfortunately, you cannot always have both. This is not just about JPA providers. The same argument applies to relational databases.

    There is nothing wrong with using native SQL that is specific to Oracle or SQL Server. There is nothing wrong with using Hibernate-specific features that boost application performance.

    Regarding the example that you showed me: you have a single object reference p which you try to add twice to the Set. If the object was not persisted, the size will be one. You can run a test against my equals implementation too. The result is 1 because equals checks for object reference equality in the first line of the equals method implementation.

  3. Hi Vlad!

    It’s good to see that you reworked your post, removed a few claims and stated others more precisely.

    > The result is 1 because equals checks for 
    > object reference equality in the first line of the equals method implementation.
    

    No it won’t! Please look at the link again! Even YOU used a wrong hashCode(). Your impl returns Objects.hash(title); it does exactly NOT return a hardcoded hash code! Thus it won’t find the first version in the HashSet. And look around on GitHub, and even better in real-world projects (I’ve seen many): the decently correct version is used almost nowhere. Your code was literally just the first hit on Google…

    The other point is that if you don’t explicitly differentiate between attached and managed entities, then you might end up having multiple instances for the same row in the DB, but maybe with different attributes. So they are equals() (according to your impl) but still have different content. What will happen if you store them somewhere in a Set? First one wins? Nice random generator…
    It also gets much more tricky on a cluster.

    Regarding portability. In my example I explained why I don’t like 2nd level caches. So we are talking about different things here. It’s not a matter of right or wrong but a matter of like it or not.

    1. Where did you get Objects.hash(title) from? That was never on this page. The first example uses the hash of the ISBN, while the second one uses a hard-coded value, 31. I’m not sure what you are talking about.

      The other point is that if you don’t explicitly differentiate between attached and managed entities then you might end up having multiple instances for the same row in the DB

      That’s never the case for Hibernate. You’re talking about hypothetical situations. Please send me a GitHub example with Hibernate and this equals/hashCode implementation that breaks for managed/detached entities or that generates multiple instances of the same object in the DB. Remember, the same object is a shady term here. If you save two transient objects with the same content, they are two different rows because they have different PKs.

      The GitHub example uses a Set, by the way. Check it out and let me know what use case I’ve missed. Feel free to send a Pull Request.

      As for second-level cache, it really depends on the implementation, so it’s hard to generalize.

      1. What if the title is immutable? That’s a totally different test there. It does not even make sense to compare it with the ones I’ve described here. If the title is immutable, then it’s just fine to use the title too.

  4. First: the equals/hashCode in your own example doesn’t follow your ‘rules’, right? OK, we do agree, it seems. You cannot have different equals/hashCode depending on the usage. One entity, one equals/hashCode. Also agreed?

    Second: I have no clue what you were trying to show in that sample, but that equals/hashCode makes no sense to me. Insert 3 books with the same title (there are TONS of books which share the same title). They have 3 different IDs; they are of course 3 different physical books. Yet they are all equals(). It simply makes no sense to me.
    In this very case you would probably be much better off relying on instance equality (and removing your own equals/hashCode method).

    1. That’s an example meant to demonstrate which SQL statements are executed for a particular association. It was not meant to demonstrate how equals should be implemented. You can find other tests that intentionally break the rules, because this repository is a playground, not an enterprise system repo.

  5. Hi Vlad,
    you look like a big fan of the optimistic locking technique
    and the session-per-request-with-detached-objects pattern.

    I’m also starting to think it is one of the best approaches and suitable in many situations,
    but I guess I don’t fully understand it yet.

    Is it correct to say that if we use optimistic locking with detached entities, that is, the session-per-request-with-detached-objects pattern, to cope with long conversations like this one, the level of isolation we get is read committed plus protection against lost updates?

    So we get more than only read committed, but less than repeatable read?

    Can you confirm or dismiss my assertion?

    Thanks,
    I’m looking forward to trying out your book.

    Ralf

      1. I don’t get how that can be possible.
        Suppose Alice loads an object in her browser (the Hibernate session closes);
        while she is deciding whether or not to purchase that object,
        the same record can be updated in the DB,
        so if the object is read by Alice in another Hibernate session, it will have another value.

      2. I can’t get it.
        As soon as Alice loads an object into her client, the Hibernate session closes.
        In the meantime the object can be modified by others,
        so in the next Hibernate session that Alice possibly issues, that object could have a
        different status.

      3. That’s where detached objects come into play, or PersistenceContextType.EXTENDED. The Session offers application-level repeatable reads. But if you close the Session, you need to retain the detached entities in some other stateful store, and reattach them in the new Session that’s opened with a new HTTP request.

        Check out this article for more info.

      4. OK, I get it, but if in the second transaction I read the object with the same id as the detached object again from the DB, I could find that its status is different from the detached object that I sent to the client in the previous transaction, right?

    1. Suppose we are in an EJB method: suppose A is an object that just came back from a client and is still detached, and A’ is the object with the same id as A, but retrieved using the Hibernate find method. They could have different states, right?

  6. Hey Vlad,
    I’ve come up with another doubt:

    Suppose we use the read committed isolation level set on the DB,
    and use JPA optimistic locking to raise this level to repeatable read and to
    enable long conversation transactions.

    What about triggers and stored procedures that can run on the DB?
    For example, should we check and update the version field manually?

    1. JPA optimistic locking does not raise the isolation level to Repeatable Read. It’s just that it allows you to overcome lost updates across multiple DB transactions.

      As for stored procedures and triggers, that’s a totally different discussion. Basically, it depends on your data integrity requirements and what the stored procedures do.

    1. No, it does not. That’s why you need to use explicit optimistic locking requests. And even so, you won’t prevent phantom reads if the DB does not support index range locks or predicate locks.

      1. Yeah, using explicit optimistic locking and making good use of OPTIMISTIC_FORCE_INCREMENT, we have repeatable read, right?

        We can still have phantom reads; REPEATABLE READ allows them.
