The best way to map a @NaturalId business key with JPA and Hibernate


Introduction

In this article, you are going to learn what the Hibernate natural id is and how you can use it to fetch entities based on a business key.

As I explained in this free chapter of my book, Hibernate offers many benefits over standard JPA. One such example is the @NaturalId mapping.

More specifically, you are going to see the best way to map a natural business key when using Hibernate.

Domain Model

Consider the following Post entity:

[Figure: Post NaturalId]

The slug attribute is the business key for our Post entity. As I explained previously, we also use a surrogate key because it’s much more compact and puts less pressure on memory for both table and index pages.
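To put "much more compact" into numbers, here is a rough back-of-the-envelope sketch. It assumes every slug is as long as the 33-character high-performance-java-persistence example and that the column uses UTF-8, and it ignores row, page, and length-prefix overhead, so treat the figures as an illustration rather than a measurement:

```java
import java.nio.charset.StandardCharsets;

// Back-of-the-envelope comparison of the raw key payload of a BIGINT surrogate
// key versus a VARCHAR slug. Row, page, and variable-length prefix overhead are
// deliberately ignored, so real index sizes will differ.
public class KeySizeEstimate {

    // A Java long (and a typical SQL BIGINT column) takes 8 bytes.
    static final int BIGINT_BYTES = Long.BYTES;

    // Bytes needed to store the slug characters, assuming a UTF-8 column encoding.
    static int slugBytes(String slug) {
        return slug.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String slug = "high-performance-java-persistence";
        long rows = 1_000_000L;

        long surrogateTotal = BIGINT_BYTES * rows;
        long slugTotal = (long) slugBytes(slug) * rows;

        System.out.printf("BIGINT key: %d bytes/row, %d bytes for %d rows%n",
                BIGINT_BYTES, surrogateTotal, rows);
        System.out.printf("Slug key:   %d bytes/row, %d bytes for %d rows%n",
                slugBytes(slug), slugTotal, rows);
    }
}
```

Under those assumptions, each slug entry is about four times larger than the 8-byte surrogate key (33 vs 8 bytes), and that overhead would be repeated in every index and foreign key column storing it.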

The id property, being the entity identifier, can be marked with the JPA @Id annotation, but for the slug attribute, we need a Hibernate-specific annotation: @NaturalId.

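import java.util.Objects;

// Hibernate 5 uses the javax.persistence package; Hibernate 6+ uses jakarta.persistence
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.NaturalId;

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    // Business key: unique, non-nullable, and immutable by default
    @NaturalId
    @Column(nullable = false, unique = true)
    private String slug;

    // Getters and setters omitted for brevity

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) 
            return false;
        Post post = (Post) o;
        return Objects.equals(slug, post.slug);
    }

    @Override
    public int hashCode() {
        return Objects.hash(slug);
    }
}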

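As I explained previously, implementing equals and hashCode is straightforward when the entity defines a natural identifier. Because the natural key is unique and, by default, immutable, it is already set before the entity is persisted, so equals and hashCode stay consistent across all entity state transitions.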
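To see why an immutable natural key makes equals and hashCode safe, consider the following plain-Java sketch. The nested Post class is a hypothetical, annotation-free stand-in for the entity above, and calling setId simulates the identifier being generated on persist:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-Java stand-in for the Post entity (no JPA annotations), used only to show
// why equals/hashCode based on an immutable natural key stay stable when the
// surrogate id is assigned later, as it would be on persist/flush.
public class NaturalIdEqualityDemo {

    static class Post {
        private Long id;            // assigned "later", simulating @GeneratedValue
        private final String slug;  // natural key, set once and never changed

        Post(String slug) {
            this.slug = slug;
        }

        void setId(Long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Post post = (Post) o;
            return Objects.equals(slug, post.slug);
        }

        @Override
        public int hashCode() {
            return Objects.hash(slug);
        }
    }

    // Returns true if the entity can still be found in the set after its id is assigned.
    static boolean stillFoundAfterIdAssignment() {
        Set<Post> posts = new HashSet<>();
        Post post = new Post("high-performance-java-persistence");
        posts.add(post);          // added while transient (id == null)

        post.setId(1L);           // simulate the identifier being generated on persist

        // The hash bucket depends only on the slug, so the lookup still succeeds,
        // and a different instance with the same slug matches as well.
        return posts.contains(post)
                && posts.contains(new Post("high-performance-java-persistence"));
    }

    public static void main(String[] args) {
        System.out.println(stillFoundAfterIdAssignment()); // prints true
    }
}
```

Had hashCode been based on the generated id instead, the hash code would change once the id is assigned, and the contains check after setId would typically fail.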

If the entity does not define a natural identifier, implementing equals and hashCode should be done as I explained in this article.

Natural id fetching

Hibernate allows you to fetch entities either directly, via the entity identifier, or through a JPQL or SQL query.

Just like with the JPA @Id annotation, the @NaturalId annotation allows you to fetch the entity if you know the associated natural key.

So, assuming you have persisted the following Post entity:

Post post = new Post();
post.setTitle("High-Performance Java persistence");
post.setSlug("high-performance-java-persistence");

entityManager.persist(post);

Knowing the natural key, you can now fetch the Post entity as follows:

String slug = "high-performance-java-persistence";

Post post = entityManager.unwrap(Session.class)
    .bySimpleNaturalId(Post.class)
    .load(slug);

If your entity defines a single @NaturalId attribute, you should use the bySimpleNaturalId method. However, if you have a compound natural id, meaning that you declared more than one @NaturalId property, you need to use the byNaturalId method instead (it also works with a single natural id attribute, as in the following Post example):

Post post = entityManager.unwrap(Session.class)
    .byNaturalId(Post.class)
    .using("slug", slug)
    .load();

That’s great because the slug attribute is what the client will see in the browser address bar. Since the post URL can be bookmarked, we can now load the Post by the slug attribute sent by the client.
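As a hypothetical illustration of that flow, the helper below extracts the slug from a post URL before it is handed to the natural id API. The https://example.com/... URL layout and the slugOf helper are assumptions made for this sketch, not part of Hibernate:

```java
import java.net.URI;

// Hypothetical helper: extracts the last non-empty path segment of a post URL,
// which is assumed to be the slug. The URL layout is an assumption for illustration.
public class SlugFromUrl {

    static String slugOf(String url) {
        String path = URI.create(url).getPath();   // e.g. "/high-performance-java-persistence/"
        String[] segments = path.split("/");
        for (int i = segments.length - 1; i >= 0; i--) {
            if (!segments[i].isEmpty()) {
                return segments[i];
            }
        }
        throw new IllegalArgumentException("No slug in URL: " + url);
    }

    public static void main(String[] args) {
        String slug = slugOf("https://example.com/high-performance-java-persistence/");
        System.out.println(slug); // prints high-performance-java-persistence
        // The slug would then be passed to
        // session.bySimpleNaturalId(Post.class).load(slug)
    }
}
```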

However, to fetch the entity by its natural key, Hibernate generates the following SQL statements:

SELECT p.id AS id1_0_
FROM post p
WHERE p.slug = 'high-performance-java-persistence'

SELECT p.id AS id1_0_0_,
       p.slug AS slug2_0_0_,
       p.title AS title3_0_0_
FROM post p
WHERE p.id = 1

The first query is needed to resolve the entity identifier associated with the provided natural identifier.

The second query is skipped if the entity is already loaded in the first-level or the second-level cache.

Hibernate issues this identifier lookup because it already has well-established logic for loading entities by their identifier and associating them with the Persistence Context, and the natural id API reuses that logic.
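The two-step logic can be sketched with a toy model that only counts queries. This is not Hibernate's implementation: the maps stand in for the post table and for the first-level or second-level entity cache, and the model leaves out the natural id resolutions Hibernate also keeps inside the current Session:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (NOT Hibernate's implementation) of loading an entity by natural id:
// step 1 resolves natural id -> identifier, step 2 reuses the load-by-identifier
// path, which is served from an entity cache when possible.
public class NaturalIdResolutionModel {

    // Simulated "post" table: id -> title, plus the slug column as slug -> id.
    private final Map<Long, String> titleById = new HashMap<>();
    private final Map<String, Long> idBySlug = new HashMap<>();

    // Stands in for the first-level or second-level entity cache: id -> title.
    private final Map<Long, String> entityCache = new HashMap<>();

    int queryCount = 0;

    void insertRow(long id, String slug, String title) {
        titleById.put(id, title);
        idBySlug.put(slug, id);
    }

    // SELECT p.id FROM post p WHERE p.slug = ?
    private Long resolveId(String slug) {
        queryCount++;
        return idBySlug.get(slug);
    }

    // Load by identifier: served from the entity cache when possible,
    // otherwise SELECT ... FROM post p WHERE p.id = ?
    private String loadById(Long id) {
        String cached = entityCache.get(id);
        if (cached != null) {
            return cached;
        }
        queryCount++;
        String title = titleById.get(id);
        if (title != null) {
            entityCache.put(id, title);
        }
        return title;
    }

    String loadBySlug(String slug) {
        Long id = resolveId(slug);                  // always runs in this model
        return id == null ? null : loadById(id);    // skipped on a cache hit
    }

    public static void main(String[] args) {
        NaturalIdResolutionModel model = new NaturalIdResolutionModel();
        model.insertRow(1L, "high-performance-java-persistence",
                "High-Performance Java persistence");

        model.loadBySlug("high-performance-java-persistence");
        System.out.println(model.queryCount); // prints 2: id lookup + entity fetch

        model.loadBySlug("high-performance-java-persistence");
        System.out.println(model.queryCount); // prints 3: only the id lookup ran again
    }
}
```

The second lookup can be read as a new Session in which the entity is already in the second-level cache: the identifier still has to be resolved with a query, but fetching the entity itself is skipped.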

Optimizing the entity identifier retrieval

Just like the second-level cache allows you to avoid hitting the database when fetching an entity, the Hibernate @NaturalIdCache annotation allows you to skip the query that resolves the entity identifier from its natural key:

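import java.util.Objects;

// Hibernate 5 uses the javax.persistence package; Hibernate 6+ uses jakarta.persistence
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.CacheConcurrencyStrategy;
import org.hibernate.annotations.NaturalId;
import org.hibernate.annotations.NaturalIdCache;

@Entity(name = "Post")
@Table(name = "post")
// Store Post entities in the second-level cache using READ_WRITE
@org.hibernate.annotations.Cache(
    usage = CacheConcurrencyStrategy.READ_WRITE
)
// Also cache the natural id to entity identifier resolution
@NaturalIdCache
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    @Column(nullable = false, unique = true)
    private String slug;

    // Getters and setters omitted for brevity

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) 
            return false;
        Post post = (Post) o;
        return Objects.equals(slug, post.slug);
    }

    @Override
    public int hashCode() {
        return Objects.hash(slug);
    }
}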

We also annotated the entity with the Hibernate-specific @Cache annotation to declare a READ_WRITE Cache Concurrency Strategy. For this to work, the second-level cache must also be enabled in the Hibernate configuration.

This time, when running the previous example and fetching the Post entity, Hibernate generates zero SQL statements.

Because the READ_WRITE Cache Concurrency Strategy is write-through, the Post entity is cached during the persist operation, along with the natural key to identifier mapping.

If we were using NONSTRICT_READ_WRITE Cache Concurrency Strategy, the Post entity would be cached upon being accessed for the very first time.

With READ_WRITE, however, we don’t have to hit the database at all when fetching our Post entity. Cool, right?
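The difference between the two strategies can be sketched with another toy model. It is not Hibernate code: both the natural id cache and the entity cache are plain maps, each load stands for a fresh Session (so the first-level cache plays no role), and ON_PERSIST and ON_FIRST_ACCESS are hypothetical names for the write-through and on-first-access behaviors described above:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Hibernate code) contrasting caches populated on persist
// (write-through, like READ_WRITE) with caches populated on first access
// (like NONSTRICT_READ_WRITE). "Queries" are simply counted.
public class NaturalIdCacheModel {

    enum Population { ON_PERSIST, ON_FIRST_ACCESS }

    private final Population population;

    private final Map<Long, String> table = new HashMap<>();          // id -> title
    private final Map<String, Long> slugColumn = new HashMap<>();     // slug -> id

    private final Map<String, Long> naturalIdCache = new HashMap<>(); // slug -> id
    private final Map<Long, String> entityCache = new HashMap<>();    // id -> title

    int queryCount = 0;

    NaturalIdCacheModel(Population population) {
        this.population = population;
    }

    void persist(long id, String slug, String title) {
        table.put(id, title);       // the INSERT itself is not counted as a fetch
        slugColumn.put(slug, id);
        if (population == Population.ON_PERSIST) {
            naturalIdCache.put(slug, id);   // write-through: cache both mappings now
            entityCache.put(id, title);
        }
    }

    String loadBySlug(String slug) {
        Long id = naturalIdCache.get(slug);
        if (id == null) {
            queryCount++;                   // SELECT p.id FROM post p WHERE p.slug = ?
            id = slugColumn.get(slug);
            if (id == null) return null;
            naturalIdCache.put(slug, id);
        }
        String title = entityCache.get(id);
        if (title == null) {
            queryCount++;                   // SELECT ... FROM post p WHERE p.id = ?
            title = table.get(id);
            entityCache.put(id, title);
        }
        return title;
    }

    public static void main(String[] args) {
        for (Population p : Population.values()) {
            NaturalIdCacheModel model = new NaturalIdCacheModel(p);
            model.persist(1L, "high-performance-java-persistence",
                    "High-Performance Java persistence");
            model.loadBySlug("high-performance-java-persistence");
            int afterFirst = model.queryCount;
            model.loadBySlug("high-performance-java-persistence");
            System.out.println(p + ": first load = " + afterFirst
                    + " queries, second load = " + (model.queryCount - afterFirst));
        }
    }
}
```

Under these assumptions, the model reports 0 queries for both loads when the caches are populated on persist, and 2 queries for the first load (0 for the second) when they are populated on first access.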


Conclusion

The @NaturalId annotation is a very useful Hibernate feature that allows you to retrieve entities by their natural business key and, when combined with @NaturalIdCache and the second-level cache, without even hitting the database.


6 Comments on “The best way to map a @NaturalId business key with JPA and Hibernate”

  1. Hi Vlad,

    thanks for this read. Regarding

    “The first query is needed to resolve the entity identifier associated with the provided natural identifier.

    The second query is optional if the entity is already loaded in the first or the second-level cache.

    The reason for having the first query is because Hibernate already has a well-established logic for loading and associating entities by their identifier in the Persistence Context.”

    I do not get how this is the most performant way. 2 doubts:

    for the “check if it is in the cache” part, for this to be most performant, iterating the cache for a naturalId match must be more expensive than a query for the id and checking if the id is present in the cache. I can imagine that iterating for the naturalId is really inefficient compared to checking if the id is in the cache, but will that outweigh the cost of the database hit?
    if the entity that is loaded by naturalId turns out not to be in the cache, a select is needed to fetch it. If the first query that selected only the id based on the naturalId had selected all attributes of the entity, the second query would not be needed on a cache miss. If the entity is in the cache, you use that, and you loaded attributes you do not need. But if there is a cache miss, there is no need for the second select query. The most efficient way would depend on the miss/hit ratio, but selecting some extra columns to save an extra select seems worth considering at least.

    Jos

    • You don’t have to use the natural id API. I like the @NaturalId annotation as a way to document that a given entity property or combination of properties is a business key. Then, based on the application configuration, I can choose whether to use the natural id API or a query.

      • Yes, and the information in this post contributes to making a good decision on whether or not to use the natural id API.

        But within the natural Id API, my assumption is that Hibernate operates in the most performant way. I guess there is a catch with my 2 suggestions that is the reason Hibernate does not do it that way, but I do not see the catch. Knowing those catches would improve my understanding.

      • It’s very easy to understand why Hibernate was implemented like that. You said it too:

        The reason for having the first query is because Hibernate already has a well-established logic for loading and associating entities by their identifier in the Persistence Context.

        Hibernate already has an API for resolving the entity by its identifier, so it was reused for this use case. In terms of performance, you are unlikely to see a big difference if you are loading a single entity, even if you go twice to the DB. The first query might be served from a B+Tree index if you add one for the natural id and the id. This will be very fast. The second one is also fast as it uses the PK index to locate the record. If you have to load multiple entities, then a JPQL query is a much better option.

  2. I have a BaseEntity in my projects where the id is a Long.

    In terms of DB and Entity Design:
    would you say it is OK if some entities used a different Id type, like the String here, where it can be sort of a natural identifier? Or should I keep the id consistently a Long and use @NaturalId on a separate field like you showed?

    • As I explained in my High-Performance Java Persistence book, having the @Id in a base class is not a good idea, as the @Id definition cannot be overridden.

      Of course, you can have entities use a different id type. Some entities can use a numerical PK, others need to use composite identifiers, and other entities can use a UUID PK.
