Posted on August 22, 2023 by vladmihalcea

The best way to hide the JPA entity identifier



Introduction

In this article, I’m going to show you the best way to hide the JPA entity identifier so that the users of your application won’t be able to guess and access data that belongs to other users.

This has been a recurring question that I’ve been getting when running training or workshops, so I decided it’s a good idea to formalize it in an article.

Domain Model

Let’s assume we are using the following Post and PostComment entities:

[Figure: Masquerade Id Entities]

Notice that both the Post and PostComment use numerical identifiers, and, most often, this is a great choice since auto-incremented identifiers are great for B+Tree indexes.

If you’re using MySQL, MariaDB, or SQL Server, where the database table is organized as a Clustered Index, it’s much more efficient to use an auto-incremented numerical identifier for the Primary Key instead of a random UUID one.

Why hide the JPA entity identifier

Numeric identifiers have a lot of advantages:

They can be as compact as possible since we can choose between 1, 2, 4, or 8 bytes (tinyint, smallint, int, bigint).
Being monotonically increasing, they are very suitable for the B+Tree indexes that you will create for your Primary and Foreign Key columns.

However, there is also one big disadvantage associated with using numeric record identifiers. The identifiers are very easy to guess, and this may allow clients to fabricate HTTP requests with values that might extract data they are not supposed to access.

So, for this reason alone, it’s quite common for developers to choose UUIDs instead of numeric identifiers. However, as I explained in this article, UUIDs are not a great choice from a performance perspective because:

They are huge (128 bits), and this reduces the number of table records and index entries you can cache in the Buffer Pool.
The v4 UUID values are random, which affects the B+Tree page fill factor and the number of balancing operations that will be triggered due to the randomness of new entries.

So, instead of choosing a UUID and suffering from performance issues, there is a much better way.

Masquerading the JPA entity identifier

Since the numerical identifiers are fine from the database storage perspective, we don’t really need to switch to random column values just to make it harder for people to guess the record identifiers.

Instead, we can simply encrypt the row identifiers when we send them to the client and decrypt them back when the client sends back the values in a subsequent request.

For instance, let’s say that we want to extract the latest Post records that have been created, and since we might have a lot of entries, we use Keyset Pagination, as I explained in this article.

Since we are using Keyset Pagination, we are implementing the Top-N and Next-N data access methods in the following custom Repository:

public class CustomPostRepositoryImpl implements CustomPostRepository {

    private final EntityManager entityManager;

    private final CriteriaBuilderFactory criteriaBuilderFactory;

    public CustomPostRepositoryImpl(
            EntityManager entityManager,
            CriteriaBuilderFactory criteriaBuilderFactory) {
        this.entityManager = entityManager;
        this.criteriaBuilderFactory = criteriaBuilderFactory;
    }

    @Override
    public PagedList<PostDTO> findTopN(Sort sortBy, int pageSize) {
        return sortedCriteriaBuilder(sortBy)
            .page(0, pageSize)
            .withKeysetExtraction(true)
            .getResultList();
    }

    @Override
    public PagedList<PostDTO> findNextN(
            Sort sortBy, 
            PagedList<PostDTO> previousPage) {
        return sortedCriteriaBuilder(sortBy)
            .page(
                previousPage.getKeysetPage(),
                previousPage.getPage() * previousPage.getMaxResults(),
                previousPage.getMaxResults()
            )
            .getResultList();
    }

    private CriteriaBuilder<PostDTO> sortedCriteriaBuilder(
            Sort sortBy) {
        CriteriaBuilder<Post> criteriaBuilder = criteriaBuilderFactory
            .create(entityManager, Post.class)
            .from(Post.class, "p");
        sortBy.forEach(order -> {
            criteriaBuilder.orderBy(
                order.getProperty(), 
                order.isAscending()
            );
        });
        return criteriaBuilder.selectNew(PostDTO.class)
            .with("p.id")
            .with("p.title")
            .end();
    }
}

Notice that neither the Top-N nor the Next-N method fetches the Post entity. Instead, they return a paginated list of PostDTO instances.

The PostDTO class looks as follows:

public class PostDTO {

    private final String id;

    private final String title;

    public PostDTO(Long id, String title) {
        this.id = CryptoUtils.encrypt(id);
        this.title = title;
    }

    public String getId() {
        return id;
    }

    public String getTitle() {
        return title;
    }
}

Notice that the type of the id is String and that it will store the encrypted value of the actual Post identifier. By encrypting the entity identifier, the client will no longer be able to guess its value or the value of any other identifier of the Post table records.

The CryptoUtils class defines the encrypt and decrypt methods, and if you’re curious about it, you can take a look at it on GitHub.
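To give an idea of what such a utility could look like, here is a minimal sketch. It is not the CryptoUtils class from GitHub: the IdCipherSketch name, the hard-coded key, and the choice of AES in ECB mode over the 8-byte long value are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Illustrative sketch only; not the CryptoUtils class used in this article.
final class IdCipherSketch {

    // Hypothetical 16-byte AES key, hard-coded for the demo.
    // A real application would load it from a secret store.
    private static final SecretKeySpec KEY = new SecretKeySpec(
        "0123456789abcdef".getBytes(StandardCharsets.UTF_8), "AES"
    );

    private IdCipherSketch() {}

    public static String encrypt(long id) {
        // Serialize the identifier as 8 big-endian bytes
        byte[] plain = ByteBuffer.allocate(Long.BYTES).putLong(id).array();
        try {
            return Base64.getEncoder().encodeToString(
                cipher(Cipher.ENCRYPT_MODE).doFinal(plain)
            );
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    public static long decrypt(String token) {
        try {
            byte[] plain = cipher(Cipher.DECRYPT_MODE)
                .doFinal(Base64.getDecoder().decode(token));
            if (plain.length != Long.BYTES) {
                throw new IllegalArgumentException("Unexpected payload size");
            }
            return ByteBuffer.wrap(plain).getLong();
        } catch (GeneralSecurityException e) {
            // Thrown when the token was not produced with this key
            throw new IllegalArgumentException("Invalid identifier token", e);
        }
    }

    // The 8-byte value plus padding fits in one AES block, and the
    // deterministic mode keeps the token stable for a given id.
    private static Cipher cipher(int mode) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(mode, KEY);
        return cipher;
    }
}
```

The deterministic mode is a deliberate choice in this sketch, since the same identifier has to map to the same token every time the client sees it. Each token is a single 16-byte block, so it is 24 characters long once Base64-encoded.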

Now, when fetching the first page of PostDTO entries, we can see that while the identifier values are encrypted for the external users, we are still able to decrypt their values when needed:

PagedList<PostDTO> topPage = forumService.firstLatestPosts(PAGE_SIZE);

List<String> topIds = topPage.stream()
    .map(PostDTO::getId)
    .toList();

assertEquals(
    "3qEiB21WnB/yQ4muQe6cpw==", 
    topIds.get(0)
);  
assertEquals(
    Long.valueOf(50), 
    CryptoUtils.decrypt(topIds.get(0), Long.class)
);

assertEquals(
    "9jfsI1A92KIzd34ZfRxgtQ==", 
    topIds.get(1)
);
assertEquals(
    Long.valueOf(49), 
    CryptoUtils.decrypt(topIds.get(1), Long.class)
);
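The encrypted values above use the standard Base64 alphabet, which contains characters such as / and = that are awkward in URL path segments. If the identifier is exposed in a URL, one option is to convert it to the URL-safe Base64 alphabet at the API boundary. The following is a minimal sketch of that conversion; the UrlSafeIds class and its method names are hypothetical and not part of the article's code base.

```java
import java.util.Base64;

// Hypothetical helper that converts between the standard and the
// URL-safe Base64 representation of the same encrypted bytes.
final class UrlSafeIds {

    private UrlSafeIds() {}

    // Replaces '+' and '/' with '-' and '_' and drops the '=' padding
    public static String toUrlSafe(String standardBase64) {
        byte[] raw = Base64.getDecoder().decode(standardBase64);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    // Restores the standard, padded form before decrypting
    public static String fromUrlSafe(String urlSafeBase64) {
        byte[] raw = Base64.getUrlDecoder().decode(urlSafeBase64);
        return Base64.getEncoder().encodeToString(raw);
    }
}
```

Since both forms encode the same bytes, the conversion is lossless and the decrypt call still receives the original value.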

If the clients want to access the comments of a given post entry, they can call the following service method:

public List<PostCommentDTO> findCommentsByPost(String postId) {
    return postRepository.findCommentsByPost(
        CryptoUtils.decrypt(postId, Long.class)
    );
}

Notice that we are decrypting the Post identifier prior to calling the PostRepository method that extracts the list of PostCommentDTO entries.
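Because the identifier now arrives as an arbitrary client-supplied String, decryption can fail for values that were never issued by the application. A possible way to handle this, sketched below under the assumption that the decrypt function throws a RuntimeException for invalid input, is to wrap the call and treat a failure as a missing record. The SafeIdDecoder class is hypothetical, and the decryptor is passed in as a function so the sketch stays self-contained.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical wrapper that turns decryption failures into an empty result.
final class SafeIdDecoder {

    // Any function that maps a token to an identifier and throws a
    // RuntimeException for invalid tokens (an assumption of this sketch)
    private final Function<String, Long> decryptor;

    public SafeIdDecoder(Function<String, Long> decryptor) {
        this.decryptor = decryptor;
    }

    public Optional<Long> decode(String token) {
        if (token == null || token.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(decryptor.apply(token));
        } catch (RuntimeException e) {
            // Tampered or malformed tokens are treated like unknown ids
            return Optional.empty();
        }
    }
}
```

With such a wrapper, the service method could return an empty list or a not-found response when decode yields an empty Optional, instead of propagating a cryptography exception to the client.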

The PostCommentDTO can also masquerade the actual PostComment identifier:

public class PostCommentDTO {

    private final String id;

    private final String review;

    public PostCommentDTO(Long id, String review) {
        this.id = CryptoUtils.encrypt(id);
        this.review = review;
    }

    public String getId() {
        return id;
    }

    public String getReview() {
        return review;
    }
}

And you can see that comment identifiers are encrypted as well prior to sending them back to the client:

List<PostCommentDTO> comments = forumService.findCommentsByPost(
    firstPost.getId()
);

assertEquals(
    10, 
    comments.size()
);
assertEquals(
    "ltAKs4jLw8N7q7SHeUR2Kw==", 
    comments.get(0).getId()
);
assertEquals(
    Long.valueOf(1), 
    CryptoUtils.decrypt(comments.get(0).getId(), Long.class)
);

That’s it!

How about performance?

After first publishing this article, I noticed that some developers were worried about the performance implications of using cryptography.

So, the following test case measures the duration of the encrypt and decrypt method execution:

MetricRegistry metricRegistry = new MetricRegistry();

Slf4jReporter logReporter = Slf4jReporter
    .forRegistry(metricRegistry)
    .outputTo(LOGGER)
    .convertDurationsTo(TimeUnit.MICROSECONDS)
    .build();

final Timer encryptTimer = metricRegistry.timer("encryptTimer");
final Timer decryptTimer = metricRegistry.timer("decryptTimer");

final ThreadLocalRandom random = ThreadLocalRandom.current();
int MAX_COUNT = 100_000;

LongStream.rangeClosed(1, MAX_COUNT/10).forEach(i -> {
    Long value = random.nextLong();
    String encryptedValue = CryptoUtils.encrypt(value);
    Long decryptedValue = CryptoUtils.decrypt(encryptedValue, Long.class);
    assertEquals(
        value.longValue(), 
        decryptedValue.longValue()
    );
});

LongStream.rangeClosed(1, MAX_COUNT).forEach(i -> {
    Long value = random.nextLong(i);
    
    long startNanos = System.nanoTime();
    String encryptedValue = CryptoUtils.encrypt(value);
    encryptTimer.update(
        (System.nanoTime() - startNanos), 
        TimeUnit.NANOSECONDS
    );

    startNanos = System.nanoTime();
    Long decryptedValue = CryptoUtils.decrypt(encryptedValue, Long.class);
    decryptTimer.update(
        (System.nanoTime() - startNanos), 
        TimeUnit.NANOSECONDS
    );
    
    assertEquals(
        value.longValue(), 
        decryptedValue.longValue()
    );
});

logReporter.report();

When running the test case above, we get the following metrics on my 6-year-old notebook:

name=encryptTimer, 
count=100000, 
min=5.2, 
max=47.6, 
mean=7.5871019652890395, 
stddev=3.9002472694863175, 
median=6.3, 
p75=7.5, 
p95=13.200000000000001, 
p98=22.6, 
p99=25.900000000000002, 
p999=39.4, 
duration_unit=microseconds

name=decryptTimer, 
count=100000, 
min=5.4, 
max=34.9, 
mean=7.9174440849941865, 
stddev=3.770417665194349, 
median=6.5, 
p75=8.1, 
p95=14.5, 
p98=22.5, 
p99=25.3, 
p999=33.0, 
duration_unit=microseconds

The average execution time is about 8 μs, and the 99th percentile has a value of about 25 μs.

Now, considering that we will apply this technique in a typical OLTP application that’s bound to fetch a very small number of records, the overall overhead of using encryption is not going to be a deal breaker. For example, encrypting the identifiers of a 50-record page at about 8 μs each adds roughly 0.4 ms.

However, you can easily lower this overhead by using an LRU cache.

For instance, when using an LRU cache with a 90.9% cache hit ratio, we will get the following metrics:

name=encryptTimer, 
count=100000, 
min=0.1, 
max=33.4, 
mean=1.91390151098664, 
stddev=3.392790242522042, 
median=0.8, 
p75=1.4000000000000001, 
p95=10.0, 
p98=14.5, 
p99=17.2, 
p999=21.400000000000002, 
duration_unit=microseconds

name=decryptTimer, 
count=100000, 
min=0.1, 
max=44.4, 
mean=1.9999605664739044, 
stddev=4.037352976657196, 
median=0.7000000000000001, 
p75=1.2, 
p95=10.5, 
p98=14.6, 
p99=17.8, 
p999=38.1, 
duration_unit=microseconds

Notice that the average duration of the encrypt and decrypt methods dropped to about 2 μs. So, although the cryptographic methods are rather fast, using an LRU cache will help you lower their overhead to a couple of microseconds.
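The cache used for the measurement above is not shown in this article, so the following is only a minimal sketch of how an LRU cache could be put in front of the encrypt or decrypt call. It relies on the access-order mode of java.util.LinkedHashMap, and the LruCacheSketch name and its API are assumptions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache; not the cache used for the metrics above.
final class LruCacheSketch<K, V> {

    private final Map<K, V> entries;

    public LruCacheSketch(final int capacity) {
        // accessOrder=true moves an entry to the end every time it is read,
        // so the eldest entry is always the least recently used one
        this.entries = Collections.synchronizedMap(
            new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > capacity;
                }
            }
        );
    }

    // Returns the cached value or computes and stores it on a cache miss
    public V get(K key, Function<K, V> loader) {
        return entries.computeIfAbsent(key, loader);
    }

    public int size() {
        return entries.size();
    }
}
```

An encrypt call could then be routed through the cache as cache.get(id, CryptoUtils::encrypt), with a second cache instance keyed by the encrypted String for the decrypt direction. The synchronized wrapper keeps the sketch thread-safe at the cost of a single lock, which a production cache library would typically avoid.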


Conclusion

Encrypting and decrypting the entity identifier is a very straightforward technique that allows us to hide the underlying numerical value.

By using this strategy, we can enjoy the advantages of using numerical identifiers on the database side while also making sure that the users cannot guess the identifiers of records they are not supposed to access.


Category: Hibernate Tags: hibernate, hide, identifier, jpa, masquerade

4 Comments on “The best way to hide the JPA entity identifier”

satya
August 31, 2023

How about using ULID instead of UUID

- vladmihalcea
  August 31, 2023
  
  ULID is just as huge as UUID. TSID is half the size of ULID and the probability of conflict can be very low.
  
Jim
August 23, 2023

Excellent article. We have struggled with the issue of hiding the primary key from users for years. Such a simple solution. Thank You

- vladmihalcea
  August 23, 2023
  
  You’re welcome. If you liked this article, you are going to love my High-Performance Java Persistence video course.
  
