The best way to hide the JPA entity identifier
Introduction
In this article, I’m going to show you the best way to hide the JPA entity identifier so that the users of your application won’t be able to guess and access data that belongs to other users.
This has been a recurring question that I’ve been getting when running training or workshops, so I decided it’s a good idea to formalize it in an article.
Domain Model
Let’s assume we are using the following Post and PostComment entities:
Notice that both the Post and PostComment entities use numerical identifiers, and, most often, this is a great choice since auto-incremented identifiers are great for B+Tree indexes.
If you’re using MySQL, MariaDB, or SQL Server, where the database table is organized as a Clustered Index, it’s much more efficient to use an auto-incremented numerical identifier for the Primary Key than a random UUID.
Why hide the JPA entity identifier
Numeric identifiers have a lot of advantages:
- They can be as compact as possible since we can choose between 1, 2, 4, or 8 bytes (tinyint, smallint, int, bigint).
- Being monotonically increasing, they are very suitable for B+Tree indexes that you will create for your Primary and Foreign Key columns.
There is also one big disadvantage associated with using numeric record identifiers. The identifiers are very easy to guess, and this may allow clients to fabricate HTTP requests with values that might extract data they are not supposed to access.
So, for this single reason, it’s quite common for developers to choose UUIDs instead of numeric identifiers. However, as I explained in this article, UUIDs are not a great choice from a performance perspective because:
- They are huge (128 bits), and this puts pressure on the number of table records and index entries you can cache in the Buffer Pool.
- The v4 UUID values are random, which affects the B+Tree page fill factor and the number of balancing operations that will be triggered due to the randomness of new entries.
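To make the size difference concrete, here is a minimal sketch that serializes both kinds of identifiers to bytes. The IdentifierSizes class and its toBytes methods are hypothetical names used only for illustration, and the byte counts refer to the raw values, not to any particular database storage format.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper that shows the raw size of each identifier type.
final class IdentifierSizes {

    // A bigint-style identifier fits in 8 bytes.
    static byte[] toBytes(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // A UUID is made of two 64-bit halves, so it needs 16 bytes (128 bits).
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(2 * Long.BYTES)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
    }
}
```

Calling toBytes with a long yields an 8-byte array, while calling it with UUID.randomUUID(), which generates a v4 (random) UUID, yields a 16-byte array.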
So, instead of choosing a UUID and suffering from performance issues, there is a much better way.
Masquerading the JPA entity identifier
Since the numerical identifiers are fine from the database storage perspective, we don’t really need to switch to random column values just to make it harder for people to guess the record identifiers.
Instead, we can simply encrypt the row identifiers when we send them to the client and decrypt them back when the client sends back the values in a subsequent request.
For instance, let’s say that we want to extract the latest Post records that have been created, and since we might have a lot of entries, we use Keyset Pagination, as I explained in this article.
Since we are using Keyset Pagination, we are implementing the Top-N and Next-N data access methods in the following custom Repository:
    public class CustomPostRepositoryImpl implements CustomPostRepository {

        private final EntityManager entityManager;

        private final CriteriaBuilderFactory criteriaBuilderFactory;

        public CustomPostRepositoryImpl(
                EntityManager entityManager,
                CriteriaBuilderFactory criteriaBuilderFactory) {
            this.entityManager = entityManager;
            this.criteriaBuilderFactory = criteriaBuilderFactory;
        }

        @Override
        public PagedList<PostDTO> findTopN(Sort sortBy, int pageSize) {
            return sortedCriteriaBuilder(sortBy)
                .page(0, pageSize)
                .withKeysetExtraction(true)
                .getResultList();
        }

        @Override
        public PagedList<PostDTO> findNextN(
                Sort sortBy, PagedList<PostDTO> previousPage) {
            return sortedCriteriaBuilder(sortBy)
                .page(
                    previousPage.getKeysetPage(),
                    previousPage.getPage() * previousPage.getMaxResults(),
                    previousPage.getMaxResults()
                )
                .getResultList();
        }

        private CriteriaBuilder<PostDTO> sortedCriteriaBuilder(
                Sort sortBy) {
            CriteriaBuilder<Post> criteriaBuilder = criteriaBuilderFactory
                .create(entityManager, Post.class)
                .from(Post.class, "p");

            sortBy.forEach(order -> {
                criteriaBuilder.orderBy(
                    order.getProperty(),
                    order.isAscending()
                );
            });

            return criteriaBuilder.selectNew(PostDTO.class)
                .with("p.id")
                .with("p.title")
                .end();
        }
    }
Notice that neither the Top-N nor the Next-N method fetches the Post entity. Instead, they return a paginated list of PostDTO instances.
The PostDTO class looks as follows:
    public class PostDTO {

        private final String id;

        private final String title;

        public PostDTO(Long id, String title) {
            this.id = CryptoUtils.encrypt(id);
            this.title = title;
        }

        public String getId() {
            return id;
        }

        public String getTitle() {
            return title;
        }
    }
Notice that the type of the id is String and that it will store the encrypted value of the actual Post identifier. Because the entity identifier is encrypted, the client will no longer be able to guess its value or the value of any other identifier of the Post table records.
The CryptoUtils class defines the encrypt and decrypt methods, and if you’re curious about it, you can take a look at it on GitHub.
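If you just want a rough idea of what such a utility could look like, here is a minimal sketch based on the standard javax.crypto API. It is not the CryptoUtils class from the GitHub repository. The IdCipher name, the hard-coded demo key, and the choice of AES in ECB mode are all assumptions made for illustration. A deterministic mode is used so that the same identifier always maps to the same token, and the plaintext is a single 8-byte value, so the resulting token is a 24-character Base64 string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration; not the CryptoUtils class from the article.
final class IdCipher {

    // Assumption: a hard-coded 16-byte demo key. A real key belongs in secure configuration.
    private static final SecretKeySpec KEY = new SecretKeySpec(
        "0123456789abcdef".getBytes(StandardCharsets.UTF_8), "AES"
    );

    // Assumption: a deterministic mode, so a given id always produces the same token.
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";

    // Turns the numeric identifier into an opaque Base64 token.
    static String encrypt(long id) {
        byte[] plain = ByteBuffer.allocate(Long.BYTES).putLong(id).array();
        return Base64.getEncoder().encodeToString(apply(Cipher.ENCRYPT_MODE, plain));
    }

    // Recovers the numeric identifier from a token produced by encrypt.
    static long decrypt(String token) {
        byte[] plain = apply(Cipher.DECRYPT_MODE, Base64.getDecoder().decode(token));
        return ByteBuffer.wrap(plain).getLong();
    }

    private static byte[] apply(int mode, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, KEY);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            // Wrong block size or padding usually means the token was not issued by us.
            throw new IllegalArgumentException("Invalid identifier token", e);
        }
    }
}
```

With this sketch, IdCipher.decrypt(IdCipher.encrypt(50L)) returns 50, and two different identifiers produce two different tokens.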
Now, when fetching the first page of PostDTO entries, we can see that while the identifier values are encrypted for the external users, we are still able to decrypt their values when needed:
    PagedList<PostDTO> topPage = forumService.firstLatestPosts(PAGE_SIZE);

    List<String> topIds = topPage.stream()
        .map(PostDTO::getId)
        .toList();

    assertEquals(
        "3qEiB21WnB/yQ4muQe6cpw==",
        topIds.get(0)
    );
    assertEquals(
        Long.valueOf(50),
        CryptoUtils.decrypt(topIds.get(0), Long.class)
    );

    assertEquals(
        "9jfsI1A92KIzd34ZfRxgtQ==",
        topIds.get(1)
    );
    assertEquals(
        Long.valueOf(49),
        CryptoUtils.decrypt(topIds.get(1), Long.class)
    );
If the clients want to access the comments of a given post entry, they can call the following service method:
    public List<PostCommentDTO> findCommentsByPost(String postId) {
        return postRepository.findCommentsByPost(
            CryptoUtils.decrypt(postId, Long.class)
        );
    }
Notice that we are decrypting the Post identifier prior to calling the PostRepository method that extracts the list of PostCommentDTO entries.
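Since the identifier comes from the client, it may be malformed or tampered with. One possible way to handle that is sketched below, under the assumption that the decrypt call throws an unchecked exception for a value it cannot decrypt. The SafeIds class and its tryDecode method are hypothetical names, and the decryption function is passed in as a parameter so the sketch does not depend on any particular crypto utility.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper that turns a client-supplied token into an optional identifier.
final class SafeIds {

    // Returns an empty Optional when the token is missing or cannot be decrypted.
    // Assumption: the decryptor signals a bad token with an unchecked exception.
    static Optional<Long> tryDecode(String token, Function<String, Long> decryptor) {
        if (token == null || token.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(decryptor.apply(token));
        } catch (RuntimeException e) {
            return Optional.empty();
        }
    }
}
```

The service could then call SafeIds.tryDecode(postId, id -> CryptoUtils.decrypt(id, Long.class)) and treat an empty result the same way as a record that does not exist.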
The PostCommentDTO can also masquerade the actual PostComment identifier:
    public class PostCommentDTO {

        private final String id;

        private final String review;

        public PostCommentDTO(Long id, String review) {
            this.id = CryptoUtils.encrypt(id);
            this.review = review;
        }

        public String getId() {
            return id;
        }

        public String getReview() {
            return review;
        }
    }
And you can see that comment identifiers are encrypted as well prior to sending them back to the client:
    List<PostCommentDTO> comments = forumService.findCommentsByPost(
        firstPost.getId()
    );

    assertEquals(
        10,
        comments.size()
    );

    assertEquals(
        "ltAKs4jLw8N7q7SHeUR2Kw==",
        comments.get(0).getId()
    );
    assertEquals(
        Long.valueOf(1),
        CryptoUtils.decrypt(comments.get(0).getId(), Long.class)
    );
That’s it!
How about performance?
After first publishing this article, I noticed that some developers were worried about the performance implications of using cryptography.
So, the following test case measures the execution time of the encrypt and decrypt methods:
    MetricRegistry metricRegistry = new MetricRegistry();

    Slf4jReporter logReporter = Slf4jReporter
        .forRegistry(metricRegistry)
        .outputTo(LOGGER)
        .convertDurationsTo(TimeUnit.MICROSECONDS)
        .build();

    final Timer encryptTimer = metricRegistry.timer("encryptTimer");
    final Timer decryptTimer = metricRegistry.timer("decryptTimer");

    final ThreadLocalRandom random = ThreadLocalRandom.current();

    int MAX_COUNT = 100_000;

    // Warm-up: exercise both methods without recording any timings
    LongStream.rangeClosed(1, MAX_COUNT/10).forEach(i -> {
        Long value = random.nextLong();
        String encryptedValue = CryptoUtils.encrypt(value);
        Long decryptedValue = CryptoUtils.decrypt(encryptedValue, Long.class);
        assertEquals(
            value.longValue(),
            decryptedValue.longValue()
        );
    });

    // Measured run: record the duration of every encrypt and decrypt call
    LongStream.rangeClosed(1, MAX_COUNT).forEach(i -> {
        Long value = random.nextLong(i);

        long startNanos = System.nanoTime();
        String encryptedValue = CryptoUtils.encrypt(value);
        encryptTimer.update(
            (System.nanoTime() - startNanos),
            TimeUnit.NANOSECONDS
        );

        startNanos = System.nanoTime();
        Long decryptedValue = CryptoUtils.decrypt(encryptedValue, Long.class);
        decryptTimer.update(
            (System.nanoTime() - startNanos),
            TimeUnit.NANOSECONDS
        );

        assertEquals(
            value.longValue(),
            decryptedValue.longValue()
        );
    });

    logReporter.report();
When running the test case above, we get the following metrics on my 6-year-old notebook:
    name=encryptTimer, count=100000, min=5.2, max=47.6, mean=7.5871019652890395, stddev=3.9002472694863175, median=6.3, p75=7.5, p95=13.200000000000001, p98=22.6, p99=25.900000000000002, p999=39.4, duration_unit=microseconds
    name=decryptTimer, count=100000, min=5.4, max=34.9, mean=7.9174440849941865, stddev=3.770417665194349, median=6.5, p75=8.1, p95=14.5, p98=22.5, p99=25.3, p999=33.0, duration_unit=microseconds
The average execution time is about 8 μs, and the 99th percentile is about 25 μs.
Now, considering that we will apply this technique in a typical OLTP application that’s bound to fetch a very small number of records, the overall overhead of using encryption is not going to be a deal breaker. For instance, assuming a page of 25 records, encrypting every identifier at the 8 μs average adds about 200 μs to the request.
However, you can easily lower this overhead by using an LRU cache.
For instance, when using an LRU cache with a 90.9% cache hit ratio, we get the following metrics:
    name=encryptTimer, count=100000, min=0.1, max=33.4, mean=1.91390151098664, stddev=3.392790242522042, median=0.8, p75=1.4000000000000001, p95=10.0, p98=14.5, p99=17.2, p999=21.400000000000002, duration_unit=microseconds
    name=decryptTimer, count=100000, min=0.1, max=44.4, mean=1.9999605664739044, stddev=4.037352976657196, median=0.7000000000000001, p75=1.2, p95=10.5, p98=14.6, p99=17.8, p999=38.1, duration_unit=microseconds
Notice that the average duration of the encrypt and decrypt methods dropped to 2 μs. So, although the cryptographic methods are rather fast, using an LRU cache will help you lower their overhead to a couple of microseconds.
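The article does not show the cache that was used for this measurement, so the following is only a minimal sketch of one way to build an LRU cache with the JDK alone, using a LinkedHashMap in access order. The LruCache class name, its get and contains methods, and the capacity value are assumptions for illustration. The hit ratio you get in practice depends on the cache size and on how your identifiers are accessed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache built on top of an access-ordered LinkedHashMap.
final class LruCache<K, V> {

    private final Map<K, V> entries;

    LruCache(final int capacity) {
        // accessOrder = true moves an entry to the end every time it is read,
        // so the eldest entry is always the least recently used one.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached value or computes, stores, and returns it on a cache miss.
    synchronized V get(K key, Function<K, V> loader) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    // Checks for a key without changing the access order.
    synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }
}
```

An encryption cache could then be used as cache.get(id, value -> CryptoUtils.encrypt(value)), with a second instance keyed by the encrypted String for the decrypt direction.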
Conclusion
Encrypting and decrypting the entity identifier is a very straightforward technique that allows us to hide the underlying numerical value.
By using this strategy, we can enjoy the advantages of using numerical identifiers on the database side while also making sure that users cannot guess the identifiers of records they are not supposed to access.

Great article!
But I have a question: how do we create our ENCRYPT_KEY_BYTES?
That’s the encryption key. You can choose your own key and transform it to bytes.
How about using ULID instead of UUID?
ULID is just as huge as UUID. TSID is half the size of ULID and the probability of conflict can be very low.
Excellent article. We have struggled with the issue of hiding the primary key from users for years. Such a simple solution. Thank You
You’re welcome.