To get the most out of the relational database in use, you need to make sure the data access layer resonates with the underlying database system.
In this article, we are going to see what you can do to boost performance when using PostgreSQL with JPA and Hibernate.
The first thing you need to take into consideration is how PostgreSQL works behind the scenes. Knowing its inner workings can make a difference between an application that barely crawls and one which runs at warp speed.
Behind the scenes, PostgreSQL uses MVCC (MultiVersion Concurrency Control) to manage concurrent access to table rows and indexes. For instance, the UPDATE operation consists of soft-deleting the previous record and inserting a new one as illustrated by the following diagram:
Having multiple versions of the same tuple allows other transactions to see the previous record version until the currently modifying transaction manages to commit. Rollback is a rather cheap operation since it's just a matter of discarding the uncommitted records.
Thanks to MVCC, readers don’t block writers and writers don’t block readers. For more details, check out this article.
Like any relational database system, PostgreSQL is designed to minimize disk access as much as possible.
Traditionally, database systems use two logs to mark transaction modifications:

- the undo log, which stores the previous version of the modified records so that uncommitted changes can be rolled back
- the redo log, which persists the committed changes so that they can be replayed in case of a crash

Because MVCC keeps the previous row versions in the table structure itself, PostgreSQL does not need a separate undo log, while the redo log role is played by the WAL (Write-Ahead Log).
For both tables and indexes, PostgreSQL loads data in pages of 8KB which is the minimum unit of reading and writing data. PostgreSQL uses the shared_buffer to cache table and index pages that are read and modified. During a checkpoint, the dirty pages found in the shared_buffer are flushed to disk.
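As a rough illustration, these are the kind of postgresql.conf settings that control this behavior; the values below are placeholders rather than recommendations, and should be tuned to your hardware and workload:

```
# postgresql.conf (illustrative values only)
shared_buffers = 2GB           # cache for table and index pages
checkpoint_timeout = 15min     # maximum time between automatic checkpoints
effective_cache_size = 6GB     # planner hint: shared_buffers plus the OS cache
```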
Aside from using the shared_buffer which is meant to store the working data set, PostgreSQL relies on the OS cache for speeding up reads and writes of pages that are not stored in the shared_buffer.
PostgreSQL has many advanced features that you should definitely take advantage of. Just because you are using JPA and Hibernate, it does not mean you have to restrict all your database interactions to JPQL or Criteria API queries.
Otherwise, why do you think the EntityManager allows you to run native SQL queries? Native queries let you use PostgreSQL-specific features, such as Window Functions, that JPQL cannot express. For more details about Window Functions, check out this article.
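For instance, here's a minimal sketch of running a Window Function via a native query from the EntityManager; the table and column names are hypothetical:

```java
// Hypothetical example: rank each author's posts by recency using a
// Window Function, something JPQL cannot express but a native query can.
List<Object[]> rows = entityManager.createNativeQuery(
    "SELECT title, author_id, " +
    "       ROW_NUMBER() OVER (" +
    "           PARTITION BY author_id ORDER BY created_on DESC" +
    "       ) AS rn " +
    "FROM post")
.getResultList();
```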
Using JPA and Hibernate for writing data is very convenient, especially for typical OLTP operations. However, when processing large volumes of data, it is much more practical to do the work in the database itself.
For this reason, using stored procedures (or functions as PostgreSQL calls them) allows you to avoid moving large volumes of data out of the database just to process them in the data access layer. For more details about using stored procedures with JPA and Hibernate, check out this article.
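As a sketch, calling a hypothetical PostgreSQL function that archives old records server-side could look like this with the JPA StoredProcedureQuery API (the function name and parameter are assumptions for illustration):

```java
// Hypothetical example: archive_old_posts is assumed to exist in the
// database and to process the matching rows without moving them
// into the application layer.
StoredProcedureQuery query = entityManager
    .createStoredProcedureQuery("archive_old_posts");
query.registerStoredProcedureParameter("days", Integer.class, ParameterMode.IN);
query.setParameter("days", 30);
query.execute();
```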
If you are using PostgreSQL in production and all your JPA and Hibernate code is tested on an in-memory database like H2 or HSQLDB, then you are doing it all wrong. First of all, you won’t be able to test Window Functions or other PostgreSQL-specific features. Second, just because all tests run on the in-memory DB, it does not guarantee that the same code will run just fine on the PostgreSQL DB you run in production.
The only reason developers choose an in-memory database for running integration tests is the associated speed of avoiding disk access. However, you can run integration tests on PostgreSQL almost as fast as on an in-memory database. All you have to do is map the data folder on tmpfs. For more details, check out this article.
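For example, assuming you run PostgreSQL in Docker for your integration tests, mapping the data directory to tmpfs is a one-liner; the container name, password, and version tag below are just placeholders:

```
docker run --name pg-test \
  --tmpfs /var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=test \
  -d postgres:13
```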
In a relational database, it’s best to store data according to the principles of the relational model.
However, you might also need to store schema-less data (e.g., unstructured logs) or EAV (Entity-Attribute-Value) structures, in which case a JSON column can help you deal with such a requirement.
So, just because you are using a RDBMS, it does not mean you cannot take advantage of a flexible data storage when the application requirements demand it.
For more details about how to use JSON with JPA and Hibernate, check out this article.
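As an illustration, a JSON attribute might be mapped like this using the JsonType from the hibernate-types (Hypersistence Utils) library; the entity is hypothetical, and the exact annotation syntax depends on your Hibernate version:

```java
// Hypothetical entity; assumes the hibernate-types library is on the classpath.
@Entity
@Table(name = "book")
public class Book {

    @Id
    private Long id;

    // Stored as a PostgreSQL jsonb column, mapped to a String for simplicity
    @Type(JsonType.class)
    @Column(columnDefinition = "jsonb")
    private String properties;
}
```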
When using PostgreSQL, it's tempting to use a SERIAL or BIGSERIAL column type to auto-increment Primary Keys. However, when using JPA and Hibernate, this is not recommended, because the IDENTITY generator it implies prevents Hibernate from batching INSERT statements.
For more details, check out this article.
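Instead, a SEQUENCE-based identifier allows Hibernate to batch INSERT statements. A minimal sketch, with a hypothetical entity:

```java
// The SEQUENCE strategy lets Hibernate allocate identifiers ahead of time,
// so INSERT statements can be batched, unlike with IDENTITY (SERIAL/BIGSERIAL).
@Entity
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String title;
}
```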
For writing data, JDBC batching can help you reduce transaction response time. When using Hibernate, enabling batching is just a matter of setting one configuration property.
Moreover, the PostgreSQL JDBC Driver offers the reWriteBatchedInserts configuration property, which can reduce execution costs even further by rewriting a batch of INSERT statements into a single multi-value INSERT.
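Assuming the standard Hibernate property names, the configuration could look like this (the batch size value is illustrative, and the JDBC URL is a placeholder):

```
# Hibernate JDBC batching
hibernate.jdbc.batch_size=25
hibernate.order_inserts=true
hibernate.order_updates=true

# PostgreSQL JDBC Driver: rewrite batched INSERTs into multi-value statements
jdbc.url=jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true
```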
Although optimistic locking scales better, there are certain concurrency control scenarios which call for a pessimistic locking approach.
PostgreSQL provides advisory locks which can be used as building blocks for implementing various concurrency control schemes that go beyond controlling the data stored in the relational database.
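For instance, a transaction-scoped advisory lock can be acquired with a native query; the lock key is an arbitrary application-defined number (1234 here is just a placeholder):

```java
// Acquire a transaction-level advisory lock; PostgreSQL releases it
// automatically when the current transaction commits or rolls back.
entityManager.createNativeQuery("SELECT pg_advisory_xact_lock(:key)")
    .setParameter("key", 1234L)
    .getSingleResult();
```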
As you can see, there are many things to keep in mind when using PostgreSQL with JPA and Hibernate. Since PostgreSQL is one of the most advanced open-source RDBMS, it’s very useful to know all these tips and adjust your data access layer to get the most out of it.