High-Performance Java Persistence – Part One

The journey

Four months, one week, two days, and 114 pages: that’s what it took to write the first part of the High-Performance Java Persistence book.

As previously stated, the book is developed in an Agile fashion. Each part represents a milestone, which is accompanied by a release. This way, the readers can get access to the book content prior to finishing the whole book (which might take a year or so).

Table of contents

Before explaining what this first part is all about, it’s better to take a look at its table of contents:

1. Preface
1.1 The database server and the connectivity layer
1.2 The application data access layer
1.2.1 The ORM framework
1.2.2 The native query builder framework
2. Performance and Scaling
2.1 Response time and Throughput
2.2 Database connections boundaries
2.3 Scaling up and scaling out
2.3.1 Master-Slave replication
2.3.2 Multi-Master replication
2.3.3 Sharding
3. JDBC Connection Management
3.1 DriverManager
3.2 DataSource
3.2.1 Why is pooling so much faster?
3.3 Queuing theory capacity planning
3.4 Practical database connection provisioning
3.4.1 A real-life connection pool monitoring example
3.4.1.1 Concurrent connection request count metric
3.4.1.2 Concurrent connection count metric
3.4.1.3 Maximum pool size metric
3.4.1.4 Connection acquisition time metric
3.4.1.5 Retry attempts metric
3.4.1.6 Overall connection acquisition time metric
3.4.1.7 Connection lease time metric
4. Batch Updates
4.1 Batching Statements
4.2 Batching PreparedStatements
4.2.1 Choosing the right batch size
4.2.2 Bulk operations
4.3 Retrieving auto-generated keys
4.3.1 Sequences to the rescue
5. Statement Caching
5.1 Statement lifecycle
5.1.1 Parser
5.1.2 Optimizer
5.1.2.1 Execution plan visualization
5.1.3 Executor
5.2 Caching performance gain
5.3 Server-side statement caching
5.3.1 Bind-sensitive execution plans
5.4 Client-side statement caching
6. ResultSet Fetching
6.1 ResultSet scrollability
6.2 ResultSet changeability
6.3 ResultSet holdability
6.4 Fetching size
6.5 ResultSet size
6.5.1 Too many rows
6.5.1.1 SQL limit clause
6.5.1.2 JDBC max rows
6.5.1.3 Less is more
6.5.2 Too many columns
7. Transactions
7.1 Atomicity
7.2 Consistency
7.3 Isolation
7.3.1 Concurrency control
7.3.1.1 Two-phase locking
7.3.1.2 Multi-Version Concurrency Control
7.3.2 Phenomena
7.3.2.1 Dirty write
7.3.2.2 Dirty read
7.3.2.3 Non-repeatable read
7.3.2.4 Phantom read
7.3.2.5 Read skew
7.3.2.6 Write skew
7.3.2.7 Lost update
7.3.3 Isolation levels
7.3.3.1 Read Uncommitted
7.3.3.2 Read Committed
7.3.3.3 Repeatable Read
7.3.3.4 Serializable
7.4 Durability
7.5 Read-only transactions
7.5.1 Read-only transaction routing
7.6 Transaction boundaries
7.6.1 Distributed transactions
7.6.1.1 Two-phase commit
7.6.2 Declarative transactions
7.7 Application-level transactions
7.7.1 Pessimistic and optimistic locking
7.7.1.1 Pessimistic locking
7.7.1.2 Optimistic locking

The first part is about closing the gap between an application developer and a database administrator. The book focuses on data access and, for this purpose, explains the inner workings of both the database engine and the JDBC drivers of the four most common relational databases (Oracle, SQL Server, MySQL, and PostgreSQL).

I explain what performance and scalability mean and how response time and throughput relate to each other.
Being a big fan of Neil J. Gunther, I couldn’t resist writing about the Universal Scalability Law and how this equation relates capacity to contention and coherency.
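
To make the equation concrete, here is a minimal sketch of the Universal Scalability Law as it is usually written, C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α is the contention coefficient and β is the coherency coefficient. The class name, method names, and the coefficient values used afterwards are made up for illustration and are not taken from the book.

```java
// Hypothetical helper illustrating the Universal Scalability Law.
final class UniversalScalabilityLaw {

    // Relative capacity C(N) at concurrency level n.
    // alpha = contention (serialized work), beta = coherency (crosstalk cost).
    static double relativeCapacity(int n, double alpha, double beta) {
        return n / (1 + alpha * (n - 1) + beta * n * (n - 1));
    }

    // Concurrency level where capacity peaks: N* = sqrt((1 - alpha) / beta).
    // Only meaningful when beta > 0.
    static double peakConcurrency(double alpha, double beta) {
        return Math.sqrt((1 - alpha) / beta);
    }
}
```

With the assumed coefficients α = 0.05 and β = 0.001, `peakConcurrency` returns roughly 31, meaning that adding more concurrent clients beyond that point would reduce capacity rather than increase it.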

From hardware to distributed systems, queues are everywhere, and queuing theory provides an invaluable equation for understanding how queues affect throughput.
Connection management is one area where queuing plays a very important role, and monitoring connection usage is of paramount importance to providing responsive and scalable services.
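
One queuing theory equation that is commonly applied to connection provisioning is Little's Law, L = λ × W: the average number of requests in the system equals the arrival rate multiplied by the average time each request spends in it. The sketch below assumes this is the kind of equation meant here; the class and method names are hypothetical.

```java
// Hypothetical helper applying Little's Law (L = lambda * W) to a connection pool.
final class LittlesLaw {

    // Average number of connections in use, given the arrival rate of
    // connection requests and the average time a connection is held.
    static double averageConnectionsInUse(double requestsPerSecond, double leaseTimeSeconds) {
        return requestsPerSecond * leaseTimeSeconds;
    }

    // Smallest whole pool size that covers the average demand.
    // Real pools need extra headroom for traffic spikes.
    static int minimumPoolSize(double requestsPerSecond, double leaseTimeSeconds) {
        return (int) Math.ceil(averageConnectionsInUse(requestsPerSecond, leaseTimeSeconds));
    }
}
```

For example, 80 connection requests per second with an average lease time of 125 ms keep about 10 connections busy on average, so a pool smaller than that would force requests to queue.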

Like any other client-server communication, the data access layer can benefit from batching requests. Batching support, like many other database-related topics, is very driver-specific. For this reason, I explain how you can leverage batching depending on the database system in use.
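
As a rough illustration of PreparedStatement batching with the standard JDBC API, the sketch below queues inserts with `addBatch` and flushes them with `executeBatch` every `batchSize` rows. The `post` table, its `title` column, and the class and method names are assumptions made for this example. How a batch is actually sent over the wire depends on the driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch of JDBC PreparedStatement batching.
final class BatchInsertSketch {

    // Inserts all titles, flushing every batchSize rows.
    // Returns how many times executeBatch was called.
    static int insertTitles(Connection connection, List<String> titles, int batchSize)
            throws SQLException {
        int executedBatches = 0;
        // Table and column names are made up for illustration.
        String sql = "INSERT INTO post (title) VALUES (?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            int pending = 0;
            for (String title : titles) {
                statement.setString(1, title);
                statement.addBatch();          // queue the row instead of executing it
                if (++pending == batchSize) {
                    statement.executeBatch();  // send the queued rows together
                    executedBatches++;
                    pending = 0;
                }
            }
            if (pending > 0) {                 // flush the last, partially filled batch
                statement.executeBatch();
                executedBatches++;
            }
        }
        return executedBatches;
    }
}
```

Inserting five titles with a batch size of two results in three `executeBatch` calls instead of five separate statement executions.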

Statement caching is very important for high-performance enterprise applications, both on the server side and the client side. This book explains how statement caching is implemented in the most common RDBMS and how you can activate this optimization using the JDBC API.
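
How statement caching is activated differs from one driver to another. As one hedged example, assuming MySQL Connector/J is the driver in use, caching can be switched on through connection properties such as `cachePrepStmts`, `prepStmtCacheSize`, `prepStmtCacheSqlLimit`, and `useServerPrepStmts`. The class name, method name, and the SQL length limit below are illustrative choices, not recommendations from the book.

```java
import java.util.Properties;

// Hypothetical sketch: connection properties that enable prepared statement
// caching, assuming the MySQL Connector/J driver.
final class StatementCachingConfig {

    static Properties mySqlStatementCaching(int cacheSize) {
        Properties properties = new Properties();
        // Use server-side prepared statements.
        properties.setProperty("useServerPrepStmts", "true");
        // Cache prepared statements on the client, per connection.
        properties.setProperty("cachePrepStmts", "true");
        // Maximum number of statements kept in the cache.
        properties.setProperty("prepStmtCacheSize", String.valueOf(cacheSize));
        // Longest SQL text that is still eligible for caching (arbitrary value).
        properties.setProperty("prepStmtCacheSqlLimit", "2048");
        return properties;
    }
}
```

The resulting `Properties` object can be passed to `DriverManager.getConnection(url, properties)` or translated into the equivalent settings of a connection pool. Other drivers expose different settings, so the driver documentation should be checked.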

A good data fetch plan can make the difference between a high-performance data access layer and one that barely crawls. For this reason, I explain how the fetch size and the result set size affect transaction performance.
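
The two knobs mentioned above map to standard JDBC methods: `setFetchSize` hints at how many rows the driver should retrieve per database round trip, while `setMaxRows` caps the total number of rows in the result set. The following sketch assumes a query whose first column is a string; the class and method names are hypothetical, and drivers are free to interpret the fetch size hint differently.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of controlling fetch size and result set size via JDBC.
final class FetchSizeSketch {

    // Reads the first column of every returned row.
    static List<String> fetchFirstColumn(Connection connection, String sql,
                                         int fetchSize, int maxRows) throws SQLException {
        List<String> values = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setFetchSize(fetchSize); // hint: rows fetched per round trip
            statement.setMaxRows(maxRows);     // hard cap on the result set size
            try (ResultSet resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    values.add(resultSet.getString(1));
                }
            }
        }
        return values;
    }
}
```

Limiting rows in the SQL query itself usually lets the database plan for the smaller result, so `setMaxRows` is best seen as a safety net rather than a replacement for a limit clause.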

Transactions are a very complex topic. This chapter goes beyond the SQL standard phenomena and isolation levels, and it explains all possible non-serializable data anomalies and various concurrency control mechanisms. Transactions are important not just for ensuring data integrity, but for accessing data efficiently too.
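
At the JDBC level, the isolation level and the transaction boundaries are both controlled through the `Connection` object. The sketch below is a minimal illustration using only standard JDBC calls; the `TransactionSketch` class and the `SqlWork` callback interface are hypothetical names introduced for this example.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical sketch of explicit JDBC transaction boundaries.
final class TransactionSketch {

    // Callback holding the statements to run inside the transaction.
    @FunctionalInterface
    interface SqlWork {
        void execute(Connection connection) throws SQLException;
    }

    // Runs the work in one transaction at the given isolation level,
    // e.g. Connection.TRANSACTION_READ_COMMITTED.
    static void inTransaction(Connection connection, int isolationLevel, SqlWork work)
            throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        // Set the isolation level before the transaction starts.
        connection.setTransactionIsolation(isolationLevel);
        connection.setAutoCommit(false);
        try {
            work.execute(connection);
            connection.commit();               // make all changes visible atomically
        } catch (SQLException | RuntimeException e) {
            connection.rollback();             // undo partial changes on failure
            throw e;
        } finally {
            // The isolation level is left as set; a pooled connection
            // may need it reset before being reused.
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A caller could pass `Connection.TRANSACTION_SERIALIZABLE` together with a lambda that issues its statements. Which anomalies a given level actually prevents depends on the database engine.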

Sample chapter

There is also a sample chapter, which you can read for free to get a feeling of what this book can offer you. The sample chapter can either be read online or downloaded as PDF, mobi, or epub (just like the actual book).

Enjoy reading it and let me know what you think.
