A beginner’s guide to the Phantom Read anomaly, and how it differs between 2PL and MVCC

Introduction Unlike SQL Server which, by default, relies on the 2PL (Two-Phase Locking) to implement the SQL standard isolation levels, Oracle, PostgreSQL, and MySQL InnoDB engine use MVCC (Multi-Version Concurrency Control). However, providing a truly Serializable isolation level on top of MVCC is really difficult, and, in this post, I’ll demonstrate that it’s very difficult to prevent the Phantom Read anomaly without resorting to pessimistic locking.

A beginner’s guide to SQL injection and how you should prevent it

Introduction One of my readers asked me to answer the following StackOverflow question. Right from the start, I noticed that the entity query was constructed by concatenating strings, which can expose your application to SQL Injection attacks. Unfortunately, I’ve been seeing this problem over and over throughout my career, and not all developers are aware of the serious risk associated with SQL Injection. For this reason, this post is going to demonstrate what damage can SQL Injection do to your system.

The best way to log JDBC statements

Introduction Whenever you are using a data access framework that auto-generates statements on your behalf, it is mandatory to log all statements to ensure their effectiveness, as well as to assert the possible performance implications.

A beginner’s guide to read and write skew phenomena

Introduction In my article about ACID and database transactions, I introduced the three phenomena described by the SQL standard: dirty read non-repeatable read phantom read While these are good to differentiate the four isolation levels (Read Uncommitted, Read Committed, Repeatable Read and Serializable), in reality, there are more phenomena to take into consideration as well. The 1995 paper (A Critique of ANSI SQL Isolation Levels) introduces the other phenomena that are omitted from the standard specification. In my High-Performance Java Persistence book, I decided to insist on the Transaction chapter as it… Read More

How to prevent lost updates in long conversations

Introduction All database statements are executed within the context of a physical transaction, even when we don’t explicitly declare transaction boundaries (BEGIN/COMMIT/ROLLBACK). Data integrity is enforced by the ACID properties of database transactions. Logical vs Physical transactions A logical transaction is an application-level unit of work that may span over multiple physical (database) transactions. Holding the database connection open throughout several user requests, including user think time, is definitely an anti-pattern. A database server can accommodate a limited number of physical connections, and often those are reused by using connection pooling. Holding… Read More

A beginner’s guide to database locking and the lost update phenomena

Introduction A database is highly concurrent system. There’s always a chance of update conflicts, like when two concurring transactions try to update the same record. If there would be only one database transaction at any time then all operations would be executed sequentially. The challenge comes when multiple transactions try to update the same database rows as we still have to ensure consistent data state transitions. The SQL standard defines three consistency anomalies (phenomena): Dirty reads, prevented by Read Committed, Repeatable Read and Serializable isolation levels Non-repeatable reads, prevented by Repeatable Read… Read More

A beginner’s guide to natural and surrogate database keys

Types of primary keys All database tables must have one primary key column. The primary key uniquely identifies a row within a table therefore it’s bound by the following constraints: UNIQUE NOT NULL IMMUTABLE When choosing a primary key we must take into consideration the following aspects: the primary key may be used for joining other tables through a foreign key relationship the primary key usually has an associated default index, so the more compact the data type the less space the index will take the primary key assignment must ensure uniqueness… Read More

The data knowledge stack

Concurrency is not for the faint-hearted We all know concurrency programming is difficult to get it right. That’s why threading tasks are followed by extensive design and code review sessions. You never assign concurrent issues to inexperienced developers. The problem space is carefully analyzed, a design emerges and the solution is both documented and reviewed. That’s how threading related tasks are usually addressed. You will naturally choose a higher level abstraction since you don’t want to get tangled up in low-level details. That’s why the java.util.concurrent is usually better (unless you build… Read More

Time to break free from the SQL-92 mindset

Are you stuck in the 90s? If you are only using the SQL-92 language reference, then you are overlooking so many great features like: Window Functions PIVOT MERGE INSTEAD OF triggers Some test data In my previous article I imported some CSV Dropwizard metrics into PostgreSQL for further analysis.

How to import CSV data into PostgreSQL

Introduction Many database servers support CSV data transfers and this post will show one way you can import CSV files to PostgreSQL. SQL aggregation rocks! My previous post demonstrated FlexyPool metrics capabilities and all connection related statistics were exported in CSV format. When it comes to aggregation tabular data SQL is at its best. If your database engine supports SQL:2003 windows functions you should definitely make use of this great feature.