The anatomy of Connection Pooling

Introduction

All projects I’ve been working on have used database connection pooling and that’s for very good reasons. Sometimes we might forget why we are employing one design pattern or a particular technology, so it’s worth stepping back and reason on it. Every technology or technological decision has both upsides and downsides, and if you can’t see any drawback you need to wonder what you are missing.

The database connection life-cycle

Every database read or write operation requires a connection. So let’s see how database connection flow looks like:

Image

The flow goes like this:

  1. The application data layer ask the DataSource for a database connection
  2. The DataSource will use the database Driver to open a database connection
  3. A database connection is created and a TCP socket is opened
  4. The application reads/writes to the database
  5. The connection is no longer required so it is closed
  6. The socket is closed

You can easily deduce that opening/closing connections is quite an expensive operation. PostgreSQL uses a separate OS process for every client connection, so a high rate of opening/closing connections is going to put a strain on your database management system.

The most obvious reasons for reusing a database connection would be:

  • reducing the application and database management system OS I/O overhead for creating/destroying a TCP connection
  • reducing JVM object garbage

Pooling vs No Pooling

Let’s compare how a no pooling solution compares to HikariCP which is probably the fastest connection pooling framework available.

The test will open and close 1000 connections.

private static final Logger LOGGER = LoggerFactory.getLogger(DataSourceConnectionTest.class);

private static final int MAX_ITERATIONS = 1000;

private Slf4jReporter logReporter;

private Timer timer;

protected abstract DataSource getDataSource();

@Before
public void init() {
	MetricRegistry metricRegistry = new MetricRegistry();
	this.logReporter = Slf4jReporter
			.forRegistry(metricRegistry)
			.outputTo(LOGGER)
			.build();
	timer = metricRegistry.timer("connection");
}

@Test
public void testOpenCloseConnections() throws SQLException {
	for (int i = 0; i < MAX_ITERATIONS; i++) {
		Timer.Context context = timer.time();
		getDataSource().getConnection().close();
		context.stop();
	}
	logReporter.report();
}

The chart displays the time spent during opening and closing connections so lower is better.

NoPoolingVsConnectionPooling

The connection pooling is 600 times faster than the no pooling alternative. Our enterprise system consists of tens of applications and just one batch processor system could issue more than 2 million database connections per hour, so a 2 orders of magnitude optimization is worth considering.

Type No Pooling Time (milliseconds) Connection Pooling Time (milliseconds)
min 74.551414 0.002633
max 146.69324 125.528047
mean 78.216549 0.128900
stddev 5.9438335 3.969438
median 76.150440 0.003218

Why is pooling so much faster?

To understand why the pooling solution performed so well, we need to analyse the pooling connection management flow:

PoolingConnectionLifeCycle

Whenever a connection is requested, the pooling data source will use the available connections pool to acquire a new connection. The pool will only create new connections when there are no available ones left and the pool hasn’t yet reached its maximum size. The pooling connection close() method is going to return the connection to the pool, instead of actually closing it.

ConnectionAcquireRequestStates

Faster and safer

The connection pool acts as a bounded buffer for the incoming connection requests. If there is a traffic spike the connection pool will level it instead of saturating all available database resources.

The waiting step and the timeout mechanism are safety hooks, preventing excessive database server load. If one application gets way too much database traffic, the connection pool is going to mitigate it therefore preventing it from taking down the database server (hence affecting the whole enterprise system).

With great power comes great responsibility

All these benefits come at a price, materialized in the extra complexity of the pool configuration (especially in large enterprise systems). So this is no silver-bullet and you need to pay attention to many pool settings such as:

  • minimum size
  • maximum size
  • max idle time
  • acquire timeout
  • timeout retry attempts

My next article will dig into enterprise connection pooling challenges and how FlexyPool can assist you finding the right pool sizes.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

About these ads

11 thoughts on “The anatomy of Connection Pooling

  1. One problem you can encounter with connection pooling (depending on the RDBMS you use) is that local settings (ie. SET foo = bar) can affect queries which come after that.

    Again this depends on the RDBMS you use (some support such settings to be transaction scoped, some have them connection scoped *cogh* MySQL *cough*) and if you use such settings in your queries.

  2. Great post, explaining the fundamentals is valuable.
    Found a typo: “The waiting step and the timeout mechanism are safety hooks, preventing excessive database server.” Also, the code example has “timer.time();”. But anyways, looking forward to the Flexy Pool post.

  3. Pingback: The Baeldung Weekly Review 15

    • That’s really intetesting but you still want to limit the number of connections per application. If your enterprise system has web apps, business schedulers, data import apps, internal web services, you don’t want one app to steal all available db connections during a spike. That would lead to connection starvation for the other apps.

  4. Hi Vlad,

    Thanks for the article. I am confused in one of the statement “The pooling connection close() method is going to return the connection to the pool, instead of actually closing it” compared to the image above it. Can you please explain it further. In the image it shows that when Application releases the connection it releases the connection in Database also ?

    • There are two sequence diagrams, one for no-pooling and the other for the connection pool case. The connection pooling diagram, which you can find under “Why is pooling so much faster?”, shows how the connection goes back to the pool when being closed.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 472 other followers