The anatomy of Connection Pooling

Introduction

All projects I’ve been working on have used database connection pooling, and for very good reasons. Sometimes we forget why we are employing one design pattern or a particular technology, so it’s worth stepping back and reasoning about it. Every technology or technological decision has both upsides and downsides, and if you can’t see any drawback, you need to wonder what you are missing.

The database connection life-cycle

Every database read or write operation requires a connection. So let’s see what the database connection flow looks like:

[Figure: the database connection life-cycle]

The flow goes like this:

  1. The application data layer asks the DataSource for a database connection
  2. The DataSource uses the database Driver to open a database connection
  3. A database connection is created and a TCP socket is opened
  4. The application reads/writes to the database
  5. The connection is no longer required, so it is closed
  6. The socket is closed

You can easily deduce that opening/closing connections is quite an expensive operation. PostgreSQL uses a separate OS process for every client connection, so a high rate of opening/closing connections is going to put a strain on your database management system.

The most obvious reasons for reusing a database connection would be:

  • reducing the OS I/O overhead of creating and destroying TCP connections, on both the application and the database server side
  • reducing JVM object garbage

Pooling vs No Pooling

Let’s compare a no-pooling solution to HikariCP, which is probably the fastest connection pooling framework available.

The test will open and close 1000 connections.

private static final Logger LOGGER = LoggerFactory.getLogger(DataSourceConnectionTest.class);

private static final int MAX_ITERATIONS = 1000;

private Slf4jReporter logReporter;

private Timer timer;

protected abstract DataSource getDataSource();

@Before
public void init() {
	MetricRegistry metricRegistry = new MetricRegistry();
	this.logReporter = Slf4jReporter
			.forRegistry(metricRegistry)
			.outputTo(LOGGER)
			.build();
	timer = metricRegistry.timer("connection");
}

@Test
public void testOpenCloseConnections() throws SQLException {
	for (int i = 0; i < MAX_ITERATIONS; i++) {
		Timer.Context context = timer.time();
		getDataSource().getConnection().close();
		context.stop();
	}
	logReporter.report();
}
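
For reference, the abstract getDataSource() method is implemented by each test variant. The following is a minimal sketch of the two implementations, not the article’s exact test classes: the JDBC URL and credentials are placeholders, the no-pooling variant uses Spring’s DriverManagerDataSource (which opens a new physical connection on every request), and the pooling variant uses HikariCP’s HikariConfig/HikariDataSource API.

import javax.sql.DataSource;

import org.springframework.jdbc.datasource.DriverManagerDataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// No pooling: every getConnection() call opens a brand new physical connection.
public class NoPoolingConnectionTest extends DataSourceConnectionTest {

	@Override
	protected DataSource getDataSource() {
		return new DriverManagerDataSource(
			"jdbc:postgresql://localhost:5432/test", "postgres", "admin");
	}
}

// Pooling: HikariCP hands out logical connections from a pre-created pool.
// The pool is created only once, otherwise every iteration would pay the
// pool bootstrap cost and the comparison would be meaningless.
public class HikariCPConnectionTest extends DataSourceConnectionTest {

	private DataSource dataSource;

	@Override
	protected DataSource getDataSource() {
		if (dataSource == null) {
			HikariConfig config = new HikariConfig();
			config.setJdbcUrl("jdbc:postgresql://localhost:5432/test");
			config.setUsername("postgres");
			config.setPassword("admin");
			config.setMaximumPoolSize(1);
			dataSource = new HikariDataSource(config);
		}
		return dataSource;
	}
}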

The chart displays the time spent opening and closing connections, so lower is better.

[Figure: No Pooling vs. Connection Pooling time chart]

Connection pooling turns out to be roughly 600 times faster (comparing mean times) than the no-pooling alternative. Our enterprise system consists of tens of applications, and just one batch processor system could issue more than 2 million database connections per hour, so a two-orders-of-magnitude optimization is worth considering.

Metric    No Pooling Time (ms)    Connection Pooling Time (ms)
min       74.551414               0.002633
max       146.69324               125.528047
mean      78.216549               0.128900
stddev    5.9438335               3.969438
median    76.150440               0.003218

Why is pooling so much faster?

To understand why the pooling solution performed so well, we need to analyse the pooling connection management flow:

[Figure: pooling connection life-cycle]

Whenever a connection is requested, the pooling DataSource first tries to acquire one from the pool of available connections. The pool only creates new connections when there are no available ones left and it hasn’t yet reached its maximum size. The pooled connection’s close() method returns the connection to the pool, instead of actually closing it.
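
To make this flow concrete, here is a deliberately simplified pool sketch, for illustration only (this is not how HikariCP is implemented): connections are handed out from a queue, new ones are created lazily up to the maximum size, and an exhausted pool makes the caller wait, up to an acquire timeout. A real pooling DataSource additionally wraps the physical connection in a proxy whose close() method calls release() instead of closing the TCP socket.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// A deliberately simplified connection pool, for illustration only.
public class SimpleConnectionPool {

	private final String url;
	private final int maxSize;
	private final long acquireTimeoutMillis;

	private final BlockingQueue<Connection> available;
	private final AtomicInteger createdCount = new AtomicInteger();

	public SimpleConnectionPool(String url, int maxSize, long acquireTimeoutMillis) {
		this.url = url;
		this.maxSize = maxSize;
		this.acquireTimeoutMillis = acquireTimeoutMillis;
		this.available = new ArrayBlockingQueue<Connection>(maxSize);
	}

	public Connection acquire() throws SQLException, InterruptedException {
		// 1. Reuse an idle connection if one is available.
		Connection connection = available.poll();
		if (connection != null) {
			return connection;
		}
		// 2. No idle connection: create a new one, but only below the maximum size.
		if (createdCount.incrementAndGet() <= maxSize) {
			return DriverManager.getConnection(url);
		}
		createdCount.decrementAndGet();
		// 3. Pool exhausted: wait for a released connection, up to the acquire timeout.
		connection = available.poll(acquireTimeoutMillis, TimeUnit.MILLISECONDS);
		if (connection == null) {
			throw new SQLException("Connection acquire timeout");
		}
		return connection;
	}

	// The pooled close(): return the connection to the pool instead of closing the socket.
	public void release(Connection connection) {
		available.offer(connection);
	}
}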

[Figure: connection acquire request states]

Faster and safer

The connection pool acts as a bounded buffer for the incoming connection requests. If there is a traffic spike, the connection pool levels it instead of saturating all available database resources.

The waiting step and the timeout mechanism are safety hooks, preventing excessive database server load. If one application gets way too much database traffic, the connection pool mitigates it, preventing it from taking down the database server (and hence affecting the whole enterprise system).

With great power comes great responsibility

All these benefits come at a price, materialized in the extra complexity of the pool configuration (especially in large enterprise systems). So this is no silver bullet, and you need to pay attention to many pool settings (see the configuration sketch after this list), such as:

  • minimum size
  • maximum size
  • max idle time
  • acquire timeout
  • timeout retry attempts
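
As a rough illustration, this is how those settings might map to HikariCP configuration properties. The values below are arbitrary, and note that timeout retry attempts are not a built-in HikariCP setting, which is exactly the kind of concern FlexyPool addresses:

import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSettingsExample {

	public static DataSource newPooledDataSource() {
		HikariConfig config = new HikariConfig();
		config.setJdbcUrl("jdbc:postgresql://localhost:5432/test");
		config.setMinimumIdle(5);          // minimum size
		config.setMaximumPoolSize(20);     // maximum size
		config.setIdleTimeout(60000);      // max idle time, in milliseconds
		config.setConnectionTimeout(1000); // acquire timeout, in milliseconds
		// Timeout retry attempts have no HikariCP equivalent; FlexyPool's
		// RetryConnectionAcquiringStrategy covers that concern.
		return new HikariDataSource(config);
	}
}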

My next article will dig into enterprise connection pooling challenges and how FlexyPool can assist you in finding the right pool sizes.

Code available on GitHub.


Maven and Java multi-version modules

Introduction

Usually, a project has a minimum Java version requirement that applies to all of its modules. But every rule has its exceptions, as I recently stumbled upon the following issue.

One open source project of mine mandates Java 1.6 for most of its modules, except one requiring the 1.7 version.

This happens when you integrate external libraries that have different Java requirements than your own project.

Because that one module integrates the DBCP2 framework (which requires at least Java 1.7), I need to instruct Maven to use two different Java compilers.

Environment variables

We need to define the following environment variables:

Environment Variable Name    Environment Variable Value
JAVA_HOME_6                  C:\Program Files\Java\jdk1.6.0_38
JAVA_HOME_7                  C:\Program Files\Java\jdk1.7.0_25
JAVA_HOME                    %JAVA_HOME_6%

The parent pom.xml

The parent pom.xml defines the global Java version settings:

<properties>
	<jdk.version>6</jdk.version>
	<jdk>${env.JAVA_HOME_6}</jdk>
</properties>

We need to instruct both the compiler and the test plugins to use the configured Java version:

<build>
	<plugins>
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-compiler-plugin</artifactId>
			<configuration>
				<source>${jdk.version}</source>
				<target>${jdk.version}</target>
				<showDeprecation>true</showDeprecation>
				<showWarnings>true</showWarnings>
				<executable>${jdk}/bin/javac</executable>
				<fork>true</fork>
			</configuration>
		</plugin>
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-surefire-plugin</artifactId>
			<configuration>
				<jvm>${jdk}/bin/java</jvm>
				<forkMode>once</forkMode>
			</configuration>
		</plugin>
	</plugins>
</build>

The specific module pom.xml

Modules requiring a different Java version just need to override the default settings:

<properties>
	<jdk.version>7</jdk.version>
	<jdk>${env.JAVA_HOME_7}</jdk>
</properties>

And that’s it: we can now build each module using its own specific minimum Java version requirement.


MongoDB 2.6 is $out

Introduction

MongoDB is evolving rapidly. The 2.2 version introduced the aggregation framework as an alternative to the Map-Reduce query model. Generating aggregated reports is a recurrent requirement for enterprise systems and MongoDB shines in this regard. If you’re new to it you might want to check this aggregation framework introduction or the performance tuning and the data modelling guides.

Let’s reuse the data model I first introduced while demonstrating the blazing fast MongoDB insert capabilities:

{
        "_id" : ObjectId("5298a5a03b3f4220588fe57c"),
        "created_on" : ISODate("2012-04-22T01:09:53Z"),
        "value" : 0.1647851116706831
}

MongoDB 2.6 Aggregation enhancements

In the 2.4 version, if I run the following aggregation query:

db.randomData.aggregate( [ 
{ 
	$match: { 
		"created_on" : { 
			$gte : new Date(Date.UTC(2012, 0, 1)), 
			$lte : new Date(Date.UTC(2012, 0, 10)) 
		} 
	} 
},  
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
}]);

I hit the 16MB aggregation result limitation:

{
	"errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
	"code" : 16389,
	"ok" : 0
}

MongoDB documents are limited to 16MB, and prior to the 2.6 version, the aggregation result was a single BSON document. The 2.6 version replaced it with a cursor instead.

Running the same query on 2.6 yields the following result:

db.randomData.aggregate( [ 
{ 
	$match: { 
		"created_on" : { 
			$gte : new Date(Date.UTC(2012, 0, 1)), 
			$lte : new Date(Date.UTC(2012, 0, 10)) 
		} 
	} 
},  
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
}])
.objsLeftInBatch();
14

I used the cursor-based objsLeftInBatch method to check that the aggregation now returns a cursor, and the 16MB limitation no longer applies to the overall result. The documents inside the cursor are still regular BSON documents, hence limited to 16MB each, but this is far more manageable than the previous limit on the entire result.

The 2.6 version also addresses the aggregation memory restrictions. A full collection scan such as:

db.randomData.aggregate( [   
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
}])
.objsLeftInBatch();

can end up with the following error:

{
	"errmsg" : "exception: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.",
	"code" : 16945,
	"ok" : 0
}

So we can now perform aggregations requiring large external sorts by passing the allowDiskUse option:

db.randomData.aggregate( [   
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
}]
, 
{ 
	allowDiskUse : true 
})
.objsLeftInBatch();

The 2.6 version also allows us to save the aggregation result to a different collection, using the newly added $out stage (which must be the last stage in the pipeline):

db.randomData.aggregate( [ 
{ 
	$match: { 
		"created_on" : { 
			$gte : new Date(Date.UTC(2012, 0, 1)), 
			$lte : new Date(Date.UTC(2012, 0, 10)) 
		} 
	} 
},  
{ 
	$group: {
		_id : {
			"minute" : {
				$minute : "$created_on"
			} 
		},  
		"values": { 
			$addToSet: "$value" 
		} 
	} 
},
{ 
	$out : "randomAggregates" 
}
]);
db.randomAggregates.count();
60

New operators have also been added, such as $let, $map, and $cond, to name a few.

The next example appends AM or PM to the time info of each event entry:

var dataSet = db.randomData.aggregate( [ 
{ 
	$match: { 
		"created_on" : { 
			$gte : new Date(Date.UTC(2012, 0, 1)), 
			$lte : new Date(Date.UTC(2012, 0, 2)) 
		} 
	} 
},  
{ 
	$project: { 
		"clock" : { 
			$let: {
				vars: {
					"hour": { 
						$substr: ["$created_on", 11, -1]
					},				
					"am_pm": { $cond: { if: { $lt: [ {$hour : "$created_on" }, 12 ] } , then: 'AM',else: 'PM'} }
				},
				in: { $concat: [ "$$hour", " ", "$$am_pm"] }				
			}			
		}   
	} 
}, 
{
	$limit : 10
}
]);
dataSet.forEach(function(document)  {
	printjson(document);
});

Resulting in:

"clock" : "16:07:14 PM"
"clock" : "22:14:42 PM"
"clock" : "21:46:12 PM"
"clock" : "03:35:00 AM"
"clock" : "04:14:20 AM"
"clock" : "03:41:39 AM"
"clock" : "17:08:35 PM"
"clock" : "18:44:02 PM"
"clock" : "19:36:07 PM"
"clock" : "07:37:55 AM"

Conclusion

The MongoDB 2.6 version comes with a lot of other enhancements, such as bulk operations or index intersection. MongoDB is constantly evolving, offering a viable alternative for document-based storage. At such a development rate, it’s no wonder it was named the 2013 database of the year.


Effective learning techniques for software craftsmen

Go in one ear and out the other

Programming languages, operating systems, SQL, NoSQL, web frameworks, Spring, Java EE, HTML, JavaScript, Agile methodologies, you name it: a developer must know a ridiculous number of things to become effective. It’s no wonder many of us struggle to keep pace with the ever-changing programming landscape.

When you’re a kid, doing stuff is the most natural way of learning, but then you go to school and you’re brainwashed into thinking that reading is the only way of studying.

Become an active learner

When I started writing this blog, I began to question my old ways of learning. I used to be a passive learner: reading books and articles or watching videos. But I realized this wasn’t working the way it should, so I started looking for alternatives.

Edgar Dale evaluated the most common learning techniques and came up with the famous Cone of Experience. Basically, if you want to be a better developer, you need to become an active learner.

Writing a blog

Until you start writing, you don’t really appreciate the actual effort that goes into coming up with a decent article. Writing down your findings not only helps the community, it helps you too. The writing process teaches you more about a given subject, and since you want to publish quality articles, the upcoming critique is a good motivator.

Contributing to your favourite frameworks

The customer doesn’t want you to spend his money on writing frameworks, so your company makes use of high-quality open source frameworks instead. This is cost-effective, but someone has to write those frameworks after all.

It’s time for our employers to realize that contributing is a form of investment. Getting involved is the best way to master a given technology. Passionate developers will allocate their spare time to this purpose, but it doesn’t always have to be that way.

What if employers allocated developers some hours for contributing to the open source projects they’ve been employing? You’d meet other great developers with solid skills, and it’s probably a cheaper way of training your staff.

Contributing to your own frameworks

If you want developers to appreciate all the effort that goes into managing, testing or marketing, the best way is to have them lead their own open source project. Starting your own GitHub project is going to teach you a lot about product ownership, software design and marketing techniques.

Writing frameworks is much different from everyday enterprise development. You need to pay extra attention to your framework’s usability, so that adopters spend the minimum amount of time integrating your software.

Conclusion

Getting involved is the true way of the software craftsman. A pragmatic programmer never bashes his tools, lending a helping hand instead. Helping to build a better software community is the most effective way of becoming a better developer.


The Builder pattern and the Spring framework

Introduction

I like to make use of the Builder pattern whenever an object has both mandatory and optional properties. But building objects is usually the Spring framework’s responsibility, so let’s see how you can employ the pattern using both Java and XML-based Spring configurations.

A Builder example

Let’s start with the following Builder class:

public final class Configuration<T extends DataSource> extends ConfigurationProperties<T, Metrics, PoolAdapter<T>> {

    public static final long DEFAULT_METRIC_LOG_REPORTER_PERIOD = 5;

    public static class Builder<T extends DataSource> {
        private final String uniqueName;
        private final T targetDataSource;
        private final PoolAdapterBuilder<T> poolAdapterBuilder;
        private final MetricsBuilder metricsBuilder;
        private boolean jmxEnabled = true;
        private long metricLogReporterPeriod = DEFAULT_METRIC_LOG_REPORTER_PERIOD;

        public Builder(String uniqueName, T targetDataSource, MetricsBuilder metricsBuilder, PoolAdapterBuilder<T> poolAdapterBuilder) {
            this.uniqueName = uniqueName;
            this.targetDataSource = targetDataSource;
            this.metricsBuilder = metricsBuilder;
            this.poolAdapterBuilder = poolAdapterBuilder;
        }

        public Builder<T> setJmxEnabled(boolean enableJmx) {
            this.jmxEnabled = enableJmx;
            return this;
        }

        public Builder<T> setMetricLogReporterPeriod(long metricLogReporterPeriod) {
            this.metricLogReporterPeriod = metricLogReporterPeriod;
            return this;
        }

        public Configuration<T> build() {
            Configuration<T> configuration = new Configuration<T>(uniqueName, targetDataSource);
            configuration.setJmxEnabled(jmxEnabled);
            configuration.setMetricLogReporterPeriod(metricLogReporterPeriod);
            configuration.metrics = metricsBuilder.build(configuration);
            configuration.poolAdapter = poolAdapterBuilder.build(configuration);
            return configuration;
        }
    }

    private final T targetDataSource;
    private Metrics metrics;
    private PoolAdapter<T> poolAdapter;

    private Configuration(String uniqueName, T targetDataSource) {
        super(uniqueName);
        this.targetDataSource = targetDataSource;
    }

    public T getTargetDataSource() {
        return targetDataSource;
    }

    public Metrics getMetrics() {
        return metrics;
    }

    public PoolAdapter<T> getPoolAdapter() {
        return poolAdapter;
    }
}
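
Before involving Spring, a plain-Java usage of this Builder might look like the following sketch. The PoolingDataSource argument and the setter values are arbitrary; the CodahaleMetrics.BUILDER and BitronixPoolAdapter.BUILDER constants are the same ones used in the Spring examples below:

import java.util.UUID;

public class ConfigurationFactory {

    public static Configuration<PoolingDataSource> newConfiguration(PoolingDataSource poolingDataSource) {
        // Mandatory properties go through the Builder constructor,
        // optional ones through the fluent setters.
        return new Configuration.Builder<PoolingDataSource>(
                UUID.randomUUID().toString(),
                poolingDataSource,
                CodahaleMetrics.BUILDER,
                BitronixPoolAdapter.BUILDER)
            .setJmxEnabled(false)
            .setMetricLogReporterPeriod(10)
            .build();
    }
}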

Java-based configuration

If you’re using Spring Java-based configuration then this is how you’d do it:

@org.springframework.context.annotation.Configuration
public class FlexyDataSourceConfiguration {

    @Autowired
    private PoolingDataSource poolingDataSource;

    @Bean
    public Configuration configuration() {
        return new Configuration.Builder(
                UUID.randomUUID().toString(),
                poolingDataSource,
                CodahaleMetrics.BUILDER,
                BitronixPoolAdapter.BUILDER
        ).build();
    }

    @Bean(initMethod = "start", destroyMethod = "stop")
    public FlexyPoolDataSource dataSource() {
        Configuration configuration = configuration();
        return new FlexyPoolDataSource(configuration,
                new IncrementPoolOnTimeoutConnectionAcquiringStrategy.Builder(5),
                new RetryConnectionAcquiringStrategy.Builder(2)
        );
    }
}

XML-based configuration

The XML-based configuration is more verbose and not as intuitive as the Java-based configuration:

<bean id="configurationBuilder" class="com.vladmihalcea.flexypool.config.Configuration$Builder">
	<constructor-arg value="uniqueId"/>
	<constructor-arg ref="poolingDataSource"/>
	<constructor-arg value="#{ T(com.vladmihalcea.flexypool.metric.codahale.CodahaleMetrics).BUILDER }"/>
	<constructor-arg value="#{ T(com.vladmihalcea.flexypool.adaptor.BitronixPoolAdapter).BUILDER }"/>
</bean>

<bean id="configuration" factory-bean="configurationBuilder" factory-method="build"/>

<bean id="dataSource" class="com.vladmihalcea.flexypool.FlexyPoolDataSource" init-method="start" destroy-method="stop">
	<constructor-arg ref="configuration"/>
	<constructor-arg>
		<array>
			<bean class="com.vladmihalcea.flexypool.strategy.IncrementPoolOnTimeoutConnectionAcquiringStrategy$Builder">
				<constructor-arg value="5"/>
			</bean>
			<bean class="com.vladmihalcea.flexypool.strategy.RetryConnectionAcquiringStrategy$Builder">
				<constructor-arg value="2"/>
			</bean>
		</array>
	</constructor-arg>
</bean>

Conclusion

You can make use of the Builder pattern no matter which Spring configuration style you’ve chosen. If you have doubts about its usefulness, here are three compelling reasons you should be aware of.


Afraid of reopened issues?

Introduction

Reopened issues and developers’ feelings don’t mix well, a recurrent phenomenon I’ve seen on every project I’ve worked on. Some might feel they’ve worked “in vain” and become reluctant to start all over again.

Reopened issues are bound to happen

There is a thin line between taking ownership of your current project and remaining professionally detached at all times. The only thing that matters is the value the customer gets for any given issue, even if it takes more steps than you previously anticipated. In software development, change is the only thing that never changes, and that’s why you’ll always have to deal with reopened issues. Reopening an issue is not necessarily a bad thing, as you’ll soon find out.

What can you learn from reopened issues?

  1. QA is doing its job

    There’s a good reason why we employ a “Testing” column on our Sprint boards. A task must obey the rules defined by the “Definition of Done” policy, otherwise it might not deliver the promised business value. The sooner you test it, the less expensive the fix gets.

  2. The clients are not sure what they want

    Some clients have difficulties visualizing a flow until they are actually interacting with it. From a management point of view this is a waste of resources and it should be addressed accordingly. If it happens frequently then a “cheap mock-up” might be worth considering.

  3. A chance to challenge your design

    From a technical perspective, the design is challenged to accommodate change with minimum effort. If you always have to rewrite everything to accommodate any unforeseen change, then you should definitely question your current architecture.

  4. A test for the peer review process

    If a task is reopened without a change in specification, it means the current technical solution is not functioning properly. The peer review process is aimed at preventing such situations, so you should check both the original problem and the review process.

  5. Recurrent reopened issues may indicate a brittle component design

    A bad design always surfaces in the form of reopened issues. If you happen to work twice as hard to accomplish a given task, you might want to reconsider your design or coding practices.

Conclusion

Reopened issues are just feedback, and the sooner you receive it, the better you can address it. A reopened issue is just a step in a task’s life-cycle: when you’ve finished developing a task, it doesn’t mean you’re done with it. This is the proper mindset for doing Agile software development. A task is done only when the customer accepts its business value. If you see the big picture, you’ll be less frustrated by reworking a given functionality.


Choosing a leader like an agilist

The leader as a captain

I recently read Petri Kainulainen’s article on sharing leadership among team members, and I am on the same wavelength in this regard, since the Agile methods emphasize the importance of “motivated individuals, who should be trusted”.

While a team leader could be regarded as a remnant of the old rigid organizational structures, I still see many benefits in having such a captain. When it comes to improving my people skills, I like to get inspired by other domains of activity that have been struggling with the very same challenges for decades. Think about the greatest sports teams, where all team members are stars, and yet they always have a captain:

  • the captain is the role model for every team member
  • he leads by example
  • he always mediates conflicts, knowing that prevention is better than cure
  • he welcomes new team members and facilitates their integration
  • he never loses his temper
  • in times of trouble, he is the voice of the team
  • he is constantly goal-oriented, making sure the team is on the right track

The leader is nominated by the team

But a great captain is always chosen by the team members. That’s the most natural way of nominating a leader; a leader doesn’t have to be imposed.

This is how a team leader should be chosen in our industry as well. Self-organizing teams need the power to decide on their leader too. A leader is not someone who once proved his abilities, but a person who constantly validates his role. The team leader position is always backed by the team members’ feedback. A good leader is therefore self-reinforced by his positive actions, while a bad leader is simply replaced by someone with better skills.

The voting process

This is my recipe for choosing a team leader:

  1. The team must first understand what a good leader means. The voting is not a popularity contest. The team is always held responsible for its actions, and choosing a leader is no different
  2. Allow the team members to register for the leader election. If you never wanted to be a leader, there is little chance you’d ever become a great one
  3. Let everybody vote and explain their reasons for choosing a given team member. Without any reasonable explanation, the voting could easily turn into a popularity contest
  4. Respect the decision, even if the new leader is not who you had in mind in the first place

Conclusion

We need to trust our teams and respect their opinions. I like this approach since it’s a very good way of spotting leaders you weren’t aware of. People with leadership potential are rare gems, and I always stay open-minded about any method that can bring me the next great leader.
