The data knowledge stack

Concurrency is not for the faint-hearted

We all know concurrent programming is difficult to get right. That’s why threading tasks are followed by extensive design and code review sessions.

You never assign concurrency issues to inexperienced developers. The problem space is carefully analyzed, a design emerges, and the solution is both documented and reviewed.

That’s how threading-related tasks are usually addressed. You will naturally choose a higher-level abstraction, since you don’t want to get tangled up in low-level details. That’s why java.util.concurrent is usually a better choice (unless you are building a High Frequency Trading system) than hand-made, Java 1.2-style, thread-safe producer/consumer structures.
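To make the contrast concrete, here is a minimal sketch of a producer/consumer hand-off built on java.util.concurrent instead of hand-written wait/notify code. The class name, the queue capacity and the POISON_PILL sentinel are illustrative assumptions, not something taken from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerSketch {

    // Hypothetical sentinel value that tells the consumer to stop.
    private static final int POISON_PILL = -1;

    // Produces 0..count-1 on one thread, consumes them on another,
    // and returns the items in the order the consumer received them.
    public static List<Integer> transfer(int count) {
        // Bounded queue: put() blocks when it is full, take() blocks when it is empty.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        List<Integer> consumed = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i);
                }
                queue.put(POISON_PILL);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int item = queue.take(); item != POISON_PILL; item = queue.take()) {
                    consumed.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            // join() guarantees the consumer's writes to the list are visible here.
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(transfer(10));
    }
}
```

All the locking and signalling lives inside ArrayBlockingQueue, so the only concurrency decisions left in this code are the queue capacity and how to shut the consumer down.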

Is database programming any different?

In a database system, the data is spread across various structures (SQL tables or NoSQL collections) and multiple users may select/insert/update/delete whatever they choose to. From a concurrency point of view, this is a very challenging task and it’s not just the database system developer’s problem. It’s our problem as well.

A typical RDBMS data layer requires you to master various technologies and your solution is only as strong as your team’s weakest spot.

Continue reading “The data knowledge stack”

The simple scalability equation

Queuing Theory

Queuing theory allows us to predict queue lengths and waiting times, which is of paramount importance for capacity planning. For an architect, this is a very handy tool, since queues are not exclusive to messaging systems.

To avoid overloading the system, we use throttling. Whenever the number of incoming requests surpasses the available resources, we basically have two options:

  • discarding all overflowing traffic, therefore decreasing availability
  • queuing requests and waiting (for no longer than a timeout threshold) for busy resources to become available

This behavior applies to thread-per-request web servers, batch processors or connection pools.
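The two options can be sketched with a counting semaphore that guards a fixed number of resources. This is only an illustration under assumed names (ThrottleSketch, tryEnterOrDiscard, tryEnterOrWait and leave are hypothetical); real web servers and connection pools implement the same idea with their own mechanisms.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ThrottleSketch {

    private final Semaphore permits;

    public ThrottleSketch(int maxConcurrentRequests) {
        // Fair semaphore, so requests that wait are served in arrival order.
        this.permits = new Semaphore(maxConcurrentRequests, true);
    }

    // Option 1: discard the request immediately when no resource is free.
    public boolean tryEnterOrDiscard() {
        return permits.tryAcquire();
    }

    // Option 2: queue the request and wait up to the timeout threshold.
    public boolean tryEnterOrWait(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Must be called exactly once after every successful enter.
    public void leave() {
        permits.release();
    }
}
```

A caller would wrap the actual work in a try/finally block so that leave() always runs after a successful enter.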

What’s in it for us?

Agner Krarup Erlang is the father of queuing theory and traffic engineering, being the first to postulate the mathematical models required to provision telecommunication networks.

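Erlang formulas are modeled for M/M/k queue models, meaning the system is characterized by:

  • arrivals that follow a Poisson process
  • service times that follow an exponential distribution
  • k servers processing requests in FIFO order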

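The Erlang formulas give us the servicing probability for:

  • systems that discard overflowing traffic (Erlang B)
  • systems that queue overflowing traffic (Erlang C)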

This is not strictly applicable to thread pools, as requests are not always serviced fairly and servicing times do not always follow an exponential distribution.
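Even as an approximation, the formulas are cheap to evaluate. The sketch below computes the standard Erlang B probability (a request is discarded because all servers are busy) and the Erlang C probability (a request has to wait in the queue). The class and method names are hypothetical, and the offered load is assumed to be expressed in erlangs, that is, the arrival rate multiplied by the mean service time.

```java
public class ErlangSketch {

    // Erlang B: probability that an arriving request finds all servers busy
    // in a system with no queue. Uses the numerically stable recurrence
    // B(E, 0) = 1, B(E, j) = E * B(E, j-1) / (j + E * B(E, j-1)).
    public static double erlangB(double offeredLoad, int servers) {
        double b = 1.0;
        for (int j = 1; j <= servers; j++) {
            b = (offeredLoad * b) / (j + offeredLoad * b);
        }
        return b;
    }

    // Erlang C: probability that an arriving request has to wait in the queue.
    // Only defined for a stable system, where offeredLoad < servers.
    public static double erlangC(double offeredLoad, int servers) {
        if (offeredLoad >= servers) {
            throw new IllegalArgumentException("Offered load must be lower than the number of servers");
        }
        double b = erlangB(offeredLoad, servers);
        return (servers * b) / (servers - offeredLoad * (1.0 - b));
    }

    public static void main(String[] args) {
        System.out.println(erlangB(1.0, 2));
        System.out.println(erlangC(0.5, 1));
    }
}
```

Deriving Erlang C from Erlang B avoids computing factorials directly, which overflow quickly for larger server counts.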

A general-purpose formula, applicable to any stable system (a system where the arrival rate is not greater than the departure rate), is Little’s Law.
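Little’s Law states that L = λ × W: the average number of requests in the system (L) equals the average arrival rate (λ) multiplied by the average time a request spends in the system (W). A minimal sketch of how this could be used for sizing follows; the class and method names are hypothetical, and sizing by averages alone ignores traffic bursts.

```java
public class LittlesLawSketch {

    // L = lambda * W: average number of requests inside the system.
    public static double requestsInSystem(double arrivalRatePerSecond, double avgTimeInSystemSeconds) {
        return arrivalRatePerSecond * avgTimeInSystemSeconds;
    }

    // Smallest whole number of workers (threads, connections) that covers
    // the average concurrency predicted by Little's Law.
    public static int minimumWorkers(double arrivalRatePerSecond, double avgTimeInSystemSeconds) {
        return (int) Math.ceil(requestsInSystem(arrivalRatePerSecond, avgTimeInSystemSeconds));
    }

    public static void main(String[] args) {
        // 40 requests per second, each spending 0.25 seconds in the system.
        System.out.println(minimumWorkers(40.0, 0.25));
    }
}
```

With 40 requests per second and 0.25 seconds per request, the system holds 10 requests on average, so fewer than 10 workers could not keep up with the mean load.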

Continue reading “The simple scalability equation”

How to import CSV data into PostgreSQL

Introduction

Many database servers support CSV data transfers, and this post will show one way you can import CSV files into PostgreSQL.

SQL aggregation rocks!

My previous post demonstrated FlexyPool’s metrics capabilities, and all connection-related statistics were exported in CSV format.

When it comes to aggregating tabular data, SQL is at its best. If your database engine supports SQL:2003 window functions, you should definitely make use of this great feature.

Continue reading “How to import CSV data into PostgreSQL”