A beginner’s guide to natural and surrogate database keys

Types of primary keys

All database tables must have one primary key column. The primary key uniquely identifies a row within a table therefore it’s bound by the following constraints:

  • UNIQUE
  • NOT NULL
  • IMMUTABLE

When choosing a primary key we must take into consideration the following aspects:

  • the primary key may be used for joining other tables through a foreign key relationship
  • the primary key usually has an associated default index, so the more compact the data type the less space the index will take
  • a simple key performs better than a compound one
  • the primary key assignment must ensure uniqueness even in highly concurrent environments

When choosing a primary key generator strategy the options are:

  1. natural keys, using a column combination that guarantees individual rows uniqueness
  2. surrogate keys, that are generated independently of the current row data

Natural keys

Natural keys’ uniqueness is enforced by external factors (e.g. person unique identifiers, social security numbers, vehicle identification numbers).

Natural keys are convenient because they have an outside world equivalent and they don’t require any extra database processing. We can therefore know the primary key even before inserting the actual row into the database, which simplifies batch inserts.

If the natural key is a single numeric value the performance is comparable to that of surrogate keys.

For compound keys we must be aware of possible performance penalties:

  • compound key joins are slower than single key ones
  • compound key indexes require more space than their single key counterparts

Non-numerical keys are less efficient than numeric ones (integer, bigint), for both indexing and joining. A CHAR(17) natural key (e.g. vehicle identification number) occupies 17 bytes as opposed to 4 bytes (32 bit integer) or 8 bytes (64 bit bigint).

The initial schema design uniqueness assumptions may not forever hold true. Let’s say we’d used one specific country citizen numeric code for identifying all application users. If we now need to support other countries that don’t have such citizen numeric code or the code clashed with existing entries, than we can conclude that the schema evolution is possibly hindered.

If the natural key uniqueness constraints change it’s going to be very difficult to update both the primary keys (if we manage to drop the primary key constraints anyway) and all associated foreign key relationships.

Surrogate keys

Surrogate keys are generated independently of the current row data, so the other column constraints may freely evolve according to the application business requirements.

The database system may manage the surrogate key generation and most often the key is of a numeric type (e.g. integer or bigint), being incremented whenever there is a need for a new key.

If we want to control the surrogate key generation we can employ a 128-bit GUID or UUID. This simplifies batching and may improve the insert performance since the additional database key generation processing is no longer required. Even if this strategy is not so widely adopted it’s worth considering when designing the database model.

When the database identifier generation responsibility falls to the database system, there are several strategies for auto incrementing surrogate keys:

Database engine Auto incrementing strategy
Oracle SEQUENCE, IDENTITY (Oracle 12c)
MSSQL IDENTITY, SEQUENCE (MSSQL 2012)
PostgreSQL SEQUENCE, SERIAL TYPE
MySQL AUTO_INCREMENT
DB2 IDENTITY, SEQUENCE
HSQLDB IDENTITY, SEQUENCE

Design aspects

If you liked this article, you might want to subscribe to my newsletter too.

7 thoughts on “A beginner’s guide to natural and surrogate database keys

  1. therefore it’s bound by the following constraints: IMMUTABLE

    Check out this perfectly valid T-SQL script:

    DROP TABLE t;
    CREATE TABLE t(pk INT PRIMARY KEY);
    INSERT INTO t VALUES (1);
    
    DROP TABLE u;
    CREATE TABLE u(fk INT);
    ALTER TABLE u ADD CONSTRAINT fk FOREIGN KEY (fk) REFERENCES t(pk) ON UPDATE CASCADE;
    INSERT INTO u VALUES (1);
    
    UPDATE t SET pk = 2;
    SELECT * FROM t, u;
    
    1. Should a primary key be Immutable?

      According to wiki:

      Primary keys should be immutable, that is, never changed or re-used; they should be deleted along with the associated record.

      1. A very interesting wiki discussion was triggered by the ON CASCADE trigger that might change a Primary key

      2. This question was also asked on StackExchanged too, and most answers say that an immutable primary key is a better choice.

      Now, let’s see what Oracle’s Thomas Kyte says:

      http://asktom.oracle.com/~tkyte/update_cascade/index.html

      Personally — I’ve never found a need or use for update cascade. I’m opposed to it. If
      your design requires it — change your design now if you can.

      Primary keys are supposed to be imutable, never changing, constant. It is an excessively
      bad practice to have to update them ever. If there is a 0.00001% chance you will have to
      update a primary key — then it is not a primary key, its a surrogate key and you need to
      find the true primary key (even if you have to make it up via a sequence)

  2. I think the point is that it is not a “constraint” that keys *must* be immutable, it’s just highly desirable not to change them. Stability is a design criteria rather than a technical or theoretical requirement of a primary key.

  3. Yes, you are right. The SQL standard doesn’t impose primary key to be immutable. There’s no such reference in the SQL standard, so technically they can be modified.

    But I still believe the benefits of immutable primary keys out pass the shortcomings. For natural keys, the design should clearly state if there’s a need for changing those keys. If that’s the case, then we shouldn’t make that column a primary key and rely on surrogate keys instead.

    1. It depends. Not all situations are alike and the assumptions you are making here will be right in some cases but not in others.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s