Hibernate hidden gem: the pooled-lo optimizer

Introduction

In this post we’ll uncover a sequence identifier generator combining identifier assignment efficiency and interoperability with other external systems (concurrently accessing the underlying database system).

Traditionally there have been two sequence identifier strategies to choose from.

  • The sequence identifier, always hitting the database for every new value assignment. Even with database sequence preallocation we have a significant database round-trip cost.
  • The seqhilo identifier, using the hi/lo algorithm. This generator calculates some identifier values in-memory, therefore reducing the database round-trip calls. The problem with this optimization technique is that the current database sequence value no longer reflects the current highest in-memory generated value. The database sequence is used as a bucket number, making it difficult for other systems to interoperate with the database table in question. Other applications must know the inner-workings of the hi/lo identifier strategy to properly generate non-clashing identifiers.

The enhanced identifiers

Hibernate offers a new class of identifier generators, addressing many shortcomings of the original ones. The enhanced identifier generators don’t come with a fixed identifier allocation strategy. The optimization strategy is configurable and we can even supply our own optimization implementation. By default Hibernate comes with the following built-in optimizers:

  • none: every identifier is fetched from the database, so it’s equivalent to the original sequence generator.
  • hi/lo: it uses the hi/lo algorithm and it’s equivalent to the original seqhilo generator.
  • pooled: This optimizer uses a hi/lo optimization strategy, but the current in-memory identifiers highest boundary is extracted from an actual database sequence value.
  • pooled-lo: It’s similar to the pooled optimizer but the database sequence value is used as the current in-memory lowest boundary

In the official release announcement, the pooled optimizers are advertised as being interoperable with other external systems:

Even if other applications are also inserting values, we’ll be perfectly safe because the SEQUENCE itself will handle applying this increment_size.

This is actually what we are looking for; an identifier generator that’s both efficient and doesn’t clash when other external systems are concurrently inserting rows in the same database tables.

Testing time

The following test is going to check how the new optimizers get along with other external database table inserts. In our case the external system will be some native JDBC insert statements on the same database table/sequence.

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 8; i++) {
			session.persist(newEntityInstance());
		}
		session.flush();
		assertEquals(8, ((Number) session.createSQLQuery("SELECT COUNT(*) FROM sequenceIdentifier").uniqueResult()).intValue());
		insertNewRow(session);
		insertNewRow(session);
		insertNewRow(session);
		assertEquals(11, ((Number) session.createSQLQuery("SELECT COUNT(*) FROM sequenceIdentifier").uniqueResult()).intValue());
		List<Number> ids = session.createSQLQuery("SELECT id FROM sequenceIdentifier").list();
		for (Number id : ids) {
			LOGGER.debug("Found id: {}", id);
		}
		for (int i = 0; i < 3; i++) {
			session.persist(newEntityInstance());
		}
		session.flush();
		return null;
	}
});

The pooled optimizer

We’ll first use the pooled optimizer strategy:

@Entity(name = "sequenceIdentifier")
public static class PooledSequenceIdentifier {

	@Id
	@GenericGenerator(name = "sequenceGenerator", strategy = "enhanced-sequence",
			parameters = {
					@org.hibernate.annotations.Parameter(name = "optimizer", value = "pooled"),
					@org.hibernate.annotations.Parameter(name = "initial_value", value = "1"),
					@org.hibernate.annotations.Parameter(name = "increment_size", value = "5")
			}
	)
	@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "sequenceGenerator")
	private Long id;
}

Running the test ends-up throwing the following exception:

DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][9]} 
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][10]} 
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][26]} 
WARN  [main]: o.h.e.j.s.SqlExceptionHelper - SQL Error: -104, SQLState: 23505
ERROR [main]: o.h.e.j.s.SqlExceptionHelper - integrity constraint violation: unique constraint or index violation; SYS_PK_10104 table: SEQUENCEIDENTIFIER
ERROR [main]: c.v.h.m.l.i.PooledSequenceIdentifierTest - Pooled optimizer threw
org.hibernate.exception.ConstraintViolationException: could not execute statement
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:72) ~[hibernate-core-4.3.5.Final.jar:4.3.5.Final]	
Caused by: java.sql.SQLIntegrityConstraintViolationException: integrity constraint violation: unique constraint or index violation; SYS_PK_10104 table: SEQUENCEIDENTIFIER
	at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source) ~[hsqldb-2.3.2.jar:2.3.2]	

I am not sure if this is a bug or just a design limitation, but the pooled optimizer doesn’t meet the interoperability requirement.

To visualize what happens I summarized the sequence calls in the following diagram:

PooledOptimizer

When the pooled optimizer retrieves the current sequence value, it uses it to calculate the lowest in-memory boundary. The lowest value is the actual previous sequence value and this value might have been already used by some other external INSERT statement.

The pooled-lo optimizer

Fortunately, there is one more optimizer(not mentioned in the reference documentation) to be tested. The pooled-lo optimizer uses the current database sequence value as the lowest in-memory boundary, so other systems may freely use the next sequence values without risking identifier clashing:

@Entity(name = "sequenceIdentifier")
public static class PooledLoSequenceIdentifier {

	@Id
	@GenericGenerator(name = "sequenceGenerator", strategy = "enhanced-sequence",
			parameters = {
					@org.hibernate.annotations.Parameter(name = "optimizer",
							value = "pooled-lo"
					),
					@org.hibernate.annotations.Parameter(name = "initial_value", value = "1"),
					@org.hibernate.annotations.Parameter(name = "increment_size", value = "5")
			}
	)
	@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "sequenceGenerator")
	private Long id;
}

To better understand the inner-workings of this optimizer, the following diagram summarizes the identifier assignment process:

PooledLoOptimizer

Conclusion

A hidden gem is one of those great features that most don’t even know of its existence. The pooled-lo optimizer is extremely useful, yet most people don’t even know of its existence.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

From JPA to Hibernate’s legacy and enhanced identifier generators

JPA identifier generators

JPA defines the following identifier strategies:

Strategy Description
AUTO The persistence provider picks the most appropriate identifier strategy supported by the underlying database
IDENTITY Identifiers are assigned by a database IDENTITY column
SEQUENCE The persistence provider uses a database sequence for generating identifiers
TABLE The persistence provider uses a separate database table to emulate a sequence object

In my previous post I exampled the pros and cons of all these surrogate identifier strategies.

Identifier optimizers

While there’s not much application-side IDENTITY generator optimization (other than configuring database identity preallocation), the sequence identifiers offer much more flexibility in this regard. One of the most common optimization strategy is based on the hi/lo allocation algorithm.

For this Hibernate offers:

Generator Description
SequenceHiLoGenerator It uses a database sequence to generate the hi value, while the low value is incremented according to the hi/lo algorithm
TableHiLoGenerator A database table is used for generating the hi values. This generator is deprecated in favour of the MultipleHiLoPerTableGenerator, the enhanced TableGenerator or the SequenceStyleGenerator.
MultipleHiLo
PerTableGenerator
It’s a hi/lo table generator capable of using a single database table even for multiple identifier sequences.
SequenceStyleGenerator It’s an enhanced version of the previous sequence generator. It uses a sequence if the underlying database supports them. If the current database doesn’t support sequences it switches to using a table for generating sequence values. While the previous generators were having a predefined optimization algorithm, the enhanced generators can be configured with an optimizer strategy:

  • none: there is no optimizing strategy applied, so every identifier is fetched from the database
  • hi/lo: it uses the original hi/lo algorithm. This strategy makes it difficult for other systems to share the same identifier sequence, requiring other systems to implement the same identifier generation logic.
  • pooled: This optimizer uses a hi/lo optimization strategy, but instead of saving the current hi value it stores the current range upper boundary (or lower boundary – hibernate.id.optimizer.pooled.prefer_lo).

Pooled is the default optimizer strategy.

TableGenerator Like MultipleHiLoPerTableGenerator it may use one single table for multiple identifier generators, while offering configurable optimizer strategies.

Pooled is the default optimizer strategy.

JPA to Hibernate identifier mapping

Having such an abundant generator offer, we cannot help asking which of those is being used as the default JPA generators.

While the JPA specification doesn’t imply any particular optimization, Hibernate will prefer an optimized generator over one that always hit the database for every new identifier.

The JPA SequenceGenerator

We’ll define one entity configured with the SEQUENCE JPA identifier generator. A unit test is going to persists five such entities.

@Entity(name = "sequenceIdentifier")
public static class SequenceIdentifier {

    @Id
    @GeneratedValue(generator = "sequence", strategy=GenerationType.SEQUENCE)
    @SequenceGenerator(name = "sequence", allocationSize = 10)
    private Long id;
}

@Test
public void testSequenceIdentifierGenerator() {
    LOGGER.debug("testSequenceIdentifierGenerator");
    doInTransaction(new TransactionCallable<Void>() {
        @Override
        public Void execute(Session session) {
            for (int i = 0; i < 5; i++) {
                session.persist(new SequenceIdentifier());
            }
            session.flush();
            return null;
        }
    });
}

Running this test we’ll give us the following output

Query:{[call next value for hibernate_sequence][]} 
Generated identifier: 10, using strategy: org.hibernate.id.SequenceHiLoGenerator
Generated identifier: 11, using strategy: org.hibernate.id.SequenceHiLoGenerator
Generated identifier: 12, using strategy: org.hibernate.id.SequenceHiLoGenerator
Generated identifier: 13, using strategy: org.hibernate.id.SequenceHiLoGenerator
Generated identifier: 14, using strategy: org.hibernate.id.SequenceHiLoGenerator
Query:{[insert into sequenceIdentifier (id) values (?)][10]} 
Query:{[insert into sequenceIdentifier (id) values (?)][11]} 
Query:{[insert into sequenceIdentifier (id) values (?)][12]} 
Query:{[insert into sequenceIdentifier (id) values (?)][13]} 
Query:{[insert into sequenceIdentifier (id) values (?)][14]} 

Hibernate chooses to use the legacy SequenceHiLoGenerator for backward compatibility with all those applications that were developed prior to releasing the enhanced generators. Migrating a legacy application to the new generators is not an easy process, so the enhanced generators are a better alternative for new applications instead.

Hibernate prefers using the “seqhilo” generator by default, which is not an intuitive assumption, since many might expect the raw “sequence” generator (always calling the database sequence for every new identifier value).

To enable the enhanced generators we need to set the following Hibernate property:

properties.put("hibernate.id.new_generator_mappings", "true");

Giveing us the following output:

Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Generated identifier: 1, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Generated identifier: 2, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Generated identifier: 3, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Generated identifier: 4, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Generated identifier: 5, using strategy: org.hibernate.id.enhanced.SequenceStyleGenerator
Query:{[insert into sequenceIdentifier (id) values (?)][1]} 
Query:{[insert into sequenceIdentifier (id) values (?)][2]} 
Query:{[insert into sequenceIdentifier (id) values (?)][3]} 
Query:{[insert into sequenceIdentifier (id) values (?)][4]} 
Query:{[insert into sequenceIdentifier (id) values (?)][5]} 

The new SequenceStyleGenerator generates other identifier values than the legacy SequenceHiLoGenerator. The reason why the update statements differ between the old and the new generators is because the new generators default optimizer strategy is “pooled” while the old generators can only use the “hi/lo” strategy.

The JPA TableGenerator

@Entity(name = "tableIdentifier")
public static class TableSequenceIdentifier {

    @Id
    @GeneratedValue(generator = "table", strategy=GenerationType.TABLE)
    @TableGenerator(name = "table", allocationSize = 10)
    private Long id;
}

Running the following test:

@Test
public void testTableSequenceIdentifierGenerator() {
    LOGGER.debug("testTableSequenceIdentifierGenerator");
    doInTransaction(new TransactionCallable<Void>() {
        @Override
        public Void execute(Session session) {
            for (int i = 0; i < 5; i++) {
                session.persist(new TableSequenceIdentifier());
            }
            session.flush();
            return null;
        }
    });
}

Generates the following SQL statement output:

Query:{[select sequence_next_hi_value from hibernate_sequences where sequence_name = 'tableIdentifier' for update][]} 
Query:{[insert into hibernate_sequences(sequence_name, sequence_next_hi_value) values('tableIdentifier', ?)][0]} 
Query:{[update hibernate_sequences set sequence_next_hi_value = ? where sequence_next_hi_value = ? and sequence_name = 'tableIdentifier'][1,0]} 
Generated identifier: 1, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Generated identifier: 2, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Generated identifier: 3, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Generated identifier: 4, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Generated identifier: 5, using strategy: org.hibernate.id.MultipleHiLoPerTableGenerator
Query:{[insert into tableIdentifier (id) values (?)][1]} 
Query:{[insert into tableIdentifier (id) values (?)][2]} 
Query:{[insert into tableIdentifier (id) values (?)][3]} 
Query:{[insert into tableIdentifier (id) values (?)][4]} 
Query:{[insert into tableIdentifier (id) values (?)][5]}

As with the previous SEQUENCE example, Hibernate uses the MultipleHiLoPerTableGenerator to maintain the backward compatibility.

Switching to the enhanced id generators:

properties.put("hibernate.id.new_generator_mappings", "true");

Give us the following output:

Query:{[select tbl.next_val from hibernate_sequences tbl where tbl.sequence_name=? for update][tableIdentifier]} 
Query:{[insert into hibernate_sequences (sequence_name, next_val)  values (?,?)][tableIdentifier,1]} 
Query:{[update hibernate_sequences set next_val=?  where next_val=? and sequence_name=?][11,1,tableIdentifier]} 
Query:{[select tbl.next_val from hibernate_sequences tbl where tbl.sequence_name=? for update][tableIdentifier]} 
Query:{[update hibernate_sequences set next_val=?  where next_val=? and sequence_name=?][21,11,tableIdentifier]} 
Generated identifier: 1, using strategy: org.hibernate.id.enhanced.TableGenerator
Generated identifier: 2, using strategy: org.hibernate.id.enhanced.TableGenerator
Generated identifier: 3, using strategy: org.hibernate.id.enhanced.TableGenerator
Generated identifier: 4, using strategy: org.hibernate.id.enhanced.TableGenerator
Generated identifier: 5, using strategy: org.hibernate.id.enhanced.TableGenerator
Query:{[insert into tableIdentifier (id) values (?)][1]} 
Query:{[insert into tableIdentifier (id) values (?)][2]} 
Query:{[insert into tableIdentifier (id) values (?)][3]} 
Query:{[insert into tableIdentifier (id) values (?)][4]} 
Query:{[insert into tableIdentifier (id) values (?)][5]} 

You can see that the new enhanced TableGenerator was used this time.

For more about these optimization strategies you can read the original release note.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

Hibernate Identity, Sequence and Table (Sequence) generator

Introduction

In my previous post I talked about different database identifier strategies. This post will compare the most common surrogate primary key strategies:

  • IDENTITY
  • SEQUENCE
  • TABLE (SEQUENCE)

IDENTITY

The IDENTITY type (included in the SQL:2003 standard) is supported by:

The IDENTITY generator allows an integer/bigint column to be auto-incremented on demand. The increment process happens outside of the current running transaction, so a roll-back may end-up discarding already assigned values (value gaps may happen).

The increment process is very efficient since it uses a database internal lightweight locking mechanism as opposed to the more heavyweight transactional course-grain locks.

The only drawback is that we can’t know the newly assigned value prior to executing the INSERT statement. This restriction is hinderingthe “transactional write behind” flushing strategy adopted by Hibernate. For this reason Hibernates disables the JDBC batch support for entities using the IDENTITY generator.

For the following examples we’ll enable Session Factory JDBC batching:

properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");
properties.put("hibernate.jdbc.batch_size", "2");

Let’s define an Entity using the IDENTITY generation strategy:

@Entity(name = "identityIdentifier")
public static class IdentityIdentifier {

	@Id
	@GeneratedValue(strategy = GenerationType.IDENTITY)
	private Long id;
}

Persisting 5 entities:

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 5; i++) {
			session.persist(new IdentityIdentifier());
		}
		session.flush();
		return null;
	}
});

Will execute one query after the other (there is no JDBC batching involved):

Query:{[insert into identityIdentifier (id) values (default)][]} 
Query:{[insert into identityIdentifier (id) values (default)][]} 
Query:{[insert into identityIdentifier (id) values (default)][]} 
Query:{[insert into identityIdentifier (id) values (default)][]} 
Query:{[insert into identityIdentifier (id) values (default)][]} 

Aside from disabling JDBC batching, the IDENTITY generator strategy doesn’t work with the Table per concrete class inheritance model, because there could be multiple subclass entities having the same identifier and a base class query will end up retrieving entities with the same identifier (even if belonging to different types).

SEQUENCE

The SEQUENCE generator (defined in the SQL:2003 standard) is supported by:

A SEQUENCE is a database object that generates incremental integers on each successive request. SEQUENCES are much more flexible than IDENTIFIER columns because:

  • A SEQUENCE is table free and the same sequence can be assigned to multiple columns or tables
  • A SEQUENCE may preallocate values to improve performance
  • A SEQUENCE may define an incremental step, allowing us to benefit from a “pooled” Hilo algorithm
  • A SEQUENCE doesn’t restrict Hibernate JDBC batching
  • A SEQUENCE doesn’t restrict Hibernate inheritance models

Let’s define a Entity using the SEQUENCE generation strategy:

@Entity(name = "sequenceIdentifier")
public static class SequenceIdentifier {
	@Id
	@GenericGenerator(name = "sequence", strategy = "sequence", parameters = {
			@org.hibernate.annotations.Parameter(name = "sequenceName", value = "sequence"),
			@org.hibernate.annotations.Parameter(name = "allocationSize", value = "1"),
	})
	@GeneratedValue(generator = "sequence", strategy=GenerationType.SEQUENCE)
	private Long id;
}

I used the “sequence” generator because I didn’t want Hibernate to choose a SequenceHiLoGenerator or a SequenceStyleGenerator on our behalf.

Adding 5 entities:

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 5; i++) {
			session.persist(new SequenceIdentifier());
		}
		session.flush();
		return null;
	}
});

Generate the following queries:

Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Query:{[call next value for hibernate_sequence][]} 
Query:{[insert into sequenceIdentifier (id) values (?)][1]} {[insert into sequenceIdentifier (id) values (?)][2]} 
Query:{[insert into sequenceIdentifier (id) values (?)][3]} {[insert into sequenceIdentifier (id) values (?)][4]} 
Query:{[insert into sequenceIdentifier (id) values (?)][5]} 

This table the inserts are batched, but we know have 5 sequence calls prior to inserting the entities. This can be optimized by using a HILO algorithm.

TABLE (SEQUENCE)

There is another database independent alternative to generating sequences. One or multiple tables can be used to hold the identifier sequence counter. But it means trading write performance for database portability.

While IDENTITY and SEQUENCES are transaction-less, using a database table mandate ACID, for synchronizing multiple concurrent id generation requests.

This is made possible by using row-level locking which comes at a higher cost than IDENTITY or SEQUENCE generators.

The sequence must be calculated in a separate database transaction and this requires the IsolationDelegate mechanism, which has support for both local (JDBC) and global(JTA) transactions.

  • For local transactions, it must open a new JDBC connection, therefore putting more pressure on the current connection pooling mechanism.
  • For global transactions, it requires suspending the current running transaction. After the sequence value is generated, the actual transaction has to be resumed. This process has its own cost, so the overall application performance might be affected.

Let’s define a Entity using the TABLE generation strategy:

@Entity(name = "tableIdentifier")
public static class TableSequenceIdentifier {

	@Id
	@GenericGenerator(name = "table", strategy = "enhanced-table", parameters = {
			@org.hibernate.annotations.Parameter(name = "table_name", value = "sequence_table")
	})
	@GeneratedValue(generator = "table", strategy=GenerationType.TABLE)
	private Long id;
}
	

I used the newer “enhanced-table” generator, because the legacy “table” generator has been deprecated.

Adding 5 entities:

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 5; i++) {
			session.persist(new TableSequenceIdentifier());
		}
		session.flush();
		return null;
	}
});

Generate the following queries:

Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[insert into sequence_table (sequence_name, next_val)  values (?,?)][default,1]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][2,1,default]} 
Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][3,2,default]} 
Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][4,3,default]} 
Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][5,4,default]} 
Query:{[select tbl.next_val from sequence_table tbl where tbl.sequence_name=? for update][default]} 
Query:{[update sequence_table set next_val=?  where next_val=? and sequence_name=?][6,5,default]} 
Query:{[insert into tableIdentifier (id) values (?)][1]} {[insert into tableIdentifier (id) values (?)][2]} 
Query:{[insert into tableIdentifier (id) values (?)][3]} {[insert into tableIdentifier (id) values (?)][4]} 
Query:{[insert into tableIdentifier (id) values (?)][5]}

The table generator allows JDBC batching but it resorts to SELECT FOR UPDATE queries. The row level locking is definitely less efficient than using a native IDENTITY or SEQUENCE.

So, based on your application requirements you have multiple options to choose from. There isn’t one single winning strategy, each one having both advantages and disadvantages.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

Hibernate and UUID identifiers

Introduction

In my previous post I talked about UUID surrogate keys and the use cases when there are more appropriate than the more common auto-incrementing identifiers.

A UUID database type

There are several ways to represent a 128-bit UUID, and whenever in doubt I like to resort to Stack Exchange for an expert advice.

Because table identifiers are usually indexed, the more compact the database type the less space will the index require. From the most efficient to the least, here are our options:

  1. Some databases (PostgreSQL, SQL Server) offer a dedicated UUID storage type
  2. Otherwise we can store the bits as a byte array (e.g. RAW(16) in Oracle or the standard BINARY(16) type)
  3. Alternatively we can use 2 bigint (64-bit) columns, but a composite identifier is less efficient than a single column one
  4. We can store the hex value in a CHAR(36) column (e.g 32 hex values and 4 dashes), but this will take the most amount of space, hence it’s the least efficient alternative

Hibernate offers many identifier strategies to choose from and for UUID identifiers we have three options:

  • the assigned generator accompanied by the application logic UUID generation
  • the hexadecimal “uuid” string generator
  • the more flexible “uuid2″ generator, allowing us to use java.lang.UUID, a 16 byte array or a hexadecimal String value

The assigned generator

The assigned generator allows the application logic to control the entity identifier generation process. By simply omitting the identifier generator definition, Hibernate will consider the assigned identifier. This example uses a BINARY(16) column type, since the target database is HSQLDB.

@Entity(name = "assignedIdentifier")
public static class AssignedIdentifier {

    @Id
    @Column(columnDefinition = "BINARY(16)")
    private UUID uuid;

    public AssignedIdentifier() {
    }

    public AssignedIdentifier(UUID uuid) {
        this.uuid = uuid;
    }
}

Persisting an Entity:

session.persist(new AssignedIdentifier(UUID.randomUUID()));
session.flush();

Generates exactly one INSERT statement:

Query:{[insert into assignedIdentifier (uuid) values (?)][[B@76b0f8c3]}

Let’s see what happens when issuing a merge instead:

session.merge(new AssignedIdentifier(UUID.randomUUID()));
session.flush();

We get both a SELECT and an INSERT this time:

Query:{[select assignedid0_.uuid as uuid1_0_0_ from assignedIdentifier assignedid0_ where assignedid0_.uuid=?][[B@23e9436c]} 
Query:{[insert into assignedIdentifier (uuid) values (?)][[B@2b37d486]} 

The persist method takes a transient entity and attaches it to the current Hibernate session. If there is an already attached entity or if the current entity is detached we’ll get an exception.

The merge operation will copy the current object state into the existing persisted entity (if any). This operation works for both transient and detached entities, but for transient entities persist is much more efficient than the merge operation.

For assigned identifiers, a merge will always require a select, since Hibernate cannot know if there is already a persisted entity having the same identifier. For other identifier generators Hibernate looks for a null identifier to figure out if the entity is in the transient state.

That’s why the Spring Data SimpleJpaRepository#save(S entity) method is not the best choice for Entities using an assigned identifier:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

For assigned identifiers, this method will always pick merge instead of persist, hence you will get both a SELECT and an INSERT for every newly inserted entity.

The UUID generators

This time we won’t assign the identifier ourselves but have Hibernate generate it on our behalf. When a null identifier is encountered, Hibernate assumes a transient entity, for whom it generates a new identifier value. This time, the merge operation won’t require a select query prior to inserting a transient entity.

The UUIDHexGenerator

The UUID hex generator is the oldest UUID identifier generator and it’s registered under the “uuid” type. It can generate a 32 hexadecimal UUID string value (it can also use a separator) having the following pattern: 8{sep}8{sep}4{sep}8{sep}4.

This generator is not IETF RFC 4122 compliant, which uses the 8-4-4-4-12 digit representation.

@Entity(name = "uuidIdentifier")
public static class UUIDIdentifier {

    @GeneratedValue(generator = "uuid")
    @GenericGenerator(name = "uuid", strategy = "uuid")
    @Column(columnDefinition = "CHAR(32)")
    @Id
    private String uuidHex;
}

Persisting or merging a transient entity:

session.persist(new UUIDIdentifier());
session.flush();
session.merge(new UUIDIdentifier());
session.flush();

Generates one INSERT statement per operation:

Query:{[insert into uuidIdentifier (uuidHex) values (?)][2c929c6646f02fda0146f02fdbfa0000]} 
Query:{[insert into uuidIdentifier (uuidHex) values (?)][2c929c6646f02fda0146f02fdbfc0001]} 

You can check out the string parameter value sent to the SQL INSERT queries.

The UUIDGenerator

The newer UUID generator is IETF RFC 4122 compliant (variant 2) and it offers pluggable generation strategies. It’s registered under the “uuid2″ type and it offers a broader type range to choose from:

@Entity(name = "uuid2Identifier")
public static class UUID2Identifier {

    @GeneratedValue(generator = "uuid2")
    @GenericGenerator(name = "uuid2", strategy = "uuid2")
    @Column(columnDefinition = "BINARY(16)")
    @Id
    private UUID uuid;
}

Persisting or merging a transient entity:

session.persist(new UUID2Identifier());
session.flush();
session.merge(new UUID2Identifier());
session.flush();

Generates one INSERT statement per operation:

Query:{[insert into uuid2Identifier (uuid) values (?)][[B@68240bb]} 
Query:{[insert into uuid2Identifier (uuid) values (?)][[B@577c3bfa]}

This SQL INSERT queries are using a byte array as we configured the @Id column definition.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

The hi/lo algorithm

Introduction

In my previous post I talked about various database identifier strategies, you need to be aware of when designing the database model. We concluded that database sequences are very convenient, because they are both flexible and efficient for most use cases.

But even with cached sequences, the application requires a database round-trip for every new the sequence value. If your applications demands a high number of insert operations per transaction, the sequence allocation may be optimized with a hi/lo algorithm.

The hi/lo algorithm

The hi/lo algorithms splits the sequences domain into “hi” groups. A “hi” value is assigned synchronously. Every “hi” group is given a maximum number of “lo” entries, that can by assigned off-line without worrying about concurrent duplicate entries.

  1. The “hi” token is assigned by the database, and two concurrent calls are guaranteed to see unique consecutive values
  2. Once a “hi” token is retrieved we only need the “incrementSize” (the number of “lo” entries)
  3. The identifiers range is given by the following formula:
  4. [(hi -1) * incrementSize) + 1, (hi * incrementSize) + 1)

    and the “lo” value will be taken from:

    [0, incrementSize)

    starting from

    [(hi -1) * incrementSize) + 1)

  5. When all “lo” values are used, a new “hi” value is fetched and the cycle continues

Here you can have an example of two concurrent transactions, each one inserting multiple entities:

hi_lo_algorithm

Testing the theory

If we have the following entity:

@Entity
public class Hilo {

    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "hilo_sequence_generator")
    @GenericGenerator(
            name = "hilo_sequence_generator",
            strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
            parameters = {
                    @Parameter(name = "sequence_name", value = "hilo_seqeunce"),
                    @Parameter(name = "initial_value", value = "1"),
                    @Parameter(name = "increment_size", value = "3"),
                    @Parameter(name = "optimizer", value = "hilo")
            })
    @Id
    private Long id;
}

We can check how many database sequence round-trips are issued when inserting multiple entities:

@Test
public void testHiloIdentifierGenerator() {
	doInTransaction(new TransactionCallable<Void>() {
		@Override
		public Void execute(Session session) {
			for(int i = 0; i < 8; i++) {
				Hilo hilo = new Hilo();
				session.persist(hilo);
				session.flush();
			}
			return null;
		}
	});
}

Which end-ups generating the following SQL queries:

Query:{[call next value for hilo_seqeunce][]} 
Query:{[insert into Hilo (id) values (?)][1]} 
Query:{[insert into Hilo (id) values (?)][2]} 
Query:{[insert into Hilo (id) values (?)][3]} 
Query:{[call next value for hilo_seqeunce][]} 
Query:{[insert into Hilo (id) values (?)][4]} 
Query:{[insert into Hilo (id) values (?)][5]} 
Query:{[insert into Hilo (id) values (?)][6]} 
Query:{[call next value for hilo_seqeunce][]} 
Query:{[insert into Hilo (id) values (?)][7]} 
Query:{[insert into Hilo (id) values (?)][8]} 

As you can see we have only 3 sequence calls for 8 inserted entities. The more entity inserts a transaction will we require the better the performance gain we’ll obtain from reducing the database sequence round-trips.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

Database primary key flavors

Types of primary keys

All database tables must have one primary key column. The primary key uniquely identifies a row within a table therefore it’s bound by the following constraints:

  • UNIQUE
  • NOT NULL
  • IMMUTABLE

When choosing a primary key we must take into consideration the following aspects:

  • the primary key may be used for joining other tables through a foreign key relationship
  • the primary key usually has an associated default index, so the more compact the data type the less space the index will take
  • a simple key performs better than a compound one
  • the primary key assignment must ensure uniqueness even in highly concurrent environments

When choosing a primary key generator strategy the options are:

  1. natural keys, using a column combination that guarantees individual rows uniqueness
  2. surrogate keys, that are generated independently of the current row data

Natural keys

Natural keys’ uniqueness is enforced by external factors (e.g. person unique identifiers, social security numbers, vehicle identification numbers).

Natural keys are convenient because they have an outside world equivalent and they don’t require any extra database processing. We can therefore know the primary key even before inserting the actual row into the database, which simplifies batch inserts.

If the natural key is a single numeric value the performance is comparable to that of surrogate keys.

For compound keys we must be aware of possible performance penalties:

  • compound key joins are slower than single key ones
  • compound key indexes require more space than their single key counterparts

Non-numerical keys are less efficient than numeric ones (integer, bigint), for both indexing and joining. A CHAR(17) natural key (e.g. vehicle identification number) occupies 17 bytes as opposed to 4 bytes (32 bit integer) or 8 bytes (64 bit bigint).

The initial schema design uniqueness assumptions may not forever hold true. Let’s say we’d used one specific country citizen numeric code for identifying all application users. If we now need to support other countries that don’t have such citizen numeric code or the code clashed with existing entries, than we can conclude that the schema evolution is possibly hindered.

If the natural key uniqueness constraints change it’s going to be very difficult to update both the primary keys (if we manage to drop the primary key constraints anyway) and all associated foreign key relationships.

Surrogate keys

Surrogate keys are generated independently of the current row data, so the other column constraints may freely evolve according to the application business requirements.

The database system may manage the surrogate key generation and most often the key is of a numeric type (e.g. integer or bigint), being incremented whenever there is a need for a new key.

If we want to control the surrogate key generation we can employ a 128-bit GUID or UUID. This simplifies batching and may improve the insert performance since the additional database key generation processing is no longer required. Even if this strategy is not so widely adopted it’s worth considering when designing the database model.

When the database identifier generation responsibility falls to the database system, there are several strategies for auto incrementing surrogate keys:

Database engine Auto incrementing strategy
Oracle SEQUENCE, IDENTITY (Oracle 12c)
MSSQL IDENTITY, SEQUENCE (MSSQL 2012)
PostgreSQL SEQUENCE, SERIAL TYPE
MySQL AUTO_INCREMENT
DB2 IDENTITY, SEQUENCE
HSQLDB IDENTITY, SEQUENCE

Design aspects

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

What I learned at Topconf Bucharest

Introduction

I’ve got back from Topconf Romania, a developer to developer conference that emerged in Tallinn and for the first time this year it was also held in Bucharest .

As an architect I assumed I’d be after technical speeches but I got really impressed by some management related presentations as well.

Lessons learned

A conference is a great learning experience. New technologies are being advertised and software paradigms get dissected and questioned by both the speakers and the attenders. There were some great ideas I came back with and I’ll share with you as follows:

It’s all about feed-back

Feed-back is the tool of wise people. Every action has an associated reaction and the feed-back is a reinforcing factor you should never ignore.

Nothing is perfect but feed-back can help you get better. Feed-back is probably the only suitable learning technique in the ever changing environment of software development.

We inherently use feed-back to build better relationships, to shape our personalities or understand a problem space whose function depends on way too many variables to think of any formula that can always give you the right result.

We like to follow rather than reason

Most people would rather follow a Dogma than question it. A very good example is how we’ve been managing software over the years.

Nigel Runnels-Moss spoke about Agile anti-patterns and the future of management which made me question the way we approach management in the first place.

Agile has become a buzzword, everybody wanting to be part of the Agile movement. Although it started as a feed-back driven methodology it recently spread to large rigid organizations, governments and even the military.

One reason for Scrum success is the rule based approach. You get a list of techniques and that’s all you need to care of. Most will religiously follow them without understanding the reason behind them or their effectiveness in their own specific contexts.

We always have to question everything we do and understand the reason for every action we take. Agile is not a set of rules but a philosophy that takes time and a great deal of thought to properly master it. You should understand what it offers before considering adopting it. You should also analyze feed-back and check how it works for you. There is no such thing of a methodology that fits all. The best methodologies are the ones that are tailored to specific projects and teams, being constantly driven by feed-back throughout a project life-cycle.

Software is more about people

The technical aspects of software are deterministic, it’s people that make it hard to manage deterministically. People actions, inner and outer team communication and collaboration bring the chaos theory to software industry.

Communication is our best friend

“How to win friend and influence people” should be the first book you ever read. Lynn Myrick gave a great introduction to the true importance of communication. She runs communication dojos that help people become better communicators and we should all follow her example and invest time and effort for overcoming communication burdens.

Never underestimate security threats

Another great talk was Joseph Carson‘s Future of Security presentation. In 2013 were recorded more security threads than in all previous Internet years combined. The Mobile market, the Cloud computing and the Internet of things demand careful security planning and you shouldn’t underestimate its importance.

Scalable real-time searching

Costin Leau gave a great talk on ElasticSearch capabilities and how it enables scalable real-time searching. We’ve previously used Lucene through Hibernate Search and SOLR but ElasticSearch addresses the scalability aspects of real-time searching. It also offers visualization (Kibana) and logging (LogStash) tooling.

Conclusion

Topconf Bucharest was a great success and it triggered the urge to become a speaker myself. Watching all these great people made realize how much they have invested in becoming domain experts. A great speaker both a domain expert and a great communicator. Striving for becoming a speaker can make you a better professional so I think it’s worth the effort.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.