Hibernate collections optimistic locking

Introduction

Hibernate provides an optimistic locking mechanism that prevents lost updates even for long conversations. When the entity state spans multiple user requests (extended persistence context or detached entities), Hibernate can guarantee application-level repeatable reads.

The dirty checking mechanism detects entity state changes and increments the entity version. While basic property changes are always taken into consideration, Hibernate collections are more subtle in this regard.

Owned vs Inverse collections

In relational databases, two records are associated through a foreign key reference. In this relationship, the referenced record is the parent while the referencing row (the foreign key side) is the child. A non-null foreign key may only reference an existing parent record.

In the Object-oriented space this association can be represented in both directions. We can have a many-to-one reference from a child to parent and the parent can also have a one-to-many children collection.

Because both sides could potentially control the database foreign key state, we must ensure that only one side owns this association. Only the owning-side state changes are propagated to the database, and the non-owning side has traditionally been referred to as the inverse side.

Next I’ll describe the most common ways of modelling this association.

The unidirectional parent-owning-side-child association mapping

Only the parent side has a @OneToMany non-inverse children collection. The child entity doesn’t reference the parent entity at all.

@Entity(name = "post")
public class Post {
	...
	@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
	private List<Comment> comments = new ArrayList<Comment>();
	...
}	

The unidirectional parent-owning-side-child component association mapping

The child side doesn’t always have to be an entity and we might model it as a component type instead. An Embeddable object (component type) may contain both basic types and association mappings but it can never contain an @Id. The Embeddable object is persisted/removed along with its owning entity.

The parent has an @ElementCollection children association. The child component may only reference the parent through the non-queryable, Hibernate-specific @Parent annotation.

@Entity(name = "post")
public class Post {
	...
	@ElementCollection
	@JoinTable(name = "post_comments", joinColumns = @JoinColumn(name = "post_id"))
	@OrderColumn(name = "comment_index")
	private List<Comment> comments = new ArrayList<Comment>();
	...

	public void addComment(Comment comment) {
		comment.setPost(this);
		comments.add(comment);
	}
}	

@Embeddable
public class Comment {
	...
	@Parent
	private Post post;
	...
}

The bidirectional parent-owning-side-child association mapping

The parent is the owning side so it has a @OneToMany non-inverse (without a mappedBy directive) children collection. The child entity references the parent entity through a @ManyToOne association that’s neither insertable nor updatable:

@Entity(name = "post")
public class Post {
	...
	@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
	private List<Comment> comments = new ArrayList<Comment>();
	...

	public void addComment(Comment comment) {
		comment.setPost(this);
		comments.add(comment);
	}
}	

@Entity(name = "comment")
public class Comment {
	...
	@ManyToOne
	@JoinColumn(name = "post_id", insertable = false, updatable = false)
	private Post post;
	...
}

The bidirectional child-owning-side-parent association mapping

The child entity references the parent entity through a @ManyToOne association, and the parent has a mappedBy @OneToMany children collection. The parent side is the inverse side so only the @ManyToOne state changes are propagated to the database.

Even if there’s only one owning side, it’s always good practice to keep both sides in sync by using add/removeChild()-style methods.

@Entity(name = "post")
public class Post {
	...
	@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true, mappedBy = "post")
	private List<Comment> comments = new ArrayList<Comment>();
	...

	public void addComment(Comment comment) {
		comment.setPost(this);
		comments.add(comment);
	}
}	

@Entity(name = "comment")
public class Comment {
	...
	@ManyToOne
	private Post post;	
	...
}
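
Only addComment() is shown above; a minimal sketch of the removeComment() counterpart (assuming orphanRemoval is enabled, so that dereferenced children get deleted) could look like this:

public void removeComment(Comment comment) {
	// break both sides of the association; with orphanRemoval = true,
	// the dereferenced Comment is scheduled for deletion at flush time
	comment.setPost(null);
	comments.remove(comment);
}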

The unidirectional child-owning-side-parent association mapping

The child entity references the parent through a @ManyToOne association. The parent doesn’t have a @OneToMany children collection so the child entity becomes the owning side. This association mapping resembles the relational data foreign key linkage.

@Entity(name = "comment")
public class Comment {
    ...
    @ManyToOne
    private Post post;	
    ...
}

Collection versioning

Section 3.4.2 of the JPA 2.1 specification defines optimistic locking as:

The version attribute is updated by the persistence provider runtime when the object is written to the database. All non-relationship fields and properties and all relationships owned by the entity are included in version checks [35].

[35] This includes owned relationships maintained in join tables

N.B. Only owning-side children collections can update the parent version.
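
None of the mappings above shows the version attribute itself. Here is a minimal sketch of what the Post entity is assumed to declare (a numeric logical clock, which the test case below reads through getVersion()):

@Entity(name = "post")
public class Post {
	...
	// incremented by Hibernate whenever the entity is found dirty at flush
	// time, including owning-side children collection changes
	@Version
	private int version;

	public int getVersion() {
		return version;
	}
	...
}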

Testing time

Let’s test how the parent-child association type affects the parent versioning. Because we are interested in the children collection dirty checking, the unidirectional child-owning-side-parent association is going to be skipped, as in that case the parent doesn’t contain a children collection.

Test case

The following test case is going to be used for all collection type use cases:

protected void simulateConcurrentTransactions(final boolean shouldIncrementParentVersion) {
	final ExecutorService executorService = Executors.newSingleThreadExecutor();

	doInTransaction(new TransactionCallable<Void>() {
		@Override
		public Void execute(Session session) {
			try {
				P post = postClass.newInstance();
				post.setId(1L);
				post.setName("Hibernate training");
				session.persist(post);
				return null;
			} catch (Exception e) {
				throw new IllegalArgumentException(e);
			}
		}
	});

	// primary transaction: load the Post, then let a secondary transaction
	// add a child Comment before the primary flush
	doInTransaction(new TransactionCallable<Void>() {
		@Override
		public Void execute(final Session session) {
			final P post = (P) session.get(postClass, 1L);
			try {
				executorService.submit(new Callable<Void>() {
					@Override
					public Void call() throws Exception {
						return doInTransaction(new TransactionCallable<Void>() {
							@Override
							public Void execute(Session _session) {
								try {
									P otherThreadPost = (P) _session.get(postClass, 1L);
									int loadTimeVersion = otherThreadPost.getVersion();
									assertNotSame(post, otherThreadPost);
									assertEquals(0L, otherThreadPost.getVersion());
									C comment = commentClass.newInstance();
									comment.setReview("Good post!");
									otherThreadPost.addComment(comment);
									_session.flush();
									if (shouldIncrementParentVersion) {
										assertEquals(otherThreadPost.getVersion(), loadTimeVersion + 1);
									} else {
										assertEquals(otherThreadPost.getVersion(), loadTimeVersion);
									}
									return null;
								} catch (Exception e) {
									throw new IllegalArgumentException(e);
								}
							}
						});
					}
				}).get();
			} catch (Exception e) {
				throw new IllegalArgumentException(e);
			}
			// flushing the name change runs the version check; whether it fails
			// depends on whether the children collection is owning-side
			post.setName("Hibernate Master Class");
			session.flush();
			return null;
		}
	});
}

The unidirectional parent-owning-side-child association testing

#create tables
Query:{[create table comment (id bigint generated by default as identity (start with 1), review varchar(255), primary key (id))][]} 
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[create table post_comment (post_id bigint not null, comments_id bigint not null, comment_index integer not null, primary key (post_id, comment_index))][]} 
Query:{[alter table post_comment add constraint UK_se9l149iyyao6va95afioxsrl  unique (comments_id)][]} 
Query:{[alter table post_comment add constraint FK_se9l149iyyao6va95afioxsrl foreign key (comments_id) references comment][]} 
Query:{[alter table post_comment add constraint FK_6o1igdm04v78cwqre59or1yj1 foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_1_0_, entityopti0_.name as name2_1_0_, entityopti0_.version as version3_1_0_ from post entityopti0_ where entityopti0_.id=?][1]} 

#insert comment in secondary transaction
#optimistic locking post version update in secondary transaction
Query:{[insert into comment (id, review) values (default, ?)][Good post!]} 
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate training,1,1,0]} 
Query:{[insert into post_comment (post_id, comment_index, comments_id) values (?, ?, ?)][1,0,1]} 

#optimistic locking exception in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]}
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [com.vladmihalcea.hibernate.masterclass.laboratory.concurrency.EntityOptimisticLockingOnUnidirectionalCollectionTest$Post#1] 

The unidirectional parent-owning-side-child component association testing

#create tables
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[create table post_comments (post_id bigint not null, review varchar(255), comment_index integer not null, primary key (post_id, comment_index))][]} 
Query:{[alter table post_comments add constraint FK_gh9apqeduab8cs0ohcq1dgukp foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_0_0_, entityopti0_.name as name2_0_0_, entityopti0_.version as version3_0_0_ from post entityopti0_ where entityopti0_.id=?][1]} 
Query:{[select comments0_.post_id as post_id1_0_0_, comments0_.review as review2_1_0_, comments0_.comment_index as comment_3_0_ from post_comments comments0_ where comments0_.post_id=?][1]} 

#insert comment in secondary transaction
#optimistic locking post version update in secondary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate training,1,1,0]} 
Query:{[insert into post_comments (post_id, comment_index, review) values (?, ?, ?)][1,0,Good post!]} 

#optimistic locking exception in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]} 
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [com.vladmihalcea.hibernate.masterclass.laboratory.concurrency.EntityOptimisticLockingOnComponentCollectionTest$Post#1]	

The bidirectional parent-owning-side-child association testing

#create tables
Query:{[create table comment (id bigint generated by default as identity (start with 1), review varchar(255), post_id bigint, primary key (id))][]} 
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[create table post_comment (post_id bigint not null, comments_id bigint not null)][]} 
Query:{[alter table post_comment add constraint UK_se9l149iyyao6va95afioxsrl  unique (comments_id)][]} 
Query:{[alter table comment add constraint FK_f1sl0xkd2lucs7bve3ktt3tu5 foreign key (post_id) references post][]} 
Query:{[alter table post_comment add constraint FK_se9l149iyyao6va95afioxsrl foreign key (comments_id) references comment][]} 
Query:{[alter table post_comment add constraint FK_6o1igdm04v78cwqre59or1yj1 foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_1_0_, entityopti0_.name as name2_1_0_, entityopti0_.version as version3_1_0_ from post entityopti0_ where entityopti0_.id=?][1]} 
Query:{[select comments0_.post_id as post_id1_1_0_, comments0_.comments_id as comments2_2_0_, entityopti1_.id as id1_0_1_, entityopti1_.post_id as post_id3_0_1_, entityopti1_.review as review2_0_1_, entityopti2_.id as id1_1_2_, entityopti2_.name as name2_1_2_, entityopti2_.version as version3_1_2_ from post_comment comments0_ inner join comment entityopti1_ on comments0_.comments_id=entityopti1_.id left outer join post entityopti2_ on entityopti1_.post_id=entityopti2_.id where comments0_.post_id=?][1]} 

#insert comment in secondary transaction
#optimistic locking post version update in secondary transaction
Query:{[insert into comment (id, review) values (default, ?)][Good post!]} 
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate training,1,1,0]} 
Query:{[insert into post_comment (post_id, comments_id) values (?, ?)][1,1]} 

#optimistic locking exception in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]} 
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [com.vladmihalcea.hibernate.masterclass.laboratory.concurrency.EntityOptimisticLockingOnBidirectionalParentOwningCollectionTest$Post#1]

The bidirectional child-owning-side-parent association testing

#create tables
Query:{[create table comment (id bigint generated by default as identity (start with 1), review varchar(255), post_id bigint, primary key (id))][]} 
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[alter table comment add constraint FK_f1sl0xkd2lucs7bve3ktt3tu5 foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_1_0_, entityopti0_.name as name2_1_0_, entityopti0_.version as version3_1_0_ from post entityopti0_ where entityopti0_.id=?][1]} 

#insert comment in secondary transaction
#post version is not incremented in secondary transaction
Query:{[insert into comment (id, post_id, review) values (default, ?, ?)][1,Good post!]} 
Query:{[select count(id) from comment where post_id =?][1]} 

#update works in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]} 

Overruling default collection versioning

If the default owning-side collection versioning doesn’t suit your use case, you can always overrule it with the Hibernate @OptimisticLock annotation.

Let’s overrule the default parent version update mechanism for the bidirectional parent-owning-side-child association:

@Entity(name = "post")
public class Post {
	...
	@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
	@OptimisticLock(excluded = true)
	private List<Comment> comments = new ArrayList<Comment>();
	...

	public void addComment(Comment comment) {
		comment.setPost(this);
		comments.add(comment);
	}
}	

@Entity(name = "comment")
public class Comment {
	...
	@ManyToOne
	@JoinColumn(name = "post_id", insertable = false, updatable = false)
	private Post post;
	...
}

This time, the children collection changes won’t trigger a parent version update:

#create tables
Query:{[create table comment (id bigint generated by default as identity (start with 1), review varchar(255), post_id bigint, primary key (id))][]} 
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[create table post_comment (post_id bigint not null, comments_id bigint not null)][]} 
Query:{[alter table post_comment add constraint UK_se9l149iyyao6va95afioxsrl  unique (comments_id)][]} 
Query:{[alter table comment add constraint FK_f1sl0xkd2lucs7bve3ktt3tu5 foreign key (post_id) references post][]} 
Query:{[alter table post_comment add constraint FK_se9l149iyyao6va95afioxsrl foreign key (comments_id) references comment][]} 
Query:{[alter table post_comment add constraint FK_6o1igdm04v78cwqre59or1yj1 foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_1_0_, entityopti0_.name as name2_1_0_, entityopti0_.version as version3_1_0_ from post entityopti0_ where entityopti0_.id=?][1]} 
Query:{[select comments0_.post_id as post_id1_1_0_, comments0_.comments_id as comments2_2_0_, entityopti1_.id as id1_0_1_, entityopti1_.post_id as post_id3_0_1_, entityopti1_.review as review2_0_1_, entityopti2_.id as id1_1_2_, entityopti2_.name as name2_1_2_, entityopti2_.version as version3_1_2_ from post_comment comments0_ inner join comment entityopti1_ on comments0_.comments_id=entityopti1_.id left outer join post entityopti2_ on entityopti1_.post_id=entityopti2_.id where comments0_.post_id=?][1]} 

#insert comment in secondary transaction
Query:{[insert into comment (id, review) values (default, ?)][Good post!]} 
Query:{[insert into post_comment (post_id, comments_id) values (?, ?)][1,1]} 

#update works in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]} 

Conclusion

It’s very important to understand how various modelling structures impact concurrency patterns. The owning-side collection changes are taken into consideration when incrementing the parent version number, and you can always bypass this mechanism using the @OptimisticLock annotation.

Code available on GitHub.

Hibernate application-level repeatable reads

Introduction

In my previous post I described how application-level transactions offer a suitable concurrency control mechanism for long conversations.

All entities are loaded within the context of a Hibernate Session, acting as a transactional write-behind cache.

A Hibernate persistence context can hold one and only one reference to a given entity. The first-level cache guarantees session-level repeatable reads.

If the conversation spans multiple requests, we can have application-level repeatable reads. Long conversations are inherently stateful, so we can opt for either detached objects or a long persistence context. But application-level repeatable reads require an application-level concurrency control strategy, such as optimistic locking.

The catch

This behavior may prove unexpected at times.

If your Hibernate Session has already loaded a given entity, then any subsequent entity query (JPQL/HQL) is going to return the very same object reference (disregarding the currently loaded database snapshot):

[Diagram: application-level repeatable read]

In this example we can see that the first level cache prevents overwriting an already loaded entity. To prove this behavior, I came up with the following test case:

final ExecutorService executorService = Executors.newSingleThreadExecutor();

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		Product product = new Product();
		product.setId(1L);
		product.setQuantity(7L);
		session.persist(product);
		return null;
	}
});

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		final Product product = (Product) session.get(Product.class, 1L);
		try {
			executorService.submit(new Callable<Void>() {
				@Override
				public Void call() throws Exception {
					return doInTransaction(new TransactionCallable<Void>() {
						@Override
						public Void execute(Session _session) {
							Product otherThreadProduct = (Product) _session.get(Product.class, 1L);
							assertNotSame(product, otherThreadProduct);
							otherThreadProduct.setQuantity(6L);
							return null;
						}
					});
				}
			}).get();
			// the entity query returns the already-loaded entity reference,
			// so the first-level cache preserves the original snapshot
			Product reloadedProduct = (Product) session.createQuery("from Product").uniqueResult();
			assertEquals(7L, reloadedProduct.getQuantity());
			// the native SQL projection bypasses the first-level cache
			// and reads the latest database state
			assertEquals(6L, ((Number) session.createSQLQuery("select quantity from Product where id = :id").setParameter("id", product.getId()).uniqueResult()).longValue());
		} catch (Exception e) {
			fail(e.getMessage());
		}
		return null;
	}
});

This test case clearly illustrates the differences between entity queries and SQL projections. While SQL query projections always load the latest database state, entity query results are managed by the first level cache, ensuring session-level repeatable reads.

Workaround 1: If your use case demands reloading the latest database entity state, you can simply refresh the entity in question.

Workaround 2: If you want an entity to be disassociated from the Hibernate first-level cache, you can easily evict it, so the next entity query can use the latest database entity value.
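
Both workarounds only take a line or two; a rough sketch (reusing the session and product variables from the test case above):

// Workaround 1: overwrite the managed entity with the latest database state
session.refresh(product);

// Workaround 2: detach the entity so the next entity query loads
// the latest database state instead of returning the cached reference
session.evict(product);
Product latestProduct = (Product) session
	.createQuery("from Product where id = :id")
	.setParameter("id", product.getId())
	.uniqueResult();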

Beyond prejudice

Hibernate is a means, not a goal. A data access layer requires both reads and writes, and neither plain-old JDBC nor Hibernate is a one-size-fits-all solution. A data knowledge stack is much more appropriate for getting the most out of your read queries and write DML statements.

While native SQL remains the de facto relational data reading technique, Hibernate excels in writing data. Hibernate is a persistence framework and you should never forget that. Loading entities makes sense if you plan on propagating changes back to the database. You don’t need to load entities for displaying read-only views, an SQL projection being a much better alternative in this case.

Session-level repeatable reads prevent lost updates in concurrent-write scenarios, so there’s a good reason why entities don’t get refreshed automatically. We might have chosen to manually flush dirty properties, and an automatic entity refresh could overwrite pending changes that were already synchronized.

Designing the data access patterns is not a trivial task, and a solid integration testing foundation is worth investing in. To avoid unknown behaviors, I strongly advise you to validate all automatically generated SQL statements and prove their effectiveness and efficiency.

Code available on GitHub.

MongoDB Incremental Migration Scripts

Introduction

An incremental software development process requires an incremental database migration strategy.

I remember working on an enterprise application where the hibernate.hbm2ddl.auto was the default data migration tool.

Updating the production environment required intensive preparation, and the migration scripts were only created on the spot. An unforeseen error could have led to production data corruption.

Incremental updates to the rescue

The incremental database update is a technical feature that needs to be addressed in the very first application development iterations.

We used to develop our own custom data migration implementations, but spending time writing and supporting such frameworks always works against your current project budget.

A project must be shipped with both the application code and all associated database schema/data update scripts. Using incremental migration scripts allows us to automate the deployment process and to take advantage of continuous delivery.

Nowadays you don’t have to implement data migration tools; Flyway does a better job than all our previous custom frameworks. All database schema and data changes have to be recorded in incremental update scripts following a well-defined naming convention.

An RDBMS migration plan addresses both schema and data changes. It’s always good to separate schema and data changes, as integration tests might only use the schema migration scripts in conjunction with test-time data.
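
For reference, bootstrapping Flyway programmatically only takes a few lines. A minimal sketch, assuming the Flyway 3.x-era API and placeholder connection coordinates:

// schema scripts (e.g. V1_1__create_schema.sql) and data scripts
// (e.g. V1_2__initial_data.sql) follow the versioned naming convention
Flyway flyway = new Flyway();
flyway.setDataSource("jdbc:postgresql://localhost/app_db", "app_user", "app_pass");
flyway.migrate();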

Flyway supports all major relational database systems, but for NoSQL (e.g. MongoDB) you need to look elsewhere.

Mongeez

Mongeez is an open-source project aiming to automate MongoDB data migration. Since MongoDB is schema-less, migration scripts only target data updates.

Integrating mongeez

First you have to define a mongeez configuration file:

mongeez.xml

<changeFiles>
    <file path="v1_1__initial_data.js"/>
    <file path="v1_2__update_products.js"/>
</changeFiles>

Then you add the actual migrate scripts:

v1_1__initial_data.js

//mongeez formatted javascript
//changeset system:v1_1
db.product.insert({
    "_id": 1,
    "name" : "TV",
    "price" : 199.99,
    "currency" : 'USD',
    "quantity" : 5,
    "version" : 1
});
db.product.insert({
    "_id": 2,
    "name" : "Radio",
    "price" : 29.99,
    "currency" : 'USD',
    "quantity" : 3,
    "version" : 1
});

v1_2__update_products.js

//mongeez formatted javascript
//changeset system:v1_2
db.product.update(
    {
        name : 'TV'
    },
    {
         $inc : {
             price : -10,
             version : 1
         }
    },
    {
        multi: true
    }
);

And you need to add the MongeezRunner too:

<bean id="mongeez" class="org.mongeez.MongeezRunner" depends-on="mongo">
	<property name="mongo" ref="mongo"/>
	<property name="executeEnabled" value="true"/>
	<property name="dbName" value="${mongo.dbname}"/>
	<property name="file" value="classpath:mongodb/migration/mongeez.xml"/>
</bean>
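
If you’re not using Spring’s bean lifecycle, Mongeez can also be triggered programmatically. A minimal sketch, assuming the org.mongeez.Mongeez API and a placeholder database name:

Mongo mongo = new Mongo("localhost", 27017);
Mongeez mongeez = new Mongeez();
mongeez.setFile(new ClassPathResource("mongodb/migration/mongeez.xml"));
mongeez.setMongo(mongo);
mongeez.setDbName("test_db");
mongeez.process();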

Running mongeez

When the application first starts, the incremental scripts will be analyzed and only run if necessary:

INFO  [main]: o.m.r.FilesetXMLReader - Num of changefiles 2
INFO  [main]: o.m.ChangeSetExecutor - ChangeSet v1_1 has been executed
INFO  [main]: o.m.ChangeSetExecutor - ChangeSet v1_2 has been executed

Mongeez uses a separate MongoDB collection to record previously run scripts:

db.mongeez.find().pretty();
{
        "_id" : ObjectId("543b69eeaac7e436b2ce142d"),
        "type" : "configuration",
        "supportResourcePath" : true
}
{
        "_id" : ObjectId("543b69efaac7e436b2ce142e"),
        "type" : "changeSetExecution",
        "file" : "v1_1__initial_data.js",
        "changeId" : "v1_1",
        "author" : "system",
        "resourcePath" : "mongodb/migration/v1_1__initial_data.js",
        "date" : "2014-10-13T08:58:07+03:00"
}
{
        "_id" : ObjectId("543b69efaac7e436b2ce142f"),
        "type" : "changeSetExecution",
        "file" : "v1_2__update_products.js",
        "changeId" : "v1_2",
        "author" : "system",
        "resourcePath" : "mongodb/migration/v1_2__update_products.js",
        "date" : "2014-10-13T08:58:07+03:00"
}

Conclusion

To automate the deployment process, you need to create self-sufficient packages containing both the bytecode and all associated configuration (XML files, resource bundles and data migration scripts). Before you start writing your own custom framework, always investigate the available open-source alternatives.

Code available on GitHub.

Integration testing done right with Embedded MongoDB

Introduction

Unit testing requires isolating individual components from their dependencies. Dependencies are replaced with mocks, which simulate certain use cases. This way, we can validate the in-test component behavior across various external context scenarios.

Web components can be unit tested using mock business logic services. Services can be tested against mock data access repositories. But the data access layer is not a good candidate for unit testing, because database statements need to be validated against an actual running database system.

Integration testing database options

Ideally, our tests should run against a production-like database. But using a dedicated database server is not feasible, as we most likely have more than one developer running such integration test-suites. To isolate concurrent test runs, each developer would require a dedicated database catalog. Adding a continuous integration tool makes matters worse, since even more tests would have to run in parallel.

Lesson 1: We need a forked, test-suite-bound database

When a test suite runs, a database must be started and made available only to that particular test-suite instance. Basically, we have the following options:

  • An in-memory embedded database
  • A temporary spawned database process

The fallacy of in-memory database testing

Java offers multiple in-memory relational database options to choose from:

  • HSQLDB
  • H2
  • Apache Derby

Embedding an in-memory database is fast, and each JVM can run its own isolated database. But we are no longer testing against the actual production-like database engine, because our integration tests will validate the application behavior against a non-production database system.

Using an ORM tool may give you the false impression that all databases are equal, especially when all the generated SQL code is SQL-92 compliant.

What’s good for ORM database portability may deprive you of database-specific querying features (window functions, common table expressions, PIVOT).

So the in-memory integration testing database might not support such advanced queries. This can lead to reduced code coverage, or it can push developers to only use the common-yet-limited SQL querying features.

Even if your production database engine provides an in-memory variant, there may still be operational differences between the actual and the lightweight database versions.

Lesson 2: In-memory databases may give you the false impression that your code will also run on a production database

Spawning a production-like temporary database

Testing against the actual production database is much more valuable and that’s why I grew to appreciate this alternative.

When using MongoDB we can choose the embedded mongo plugin. This open-source project creates an external database process that can be bound to the current test-suite life-cycle.

If you’re using Maven, you can take advantage of the embedmongo-maven-plugin:

<plugin>
	<groupId>com.github.joelittlejohn.embedmongo</groupId>
	<artifactId>embedmongo-maven-plugin</artifactId>
	<version>${embedmongo.plugin.version}</version>
	<executions>
		<execution>
			<id>start</id>
			<goals>
				<goal>start</goal>
			</goals>
			<configuration>
				<port>${embedmongo.port}</port>
				<version>${mongo.test.version}</version>
				<databaseDirectory>${project.build.directory}/mongotest</databaseDirectory>
				<bindIp>127.0.0.1</bindIp>
			</configuration>
		</execution>
		<execution>
			<id>stop</id>
			<goals>
				<goal>stop</goal>
			</goals>
		</execution>
	</executions>
</plugin>

When running the plugin, the following actions are taken:

  1. A MongoDB pack is downloaded

    [INFO] --- embedmongo-maven-plugin:0.1.12:start (start) @ mongodb-facts ---
    Download Version{2.6.1}:Windows:B64 START
    Download Version{2.6.1}:Windows:B64 DownloadSize: 135999092
    Download Version{2.6.1}:Windows:B64 0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 11% 12% 13% 14% 15% 16% 17% 18% 19% 20% 21% 22% 23% 24% 25% 26% 27% 28% 29% 30% 31% 32% 33% 34% 35% 36% 37% 38% 39% 40% 41% 42% 43% 44% 45% 46% 47% 48% 49% 50% 51% 52% 53% 54% 55% 56% 57% 58% 59% 60% 61% 62% 63% 64% 65% 66% 67% 68% 69% 70% 71% 72% 73% 74% 75% 76% 77% 78% 79% 80% 81% 82% 83% 84% 85% 86% 87% 88% 89% 90% 91% 92% 93% 94% 95% 96% 97% 98% 99% 100% Download Version{2.6.1}:Windows:B64 downloaded with 3320kb/s
    Download Version{2.6.1}:Windows:B64 DONE
    
  2. Upon starting a new test suite, the MongoDB pack is unzipped under a unique location in the OS temp folder

    Extract C:\Users\vlad\.embedmongo\win32\mongodb-win32-x86_64-2008plus-2.6.1.zip START
    Extract C:\Users\vlad\.embedmongo\win32\mongodb-win32-x86_64-2008plus-2.6.1.zip DONE
    
  3. The embedded MongoDB instance is started.

    [mongod output]note: noprealloc may hurt performance in many applications
    [mongod output] 2014-10-09T23:25:16.889+0300 [DataFileSync] warning: --syncdelay 0 is not recommended and can have strange performance
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] MongoDB starting : pid=2384 port=51567 dbpath=D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest 64-bit host=VLAD
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] db version v2.6.1
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] git version: 4b95b086d2374bdcfcdf2249272fb552c9c726e8
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] allocator: system
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] options: { net: { bindIp: "127.0.0.1", http: { enabled: false }, port: 51567 }, security: { authorization: "disabled" }, storage: { dbPath: "D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest", journal: { enabled: false }, preallocDataFiles: false, smallFiles: true, syncPeriodSecs: 0.0 } }
    [mongod output] 2014-10-09T23:25:17.179+0300 [FileAllocator] allocating new datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.ns, filling with zeroes...
    [mongod output] 2014-10-09T23:25:17.179+0300 [FileAllocator] creating directory D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\_tmp
    [mongod output] 2014-10-09T23:25:17.240+0300 [FileAllocator] done allocating datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.ns, size: 16MB,  took 0.059 secs
    [mongod output] 2014-10-09T23:25:17.240+0300 [FileAllocator] allocating new datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.0, filling with zeroes...
    [mongod output] 2014-10-09T23:25:17.262+0300 [FileAllocator] done allocating datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.0, size: 16MB,  took 0.021 secs
    [mongod output] 2014-10-09T23:25:17.262+0300 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }
    [mongod output] 2014-10-09T23:25:17.262+0300 [initandlisten]     added index to empty collection
    [mongod output] 2014-10-09T23:25:17.263+0300 [initandlisten] waiting for connections on port 51567
    [mongod output] Oct 09, 2014 11:25:17 PM MongodExecutable start
    INFO: de.flapdoodle.embed.mongo.config.MongodConfigBuilder$ImmutableMongodConfig@26b3719c
    
  4. For the lifetime of the current test-suite, you can see the embedded-mongo process:

    C:\Users\vlad>netstat -ano | findstr 51567
      TCP    127.0.0.1:51567        0.0.0.0:0              LISTENING       8500
      
    C:\Users\vlad>TASKLIST /FI "PID eq 8500"
    
    Image Name                     PID Session Name        Session#    Mem Usage
    ========================= ======== ================ =========== ============
    extract-0eecee01-117b-4d2     8500 RDP-Tcp#0                  1     44,532 K  
    


  5. When the test-suite is finished, the embedded MongoDB process is stopped:

    [INFO] --- embedmongo-maven-plugin:0.1.12:stop (stop) @ mongodb-facts ---
    2014-10-09T23:25:21.187+0300 [initandlisten] connection accepted from 127.0.0.1:64117 #11 (1 connection now open)
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] terminating, shutdown command received
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] dbexit: shutdown called
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to close listening sockets...
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] closing listening socket: 520
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to flush diaglog...
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to close sockets...
    [mongod output] 2014-10-09T23:25:21.190+0300 [conn11] shutdown: waiting for fs preallocator...
    [mongod output] 2014-10-09T23:25:21.190+0300 [conn11] shutdown: closing all files...
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] closeAllFiles() finished
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] shutdown: removing fs lock...
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] dbexit: really exiting now
    [mongod output] Oct 09, 2014 11:25:21 PM de.flapdoodle.embed.process.runtime.ProcessControl stopOrDestroyProcess
    

Conclusion

The embed-mongo plugin is in no way slower than any in-memory relational database system. It makes me wonder why there isn’t such an option for open-source RDBMS (e.g. PostgreSQL). This is a great open-source project idea, and maybe Flapdoodle OSS will offer support for relational databases too.

Code available on GitHub.

Logical vs physical clock optimistic locking

Introduction

In my previous post I demonstrated why optimistic locking is the only viable solution for application-level transactions. Optimistic locking requires a version column that can be represented as:

  • a physical clock (a timestamp value taken from the system clock)
  • a logical clock (an incrementing numeric value)

This article will demonstrate why logical clocks are better suited for optimistic locking mechanisms.

System time

The system time is provided by the operating system’s internal clocking algorithm. The programmable interval timer periodically sends an interrupt signal (with a frequency of 1.193182 MHz), and the CPU handles the timer interrupt and increments a tick counter.

Both Unix and Windows record time as the number of ticks since a predefined absolute time reference (an epoch). The operating system clock resolution varies from 1 ms (Android) to 100 ns (Windows) and 1 ns (Unix).

Monotonic time

To order events, the version must advance monotonically. While incrementing a local counter is a monotonic function, system time might not always return monotonic timestamps.

Java has two ways of fetching the current system time. You can either use:

  1. System#currentTimeMillis(), which gives you the number of milliseconds elapsed since the Unix epoch

    This method doesn’t give you monotonic time results because it returns the wall clock time which is prone to both forward and backward adjustments (if NTP is used for system time synchronization).

    For monotonic currentTimeMillis, you can check Peter Lawrey’s solution or Bitronix Transaction Manager Monotonic Clock.

  2. System#nanoTime(), which returns the number of nanoseconds elapsed since an arbitrarily chosen time reference

    This method tries to use the current operating system’s monotonic clock implementation, but it falls back to wall clock time if no monotonic clock can be found (see the sketch below).
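
A minimal sketch contrasting the two clocks (doSomeWork() stands in for a hypothetical workload):

long wallClockStart = System.currentTimeMillis();
long monotonicStart = System.nanoTime();

doSomeWork(); // hypothetical workload

// may be skewed if NTP adjusted the system clock in between
long wallClockElapsedMs = System.currentTimeMillis() - wallClockStart;

// nanoTime() differences measure elapsed time, so they are safe
// for event ordering within a single JVM
long monotonicElapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - monotonicStart);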

Argument 1: System time is not always monotonically incremented.

Database timestamp precision

The SQL-92 standard defines the TIMESTAMP data type with the format YYYY-MM-DD hh:mm:ss. The fractional part is optional, and each database implements a specific timestamp data type:

RDBMS      | Timestamp resolution
-----------|---------------------
Oracle     | TIMESTAMP(9) may use up to 9 fractional digits (nanosecond precision).
MSSQL      | DATETIME2 has a precision of 100 ns.
MySQL      | MySQL 5.6.4 added microsecond precision support for the TIME, DATETIME, and TIMESTAMP types (e.g. TIMESTAMP(6)). Previous MySQL versions discard the fractional part of all temporal types.
PostgreSQL | Both the TIME and TIMESTAMP types have microsecond precision.
DB2        | TIMESTAMP(12) may use up to 12 fractional digits (picosecond precision).

When it comes to persisting timestamps, most database servers offer at least 6 fractional digits. MySQL users had long been waiting for a more precise temporal type, and version 5.6.4 finally added microsecond precision.

On a pre-5.6.4 MySQL database server, updates might be lost during the lifespan of any given second, because all transactions updating the same database row see the same version timestamp (which points to the beginning of the currently running second).

Argument 2: Pre-5.6.4 MySQL versions only support second precision timestamps.

Handling time is not that easy

Incrementing a local version number is always safer, because this operation doesn’t depend on any external factors. If the database row already contains a higher version number, your data has become stale. It’s as simple as that.
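
To make the version check concrete, here is a minimal JDBC-style sketch of the compare-and-set UPDATE that logical clock optimistic locking boils down to (table and column names are illustrative):

// returns false if another transaction already incremented the version
boolean updatePostName(Connection connection, long id, int expectedVersion, String name)
		throws SQLException {
	try (PreparedStatement statement = connection.prepareStatement(
			"UPDATE post SET name = ?, version = version + 1 " +
			"WHERE id = ? AND version = ?")) {
		statement.setString(1, name);
		statement.setLong(2, id);
		statement.setInt(3, expectedVersion);
		// an update count of 0 means the row version has moved on: stale data
		return statement.executeUpdate() == 1;
	}
}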

On the other hand, time is one of the most complicated dimensions to deal with. If you don’t believe me, check the daylight saving time handling considerations.

It took 8 versions for Java to finally come up with a mature Date/Time API. Handling time across application layers (from JavaScript, to Java middle-ware to database date/time types) makes matters worse.

Argument 3: Handling system time is a challenging job. You have to handle leap seconds, daylight saving time, time zones and various time standards.

Lessons from distributed computing

Optimistic locking is all about event ordering, so naturally we’re only interested in the happened-before relationship.

In distributed computing, logical clocks are favored over physical ones (system clock), because network time synchronization implies variable latencies.

Sequence number versioning is similar to the Lamport timestamps algorithm, with each event incrementing a single counter.

While Lamport timestamps were defined for synchronizing events across multiple distributed nodes, database optimistic locking is much simpler, because there is only one node (the database server) where all transactions (coming from concurrent client connections) are synchronized.

Argument 4: Distributed computing favors logical clocks over physical ones, because we are only interested in event ordering anyway.

Conclusion

Using physical time might seem convenient at first, but it turns out to be a naive solution. In a distributed environment, perfect system time synchronization is highly unlikely. All in all, you should always prefer logical clocks when implementing an optimistic locking mechanism.

One year of blogging

Teaching is my way of learning

Exactly one year ago today, I wrote my very first blog post. It’s been such a long journey ever since, so it’s time to draw a line and review all my technical writing accomplishments.

I realized that sharing knowledge is a way of pushing myself to reason thoroughly on a particular subject. So, both my readers and I have something to learn from my writing. Finding time to think of future blog topics, researching particular subjects, writing code snippets and the ever-present pre-publishing reviews is worth the hassle.

Under the umbrella

The Internet is huge, so being heard is not something you can leave to chance. From the start I knew that I needed to do more than write high-quality articles. When nobody knows anything about you, your only chance is strategic marketing.

Being an avid Java DZone reader, I was already familiar with their MVB program, so I decided to give it a shot. I also submitted a collaboration proposal to JavaCodeGeeks and, to my surprise, I got accepted soon after my first published post.

After several well-received articles, Allen Coin proposed me for the Dev of the Week column. That’s when I also became a DZone MVB.

Both DZone and JavaCodeGeeks allowed me to reach a much larger audience, so I am grateful for the chance they offered me.

Meeting true Java heroes

This journey has allowed me to meet so many great people I would never have had the chance of knowing otherwise.

Lukas Eder (jOOQ founder) was one of the first people to find my articles interesting. After two months of blogging, he proposed me for the 100 High-Quality Java Developers’ Blogs list. With his great jOOQ framework and clever marketing skills, Lukas has managed to build a large audience across various networking channels (blog, Reddit, Twitter, Google+). Without him promoting my posts, it would have been much more difficult to create so many connections with other software enthusiasts.

Eugen Paraschiv (owner of Baeldung) is definitely a person we should all look up to. The Romanian IT industry has developed considerably, but I’ve always felt we fall short on great software figures. His passion for software craftsmanship is a secret ingredient for becoming successful in our industry. He’s been listing my articles in many of his personal weekly reviews, allowing my posts to reach his very impressive follower network. I’ve been applying much of his wise marketing advice, and I can assure you it works like magic.

Petri Kainulainen (blogger and Spring Data book author) has been a great influence throughout my technical writing apprenticeship. I am a big fan of his articles, and I’m fascinated by his constant drive to improve. Without him retweeting my articles, I wouldn’t have reached almost 300 Twitter followers.

The list could go on, with Thorben Janssen, Rob Diana and many other Twitter followers finding my articles interesting and worth sharing.

While I first joined Twitter for article sharing, I soon discovered a great network of passionate developers. In less than a year I managed to get 286 followers:

[Chart: Twitter followers after 12 months]

Open-source contribution

From the very beginning, I created a GitHub account to host my blog posts’ code samples. On one of our projects, I realized we were missing a connection pooling monitoring tool, so I decided to write my own open-source framework.

That’s how FlexyPool was born. Judging by GitHub traffic statistics, it’s being used everywhere from real-estate platforms to the banking industry (US and Swiss), and some major companies have added connection pooling improvement tickets back-linking FlexyPool.

Towards becoming a professional trainer

Throughout my software development career, I kept on seeing all sorts of ORM and Data Access anti-patterns. That’s why I decided to create my own open-source Hibernate Master Class training material.

I have been answering Hibernate StackOverflow questions since May 2014. StackOverflow lets you see what users are struggling with, so you can better sense what’s most important to address in your training material.

[Chart: StackOverflow activity after 12 months]

Tokens of appreciation

Jelastic started searching for the most interesting developers in the world, and after submitting my request I was chosen as one of August’s most interesting developers.

Time for statistics

In my first year, I managed to write 70 posts which have been visited 88k times (on average 1250 views per article):

[Chart: monthly blog views (WordPress)]

DZone published 65 articles that have been viewed 388k times (on average 6000 views per article):

[Chart: DZone article views]

Top viewers by country

[Map: top viewers by country (WordPress)]

My top five articles

Name Views
Time to break free from the SQL-92 mindset 4430
MongoDB and the fine art of data modelling 3773
The anatomy of Connection Pooling 3051
A beginner’s guide to ACID and database transactions 2752
JOOQ Facts: From JPA Annotations to JOOQ Table Mappings 2614

My Java DZone top five articles

Name Views
Code Review Best Practices 18156
MongoDB Time Series: Introducing the Aggregation Framework 16152
Batch Processing Best Practices 14415
Good vs Bad Leader 12671
MongoDB Facts: Over 80,000 Inserts/Second on Commodity Hardware 11991

Blog followers

[Chart: blog followers (WordPress)]

Conclusion

Thank you for reading my blog. Without my readers, I would be writing in vain. Thanks for helping me throughout my first year of technical writing.

For the next year, I plan on finishing Hibernate Master Class and the Unfolding Java Transaction open-source book.

The fastest way of drawing UML class diagrams

A picture is worth a thousand words

Understanding a software design proposal is much easier once you can actually visualize it. While drawing diagrams takes some extra effort, the small time investment pays off when others need less time to understand your proposal.

Software is a means, not a goal

We write software to support other people’s business requirements. Understanding the business goals is the first step towards coming up with an effective design proposal. After gathering input from your product owner, you should write down the business story. Writing it down makes you reason more thoroughly about the business goal, and the product owner can validate your comprehension.

Once the business goals are clear, you need to move on to the technical challenges. A software design proposal is derived from both business and technical requirements. The quality of service may pose certain challenges that are better addressed by a specific design pattern or software architecture.

The class diagram drawing hassle

My ideal diagram drawing tool would simply transpose my hand-drawn sketches to a digital format. Unfortunately, I haven’t found such a tool yet, so this is how I do it:

  1. I hand-draw all concepts and interactions on a piece of paper. That’s the most rapid way of design prototyping. While I could use a UML drawing tool, I prefer the paper-and-pencil approach because changes require much less effort.
  2. Once I settle on a design proposal, I start writing down the interfaces and request/response objects as plain Java classes (see the sketch after this list). Changing these classes is pretty easy, thanks to the IntelliJ IDEA refactoring tools.
  3. When all the Java classes are ready, I simply delegate the class diagram drawing to IntelliJ IDEA.
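
To illustrate step 2, the skeleton classes don’t need any behavior at this stage. A sketch with hypothetical names, just enough structure for the IDE to draw the relationships:

// interfaces and request/response objects only: no implementation yet
public interface PostService {
	PostDto findById(Long id);
	void addComment(Long postId, CommentDto comment);
}

public class PostDto {
	private Long id;
	private String name;
	private List<CommentDto> comments;
}

public class CommentDto {
	private String review;
}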

In the end, this is what you end up with:

[Image: FlexyPool class diagram, generated with IntelliJ IDEA]
