A beginner’s guide to Java time zone handling

Basic time notions

Most web applications have to support different time zones, and properly handling time zones is by no means easy. To make matters worse, you have to make sure that timestamps are consistent across various programming languages (e.g. JavaScript on the front-end, Java in the middle-ware and MongoDB as the data repository). This post aims to explain the basic notions of absolute and relative time.

Epoch

An epoch is an absolute time reference. Most programming languages (e.g. Java, JavaScript, Python) use the Unix epoch (midnight, 1 January 1970, UTC) when expressing a given timestamp as the number of milliseconds elapsed since this fixed point-in-time reference.

Relative numerical timestamp

The relative numerical timestamp is expressed as the number of milliseconds elapsed since epoch.
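
As a quick illustration, here is a minimal sketch (plain Java SE, nothing beyond the JDK assumed) showing the same millisecond scale at work:

import java.util.Date;
import java.util.concurrent.TimeUnit;

public class RelativeTimestampExample {

	public static void main(String[] args) {
		// milliseconds elapsed since the Unix epoch
		long now = System.currentTimeMillis();

		// 0 milliseconds since epoch is 1970-01-01T00:00:00.000 UTC
		Date epoch = new Date(0L);

		// the same numeric scale, shifted by exactly one hour
		Date oneHourAfterEpoch = new Date(TimeUnit.HOURS.toMillis(1));

		System.out.println(now);
		System.out.println(epoch + " / " + oneHourAfterEpoch);
	}
}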

Time zone

Coordinated Universal Time (UTC) is the most common time standard. The UTC time zone (equivalent to GMT) is the reference that all other time zones relate to (through a positive/negative offset).

The UTC time zone is commonly referred to as Zulu time (Z) or UTC+0. The Japan time zone is UTC+9 and the Honolulu time zone is UTC-10. At the time of the Unix epoch (1 January 1970 00:00 UTC) it was 1 January 1970 09:00 in Tokyo and 31 December 1969 14:00 in Honolulu.

ISO 8601

ISO 8601 is the most widespread date/time representation standard and it uses the following date/time formats:

Time zone        Notation
UTC              1970-01-01T00:00:00.000+00:00
UTC (Zulu time)  1970-01-01T00:00:00.000Z
Tokyo            1970-01-01T09:00:00.000+09:00
Honolulu         1969-12-31T14:00:00.000-10:00

Java time basics

java.util.Date

java.util.Date is definitely the most common time-related class. It represents a fixed point in time, expressed as the relative number of milliseconds elapsed since the epoch. java.util.Date is time zone independent, except for the toString method, which uses the local time zone to generate a String representation.
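
A minimal sketch illustrating this behavior (the printed values depend on whatever default time zone the JVM is switched to):

import java.util.Date;
import java.util.TimeZone;

public class DateToStringExample {

	public static void main(String[] args) {
		Date epoch = new Date(0L);

		// the millisecond value is time zone independent
		System.out.println(epoch.getTime()); // always 0

		// toString() renders the instant against the current default time zone
		TimeZone.setDefault(TimeZone.getTimeZone("Japan"));
		System.out.println(epoch); // e.g. Thu Jan 01 09:00:00 JST 1970

		TimeZone.setDefault(TimeZone.getTimeZone("Pacific/Honolulu"));
		System.out.println(epoch); // e.g. Wed Dec 31 14:00:00 HST 1969
	}
}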

java.util.Calendar

The java.util.Calendar is both a Date/Time factory as well as a time zone aware timing instance. It’s one of the least user-friendly Java API classes to work with, as the following example demonstrates:

@Test
public void testTimeZonesWithCalendar() throws ParseException {
	assertEquals(0L, newCalendarInstanceMillis("GMT").getTimeInMillis());
	assertEquals(TimeUnit.HOURS.toMillis(-9), newCalendarInstanceMillis("Japan").getTimeInMillis());
	assertEquals(TimeUnit.HOURS.toMillis(10), newCalendarInstanceMillis("Pacific/Honolulu").getTimeInMillis());
	Calendar epoch = newCalendarInstanceMillis("GMT");
	epoch.setTimeZone(TimeZone.getTimeZone("Japan"));
	assertEquals(TimeUnit.HOURS.toMillis(-9), epoch.getTimeInMillis());
}

private Calendar newCalendarInstanceMillis(String timeZoneId) {
	Calendar calendar = new GregorianCalendar();
	calendar.set(Calendar.YEAR, 1970);
	calendar.set(Calendar.MONTH, 0);
	calendar.set(Calendar.DAY_OF_MONTH, 1);
	calendar.set(Calendar.HOUR_OF_DAY, 0);
	calendar.set(Calendar.MINUTE, 0);
	calendar.set(Calendar.SECOND, 0);
	calendar.set(Calendar.MILLISECOND, 0);
	calendar.setTimeZone(TimeZone.getTimeZone(timeZoneId));
	return calendar;
}

At the time of Unix epoch (the UTC time zone), Tokyo time was nine hours ahead, while Honolulu was ten hours behind.

Changing a Calendar time zone preserves the wall-clock field values while reinterpreting them against the new zone offset, so the relative timestamp changes along with the Calendar time zone offset.

Joda-Time and the Java 8 Date/Time API simply make java.util.Calendar obsolete, so you no longer have to employ this quirky API.
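
For comparison, here is a minimal java.time sketch (assuming Java 8 is available) expressing the same epoch-related expectations without any Calendar quirks:

import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.concurrent.TimeUnit;

public class Java8TimeZoneExample {

	public static void main(String[] args) {
		assertMillis(0L, startOfEpochDay("GMT"));
		assertMillis(TimeUnit.HOURS.toMillis(-9), startOfEpochDay("Japan"));
		assertMillis(TimeUnit.HOURS.toMillis(10), startOfEpochDay("Pacific/Honolulu"));
	}

	// midnight of 1970-01-01 interpreted in the given time zone
	private static ZonedDateTime startOfEpochDay(String zoneId) {
		return LocalDate.of(1970, 1, 1).atStartOfDay(ZoneId.of(zoneId));
	}

	private static void assertMillis(long expected, ZonedDateTime zonedDateTime) {
		long actual = zonedDateTime.toInstant().toEpochMilli();
		if (expected != actual) {
			throw new AssertionError(expected + " != " + actual);
		}
	}
}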

org.joda.time.DateTime

Joda-Time aims to fix the legacy Date/Time API by offering an immutable, fluent and time zone aware date/time API.

With Joda-Time, this is how our previous test case looks:


@Test
public void testTimeZonesWithDateTime() throws ParseException {
	assertEquals(0L, newDateTimeMillis("GMT").toDate().getTime());
	assertEquals(TimeUnit.HOURS.toMillis(-9), newDateTimeMillis("Japan").toDate().getTime());
	assertEquals(TimeUnit.HOURS.toMillis(10), newDateTimeMillis("Pacific/Honolulu").toDate().getTime());
	DateTime epoch = newDateTimeMillis("GMT");
	assertEquals("1970-01-01T00:00:00.000Z", epoch.toString());
	epoch = epoch.toDateTime(DateTimeZone.forID("Japan"));
	assertEquals(0, epoch.toDate().getTime());
	assertEquals("1970-01-01T09:00:00.000+09:00", epoch.toString());
	MutableDateTime mutableDateTime = epoch.toMutableDateTime();
	mutableDateTime.setChronology(ISOChronology.getInstance().withZone(DateTimeZone.forID("Japan")));
	assertEquals("1970-01-01T09:00:00.000+09:00", epoch.toString());
}


private DateTime newDateTimeMillis(String timeZoneId) {
	return new DateTime(DateTimeZone.forID(timeZoneId))
			.withYear(1970)
			.withMonthOfYear(1)
			.withDayOfMonth(1)
			.withTimeAtStartOfDay();
}

The DateTime fluent API is much easier to use than java.util.Calendar#set. DateTime is immutable, but we can easily switch to a MutableDateTime if that’s appropriate for our current use case.

Compared to our Calendar test case, changing the time zone of a DateTime doesn’t alter the relative timestamp at all, so it keeps pointing to the same original point in time.

It’s only the human time perception that changes (1970-01-01T00:00:00.000Z and 1970-01-01T09:00:00.000+09:00 point to the very same absolute time).

Relative vs Absolute time instances

When supporting time zones, you basically have two main alternatives: a relative timestamp or an absolute time representation.

Relative timestamp

The numeric timestamp representation (the number of milliseconds since epoch) is relative information. This value is given against the UTC epoch, but you still need a time zone to properly represent the actual time in a particular region.

Being a long value, it’s the most compact time representation and it’s ideal when exchanging huge amounts of data.

If you don’t know the original event time zone, you risk displaying a timestamp against the current local time zone, and this is not always desirable.
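
The following sketch (the time zone choices are illustrative only) shows how the very same relative value renders differently depending on the zone used for display:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class RelativeTimestampDisplay {

	public static void main(String[] args) {
		// some relative value received from another system
		long eventTimestamp = 200L;

		// without the original time zone we can only guess how to display it
		System.out.println(format(eventTimestamp, TimeZone.getTimeZone("UTC")));  // 1970-01-01 00:00:00.200
		System.out.println(format(eventTimestamp, TimeZone.getDefault()));        // whatever the local zone dictates
	}

	private static String format(long millisSinceEpoch, TimeZone timeZone) {
		SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
		dateFormat.setTimeZone(timeZone);
		return dateFormat.format(new Date(millisSinceEpoch));
	}
}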

Absolute timestamp

The absolute timestamp contains both the relative time as well as the time zone info. It’s quite common to express timestamps in their ISO 8601 string representation.

Compared to the numerical form (a 64-bit long), the string representation is less compact and it might take up to 25 characters (200 bits in UTF-8 encoding).

ISO 8601 is quite common in XML files because the XML Schema date/time types use a lexical format inspired by the ISO 8601 standard.

An absolute time representation is much more convenient when we want to reconstruct the time instance against the original time zone. An e-mail client might want to display the email creation date using the sender’s time zone, and this can only be achieved using absolute timestamps.

Puzzles

The following exercise aims to demonstrate how difficult it is to properly handle an ISO 8601-compliant date/time structure using the ancient java.text.DateFormat utilities.

java.text.SimpleDateFormat

First we are going to test the java.text.SimpleDateFormat parsing capabilities using the following test logic:

/**
 * DateFormat parsing utility
 * @param pattern date/time pattern
 * @param dateTimeString date/time string value
 * @param expectedNumericTimestamp expected millis since epoch 
 */
private void dateFormatParse(String pattern, String dateTimeString, long expectedNumericTimestamp) {
	try {
		Date utcDate = new SimpleDateFormat(pattern).parse(dateTimeString);
		if(expectedNumericTimestamp != utcDate.getTime()) {
			LOGGER.warn("Pattern: {}, date: {} actual epoch {} while expected epoch: {}", new Object[]{pattern, dateTimeString, utcDate.getTime(), expectedNumericTimestamp});
		}
	} catch (ParseException e) {
		LOGGER.warn("Pattern: {}, date: {} threw {}", new Object[]{pattern, dateTimeString, e.getClass().getSimpleName()});
	}
}

Use case 1

Let’s see how various ISO 8601 patterns behave against this first parser:

dateFormatParse("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "1970-01-01T00:00:00.200Z", 200L);

Yielding the following outcome:

Pattern: yyyy-MM-dd'T'HH:mm:ss.SSS'Z', date: 1970-01-01T00:00:00.200Z actual epoch -7199800 while expected epoch: 200

This pattern is not ISO 8601 compliant. The single quote character is an escape sequence, so the final ‘Z’ symbol is not treated as a time zone directive (i.e. Zulu time). After parsing, we simply get a Date reference interpreted against the local time zone.

This test was run using my current system default Europe/Athens time zone, which, as of writing this post, is two hours ahead of UTC.

Use case 2

According to the java.text.SimpleDateFormat documentation, the yyyy-MM-dd'T'HH:mm:ss.SSSZ pattern should match an ISO 8601 date/time string value:

dateFormatParse("yyyy-MM-dd'T'HH:mm:ss.SSSZ", "1970-01-01T00:00:00.200Z", 200L);

But instead we got the following exception:

Pattern: yyyy-MM-dd'T'HH:mm:ss.SSSZ, date: 1970-01-01T00:00:00.200Z threw ParseException

So this pattern doesn’t seem to parse the Zulu time UTC string values.

Use case 3

The following pattern works just fine for explicit time zone offsets:

dateFormatParse("yyyy-MM-dd'T'HH:mm:ss.SSSZ", "1970-01-01T00:00:00.200+0000", 200L);

Use case 4

This pattern is also compatible with other time zone offsets:

dateFormatParse("yyyy-MM-dd'T'HH:mm:ss.SSSZ", "1970-01-01T00:00:00.200+0100", 200L - 1000 * 60 * 60);

Use case 5

To match the Zulu time notation we need to use the following pattern:

dateFormatParse("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", "1970-01-01T00:00:00.200Z", 200L);

Use case 6

Unfortunately, this last pattern is not compatible with explicit time zone offsets:

dateFormatParse("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", "1970-01-01T00:00:00.200+0000", 200L);

Ending up with the following exception:

Pattern: yyyy-MM-dd'T'HH:mm:ss.SSSXXX, date: 1970-01-01T00:00:00.200+0000 threw ParseException

org.joda.time.DateTime

As opposed to java.text.SimpleDateFormat, Joda-Time is compatible with any ISO 8601 pattern. The following parsing utility is going to be used for the upcoming test cases:

/**
 * Joda-Time parsing utility
 * @param dateTimeString date/time string value
 * @param expectedNumericTimestamp expected millis since epoch
 */
private void jodaTimeParse(String dateTimeString, long expectedNumericTimestamp) {
	Date utcDate = DateTime.parse(dateTimeString).toDate();
	if(expectedNumericTimestamp != utcDate.getTime()) {
		LOGGER.warn("date: {} actual epoch {} while expected epoch: {}", new Object[]{dateTimeString, utcDate.getTime(), expectedNumericTimestamp});
	}
}

Joda-Time is compatible with all standard ISO 8601 date/time formats:

jodaTimeParse("1970-01-01T00:00:00.200Z", 200L);
jodaTimeParse("1970-01-01T00:00:00.200+0000", 200L);
jodaTimeParse("1970-01-01T00:00:00.200+0100", 200L - 1000 * 60 * 60);

Conclusion

As you can see, the ancient Java Date/Time utilities are not easy to work with. Joda-Time is a much better alternative, offering superior time handling features.

If you happen to work with Java 8, it’s worth switching to the Java 8 Date/Time API, which was designed from scratch but is very much inspired by Joda-Time.
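
As a taste of what that migration buys you, here is a minimal java.time sketch mirroring the Joda-Time parsing test above (note that the default ISO parser expects colon-separated offsets such as +01:00 rather than +0100):

import java.time.OffsetDateTime;
import java.util.concurrent.TimeUnit;

public class Java8IsoParsing {

	public static void main(String[] args) {
		parse("1970-01-01T00:00:00.200Z", 200L);
		parse("1970-01-01T00:00:00.200+00:00", 200L);
		parse("1970-01-01T00:00:00.200+01:00", 200L - TimeUnit.HOURS.toMillis(1));
	}

	private static void parse(String dateTimeString, long expectedMillis) {
		// OffsetDateTime.parse handles both the Zulu and the explicit offset notations
		long actualMillis = OffsetDateTime.parse(dateTimeString).toInstant().toEpochMilli();
		if (actualMillis != expectedMillis) {
			throw new AssertionError(dateTimeString + " parsed to " + actualMillis);
		}
	}
}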

Code available on GitHub.


An entity modelling strategy for scaling optimistic locking

Introduction

Application-level repeatable reads are suitable for preventing lost updates in web conversations. Enabling entity-level optimistic locking is fairly easy. You just have to mark one logical-clock property (usually an integer counter) with the JPA @Version annotation and Hibernate takes care of the rest.

The catch

Optimistic locking discards all incoming changes that are relative to an older entity version. But everything has a cost, and optimistic locking is no exception.

The optimistic concurrency control mechanism takes an all-or-nothing approach, even for non-overlapping changes. If two concurrent transactions change distinct entity property subsets, there’s no risk of losing updates.

Two concurrent updates starting from the same entity version are always going to collide. Only the first update succeeds; the second one fails with an optimistic locking exception. This strict policy acts as if all changes were overlapping. For highly concurrent write scenarios, this single-version check strategy can lead to a large number of rolled-back updates.

Time for testing

Let’s say we have the following Product entity:

[Diagram: OptimisticLockingOneProductEntityOneVersion]
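
The entity is shown here only as a diagram, but judging from the generated DDL below it would look roughly like this (a sketch, not the exact source):

import java.math.BigDecimal;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Product {

	@Id
	private Long id;

	@Column(nullable = false, unique = true)
	private String name;

	@Column(nullable = false)
	private String description;

	@Column(nullable = false)
	private BigDecimal price;

	@Column(nullable = false)
	private long quantity;

	@Column(nullable = false)
	private int likes;

	// a single logical clock guards every property of this entity
	@Version
	private int version;

	// getters and setters omitted for brevity
}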

This entity is updated by three users (e.g. Alice, Bob and Vlad), each one updating a distinct property subset. The following diagram depicts their actions:

[Diagram: OptimisticLockingOneRootEntityOneVersion]

The SQL DML statement sequence goes like this:

#create tables
Query:{[create table product (id bigint not null, description varchar(255) not null, likes integer not null, name varchar(255) not null, price numeric(19,2) not null, quantity bigint not null, version integer not null, primary key (id))][]} 
Query:{[alter table product add constraint UK_jmivyxk9rmgysrmsqw15lqr5b  unique (name)][]} 

#insert product
Query:{[insert into product (description, likes, name, price, quantity, version, id) values (?, ?, ?, ?, ?, ?, ?)][Plasma TV,0,TV,199.99,7,0,1]} 

#Alice selects the product
Query:{[select optimistic0_.id as id1_0_0_, optimistic0_.description as descript2_0_0_, optimistic0_.likes as likes3_0_0_, optimistic0_.name as name4_0_0_, optimistic0_.price as price5_0_0_, optimistic0_.quantity as quantity6_0_0_, optimistic0_.version as version7_0_0_ from product optimistic0_ where optimistic0_.id=?][1]} 
#Bob selects the product
Query:{[select optimistic0_.id as id1_0_0_, optimistic0_.description as descript2_0_0_, optimistic0_.likes as likes3_0_0_, optimistic0_.name as name4_0_0_, optimistic0_.price as price5_0_0_, optimistic0_.quantity as quantity6_0_0_, optimistic0_.version as version7_0_0_ from product optimistic0_ where optimistic0_.id=?][1]} 
#Vlad selects the product
Query:{[select optimistic0_.id as id1_0_0_, optimistic0_.description as descript2_0_0_, optimistic0_.likes as likes3_0_0_, optimistic0_.name as name4_0_0_, optimistic0_.price as price5_0_0_, optimistic0_.quantity as quantity6_0_0_, optimistic0_.version as version7_0_0_ from product optimistic0_ where optimistic0_.id=?][1]} 

#Alice updates the product
Query:{[update product set description=?, likes=?, name=?, price=?, quantity=?, version=? where id=? and version=?][Plasma TV,0,TV,199.99,6,1,1,0]} 

#Bob updates the product
Query:{[update product set description=?, likes=?, name=?, price=?, quantity=?, version=? where id=? and version=?][Plasma TV,1,TV,199.99,7,1,1,0]} 
c.v.h.m.l.c.OptimisticLockingOneRootOneVersionTest - Bob: Optimistic locking failure
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [com.vladmihalcea.hibernate.masterclass.laboratory.concurrency.OptimisticLockingOneRootOneVersionTest$Product#1]

#Vlad updates the product
Query:{[update product set description=?, likes=?, name=?, price=?, quantity=?, version=? where id=? and version=?][Plasma HDTV,0,TV,199.99,7,1,1,0]} 
c.v.h.m.l.c.OptimisticLockingOneRootOneVersionTest - Vlad: Optimistic locking failure
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [com.vladmihalcea.hibernate.masterclass.laboratory.concurrency.OptimisticLockingOneRootOneVersionTest$Product#1]

Because there’s only one entity version, it’s just the first transaction that’s going to succeed. The second and the third updates are discarded since they reference an older entity version.

Divide et impera

If there is more than one write pattern, we can divide the original entity into several sub-entities. Instead of only one optimistic locking counter, we now have one distinct counter per sub-entity. In our example, the quantity can be moved to ProductStock and the likes to ProductLiking.

[Diagram: OptimisticLockingOneProductEntityMultipleVersions]
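
A rough sketch of how such a split might be mapped (the association and identifier details are assumptions inferred from the generated DDL below, not the exact source; ProductLiking follows the same pattern as ProductStock):

import java.math.BigDecimal;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.MapsId;
import javax.persistence.OneToOne;
import javax.persistence.Version;

@Entity
public class Product {

	@Id
	private Long id;

	private String name;

	private String description;

	private BigDecimal price;

	// the root entity keeps its own logical clock
	@Version
	private int version;

	@OneToOne(mappedBy = "product", cascade = CascadeType.ALL)
	private ProductStock stock;

	@OneToOne(mappedBy = "product", cascade = CascadeType.ALL)
	private ProductLiking liking;
}

@Entity
public class ProductStock {

	// shares the Product primary key (the product_id column)
	@Id
	private Long productId;

	@MapsId
	@OneToOne(fetch = FetchType.LAZY)
	private Product product;

	// changing the quantity only dirties ProductStock, leaving the Product version untouched
	private long quantity;
}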

Whenever we change the product quantity, it’s only the ProductStock version that’s going to be checked, so other competing quantity updates are prevented. But now, we can concurrently update both the main entity (e.g. Product) and each individual sub-entity (e.g. ProductStock and ProductLiking):

[Diagram: OptimisticLockingOneRootEntityMultipleVersions]

Running the previous test case yields the following output:

#create tables
Query:{[create table product (id bigint not null, description varchar(255) not null, name varchar(255) not null, price numeric(19,2) not null, version integer not null, primary key (id))][]} 
Query:{[create table product_liking (likes integer not null, product_id bigint not null, primary key (product_id))][]} 
Query:{[create table product_stock (quantity bigint not null, product_id bigint not null, primary key (product_id))][]} 
Query:{[alter table product add constraint UK_jmivyxk9rmgysrmsqw15lqr5b  unique (name)][]} 
Query:{[alter table product_liking add constraint FK_4oiot8iambqw53dwcldltqkco foreign key (product_id) references product][]} 
Query:{[alter table product_stock add constraint FK_hj4kvinsv4h5gi8xi09xbdl46 foreign key (product_id) references product][]} 

#insert product
Query:{[insert into product (description, name, price, version, id) values (?, ?, ?, ?, ?)][Plasma TV,TV,199.99,0,1]} 
Query:{[insert into product_liking (likes, product_id) values (?, ?)][0,1]} 
Query:{[insert into product_stock (quantity, product_id) values (?, ?)][7,1]} 

#Alice selects the product
Query:{[select optimistic0_.id as id1_0_0_, optimistic0_.description as descript2_0_0_, optimistic0_.name as name3_0_0_, optimistic0_.price as price4_0_0_, optimistic0_.version as version5_0_0_ from product optimistic0_ where optimistic0_.id=?][1]} 
Query:{[select optimistic0_.product_id as product_2_1_0_, optimistic0_.likes as likes1_1_0_ from product_liking optimistic0_ where optimistic0_.product_id=?][1]} 
Query:{[select optimistic0_.product_id as product_2_2_0_, optimistic0_.quantity as quantity1_2_0_ from product_stock optimistic0_ where optimistic0_.product_id=?][1]} 

#Bob selects the product
Query:{[select optimistic0_.id as id1_0_0_, optimistic0_.description as descript2_0_0_, optimistic0_.name as name3_0_0_, optimistic0_.price as price4_0_0_, optimistic0_.version as version5_0_0_ from product optimistic0_ where optimistic0_.id=?][1]} 
Query:{[select optimistic0_.product_id as product_2_1_0_, optimistic0_.likes as likes1_1_0_ from product_liking optimistic0_ where optimistic0_.product_id=?][1]} 
Query:{[select optimistic0_.product_id as product_2_2_0_, optimistic0_.quantity as quantity1_2_0_ from product_stock optimistic0_ where optimistic0_.product_id=?][1]} 

#Vlad selects the product
Query:{[select optimistic0_.id as id1_0_0_, optimistic0_.description as descript2_0_0_, optimistic0_.name as name3_0_0_, optimistic0_.price as price4_0_0_, optimistic0_.version as version5_0_0_ from product optimistic0_ where optimistic0_.id=?][1]} 
Query:{[select optimistic0_.product_id as product_2_1_0_, optimistic0_.likes as likes1_1_0_ from product_liking optimistic0_ where optimistic0_.product_id=?][1]} 
Query:{[select optimistic0_.product_id as product_2_2_0_, optimistic0_.quantity as quantity1_2_0_ from product_stock optimistic0_ where optimistic0_.product_id=?][1]} 

#Alice updates the product
Query:{[update product_stock set quantity=? where product_id=?][6,1]} 

#Bob updates the product
Query:{[update product_liking set likes=? where product_id=?][1,1]} 

#Vlad updates the product
Query:{[update product set description=?, name=?, price=?, version=? where id=? and version=?][Plasma HDTV,TV,199.99,1,1,0]} 

All three concurrent transactions are successful because we no longer have only one logical-clock version but three of them, one per distinct write responsibility.

Conclusion

When designing the persistence domain model, you have to take into consideration both the querying and writing responsibility patterns.

Breaking a larger entity into several sub-entities can help you scale updates while reducing the chance of optimistic locking failures. If you’re wary of possible performance issues (due to entity state fragmentation), you should know that Hibernate offers several optimization techniques for overcoming the scattered entity info side-effect.

You can always join all sub-entities in a single SQL query, in case you need all entity related data.
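
For instance, a single fetch-join query can load the root and its sub-entities in one round-trip (a sketch, assuming Product exposes one-to-one associations named stock and liking):

Product product = (Product) session.createQuery(
	"select p " +
	"from Product p " +
	"join fetch p.stock " +
	"join fetch p.liking " +
	"where p.id = :id")
	.setParameter("id", 1L)
	.uniqueResult();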

Second-level caching is also a good solution for fetching sub-entities without hitting the database. Because we split the root entity into several entities, the cache can be better utilized. A stock update is only going to invalidate the associated ProductStock cache entry, without interfering with the Product and ProductLiking cache regions.

Code available on GitHub.


Hibernate collections optimistic locking

Introduction

Hibernate provides an optimistic locking mechanism to prevent lost updates even for long conversations. In conjunction with an entity storage spanning multiple user requests (extended persistence context or detached entities), Hibernate can guarantee application-level repeatable reads.

The dirty checking mechanism detects entity state changes and increments the entity version. While basic property changes are always taken into consideration, Hibernate collections are more subtle in this regard.

Owned vs Inverse collections

In relational databases, two records are associated through a foreign key reference. In this relationship, the referenced record is the parent while the referencing row (the foreign key side) is the child. A non-null foreign key may only reference an existing parent record.

In the object-oriented space, this association can be represented in both directions: we can have a many-to-one reference from the child to the parent, and the parent can also have a one-to-many children collection.

Because both sides could potentially control the database foreign key state, we must ensure that only one side is the owner of this association. Only the owning side’s state changes are propagated to the database. The non-owning side has traditionally been referred to as the inverse side.

Next I’ll describe the most common ways of modelling this association.

The unidirectional parent-owning-side-child association mapping

Only the parent side has a @OneToMany non-inverse children collection. The child entity doesn’t reference the parent entity at all.

@Entity(name = "post")
public class Post {
	...
	@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
	private List<Comment> comments = new ArrayList<Comment>();
	...
}	

The unidirectional parent-owning-side-child component association mapping

The child side doesn’t always have to be an entity and we might model it as a component type instead. An Embeddable object (component type) may contain both basic types and association mappings but it can never contain an @Id. The Embeddable object is persisted/removed along with its owning entity.

The parent has an @ElementCollection children association. The child component may only reference the parent through the non-queryable, Hibernate-specific @Parent annotation.

@Entity(name = "post")
public class Post {
	...
	@ElementCollection
	@JoinTable(name = "post_comments", joinColumns = @JoinColumn(name = "post_id"))
	@OrderColumn(name = "comment_index")
	private List<Comment> comments = new ArrayList<Comment>();
	...

	public void addComment(Comment comment) {
		comment.setPost(this);
		comments.add(comment);
	}
}	

@Embeddable
public class Comment {
	...
	@Parent
	private Post post;
	...
}

The bidirectional parent-owning-side-child association mapping

The parent is the owning side so it has a @OneToMany non-inverse (without a mappedBy directive) children collection. The child entity references the parent entity through a @ManyToOne association that’s neither insertable nor updatable:

@Entity(name = "post")
public class Post {
	...
	@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
	private List<Comment> comments = new ArrayList<Comment>();
	...

	public void addComment(Comment comment) {
		comment.setPost(this);
		comments.add(comment);
	}
}	

@Entity(name = "comment")
public class Comment {
	...
	@ManyToOne
	@JoinColumn(name = "post_id", insertable = false, updatable = false)
	private Post post;
	...
}

The bidirectional child-owning-side-parent association mapping

The child entity references the parent entity through a @ManyToOne association, and the parent has a mappedBy @OneToMany children collection. The parent side is the inverse side so only the @ManyToOne state changes are propagated to the database.

Even if there’s only one owning side, it’s always a good practice to keep both sides in sync by using the add/removeChild() methods.

@Entity(name = "post")
public class Post {
	...
	@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true, mappedBy = "post")
	private List<Comment> comments = new ArrayList<Comment>();
	...

	public void addComment(Comment comment) {
		comment.setPost(this);
		comments.add(comment);
	}
}	

@Entity(name = "comment")
public class Comment {
	...
	@ManyToOne
	private Post post;	
	...
}

The unidirectional child-owning-side-parent association mapping

The child entity references the parent through a @ManyToOne association. The parent doesn’t have a @OneToMany children collection so the child entity becomes the owning side. This association mapping resembles the relational data foreign key linkage.

@Entity(name = "comment")
public class Comment {
    ...
    @ManyToOne
    private Post post;	
    ...
}

Collection versioning

Section 3.4.2 of the JPA 2.1 specification defines optimistic locking as follows:

The version attribute is updated by the persistence provider runtime when the object is written to the database. All non-relationship fields and properties and all relationships owned by the entity are included in version checks [35].

[35] This includes owned relationships maintained in join tables

N.B. Only owning-side children collections can update the parent version.

Testing time

Let’s test how the parent-child association type affects the parent versioning. Because we are interested in the children collection dirty checking, the unidirectional child-owning-side-parent association is going to be skipped, since in that case the parent doesn’t contain a children collection.

Test case

The following test case is going to be used for all collection type use cases:

protected void simulateConcurrentTransactions(final boolean shouldIncrementParentVersion) {
	final ExecutorService executorService = Executors.newSingleThreadExecutor();

	doInTransaction(new TransactionCallable<Void>() {
		@Override
		public Void execute(Session session) {
			try {
				P post = postClass.newInstance();
				post.setId(1L);
				post.setName("Hibernate training");
				session.persist(post);
				return null;
			} catch (Exception e) {
				throw new IllegalArgumentException(e);
			}
		}
	});

	doInTransaction(new TransactionCallable<Void>() {
		@Override
		public Void execute(final Session session) {
			final P post = (P) session.get(postClass, 1L);
			try {
				executorService.submit(new Callable<Void>() {
					@Override
					public Void call() throws Exception {
						return doInTransaction(new TransactionCallable<Void>() {
							@Override
							public Void execute(Session _session) {
								try {
									P otherThreadPost = (P) _session.get(postClass, 1L);
									int loadTimeVersion = otherThreadPost.getVersion();
									assertNotSame(post, otherThreadPost);
									assertEquals(0L, otherThreadPost.getVersion());
									C comment = commentClass.newInstance();
									comment.setReview("Good post!");
									otherThreadPost.addComment(comment);
									_session.flush();
									if (shouldIncrementParentVersion) {
										assertEquals(otherThreadPost.getVersion(), loadTimeVersion + 1);
									} else {
										assertEquals(otherThreadPost.getVersion(), loadTimeVersion);
									}
									return null;
								} catch (Exception e) {
									throw new IllegalArgumentException(e);
								}
							}
						});
					}
				}).get();
			} catch (Exception e) {
				throw new IllegalArgumentException(e);
			}
			post.setName("Hibernate Master Class");
			session.flush();
			return null;
		}
	});
}

The unidirectional parent-owning-side-child association testing

#create tables
Query:{[create table comment (id bigint generated by default as identity (start with 1), review varchar(255), primary key (id))][]} 
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[create table post_comment (post_id bigint not null, comments_id bigint not null, comment_index integer not null, primary key (post_id, comment_index))][]} 
Query:{[alter table post_comment add constraint UK_se9l149iyyao6va95afioxsrl  unique (comments_id)][]} 
Query:{[alter table post_comment add constraint FK_se9l149iyyao6va95afioxsrl foreign key (comments_id) references comment][]} 
Query:{[alter table post_comment add constraint FK_6o1igdm04v78cwqre59or1yj1 foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_1_0_, entityopti0_.name as name2_1_0_, entityopti0_.version as version3_1_0_ from post entityopti0_ where entityopti0_.id=?][1]} 

#insert comment in secondary transaction
#optimistic locking post version update in secondary transaction
Query:{[insert into comment (id, review) values (default, ?)][Good post!]} 
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate training,1,1,0]} 
Query:{[insert into post_comment (post_id, comment_index, comments_id) values (?, ?, ?)][1,0,1]} 

#optimistic locking exception in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]}
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [com.vladmihalcea.hibernate.masterclass.laboratory.concurrency.EntityOptimisticLockingOnUnidirectionalCollectionTest$Post#1] 

The unidirectional parent-owning-side-child component association testing

#create tables
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[create table post_comments (post_id bigint not null, review varchar(255), comment_index integer not null, primary key (post_id, comment_index))][]} 
Query:{[alter table post_comments add constraint FK_gh9apqeduab8cs0ohcq1dgukp foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_0_0_, entityopti0_.name as name2_0_0_, entityopti0_.version as version3_0_0_ from post entityopti0_ where entityopti0_.id=?][1]} 
Query:{[select comments0_.post_id as post_id1_0_0_, comments0_.review as review2_1_0_, comments0_.comment_index as comment_3_0_ from post_comments comments0_ where comments0_.post_id=?][1]} 

#insert comment in secondary transaction
#optimistic locking post version update in secondary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate training,1,1,0]} 
Query:{[insert into post_comments (post_id, comment_index, review) values (?, ?, ?)][1,0,Good post!]} 

#optimistic locking exception in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]} 
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [com.vladmihalcea.hibernate.masterclass.laboratory.concurrency.EntityOptimisticLockingOnComponentCollectionTest$Post#1]	

The bidirectional parent-owning-side-child association testing

#create tables
Query:{[create table comment (id bigint generated by default as identity (start with 1), review varchar(255), post_id bigint, primary key (id))][]} 
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[create table post_comment (post_id bigint not null, comments_id bigint not null)][]} 
Query:{[alter table post_comment add constraint UK_se9l149iyyao6va95afioxsrl  unique (comments_id)][]} 
Query:{[alter table comment add constraint FK_f1sl0xkd2lucs7bve3ktt3tu5 foreign key (post_id) references post][]} 
Query:{[alter table post_comment add constraint FK_se9l149iyyao6va95afioxsrl foreign key (comments_id) references comment][]} 
Query:{[alter table post_comment add constraint FK_6o1igdm04v78cwqre59or1yj1 foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_1_0_, entityopti0_.name as name2_1_0_, entityopti0_.version as version3_1_0_ from post entityopti0_ where entityopti0_.id=?][1]} 
Query:{[select comments0_.post_id as post_id1_1_0_, comments0_.comments_id as comments2_2_0_, entityopti1_.id as id1_0_1_, entityopti1_.post_id as post_id3_0_1_, entityopti1_.review as review2_0_1_, entityopti2_.id as id1_1_2_, entityopti2_.name as name2_1_2_, entityopti2_.version as version3_1_2_ from post_comment comments0_ inner join comment entityopti1_ on comments0_.comments_id=entityopti1_.id left outer join post entityopti2_ on entityopti1_.post_id=entityopti2_.id where comments0_.post_id=?][1]} 

#insert comment in secondary transaction
#optimistic locking post version update in secondary transaction
Query:{[insert into comment (id, review) values (default, ?)][Good post!]} 
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate training,1,1,0]} 
Query:{[insert into post_comment (post_id, comments_id) values (?, ?)][1,1]} 

#optimistic locking exception in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]} 
org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [com.vladmihalcea.hibernate.masterclass.laboratory.concurrency.EntityOptimisticLockingOnBidirectionalParentOwningCollectionTest$Post#1]

The bidirectional child-owning-side-parent association testing

#create tables
Query:{[create table comment (id bigint generated by default as identity (start with 1), review varchar(255), post_id bigint, primary key (id))][]} 
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[alter table comment add constraint FK_f1sl0xkd2lucs7bve3ktt3tu5 foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_1_0_, entityopti0_.name as name2_1_0_, entityopti0_.version as version3_1_0_ from post entityopti0_ where entityopti0_.id=?][1]} 

#insert comment in secondary transaction
#post version is not incremented in secondary transaction
Query:{[insert into comment (id, post_id, review) values (default, ?, ?)][1,Good post!]} 
Query:{[select count(id) from comment where post_id =?][1]} 

#update works in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]} 

Overruling default collection versioning

If the default owning-side collection versioning is not suitable for your use case, you can always overrule it with the Hibernate @OptimisticLock annotation.

Let’s overrule the default parent version update mechanism for bidirectional parent-owning-side-child association:

@Entity(name = "post")
public class Post {
	...
	@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
	@OptimisticLock(excluded = true)
	private List<Comment> comments = new ArrayList<Comment>();
	...

	public void addComment(Comment comment) {
		comment.setPost(this);
		comments.add(comment);
	}
}	

@Entity(name = "comment")
public class Comment {
	...
	@ManyToOne
	@JoinColumn(name = "post_id", insertable = false, updatable = false)
	private Post post;
	...
}

This time, the children collection changes won’t trigger a parent version update:

#create tables
Query:{[create table comment (id bigint generated by default as identity (start with 1), review varchar(255), post_id bigint, primary key (id))][]} 
Query:{[create table post (id bigint not null, name varchar(255), version integer not null, primary key (id))][]} 
Query:{[create table post_comment (post_id bigint not null, comments_id bigint not null)][]} 
Query:{[alter table post_comment add constraint UK_se9l149iyyao6va95afioxsrl  unique (comments_id)][]} 
Query:{[alter table comment add constraint FK_f1sl0xkd2lucs7bve3ktt3tu5 foreign key (post_id) references post][]} 
Query:{[alter table post_comment add constraint FK_se9l149iyyao6va95afioxsrl foreign key (comments_id) references comment][]} 
Query:{[alter table post_comment add constraint FK_6o1igdm04v78cwqre59or1yj1 foreign key (post_id) references post][]} 

#insert post in primary transaction
Query:{[insert into post (name, version, id) values (?, ?, ?)][Hibernate training,0,1]} 

#select post in secondary transaction
Query:{[select entityopti0_.id as id1_1_0_, entityopti0_.name as name2_1_0_, entityopti0_.version as version3_1_0_ from post entityopti0_ where entityopti0_.id=?][1]} 
Query:{[select comments0_.post_id as post_id1_1_0_, comments0_.comments_id as comments2_2_0_, entityopti1_.id as id1_0_1_, entityopti1_.post_id as post_id3_0_1_, entityopti1_.review as review2_0_1_, entityopti2_.id as id1_1_2_, entityopti2_.name as name2_1_2_, entityopti2_.version as version3_1_2_ from post_comment comments0_ inner join comment entityopti1_ on comments0_.comments_id=entityopti1_.id left outer join post entityopti2_ on entityopti1_.post_id=entityopti2_.id where comments0_.post_id=?][1]} 

#insert comment in secondary transaction
Query:{[insert into comment (id, review) values (default, ?)][Good post!]} 
Query:{[insert into post_comment (post_id, comments_id) values (?, ?)][1,1]} 

#update works in primary transaction
Query:{[update post set name=?, version=? where id=? and version=?][Hibernate Master Class,1,1,0]} 

Conclusion

It’s very important to understand how various modelling structures impact concurrency patterns. Owning-side collection changes are taken into consideration when incrementing the parent version number, and you can always bypass this behavior using the @OptimisticLock annotation.

Code available on GitHub.


Hibernate application-level repeatable reads

Introduction

In my previous post I described how application-level transactions offer a suitable concurrency control mechanism for long conversations.

All entities are loaded within the context of a Hibernate Session, acting as a transactional write-behind cache.

A Hibernate persistence context can hold one and only one reference of a given entity. The first level cache guarantees session-level repeatable reads.

If the conversation spans multiple requests, we can have application-level repeatable reads. Long conversations are inherently stateful, so we can opt for detached objects or long persistence contexts. But application-level repeatable reads require an application-level concurrency control strategy, such as optimistic locking.

The catch

But this behavior may prove unexpected at times.

If your Hibernate Session has already loaded a given entity, then any successive entity query (JPQL/HQL) is going to return the very same object reference (disregarding the currently loaded database snapshot):

[Diagram: ApplicationLevelRepeatableRead]

In this example we can see that the first level cache prevents overwriting an already loaded entity. To prove this behavior, I came up with the following test case:

final ExecutorService executorService = Executors.newSingleThreadExecutor();

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		Product product = new Product();
		product.setId(1L);
		product.setQuantity(7L);
		session.persist(product);
		return null;
	}
});

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		final Product product = (Product) session.get(Product.class, 1L);
		try {
			executorService.submit(new Callable<Void>() {
				@Override
				public Void call() throws Exception {
					return doInTransaction(new TransactionCallable<Void>() {
						@Override
						public Void execute(Session _session) {
							Product otherThreadProduct = (Product) _session.get(Product.class, 1L);
							assertNotSame(product, otherThreadProduct);
							otherThreadProduct.setQuantity(6L);
							return null;
						}
					});
				}
			}).get();
			Product reloadedProduct = (Product) session.createQuery("from Product").uniqueResult();
			assertEquals(7L, reloadedProduct.getQuantity());
			assertEquals(6L, ((Number) session.createSQLQuery("select quantity from Product where id = :id").setParameter("id", product.getId()).uniqueResult()).longValue());
		} catch (Exception e) {
			fail(e.getMessage());
		}
		return null;
	}
});

This test case clearly illustrates the differences between entity queries and SQL projections. While SQL query projections always load the latest database state, entity query results are managed by the first level cache, ensuring session-level repeatable reads.

Workaround 1: If your use case demands reloading the latest database entity state then you can simply refresh the entity in question.

Workaround 2: If you want an entity to be disassociated from the Hibernate first level cache you can easily evict it, so the next entity query can use the latest database entity value.
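
Both workarounds boil down to a one-liner. A minimal sketch, reusing the doInTransaction helper from the test above:

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		Product product = (Product) session.get(Product.class, 1L);

		// Workaround 1: re-read the latest database state into the managed entity
		session.refresh(product);

		// Workaround 2: detach the entity so the next entity query hits the database again
		session.evict(product);
		Product reloaded = (Product) session.createQuery("from Product").uniqueResult();
		return null;
	}
});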

Beyond prejudice

Hibernate is a means, not a goal. A data access layer requires both reads and writes, and neither plain-old JDBC nor Hibernate is a one-size-fits-all solution. A data knowledge stack is much more appropriate for getting the most out of your data read queries and write DML statements.

While native SQL remains the de facto relational data reading technique, Hibernate excels in writing data. Hibernate is a persistence framework and you should never forget that. Loading entities makes sense if you plan on propagating changes back to the database. You don’t need to load entities for displaying read-only views, an SQL projection being a much better alternative in this case.

Session-level repeatable reads prevent lost updates in concurrent write scenarios, so there’s a good reason why entities don’t get refreshed automatically. Maybe we’ve chosen to manually flush dirty properties, and an automated entity refresh might overwrite synchronized pending changes.

Designing the data access patterns is not a trivial task, and a solid integration testing foundation is worth investing in. To avoid any unexpected behavior, I strongly advise you to validate all automatically generated SQL statements to prove their effectiveness and efficiency.

Code available on GitHub.


MongoDB Incremental Migration Scripts

Introduction

An incremental software development process requires an incremental database migration strategy.

I remember working on an enterprise application where hibernate.hbm2ddl.auto was the default data migration tool.

Updating the production environment required intensive preparation, and the migration scripts were only created on the spot. An unforeseen error could have led to production data corruption.

Incremental updates to the rescue

The incremental database update is a technical feature that needs to be addressed in the very first application development iterations.

We used to develop our own custom data migration implementations, and spending time on writing/supporting frameworks always works against your current project budget.

A project must be packed with both the application code and all associated database schema/data update scripts. Using incremental migration scripts allows us to automate the deployment process and to take advantage of continuous delivery.

Nowadays you don’t have to implement data migration tools yourself; Flyway does a better job than all our previous custom frameworks. All database schema and data changes have to be recorded in incremental update scripts following a well-defined naming convention.

An RDBMS migration plan addresses both schema and data changes. It’s always good to separate schema and data changes. Integration tests might only use the schema migration scripts, in conjunction with test-time related data.

Flyway supports all major relational database systems, but for NoSQL data stores (e.g. MongoDB) you need to look somewhere else.

Mongeez

Mongeez is an open-source project aiming to automate MongoDB data migration. MongoDB is schema-less, so migration scripts only target data updates.

Integrating mongeez

First you have to define a mongeez configuration file:

mongeez.xml

<changeFiles>
    <file path="v1_1__initial_data.js"/>
    <file path="v1_2__update_products.js"/>
</changeFiles>

Then you add the actual migration scripts:

v1_1__initial_data.js

//mongeez formatted javascript
//changeset system:v1_1
db.product.insert({
    "_id": 1,
    "name" : "TV",
    "price" : 199.99,
    "currency" : 'USD',
    "quantity" : 5,
    "version" : 1
});
db.product.insert({
    "_id": 2,
    "name" : "Radio",
    "price" : 29.99,
    "currency" : 'USD',
    "quantity" : 3,
    "version" : 1
});

v1_2__update_products.js

//mongeez formatted javascript
//changeset system:v1_2
db.product.update(
    {
        name : 'TV'
    },
    {
         $inc : {
             price : -10,
             version : 1
         }
    },
    {
        multi: true
    }
);

And you need to add the MongeezRunner too:

<bean id="mongeez" class="org.mongeez.MongeezRunner" depends-on="mongo">
	<property name="mongo" ref="mongo"/>
	<property name="executeEnabled" value="true"/>
	<property name="dbName" value="${mongo.dbname}"/>
	<property name="file" value="classpath:mongodb/migration/mongeez.xml"/>
</bean>

Running mongeez

When the application first starts, the incremental scripts are analyzed and only executed if necessary:

INFO  [main]: o.m.r.FilesetXMLReader - Num of changefiles 2
INFO  [main]: o.m.ChangeSetExecutor - ChangeSet v1_1 has been executed
INFO  [main]: o.m.ChangeSetExecutor - ChangeSet v1_2 has been executed

Mongeez uses a separate MongoDB collection to record previously run scripts:

db.mongeez.find().pretty();
{
        "_id" : ObjectId("543b69eeaac7e436b2ce142d"),
        "type" : "configuration",
        "supportResourcePath" : true
}
{
        "_id" : ObjectId("543b69efaac7e436b2ce142e"),
        "type" : "changeSetExecution",
        "file" : "v1_1__initial_data.js",
        "changeId" : "v1_1",
        "author" : "system",
        "resourcePath" : "mongodb/migration/v1_1__initial_data.js",
        "date" : "2014-10-13T08:58:07+03:00"
}
{
        "_id" : ObjectId("543b69efaac7e436b2ce142f"),
        "type" : "changeSetExecution",
        "file" : "v1_2__update_products.js",
        "changeId" : "v1_2",
        "author" : "system",
        "resourcePath" : "mongodb/migration/v1_2__update_products.js",
        "date" : "2014-10-13T08:58:07+03:00"
}

Conclusion

To automate the deployment process, you need to create self-sufficient packs containing both the bytecode and all associated configuration (XML files, resource bundles and data migration scripts). Before starting to write your own custom framework, you should always investigate the available open-source alternatives.

Code available on GitHub.


Integration testing done right with Embedded MongoDB

Introduction

Unit testing requires isolating individual components from their dependencies. Dependencies are replaced with mocks, which simulate certain use cases. This way, we can validate the in-test component behavior across various external context scenarios.

Web components can be unit tested using mock business logic services. Services can be tested against mock data access repositories. But the data access layer is not a good candidate for unit testing, because database statements need to be validated against an actual running database system.

Integration testing database options

Ideally, our tests should run against a production-like database. But using a dedicated database server is not feasible, as we most likely have more than one developer running such integration test-suites. To isolate concurrent test runs, each developer would require a dedicated database catalog. Adding a continuous integration tool makes matters worse, since more tests would have to be run in parallel.

Lesson 1: We need a forked test-suite bound database

When a test suite runs, a database must be started and only made available to that particular test-suite instance. Basically we have the following options:

  • An in-memory embedded database
  • A temporary spawned database process

The fallacy of in-memory database testing

Java offers multiple in-memory relational database options to choose from, such as HSQLDB, H2 or Apache Derby.

Embedding an in-memory database is fast, and each JVM can run its own isolated database. But we no longer test against the actual production-like database engine, because our integration tests will validate the application behavior against a non-production database system.

Using an ORM tool may give the false impression that all databases are equal, especially when the generated SQL code is SQL-92 compliant.

What’s good for the ORM tool’s database portability may deprive you of database-specific querying features (window functions, common table expressions, PIVOT).

So the in-memory database used for integration testing might not support such advanced queries. This can lead to reduced code coverage, or it can push developers to only use the common-yet-limited SQL querying features.

Even if your production database engine provides an in-memory variant, there may still be operational differences between the actual and the lightweight database versions.

Lesson 2: In-memory databases may give you the false impression that your code will also run on a production database

Spawning a production-like temporary database

Testing against the actual production database is much more valuable and that’s why I grew to appreciate this alternative.

When using MongoDB we can choose the embedded mongo plugin. This open-source project creates an external database process that can be bound to the current test-suite life-cycle.

If you’re using Maven, you can take advantage of the embedmongo-maven-plugin:

<plugin>
	<groupId>com.github.joelittlejohn.embedmongo</groupId>
	<artifactId>embedmongo-maven-plugin</artifactId>
	<version>${embedmongo.plugin.version}</version>
	<executions>
		<execution>
			<id>start</id>
			<goals>
				<goal>start</goal>
			</goals>
			<configuration>
				<port>${embedmongo.port}</port>
				<version>${mongo.test.version}</version>
				<databaseDirectory>${project.build.directory}/mongotest</databaseDirectory>
				<bindIp>127.0.0.1</bindIp>
			</configuration>
		</execution>
		<execution>
			<id>stop</id>
			<goals>
				<goal>stop</goal>
			</goals>
		</execution>
	</executions>
</plugin>

When running the plugin, the following actions are taken:

  1. A MongoDB pack is downloaded

    [INFO] --- embedmongo-maven-plugin:0.1.12:start (start) @ mongodb-facts ---
    Download Version{2.6.1}:Windows:B64 START
    Download Version{2.6.1}:Windows:B64 DownloadSize: 135999092
    Download Version{2.6.1}:Windows:B64 0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 11% 12% 13% 14% 15% 16% 17% 18% 19% 20% 21% 22% 23% 24% 25% 26% 27% 28% 29% 30% 31% 32% 33% 34% 35% 36% 37% 38% 39% 40% 41% 42% 43% 44% 45% 46% 47% 48% 49% 50% 51% 52% 53% 54% 55% 56% 57% 58% 59% 60% 61% 62% 63% 64% 65% 66% 67% 68% 69% 70% 71% 72% 73% 74% 75% 76% 77% 78% 79% 80% 81% 82% 83% 84% 85% 86% 87% 88% 89% 90% 91% 92% 93% 94% 95% 96% 97% 98% 99% 100% Download Version{2.6.1}:Windows:B64 downloaded with 3320kb/s
    Download Version{2.6.1}:Windows:B64 DONE
    
  2. Upon starting a new test suite, the MongoDB pack is unzipped under a unique location in the OS temp folder

    Extract C:\Users\vlad\.embedmongo\win32\mongodb-win32-x86_64-2008plus-2.6.1.zip START
    Extract C:\Users\vlad\.embedmongo\win32\mongodb-win32-x86_64-2008plus-2.6.1.zip DONE
    
  3. The embedded MongoDB instance is started.

    [mongod output]note: noprealloc may hurt performance in many applications
    [mongod output] 2014-10-09T23:25:16.889+0300 [DataFileSync] warning: --syncdelay 0 is not recommended and can have strange performance
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] MongoDB starting : pid=2384 port=51567 dbpath=D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest 64-bit host=VLAD
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] db version v2.6.1
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] git version: 4b95b086d2374bdcfcdf2249272fb552c9c726e8
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] allocator: system
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] options: { net: { bindIp: "127.0.0.1", http: { enabled: false }, port: 51567 }, security: { authorization: "disabled" }, storage: { dbPath: "D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest", journal: { enabled: false }, preallocDataFiles: false, smallFiles: true, syncPeriodSecs: 0.0 } }
    [mongod output] 2014-10-09T23:25:17.179+0300 [FileAllocator] allocating new datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.ns, filling with zeroes...
    [mongod output] 2014-10-09T23:25:17.179+0300 [FileAllocator] creating directory D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\_tmp
    [mongod output] 2014-10-09T23:25:17.240+0300 [FileAllocator] done allocating datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.ns, size: 16MB,  took 0.059 secs
    [mongod output] 2014-10-09T23:25:17.240+0300 [FileAllocator] allocating new datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.0, filling with zeroes...
    [mongod output] 2014-10-09T23:25:17.262+0300 [FileAllocator] done allocating datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.0, size: 16MB,  took 0.021 secs
    [mongod output] 2014-10-09T23:25:17.262+0300 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }
    [mongod output] 2014-10-09T23:25:17.262+0300 [initandlisten]     added index to empty collection
    [mongod output] 2014-10-09T23:25:17.263+0300 [initandlisten] waiting for connections on port 51567
    [mongod output] Oct 09, 2014 11:25:17 PM MongodExecutable start
    INFO: de.flapdoodle.embed.mongo.config.MongodConfigBuilder$ImmutableMongodConfig@26b3719c
    
  4. For the life-time of the current test-suite you can see the embedded-mongo process:

    C:\Users\vlad>netstat -ano | findstr 51567
      TCP    127.0.0.1:51567        0.0.0.0:0              LISTENING       8500
      
    C:\Users\vlad>TASKLIST /FI "PID eq 8500"
    
    Image Name                     PID Session Name        Session#    Mem Usage
    ========================= ======== ================ =========== ============
    extract-0eecee01-117b-4d2     8500 RDP-Tcp#0                  1     44,532 K  
    

    [Screenshot: the embed-mongo process]

  5. When the test-suite is finished, the embedded MongoDB process is stopped

    [INFO] --- embedmongo-maven-plugin:0.1.12:stop (stop) @ mongodb-facts ---
    2014-10-09T23:25:21.187+0300 [initandlisten] connection accepted from 127.0.0.1:64117 #11 (1 connection now open)
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] terminating, shutdown command received
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] dbexit: shutdown called
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to close listening sockets...
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] closing listening socket: 520
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to flush diaglog...
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to close sockets...
    [mongod output] 2014-10-09T23:25:21.190+0300 [conn11] shutdown: waiting for fs preallocator...
    [mongod output] 2014-10-09T23:25:21.190+0300 [conn11] shutdown: closing all files...
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] closeAllFiles() finished
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] shutdown: removing fs lock...
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] dbexit: really exiting now
    [mongod output] Oct 09, 2014 11:25:21 PM de.flapdoodle.embed.process.runtime.ProcessControl stopOrDestroyProcess
    

Conclusion

The embed-mongo plugin is no slower than any in-memory relational database system. It makes me wonder why there isn’t such an option for open-source RDBMS (e.g. PostgreSQL). This is a great open-source project idea, and maybe Flapdoodle OSS will offer support for relational databases too.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

Logical vs physical clock optimistic locking

Introduction

In my previous post I demonstrated why optimistic locking is the only viable solution for application-level transactions. Optimistic locking requires a version column that can be represented as:

  • a physical clock (a timestamp value taken from the system clock)
  • a logical clock (an incrementing numeric value)

This article will demonstrate why logical clocks are better suited for optimistic locking mechanisms.

System time

The system time is provided by the operating system’s internal clock. The programmable interval timer periodically sends an interrupt signal (with a frequency of 1.193182 MHz), and the CPU handles this timer interrupt by incrementing a tick counter.

Both Unix and Windows record time as the number of ticks since a predefined absolute time reference (an epoch). The operating system clock resolution varies from 1 ms (Android) to 100 ns (Windows) to 1 ns (Unix).

Monotonic time

To order events, the version must advance monotonically. While incrementing a local counter is a monotonic function, system time might not always return monotonic timestamps.

Java has two ways of fetching the current system time. You can either use:

  1. System#currentTimeMillis(), which gives you the number of milliseconds elapsed since the Unix epoch

    This method doesn’t give you monotonic time results because it returns the wall clock time, which is prone to both forward and backward adjustments (if NTP is used for system time synchronization).

    For a monotonic currentTimeMillis, you can check Peter Lawrey’s solution or the Bitronix Transaction Manager Monotonic Clock.

  2. System#nanoTime(), which returns the number of nanoseconds elapsed since an arbitrarily chosen time reference

    This method tries to use the current operating system’s monotonic clock implementation, but it falls back to wall clock time if no monotonic clock is available. The sketch below contrasts the two calls.
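
To see the difference in practice, here is a minimal sketch (the class name and the Thread.sleep placeholder are mine, not from the original code base) contrasting the two calls when measuring an elapsed duration:

import java.util.concurrent.TimeUnit;

public class MonotonicTimeExample {

	public static void main(String[] args) throws InterruptedException {
		// wall clock time: subject to forward/backward NTP adjustments
		long wallStart = System.currentTimeMillis();
		// monotonic time: arbitrary origin, but differences never go backwards
		long monotonicStart = System.nanoTime();

		Thread.sleep(100);

		long wallElapsedMillis = System.currentTimeMillis() - wallStart;
		long monotonicElapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - monotonicStart);

		// wallElapsedMillis may be distorted (or even negative) if the system clock was adjusted,
		// while monotonicElapsedMillis always reflects the real elapsed duration
		System.out.println("Wall clock elapsed: " + wallElapsedMillis + " ms");
		System.out.println("Monotonic elapsed:  " + monotonicElapsedMillis + " ms");
	}
}

System#nanoTime() is the right tool for measuring durations, but its arbitrarily chosen origin also makes it unsuitable as a version value to be persisted.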

Argument 1: System time is not always monotonically incremented.

Database timestamp precision

The SQL-92 standard defines the TIMESTAMP data type as YYYY-MM-DD hh:mm:ss. The fractional part is optional, and each database implements a specific timestamp data type:

RDBMS Timestamp resolution
Oracle TIMESTAMP(9) may use up to 9 fractional digits (nanosecond precision).
MSSQL DATETIME2 has a precision of 100ns.
MySQL MySQL 5.6.4 added microsecond precision support for the TIME, DATETIME, and TIMESTAMP types (e.g. TIMESTAMP(6)). Earlier MySQL versions discard the fractional part of all temporal types.
PostgreSQL Both TIME and TIMESTAMP types have microsecond precision.
DB2 TIMESTAMP(12) may use up to 12 fractional digits (picosecond precision).

When it comes to persisting timestamps, most database servers offer at least 6 fractional digits. MySQL users have long been waiting for a more precise temporal type, and version 5.6.4 finally added microsecond precision.

On a pre-5.6.4 MySQL database server, updates might be lost within the lifespan of any given second. That’s because all transactions updating the same database row see the same version timestamp (which points to the beginning of the currently running second).
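
To make the lost update scenario concrete, here is a hypothetical JDBC sketch of a timestamp-based version check (the product table, its columns, and the method are invented for this illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class TimestampVersioningExample {

	// Optimistic locking UPDATE guarded by a TIMESTAMP version column.
	// Returns false when the version no longer matches, i.e. the row is stale.
	public static boolean updateQuantity(Connection connection, long productId,
			int newQuantity, Timestamp previouslyReadVersion) throws SQLException {
		try (PreparedStatement update = connection.prepareStatement(
				"UPDATE product " +
				"SET quantity = ?, version = CURRENT_TIMESTAMP " +
				"WHERE id = ? AND version = ?")) {
			update.setInt(1, newQuantity);
			update.setLong(2, productId);
			update.setTimestamp(3, previouslyReadVersion);
			return update.executeUpdate() == 1;
		}
	}
}

If two concurrent transactions read the row and issue this UPDATE within the same second on a pre-5.6.4 MySQL server, CURRENT_TIMESTAMP (truncated to second precision) equals the previously read version: both statements match, both report success, and the first write is silently overwritten.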

Argument 2: Pre-5.6.4 MySQL versions only support second precision timestamps.

Handling time is not that easy

Incrementing a local version number is always safer because this operation doesn’t depend on any external factors. If the database row already contains a higher version number, your data has become stale. It’s as simple as that.
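
With JPA, for instance, this boils down to a numeric attribute annotated with @Version; a minimal sketch follows (the entity and its fields are illustrative, not taken from the original article):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Product {

	@Id
	private Long id;

	private int quantity;

	// incremented by the persistence provider on every update and checked
	// in the UPDATE ... WHERE version = ? clause
	@Version
	private long version;
}

The persistence provider increments the counter on every update and adds it to the UPDATE statement’s WHERE clause, so a stale row yields an update count of zero and the current transaction can fail fast.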

On the other hand, time is one of the most complicated dimensions to deal with. If you don’t believe me, just look at the daylight saving time handling considerations.

It took Java eight versions to finally come up with a mature Date/Time API. Handling time across application layers (from JavaScript, through the Java middleware, to the database date/time types) makes matters even worse.

Argument 3: Handling system time is a challenging job. You have to handle leap seconds, daylight saving time, time zones, and various time standards.

Lessons from distributed computing

Optimistic locking is all about event ordering, so naturally we’re only interested in the happened-before relationship.

In distributed computing, logical clocks are favored over physical ones (system clocks) because network time synchronization implies variable latencies.

Sequence number versioning is similar to the Lamport timestamps algorithm, with each event incrementing a single counter.

While Lamport timestamps were designed to synchronize events across multiple distributed nodes, database optimistic locking is much simpler, because there is only one node (the database server) where all transactions (coming from concurrent client connections) are synchronized.
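
For reference, a Lamport clock needs nothing more than a counter that is incremented on every local event and fast-forwarded whenever a higher remote timestamp is received. A minimal sketch (the class is illustrative, not from the original article):

import java.util.concurrent.atomic.AtomicLong;

public class LamportClock {

	private final AtomicLong counter = new AtomicLong();

	// called for every local event (e.g. a row update)
	public long tick() {
		return counter.incrementAndGet();
	}

	// called when a timestamp arrives from another node
	public long receive(long remoteTimestamp) {
		return counter.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
	}
}

With a single synchronization point (the database server), the receive part disappears and all that remains is the tick, which is exactly what incrementing a version column does.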

Argument 4: Distributed computing favors logical clocks over physical ones, because we are only interested in event ordering anyway.

Conclusion

Using physical time might seem convenient at first, but it turns out to be a naive solution. In a distributed environment, perfect system time synchronization is very unlikely. All in all, you should always prefer logical clocks when implementing an optimistic locking mechanism.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.