Hibernate application-level repeatable reads

Introduction

In my previous post I described how application-level transactions offer a suitable concurrency control mechanism for long conversations.

All entities are loaded within the context of a Hibernate Session, acting as a transactional write-behind cache.

A Hibernate persistence context can hold one and only one reference of a given entity. The first level cache guarantees session-level repeatable reads.

If the conversation spans multiple requests, we can have application-level repeatable reads. Long conversations are inherently stateful, so we can opt for detached objects or long persistence contexts. But application-level repeatable reads require an application-level concurrency control strategy such as optimistic locking.

The catch

But this behavior may prove unexpected at times.

If your Hibernate Session has already loaded a given entity, then any subsequent entity query (JPQL/HQL) is going to return the very same object reference (disregarding the currently loaded database snapshot):

(Figure: ApplicationLevelRepeatableRead)

In this example we can see that the first level cache prevents overwriting an already loaded entity. To prove this behavior, I came up with the following test case:

final ExecutorService executorService = Executors.newSingleThreadExecutor();

// The first transaction persists a Product with a quantity of 7.
doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		Product product = new Product();
		product.setId(1L);
		product.setQuantity(7L);
		session.persist(product);
		return null;
	}
});

// The second transaction loads the Product and then lets another thread
// (using a separate Session and transaction) lower the quantity to 6.
doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		final Product product = (Product) session.get(Product.class, 1L);
		try {
			executorService.submit(new Callable<Void>() {
				@Override
				public Void call() throws Exception {
					return doInTransaction(new TransactionCallable<Void>() {
						@Override
						public Void execute(Session _session) {
							// a different Session holds a different object reference
							Product otherThreadProduct = (Product) _session.get(Product.class, 1L);
							assertNotSame(product, otherThreadProduct);
							otherThreadProduct.setQuantity(6L);
							return null;
						}
					});
				}
			}).get();
			// the entity query returns the already cached entity (quantity is still 7) ...
			Product reloadedProduct = (Product) session.createQuery("from Product").uniqueResult();
			assertEquals(7L, reloadedProduct.getQuantity());
			// ... while the SQL projection sees the latest database value (6)
			assertEquals(6L, ((Number) session.createSQLQuery("select quantity from Product where id = :id").setParameter("id", product.getId()).uniqueResult()).longValue());
		} catch (Exception e) {
			fail(e.getMessage());
		}
		return null;
	}
});

This test case clearly illustrates the differences between entity queries and SQL projections. While SQL query projections always load the latest database state, entity query results are managed by the first level cache, ensuring session-level repeatable reads.

Workaround 1: If your use case demands reloading the latest database entity state then you can simply refresh the entity in question.

Workaround 2: If you want an entity to be disassociated from the Hibernate first level cache you can easily evict it, so the next entity query can use the latest database entity value.
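
A minimal sketch of both workarounds (not part of the original test case), assuming the same Session and Product entity as above:

// Workaround 1: overwrite the managed entity state with the latest database snapshot
session.refresh(product);

// Workaround 2: detach the entity from the first level cache, so the next
// entity query loads the current database state
session.evict(product);
Product latestProduct = (Product) session.createQuery("from Product where id = :id")
    .setParameter("id", 1L)
    .uniqueResult();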

Beyond prejudice

Hibernate is a means, not a goal. A data access layer requires both reads and writes, and neither plain-old JDBC nor Hibernate is a one-size-fits-all solution. A data knowledge stack is much more appropriate for getting the most out of your data read queries and write DML statements.

While native SQL remains the de facto relational data reading technique, Hibernate excels in writing data. Hibernate is a persistence framework and you should never forget that. Loading entities makes sense if you plan on propagating changes back to the database. You don’t need to load entities for displaying read-only views, an SQL projection being a much better alternative in this case.

Session-level repeatable reads prevent lost updates in concurrent write scenarios, so there’s a good reason why entities don’t get refreshed automatically. We might have chosen to manually flush dirty properties, and an automated entity refresh could overwrite pending changes that have already been synchronized.

Designing data access patterns is not a trivial task, and a solid integration testing foundation is worth investing in. To avoid any unexpected behavior, I strongly advise you to validate all automatically generated SQL statements to prove their effectiveness and efficiency.

Code available on GitHub.

MongoDB Incremental Migration Scripts

Introduction

An incremental software development process requires an incremental database migration strategy.

I remember working on an enterprise application where hibernate.hbm2ddl.auto was the default data migration tool.

Updating the production environment required intensive preparation and the migration scripts were only created on the spot. An unforeseen error could have led to production data corruption.

Incremental updates to the rescue

The incremental database update is a technical feature that needs to be addressed in the very first application development iterations.

We used to develop our own custom data migration implementations, but spending time writing and supporting such frameworks always works against your current project budget.

A project must be packed with both the application code and all associated database schema/data update scripts. Using incremental migration scripts allows us to automate the deployment process and to take advantage of continuous delivery.

Nowadays you don’t have to implement data migration tools, Flyway does a better job than all our previous custom frameworks. All database schema and data changes have to be recorded in incremental update scripts following a well-defined naming convention.
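
For instance, Flyway’s default convention expects versioned scripts named like the following (the file names below are illustrative):

src/main/resources/db/migration/V1_1__create_product_table.sql
src/main/resources/db/migration/V1_2__add_product_quantity_column.sql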

An RDBMS migration plan addresses both schema and data changes. It’s always good to separate schema and data changes. Integration tests might only use the schema migration scripts, in conjunction with test-time-related data.

Flyway supports all major relational database systems, but for NoSQL (e.g. MongoDB) you need to look somewhere else.

Mongeez

Mongeez is an open-source project aiming to automate MongoDB data migration. MongoDB is schema-less, so migration scripts target data updates only.

Integrating mongeez

First you have to define a mongeez configuration file:

mongeez.xml

<changeFiles>
    <file path="v1_1__initial_data.js"/>
    <file path="v1_2__update_products.js"/>
</changeFiles>

Then you add the actual migration scripts:

v1_1__initial_data.js

//mongeez formatted javascript
//changeset system:v1_1
db.product.insert({
    "_id": 1,
    "name" : "TV",
    "price" : 199.99,
    "currency" : 'USD',
    "quantity" : 5,
    "version" : 1
});
db.product.insert({
    "_id": 2,
    "name" : "Radio",
    "price" : 29.99,
    "currency" : 'USD',
    "quantity" : 3,
    "version" : 1
});

v1_2__update_products.js

//mongeez formatted javascript
//changeset system:v1_2
db.product.update(
    {
        name : 'TV'
    },
    {
         $inc : {
             price : -10,
             version : 1
         }
    },
    {
        multi: true
    }
);

And you need to add the MongeezRunner too:

<bean id="mongeez" class="org.mongeez.MongeezRunner" depends-on="mongo">
	<property name="mongo" ref="mongo"/>
	<property name="executeEnabled" value="true"/>
	<property name="dbName" value="${mongo.dbname}"/>
	<property name="file" value="classpath:mongodb/migration/mongeez.xml"/>
</bean>

Running mongeez

When the application first starts, the incremental scripts will be analyzed and only run if necessary:

INFO  [main]: o.m.r.FilesetXMLReader - Num of changefiles 2
INFO  [main]: o.m.ChangeSetExecutor - ChangeSet v1_1 has been executed
INFO  [main]: o.m.ChangeSetExecutor - ChangeSet v1_2 has been executed

Mongeez uses a separate MongoDB collection to record previously run scripts:

db.mongeez.find().pretty();
{
        "_id" : ObjectId("543b69eeaac7e436b2ce142d"),
        "type" : "configuration",
        "supportResourcePath" : true
}
{
        "_id" : ObjectId("543b69efaac7e436b2ce142e"),
        "type" : "changeSetExecution",
        "file" : "v1_1__initial_data.js",
        "changeId" : "v1_1",
        "author" : "system",
        "resourcePath" : "mongodb/migration/v1_1__initial_data.js",
        "date" : "2014-10-13T08:58:07+03:00"
}
{
        "_id" : ObjectId("543b69efaac7e436b2ce142f"),
        "type" : "changeSetExecution",
        "file" : "v1_2__update_products.js",
        "changeId" : "v1_2",
        "author" : "system",
        "resourcePath" : "mongodb/migration/v1_2__update_products.js",
        "date" : "2014-10-13T08:58:07+03:00"
}

Conclusion

To automate the deployment process you need to create self-sufficient packages, containing both the bytecode and all associated configuration (XML files, resource bundles and data migration scripts). Before starting to write your own custom framework, you should always investigate the available open-source alternatives.

Code available on GitHub.

Integration testing done right with Embedded MongoDB

Introduction

Unit testing requires isolating individual components from their dependencies. Dependencies are replaced with mocks, which simulate certain use cases. This way, we can validate the in-test component behavior across various external context scenarios.

Web components can be unit tested using mock business logic services. Services can be tested against mock data access repositories. But the data access layer is not a good candidate for unit testing, because database statements need to be validated against an actual running database system.

Integration testing database options

Ideally, our tests should run against a production-like database. But using a dedicated database server is not feasible, as we most likely have more than one developer to run such integration test-suites. To isolate concurrent test runs, each developer would require a dedicated database catalog. Adding a continuous integration tool makes matters worse since more tests would have to be run in parallel.

Lesson 1: We need a forked test-suite bound database

When a test suite runs, a database must be started and only made available to that particular test-suite instance. Basically we have the following options:

  • An in-memory embedded database
  • A temporary spawned database process

The fallacy of in-memory database testing

Java offers multiple in-memory relational database options to choose from, such as HSQLDB, H2, or Apache Derby.

Embedding an in-memory database is fast and each JVM can run its own isolated instance. But then we are no longer testing against the actual production database engine, because our integration tests validate the application behavior against a non-production database system.

Using an ORM tool may give the false impression that all databases are equal, especially when all generated SQL code is SQL-92 compliant.

What’s good for broad ORM database support may deprive you of database-specific querying features (window functions, common table expressions, PIVOT).

So the integration testing in-memory database might not support such advanced queries. This can lead to reduced code coverage or to pushing developers to only use the common-yet-limited SQL querying features.

Even if your production database engine provides an in-memory variant, there may still be operational differences between the actual and the lightweight database versions.

Lesson 2: In-memory databases may give you the false impression that your code will also run on a production database

Spawning a production-like temporary database

Testing against the actual production database is much more valuable and that’s why I grew to appreciate this alternative.

When using MongoDB we can choose the embedded mongo plugin. This open-source project creates an external database process that can be bound to the current test-suite life-cycle.

If you’re using Maven, you can take advantage of the embedmongo-maven-plugin:

<plugin>
	<groupId>com.github.joelittlejohn.embedmongo</groupId>
	<artifactId>embedmongo-maven-plugin</artifactId>
	<version>${embedmongo.plugin.version}</version>
	<executions>
		<execution>
			<id>start</id>
			<goals>
				<goal>start</goal>
			</goals>
			<configuration>
				<port>${embedmongo.port}</port>
				<version>${mongo.test.version}</version>
				<databaseDirectory>${project.build.directory}/mongotest</databaseDirectory>
				<bindIp>127.0.0.1</bindIp>
			</configuration>
		</execution>
		<execution>
			<id>stop</id>
			<goals>
				<goal>stop</goal>
			</goals>
		</execution>
	</executions>
</plugin>
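
For reference, here is a hedged sketch of how an integration test might connect to the spawned mongod process; exposing ${embedmongo.port} to the tests as a system property is an assumption of this sketch (e.g. via the maven-surefire-plugin systemPropertyVariables configuration), not something the plugin does for you:

import com.mongodb.DB;
import com.mongodb.MongoClient;

// the port must match ${embedmongo.port} from the plugin configuration
int port = Integer.getInteger("embedmongo.port", 27017);
MongoClient mongoClient = new MongoClient("127.0.0.1", port);
DB database = mongoClient.getDB("integration_test");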

When running the plugin, the following actions are taken:

  1. A MongoDB pack is downloaded

    [INFO] --- embedmongo-maven-plugin:0.1.12:start (start) @ mongodb-facts ---
    Download Version{2.6.1}:Windows:B64 START
    Download Version{2.6.1}:Windows:B64 DownloadSize: 135999092
    Download Version{2.6.1}:Windows:B64 0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 11% 12% 13% 14% 15% 16% 17% 18% 19% 20% 21% 22% 23% 24% 25% 26% 27% 28% 29% 30% 31% 32% 33% 34% 35% 36% 37% 38% 39% 40% 41% 42% 43% 44% 45% 46% 47% 48% 49% 50% 51% 52% 53% 54% 55% 56% 57% 58% 59% 60% 61% 62% 63% 64% 65% 66% 67% 68% 69% 70% 71% 72% 73% 74% 75% 76% 77% 78% 79% 80% 81% 82% 83% 84% 85% 86% 87% 88% 89% 90% 91% 92% 93% 94% 95% 96% 97% 98% 99% 100% Download Version{2.6.1}:Windows:B64 downloaded with 3320kb/s
    Download Version{2.6.1}:Windows:B64 DONE
    
  2. Upon starting a new test suite, the MongoDB pack is unzipped under a unique location in the OS temp folder

    Extract C:\Users\vlad\.embedmongo\win32\mongodb-win32-x86_64-2008plus-2.6.1.zip START
    Extract C:\Users\vlad\.embedmongo\win32\mongodb-win32-x86_64-2008plus-2.6.1.zip DONE
    
  3. The embedded MongoDB instance is started.

    [mongod output]note: noprealloc may hurt performance in many applications
    [mongod output] 2014-10-09T23:25:16.889+0300 [DataFileSync] warning: --syncdelay 0 is not recommended and can have strange performance
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] MongoDB starting : pid=2384 port=51567 dbpath=D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest 64-bit host=VLAD
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] db version v2.6.1
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] git version: 4b95b086d2374bdcfcdf2249272fb552c9c726e8
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] allocator: system
    [mongod output] 2014-10-09T23:25:16.891+0300 [initandlisten] options: { net: { bindIp: "127.0.0.1", http: { enabled: false }, port: 51567 }, security: { authorization: "disabled" }, storage: { dbPath: "D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest", journal: { enabled: false }, preallocDataFiles: false, smallFiles: true, syncPeriodSecs: 0.0 } }
    [mongod output] 2014-10-09T23:25:17.179+0300 [FileAllocator] allocating new datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.ns, filling with zeroes...
    [mongod output] 2014-10-09T23:25:17.179+0300 [FileAllocator] creating directory D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\_tmp
    [mongod output] 2014-10-09T23:25:17.240+0300 [FileAllocator] done allocating datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.ns, size: 16MB,  took 0.059 secs
    [mongod output] 2014-10-09T23:25:17.240+0300 [FileAllocator] allocating new datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.0, filling with zeroes...
    [mongod output] 2014-10-09T23:25:17.262+0300 [FileAllocator] done allocating datafile D:\wrk\vladmihalcea\vladmihalcea.wordpress.com\mongodb-facts\target\mongotest\local.0, size: 16MB,  took 0.021 secs
    [mongod output] 2014-10-09T23:25:17.262+0300 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }
    [mongod output] 2014-10-09T23:25:17.262+0300 [initandlisten]     added index to empty collection
    [mongod output] 2014-10-09T23:25:17.263+0300 [initandlisten] waiting for connections on port 51567
    [mongod output] Oct 09, 2014 11:25:17 PM MongodExecutable start
    INFO: de.flapdoodle.embed.mongo.config.MongodConfigBuilder$ImmutableMongodConfig@26b3719c
    
  4. For the life-time of the current test-suite you can see the embedded-mongo process:

    C:\Users\vlad>netstat -ano | findstr 51567
      TCP    127.0.0.1:51567        0.0.0.0:0              LISTENING       8500
      
    C:\Users\vlad>TASKLIST /FI "PID eq 8500"
    
    Image Name                     PID Session Name        Session#    Mem Usage
    ========================= ======== ================ =========== ============
    extract-0eecee01-117b-4d2     8500 RDP-Tcp#0                  1     44,532 K  
    

    (Figure: embed-mongo)

  5. When the test-suite is finished, the embedded MongoDB process is stopped

    [INFO] --- embedmongo-maven-plugin:0.1.12:stop (stop) @ mongodb-facts ---
    2014-10-09T23:25:21.187+0300 [initandlisten] connection accepted from 127.0.0.1:64117 #11 (1 connection now open)
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] terminating, shutdown command received
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] dbexit: shutdown called
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to close listening sockets...
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] closing listening socket: 520
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to flush diaglog...
    [mongod output] 2014-10-09T23:25:21.189+0300 [conn11] shutdown: going to close sockets...
    [mongod output] 2014-10-09T23:25:21.190+0300 [conn11] shutdown: waiting for fs preallocator...
    [mongod output] 2014-10-09T23:25:21.190+0300 [conn11] shutdown: closing all files...
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] closeAllFiles() finished
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] shutdown: removing fs lock...
    [mongod output] 2014-10-09T23:25:21.191+0300 [conn11] dbexit: really exiting now
    [mongod output] Oct 09, 2014 11:25:21 PM de.flapdoodle.embed.process.runtime.ProcessControl stopOrDestroyProcess
    

Conclusion

The embedmongo plugin is in no way slower than any in-memory relational database system. It makes me wonder why there isn’t such an option for open-source RDBMS (e.g. PostgreSQL). This is a great open-source project idea and maybe Flapdoodle OSS will offer support for relational databases too.

Code available on GitHub.

Logical vs physical clock optimistic locking

Introduction

In my previous post I demonstrated why optimistic locking is the only viable solution for application-level transactions. Optimistic locking requires a version column that can be represented as:

  • a physical clock (a timestamp value taken from the system clock)
  • a logical clock (an incrementing numeric value)

This article will demonstrate why logical clocks are better suited for optimistic locking mechanisms.
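
As a quick illustration (a minimal sketch, with hypothetical entity names), JPA accepts both alternatives for the @Version attribute:

import java.sql.Timestamp;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
class TimestampVersionedProduct {

    @Id
    private Long id;

    @Version
    private Timestamp version;  // physical clock: set from the system time on every update
}

@Entity
class NumericVersionedProduct {

    @Id
    private Long id;

    @Version
    private int version;        // logical clock: incremented on every update
}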

System time

The system time is provided by the operating system’s internal clocking algorithm. The programmable interval timer periodically sends an interrupt signal (with a frequency of 1.193182 MHz). The CPU handles the timer interrupt and increments a tick counter.

Both Unix and Windows record time as the number of ticks since a predefined absolute time reference (an epoch). The operating system clock resolution varies from 1 ms (Android) to 100 ns (Windows) to 1 ns (Unix).

Monotonic time

To order events, the version must advance monotonically. While incrementing a local counter is a monotonic function, system time might not always return monotonic timestamps.

Java has two ways of fetching the current system time. You can either use:

  1. System#currentTimeMillis(), which gives you the number of milliseconds elapsed since the Unix epoch

    This method doesn’t give you monotonic time results because it returns the wall clock time which is prone to both forward and backward adjustments (if NTP is used for system time synchronization).

    For monotonic currentTimeMillis, you can check Peter Lawrey’s solution or Bitronix Transaction Manager Monotonic Clock.

  2. System#nanoTime(), which returns the number of nanoseconds elapsed since an arbitrarily chosen time reference

    This method tries to use the current operating system’s monotonic clock implementation, but it falls back to wall clock time if no monotonic clock could be found.
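
To make the difference concrete, here is a small sketch of measuring an elapsed interval with both APIs:

long wallStart = System.currentTimeMillis();  // wall clock time, subject to NTP adjustments
long monoStart = System.nanoTime();           // monotonic on most platforms

// ... the measured operation runs here ...

long wallElapsed = System.currentTimeMillis() - wallStart;  // may be skewed (even negative) after a clock adjustment
long monoElapsed = System.nanoTime() - monoStart;           // always moves forward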

Argument 1: System time is not always monotonically incremented.

Database timestamp precision

The SQL-92 standard defines the TIMESTAMP data type as YYYY-MM-DD hh:mm:ss. The fraction part is optional and each database implements a specific timestamp data type:

  • Oracle: TIMESTAMP(9) may use up to 9 fractional digits (nanosecond precision).
  • MSSQL: DATETIME2 has a precision of 100ns.
  • MySQL: version 5.6.4 added microsecond precision support for TIME, DATETIME, and TIMESTAMP types (e.g. TIMESTAMP(6)); previous MySQL versions discard the fractional part of all temporal types.
  • PostgreSQL: Both TIME and TIMESTAMP types have microsecond precision.
  • DB2: TIMESTAMP(12) may use up to 12 fractional digits (picosecond precision).

When it comes to persisting timestamps, most database servers offer at least 6 fractional digits. MySQL users have long been waiting for a more precise temporal type, and version 5.6.4 finally added microsecond precision.

On a pre-5.6.4 MySQL database server, updates might be lost during the lifespan of any given second. That’s because all transactions updating the same database row will see the same version timestamp (which points to the beginning of the current running second).

Argument 2: Pre-5.6.4 MySQL versions only support second precision timestamps.

Handling time is not that easy

Incrementing a local version number is always safer because this operation doesn’t depend on any external factors. If the database row already contains a higher version number, your data has become stale. It’s as simple as that.

On the other hand, time is one of the most complicated dimensions to deal with. If you don’t believe me, check the daylight saving time handling considerations.

It took 8 versions for Java to finally come up with a mature Date/Time API. Handling time across application layers (from JavaScript, to Java middleware, to database date/time types) makes matters worse.

Argument 3: Handling system time is a challenging job. You have to handle leap seconds, daylight saving time, time zones and various time standards.

Lessons from distributed computing

Optimistic locking is all about event ordering, so naturally we’re only interested in the happened-before relationship.

In distributed computing, logical clocks are favored over physical ones (system clocks), because network time synchronization implies variable latencies.

Sequence number versioning is similar to the Lamport timestamps algorithm, with each event incrementing a single counter.

While Lamport timestamps were defined for synchronizing events across multiple distributed nodes, database optimistic locking is much simpler, because there is only one node (the database server) where all transactions (coming from concurrent client connections) are synchronized.
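
As a rough illustration of the analogy (a hedged sketch with hypothetical names), a Lamport-style logical clock is nothing more than a counter advanced by every event:

import java.util.concurrent.atomic.AtomicLong;

public final class LamportClock {

    private final AtomicLong counter = new AtomicLong();

    // a local event simply increments the logical clock
    public long tick() {
        return counter.incrementAndGet();
    }

    // receiving a remote timestamp merges it with the local counter
    public long receive(long remoteTimestamp) {
        return counter.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }
}

A @Version column plays the role of this single counter: every update is an event that increments it.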

Argument 4: Distributed computing favors logical clocks over physical ones, because we are only interested in event ordering anyway.

Conclusion

Using physical time might seem convenient at first, but it turns out to be a naive solution. In a distributed environment, perfect system time synchronization is very unlikely. All in all, you should always prefer logical clocks when implementing an optimistic locking mechanism.

One year of blogging

Teaching is my way of learning

Exactly one year ago today, I wrote my very first blog post. It’s been such a long journey ever since, so it’s time to draw a line and review all my technical writing accomplishments.

I realized that sharing knowledge is a way of pushing myself to reason thoroughly on a particular subject. So, both my readers and I have something to learn from my writing. Finding time to think of future blog topics, researching particular subjects, writing code snippets and the ever-present pre-publishing reviews is worth the hassle.

Under the umbrella

The Internet is huge, so being heard is not something you should leave to chance. From the start I knew that I needed to do more than write high-quality articles. When nobody knows anything about you, your only chance is strategic marketing.

Being an avid Java DZone reader, I was already familiar with their MVB program, so I decided to give it a shot. I also submitted a collaboration proposal to JavaCodeGeeks and, to my surprise, I got accepted soon after my first published post.

Several well-received articles later, Allen Coin proposed me for the Dev of the Week column. That’s when I also became a DZone MVB.

Both DZone and JavaCodeGeeks allowed me to reach a much larger audience, so I am grateful for the chance they offered me.

Meeting true Java heroes

This journey allowed me to meet so many great people I would never have had the chance of knowing otherwise.

Lukas Eder (jOOQ founder) was one of the first people to find my articles interesting. After two months of blogging, he proposed me for the 100 High-Quality Java Developers’ Blogs list. With his great jOOQ framework and clever marketing skills, Lukas managed to build a large audience on various networking channels (blog, Reddit, Twitter, Google+). Without him promoting my posts, it would have been much more difficult to create so many connections with other software enthusiasts.

Eugen Paraschiv (owner of Baeldung) is definitely the person we should all look up to. The Romanian IT industry has developed considerably, but I have always felt we fall short on great software figures. Well, his passion for software craftsmanship is a secret ingredient for becoming successful in our industry. He’s been listing my articles in many of his personal weekly reviews, allowing my posts to reach his very impressive follower network. I’ve been applying much of his wise marketing advice and I can assure you it works like magic.

Petri Kainulainen (blogger and Spring Data book author) has been a great influence throughout my technical writing apprenticeship. I am a big fan of his articles and I’m fascinated by his constant drive for improvement. Without him retweeting my articles, I wouldn’t have gotten to almost 300 Twitter followers.

The list can go on with Thorben Janssen, Rob Diana and many other Twitter followers finding my articles interesting and worth sharing.

While I first joined Twitter for article sharing, I soon discovered a great network of passionate developers. In less than a year I managed to get 286 followers:

(Figure: meta_12m_twitter)

Open-source contribution

From the very beginning, I created a GitHub account to host blog posts code samples. On one project of ours, I realized we were missing a connection pooling monitoring tool, so I decided to write my own open-source framework.

That’s how FlexyPool was born. Judging by GitHub traffic statistics, I can tell that some major companies, from real-estate platforms to the banking industry (US and Swiss), have added connection pooling improvement tickets back-linking FlexyPool.

Towards becoming a professional trainer

Throughout my software development career, I kept on seeing all sorts of ORM and Data Access anti-patterns. That’s why I decided to create my own open-source Hibernate Master Class training material.

I have been answering Hibernate StackOverflow questions since May 2014. StackOverflow allows you to see what users are struggling with, so you can better sense what’s more important to address in your training material.

(Figure: meta_12m_stackoverflow)

Tokens of appreciation

Jelastic started searching for the most interesting developers in the world and, after submitting my request, I was chosen as one of August’s most interesting developers.

Time for statistics

In my first year, I managed to write 70 posts which have been visited 88k times (on average 1250 views per article):

(Figure: meta_12m_WP_months)

DZone published 65 articles that have been viewed 388k times (on average 6000 views per article):

(Figure: meta_12m_DZone)

Top viewers by country

(Figure: meta_12m_WP_world)

My top five articles

  • Time to break free from the SQL-92 mindset (4430 views)
  • MongoDB and the fine art of data modelling (3773 views)
  • The anatomy of Connection Pooling (3051 views)
  • A beginner’s guide to ACID and database transactions (2752 views)
  • JOOQ Facts: From JPA Annotations to JOOQ Table Mappings (2614 views)

My Java DZone top five articles

  • Code Review Best Practices (18156 views)
  • MongoDB Time Series: Introducing the Aggregation Framework (16152 views)
  • Batch Processing Best Practices (14415 views)
  • Good vs Bad Leader (12671 views)
  • MongoDB Facts: Over 80,000 Inserts/Second on Commodity Hardware (11991 views)

Blog followers

(Figure: meta_12m_WP_followers)

Conclusion

Thank you for reading my blog. Without my readers, I would be writing in vain. Thanks for helping me throughout my first year of technical writing.

For the next year, I plan on finishing Hibernate Master Class and the Unfolding Java Transaction open-source book.

The fastest way of drawing UML class diagrams

A picture is worth a thousand words

Understanding a software design proposal is so much easier once you can actually visualize it. While drawing diagrams might take some extra effort, the small time investment pays off when others need less time to understand your proposal.

Software is a means, not a goal

We write software to support other people’s business requirements. Understanding the business goals is the first step towards coming up with an effective design proposal. After gathering input from your product owner, you should write down the business story. Writing it down makes you reason more thoroughly about the business goal, and the product owner can validate your understanding.

After the business goals are clear you need to move to technical challenges. A software design proposal is derived from both business and technical requirements. The quality of service may pose certain challenges that are better addressed by a specific design pattern or software architecture.

The class diagram drawing hassle

My ideal diagram drawing tool would simply transpose my hand-drawn sketches into a digital format. Unfortunately I haven’t yet found such a tool, so this is how I do it:

  1. I hand draw all concepts and interactions on a piece of paper. That’s the most rapid way of design prototyping. While I could use a UML drawing tool, I prefer the paper-and-pencil approach, because changes require much less effort
  2. Once I settle for a design proposal, I start writing down the interfaces and request/response objects in plain Java classes. Changing the classes is pretty easy, thanks to IntelliJ IDEA refactoring tools.
  3. When all Java classes are ready, I simply delegate the class diagram drawing to IntelliJ IDEA

This is what you end up with:

(Figure: flexy-pool-class-diagram)

Preventing lost updates in long conversations

Introduction

All database statements are executed within the context of a physical transaction, even when we don’t explicitly declare transaction boundaries (BEGIN/COMMIT/ROLLBACK). Data integrity is enforced by the ACID properties of database transactions.

Logical vs Physical transactions

A logical transaction is an application-level unit of work that may span multiple physical (database) transactions. Holding the database connection open throughout several user requests, including user think time, is definitely an anti-pattern.

A database server can accommodate a limited number of physical connections, and these are often reused through connection pooling. Holding limited resources for long periods of time hinders scalability, so database transactions must be short to ensure that both database locks and pooled connections are released as soon as possible.

Web applications entail a read-modify-write conversational pattern. A web conversation consists of multiple user requests, all operations being logically connected to the same application-level transaction. A typical use case goes like this:

  1. Alice requests a certain product to be displayed
  2. The product is fetched from the database and returned to the browser
  3. Alice requests a product modification
  4. The product must be updated and saved to the database

All these operations should be encapsulated in a single unit-of-work. We therefore need an application-level transaction that’s also ACID compliant, because other concurrent users might modify the same entities, long after shared locks had been released.

In my previous post I introduced the perils of lost updates. The database transaction ACID properties can only prevent this phenomenon within the boundaries of a single physical transaction. Pushing transaction boundaries into the application layer requires application-level ACID guarantees.

To prevent lost updates we must have application-level repeatable reads along with a concurrency control mechanism.

Long conversations

HTTP is a stateless protocol. Stateless applications are always easier to scale than stateful ones, but conversations can’t be stateless.

Hibernate offers two strategies for implementing long conversations:

  • Extended persistence context
  • Detached objects

Extended persistence context

After the first database transaction ends the JDBC connection is closed (usually going back to the connection pool) and the Hibernate session becomes disconnected. A new user request will reattach the original Session. Only the last physical transaction must issue DML operations, as otherwise the application-level transaction is not an atomic unit of work.

To prevent the intermediate transactions from persisting changes before the application-level transaction completes, we have the following options (illustrated in the sketch after this list):

  • We can disable automatic flushing, by switching the Session FlushMode to MANUAL. At the end of the last physical transaction, we need to explicitly call Session#flush() to propagate the entity state transitions.
  • All but the last transaction are marked read-only. For read-only transactions Hibernate disables both dirty checking and the default automatic flushing.

    The read-only flag might propagate to the underlying JDBC Connection, so the driver might enable some database-level read-only optimizations.

    The last transaction must be writeable so that all changes are flushed and committed.
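
Here is a minimal sketch of the first option, assuming a plain Hibernate SessionFactory and the Product entity used earlier in this series:

Session session = sessionFactory.openSession();
session.setFlushMode(FlushMode.MANUAL);  // no automatic flush, not even at commit time

// intermediate request: the change is tracked by the persistence context,
// but nothing is written to the database yet
Product product = (Product) session.get(Product.class, 1L);
product.setQuantity(10L);

// last request of the conversation: propagate all pending changes at once
session.beginTransaction();
session.flush();
session.getTransaction().commit();
session.close();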

Using an extended persistence context is more convenient since entities remain attached across multiple user requests. The downside is the memory footprint: the persistence context can easily grow with every newly fetched entity. Hibernate’s default dirty checking mechanism uses a deep-comparison strategy, comparing all properties of all managed entities. The larger the persistence context, the slower the dirty checking mechanism gets.

This can be mitigated by evicting entities that don’t need to be propagated to the last physical transaction.

Java Enterprise Edition offers a very convenient programming model through the use of @Stateful Session Beans along with an EXTENDED PersistenceContext.

All extended persistence context examples set the default transaction propagation to NOT_SUPPORTED, which makes it uncertain whether the queries are enrolled in the context of a local transaction or whether each query is executed in a separate database transaction.
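
For illustration only, a hedged Java EE sketch might look like this (the bean and method names are hypothetical):

import javax.ejb.Remove;
import javax.ejb.Stateful;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.PersistenceContextType;

@Stateful
@TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
public class ProductConversation {

    @PersistenceContext(type = PersistenceContextType.EXTENDED)
    private EntityManager entityManager;

    public Product load(Long id) {
        // runs without a transaction; the entity stays managed between requests
        return entityManager.find(Product.class, id);
    }

    public void changeQuantity(Product product, Long quantity) {
        product.setQuantity(quantity);  // tracked, but not flushed yet
    }

    @Remove
    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void endConversation() {
        // joining a transaction here flushes the extended persistence context
    }
}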

Detached objects

Another option is to bind the persistence context to the life-cycle of the intermediate physical transaction. Upon persistence context closing, all entities become detached. For a detached entity to become managed again, we have two options (illustrated in the sketch after this list):

  • The entity can be reattached using the Hibernate-specific Session#update() method. If there’s an already attached entity (same entity class and same identifier), Hibernate throws an exception, because a Session can hold at most one reference to any given entity.

    There is no such equivalent in Java Persistence API.

  • Detached entities can also be merged with their persistent object equivalent. If there’s no currently loaded persistent object, Hibernate will load one from the database. The detached entity itself will not become managed.

    By now you should know that this pattern smells like trouble:

    What if the loaded data doesn’t match what we have previously loaded?
    What if the entity has changed since we first loaded it?

    Overwriting new data with an older snapshot leads to lost updates, so a concurrency control mechanism is not optional when dealing with long conversations.

    Both Hibernate and JPA offer entity merging.
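
A minimal sketch of the two reattachment options, assuming a detachedProduct held in the conversation state:

// Option 1: Hibernate-specific reattachment; throws an exception if another
// instance with the same identifier is already associated with the Session
session.update(detachedProduct);

// Option 2: merging; returns a managed copy, while the detached instance
// itself remains detached
Product managedProduct = (Product) session.merge(detachedProduct);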

Detached entities storage

The detached entities must be available throughout the lifetime of a given long conversation. For this, we need a stateful context to make sure all conversation requests find the same detached entities. Therefore we can make use of:

  • Stateful Session Beans

    Stateful session beans are one of the greatest features offered by Java Enterprise Edition. They hide all the complexity of saving/loading state between different user requests. Being a built-in feature, they automatically benefit from cluster replication, so the developer can concentrate on business logic instead.

    Seam is a Java EE application framework that has built-in support for web conversations.

  • HttpSession

    We can save the detached objects in the HttpSession. Most web/application servers offer session replication, so this option can be used by non-JEE technologies, like the Spring framework. Once the conversation is over, we should always discard all associated state, to make sure we don’t bloat the HttpSession with unnecessary storage.

    You need to be careful to synchronize all HttpSession access (getAttribute/setAttribute), because for a very strange reason, this web storage is not thread-safe.

    Spring Web Flow is a Spring MVC companion that supports HttpSession web conversations.

  • Hazelcast

    Hazelcast is an in-memory clustered cache, so it’s a viable solution for long conversation storage. We should always set an expiration policy, because in a web application conversations might be started and then abandoned. Expiration acts like HTTP session invalidation.

The stateless conversation anti-pattern

As with database transactions, we need repeatable reads, as otherwise we might load an already modified record without realizing it:

(Figure: ConversationLostUpdateByReloading)

  1. Alice requests a product to be displayed
  2. The product is fetched from the database and returned to the browser
  3. Alice requests a product modification
  4. Because Alice hasn’t kept a copy of the previously displayed object, she has to reload it once again
  5. The product is updated and saved to the database
  6. The concurrent batch job update has been lost and Alice will never realize it

The stateful version-less conversation anti-pattern

Preserving conversation state is a must if we want to ensure both isolation and consistency, but we can still run into lost update situations:

(Figure: ConversationLostUpdateStatefullUnversioned)

Even if we have application-level repeatable reads, others can still modify the same entities. Within the context of a single database transaction, row-level locks can block concurrent modifications, but this is not feasible for logical transactions. The only option is to allow others to modify any rows, while preventing stale data from being persisted.

Optimistic locking to the rescue

Optimistic locking is a general-purpose concurrency control technique, and it works for both physical and application-level transactions. Using JPA is only a matter of adding a @Version field to our domain models:

(Figure: ConversationLostUpdateStatefullVersioned)
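
A minimal sketch, assuming the Product entity used throughout this series:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Product {

    @Id
    private Long id;

    private Long quantity;

    @Version
    private int version;  // incremented on every update; stale writes fail with an OptimisticLockException

    // getters and setters omitted for brevity
}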

Conclusion

Pushing database transaction boundaries into the application layer requires application-level concurrency control. To ensure application-level repeatable reads we need to preserve state across multiple user requests, and in the absence of database locking we need to rely on an application-level concurrency control mechanism.

Optimistic locking works for both database and application-level transactions, and it doesn’t make use of any additional database locking. Optimistic locking can prevent lost updates and that’s why I always recommend all entities be annotated with the @Version attribute.
