One year of blogging

Teaching is my way of learning

Exactly one year ago today, I wrote my very first blog post. It’s been such a long journey ever since, so it’s time to draw a line and review all my technical writing accomplishments.

I realized that sharing knowledge is a way of pushing myself to reason thoroughly on a particular subject. So, both my readers and I have something to learn from my writing. Finding time to think of future blog topics, researching particular subjects, writing code snippets and going through the ever-present pre-publishing reviews are all worth the hassle.

Under the umbrella

The Internet is huge, so being heard is not something you would leave to chance. From the start I knew that I needed to do more than write high-quality articles. When nobody knows anything about you, your only chance is strategic marketing.

Being an avid Java DZone reader, I was already familiar with their MVB program, so I decided to give it a shot. I also submitted a collaboration proposal to JavaCodeGeeks and, to my surprise, I got accepted soon after my first published post.

After several well-received articles, Allen Coin proposed me for the Dev of the Week column. That’s when I also became a DZone MVB.

Both DZone and JavaCodeGeeks allowed me to reach a much larger audience, so I am grateful for the chance they offered me.

Meeting true Java heroes

This journey allowed me to meet so many great people I would never have had the chance of knowing otherwise.

Lukas Eder (jOOQ founder) was one of the first people to find my articles interesting. After two months of blogging, he proposed me for the 100 High-Quality Java Developers’ Blogs list. With his great jOOQ framework and clever marketing skills, Lukas managed to build a large audience on various networking channels (blog, Reddit, Twitter, Google+). Without him promoting my posts, it would have been much more difficult to create so many connections with other software enthusiasts.

Eugen Paraschiv (owner of Baeldung) is definitely the person we should all look up to. The Romanian IT industry has developed considerably, but I always felt we fall short on great software figures. Well, his passion for software craftsmanship is a secret ingredient for becoming successful in our industry. He’s been listing my articles in many of his personal weekly reviews, allowing my posts to reach his very impressive network of followers. I’ve been applying much of his wise marketing advice and I can assure you it works like magic.

Petri Kainulainen (blogger and Spring Data book author) has been a great influence throughout my technical writing apprenticeship. I am a big fan of his articles and I’m fascinated by his constant drive to improve. Without him retweeting my articles, I wouldn’t have got to almost 300 Twitter followers.

The list can go on with Thorben Janssen, Rob Diana and many other Twitter followers finding my articles interesting and worth sharing.

While I first joined Twitter for article sharing, I soon discovered a great network of passionate developers. In less than a year I managed to get 286 followers:

meta_12m_twitter

Open-source contribution

From the very beginning, I created a GitHub account to host the code samples for my blog posts. On one of our projects, I realized we were missing a connection pooling monitoring tool, so I decided to write my own open-source framework.

That’s how FlexyPool was born. From the GitHub traffic statistics, I can tell that some major companies, from real-estate platforms to the banking industry (US and Swiss), have added connection pooling improvement tickets that link back to FlexyPool.

Towards becoming a professional trainer

Throughout my software development career, I kept on seeing all sorts of ORM and Data Access anti-patterns. That’s why I decided to create my own open-source Hibernate Master Class training material.

I have been answering Hibernate questions on StackOverflow since May 2014. StackOverflow allows you to see what users are struggling with, so you can better sense what’s more important to address in your training material.

meta_12m_stackoverflow

Tokens of appreciation

Jelastic started searching for the most interesting developers in the world and, after submitting my request, I was chosen as one of August’s most interesting developers.

Time for statistics

In my first year, I managed to write 70 posts which have been visited 88k times (on average 1250 views per article):

meta_12m_WP_months

DZone published 65 articles that have been viewed 388k times (on average 6000 views per article):

meta_12m_DZone

Top viewers by country

meta_12m_WP_world

My top five articles

Name | Views
Time to break free from the SQL-92 mindset | 4430
MongoDB and the fine art of data modelling | 3773
The anatomy of Connection Pooling | 3051
A beginner’s guide to ACID and database transactions | 2752
JOOQ Facts: From JPA Annotations to JOOQ Table Mappings | 2614

My Java DZone top five articles

Name | Views
Code Review Best Practices | 18156
MongoDB Time Series: Introducing the Aggregation Framework | 16152
Batch Processing Best Practices | 14415
Good vs Bad Leader | 12671
MongoDB Facts: Over 80,000 Inserts/Second on Commodity Hardware | 11991

Blog followers

meta_12m_WP_followers

Conclusion

Thank you for reading my blog. Without my readers, I would be writing in vain. Thanks for helping me throughout my first year of technical writing.

For the next year, I plan on finishing Hibernate Master Class and the Unfolding Java Transaction open-source book.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

The fastest way of drawing UML class diagrams

A picture is worth a thousand words

Understanding a software design proposal is so much easier once you can actually visualize it. While drawing diagrams might take some extra effort, the small time investment pays off when others need less time to understand your proposal.

Software is a means, not a goal

We write software to support other people’s business requirements. Understanding business goals is the first step towards coming up with an effective design proposal. After gathering input from your product owner, you should write down the business story. Writing it makes you reason more about the business goal and the product owner can validate your comprehension.

After the business goals are clear you need to move to technical challenges. A software design proposal is derived from both business and technical requirements. The quality of service may pose certain challenges that are better addressed by a specific design pattern or software architecture.

The class diagram drawing hassle

My ideal diagram drawing tool would simply transpose my hand-drawn sketches to a digital format. Unfortunately, I haven’t yet found such a tool, so this is how I do it:

  1. I hand draw all concepts and interactions on a piece of paper. That’s the most rapid way of design prototyping. While I could use a UML drawing tool, I prefer the paper-and-pencil approach, because changes require much less effort
  2. Once I settle for a design proposal, I start writing down the interfaces and request/response objects in plain Java classes. Changing the classes is pretty easy, thanks to IntelliJ IDEA refactoring tools.
  3. When all Java classes are ready, I simply delegate the class diagram drawing to IntelliJ IDEA

In the end, this is what you end up with:

flexy-pool-class-diagram

Preventing lost updates in long conversations

Introduction

All database statements are executed within the context of a physical transaction, even when we don’t explicitly declare transaction boundaries (BEGIN/COMMIT/ROLLBACK). Data integrity is enforced by the ACID properties of database transactions.

Logical vs Physical transactions

A logical transaction is an application-level unit of work that may span multiple physical (database) transactions. Holding the database connection open throughout several user requests, including user think time, is definitely an anti-pattern.

A database server can accommodate a limited number of physical connections, and often those are reused by using connection pooling. Holding limited resources for long periods of time hinders scalability. So database transactions must be short to ensure that both database locks and the pooled connections are released as soon as possible.

Web applications entail a read-modify-write conversational pattern. A web conversation consists of multiple user requests, all operations being logically connected to the same application-level transaction. A typical use case goes like this:

  1. Alice requests a certain product to be displayed
  2. The product is fetched from the database and returned to the browser
  3. Alice requests a product modification
  4. The product must be updated and saved to the database

All these operations should be encapsulated in a single unit-of-work. We therefore need an application-level transaction that’s also ACID compliant, because other concurrent users might modify the same entities, long after shared locks have been released.

In my previous post I introduced the perils of lost updates. The database transaction ACID properties can only prevent this phenomenon within the boundaries of a single physical transaction. Pushing transaction boundaries into the application layer requires application-level ACID guarantees.

To prevent lost updates we must have application-level repeatable reads along with a concurrency control mechanism.

Long conversations

HTTP is a stateless protocol. Stateless applications are always easier to scale than stateful ones, but conversations can’t be stateless.

Hibernate offers two strategies for implementing long conversations:

  • Extended persistence context
  • Detached objects

Extended persistence context

After the first database transaction ends the JDBC connection is closed (usually going back to the connection pool) and the Hibernate session becomes disconnected. A new user request will reattach the original Session. Only the last physical transaction must issue DML operations, as otherwise the application-level transaction is not an atomic unit of work.

For disabling persistence in the course of the application-level transaction, we have the following options:

  • We can disable automatic flushing, by switching the Session FlushMode to MANUAL. At the end of the last physical transaction, we need to explicitly call Session#flush() to propagate the entity state transitions.
  • All but the last transaction are marked read-only. For read-only transactions Hibernate disables both dirty checking and the default automatic flushing.

    The read-only flag might propagate to the underlying JDBC Connection, so the driver might enable some database-level read-only optimizations.

    The last transaction must be writeable so that all changes are flushed and committed.
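
The first option can be pictured with a small in-memory sketch. This is not Hibernate code: `ManualFlushContext`, `set` and `flush` are hypothetical names, and a plain map stands in for a database row. The point is only that changes made during the intermediate requests stay in memory and reach the store when the last request calls flush explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a persistence context running in manual flush mode.
final class ManualFlushContext {
    private final Map<String, Object> database;   // stands in for one table row
    private final Map<String, Object> pendingChanges = new LinkedHashMap<>();

    ManualFlushContext(Map<String, Object> database) {
        this.database = database;
    }

    // Reads see pending changes first, then the stored state.
    Object get(String column) {
        return pendingChanges.containsKey(column)
                ? pendingChanges.get(column)
                : database.get(column);
    }

    // Records a change in memory only; nothing is written yet.
    void set(String column, Object value) {
        pendingChanges.put(column, value);
    }

    // Propagates all queued changes at once and returns how many were written.
    int flush() {
        int count = pendingChanges.size();
        database.putAll(pendingChanges);
        pendingChanges.clear();
        return count;
    }
}
```

With this shape, every request before the last one can call `set` freely, and the stored row only changes once `flush` runs in the final physical transaction.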

Using an extended persistence context is more convenient since entities remain attached across multiple user requests. The downside is the memory footprint. The persistence context might easily grow with every new fetched entity. Hibernate’s default dirty checking mechanism uses a deep-comparison strategy, comparing all properties of all managed entities. The larger the persistence context, the slower the dirty checking mechanism will get.

This can be mitigated by evicting entities that don’t need to be propagated to the last physical transaction.
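
To see why the cost grows with the context size, here is a minimal sketch of snapshot-based dirty checking. It is an illustration under my own assumptions, not Hibernate’s implementation: `SnapshotDirtyChecker`, `manage`, `evict` and `findDirty` are hypothetical names, and the caller supplies a function that reads all properties of an entity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical snapshot-based dirty checker; a sketch, not Hibernate's implementation.
final class SnapshotDirtyChecker<E> {
    private final Function<E, Object[]> stateExtractor;   // reads all properties of an entity
    private final Map<E, Object[]> loadedState = new IdentityHashMap<>();

    SnapshotDirtyChecker(Function<E, Object[]> stateExtractor) {
        this.stateExtractor = stateExtractor;
    }

    // Called when an entity becomes managed: remember its load-time state.
    void manage(E entity) {
        loadedState.put(entity, stateExtractor.apply(entity));
    }

    // Evicted entities are no longer compared at flush time.
    void evict(E entity) {
        loadedState.remove(entity);
    }

    // Compares every property of every managed entity against its snapshot,
    // so the work grows with the number of managed entities.
    List<E> findDirty() {
        List<E> dirty = new ArrayList<>();
        for (Map.Entry<E, Object[]> entry : loadedState.entrySet()) {
            Object[] current = stateExtractor.apply(entry.getKey());
            if (!Arrays.equals(current, entry.getValue())) {
                dirty.add(entry.getKey());
            }
        }
        return dirty;
    }
}
```

Calling `evict` for entities that will not change removes them from the loop in `findDirty`, which is the mitigation described above.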

Java Enterprise Edition offers a very convenient programming model through the use of @Stateful Session Beans along with an EXTENDED PersistenceContext.

All extended persistence context examples set the default transaction propagation to NOT_SUPPORTED which makes it uncertain if the queries are enrolled in the context of a local transaction or each query is executed in a separate database transaction.

Detached objects

Another option is to bind the persistence context to the life-cycle of the intermediate physical transaction. Upon persistence context closing all entities become detached. For a detached entity to become managed, we have two options:

  • The entity can be reattached using the Hibernate-specific Session.update() method. If there’s an already attached entity (same entity class and same identifier), Hibernate throws an exception, because a Session can have at most one reference to any given entity.

    There is no such equivalent in Java Persistence API.

  • Detached entities can also be merged with their persistent object equivalent. If there’s no currently loaded persistent object, Hibernate will load one from the database. The detached entity will not become managed.

    By now you should know that this pattern smells like trouble:

    What if the loaded data doesn’t match what we have previously loaded?
    What if the entity has changed since we first loaded it?

    Overwriting new data with an older snapshot leads to lost updates. So a concurrency control mechanism is not optional when dealing with long conversations.

    Both Hibernate and JPA offer entity merging.
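
The merge behaviour described above can be mimicked with a few lines of plain Java. This is a sketch under my own assumptions, not the Hibernate or JPA API: `MergeSketch`, `store` and `merge` are hypothetical names, two maps stand in for the database and the persistence context, and the row is assumed to exist.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of merge semantics; not the Hibernate or JPA API.
final class MergeSketch {
    static final class Product {
        final long id;
        long quantity;

        Product(long id, long quantity) {
            this.id = id;
            this.quantity = quantity;
        }
    }

    private final Map<Long, Product> database = new HashMap<>();   // committed rows
    private final Map<Long, Product> managed = new HashMap<>();    // current persistence context

    void store(Product row) {
        database.put(row.id, new Product(row.id, row.quantity));
    }

    // Loads a managed copy if none is present (the row is assumed to exist),
    // copies the detached state onto it and returns the managed instance.
    // The detached argument itself stays detached.
    Product merge(Product detached) {
        Product target = managed.computeIfAbsent(detached.id, id -> {
            Product row = database.get(id);
            return new Product(row.id, row.quantity);
        });
        target.quantity = detached.quantity;   // an older snapshot silently wins here
        return target;
    }
}
```

Note that nothing in `merge` compares the freshly loaded row with what the conversation saw earlier, which is exactly why the pattern needs a concurrency control mechanism on top.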

Detached entities storage

The detached entities must be available throughout the lifetime of a given long conversation. For this, we need a stateful context to make sure all conversation requests find the same detached entities. Therefore we can make use of:

  • Stateful Session Beans

    Stateful session beans are one of the greatest features offered by Java Enterprise Edition. They hide all the complexity of saving/loading state between different user requests. Being a built-in feature, they automatically benefit from cluster replication, so the developer can concentrate on business logic instead.

    Seam is a Java EE application framework that has built-in support for web conversations.

  • HttpSession

    We can save the detached objects in the HttpSession. Most web/application servers offer session replication so this option can be used by non-JEE technologies, like Spring framework. Once the conversation is over, we should always discard all associated state, to make sure we don’t bloat the Session with unnecessary storage.

    You need to be careful to synchronize all HttpSession access (getAttribute/setAttribute), because for a very strange reason, this web storage is not thread-safe.

    Spring Web Flow is a Spring MVC companion that supports HttpSession web conversations.

  • Hazelcast

    Hazelcast is an in-memory clustered cache, so it’s a viable solution for the long conversation storage. We should always set an expiration policy, because in a web application, conversations might be started and abandoned. Expiration plays the same role as HTTP session invalidation.
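
Whatever storage is chosen, the two rules above (discard state when the conversation ends, expire abandoned conversations) look roughly like this. The sketch is single-node and uses only the JDK; `ExpiringConversationStore` and its methods are hypothetical names and do not reflect the Hazelcast, HttpSession or Java EE APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical conversation store with a time-to-live; a JDK-only sketch.
final class ExpiringConversationStore<V> {
    private static final class Slot<V> {
        final V value;
        final long expiresAtMillis;

        Slot(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Slot<V>> slots = new ConcurrentHashMap<>();
    private final long timeToLiveMillis;

    ExpiringConversationStore(long timeToLiveMillis) {
        this.timeToLiveMillis = timeToLiveMillis;
    }

    // Saves the detached state; the clock value is passed in to keep the sketch testable.
    void put(String conversationId, V detachedState, long nowMillis) {
        slots.put(conversationId, new Slot<>(detachedState, nowMillis + timeToLiveMillis));
    }

    // Returns the state, or null when the conversation is unknown or was abandoned for too long.
    V get(String conversationId, long nowMillis) {
        Slot<V> slot = slots.get(conversationId);
        if (slot == null) {
            return null;
        }
        if (nowMillis >= slot.expiresAtMillis) {
            slots.remove(conversationId, slot);
            return null;
        }
        return slot.value;
    }

    // Called when the conversation is over, so no state lingers.
    void end(String conversationId) {
        slots.remove(conversationId);
    }
}
```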

The stateless conversation anti-pattern

As with database transactions, we need repeatable reads; otherwise we might load an already modified record without realizing it:

ConversationLostUpdateByReloading

  1. Alice requests a product to be displayed
  2. The product is fetched from the database and returned to the browser
  3. Alice requests a product modification
  4. Because Alice hasn’t kept a copy of the previously displayed object, she has to reload it once again
  5. The product is updated and saved to the database
  6. The batch job update has been lost and Alice will never realize it

The stateful version-less conversation anti-pattern

Preserving conversation state is a must if we want to ensure both isolation and consistency, but we can still run into lost updates situations:

ConversationLostUpdateStatefullUnversioned

Even if we have application-level repeatable reads, others can still modify the same entities. Within the context of a single database transaction, row-level locks can block concurrent modifications, but this is not feasible for logical transactions. The only option is to allow others to modify any rows, while preventing stale data from being persisted.

Optimistic locking to the rescue

Optimistic locking is a general-purpose concurrency control technique, and it works for both physical and application-level transactions. Using JPA is only a matter of adding a @Version field to our domain models:

ConversationLostUpdateStatefullVersioned

Conclusion

Pushing database transaction boundaries into the application layer requires an application-level concurrency control. To ensure application-level repeatable reads we need to preserve state across multiple user requests, but in the absence of database locking we need to rely on an application-level concurrency control.

Optimistic locking works for both database and application-level transactions, and it doesn’t make use of any additional database locking. Optimistic locking can prevent lost updates and that’s why I always recommend that all entities have a @Version attribute.

A beginner’s guide to database locking and the lost update phenomena

Introduction

A database is a highly concurrent system. There’s always a chance of update conflicts, like when two concurrent transactions try to update the same record. If there were only one database transaction at any time, then all operations would be executed sequentially. The challenge comes when multiple transactions try to update the same database rows, as we still have to ensure consistent data state transitions.

The SQL standard defines three consistency anomalies (phenomena):

  • Dirty reads, prevented by Read Committed, Repeatable Read and Serializable isolation levels
  • Non-repeatable reads, prevented by Repeatable Read and Serializable isolation levels
  • Phantom reads, prevented by the Serializable isolation level

A lesser-known phenomenon is the lost update anomaly, and that’s what we are going to discuss in this article.

Isolation levels

Most database systems use Read Committed as the default isolation level (MySQL using Repeatable Read instead). Choosing the isolation level is about finding the right balance of consistency and scalability for our current application requirements.

All the following examples are going to be run on PostgreSQL 9.3. Other database systems may behave differently according to their specific ACID implementation.

PostgreSQL uses both locks and MVCC (Multiversion Concurrency Control). In MVCC read and write locks are not conflicting, so reading doesn’t block writing and writing doesn’t block reading either.

Because most applications use the default isolation level, it’s very important to understand the Read Committed characteristics:

  • Queries only see data committed before the query began, plus the current transaction’s own uncommitted changes
  • Concurrent changes committed during a query execution won’t be visible to the current query
  • UPDATE/DELETE statements use locks to prevent concurrent modifications

If two transactions try to update the same row, the second transaction must wait for the first one to either commit or rollback, and if the first transaction has been committed, then the second transaction DML WHERE clause must be reevaluated to see if the match is still relevant.

UncontendedTransactions

In this example Bob’s UPDATE must wait for Alice’s transaction to end (commit/rollback) in order to proceed further.

Read Committed accommodates more concurrent transactions than other stricter isolation levels, but less locking leads to a higher chance of losing updates.

Lost updates

If two transactions are updating different columns of the same row, then there is no conflict. The second update blocks until the first transaction is committed and the final result reflects both update changes.

If the two transactions want to change the same columns, the second transaction will overwrite the first one, therefore losing the first transaction’s update.

So an update is lost when a user overwrites the current database state without realizing that someone else changed it between the moment of data loading and the moment the update occurs.

LostUpdateSingleRequestTransactions

In this example Bob is not aware that Alice has just changed the quantity from 7 to 6, so her UPDATE is overwritten by Bob’s change.

The typical find-modify-flush ORM strategy

Hibernate (like any other ORM tool) automatically translates entity state transitions to SQL queries. You first load an entity, change it and let the Hibernate flush mechanism synchronize all changes with the database.

public Product incrementLikes(Long id) {
	Product product = entityManager.find(Product.class, id);
	product.incrementLikes(); 
	return product;
}

public Product setProductQuantity(Long id, Long quantity) {
	Product product = entityManager.find(Product.class, id);
	product.setQuantity(quantity);
	return product;
}

As I’ve already pointed out, all UPDATE statements acquire write locks, even in Read Committed isolation. The persistence context write-behind policy aims to reduce the lock holding interval but the longer the period between the read and the write operations the more chances of getting into a lost update situation.

Hibernate includes all row columns in an UPDATE statement. This strategy can be changed to include only the dirty properties (through the @DynamicUpdate annotation) but the reference documentation warns us about its effectiveness:

Although these settings can increase performance in some cases, they can actually decrease performance in others.

So let’s see how Alice and Bob concurrently update the same Product using an ORM framework:

Alice:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Bob:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;

Bob:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (5, 10) WHERE ID = 1;

Alice:
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

Bob:
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |       10
(1 row)

Alice:
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |       10
(1 row)

Again Alice’s update is lost without Bob ever knowing he overwrote her changes. We should always prevent data integrity anomalies, so let’s see how we can overcome this phenomenon.
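
The same sequence can be replayed without a database. The following sketch uses hypothetical names (`LostUpdateSketch`, `ProductRow`, `flush`) and a plain object in place of the PRODUCT row; what it keeps from the scenario is that each flush writes every column from a snapshot taken at load time.

```java
// Hypothetical in-memory replay of the find-modify-flush lost update.
final class LostUpdateSketch {
    static final class ProductRow {
        long likes;
        long quantity;

        ProductRow(long likes, long quantity) {
            this.likes = likes;
            this.quantity = quantity;
        }

        ProductRow copy() {
            return new ProductRow(likes, quantity);
        }
    }

    // Writes every column, the way a full-row UPDATE does.
    static void flush(ProductRow database, ProductRow entity) {
        database.likes = entity.likes;
        database.quantity = entity.quantity;
    }

    static ProductRow replay() {
        ProductRow database = new ProductRow(5, 7);
        ProductRow alice = database.copy();   // Alice's find
        ProductRow bob = database.copy();     // Bob's find, same snapshot
        alice.likes++;                        // Alice increments likes to 6
        bob.quantity = 10;                    // Bob sets the quantity to 10
        flush(database, alice);               // row becomes (6, 7)
        flush(database, bob);                 // row becomes (5, 10): Alice's like is gone
        return database;
    }
}
```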

Repeatable Read

Using Repeatable Read (as well as Serializable, which offers an even stricter isolation level) can prevent lost updates across concurrent database transactions.

Alice:
store=# BEGIN;
store=# SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Bob:
store=# BEGIN;
store=# SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;

Bob:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (5, 10) WHERE ID = 1;

Alice:
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

Bob:
ERROR: could not serialize access due to concurrent update
store=# SELECT * FROM PRODUCT WHERE ID = 1;
ERROR: current transaction is aborted, commands ignored until end of transaction block

This time, Bob couldn’t overwrite Alice’s changes and his transaction was aborted. In Repeatable Read, a query will see the data snapshot as of the start of the current transaction. Changes committed by other concurrent transactions are not visible to the current transaction.

If two transactions attempt to modify the same record, the second transaction will wait for the first one to either commit or rollback. If the first transaction commits, then the second one must be aborted to prevent lost updates.

SELECT FOR UPDATE

Another solution would be to use the FOR UPDATE locking clause with the default Read Committed isolation level. This clause acquires the same write locks as UPDATE and DELETE statements.

Alice:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Bob:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

Bob:
 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 10) WHERE ID = 1;
UPDATE 1
store=# COMMIT;
COMMIT
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |       10
(1 row)

Bob couldn’t proceed with the SELECT statement because Alice has already acquired the write locks on the same row. Bob will have to wait for Alice to end her transaction and when Bob’s SELECT is unblocked he will automatically see her changes, therefore Alice’s UPDATE won’t be lost.
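
A rough Java analogy, assuming a `ReentrantLock` can stand in for the row lock taken by FOR UPDATE (the class and method names are hypothetical): each transaction locks first, then reads, modifies and writes all columns, and only releases the lock at the end. Whichever thread comes second reads the first one’s committed values, so both changes survive.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simulation of pessimistic row locking with a JDK lock.
final class PessimisticLockSketch {
    private final ReentrantLock rowLock = new ReentrantLock();
    private long likes = 5;
    private long quantity = 7;

    // Alice: lock, read the current row, write all columns, release at "commit".
    void incrementLikes() {
        rowLock.lock();
        try {
            long loadedLikes = likes;
            long loadedQuantity = quantity;
            likes = loadedLikes + 1;
            quantity = loadedQuantity;
        } finally {
            rowLock.unlock();
        }
    }

    // Bob: same protocol, so his read happens only after the lock is granted.
    void setQuantity(long newQuantity) {
        rowLock.lock();
        try {
            long loadedLikes = likes;
            likes = loadedLikes;
            quantity = newQuantity;
        } finally {
            rowLock.unlock();
        }
    }

    long[] snapshot() {
        rowLock.lock();
        try {
            return new long[] { likes, quantity };
        } finally {
            rowLock.unlock();
        }
    }

    static long[] runAliceAndBob() {
        PessimisticLockSketch row = new PessimisticLockSketch();
        Thread alice = new Thread(row::incrementLikes);
        Thread bob = new Thread(() -> row.setQuantity(10));
        alice.start();
        bob.start();
        try {
            alice.join();
            bob.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return row.snapshot();
    }
}
```

The outcome does not depend on which thread wins the lock, because neither of them works from a snapshot taken before the lock was acquired.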

Both transactions should use the FOR UPDATE locking. If the first transaction doesn’t acquire the write locks, the lost update can still happen.

Alice:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Bob:
store=# BEGIN;
store=# SELECT * FROM PRODUCT WHERE ID = 1 FOR UPDATE;

 id | likes | quantity
----+-------+----------
  1 |     5 |        7
(1 row)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 7) WHERE ID = 1;

Bob:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY) = (6, 10) WHERE ID = 1;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |       10
(1 row)

store=# COMMIT;

Alice:
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

store=# COMMIT;

Bob:
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity
----+-------+----------
  1 |     6 |        7
(1 row)

Alice’s UPDATE is blocked until Bob releases the write locks at the end of his current transaction. But Alice’s persistence context is using a stale entity snapshot, so she overwrites Bob’s changes, leading to another lost update situation.

Optimistic Locking

My favorite approach is to replace pessimistic locking with an optimistic locking mechanism. Like MVCC, optimistic locking defines a versioning concurrency control model that works without acquiring additional database write locks.

The product table will also include a version column that prevents old data snapshots from overwriting the latest data.

Alice:
store=# BEGIN;
BEGIN
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity | version
----+-------+----------+---------
  1 |     5 |        7 |       2
(1 row)

Bob:
store=# BEGIN;
BEGIN
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity | version
----+-------+----------+---------
  1 |     5 |        7 |       2
(1 row)

Alice:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY, VERSION) = (6, 7, 3) WHERE (ID, VERSION) = (1, 2);
UPDATE 1

Bob:
store=# UPDATE PRODUCT SET (LIKES, QUANTITY, VERSION) = (5, 10, 3) WHERE (ID, VERSION) = (1, 2);

Alice:
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity | version
----+-------+----------+---------
  1 |     6 |        7 |       3
(1 row)

Bob:
UPDATE 0
store=# COMMIT;
store=# SELECT * FROM PRODUCT WHERE ID = 1;

 id | likes | quantity | version
----+-------+----------+---------
  1 |     6 |        7 |       3
(1 row)

Every UPDATE takes the load-time version into the WHERE clause, assuming no one has changed this row since it was retrieved from the database. If some other transaction manages to commit a newer entity version, the UPDATE WHERE clause will no longer match any row and so the lost update is prevented.

Hibernate uses the PreparedStatement#executeUpdate result to check the number of updated rows. If no row was matched, it then throws a StaleObjectStateException (when using Hibernate API) or an OptimisticLockException (when using JPA).
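
The version check and the row-count test can be sketched in a few lines of plain Java. This is an in-memory illustration, not Hibernate code: `VersionedProductTable`, `update` and `updateOrFail` are hypothetical names, and an IllegalStateException stands in for the ORM’s own exception types.

```java
// Hypothetical in-memory table with a version column; a sketch of the check only.
final class VersionedProductTable {
    private long likes = 5;
    private long quantity = 7;
    private long version = 2;

    // Mimics UPDATE ... WHERE (ID, VERSION) = (1, expectedVersion)
    // and returns the number of updated rows.
    synchronized int update(long newLikes, long newQuantity, long expectedVersion) {
        if (version != expectedVersion) {
            return 0;   // stale snapshot: the WHERE clause matches no row
        }
        likes = newLikes;
        quantity = newQuantity;
        version = expectedVersion + 1;
        return 1;
    }

    // Translates a zero row count into an exception, as an ORM would.
    void updateOrFail(long newLikes, long newQuantity, long expectedVersion) {
        if (update(newLikes, newQuantity, expectedVersion) == 0) {
            throw new IllegalStateException("Row was updated by another transaction");
        }
    }

    synchronized long[] row() {
        return new long[] { likes, quantity, version };
    }
}
```

Replaying the scenario above, Alice’s `update(6, 7, 2)` returns 1 and bumps the version to 3, while Bob’s `update(5, 10, 2)` returns 0 and leaves the row untouched.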

As with Repeatable Read, the current transaction and the persistence context are aborted, with respect to atomicity guarantees.

Conclusion

Lost updates can happen unless you plan for preventing such situations. With the exception of optimistic locking, all the approaches above are effective only within the scope of a single database transaction, when both the SELECT and the UPDATE statements are executed in the same physical transaction.

In my next post I will explain why optimistic locking is the only viable solution when using application-level transactions, as is the case for most web applications.

Hibernate bytecode enhancement

Introduction

Now that you know the basics of Hibernate dirty checking, we can dig into enhanced dirty checking mechanisms. While the default graph-traversal algorithm might be sufficient for most use-cases, there might be times when you need an optimized dirty checking algorithm and instrumentation is much more convenient than building your own custom strategy.
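
To get a feel for what interception buys us, here is a hand-written sketch of an entity that tracks its own changes. It is only an illustration under my own assumptions: `TrackedOrderLine`, `markDirtyIfChanged` and `dirtyFields` are hypothetical names, and instrumentation would weave comparable hooks into the class at build time instead of us writing them by hand.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hand-written illustration of setter interception; all names are hypothetical.
final class TrackedOrderLine {
    private Long number;
    private String orderedBy;
    private final Set<String> dirtyFields = new LinkedHashSet<>();

    public Long getNumber() {
        return number;
    }

    public void setNumber(Long number) {
        markDirtyIfChanged("number", this.number, number);
        this.number = number;
    }

    public String getOrderedBy() {
        return orderedBy;
    }

    public void setOrderedBy(String orderedBy) {
        markDirtyIfChanged("orderedBy", this.orderedBy, orderedBy);
        this.orderedBy = orderedBy;
    }

    // Records the property name only when the value really changes.
    private void markDirtyIfChanged(String field, Object oldValue, Object newValue) {
        if (!Objects.equals(oldValue, newValue)) {
            dirtyFields.add(field);
        }
    }

    // A flush could ask the entity directly instead of comparing snapshots.
    public Set<String> dirtyFields() {
        return dirtyFields;
    }

    public void clearDirtyFields() {
        dirtyFields.clear();
    }
}
```

Writing this by hand for every property of every entity is tedious and easy to get wrong, which is the argument for letting a build-time tool do it.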

Using Ant Hibernate Tools

Traditionally, the Hibernate Tools have been focused on Ant and Eclipse. Bytecode instrumentation has been possible since Hibernate 3, but it required an Ant task to run the CGLIB or Javassist bytecode enhancement routines.

Maven supports running Ant tasks through the maven-antrun-plugin:

<build>
	<plugins>
		<plugin>
			<artifactId>maven-antrun-plugin</artifactId>
			<executions>
				<execution>
					<id>Instrument domain classes</id>
					<configuration>
						<tasks>
							<taskdef name="instrument"
									 classname="org.hibernate.tool.instrument.javassist.InstrumentTask">
								<classpath>
									<path refid="maven.dependency.classpath"/>
									<path refid="maven.plugin.classpath"/>
								</classpath>
							</taskdef>
							<instrument verbose="true">
								<fileset dir="${project.build.outputDirectory}">
									<include name="**/flushing/*.class"/>
								</fileset>
							</instrument>
						</tasks>
					</configuration>
					<phase>process-classes</phase>
					<goals>
						<goal>run</goal>
					</goals>
				</execution>
			</executions>
			<dependencies>
				<dependency>
					<groupId>org.hibernate</groupId>
					<artifactId>hibernate-core</artifactId>
					<version>${hibernate.version}</version>
				</dependency>
				<dependency>
					<groupId>org.javassist</groupId>
					<artifactId>javassist</artifactId>
					<version>${javassist.version}</version>
				</dependency>
			</dependencies>
		</plugin>
	</plugins>
</build>

So for the following entity source class:

@Entity
public class EnhancedOrderLine {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private Long number;

    private String orderedBy;

    private Date orderedOn;

    public Long getId() {
        return id;
    }

    public Long getNumber() {
        return number;
    }

    public void setNumber(Long number) {
        this.number = number;
    }

    public String getOrderedBy() {
        return orderedBy;
    }

    public void setOrderedBy(String orderedBy) {
        this.orderedBy = orderedBy;
    }

    public Date getOrderedOn() {
        return orderedOn;
    }

    public void setOrderedOn(Date orderedOn) {
        this.orderedOn = orderedOn;
    }
}

During build-time the following class is generated:

@Entity
public class EnhancedOrderLine implements FieldHandled {

  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  private Long id;
  private Long number;
  private String orderedBy;
  private Date orderedOn;
  private transient FieldHandler $JAVASSIST_READ_WRITE_HANDLER;

  public Long getId() {
    return $javassist_read_id();
  }

  public Long getNumber() {
    return $javassist_read_number();
  }

  public void setNumber(Long number) {
    $javassist_write_number(number);
  }

  public String getOrderedBy() {
    return $javassist_read_orderedBy();
  }

  public void setOrderedBy(String orderedBy) {
    $javassist_write_orderedBy(orderedBy);
  }

  public Date getOrderedOn() {
    return $javassist_read_orderedOn();
  }

  public void setOrderedOn(Date orderedOn) {
    $javassist_write_orderedOn(orderedOn);
  }

  public FieldHandler getFieldHandler() {
    return this.$JAVASSIST_READ_WRITE_HANDLER;
  }

  public void setFieldHandler(FieldHandler paramFieldHandler) {
    this.$JAVASSIST_READ_WRITE_HANDLER = paramFieldHandler;
  }

  public Long $javassist_read_id() {
    if (getFieldHandler() == null)
      return this.id;
  }

  public void $javassist_write_id(Long paramLong) {
    if (getFieldHandler() == null) {
      this.id = paramLong;
      return;
    }
    this.id = ((Long)getFieldHandler().writeObject(this, "id", this.id, paramLong));
  }

  public Long $javassist_read_number() {
    if (getFieldHandler() == null)
      return this.number;
  }

  public void $javassist_write_number(Long paramLong) {
    if (getFieldHandler() == null) {
      this.number = paramLong;
      return;
    }
    this.number = ((Long)getFieldHandler().writeObject(this, "number", this.number, paramLong));
  }

  public String $javassist_read_orderedBy() {
    if (getFieldHandler() == null)
      return this.orderedBy;
  }

  public void $javassist_write_orderedBy(String paramString) {
    if (getFieldHandler() == null) {
      this.orderedBy = paramString;
      return;
    }
    this.orderedBy = ((String)getFieldHandler().writeObject(this, "orderedBy", this.orderedBy, paramString));
  }

  public Date $javassist_read_orderedOn() {
    if (getFieldHandler() == null)
      return this.orderedOn;
  }

  public void $javassist_write_orderedOn(Date paramDate) {
    if (getFieldHandler() == null) {
      this.orderedOn = paramDate;
      return;
    }
    this.orderedOn = ((Date)getFieldHandler().writeObject(this, "orderedOn", this.orderedOn, paramDate));
  }
}

Although the org.hibernate.bytecode.instrumentation.spi.AbstractFieldInterceptor manages to intercept dirty fields, this information is never actually consulted during dirty checking.

The InstrumentTask bytecode enhancement can only tell whether an entity is dirty. It cannot indicate which properties have been modified, which makes the InstrumentTask more suitable for the “No-proxy” LAZY fetching strategy.

hibernate-enhance-maven-plugin

Hibernate 4.2.8 added support for a dedicated Maven bytecode enhancement plugin.

The Maven bytecode enhancement plugin is easy to configure:

<build>
    <plugins>
        <plugin>
            <groupId>org.hibernate.orm.tooling</groupId>
            <artifactId>hibernate-enhance-maven-plugin</artifactId>
            <executions>
                <execution>
                    <phase>compile</phase>
                    <goals>
                        <goal>enhance</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

During project build-time, the following class is generated:

@Entity
public class EnhancedOrderLine
        implements ManagedEntity, PersistentAttributeInterceptable, SelfDirtinessTracker {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    private Long number;
    private String orderedBy;
    private Date orderedOn;

    @Transient
    private transient PersistentAttributeInterceptor $$_hibernate_attributeInterceptor;

    @Transient
    private transient Set $$_hibernate_tracker;

    @Transient
    private transient CollectionTracker $$_hibernate_collectionTracker;

    @Transient
    private transient EntityEntry $$_hibernate_entityEntryHolder;

    @Transient
    private transient ManagedEntity $$_hibernate_previousManagedEntity;

    @Transient
    private transient ManagedEntity $$_hibernate_nextManagedEntity;

    public Long getId() {
        return $$_hibernate_read_id();
    }

    public Long getNumber() {
        return $$_hibernate_read_number();
    }

    public void setNumber(Long number) {
        $$_hibernate_write_number(number);
    }

    public String getOrderedBy() {
        return $$_hibernate_read_orderedBy();
    }

    public void setOrderedBy(String orderedBy) {
        $$_hibernate_write_orderedBy(orderedBy);
    }

    public Date getOrderedOn() {
        return $$_hibernate_read_orderedOn();
    }

    public void setOrderedOn(Date orderedOn) {
        $$_hibernate_write_orderedOn(orderedOn);
    }

    public PersistentAttributeInterceptor $$_hibernate_getInterceptor() {
        return this.$$_hibernate_attributeInterceptor;
    }

    public void $$_hibernate_setInterceptor(PersistentAttributeInterceptor paramPersistentAttributeInterceptor) {
        this.$$_hibernate_attributeInterceptor = paramPersistentAttributeInterceptor;
    }

    public void $$_hibernate_trackChange(String paramString) {
        if (this.$$_hibernate_tracker == null)
            this.$$_hibernate_tracker = new HashSet();
        if (!this.$$_hibernate_tracker.contains(paramString))
            this.$$_hibernate_tracker.add(paramString);
    }

    private boolean $$_hibernate_areCollectionFieldsDirty() {
        return ($$_hibernate_getInterceptor() != null) && (this.$$_hibernate_collectionTracker != null);
    }

    private void $$_hibernate_getCollectionFieldDirtyNames(Set paramSet) {
        if (this.$$_hibernate_collectionTracker == null)
            return;
    }

    public boolean $$_hibernate_hasDirtyAttributes() {
        return ((this.$$_hibernate_tracker == null) || (this.$$_hibernate_tracker.isEmpty())) && ($$_hibernate_areCollectionFieldsDirty());
    }

    private void $$_hibernate_clearDirtyCollectionNames() {
        if (this.$$_hibernate_collectionTracker == null)
            this.$$_hibernate_collectionTracker = new CollectionTracker();
    }

    public void $$_hibernate_clearDirtyAttributes() {
        if (this.$$_hibernate_tracker != null)
            this.$$_hibernate_tracker.clear();
        $$_hibernate_clearDirtyCollectionNames();
    }

    public Set<String> $$_hibernate_getDirtyAttributes() {
        if (this.$$_hibernate_tracker == null)
            this.$$_hibernate_tracker = new HashSet();
        $$_hibernate_getCollectionFieldDirtyNames(this.$$_hibernate_tracker);
        return this.$$_hibernate_tracker;
    }

    private Long $$_hibernate_read_id() {
        if ($$_hibernate_getInterceptor() != null)
            this.id = ((Long) $$_hibernate_getInterceptor().readObject(this, "id", this.id));
        return this.id;
    }

    private void $$_hibernate_write_id(Long paramLong) {
        if (($$_hibernate_getInterceptor() == null) || ((this.id == null) || (this.id.equals(paramLong))))
            break label39;
        $$_hibernate_trackChange("id");
        label39:
        Long localLong = paramLong;
        if ($$_hibernate_getInterceptor() != null)
            localLong = (Long) $$_hibernate_getInterceptor().writeObject(this, "id", this.id, paramLong);
        this.id = localLong;
    }

    private Long $$_hibernate_read_number() {
        if ($$_hibernate_getInterceptor() != null)
            this.number = ((Long) $$_hibernate_getInterceptor().readObject(this, "number", this.number));
        return this.number;
    }

    private void $$_hibernate_write_number(Long paramLong) {
        if (($$_hibernate_getInterceptor() == null) || ((this.number == null) || (this.number.equals(paramLong))))
            break label39;
        $$_hibernate_trackChange("number");
        label39:
        Long localLong = paramLong;
        if ($$_hibernate_getInterceptor() != null)
            localLong = (Long) $$_hibernate_getInterceptor().writeObject(this, "number", this.number, paramLong);
        this.number = localLong;
    }

    private String $$_hibernate_read_orderedBy() {
        if ($$_hibernate_getInterceptor() != null)
            this.orderedBy = ((String) $$_hibernate_getInterceptor().readObject(this, "orderedBy", this.orderedBy));
        return this.orderedBy;
    }

    private void $$_hibernate_write_orderedBy(String paramString) {
        if (($$_hibernate_getInterceptor() == null) || ((this.orderedBy == null) || (this.orderedBy.equals(paramString))))
            break label39;
        $$_hibernate_trackChange("orderedBy");
        label39:
        String str = paramString;
        if ($$_hibernate_getInterceptor() != null)
            str = (String) $$_hibernate_getInterceptor().writeObject(this, "orderedBy", this.orderedBy, paramString);
        this.orderedBy = str;
    }

    private Date $$_hibernate_read_orderedOn() {
        if ($$_hibernate_getInterceptor() != null)
            this.orderedOn = ((Date) $$_hibernate_getInterceptor().readObject(this, "orderedOn", this.orderedOn));
        return this.orderedOn;
    }

    private void $$_hibernate_write_orderedOn(Date paramDate) {
        if (($$_hibernate_getInterceptor() == null) || ((this.orderedOn == null) || (this.orderedOn.equals(paramDate))))
            break label39;
        $$_hibernate_trackChange("orderedOn");
        label39:
        Date localDate = paramDate;
        if ($$_hibernate_getInterceptor() != null)
            localDate = (Date) $$_hibernate_getInterceptor().writeObject(this, "orderedOn", this.orderedOn, paramDate);
        this.orderedOn = localDate;
    }

    public Object $$_hibernate_getEntityInstance() {
        return this;
    }

    public EntityEntry $$_hibernate_getEntityEntry() {
        return this.$$_hibernate_entityEntryHolder;
    }

    public void $$_hibernate_setEntityEntry(EntityEntry paramEntityEntry) {
        this.$$_hibernate_entityEntryHolder = paramEntityEntry;
    }

    public ManagedEntity $$_hibernate_getPreviousManagedEntity() {
        return this.$$_hibernate_previousManagedEntity;
    }

    public void $$_hibernate_setPreviousManagedEntity(ManagedEntity paramManagedEntity) {
        this.$$_hibernate_previousManagedEntity = paramManagedEntity;
    }

    public ManagedEntity $$_hibernate_getNextManagedEntity() {
        return this.$$_hibernate_nextManagedEntity;
    }

    public void $$_hibernate_setNextManagedEntity(ManagedEntity paramManagedEntity) {
        this.$$_hibernate_nextManagedEntity = paramManagedEntity;
    }
}

It’s easy to see that the new bytecode enhancement logic differs from the one generated by the previous InstrumentTask.

Like the custom dirty checking mechanism, the new bytecode enhancement version records which properties have changed, not just a simple dirty boolean flag. The enhancement logic marks a field as dirty the moment it changes. This approach is much more efficient than having to compare all current property values against the load-time snapshot data.
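To make the difference concrete, here is a minimal sketch of the two approaches in plain Java. The DirtyCheckComparison and TrackedOrderLine names are hypothetical and exist only for this illustration; this is not Hibernate code, just the general idea of comparing against a snapshot versus recording changes as they happen.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration class, not part of Hibernate.
final class DirtyCheckComparison {

    // Snapshot-based check: every current value is compared with its load-time copy,
    // so the cost grows with the number of properties on every flush.
    static List<Integer> dirtyBySnapshot(Object[] loadedState, Object[] currentState) {
        List<Integer> dirtyIndices = new ArrayList<>();
        for (int i = 0; i < currentState.length; i++) {
            if (!Objects.equals(loadedState[i], currentState[i])) {
                dirtyIndices.add(i);
            }
        }
        return dirtyIndices;
    }

    // Tracker-based check: the setters record the property names as they run,
    // so flush time only needs to read the set.
    static final class TrackedOrderLine {
        private final Set<String> tracker = new LinkedHashSet<>();
        private Long number;
        private String orderedBy;

        void setNumber(Long number) {
            if (!Objects.equals(this.number, number)) {
                tracker.add("number");
            }
            this.number = number;
        }

        void setOrderedBy(String orderedBy) {
            if (!Objects.equals(this.orderedBy, orderedBy)) {
                tracker.add("orderedBy");
            }
            this.orderedBy = orderedBy;
        }

        Set<String> dirtyAttributes() {
            return tracker;
        }
    }
}
```

The snapshot variant has to visit every property on each flush, while the tracker variant pays a small cost per setter call and nothing more at flush time.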

Are we there yet?

Even though the entity class bytecode is being enhanced, with Hibernate 4.3.6 there are still some missing puzzle pieces.

For instance, when calling setNumber(Long number) the following intercepting method gets executed:

private void $$_hibernate_write_number(Long paramLong) {
	if (($$_hibernate_getInterceptor() == null) || ((this.number == null) || (this.number.equals(paramLong))))
		break label39;
	$$_hibernate_trackChange("number");
	label39:
	Long localLong = paramLong;
	if ($$_hibernate_getInterceptor() != null)
		localLong = (Long) $$_hibernate_getInterceptor().writeObject(this, "number", this.number, paramLong);
	this.number = localLong;
}    

In my examples, $$_hibernate_getInterceptor() is always null, which bypasses the $$_hibernate_trackChange("number") call. Because of this, no dirty property is ever recorded, forcing Hibernate to fall back to the default deep-comparison dirty checking algorithm.
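The guard can be restated as a simplified plain-Java model. This is only a sketch of the decompiled logic shown above: WriteGuardSketch and its nested AttributeInterceptor interface are hypothetical stand-ins, not Hibernate types.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical model of the generated write method; not Hibernate code.
final class WriteGuardSketch {

    // Stand-in for the interceptor contract, reduced to the one call used here.
    interface AttributeInterceptor {
        Object writeObject(Object entity, String name, Object oldValue, Object newValue);
    }

    private AttributeInterceptor interceptor; // stays null unless one is injected
    private final Set<String> tracker = new HashSet<>();
    private Long number;

    void setInterceptor(AttributeInterceptor interceptor) {
        this.interceptor = interceptor;
    }

    void setNumber(Long newValue) {
        // Same three conditions as the decompiled guard: any of them skips the tracking call.
        boolean skipTracking = interceptor == null
                || number == null
                || number.equals(newValue);
        if (!skipTracking) {
            tracker.add("number");
        }
        Long valueToStore = newValue;
        if (interceptor != null) {
            valueToStore = (Long) interceptor.writeObject(this, "number", number, newValue);
        }
        number = valueToStore;
    }

    Set<String> dirtyAttributes() {
        return tracker;
    }
}
```

With no interceptor set, any number of setNumber calls leaves dirtyAttributes() empty, which mirrors the behaviour described above. Once a pass-through interceptor is injected, changing a non-null value does get recorded.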

So, even if Hibernate has made considerable progress in this particular area, the dirty checking enhancement still requires additional work to become readily available.

Code available on GitHub.

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

From mostly interested to most interesting

No money can buy this feeling

Being appreciated for my work is what pushes me to contribute more. I am proud to be nominated as one of the most interesting developers.

Ever since I started this blog, helping others on Stack Overflow and contributing to Open Source Software, many good things have happened.

Being mentioned on the same page as John Sonmez or Christoph Engelbert (Hazelcast) is more than flattering. Even at my most optimistic, I never thought I’d get this nomination.

Thank you Jelastic, you made my day!

For more, you can read the full article.

How to customize Hibernate dirty checking mechanism

Introduction

In my previous article I described the Hibernate automatic dirty checking mechanism. While you should always prefer it, there might be times when you want to add your own custom dirtiness detection strategy.

Custom dirty checking strategies

Hibernate offers the following customization mechanisms:

  • the Hibernate Interceptor findDirty callback
  • the CustomEntityDirtinessStrategy

A manual dirty checking exercise

As an exercise, I’ll build a manual dirty checking mechanism to illustrate how easily you can customize the change detection strategy.

Self dirty checking entity

First, I’ll define a DirtyAware interface all manual dirty checking entities will have to implement:

public interface DirtyAware {

    Set<String> getDirtyProperties();

    void clearDirtyProperties();
}

Next, I am going to encapsulate our current dirty checking logic in a base class:

public abstract class SelfDirtyCheckingEntity implements DirtyAware {

    private final Map<String, String> setterToPropertyMap = new HashMap<String, String>();

    @Transient
    private Set<String> dirtyProperties = new LinkedHashSet<String>();

    public SelfDirtyCheckingEntity() {
        try {
            BeanInfo beanInfo = Introspector.getBeanInfo(getClass());
            PropertyDescriptor[] descriptors = beanInfo.getPropertyDescriptors();
            for (PropertyDescriptor descriptor : descriptors) {
                Method setter = descriptor.getWriteMethod();
                if (setter != null) {
                    setterToPropertyMap.put(setter.getName(), descriptor.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }

    }

    @Override
    public Set<String> getDirtyProperties() {
        return dirtyProperties;
    }

    @Override
    public void clearDirtyProperties() {
        dirtyProperties.clear();
    }

    protected void markDirtyProperty() {
        String methodName = Thread.currentThread().getStackTrace()[2].getMethodName();
        dirtyProperties.add(setterToPropertyMap.get(methodName));
    }
}

All manual dirty checking entities will have to extend this base class and explicitly flag the dirty properties through a call to the markDirtyProperty method.
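The markDirtyProperty method relies on the position of the setter in the call stack. The sketch below isolates that trick and places it next to a more explicit alternative. CallerNameSketch and its methods are hypothetical names used only for illustration, and the frame positions are an assumption about how HotSpot reports stack traces.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration; the class and method names are not from the article's code.
final class CallerNameSketch {

    private final Set<String> recordedNames = new LinkedHashSet<>();
    private Long number;
    private String orderedBy;

    // Stack-walking variant: on HotSpot, frame 0 is Thread.getStackTrace,
    // frame 1 is this method, and frame 2 is the method that called it.
    private String callerMethodName() {
        return Thread.currentThread().getStackTrace()[2].getMethodName();
    }

    void setNumber(Long number) {
        this.number = number;
        recordedNames.add(callerMethodName()); // records the setter name, "setNumber"
    }

    // Explicit variant: no stack inspection, at the cost of repeating the property name.
    void setOrderedBy(String orderedBy) {
        this.orderedBy = orderedBy;
        recordedNames.add("orderedBy");
    }

    Set<String> recorded() {
        return recordedNames;
    }
}
```

The stack-walking variant keeps the setters free of string literals, but it breaks as soon as the call is routed through another helper method. Passing the property name explicitly is more verbose, yet it does not depend on the call depth.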

The actual self dirty checking entity looks like this:

@Entity
@Table(name = "ORDER_LINE")
public class OrderLine extends SelfDirtyCheckingEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private Long number;

    private String orderedBy;

    private Date orderedOn;

    public Long getId() {
        return id;
    }

    public Long getNumber() {
        return number;
    }

    public void setNumber(Long number) {
        this.number = number;
        markDirtyProperty();
    }

    public String getOrderedBy() {
        return orderedBy;
    }

    public void setOrderedBy(String orderedBy) {
        this.orderedBy = orderedBy;
        markDirtyProperty();
    }

    public Date getOrderedOn() {
        return orderedOn;
    }

    public void setOrderedOn(Date orderedOn) {
        this.orderedOn = orderedOn;
        markDirtyProperty();
    }
}

Whenever a setter gets called, the associated property becomes dirty. For simplicity’s sake, this exercise doesn’t cover the use case when we revert a property to its original value.
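For completeness, here is one possible way to handle the revert case. This is a minimal sketch rather than part of the exercise: RevertAwareTracker is a hypothetical helper, and it assumes every setter passes in the property name together with the current and the new value.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper, not part of the article's code base.
final class RevertAwareTracker {

    private final Map<String, Object> originalValues = new HashMap<>();
    private final Set<String> dirtyProperties = new LinkedHashSet<>();

    // Meant to be called from a setter, before the field is assigned.
    void recordChange(String property, Object currentValue, Object newValue) {
        if (!originalValues.containsKey(property)) {
            // First write since the last clear(): remember the baseline value.
            originalValues.put(property, currentValue);
        }
        if (Objects.equals(originalValues.get(property), newValue)) {
            // The property is back at its baseline, so it is no longer dirty.
            dirtyProperties.remove(property);
        } else {
            dirtyProperties.add(property);
        }
    }

    Set<String> getDirtyProperties() {
        return dirtyProperties;
    }

    // Meant to be called after a flush: the current values become the new baseline.
    void clear() {
        originalValues.clear();
        dirtyProperties.clear();
    }
}
```

The price is an extra map of baseline values per entity, which is a small version of the snapshot that the default mechanism keeps anyway.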

The dirty checking test

To test the self dirty checking mechanism, I’m going to run the following test case:

@Test
public void testDirtyChecking() {
    doInTransaction(new TransactionCallable<Void>() {
        @Override
        public Void execute(Session session) {
            OrderLine orderLine = new OrderLine();
            session.persist(orderLine);
            session.flush();
            orderLine.setNumber(123L);
            orderLine.setOrderedBy("Vlad");
            orderLine.setOrderedOn(new Date());
            session.flush();
            orderLine.setOrderedBy("Alex");
            return null;
        }
    });
}

The Hibernate Interceptor solution

The Hibernate Interceptor findDirty callback allows us to control the dirty properties discovery process. This method may return:

  • null, to delegate the dirty checking to Hibernate default strategy
  • an int[] array, containing the modified properties indices

Our Hibernate dirty checking interceptor looks like this:

public class DirtyCheckingInterceptor extends EmptyInterceptor {

    @Override
    public int[] findDirty(Object entity, Serializable id, Object[] currentState, Object[] previousState, String[] propertyNames, Type[] types) {
        if (entity instanceof DirtyAware) {
            DirtyAware dirtyAware = (DirtyAware) entity;
            Set<String> dirtyProperties = dirtyAware.getDirtyProperties();
            int[] dirtyPropertiesIndices = new int[dirtyProperties.size()];
            List<String> propertyNamesList = Arrays.asList(propertyNames);
            int i = 0;
            // Translate each dirty property name into its index in propertyNames
            for (String dirtyProperty : dirtyProperties) {
                LOGGER.info("The {} property is dirty", dirtyProperty);
                dirtyPropertiesIndices[i++] = propertyNamesList.indexOf(dirtyProperty);
            }
            // Reset the entity so the next flush starts from a clean state
            dirtyAware.clearDirtyProperties();
            return dirtyPropertiesIndices;
        }
        // Not a DirtyAware entity: delegate to the default behaviour
        return super.findDirty(entity, id, currentState, previousState, propertyNames, types);
    }
}

When passing this interceptor to our current SessionFactory configuration we get the following output:

INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The number property is dirty
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedBy property is dirty
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedOn property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:1 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Vlad,2014-08-20 07:35:05.649,1]} 
INFO  [main]: c.v.h.m.l.f.InterceptorDirtyCheckingTest - The orderedBy property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Alex,2014-08-20 07:35:05.649,1]}

The manual dirty checking mechanism has detected incoming changes and propagated them to the flushing event listener.
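One detail worth noting in the interceptor is that List.indexOf returns -1 for a name that is not present in propertyNames. The following sketch shows a defensive variant of the name-to-index translation. DirtyIndexSketch is a hypothetical helper, and silently skipping unknown names is an assumption made for illustration, since other policies (such as failing fast) are equally valid.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hibernate or the article's code.
final class DirtyIndexSketch {

    // Translates dirty property names into their positions in propertyNames,
    // dropping any name that is not present (indexOf returns -1 for those).
    static int[] toIndices(Set<String> dirtyProperties, String[] propertyNames) {
        List<String> names = Arrays.asList(propertyNames);
        return dirtyProperties.stream()
                .mapToInt(names::indexOf)
                .filter(index -> index >= 0)
                .toArray();
    }
}
```

The resulting array keeps the iteration order of the dirty property set, which is why a LinkedHashSet is a convenient choice for it.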

The lesser-known CustomEntityDirtinessStrategy

The CustomEntityDirtinessStrategy is a recent Hibernate API addition, allowing us to provide an application-specific dirty checking mechanism. This interface can be implemented as follows:

    public static class EntityDirtinessStrategy implements CustomEntityDirtinessStrategy {

        @Override
        public boolean canDirtyCheck(Object entity, EntityPersister persister, Session session) {
            return entity instanceof DirtyAware;
        }

        @Override
        public boolean isDirty(Object entity, EntityPersister persister, Session session) {
            return !cast(entity).getDirtyProperties().isEmpty();
        }

        @Override
        public void resetDirty(Object entity, EntityPersister persister, Session session) {
            cast(entity).clearDirtyProperties();
        }

        @Override
        public void findDirty(Object entity, EntityPersister persister, Session session, DirtyCheckContext dirtyCheckContext) {
            final DirtyAware dirtyAware = cast(entity);
            dirtyCheckContext.doDirtyChecking(
                    new AttributeChecker() {
                        @Override
                        public boolean isDirty(AttributeInformation attributeInformation) {
                            String propertyName = attributeInformation.getName();
                            boolean dirty = dirtyAware.getDirtyProperties().contains( propertyName );
                            if (dirty) {
                                LOGGER.info("The {} property is dirty", propertyName);
                            }
                            return dirty;
                        }
                    }
            );
        }

        private DirtyAware cast(Object entity) {
            return DirtyAware.class.cast(entity);
        }
    }

To register the CustomEntityDirtinessStrategy implementation we have to set the following Hibernate property:

properties.setProperty("hibernate.entity_dirtiness_strategy", EntityDirtinessStrategy.class.getName());

Running our test yields the following output:

INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The number property is dirty
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedBy property is dirty
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedOn property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:1 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Vlad,2014-08-20 12:51:30.068,1]} 
INFO  [main]: c.v.h.m.l.f.CustomEntityDirtinessStrategyTest - The orderedBy property is dirty
DEBUG [main]: o.h.e.i.AbstractFlushingEventListener - Flushed: 0 insertions, 1 updates, 0 deletions to 1 objects
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[update ORDER_LINE set number=?, orderedBy=?, orderedOn=? where id=?][123,Alex,2014-08-20 12:51:30.068,1]} 

Conclusion

Although the default field-level checking and the bytecode instrumentation alternative are sufficient for most applications, there might be times when you want to gain control over the change detection process. On a long-term project, it’s not uncommon to customize certain built-in mechanisms to satisfy exceptional quality-of-service requirements. A framework adoption decision should also consider the framework’s extensibility and customization support.

Code available on GitHub.
