How to get a 10,000-point StackOverflow reputation

How it all started

In spring 2014, I initiated the Hibernate Master Class project, focusing on best practices and well-established usage patterns. I then realized that all my previous Hibernate experience wouldn’t be enough for this task. I needed more than that.

Hibernate has a very steep learning curve, and dozens of new StackOverflow questions are asked on a daily basis. With so many problems waiting to be solved, I came to realize this was a great opportunity to prove my skills while learning some new tricks.

On the 8th of May 2014, I gave my very first StackOverflow answer. After 253 days, on the 16th of January 2015, I managed to reach a reputation of over 10,000:

[Figure: StackOverflow 10,000 reputation milestone]

StackOverflow facts

StackExchange offers a data query tool for analyzing anything you can possibly think of. Next, I'm going to run some queries against my own account and those of four renowned users:

User                 Reputation   Answers
Jon Skeet            743,416      30,812
Peter Lawrey         251,229      10,663
Tomasz Nurkiewicz    152,139      2,964
Lukas Eder           55,208       1,077
Vlad Mihalcea        10,018       581

Accepted answers reputation

The accepted answer ratio tells us how much you can count on the OP (the original poster) to accept your answers:

User                 Average acceptance ratio   Average acceptance reputation [ratio x 15]
Jon Skeet            60.42%                     9.06
Peter Lawrey         28.90%                     4.35
Tomasz Nurkiewicz    53.91%                     8.08
Lukas Eder           46.69%                     7.00
Vlad Mihalcea        37.36%                     5.60

The chance of having your answer accepted rarely surpasses the 60% rate, so don't count too much on this one. Some OPs will never accept your answer, even if it's the right one and it has already generated a high score.

Lesson 1: Don’t get upset if your answer was not accepted, and think of your answer as a contribution to our community rather than a gift to the question author.

Up-votes reputation

Another interesting metric is the answer score graph:

[Figure: answer score graph]

The average answer score is a good indicator of your overall answer effectiveness, as viewed by the whole community:

User                 Average score   Average score reputation [score x 10]
Jon Skeet            8.16            81.6
Peter Lawrey         2.50            25.0
Tomasz Nurkiewicz    4.67            46.7
Lukas Eder           4.25            42.5
Vlad Mihalcea        0.75            7.5

While answer acceptance is a one-time event, up-voting can be a recurring action. A good answer can keep increasing your reputation long after you've posted your solution.

Lesson 2: Always strive for high-quality answers. Even if they don't get accepted, someone else might find them later and thank you with an up-vote.

Bounty hunting reputation

I've been a bounty hunter from the very beginning, and the bounty contribution query shows why I happen to favor featured questions over regular ones:

User                 Bounty count   Total bounty reputation   Average bounty reputation
Jon Skeet            67             8,025                     119
Tomasz Nurkiewicz    2              100                       50
Peter Lawrey         4              225                       56
Lukas Eder           2              550                       275
Vlad Mihalcea        36             2,275                     63

To place a bounty, you have to sacrifice some of your own reputation, so the question is naturally both challenging and rewarding. Featured questions have a dedicated tab, so they get much more traction than regular ones, which increases the up-vote chance as well.

Lesson 3: Always favor bounty questions over regular ones.

Reputation is a means not a goal

Reputation alone is just a community contribution indicator; you should probably care more about tag badges instead. Tag badges prove one's expertise in a certain technology, and they are the fairest endorsement system currently available in the software industry.

If you want to become an expert in a particular area, I strongly recommend trying to get a gold badge on that topic. The effort of earning 1,000 up-votes will get you more than a virtual medal on your StackOverflow account. You will improve your problem-solving skills and make a name for yourself in the software community.

As I said before:

When you answer a question you are reiterating your knowledge. Sometimes you only have a clue, so you start investigating that path, which not only provides you the right answer but it also allows you to strengthen your skills. It’s like constant rehearsing.

Conclusion

If you cannot imagine developing software without the helping hand of the StackOverflow knowledge base, then you should definitely start contributing.

In the end, the occasional "Thank you, it works now!" is much more rewarding than even a 10,000-point reputation.


A beginner’s guide to Java Persistence locking

Implicit locking

In concurrency theory, locking is used for protecting mutable shared data against hazardous data integrity anomalies. Because lock management is a very complex problem, most applications rely on their database provider's implicit locking techniques.

Delegating the whole locking responsibility to the database system can both simplify application development and prevent concurrency issues, such as deadlocking. Deadlocks can still occur, but the database can detect them and take safety measures (arbitrarily releasing one of the two competing locks).

Physical locks

Most database systems use shared (read) and exclusive (write) locks, attributed to specific locking elements (rows, tables). While physical locking is mandated by the SQL standard, the pessimistic approach might hinder scalability.

Modern databases have implemented lightweight locking techniques, such as multiversion concurrency control.

The implicit database locking is hidden behind the transaction isolation level configuration. Each isolation level comes with a predefined locking scheme, aimed at preventing a certain set of data integrity anomalies.

READ COMMITTED uses query-level shared locks and exclusive locks for the current transaction modified data. REPEATABLE READ and SERIALIZABLE use transaction-level shared locks when reading and exclusive locks when writing.

Logical locks

While database locking is sufficient for batch processing systems, a multi-request web flow spans multiple database transactions. For long conversations, a logical (optimistic) locking mechanism is much more appropriate.

Paired with a conversation-level repeatable read storage, optimistic locking can ensure data integrity without sacrificing scalability.

JPA supports both optimistic locking and persistence context repeatable reads, making it ideal for implementing logical transactions.

Explicit locking

While implicit locking is probably the best choice for most applications' concurrency control requirements, there might be times when you want a finer-grained locking strategy.

Most database systems support query-time exclusive locking directives, such as SELECT FOR UPDATE or SELECT FOR SHARE. We can therefore use a lower default isolation level (READ COMMITTED), while requesting shared or exclusive locks for specific transaction scenarios.
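For instance, JPA allows us to request a pessimistic lock on a per-operation basis. A minimal sketch (the Product entity and identifier are illustrative):

Product product = entityManager.find(
    Product.class, productId,
    LockModeType.PESSIMISTIC_WRITE);

On most databases, this find() call translates to a SELECT ... FOR UPDATE statement for the associated row.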

Most optimistic locking implementations verify modified data only, but JPA allows explicit optimistic locking as well.

JPA locking

As a database abstraction layer, JPA can benefit from the implicit locking mechanisms offered by the underlying RDBMS. For logical locking, JPA offers an optional automated entity version control mechanism as well.

JPA supports explicit locking for the following operations:

  • finding an entity (e.g. EntityManager.find(entityClass, id, lockModeType))
  • locking an already managed entity (e.g. EntityManager.lock(entity, lockModeType))
  • refreshing an entity (e.g. EntityManager.refresh(entity, lockModeType))
  • running a query (e.g. Query.setLockMode(lockModeType))

Explicit lock types

The LockModeType contains the following optimistic and pessimistic locking modes:

  • NONE: in the absence of explicit locking, the application will use implicit locking (optimistic or pessimistic).
  • OPTIMISTIC: always issues a version check upon transaction commit, therefore ensuring optimistic locking repeatable reads.
  • READ: same as OPTIMISTIC.
  • OPTIMISTIC_FORCE_INCREMENT: always increases the entity version (even when the entity doesn't change) and issues a version check upon transaction commit, therefore ensuring optimistic locking repeatable reads.
  • WRITE: same as OPTIMISTIC_FORCE_INCREMENT.
  • PESSIMISTIC_READ: a shared lock is acquired to prevent any other transaction from acquiring a PESSIMISTIC_WRITE lock.
  • PESSIMISTIC_WRITE: an exclusive lock is acquired to prevent any other transaction from acquiring a PESSIMISTIC_READ or a PESSIMISTIC_WRITE lock.
  • PESSIMISTIC_FORCE_INCREMENT: a database lock is acquired to prevent any other transaction from acquiring a PESSIMISTIC_READ or a PESSIMISTIC_WRITE lock, and the entity version is incremented upon transaction commit.
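As an illustration, the force-increment optimistic mode can be requested through the EntityManager API. A minimal sketch (the Product entity is illustrative, and it's assumed to have a @Version attribute):

Product product = entityManager.find(Product.class, productId);
// bumps the entity version upon commit, even if the entity is left unchanged
entityManager.lock(product, LockModeType.OPTIMISTIC_FORCE_INCREMENT);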

Lock scope and timeouts

JPA 2.0 defined the javax.persistence.lock.scope property, taking one of the following values:

  • NORMAL

    Because object graphs can span multiple tables, an explicit locking request might propagate to more than one table (e.g. joined inheritance, secondary tables).

    Because all of the entity's associated row(s) are locked, many-to-one and one-to-one foreign keys will be locked as well, but without locking the other-side parent associations. This scope doesn't propagate to children collections.

  • EXTENDED

    The explicit lock is propagated to element collections and junction tables, but it doesn’t lock the actual children entities. The lock is only useful for protecting against removing existing children, while permitting phantom reads or changes to the actual children entity states.

JPA 2.0 also introduced the javax.persistence.lock.timeout property, allowing us to configure the amount of time (milliseconds) a lock request will wait before throwing a PessimisticLockException.
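Both properties can be supplied as per-operation hints. A minimal sketch (the entity and the values are illustrative):

Map<String, Object> hints = new HashMap<String, Object>();
// wait at most 2 seconds for the row-level lock
hints.put("javax.persistence.lock.timeout", 2000);
// propagate the lock according to the EXTENDED scope
hints.put("javax.persistence.lock.scope", PessimisticLockScope.EXTENDED);

Product product = entityManager.find(
    Product.class, productId,
    LockModeType.PESSIMISTIC_WRITE, hints);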

Hibernate locking

Hibernate supports all the JPA locking modes, as well as some additional vendor-specific locking options. As with JPA, explicit locking can be configured for the following operations:

  • retrieving an entity (e.g. Session.get(entityClass, id, lockOptions))
  • locking an already managed entity (e.g. Session.buildLockRequest(lockOptions).lock(entity))
  • running a query (e.g. Query.setLockOptions(lockOptions))

The LockModeConverter takes care of mapping JPA and Hibernate lock modes as follows:

Hibernate LockMode                                               JPA LockModeType
NONE                                                             NONE
OPTIMISTIC, READ                                                 OPTIMISTIC
OPTIMISTIC_FORCE_INCREMENT, WRITE                                OPTIMISTIC_FORCE_INCREMENT
PESSIMISTIC_READ                                                 PESSIMISTIC_READ
PESSIMISTIC_WRITE, UPGRADE, UPGRADE_NOWAIT, UPGRADE_SKIPLOCKED   PESSIMISTIC_WRITE
PESSIMISTIC_FORCE_INCREMENT, FORCE                               PESSIMISTIC_FORCE_INCREMENT

The UPGRADE and FORCE lock modes are deprecated in favor of PESSIMISTIC_WRITE.

UPGRADE_NOWAIT and UPGRADE_SKIPLOCKED use an Oracle-style SELECT ... FOR UPDATE NOWAIT or SELECT ... FOR UPDATE SKIP LOCKED syntax, respectively.

Lock scope and timeouts

Hibernate also defines scope and timeout locking options (see the sketch after the list):

  • scope

    The lock scope allows the explicit lock to cascade to owned associations.

  • timeout

    A timeout interval may prevent a locking request from waiting indefinitely.
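Both options can be set through the Hibernate-specific LockOptions class. A minimal sketch (the entity and the timeout value are illustrative):

Product product = (Product) session.get(
    Product.class, productId,
    new LockOptions(LockMode.PESSIMISTIC_WRITE)
        .setTimeOut(2000)  // wait at most 2 seconds
        .setScope(true));  // cascade the lock to owned associations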

In my next articles, I am going to unravel different explicit locking design patterns, so stay tuned!


Why you should pay developers to learn

A true story

We were having a meeting with a customer who had just presented a project idea. He wanted us to give him a draft system architecture supporting his project's technical requirements. At one point, I told him that incremental development requires architecture evolution as well.

When I said that finding the right architecture is also a learning process, he cut me short with the following sentence:

Do you expect me to pay you to learn?

To save the day, I told him I was referring to the business domain, which we needed to fully understand in order to provide the right architecture.

Do you want your project to be developed by an unskilled team?

Unless you hire a highly expensive consultant, chances are you need a software development team for more than a few months. If the project spans over a year or more, how would you feel about a team that never has time to level up?

Software development is one of the most knowledge-driven industries, yet many expect developers to be readily equipped with everything it takes to solve any given problem.

Languages keep on evolving. Relational databases keep on adding new features. There's a plethora of NoSQL databases most of us have never worked with. Successful frameworks keep on releasing new versions. New techniques emerge (e.g. reactive programming or microservices), while others keep on getting more and more traction (e.g. functional programming).

To master all these technologies and techniques you need to spend a considerable amount of time.

When are developers supposed to level up?

There are extremely passionate developers dedicating their spare time to reading books or technical articles or studying new technologies, but they are the exception to the rule.

Most developers acquire most of their knowledge on the job, and if you don't invest in their skills, they will never grow within your team.

The right place and time to learn software is during your job.

Unfortunately, not everybody in the software industry shares this vision of mine. Business owners don’t want to spend resources (time and money) on training developers.

I really believe it’s a matter of perspective. If you don’t manage to get any direct or indirect revenue, you might be tempted to think you’re wasting money. But if you plan it properly, you can easily turn it into a very profitable investment.

Learn for profit

High quality software demands solid knowledge and expertise, but accumulating skills requires more than just reading. You need to become an active learner to maximize knowledge acquisition.

I used to be a passive learner myself, only reading books and articles while constantly having the impression that I wasn't actually making much progress.

When I started writing this blog, I realized I was now learning through teaching.

When I became an active StackOverflow user, this feeling was reinforced.

When I started an open-source project, I finally realized that learning is only a side effect of hard work.

All these examples are what active learning is all about.

From the business perspective, it’s not difficult to foresee where the return of investment might come from:

  • A more skilled development team can take on more complex projects with a lower risk of failure.
  • You can master a certain technology and start offering professional training and consultancy services.
  • You can write books and sell them through a self-publishing program. Ninja Squad's AngularJS book (French) was a profitable investment after all.

All in all, expertise always sells.

Investing in development skills can definitely pay off. Many developers enjoy a working environment where they can grow, so this move can actually be beneficial for employee retention as well.

Starting on this journey is not as difficult as one might think, and I'm going to present some of my favorite active learning activities:

Preparing a training material

Let's say you want to acquire a certain key technology skill in your company. Some developers should be partially allocated to studying and preparing training material on that subject.

A workshop is always a better choice than a simple presentation. When the training material is ready, you have accumulated both knowledge and a training base. You can then start offering training or consultancy services on that particular technology.

A company blog

Every software company accumulates experience, yet few of them actually share it with the rest. A company technical blog can be a great marketing instrument. A high quality blog can prove your domain knowledge and expertise.

You can build strategic partnerships with DZone or JavaCodeGeeks and therefore help promote your business as well.

Answering StackOverflow questions

Contributing to StackOverflow is totally underrated. If you really want to become an expert in a certain domain, you should start answering questions on that particular tag. When you answer a question, you are reiterating your knowledge.

Sometimes you only have a clue, so you start investigating that path, which not only provides you the right answer but it also allows you to strengthen your skills. It’s like constant rehearsing.

After all, repetition is the mother of learning.

Contributing to open source projects

If you want to boost your design and coding skills, you should probably start contributing to open source projects. Browsing code can unveil certain patterns you’ve never previously applied.

Most framework authors are incredible craftsmen, and reviewing their code can teach you a lot about programming best practices. If your company makes heavy use of a certain open source technology, it's a great idea to start contributing back. The best way to deal with an annoying framework issue is to actually fix it.

Nobody knows a framework better than its own maintainers.

Writing and selling books

You can summarize all your experience in a book. Writing a book is a very intense learning process as well. By the time you are done with it, you can really say you've come to master the subject.

Amazon offers self-publishing programs, and selling books can become an alternative revenue source as well as an advertisement.

Conclusion

Embracing learning can be a competitive advantage for your company. Your products carry your company name, and a software product's quality mirrors the development team's professionalism.

In the end, you are not only investing in individuals, you are investing in your own company as well.


Book review – How Linux Works 2nd edition

O’Reilly Reader Review Program

I heard about the O'Reilly Reader Review Program after reading Thorben Janssen's review of Java Performance: The Definitive Guide. After being admitted, I got a copy of How Linux Works, 2nd Edition. The book is great for anybody working in this field, and you should totally read it. Next, I'm going to tell you why.

The book

The book's author is Brian Ward, who has a Ph.D. in computer science and has written several books about the Linux kernel, Vim, and VMware. The book has 17 chapters and covers many Linux aspects, from the operating system architecture to Bash scripting and package managers.

Chapter 1

The first chapter is a very nice introduction to Linux architecture. You are going to learn about Linux abstraction layers and the clear difference between the kernel and the user space.

Chapter 2

The second chapter helps Linux beginners get acquainted with some basic, yet extremely useful, Linux commands, utilities, shell pipes, and filters. One very important aspect of Linux is the directory hierarchy, which you definitely have to know if you don't want to get lost.

Chapter 3

The third chapter is dedicated to Linux devices. You'll learn about the standard file-based device interface and the very useful dd command. The chapter covers all device types in detail, from hard disks to USB devices and terminals.

Chapter 4

The fourth chapter talks about disk partitions and various Linux file systems. You'll learn how to mount a device and how to partition it for both data and swap. The inode concept is very well explained too.

Chapter 5 and 6

The fifth and sixth chapters are more advanced, so they require more time to understand what happens during the kernel boot process and user space initialization.

Chapter 7

This chapter is dedicated to system configuration. You'll learn about the contents of the /etc folder, as well as user management and cron tasks. It's very useful for Linux beginners, since you'll interact with these on a regular basis.

Chapter 8

This chapter is one of the most important ones, since it covers everything you need to know about Linux processes. You will learn to use ps and lsof for both process and thread monitoring. From CPU to memory, you will learn that Linux offers a great variety of resource monitoring tools. Unless you are a .NET developer, there's a great chance your applications get deployed on a Linux server, so skipping this chapter is not an option.

Chapter 9

This chapter is an introduction to networking, and you can skip it if you already know the networking basics. You can also learn about Linux routing, but unless you are a system administrator, you are not going to need this in your daily job.

Chapter 10

While the previous chapter was a more theoretical one, the tenth chapter is one you don't want to miss. You are going to learn about network monitoring, using lsof, tcpdump, and port scanning. The network security section is also a good read for every programmer, as is the socket section. Unix domain sockets and Inter-Process Communication (IPC) are very important aspects for every developer working on Linux.

Chapter 11 and 12

The eleventh chapter is dedicated to shell scripting, and automating recurring tasks is not only a system administrator's job. Learning a little shell scripting can save you a lot of time and prevent accidental mistakes, so make sure you don't skip it.

The twelfth chapter talks about network file access, and the rsync section is very important, since there's a great chance you'll have to use it sooner or later.

Chapter 13

In this chapter you are going to learn about user environment configurations for both login and remote sessions.

Chapter 14

The fourteenth chapter is dedicated to desktop environments, emphasizing the importance of the X server and client utilities. You'll also learn how to use window-based applications on a remote Linux server through X11 forwarding from within an SSH session.

Chapter 15 and 16

These chapters give you an introduction to C programming, from a Linux administration perspective. You will learn how to build a Linux package even without a package manager.

Chapter 17

The last chapter wraps everything up and reiterates the importance of Linux for both servers and embedded devices. Linux might not be prevalent in other domains of activity, but as a developer, you have no excuse not to learn to use it.

Conclusion

I certainly recommend this book for every developer wanting to learn something more about Linux.

To master the command line, I also recommend The Linux Command Line by William Shotts.


2014 – A year in review

Retrospective

January

In the beginning of 2014, I took the initial version of my time series MongoDB aggregation example and passed it through a multistage optimization process, from indexing to advanced data modelling:

February

In February, I started developing FlexyPool, the ultimate connection pool sizing utility. This was a great opportunity to dig into Queuing Theory, and the following articles capture some of my findings:

May

After finishing FlexyPool, I decided to invest in a data knowledge stack, so I started working on my Hibernate Master Class training material.

The Hibernate Master Class allowed me to dig into a great variety of JPA/Hibernate features, some of which are lesser known:

Around the same time, I started answering Hibernate StackOverflow questions, and I accumulated a reputation of 8,918 points.

August

In August, I was elected One of August’s Most Interesting Developers.

If you wonder what happened with my open-source Java Transactions Book, you can take a look at the Concurrency Control section of the Hibernate Master Class:

I decided to include my knowledge about transactions in the Master Class material, since you can't separate transactions from the run-time environment anyway.

September

In September, my blog turned one.

December

Although I didn't win the Most Interesting Developer competition, I'm proud I managed to finish in 3rd place.

2014 most viewed articles

WordPress has created a wonderful 2014 statistics report, but I promised I'd publish my top 5 posts, so here they are:

Name                                                          Views
Hibernate Identity, Sequence and Table (Sequence) generator   5,650
Time to break free from the SQL-92 mindset                    4,725
MongoDB and the fine art of data modelling                    4,251
The anatomy of Connection Pooling                             3,347
MongoDB 2.6 is $out                                           3,297

Plans for 2015

I plan on finishing the Hibernate Master Class training and further completing the data knowledge stack with other database access-related technologies.

I want to get a Hibernate and a JPA gold badge on StackOverflow.

I want to read more books than I did in 2014.


A beginner’s guide to transaction isolation levels in enterprise Java

Introduction

A relational database's strong consistency model is based on the ACID transaction properties. In this post, we are going to unravel the reasons behind using different transaction isolation levels and various configuration patterns for both resource local and JTA transactions.

Isolation and consistency

In a relational database system, atomicity and durability are strict properties, while consistency and isolation are more or less configurable. We cannot even separate consistency from isolation as these two properties are always related.

The lower the isolation level, the less consistent the system will get. From the least to the most consistent, there are four isolation levels:

  • READ UNCOMMITTED
  • READ COMMITTED (protecting against dirty reads)
  • REPEATABLE READ (protecting against dirty and non-repeatable reads)
  • SERIALIZABLE (protecting against dirty, non-repeatable reads and phantom reads)

Although the most consistent SERIALIZABLE isolation level would be the safest choice, most databases default to READ COMMITTED instead. According to Amdahl’s law, to accommodate more concurrent transactions, we have to reduce the serial fraction of our data processing. The shorter the lock acquisition interval, the more requests a database can process.

Isolation levels

As we previously demonstrated, application-level repeatable reads paired with an optimistic locking mechanism are very convenient for preventing lost updates in long conversations.

In a highly concurrent environment, optimistic locking might lead to a high transaction failure rate. Pessimistic locking, like any other queuing mechanism, might accommodate more transactions when given a sufficient lock acquisition time interval.

Database and isolation levels

Apart from MySQL (which uses REPEATABLE_READ), the default isolation level of most relational database systems is READ_COMMITTED. All databases allow you to set the default transaction isolation level.

Typically, the database is shared among multiple applications and each one has its own specific transaction requirements. For most transactions the READ_COMMITTED isolation level is the best choice and we should only override it for specific business cases.

This strategy proves to be very efficient, allowing us to have stricter isolation levels for just a subset of all SQL transactions.

DataSource isolation level

The JDBC Connection object allows us to set the isolation level for all transactions issued on that specific connection. Establishing a new database connection is a resource consuming process, so most applications use a connection pooling DataSource. The connection pooling DataSource can also set the default transaction isolation level:
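As an illustration, most connection pooling frameworks expose such a setting. Assuming Apache Commons DBCP, the configuration might look like this (a minimal sketch, with illustrative connection settings):

BasicDataSource dataSource = new BasicDataSource();
dataSource.setDriverClassName("org.hsqldb.jdbcDriver");
dataSource.setUrl("jdbc:hsqldb:mem:test");
dataSource.setUsername("sa");
dataSource.setPassword("");
// every pooled connection defaults to SERIALIZABLE
dataSource.setDefaultTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);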

Compared to the global database isolation level setting, the DataSource level transaction isolation configurations are more convenient. Each application may set its own specific concurrency control level.

We can even define multiple DataSources, each one with a predefined isolation level. This way, we can dynamically choose a JDBC Connection with a specific isolation level.

Hibernate isolation level

Because it has to support both resource local and JTA transactions, Hibernate offers a very flexible connection provider mechanism.

JTA transactions require an XAConnection, and it's the JTA transaction manager's responsibility to provide XA-compliant connections.

Resource local transactions can use a resource local DataSource and for this scenario, Hibernate offers multiple connection provider options:

  • Driver Manager Connection Provider (doesn’t pool connections and therefore it’s only meant for simple testing scenarios)
  • C3P0 Connection Provider (delegating connection acquiring calls to an internal C3P0 connection pooling DataSource)
  • DataSource Connection Provider (delegating connection acquiring calls to an external DataSource)

Hibernate offers a transaction isolation level configuration called hibernate.connection.isolation, so we are going to check how all the aforementioned connection providers behave when being given this particular setting.

For this we are going to:

  1. Create a SessionFactory:

    @Override
    protected SessionFactory newSessionFactory() {
        Properties properties = getProperties();

        return new Configuration()
            .addProperties(properties)
            .addAnnotatedClass(SecurityId.class)
            .buildSessionFactory(
                new StandardServiceRegistryBuilder()
                    .applySettings(properties)
                    .build()
            );
    }

  2. Open a new Session and test the associated connection transaction isolation level:

    @Test
    public void test() {
        Session session = null;
        Transaction txn = null;
        try {
            session = getSessionFactory().openSession();
            txn = session.beginTransaction();
            session.doWork(new Work() {
                @Override
                public void execute(Connection connection) throws SQLException {
                    LOGGER.debug("Transaction isolation level is {}",
                        Environment.isolationLevelToString(
                            connection.getTransactionIsolation()));
                }
            });
            txn.commit();
        } catch (RuntimeException e) {
            if (txn != null && txn.isActive()) txn.rollback();
            throw e;
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
    

The only thing that differs is the connection provider configuration.

Driver Manager Connection Provider

The Driver Manager Connection Provider offers a rudimentary DataSource wrapper for the configured database driver. You should only use it for test scenarios, since it doesn’t offer a professional connection pooling mechanism.

@Override
protected Properties getProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.dialect", "org.hibernate.dialect.HSQLDialect");
    //driver settings
    properties.put("hibernate.connection.driver_class", "org.hsqldb.jdbcDriver");
    properties.put("hibernate.connection.url", "jdbc:hsqldb:mem:test");
    properties.put("hibernate.connection.username", "sa");
    properties.put("hibernate.connection.password", "");
    //isolation level
    properties.setProperty("hibernate.connection.isolation", String.valueOf(Connection.TRANSACTION_SERIALIZABLE));
    return properties;
}

The test generates the following output:

WARN  [main]: o.h.e.j.c.i.DriverManagerConnectionProviderImpl - HHH000402: Using Hibernate built-in connection pool (not for production use!)
DEBUG [main]: c.v.h.m.l.t.TransactionIsolationDriverConnectionProviderTest - Transaction isolation level is SERIALIZABLE

The JDBC Connection associated with the Hibernate Session uses the SERIALIZABLE transaction isolation level, so the hibernate.connection.isolation configuration works for this specific connection provider.

C3P0 Connection Provider

Hibernate also offers a built-in C3P0 Connection Provider. Like in the previous example, we only need to provide the driver configuration settings, and Hibernate instantiates the C3P0 connection pool on our behalf.

@Override
protected Properties getProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.dialect", "org.hibernate.dialect.HSQLDialect");
    //log settings
    properties.put("hibernate.hbm2ddl.auto", "update");
    properties.put("hibernate.show_sql", "true");
    //driver settings
    properties.put("hibernate.connection.driver_class", "org.hsqldb.jdbcDriver");
    properties.put("hibernate.connection.url", "jdbc:hsqldb:mem:test");
    properties.put("hibernate.connection.username", "sa");
    properties.put("hibernate.connection.password", "");
    //c3p0 settings
    properties.put("hibernate.c3p0.min_size", 1);
    properties.put("hibernate.c3p0.max_size", 5);
    //isolation level
    properties.setProperty("hibernate.connection.isolation", String.valueOf(Connection.TRANSACTION_SERIALIZABLE));
    return properties;
}

The test generates the following output:

Dec 19, 2014 11:02:56 PM com.mchange.v2.log.MLog <clinit>
INFO: MLog clients using java 1.4+ standard logging.
Dec 19, 2014 11:02:56 PM com.mchange.v2.c3p0.C3P0Registry banner
INFO: Initializing c3p0-0.9.2.1 [built 20-March-2013 10:47:27 +0000; debug? true; trace: 10]
DEBUG [main]: c.v.h.m.l.t.TransactionIsolationInternalC3P0ConnectionProviderTest - Transaction isolation level is SERIALIZABLE

So, the hibernate.connection.isolation configuration works for the internal C3P0 connection provider too.

DataSource Connection Provider

Hibernate doesn’t force you to use a specific connection provider mechanism. You can simply supply a DataSource and Hibernate will use it whenever a new Connection is being requested. This time we’ll create a full-blown DataSource object and pass it through the hibernate.connection.datasource configuration.

@Override
protected Properties getProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.dialect", "org.hibernate.dialect.HSQLDialect");
    //log settings
    properties.put("hibernate.hbm2ddl.auto", "update");
    //data source settings
    properties.put("hibernate.connection.datasource", newDataSource());
    //isolation level
    properties.setProperty("hibernate.connection.isolation", String.valueOf(Connection.TRANSACTION_SERIALIZABLE));
    return properties;
}

protected ProxyDataSource newDataSource() {
    JDBCDataSource actualDataSource = new JDBCDataSource();
    actualDataSource.setUrl("jdbc:hsqldb:mem:test");
    actualDataSource.setUser("sa");
    actualDataSource.setPassword("");
    ProxyDataSource proxyDataSource = new ProxyDataSource();
    proxyDataSource.setDataSource(actualDataSource);
    proxyDataSource.setListener(new SLF4JQueryLoggingListener());
    return proxyDataSource;
}

The test generates the following output:

DEBUG [main]: c.v.h.m.l.t.TransactionIsolationExternalDataSourceConnectionProviderTest - Transaction isolation level is READ_COMMITTED

This time, the hibernate.connection.isolation setting doesn't seem to be taken into consideration. Hibernate doesn't override external DataSource configurations, so this setting is useless in this scenario.

If you are using an external DataSource (e.g. through JNDI), then you need to set the transaction isolation at the external DataSource level.

To fix our previous example, we just have to configure the external DataSource to use a specific isolation level:

protected ProxyDataSource newDataSource() {
	JDBCDataSource actualDataSource = new JDBCDataSource();
	actualDataSource.setUrl("jdbc:hsqldb:mem:test");
	actualDataSource.setUser("sa");
	actualDataSource.setPassword("");
	Properties properties = new Properties();
	properties.setProperty("hsqldb.tx_level", "SERIALIZABLE");
	actualDataSource.setProperties(properties);
	ProxyDataSource proxyDataSource = new ProxyDataSource();
	proxyDataSource.setDataSource(actualDataSource);
	proxyDataSource.setListener(new SLF4JQueryLoggingListener());
	return proxyDataSource;
}

Generating the following output:

DEBUG [main]: c.v.h.m.l.t.TransactionIsolationExternalDataSourceExternalconfgiurationConnectionProviderTest - Transaction isolation level is SERIALIZABLE

Java Enterprise transaction isolation support

Hibernate has a built-in Transaction API abstraction layer, isolating the data access layer from the transaction management topology (resource local or JTA). While we can develop an application using Hibernate transaction abstraction only, it’s much more common to delegate this responsibility to a middle-ware technology (JEE or Spring).

Java Enterprise Edition

JTA (the Java Transaction API specification) defines how transactions should be managed by a JEE-compliant application server. On the client side, we can demarcate transaction boundaries using the @TransactionAttribute annotation. While we have the option of choosing the right transaction propagation setting, we cannot do the same for the isolation level.

JTA doesn’t support transaction-scoped isolation levels and so we have to resort to vendor-specific configurations for providing an XA DataSource with a specific transaction isolation setting.
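As an illustration, a stand-alone JTA transaction manager such as Bitronix lets us configure the pool's default isolation level. A hedged sketch (the class names and property values are assumptions based on the Bitronix documentation):

PoolingDataSource dataSource = new PoolingDataSource();
dataSource.setClassName("org.hsqldb.jdbc.pool.JDBCXADataSource");
dataSource.setUniqueName("dataSource");
dataSource.setMaxPoolSize(5);
// every XA connection handed to the transaction manager defaults to SERIALIZABLE
dataSource.setIsolationLevel("SERIALIZABLE");
dataSource.getDriverProperties().setProperty("url", "jdbc:hsqldb:mem:test");
dataSource.init();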

Spring

The Spring @Transactional annotation is used for defining transaction boundaries. As opposed to JEE, this annotation allows us to configure:

  • isolation level
  • the rollback policy (per exception type)
  • propagation
  • read-only
  • timeout

As I will demonstrate later in this article, the isolation level setting is readily available only for resource local transactions. Because JTA doesn't support transaction-scoped isolation levels, Spring offers the IsolationLevelDataSourceRouter to overcome this shortcoming when using application server JTA DataSources.

Because most DataSource implementations can only take a default transaction isolation level, we can have multiple such DataSources, each one serving connections for a specific transaction isolation level.

The logical transaction (e.g. @Transactional) isolation level setting is introspected by the IsolationLevelDataSourceRouter, and the connection acquisition request is therefore delegated to a specific DataSource implementation that can serve a JDBC Connection with the same transaction isolation level setting.

So, even in JTA environments, the transaction isolation router can offer a vendor-independent solution for overriding the default database isolation level on a per transaction basis.
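A minimal sketch of such a router setup (the target DataSources are illustrative):

IsolationLevelDataSourceRouter router = new IsolationLevelDataSourceRouter();
Map<Object, Object> targetDataSources = new HashMap<Object, Object>();
// keys use the TransactionDefinition isolation constant names
targetDataSources.put("ISOLATION_READ_COMMITTED", readCommittedDataSource);
targetDataSources.put("ISOLATION_SERIALIZABLE", serializableDataSource);
router.setTargetDataSources(targetDataSources);
router.setDefaultTargetDataSource(readCommittedDataSource);
router.afterPropertiesSet();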

Spring transaction-scoped isolation levels

Next, I’m going to test the Spring transaction management support for both resource local and JTA transactions.

For this, I’ll introduce a transactional business logic Service Bean:

@Service
public class StoreServiceImpl implements StoreService {

    protected final Logger LOGGER = LoggerFactory.getLogger(getClass());

    @PersistenceContext(unitName = "persistenceUnit")
    private EntityManager entityManager;

    @Override
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void purchase(Long productId) {        
        Session session = (Session) entityManager.getDelegate();
        session.doWork(new Work() {
            @Override
            public void execute(Connection connection) throws SQLException {
                LOGGER.debug("Transaction isolation level is {}", Environment.isolationLevelToString(connection.getTransactionIsolation()));
            }
        });
    }
}

The Spring framework offers a transaction management abstraction that decouples the application logic code from the underlying transaction specific configurations. The Spring transaction manager is only a facade to the actual resource local or JTA transaction managers.

Migrating from resource local to XA transactions is just a configuration detail, leaving the actual business logic code untouched. This wouldn’t be possible without the extra transaction management abstraction layer and the cross-cutting AOP support.

Next we are going to test how various specific transaction managers support transaction-scope isolation level overriding.

JPA transaction manager

First, we are going to test the JPA Transaction Manager:

    <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
        <property name="entityManagerFactory" ref="entityManagerFactory" />
    </bean>

When calling our business logic service, this is what we get:

DEBUG [main]: c.v.s.i.StoreServiceImpl - Transaction isolation level is SERIALIZABLE

The JPA transaction manager can take only one DataSource, so it can only issue resource local transactions. In such scenarios, the Spring transaction manager is able to override the default DataSource isolation level (which is READ COMMITTED in our case).

JTA transaction manager

Now, let’s see what happens when we switch to JTA transactions. As I previously stated, Spring only offers a logical transaction manager, which means we also have to provide a physical JTA transaction manager.

Traditionally, it was the enterprise application server's (e.g. WildFly, WebLogic) responsibility to provide a JTA-compliant transaction manager. Nowadays, there is also a great variety of stand-alone JTA transaction managers (e.g. Bitronix, Atomikos, Narayana).

In this test, we are going to use Bitronix:

<bean id="jtaTransactionManager" factory-method="getTransactionManager"
	  class="bitronix.tm.TransactionManagerServices" depends-on="btmConfig, dataSource"
	  destroy-method="shutdown"/>

<bean id="transactionManager" class="org.springframework.transaction.jta.JtaTransactionManager">
	<property name="transactionManager" ref="jtaTransactionManager"/>
	<property name="userTransaction" ref="jtaTransactionManager"/>
</bean>

When running the previous test, we get the following exception:

org.springframework.transaction.InvalidIsolationLevelException: JtaTransactionManager does not support custom isolation levels by default - switch 'allowCustomIsolationLevels' to 'true'

So, let’s enable the custom isolation level setting and rerun the test:

<bean id="transactionManager" class="org.springframework.transaction.jta.JtaTransactionManager">
	<property name="transactionManager" ref="jtaTransactionManager"/>
	<property name="userTransaction" ref="jtaTransactionManager"/>
	<property name="allowCustomIsolationLevels" value="true"/>
</bean>

The test gives us the following output:

DEBUG [main]: c.v.s.i.StoreServiceImpl - Transaction isolation level is READ_COMMITTED

Even with this extra configuration, the transaction-scoped isolation level wasn't propagated to the underlying database connection, as this is the default JTA transaction manager behavior.

For WebLogic, Spring offers a WebLogicJtaTransactionManager to address this limitation, as we can see in the following Spring source-code snippet:

// Specify isolation level, if any, through corresponding WebLogic transaction property.
if (this.weblogicTransactionManagerAvailable) {
	if (definition.getIsolationLevel() != TransactionDefinition.ISOLATION_DEFAULT) {
		try {
			Transaction tx = getTransactionManager().getTransaction();
			Integer isolationLevel = definition.getIsolationLevel();
			/*
			weblogic.transaction.Transaction wtx = (weblogic.transaction.Transaction) tx;
			wtx.setProperty(ISOLATION_LEVEL_KEY, isolationLevel);
			*/
			this.setPropertyMethod.invoke(tx, ISOLATION_LEVEL_KEY, isolationLevel);
		}
		catch (InvocationTargetException ex) {
			throw new TransactionSystemException(
					"WebLogic's Transaction.setProperty(String, Serializable) method failed", ex.getTargetException());
		}
		catch (Exception ex) {
			throw new TransactionSystemException(
					"Could not invoke WebLogic's Transaction.setProperty(String, Serializable) method", ex);
		}
	}
}
else {
	applyIsolationLevel(txObject, definition.getIsolationLevel());
}

Conclusion

Transaction management is definitely not a trivial thing, and with all the available frameworks and abstraction layers, it really becomes more complicated than one might think.

Because data integrity is very important for most business applications, your only option is to master your current project's data layer framework stack.

Code available for Hibernate and JPA.


EAGER fetching is a code smell

Introduction

Hibernate fetching strategies can really make the difference between an application that barely crawls and a highly responsive one. In this post, I'll explain why you should prefer query-based fetching over global fetch plans.

Fetching 101

Hibernate defines four association retrieval strategies:

  • Join: the association is OUTER JOINED in the original SELECT statement.
  • Select: an additional SELECT statement is used to retrieve the associated entity (or entities).
  • Subselect: an additional SELECT statement is used to retrieve the whole associated collection. This mode is meant for to-many associations.
  • Batch: an additional number of SELECT statements is used to retrieve the whole associated collection. Each additional SELECT retrieves a fixed number of associated entities. This mode is meant for to-many associations.
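For collection associations, the non-default strategies are typically selected through Hibernate-specific annotations. A minimal sketch (assuming the Image collection mapped later in this post):

@OneToMany(fetch = FetchType.LAZY, mappedBy = "product")
@BatchSize(size = 10)
private Set<Image> images = new LinkedHashSet<Image>();

Here, org.hibernate.annotations.BatchSize instructs Hibernate to initialize up to 10 uninitialized collections with a single secondary SELECT.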

These fetching strategies might be applied in the following scenarios:

  • the association is always initialized along with its owner (e.g. EAGER FetchType)
  • the uninitialized association (e.g. LAZY FetchType) is navigated, therefore the association must be retrieved with a secondary SELECT

The fetching information in the Hibernate mappings forms the global fetch plan. At query time, we may override the global fetch plan, but only for LAZY associations. For this, we can use the fetch HQL/JPQL/Criteria directive. EAGER associations cannot be overridden, therefore tying your application to the global fetch plan.
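For example, a LAZY association can be initialized on a per-query basis with a fetch join. A minimal JPQL sketch (using the Product entity introduced below):

Product product = entityManager.createQuery(
    "select p " +
    "from Product p " +
    "join fetch p.importer " +
    "where p.id = :productId", Product.class)
.setParameter("productId", productId)
.getSingleResult();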

Hibernate 3 acknowledged that LAZY should be the default association fetching strategy:

By default, Hibernate3 uses lazy select fetching for collections and lazy proxy fetching for single-valued associations. These defaults make sense for most associations in the majority of applications.

This decision was taken after noticing the many performance issues associated with Hibernate 2's default eager fetching. Unfortunately, JPA has taken a different approach and decided that to-many associations should be LAZY, while to-one relationships should be fetched eagerly.

Association type   Default fetching policy
@OneToMany         LAZY
@ManyToMany        LAZY
@ManyToOne         EAGER
@OneToOne          EAGER

EAGER fetching inconsistencies

While it may be convenient to simply mark associations as EAGER, delegating the fetching responsibility to Hibernate, it's advisable to resort to query-based fetch plans instead.

An EAGER association will always be fetched, and its fetching strategy is not consistent across all querying techniques.

Next, I’m going to demonstrate how EAGER fetching behaves for all Hibernate querying variants. I will reuse the same entity model I’ve previously introduced in my fetching strategies article:

Product

The Product entity has the following associations:

@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "company_id", nullable = false)
private Company company;

@OneToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "product", optional = false)
private WarehouseProductInfo warehouseProductInfo;

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "importer_id")
private Importer importer;

@OneToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
@OrderBy("index")
private Set<Image> images = new LinkedHashSet<Image>();

The company association is marked as EAGER and Hibernate will always employ a fetching strategy to initialize it along with its owner entity.

Persistence Context loading

First we’ll load the entity using the Persistence Context API:

Product product = entityManager.find(Product.class, productId);

Which generates the following SQL SELECT statement:

Query:{[
select 
    product0_.id as id1_18_1_, 
    product0_.code as code2_18_1_, 
    product0_.company_id as company_6_18_1_, 
    product0_.importer_id as importer7_18_1_, 
    product0_.name as name3_18_1_, 
    product0_.quantity as quantity4_18_1_, 
    product0_.version as version5_18_1_, 
    company1_.id as id1_6_0_, 
    company1_.name as name2_6_0_ 
from Product product0_ 
inner join Company company1_ on product0_.company_id=company1_.id 
where product0_.id=?][1]}

The EAGER company association was retrieved using an inner join. For M such associations, the owner entity table is going to be joined M times.

Each extra join adds to the overall query complexity and execution time. If we don't use all of these associations in every possible business scenario, then we've paid an extra performance penalty for nothing in return.

Fetching using JPQL and Criteria

Product product = entityManager.createQuery(
	"select p " +
			"from Product p " +
			"where p.id = :productId", Product.class)
	.setParameter("productId", productId)
	.getSingleResult();

or with

CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Product> cq = cb.createQuery(Product.class);
Root<Product> productRoot = cq.from(Product.class);
cq.where(cb.equal(productRoot.get("id"), productId));
Product product = entityManager.createQuery(cq).getSingleResult();

Generates the following SQL SELECT statements:

Query:{[
select 
    product0_.id as id1_18_, 
    product0_.code as code2_18_, 
    product0_.company_id as company_6_18_, 
    product0_.importer_id as importer7_18_, 
    product0_.name as name3_18_, 
    product0_.quantity as quantity4_18_, 
    product0_.version as version5_18_ 
from Product product0_ 
where product0_.id=?][1]} 

Query:{[
select 
    company0_.id as id1_6_0_, 
    company0_.name as name2_6_0_ 
from Company company0_ 
where company0_.id=?][1]}

Both JPQL and Criteria queries default to select fetching, therefore issuing a secondary SELECT for each individual EAGER association. The larger the number of associations, the more additional individual SELECT statements are issued, and the more our application performance is affected.

Hibernate Criteria API

While JPA 2.0 added support for Criteria queries, Hibernate has long been offering a specific dynamic query implementation.

While the EntityManager implementation delegates method calls to the legacy Session API, the JPA Criteria implementation was written from scratch. That's the reason why Hibernate and JPA Criteria API behave differently for similar querying scenarios.

The previous example Hibernate Criteria equivalent looks like this:

Product product = (Product) session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.uniqueResult();

And the associated SQL SELECT is:

Query:{[
select 
    this_.id as id1_3_1_, 
    this_.code as code2_3_1_, 
    this_.company_id as company_6_3_1_, 
    this_.importer_id as importer7_3_1_, 
    this_.name as name3_3_1_, 
    this_.quantity as quantity4_3_1_, 
    this_.version as version5_3_1_, 
    hibernatea2_.id as id1_0_0_, 
    hibernatea2_.name as name2_0_0_ 
from Product this_ 
inner join Company hibernatea2_ on this_.company_id=hibernatea2_.id 
where this_.id=?][1]}

This query uses the join fetch strategy, as opposed to the select fetching employed by JPQL/HQL and the JPA Criteria API.

Hibernate Criteria and to-many EAGER collections

Let’s see what happens when the image collection fetching strategy is set to EAGER:

@OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
@OrderBy("index")
private Set<Image> images = new LinkedHashSet<Image>();

The following SQL is going to be generated:

Query:{[
select 
    this_.id as id1_3_2_, 
    this_.code as code2_3_2_, 
    this_.company_id as company_6_3_2_, 
    this_.importer_id as importer7_3_2_, 
    this_.name as name3_3_2_, 
    this_.quantity as quantity4_3_2_, 
    this_.version as version5_3_2_, 
    hibernatea2_.id as id1_0_0_, 
    hibernatea2_.name as name2_0_0_, 
    images3_.product_id as product_4_3_4_, 
    images3_.id as id1_1_4_, 
    images3_.id as id1_1_1_, 
    images3_.index as index2_1_1_, 
    images3_.name as name3_1_1_, 
    images3_.product_id as product_4_1_1_ 
from Product this_ 
inner join Company hibernatea2_ on this_.company_id=hibernatea2_.id 
left outer join Image images3_ on this_.id=images3_.product_id 
where this_.id=? 
order by images3_.index][1]}

Hibernate Criteria doesn't automatically group the parent entities list. Because of the one-to-many children table JOIN, for each child entity, we are going to get a new parent entity object reference (all pointing to the same object in the current Persistence Context):

product.setName("TV");
product.setCompany(company);

Image frontImage = new Image();
frontImage.setName("front image");
frontImage.setIndex(0);

Image sideImage = new Image();
sideImage.setName("side image");
sideImage.setIndex(1);

product.addImage(frontImage);
product.addImage(sideImage);

List products = session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.list();
assertEquals(2, products.size());
assertSame(products.get(0), products.get(1));

Because we have two image entities, we will get two Product entity references, both pointing to the same first-level cache entry.

To fix this, we need to instruct Hibernate Criteria to use distinct root entities:

List products = session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY)
	.list();
assertEquals(1, products.size());

Conclusion

The EAGER fetching strategy is a code smell. Most often, it's used for simplicity's sake, without considering the long-term performance penalties. The fetching strategy should never be the entity mapping's responsibility. Each business use case has different entity load requirements, and therefore the fetching strategy should be delegated to each individual query.

The global fetch plan should only define LAZY associations, which are fetched on a per-query basis. Combined with the "always check the generated queries" strategy, query-based fetch plans can improve application performance and reduce maintenance costs.

Code available for Hibernate and JPA.
