EAGER fetching is a code smell

Introduction

Hibernate fetching strategies can really make a difference between an application that barely crawls and a highly responsive one. In this post, I’ll explain why you should prefer query based fetching instead of global fetch plans.

Fetching 101

Hibernate defines four association retrieving strategies:

Fetching Strategy Description
Join The association is OUTER JOINED in the original SELECT statement
Select An additional SELECT statement is used to retrieve the associated entity(entities)
Subselect An additional SELECT statement is used to retrieve the whole associated collection. This mode is meant for to-many associations
Batch An additional number of SELECT statements is used to retrieve the whole associated collection. Each additional SELECT will retrieve a fixed number of associated entities. This mode is meant for to-many associations

These fetching strategies might be applied in the following scenarios:

  • the association is always initialized along with its owner (e.g. EAGER FetchType)
  • the uninitialized association (e.g. LAZY FetchType) is navigated, therefore the association must be retrieved with a secondary SELECT

The Hibernate mappings fetching information forms the global fetch plan. At query time, we may override the global fetch plan, but only for LAZY associations. For this we can use the fetch HQL/JPQL/Criteria directive. EAGER associations cannot be overridden, therefore tying your application to the global fetch plan.

Hibernate 3 acknowledged that LAZY should be the default association fetching strategy:

By default, Hibernate3 uses lazy select fetching for collections and lazy proxy fetching for single-valued associations. These defaults make sense for most associations in the majority of applications.

This decision was taken after noticing many performance issues associated with Hibernate 2 default eager fetching. Unfortunately JPA has taken a different approach and decided that to-many associations be LAZY while to-one relationships be fetched eagerly.

Association type Default fetching policy
@OneTMany LAZY
@ManyToMany LAZY
@ManyToOne EAGER
@OneToOne EAGER

EAGER fetching inconsistencies

While it may be convenient to just mark associations as EAGER, delegating the fetching responsibility to Hibernate, it’s advisable to resort to query based fetch plans.

An EAGER association will always be fetched and the fetching strategy is not consistent across all querying techniques.

Next, I’m going to demonstrate how EAGER fetching behaves for all Hibernate querying variants. I will reuse the same entity model I’ve previously introduced in my fetching strategies article:

Product

The Product entity has the following associations:

@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "company_id", nullable = false)
private Company company;

@OneToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "product", optional = false)
private WarehouseProductInfo warehouseProductInfo;

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "importer_id")
private Importer importer;

@OneToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
@OrderBy("index")
private Set<Image> images = new LinkedHashSet<Image>();

The company association is marked as EAGER and Hibernate will always employ a fetching strategy to initialize it along with its owner entity.

Persistence Context loading

First we’ll load the entity using the Persistence Context API:

Product product = entityManager.find(Product.class, productId);

Which generates the following SQL SELECT statement:

Query:{[
select 
    product0_.id as id1_18_1_, 
    product0_.code as code2_18_1_, 
    product0_.company_id as company_6_18_1_, 
    product0_.importer_id as importer7_18_1_, 
    product0_.name as name3_18_1_, 
    product0_.quantity as quantity4_18_1_, 
    product0_.version as version5_18_1_, 
    company1_.id as id1_6_0_, 
    company1_.name as name2_6_0_ 
from Product product0_ 
inner join Company company1_ on product0_.company_id=company1_.id 
where product0_.id=?][1]

The EAGER company association was retrieved using an inner join. For M such associations the owner entity table is going to be joined M times.

Each extra join adds up to the overall query complexity and execution time. If we don’t even use all these associations, for every possible business scenario, then we’ve just paid the extra performance penalty for nothing in return.

Fetching using JPQL and Criteria

Product product = entityManager.createQuery(
	"select p " +
			"from Product p " +
			"where p.id = :productId", Product.class)
	.setParameter("productId", productId)
	.getSingleResult();

or with

CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Product> cq = cb.createQuery(Product.class);
Root<Product> productRoot = cq.from(Product.class);
cq.where(cb.equal(productRoot.get("id"), productId));
Product product = entityManager.createQuery(cq).getSingleResult();

Generates the following SQL SELECT statements:

Query:{[
select 
    product0_.id as id1_18_, 
    product0_.code as code2_18_, 
    product0_.company_id as company_6_18_, 
    product0_.importer_id as importer7_18_, 
    product0_.name as name3_18_, 
    product0_.quantity as quantity4_18_, 
    product0_.version as version5_18_ 
from Product product0_ 
where product0_.id=?][1]} 

Query:{[
select 
    company0_.id as id1_6_0_, 
    company0_.name as name2_6_0_ 
from Company company0_ 
where company0_.id=?][1]}

Both JPQL and Criteria queries default to select fetching, therefore issuing a secondary select for each individual EAGER association. The larger the associations number, the more additional individual SELECTS, the more it will affect our application performance.

Hibernate Criteria API

While JPA 2.0 added support for Criteria queries, Hibernate has long been offering a specific dynamic query implementation.

If the EntityManager implementation delegates method calls the the legacy Session API, the JPA Criteria implementation was written from scratch. That’s the reason why Hibernate and JPA Criteria API behave differently for similar querying scenarios.

The previous example Hibernate Criteria equivalent looks like this:

Product product = (Product) session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.uniqueResult();

And the associated SQL SELECT is:

Query:{[
select 
    this_.id as id1_3_1_, 
    this_.code as code2_3_1_, 
    this_.company_id as company_6_3_1_, 
    this_.importer_id as importer7_3_1_, 
    this_.name as name3_3_1_, 
    this_.quantity as quantity4_3_1_, 
    this_.version as version5_3_1_, 
    hibernatea2_.id as id1_0_0_, 
    hibernatea2_.name as name2_0_0_ 
from Product this_ 
inner join Company hibernatea2_ on this_.company_id=hibernatea2_.id 
where this_.id=?][1]}

This query uses the join fetch strategy as opposed to select fetching, employed by JPQL/HQL and Criteria API.

Hibernate Criteria and to-many EAGER collections

Let’s see what happens when the image collection fetching strategy is set to EAGER:

@OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
@OrderBy("index")
private Set<Image> images = new LinkedHashSet<Image>();

The following SQL is going to be generated:

Query:{[
select 
    this_.id as id1_3_2_, 
    this_.code as code2_3_2_, 
    this_.company_id as company_6_3_2_, 
    this_.importer_id as importer7_3_2_, 
    this_.name as name3_3_2_, 
    this_.quantity as quantity4_3_2_, 
    this_.version as version5_3_2_, 
    hibernatea2_.id as id1_0_0_, 
    hibernatea2_.name as name2_0_0_, 
    images3_.product_id as product_4_3_4_, 
    images3_.id as id1_1_4_, 
    images3_.id as id1_1_1_, 
    images3_.index as index2_1_1_, 
    images3_.name as name3_1_1_, 
    images3_.product_id as product_4_1_1_ 
from Product this_ 
inner join Company hibernatea2_ on this_.company_id=hibernatea2_.id 
left outer join Image images3_ on this_.id=images3_.product_id 
where this_.id=? 
order by images3_.index][1]}

Hibernate Criteria doesn’t automatically groups the parent entities list. Because of the one-to-many children table JOIN, for each child entity we are going to get a new parent entity object reference (all pointing to the same object in our current Persistence Context):

product.setName("TV");
product.setCompany(company);

Image frontImage = new Image();
frontImage.setName("front image");
frontImage.setIndex(0);

Image sideImage = new Image();
sideImage.setName("side image");
sideImage.setIndex(1);

product.addImage(frontImage);
product.addImage(sideImage);

List products = session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.list();
assertEquals(2, products.size());
assertSame(products.get(0), products.get(1));

Because we have two image entities, we will get two Product entity references, both pointing to the same first level cache entry.

To fix it we need to instruct Hibernate Criteria to use distinct root entities:

List products = session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY)
	.list();
assertEquals(1, products.size());

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

The EAGER fetching strategy is a code smell. Most often it’s used for simplicity sake without considering the long-term performance penalties. The fetching strategy should never be the entity mapping responsibility. Each business use case has different entity load requirements and therefore the fetching strategy should be delegated to each individual query.

The global fetch plan should only define LAZY associations, which are fetched on a per query basis. Combined with the always check generated queries strategy, the query based fetch plans can improve application performance and reduce maintaining costs.

Code available for Hibernate and JPA.

If you liked this article, you might want to subscribe to my newsletter too.

Advertisements

24 thoughts on “EAGER fetching is a code smell

  1. Good post!

    I would say to keep the default JPA fetching strategy and after learning how the entities and their relationships are used in the application, a developer should try to change them.

    Just one detail, if I’m not wrong, entityManager.find() and mappings use LEFT JOIN on associations and all JPQL/HQL and Criteria use INNER JOIN by default.

    1. Thanks for your kind words. The defaults are specific to the JPA implementation you use. The ones I showed you are for Hibernate only.
      LEFT JOIN is used for optional (nullable) associations while inner join is for not-nullable ones.
      The JPQL and HQL use secondary selects because they can’t add joins you haven’t explicitly requested. The entity manager loading methods (find/getReference) seem to benefit from implicit joining.
      But because of the query type differences I wouldn’t rely much on EAGER. Just use LAZY and have all your queries explicitly fetch whatever they need.

      1. Hi, thanks for those tips!

        So either optional or nullable attributes can change the JOIN? I didn’t realized that! If I’m not wrong, nullable attribute is used just to DDL, isn’t it?

        Could you give me some reference?

      2. Yes, nullable is for DDL only. And optional is for optimizing LAZY many-to-one and one-to-one associations where Hibernate would otherwise require a secondary select to know if it needs to use a null or an actual proxy instead.
        I think I’ve seen such an example with LEFT JOIN, but don;t remember where.

      3. Thanks again!

        These are kind of details we just know when we need them.

        I’m asking this to you because according to JPA spec every association uses INNER JOIN on JPQL or Criteria API by default, but as you said, this optional attribute can change the default on some relationships.

  2. I agree , I always thought that the application knows the collection, so it should be application responsibility retrieve it when needed.

  3. I’m curious, what happens when you have several @OneToMany EAGER associations? Something like:

    @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
    @OrderBy("index")
    private Set images = new LinkedHashSet();
    
    @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
    @OrderBy("index")
    private Set documents = new LinkedHashSet();
    

    Will both relationships be left outer joined at the same time? That would be fatal, as that would produce a cartesian product between { images × documents } with tons of duplicates that would need to be removed again.

      1. Check this example on GitHub:

                @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
                @OrderBy("index")
                private Set<Image> images = new LinkedHashSet<Image>();
        
                @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
                private Set<Review> reviews = new LinkedHashSet<Review>();
        

        And the Cartesian Product JOIN:

        select 
                hibernatea0_.id as id1_3_0_, 
        	hibernatea0_.code as code2_3_0_, 
        	hibernatea0_.company_id as company_6_3_0_, 
        	hibernatea0_.importer_id as importer7_3_0_, 
        	hibernatea0_.name as name3_3_0_, 
        	hibernatea0_.quantity as quantity4_3_0_, 
        	hibernatea0_.version as version5_3_0_, 
        	images1_.product_id as product_4_3_1_, 
        	images1_.id as id1_1_1_, 
        	images1_.id as id1_1_2_, 
        	images1_.index as index2_1_2_, 
        	images1_.name as name3_1_2_, 
        	images1_.product_id as product_4_1_2_, 
        	reviews2_.product_id as product_3_3_3_, 
        	reviews2_.id as id1_7_3_, 
        	reviews2_.id as id1_7_4_, 
        	reviews2_.comment as comment2_7_4_, 
        	reviews2_.product_id as product_3_7_4_ 
        from Product hibernatea0_ 
        left outer join Image images1_ 
            on hibernatea0_.id=images1_.product_id 
        left outer join review reviews2_ 
            on hibernatea0_.id=reviews2_.product_id 
        where hibernatea0_.id=? 
        order by images1_.index 
        

        Luckily, I don’t think many think of enabling two EAGER collections. But someone can get a Cartesian Product if he doesn’t know how Hibernate works. The Hibernate code should always be reviewed by a senior. You can imagine the SQL horror a junior developers team would generate otherwise.

      2. Very interesting, thanks for the feedback. Well, that’s what I thought. In the end, ORMs really can’t materialise the object-graph in one go, because nested collections are ill-represented via JOINs, which denormalise relations.

        The best solution would be to employ SQL standard MULTISET clauses, but they’re only implemented in CUBRID, Informix, and Oracle – not even PostgreSQL…

        Well, good times for jOOQ 🙂

      3. That’s why Hibernate 3 made all asdociations LAZY by default. They assumed you don’t want to fetch the whole DB at once. But JPA made to-one associations EAGER unless you explicitly make them LAZY.

        An abstraction layer will never beat an implementation that has a specific plugin for every database.

        jOOQ will always beat Hibernate on the query side. There’s no doubt. And so many Hibernate peculiarities make you wonder how many of us do really know them.

      4. Lazy is not the solution either. Some N+1 situations are just dumb, and you don’t notice until you actually log all the queries (probably in production via Oracle Enterprise Manager).

        I do wonder though if multisets can be emulated in those databases that do not natively support them… I still believe that if all databases were ORDBMS, we wouldn’t need any ORM, because then, O=R

      5. That’s why I advice writing tests for asserting the SQL statements count. I have a db-util GitHub project for that. I alwats use the datasource-proxy to log the actual exexuted SQL and include the verification in the DoD.

        In early 2000 Object DBs were hype. Why did they fail?

      6. Interesting. You should blog about that GitHub project.

        Object DBs failed, yes. But ORDBMS didn’t. CUBRID, Informix, Oracle and PostgreSQL are ORDBMS with lots of very interesting features. ORDBMS features have also been standardised in the SQL standard. I’ll explain these things soon on the jOOQ blog.

  4. ​EclipseLink behaves somewhat differently:

    Whether a LAZY or EAGER strategy is specified the provider will *always* issue an additional select statement when invoking the EntityManager’s find() method as in:

    # SELECT [ATTR_LIST] FROM PARENT WHERE ID=123
    # SELECT [ATTR_LIST] FROM CHILD WHERE PARENT_ID=123

    So, from a “number of db calls” point of view there really is no difference. The only difference is that in the case of an EAGER fetch the association is fetched without needing to call the corresponding accessor within the same transaction.

    If we really wanted to minimise the number of db calls (and additionally some network traffic) when doing an EAGER fetch, we’d need to use a “JOIN FETCH” jpql query (most preferably a @NamedQuery so that prepared statements are also cached) with the only caveat being that the root entity in the expression would have to be DISTINCTed so that the duplicate rows are eliminated (or alternatively load the result collection into a Set). This would fetch everything in one go.

    This is pretty much the same as your first example under “Persistence Context loading” (I’m guessing you’d need to include some sort of DISTINCT clause there too?)

    I personally prefer the EclipseLink way as it makes things simpler and more straightforward.

    My 2 cents.

    1. Thanks for describing the EclipseLink fetch policy. I must admit that Hibernate implicit fetching is not a good idea. Distinct is only useful for Hibernate Criteria API. The other querying techniques regroup all results anyway, therefore resolving duplicates.

      1. Do you have any post/reference about how Hibernate regroups all results? Sometimes I don’t understand how it achieves to regroup all data with their relationships when we have duplicate rows from JPQL or Criteria.

      2. Even if you select parents joined with entities, Hibernate can usually regroups the results and it doesn’t give you duplicated parents. There are exceptions to this rule, like the specific Hibernate Criteria.
        But specifying DISTINCT into JPQL is a good practice anyway and since it’s mandatory to have integration tests for all your queries, you should always check the results and so you can validate you don’t get duplicates when you don’t expect them.

  5. Perfect article, but actually just so you know .setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY)
    does not generate distinct and two rows will be retrieved from DB in your case. As you know you should always filter data on db side rather than on server side. Using “select distinct p form Product p where p.id = :id” will be better.

      1. I suppose you wanted to mention about select + join fetch. In this case distinct does not apply on db side, because product is retrieved with images. According to Java Persistence with Hiberante : select distinct i from Item i join fetch i.bids – note that in this case the DISTINCT doesn’t operate at the SQL level, but forces Hibernate to filter out duplicates in memory when marshaling the result into objects. Clearly, duplicates cant’t be avoided in the SQL result.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s