The best way to lazy load entity attributes using JPA and Hibernate

Introduction

When an entity is fetched, all of its attributes are loaded as well. This is because every basic entity attribute is implicitly marked with the @Basic annotation, whose default fetch policy is FetchType.EAGER.

However, the attribute fetch strategy can be set to FetchType.LAZY, in which case the entity attribute is loaded with a secondary select statement upon being accessed for the first time.

@Basic(fetch = FetchType.LAZY)

This configuration alone is not sufficient because Hibernate requires bytecode instrumentation to intercept the attribute access request and issue the secondary select statement on demand.

Bytecode enhancement

When using the Maven bytecode enhancement plugin, the enableLazyInitialization configuration property must be set to true as illustrated in the following example:

<plugin>
    <groupId>org.hibernate.orm.tooling</groupId>
    <artifactId>hibernate-enhance-maven-plugin</artifactId>
    <version>${hibernate.version}</version>
    <executions>
        <execution>
            <configuration>
                <failOnError>true</failOnError>
                <enableLazyInitialization>true</enableLazyInitialization>
            </configuration>
            <goals>
                <goal>enhance</goal>
            </goals>
        </execution>
    </executions>
</plugin>

With this configuration in place, all JPA entity classes are going to be instrumented with lazy attribute fetching. This process takes place at build time, right after entity classes are compiled from their associated source files.

The attribute lazy fetching mechanism is very useful when dealing with column types that store large amounts of data (e.g. BLOB, CLOB, VARBINARY). This way, the entity can be fetched without automatically loading data from the underlying large column types, therefore improving performance.

To demonstrate how attribute lazy fetching works, the following example is going to use an Attachment entity which can store any media type (e.g. PNG, PDF, MPEG).

@Entity @Table(name = "attachment")
public class Attachment {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    @Enumerated
    @Column(name = "media_type")
    private MediaType mediaType;

    @Lob
    @Basic(fetch = FetchType.LAZY)
    private byte[] content;

    //Getters and setters omitted for brevity
}

Properties such as the entity identifier, the name or the media type are to be fetched eagerly on every entity load. On the other hand, the media file content should be fetched lazily, only when being accessed by the application code.

After the Attachment entity is instrumented, the class bytecode is changed as follows:

@Transient
private transient PersistentAttributeInterceptor 
    $$_hibernate_attributeInterceptor;

public byte[] getContent() {
    return $$_hibernate_read_content();
}

public byte[] $$_hibernate_read_content() {
    if ($$_hibernate_attributeInterceptor != null) {
        this.content = ((byte[]) 
            $$_hibernate_attributeInterceptor.readObject(
                this, "content", this.content));
    }
    return this.content;
}

The content attribute fetching is done by the PersistentAttributeInterceptor object reference, therefore providing a way to load the underlying BLOB column only when the getter is called for the first time.
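The interception idea can be illustrated without Hibernate at all. The following is a minimal plain-Java sketch, not Hibernate's actual implementation: the AttributeInterceptor interface and the LoadOnFirstReadInterceptor class are hypothetical stand-ins for PersistentAttributeInterceptor, and the Supplier simulates the secondary select statement.

```java
import java.util.function.Supplier;

// Plain-Java illustration only; none of these types are Hibernate classes.
class LazyAttributeSketch {

    // Hypothetical stand-in for Hibernate's PersistentAttributeInterceptor.
    interface AttributeInterceptor {
        Object readObject(Object owner, String attributeName, Object currentValue);
    }

    // Loads the value on the first read and returns the current value afterwards.
    static class LoadOnFirstReadInterceptor implements AttributeInterceptor {
        private final Supplier<byte[]> loader; // simulates the secondary SELECT
        private boolean initialized;
        int loadCount;

        LoadOnFirstReadInterceptor(Supplier<byte[]> loader) {
            this.loader = loader;
        }

        @Override
        public Object readObject(Object owner, String attributeName, Object currentValue) {
            if (initialized) {
                return currentValue;
            }
            initialized = true;
            loadCount++;
            return loader.get();
        }
    }

    // Mirrors the shape of the enhanced getter: every read goes through the interceptor.
    static class Attachment {
        private byte[] content;
        private AttributeInterceptor interceptor;

        void setInterceptor(AttributeInterceptor interceptor) {
            this.interceptor = interceptor;
        }

        byte[] getContent() {
            if (interceptor != null) {
                this.content = (byte[]) interceptor.readObject(this, "content", this.content);
            }
            return this.content;
        }
    }

    public static void main(String[] args) {
        LoadOnFirstReadInterceptor interceptor =
            new LoadOnFirstReadInterceptor(() -> new byte[] {1, 2, 3});
        Attachment attachment = new Attachment();
        attachment.setInterceptor(interceptor);

        System.out.println(interceptor.loadCount); // prints 0: nothing loaded yet
        attachment.getContent();
        attachment.getContent();
        System.out.println(interceptor.loadCount); // prints 1: loaded on first access only
    }
}
```

The getter always delegates to the interceptor, and the interceptor decides whether a load is needed, which is why the first access pays for the extra statement and later accesses do not.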


When executing the following test case:

Attachment book = entityManager.find(
    Attachment.class, bookId);

LOGGER.debug("Fetched book: {}", book.getName());

assertArrayEquals(
    Files.readAllBytes(bookFilePath), 
    book.getContent()
);

Hibernate generates the following SQL queries:

SELECT a.id AS id1_0_0_,
       a.media_type AS media_ty3_0_0_,
       a.name AS name4_0_0_
FROM   attachment a
WHERE  a.id = 1

-- Fetched book: High-Performance Java Persistence

SELECT a.content AS content2_0_
FROM   attachment a
WHERE  a.id = 1

Because the content attribute is mapped with the FetchType.LAZY fetch strategy and lazy fetching bytecode enhancement is enabled, the content column is not fetched along with the other columns that initialize the Attachment entity. Only when the data access layer accesses the content property does Hibernate issue a secondary select to load this attribute as well.

Just like FetchType.LAZY associations, this technique is prone to N+1 query problems, so caution is advised. One slight disadvantage of the bytecode enhancement mechanism is that all entity properties, not just the ones marked with the FetchType.LAZY strategy, are going to be transformed, as previously illustrated.
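To make the N+1 concern concrete, here is a small plain-Java simulation that counts statements. It is only a sketch under stated assumptions: FakeDatabase and LazyAttachment are hypothetical classes invented for this illustration, and a real application would inspect the SQL log instead of a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation; FakeDatabase and LazyAttachment are hypothetical.
class NPlusOneSketch {

    static class FakeDatabase {
        int statementCount;

        // One simulated SELECT that returns all rows without the content column.
        List<LazyAttachment> findAll(int rows) {
            statementCount++;
            List<LazyAttachment> result = new ArrayList<>();
            for (long id = 1; id <= rows; id++) {
                result.add(new LazyAttachment(id, this));
            }
            return result;
        }

        // One simulated secondary SELECT per entity whose content is accessed.
        byte[] loadContent(long id) {
            statementCount++;
            return new byte[] {(byte) id};
        }
    }

    static class LazyAttachment {
        private final long id;
        private final FakeDatabase database;
        private byte[] content;
        private boolean contentLoaded;

        LazyAttachment(long id, FakeDatabase database) {
            this.id = id;
            this.database = database;
        }

        byte[] getContent() {
            if (!contentLoaded) {
                content = database.loadContent(id);
                contentLoaded = true;
            }
            return content;
        }
    }

    // Loads the given number of rows and touches the lazy attribute of each one.
    static int countStatements(int rows) {
        FakeDatabase database = new FakeDatabase();
        for (LazyAttachment attachment : database.findAll(rows)) {
            attachment.getContent();
        }
        return database.statementCount;
    }

    public static void main(String[] args) {
        System.out.println(countStatements(10)); // prints 11: 1 query plus 10 secondary selects
    }
}
```

If the content is needed for every row anyway, loading it lazily turns one statement into N+1, so lazy attributes pay off mainly when the large column is read for only a few of the fetched entities.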

Fetching subentities

Another approach to avoid loading table columns that are rather large is to map multiple subentities to the same database table.


Both the Attachment entity and the AttachmentSummary subentity inherit all common attributes from a BaseAttachment superclass.

@MappedSuperclass
public class BaseAttachment {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    @Enumerated
    @Column(name = "media_type")
    private MediaType mediaType;

    //Getters and setters omitted for brevity
}

The AttachmentSummary subentity extends BaseAttachment without declaring any new attribute:

@Entity @Table(name = "attachment")
public class AttachmentSummary 
    extends BaseAttachment {}

The Attachment entity inherits all the base attributes from the BaseAttachment superclass and maps the content column as well.

@Entity @Table(name = "attachment")
public class Attachment 
    extends BaseAttachment {

    @Lob
    private byte[] content;

    //Getters and setters omitted for brevity
}

When fetching the AttachmentSummary subentity:

AttachmentSummary bookSummary = entityManager.find(
    AttachmentSummary.class, bookId);

The generated SQL statement is not going to fetch the content column:

SELECT a.id as id1_0_0_, 
       a.media_type as media_ty2_0_0_, 
       a.name as name3_0_0_ 
FROM attachment a 
WHERE  a.id = 1

However, when fetching the Attachment entity:

Attachment book = entityManager.find(
    Attachment.class, bookId);

Hibernate is going to fetch all columns from the underlying database table:

SELECT a.id as id1_0_0_, 
       a.media_type as media_ty2_0_0_, 
       a.name as name3_0_0_, 
       a.content as content4_0_0_ 
FROM attachment a 
WHERE  a.id = 1


Conclusion

To lazy fetch entity attributes, you can either use bytecode enhancement or subentities. Although bytecode instrumentation lets you keep a single entity per table, subentities are more flexible and can even deliver better performance since they don’t involve an interceptor call whenever an entity attribute is read.

When it comes to reading data, subentities are very similar to DTO projections. However, unlike DTO projections, subentities can track state changes and propagate them to the database.
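The difference can be sketched with a heavily simplified model of snapshot-based change tracking. This is an illustration only, not Hibernate code: Summary and TinyPersistenceContext are hypothetical classes, and only a single name attribute is tracked.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Plain-Java illustration; Summary and TinyPersistenceContext are hypothetical.
class DirtyTrackingSketch {

    // Stand-in for a managed AttachmentSummary subentity.
    static class Summary {
        final long id;
        String name;

        Summary(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Keeps a snapshot of the loaded state for every entity it hands out.
    static class TinyPersistenceContext {
        private final Map<Summary, String> nameSnapshots = new HashMap<>();

        Summary load(long id, String nameInDatabase) {
            Summary summary = new Summary(id, nameInDatabase);
            nameSnapshots.put(summary, nameInDatabase);
            return summary;
        }

        // Returns how many UPDATE statements a flush would need,
        // then refreshes the snapshots to the current state.
        int flush() {
            int updates = 0;
            for (Map.Entry<Summary, String> entry : nameSnapshots.entrySet()) {
                String currentName = entry.getKey().name;
                if (!Objects.equals(currentName, entry.getValue())) {
                    updates++;
                    entry.setValue(currentName);
                }
            }
            return updates;
        }
    }

    public static void main(String[] args) {
        TinyPersistenceContext context = new TinyPersistenceContext();
        Summary summary = context.load(1L, "MyName");

        summary.name = "MyChangedName";
        System.out.println(context.flush()); // prints 1: the change is detected
        System.out.println(context.flush()); // prints 0: nothing changed since the last flush
    }
}
```

A DTO built from a query result is never registered with such a context, so changing its fields has no effect on the database unless the application writes the update itself.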


40 thoughts on “The best way to lazy load entity attributes using JPA and Hibernate”

  1. Hi Vlad. Mapping byte[] is not a good idea; it can lead to OutOfMemoryError: Java heap space. For instance, when you create a new Attachment with a byte[] that weighs 200 MB, a very large object is created on the heap. Can you imagine what happens if many users attach files simultaneously?
    You should use Blob instead:

    @Entity
    class Attachment{
    @Column(name = "data")
    @Lob
    private Blob data;
    }

    and

    File file = new ClassPathResource("200MB.zip").getFile();
    Attachment attachment = new Attachment();
    attachment.setData(BlobProxy.generateProxy(new FileInputStream(file), file.length()));
    em.persist(attachment);

    You can test it: just open JVisualVM to see what happens to the Java heap space in the two cases, with Blob and with byte[].

    1. Thanks for the tip. Indeed, for very large files, a byte[] would be overkill. However, if you have an imposed limit on file size (as many applications do), this should not be an issue. Blobs have their own quirks as well, especially when you retrieve them.

  2. I wonder whether a third way is possible: two entities mapped to the same table (but different columns), connected with a lazy @OneToOne and @MapsId. It seems better than your second solution because the relation between the entities is explicit.

    1. @OneToOne demands having two tables. In this example, we only have one table. Of course, you can always move the large column into a separate table, but that’s a totally different discussion. Not to mention that sometimes you cannot do that because you already inherit a legacy schema.

  3. I always thought that every @Lob attribute was LAZY by default, even without bytecode enhancement enabled [I’ve never used it before]. Normally I use DTOs and projections to avoid loading @Lob attributes.

    By the way, I liked your solution using subentities; it was clever and well designed. Indeed, I liked it more than bytecode enhancement.

  4. What about updating the data? What will happen if, by chance, you end up updating an Attachment and an AttachmentSummary with the same id in the same transaction?
    I suppose you have to be careful when using this feature: always update through the subentity that maps all the data, and avoid loading two subentities with the same id in the same transaction.

      1. Oops, sorry for the delay, I didn’t notice you had already answered.

        Let me explain better with an example: first, you load an Attachment with id = 1. Its name has a value of “MyName”. Then you change it to “MyChangedName”. Then, in the same transaction, you load an AttachmentSummary with the same id. As it is another entity and the changes haven’t been persisted, I suppose its name property value will still be “MyName”. Am I wrong?

        Then, for example, you change the AttachmentSummary’s name and media_type properties to other values. Once the transaction finishes, will both entities be persisted, one overwriting the values of the other? If you use a version property, I suppose you will get an exception.

        If I’m not wrong, those are problems you will have if you mix updates to subentities with the same id. Of course, you can avoid them with a bit of care, but you have to be aware of them.

      2. Modifying multiple entities mapped to the same table is not advisable when you use subentities. You have to be aware of these issues, of course. Although I haven’t tested it, I suppose that optimistic locking will catch these issues.

  5. Hi Vlad. What about @NamedEntityGraph or @FetchProfile for such lazy attributes? For example, I want to eagerly load lazy attributes for a getById operation and lazily load them for a getAll operation.
    It seems it doesn’t work the way it does for @ManyToOne associations.

  6. Hi, Vlad.
    How does Hibernate’s second-level cache work when two entities are mapped to one table in the same context?
    For example, I have entity1 and entity2 (let it be read-only, sharing two fields with entity1).
    Would entity2’s cache region be updated after entity1 was updated or even deleted?

    1. Hibernate is not going to do any synchronization across regions because it does not know about any overlapping. In this case, it’s probably better not to use the second-level cache at all. Anyway, the second-level cache is only justified for reducing load on the Master node. If you’re using it to provide better read throughput, then you’re doing it all wrong because you can do a much better job if you tune the DB buffers correctly and redirect read traffic to Slave nodes.

  7. Thanks Vlad! Sorry, I am new to this. Do I need to raise a bug on Hibernate and then submit the test case, or is there a certain process for this?

    1. There’s already a bug created as indicated in the answer you got on StackOverflow. You only need to create a replicating test case and attach it to the JIRA issue. It’s really simple.

  8. I can’t get the bytecode enhancement solution to work on my project with Hibernate 4.3.5 and PostgreSQL 9.3. It still loads the Blob, so I guess I will use the subentities solution.

  9. Hi Vlad,

    Thanks for this good article. By using the bytecode enhancement plugin with Hibernate 5.2.10.Final, I can load a byte[] field lazily to improve application performance. However, I found an issue once the enhancement plugin is in place.

    The following is my Organization entity, which has a primaryUser field as a foreign key pointing to a User object.

    @Entity
    @Table(schema = "ormenhance")
    public class Organization {

        @SequenceGenerator(name = "org_seq_generator", sequenceName = "org_seq")
        @Id
        @GeneratedValue(strategy = GenerationType.AUTO,
                generator = "org_seq_generator")
        @Access(AccessType.PROPERTY)
        private Integer id;

        @Column
        private String name;

        @OneToOne(fetch = FetchType.LAZY)
        @JoinColumn(name = "primary_user_id", referencedColumnName = "id")
        private User primaryUser;

        @Basic(fetch = FetchType.LAZY)
        @Column(length = 102400)
        private byte[] logo;

        @Column
        private boolean enabled;

        @OneToMany(mappedBy = "organization", fetch = FetchType.LAZY)
        private Set<User> users;

        .....
    }

    Here is the User entity:

    @Entity
    @Table(name = "\"user\"", schema = "ormenhance")
    public class User {

        @SequenceGenerator(name = "user_seq_generator", sequenceName = "user_seq")
        @Id
        @GeneratedValue(strategy = GenerationType.AUTO,
                generator = "user_seq_generator")
        private Integer id;

        @Column(columnDefinition = "bpchar")
        private String language;

        @Column(name = "first_name")
        private String firstName;

        @Column(name = "last_name")
        private String lastName;

        @ManyToOne
        @JoinColumn(name = "organization_id")
        private Organization organization;

        @Column
        private boolean enabled;

        ....
    }

    Scenario 1: Disable the bytecode enhancement plugin and fetch the Organization entity. Hibernate issues the single SQL statement below:
    select organizati0_.id as id1_0_0_, organizati0_.enabled as enabled2_0_0_, organizati0_.logo as logo3_0_0_, organizati0_.name as name4_0_0_, organizati0_.primary_user_id as primary_5_0_0_ from ormenhance.organization organizati0_ where organizati0_.id=?

    Since logo is retrieved as well, I plan to use the bytecode enhancement plugin to fetch it on demand and improve performance.

    Scenario 2: Enable the bytecode enhancement plugin and fetch the Organization entity. Hibernate issues the two SQL statements below:
    1) select organizati0_.id as id1_0_0_, organizati0_.enabled as enabled2_0_0_, organizati0_.name as name4_0_0_, organizati0_.primary_user_id as primary_5_0_0_ from ormenhance.organization organizati0_ where organizati0_.id=?
    2) select user0_.id as id1_1_0_, user0_.enabled as enabled2_1_0_, user0_.first_name as first_na3_1_0_, user0_.language as language4_1_0_, user0_.last_name as last_nam5_1_0_, user0_.organization_id as organiza6_1_0_, organizati1_.id as id1_0_1_, organizati1_.enabled as enabled2_0_1_, organizati1_.name as name4_0_1_, organizati1_.primary_user_id as primary_5_0_1_ from ormenhance."user" user0_ left outer join ormenhance.organization organizati1_ on user0_.organization_id=organizati1_.id where user0_.id=?

    The first SQL statement is as expected (no logo column any more). However, why is the second SQL statement issued to fetch the User object for the primaryUser field? This is not the desired result and will reduce performance.

    Could you please help me fetch the primaryUser field lazily when the bytecode enhancement plugin is enabled?

    Thanks for your time!

      1. Thanks Vlad for your quick response!
        As per your suggestion, I added FetchType.LAZY to the @ManyToOne association in the User entity:

        @ManyToOne(fetch = FetchType.LAZY)
        @JoinColumn(name = "organization_id")
        private Organization organization;
        

        However, Hibernate still issues two SQL statements to resolve the Organization and User objects. The log is below:

        2017-06-13 14:05:00 DEBUG TransactionImpl:55 – begin
        2017-06-13 14:05:00 DEBUG SQL:92 – select organizati0_.id as id1_0_0_, organizati0_.enabled as enabled2_0_0_, organizati0_.name as name4_0_0_, organizati0_.primary_user_id as primary_5_0_0_ from ormenhance.Organization organizati0_ where organizati0_.id=?
        Hibernate: select organizati0_.id as id1_0_0_, organizati0_.enabled as enabled2_0_0_, organizati0_.name as name4_0_0_, organizati0_.primary_user_id as primary_5_0_0_ from ormenhance.Organization organizati0_ where organizati0_.id=?
        2017-06-13 14:05:00 DEBUG ResultSetProcessorImpl:120 – Starting ResultSet row #0
        2017-06-13 14:05:00 DEBUG EntityReferenceInitializerImpl:126 – On call to EntityIdentifierReaderImpl#resolve, EntityKey was already known; should only happen on root returns with an optional identifier specified
        2017-06-13 14:05:00 DEBUG TwoPhaseLoad:141 – Resolving associations for [org.hibernate.bugs.Organization#1]
        2017-06-13 14:05:00 DEBUG SQL:92 – select user0_.id as id1_1_0_, user0_.enabled as enabled2_1_0_, user0_.first_name as first_na3_1_0_, user0_.language as language4_1_0_, user0_.last_name as last_nam5_1_0_, user0_.organization_id as organiza6_1_0_ from ormenhance."user" user0_ where user0_.id=?
        Hibernate: select user0_.id as id1_1_0_, user0_.enabled as enabled2_1_0_, user0_.first_name as first_na3_1_0_, user0_.language as language4_1_0_, user0_.last_name as last_nam5_1_0_, user0_.organization_id as organiza6_1_0_ from ormenhance."user" user0_ where user0_.id=?
        2017-06-13 14:05:00 DEBUG ResultSetProcessorImpl:120 – Starting ResultSet row #0
        2017-06-13 14:05:00 DEBUG EntityReferenceInitializerImpl:126 – On call to EntityIdentifierReaderImpl#resolve, EntityKey was already known; should only happen on root returns with an optional identifier specified
        2017-06-13 14:05:00 DEBUG TwoPhaseLoad:141 – Resolving associations for [org.hibernate.bugs.User#1]
        2017-06-13 14:05:00 DEBUG TwoPhaseLoad:281 – Done materializing entity [org.hibernate.bugs.User#1]
        2017-06-13 14:05:00 DEBUG ResourceRegistryStandardImpl:73 – HHH000387: ResultSet’s statement was not registered
        2017-06-13 14:05:00 DEBUG AbstractLoadPlanBasedEntityLoader:189 – Done entity load : org.hibernate.bugs.User#1
        2017-06-13 14:05:00 DEBUG TwoPhaseLoad:281 – Done materializing entity [org.hibernate.bugs.Organization#1]
        2017-06-13 14:05:00 DEBUG ResourceRegistryStandardImpl:73 – HHH000387: ResultSet’s statement was not registered
        2017-06-13 14:05:00 DEBUG AbstractLoadPlanBasedEntityLoader:189 – Done entity load : org.hibernate.bugs.Organization#1
        Organization name: ROOTORG

        Could you please have a look?

        Many thanks!
