How to customize an entity association JOIN ON clause with Hibernate @JoinFormula

Last modified:

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

As I explained in this previous article, you can map calculated properties using Hibernate @Formula, and the value is generated at query time.

In this post, you’ll see how you can use a custom SQL fragment to customize the JOIN relationship between two entities, or an entity and a collection of embeddable types.

Domain Model

Assuming we have the following entities:

The Language entity is used so that we know how to render the application UI. The Country entity is used so that we can load various country-specific settings, like the associated vatRate.

However, the Account table does not contain any FOREIGN KEY to the Language or Country tables. Instead, the Account entity only defines a Locale attribute, and the Account table looks as follows:

CREATE TABLE Account (
    id int8 NOT NULL ,
    credit float8 ,
    locale VARCHAR(255) ,
    rate float8 ,
    PRIMARY KEY (id)
)

While having two FOREIGN KEY columns: country_id and locale_id would be better, we’ll assume a legacy database schema that cannot be easily modified. For this reason, we need to use the
locale column since it stores info about both language and country. What we need to do is to parse it and extract the country and language codes which can then be used to join the associated Country and Language tables.

While JPA doesn’t offer any support for such mapping, Hibernate has long been offering the @JoinFormula annotation.

Therefore, the Account mapping becomes:

@Entity(name = "Account")
public class Account {

    @Id
    private Long id;

    private Double credit;

    private Double rate;

    private Locale locale;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinFormula(
        "REGEXP_REPLACE(locale, '\\w+_(\\w+)[_]?', 
        '\\1')" 
    )
    private Country country;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinFormula(
        "REGEXP_REPLACE(locale, '(\\w+)_.*', 
        '\\1')"
    )
    private Language language;

    //Getters and setters omitted for brevity
}

The @ManyToOne associations are using FetchType.LAZY because EAGER fetching is a code smell.

The @JoinFormula uses the awesome REGEXP_REPLACE function which is supported by PostgreSQL, Oracle or MariaDB.

For the following tests, we are going to use PostgreSQL.

Testing time

Assuming we have the following entities:

Country _US = new Country();
_US.setId( "US" );
_US.setName( "United States" );
_US.setVatRate(0.1);

Country _UK = new Country();
_UK.setId( "UK" );
_UK.setName( "United Kingdom" );
_UK.setVatRate(0.2);

Country _Spain = new Country();
_Spain.setId( "ES" );
_Spain.setName( "Spain" );
_Spain.setVatRate(0.21);

Country _Mexico = new Country();
_Mexico.setId( "MX" );
_Mexico.setName( "Mexico" );
_Mexico.setVatRate(0.16);

Language _English = new Language();
_English.setId( "en" );
_English.setName( "English" );

Language _Spanish = new Language();
_Spanish.setId( "es" );
_Spanish.setName( "Spanish" );

doInJPA( entityManager -> {
    entityManager.persist( _US );
    entityManager.persist( _UK );
    entityManager.persist( _Spain );
    entityManager.persist( _Mexico );
    entityManager.persist( _English );
    entityManager.persist( _Spanish );
} );

doInJPA( entityManager -> {
    Account account1 = new Account( );
    account1.setId( 1L );
    account1.setCredit( 5000d );
    account1.setRate( 1.25 / 100 );
    account1.setLocale( Locale.US );
    entityManager.persist( account1 );

    Account account2 = new Account( );
    account2.setId( 2L );
    account2.setCredit( 200d );
    account2.setRate( 1.25 / 100 );
    account2.setLocale( new Locale( "es", "MX" ) );
    entityManager.persist( account2 );
} );

Lazy fetching

When fetching the Account entity, Hibernate managed to fetch the associated Country and Language entities:

doInJPA( entityManager -> {
    LOGGER.info("Fetch first Account");
    Account account1 = entityManager.find( 
        Account.class, 1L 
    );
    assertEquals( _English, account1.getLanguage());
    assertEquals( _US, account1.getCountry());

    LOGGER.info("Fetch second Account");
    Account account2 = entityManager.find( 
        Account.class, 2L 
    );
    assertEquals( _Spanish, account2.getLanguage());
    assertEquals( _Mexico, account2.getCountry());
} );

Behind the scenes, Hibernate executes the following SQL queries:

-- Fetch first Account

SELECT a.id AS id1_0_0_,
       a.credit AS credit2_0_0_,
       a.locale AS locale3_0_0_,
       a.rate AS rate4_0_0_,
       REGEXP_REPLACE(
           a.locale, 
           'w+_(w+)[_]?', 
           '\1'
       ) AS formula2_0_,
       REGEXP_REPLACE(
           a.locale, 
           '(w+)_.*', 
           '\1'
       ) AS formula3_0_
FROM   Account a
WHERE  a.id = 1

SELECT l.id AS id1_2_0_,
       l.name AS name2_2_0_
FROM   Language l
WHERE  l.id = 'en'

SELECT c.id AS id1_1_0_,
       c.name AS name2_1_0_,
       c.vatRate AS vatRate3_1_0_
FROM   Country c
WHERE  c.id = 'US'

-- Fetch second Account

SELECT a.id AS id1_0_0_,
       a.credit AS credit2_0_0_,
       a.locale AS locale3_0_0_,
       a.rate AS rate4_0_0_,
       REGEXP_REPLACE(
           a.locale, 
           'w+_(w+)[_]?', 
           '\1'
       ) AS formula2_0_,
       REGEXP_REPLACE(
           a.locale, 
           '(w+)_.*', 
           '\1'
       ) AS formula3_0_
FROM   Account a
WHERE  a.id = 2

SELECT l.id AS id1_2_0_,
       l.name AS name2_2_0_
FROM   Language l
WHERE  l.id = 'es'

SELECT c.id AS id1_1_0_,
       c.name AS name2_1_0_,
       c.vatRate AS vatRate3_1_0_
FROM   Country c
WHERE  c.id = 'MX'

Eager fetching

When JOIN FETCH-ing the language and the countryassociations:

Account account1 = entityManager.createQuery(
    "select a " +
    "from Account a " +
    "join a.language l " +
    "join a.country c " +
    "where a.id = :accountId", Account.class )
.setParameter("accountId", 1L)
.getSingleResult();

assertEquals( _English, account1.getLanguage());
assertEquals( _US, account1.getCountry());

Hibernate executes the following SQL query:

SELECT a.id                                            AS id1_0_, 
       a.credit                                        AS credit2_0_, 
       a.locale                                        AS locale3_0_, 
       a.rate                                          AS rate4_0_, 
       REGEXP_REPLACE(a.locale, '\w+_(\w+)[_]?', '\1') AS formula2_, 
       REGEXP_REPLACE(a.locale, '(\w+)_.*', '\1')      AS formula3_ 
FROM   Account a 
INNER JOIN 
       Language l 
ON REGEXP_REPLACE(a.locale, '(\w+)_.*', '\1') = l.id 
INNER JOIN 
       Country c 
ON REGEXP_REPLACE(a.locale, '\w+_(\w+)[_]?', '\1') = c.id 
WHERE  a.id = 1

Make sure that you use a function index for the @JoinFormula expressions to speed up the JOIN query execution.

Otherwise, the JOIN ON clause will require a full-table scan, hence it will be slow.

Therefore, we need to add the following two indexes to our database migration scripts:

CREATE INDEX account_language_idx 
ON Account (REGEXP_REPLACE(locale, '(\w+)_.*', '\1'));

CREATE INDEX account_country_idx 
ON Account (REGEXP_REPLACE(locale, '\w+_(\w+)[_]?', '\1'));

This way, when we EXPLAIN ANALYZE the previous SQL query which joins the Language and Country tables with the Account one,
PostgreSQL generates the following execution plan:

QUERY PLAN
Nested Loop (cost=0.43..24.51 rows=1 width=540) (actual time=0.094..0.095 rows=1 loops=1)
-> Nested Loop (cost=0.29..16.34 rows=1 width=540) (actual time=0.064..0.065 rows=1 loops=1)
-> Index Scan using account_pkey on account a (cost=0.14..8.16 rows=1 width=540) (actual time=0.015..0.015 rows=1 loops=1)
Index Cond: (id = 1)
-> Index Only Scan using language_pkey on language l (cost=0.14..8.16 rows=1 width=516) (actual time=0.012..0.012 rows=1 loops=1)
Index Cond: (id = regexp_replace((a.locale)::text, ‘(\w+)_.*’::text, ‘\1’::text))
Heap Fetches: 1
-> Index Only Scan using country_pkey on country c (cost=0.14..8.16 rows=1 width=516) (actual time=0.005..0.005 rows=1 loops=1)
-> Index Cond: (id = regexp_replace((a.locale)::text, ‘\w+_(\w+)[_]?’::text, ‘\1’::text))
-> Heap Fetches: 1
Planning time: 0.229 ms
Execution time: 0.171 ms

As illustrated by the above execution plan, PostgreSQL can benefit from these two function indexes as the execution plan uses an Index Only Scan for each JOIN relationship.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

While most of the time the @JoinColumn annotation is the best way to express an entity association, if the database schema doesn’t have a FOREIGN KEY column you could use, then @JoinFormula becomes very useful.

Code available on GitHub.

Follow @vlad_mihalcea