The best way to map an Enum Type with JPA and Hibernate

Introduction

Recently, I got a very good question on Twitter, and I decided that the best way to answer it is with a new article.

In this article, we are going to see how various ways to map an Enum using JPA and Hibernate.

Domain Model

Considering we have the following entity:

The Post entity has a status attribute which is an Enum, represented by the PostStatus type. In JPA, Enum types must be marked with the @Enumerated annotation, which can take an EnumType defining if the Java Enumeration is mapped to either a String or an Integer column type.

Mapping Enum to a String column type

To map the Enum to a String database column type, you need to specify the EnumType.STRING value when using the @Enumerated annotation.

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    @Enumerated(EnumType.STRING)
    @Column(length = 8)
    private PostStatus status;

    //Getters and setters omitted for brevity
}

Assuming we are using PostgreSQL, on the database side, it is expected to have the following table:

CREATE TABLE post (
    id INT8 NOT NULL, 
    title VARCHAR(255), 
    status VARCHAR(8), 
    PRIMARY KEY (id)
)

Now, when inserting a Post entity:

Post post = new Post();
post.setId( 1L );
post.setTitle( "High-Performance Java Persistence" );
post.setStatus( PostStatus.PENDING );

entityManager.persist( post );

Hibernate generates the following SQL INSERT statement:

INSERT INTO post (
    status, 
    title, 
    id
) 
VALUES (
    'PENDING', 
    'High-Performance Java Persistence', 
    1
)

As expected, the String representation of the Java Enum was used to populate the associated database column value.

However, as readable as this might be for the developer, this column takes way more space than necessary. In this case, the status column takes 8 bytes. If we were to store 100 million post records, the status column alone will occupy 800 MB.

Mapping Enum to an Integer column type

As previously explained, you can also use an Integer column type. The Post entity mapping becomes as follows:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    @Enumerated
    @Column(columnDefinition = "smallint")
    private PostStatus status;

    //Getters and setters omitted for brevity
}

Notice that the @Enumerated column does not need to take the ORDINAL EnumType value since that’s used by default. We are also using the smallint integer column type since it is improbable that we will ever need more than 2 bytes to store all values for the Post Enum type.

As expected, on the database side, the following table will be used:

CREATE TABLE post (
    id INT8 NOT NULL, 
    title VARCHAR(255), 
    status SMALLINT, 
    PRIMARY KEY (id)
)

Now, when inserting the same Post entity, Hibernate generates the following SQL INSERT statement:

INSERT INTO post (
    status, 
    title, 
    id
) 
VALUES (
    0, 
    'High-Performance Java Persistence', 
    1
)

The Enum ordinal value, which is 0 for PostStatus.PENDING, is used to represent the Java Enum type.

Now, this is much more efficient, but it less expressive. So, how can we have both performance and readability?

Actually, it’s pretty simple. Just create a post_status_info as follows:


CREATE TABLE post_status_info (
    id SMALLINT NOT NULL, 
    description VARCHAR(255), 
    name VARCHAR(255), 
    PRIMARY KEY (id)
)

Now, you can populate to describe all PostStatu Enum values:

INSERT INTO post_status_info (
    description, 
    name, 
    id
) 
VALUES (
    'Posts waiting to be approved by the admin', 
    'PENDING', 
    0
)

INSERT INTO post_status_info (
    description, 
    name, 
    id
) 
VALUES (
    'Posts approved by the admin', 
    'APPROVED', 
    1
)

INSERT INTO post_status_info (
    description, 
    name, 
    id
) 
VALUES (
    'Posts rejected as spam', 
    'SPAM', 
    2
)

Now you can get the description by joining the post_status_info table, as illustrated by the following example:

Tuple tuple = (Tuple) entityManager
.createNativeQuery(
        "SELECT " +
        "    p.id, " +
        "    p.title, " +
        "    p.status, " +
        "    psi.name, " +
        "    psi.description " +
        "FROM post p " +
        "INNER JOIN post_status_info psi ON p.status = psi.id " +
        "WHERE p.id = :postId", Tuple.class
)
.setParameter( "postId", 1L )
.getSingleResult();

assertEquals(
    "PENDING", 
    tuple.get( "name" )
);

assertEquals(
    "Posts waiting to be approved by the admin", 
    tuple.get( "description" )
);

Cool, right?

Mapping a Java Enum to a database-specific Enumarated column type

Since we have been using PostgreSQL, let’s use the dedicated Enumared column type.

First, we need to create a new PostgreSQL enum type to mirror the Java PostStatus enumeration:

CREATE TYPE post_status_info AS ENUM (
    'PENDING', 
    'APPROVED', 
    'SPAM'
)

Now, to map this custom PostgreSQL type we can no longer use the default Hibernate-specific org.hibernate.type.EnumType because PostgreSQL expects an Object type, not a VARCHAR or an INT.

Luckily, we can easily create a custom Type by extending org.hibernate.type.EnumType:

public class PostgreSQLEnumType extends org.hibernate.type.EnumType {
    
    public void nullSafeSet(
            PreparedStatement st, 
            Object value, 
            int index, 
            SharedSessionContractImplementor session) 
        throws HibernateException, SQLException {
        if(value == null) {
            st.setNull( index, Types.OTHER );
        }
        else {
            st.setObject( 
                index, 
                value.toString(), 
                Types.OTHER 
            );
        }
    }
}

That’s it. Now, we can use the PostgreSQLEnumType as follows:

@Entity(name = "Post")
@Table(name = "post")
@TypeDef(
    name = "pgsql_enum",
    typeClass = PostgreSQLEnumType.class
)
public static class Post {

    @Id
    private Long id;

    private String title;

    @Enumerated(EnumType.STRING)
    @Column(columnDefinition = "post_status_info")
    @Type( type = "pgsql_enum" )
    private PostStatus status;

    //Getters and setters omitted for brevity
}

When inserting the Post entity, our JDBC statement interceptor will see the enum as a String, although in the database a dedicated Enum type is going to be used:

INSERT INTO post (
    status, 
    title, 
    id
) 
VALUES (
    'PENDING', 
    'High-Performance Java Persistence', 
    1
)

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

As I explained in my book, the choice of database types can have a significant impact on application performance, and Enum types are no different.

Although you might be tempted to use the STRING EnumType, in reality, this is the least efficient option. So, if you have hundreds of millions of entries, it’s better to use either the ORDINAL EnumType or the database-specific type. PostgreSQL uses 4 bytes for an Enumerated type, therefore the ORDINAL which can use a SMALLINT is the most compact option. Although the MySQL Enum takes either 1 or 2 bytes, the MySQL Enum type has many downsides.

However, there are downsides to using ORDINAL as well. In case you need to reorder enum values or add/remove/rename entries, the STRING or the database-specific type will require a minimum amount of effort. On the other hand, removing the first element from ann Enum will require to shift all entries in case you are using ORDINAL.

Therefore, it’s all about trade-offs, so choose wisely.

Enter your email address to follow this blog and receive notifications of new posts by email.

Advertisements

19 thoughts on “The best way to map an Enum Type with JPA and Hibernate

  1. Every time i see someone use ORDINAL instead of String and then i have to do a query on the DB and search dont know where to find what this numbers mean i have to be held back to not cut someones head off. I hate this ordinals. Luckily we dont have to count every byte and can be developer-friendly with Enumarated(EnumType.STRING)

      1. Our clients use SQL Server so i dont have the latter options you mentioned. When i would switch from String Enums to creating additional unmapped table, have to maintain them and resorting to native queries – that would make my colleagues kill me. I dont see anyone being happy with this solution.

      2. If you have more than 100 million rows in a table, then you’d probably want to use the ORDINAL EnumType. The problem you mentioned is due to renaming entries or reordering them. If you don’t have this requirement and you always append new values to the end of the Enum, you should be fine. I don’t see why would that be a terrible problem.

  2. Great article, Vlad.

    I have to be honest, I prefer and recommend to use EnumType.STRING for most of cases, because we have a much better expressiveness when quering tables (the business is right in front of developers, it’s clear and there’s no noise). Of course, if we’re talking about a table where there’s a chance to get bigger then ORDINAL sounds to be the best choice.

      1. Even if you have 100k, or even 1 million rows, it makes no difference.
        You are right though in your article, at 100 million it starts to make a difference.

  3. IMO, using string value is better choice despite being less efficient as migration will be difficult in case of refactoring of enum type (adding/removing/re-ordering) if we use ordinal value.

    But yes, db specific enum type is the best solution.

  4. Hi Vlad,
    Thank you for next great article.

    BTW, I have a little question.
    In my actual project we have a little problem with multi-language enum.
    We have a lot of enum values that we show on FrontEnd as translated message (translations are in .properties files).
    If user sorting data by this column in other than standard application language – data sometimes sort wrong.
    For example – we have enum “Language” with values: “ENGLISH”, “CYRILLIC”.
    For end user this enum values show as “English” and “Cyrillic” when application language = English and as “Angielski”, “Cyrylica” when application language = Polish.

    Is the only solution for this problem is to create a separate table with translations for the enum?

    1. Use the Enum as a code: either ORDINAL, or STRING (the code of the message) or a DB-specific Enum. Taking the code and the current Locale, you can display it in the UI. But, the user will always submit teh code back to the server and all checks and comparisons are going against the Enum code.

  5. Now, for the ultimate JPA ENUM challenge, how do you declare and manage an array of ENUM’s (other than resorting to a collection).

    I am developing a Samurai Sudoku game, and I need to determine the value for a given cell, in a matrix that is 9×9, with the range of legal possible values being 1-9. Each cell has an array of 10 ENUM’s to capture what is currently known about the possibility that the cell is any given digit, ex: UNKNOWN, POSSIBLE, IMPOSSIBLE, KNOWN.

    ex: PossibleValueState cellPossibleValues[] = PossibleValueState[10].

    The ordinal position in the array defines the digit. i.e. if it is known that the value of the cell cannot be 4, then the value of cellPossibleValues[4] == PossibleValueState.IMPOSSIBLE etc.

    Details available at: https://stackoverflow.com/questions/42238635/how-to-store-an-array-of-enum-in-hibernate.

      1. Now, I just need to learn enough about Maven plugin’s to decipher the comments in your readme file about how to get this to run in Eclipse :-). Or pull down IntelliJ and learn how to compile and run the code in a new IDE. …

        I noticed that your new test used PostgreSQL95, and the new Enum case on line 122, used an Integer array. I am running with HSQLDB, and using String Enums. Will either of these be a problem, once I got the IDE issues squared away?

      2. You just need to import the project from Maven. It’s even easier with String enums. But you need to adapt the test for that. I will also offer a test case for this use case and release it as a new version. But that will happen in 2 weeks, if I have time.

      3. I’ve tried importing the project into Eclipse several different ways, a couple of which bring the code in, but not in a usable form. I went through your book, hoping for some clues. You get into some wonderfully exotic stuff, but I didn’t find anything on getting the sample code to install or run.

      4. I stopped using Eclipse 12 years ago and never looked back. It has so many issues, among which you can’t even get this project imported from Maven. It works like a charm on IintelliJ. For Eclipse, check the sections in the README.

        One guy from a previous training wrote those sections, so they might help you.

  6. Thanks for this article.

    I did notice, however, that the enum ordinal starts at 0, while the post_status_info table starts at 1, so won’t match with the join.

    I guess, however, it isn’t a big deal to do an INSERT from 0, however, as it’s not auto increment or anything.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s