The best way to implement an audit log using Hibernate Envers

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to learn what is the best way to implement an audit log to track INSERT, UPDATE, and DELETE statements using Hibernate Envers.

As previously explained, CDC (Change Data Capture) is an essential step to extract change events from an OLTP application to make them available to other modules in an enterprise system (e.g. caches, data warehouse).

While Debezium is the most efficient way of doing CDC, it might be that you need a simpler solution in your project. Hibernate Envers is a Hibernate ORM extension that allows you to capture change events in no time.

In this article, we are going to see the most efficient way of using Hibernate Envers.

Hibernate Envers Maven dependency

Because Hibernate Envers is packaged as a separate dependency, if you want to use it, you need to declare the following Maven dependency:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-envers</artifactId>
    <version>${hibernate.version}</version>
</dependency>

Hibernate Envers @Audited annotation

Now, after adding the hibernate-envers dependency, you need to instruct Hibernate which entities should be audited, and this can be done via the @Audited entity-level annotation.

@Entity(name = "Post")
@Table(name = "post")
@Audited
public class Post {

    @Id
    private Long id;

    private String title;

    //Getters and setters omitted for brevity

    @Override
    public String toString() {
        return "Post{" +
                "id=" + id +
                ", title='" + title + '\'' +
                '}';
    }
}

Hibernate Envers default audit logging strategy

Once you add the @Audited annotation to your entity and generate the database schema using the hbm2ddl tool, the following audit tables are going to be created:

CREATE TABLE post_AUD
(
    id      BIGINT NOT NULL,
    REV     INTEGER NOT NULL,
    REVTYPE TINYINT,
    title   VARCHAR(255),
    PRIMARY KEY ( id, REV )
)
    
CREATE TABLE revinfo 
( 
    rev      INTEGER GENERATED BY DEFAULT 
             AS IDENTITY ( START WITH 1 ), 
    revtstmp BIGINT, 
    PRIMARY KEY ( rev ) 
)   

ALTER TABLE post_aud 
ADD CONSTRAINT 
    fkllaf9w93qaiooguo8mfvwtwbg 
FOREIGN KEY ( REV ) REFERENCES revinfo 

Using hbm2ddl is not recommended for production environments so use it just for prototyping your database schema. Therefore, you should prefer using an automatic schema migration tool like Flyway.

Nevertheless, Hibernate expects to find the aforementioned audit tables in your database, otherwise, Envers will fail. So, make sure you add them to your migration scripts along with the actual entity tables.

The revinfo table stores the revision number and its epoch timestamp while the post_AUD table stores the entity snapshot at a particular revision.

In Envers terminology, a revision stands for a database transaction that has either inserted, updated, or deleted the audited entity. Each audited entity has a mirrored table that, by default, ends with the AUD suffix and which stores the state of the entity at the end of a particular revision.

The REVTYPE column stores the ordinal of the RevisionType Enum which encapsulates one of the following entity state modifications:

  • ADD – an INSERT SQL statement has created the entity in question
  • MOD – an UPDATE SQL statement has modified our entity
  • DEL – a DELETE SQL statement has removed the audited entity from the database table

Persisting an entity

When creating a Post entity and persisting into the currently running Persistence Context:

Post post = new Post();
post.setId( 1L );
post.setTitle( 
    "High-Performance Java Persistence 1st edition" 
);
entityManager.persist( post );

Hibernate generates the following three SQL INSERT statements:

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence 1st edition', 1)
    
INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503062974131)

INSERT INTO post_AUD (REVTYPE, title, id, REV) 
VALUES (0, 'High-Performance Java Persistence 1st edition', 1, 1)
  • The first statement creates the Post entity and is executed by Hibernate ORM.
  • The second statement is executed by Envers to create a new revision.
  • The third statement is also executed by Envers to capture the state of the Post entity at this revision.

Notice the REVTYPE column value is 0, matching the RevisionType.ADD enumeration value.

Updating an entity

When updating the previously created Post entity:

Post post = entityManager.find( Post.class, 1L );

post.setTitle( 
    "High-Performance Java Persistence 2nd edition" 
);

Hibernate generates the following SQL statements:

SELECT p.id as id1_0_0_, p.title as title2_0_0_ 
FROM   post p 
WHERE  p.id = 1

UPDATE post 
SET    title = 'High-Performance Java Persistence 2nd edition' 
WHERE  id = 1

INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503064508185)

INSERT INTO post_AUD (REVTYPE, title, id, REV) 
VALUES (1, 'High-Performance Java Persistence 2nd edition', 1, 2)
  • The first two statements are generated by Hibernate ORM to load and modify the entity.
  • The second statements are generated by Envers to capture the new revision and the new state of the Post entity.

Notice the REVTYPE column value is 1, matching the RevisionType.MOD enumeration value.

Removing an entity

When removing the Post entity:

entityManager.remove(
    entityManager.getReference( Post.class, 1L )
);

Hibernate generates the following SQL statements:

DELETE FROM post 
WHERE id = 1

INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503065294147)

INSERT INTO post_AUD (REVTYPE, title, id, REV) 
VALUES (2, NULL(VARCHAR), 1, 3)

After the entity is removed from the database table by Hibernate ORM, Envers inserts a new revision and all the properties, except for the entity identifier, are going to be set to null.

Notice the REVTYPE column value is 2, matching the RevisionType.DEL enumeration value.

Querying the entity snapshots

Hibernate Envers offers the AuditReaderFactory which takes a JPA EntityManager or a Hibernate Session instance and generates an AuditReader object.

The AuditReader can be used to query the audit log as demonstrated by the following example which extracts all the Post entity state snapshots:

List<Post> posts = AuditReaderFactory.get( entityManager )
.createQuery()
.forRevisionsOfEntity( Post.class, true, true )
.add( AuditEntity.id().eq( 1L ) )
.getResultList();

assertEquals( 3, posts.size() );

for ( int i = 0; i < posts.size(); i++ ) {
    LOGGER.info( 
        "Revision {} of Post entity: {}", 
        i + 1, 
        posts.get( i ) 
    );
}

When running the test case above, Hibernate generates the following output:

SELECT   p.id as id1_1_, p.REV as REV2_1_, 
         p.REVTYPE as REVTYPE3_1_, p.title as title4_1_ 
FROM     post_AUD p 
WHERE    p.id = 1 
ORDER BY p.REV ASC

-- Revision 1 of Post entity: Post{id=1, title='High-Performance Java Persistence 1st edition'}
-- Revision 2 of Post entity: Post{id=1, title='High-Performance Java Persistence 2nd edition'}
-- Revision 3 of Post entity: Post{id=1, title='null'}

So far, so good!

However, assuming that you have previously extracted the Post entity revisions:

List<Number> revisions = doInJPA( entityManager -> {
    return AuditReaderFactory.get( entityManager ).getRevisions(
        Post.class, 1L
    );
} );

If you want to load the Post entity snapshot as of a given revision:

Post post = (Post) AuditReaderFactory.get( entityManager )
.createQuery()
.forEntitiesAtRevision( Post.class, revisions.get( 0 ) )
.getSingleResult();

assertEquals(
    "High-Performance Java Persistence 1st edition", 
    post.getTitle()
);

Hibernate Envers is going to generate an SQL query like this one:

SELECT p1.id AS id1_1_,
       p1.REV AS REV2_1_,
       p1.REVTYPE AS REVTYPE3_1_,
       p1.title AS title4_1_
FROM   post_AUD p1
WHERE   
( 
    p1.REV IN 
    ( 
        SELECT MAX(p2.REV)
        FROM   post_AUD p2
        WHERE  p2.REV <= 1
               AND p1.id = p2.id 
    ) 
)
AND p1.REVTYPE <> 2

Well, that’s definitely not very efficient!

Queries like the one above are generated by the DefaultAuditStrategy and, for a very large audit log, they don’t perform that well.

Luckily, Envers offers the ValidityAuditStrategy to overcome this issue.

Hibernate Envers ValidityAuditStrategy to the rescue

To switch from the DefaultAuditStrategy to ValidityAuditStrategy, you have the provide the following Hibernate configuration property:

<property name="org.hibernate.envers.audit_strategy"
          value="org.hibernate.envers.strategy.internal.ValidityAuditStrategy"
/>

Now, when generating the database schema using the hbm2ddl tool, the following audit tables are going to be created instead:

CREATE TABLE post_AUD
(
    id BIGINT NOT NULL ,
    REV INTEGER NOT NULL ,
    REVTYPE TINYINT ,
    REVEND INTEGER ,
    title VARCHAR(255) ,
    PRIMARY KEY ( id, REV )
)

CREATE TABLE revinfo 
( 
    rev      INTEGER GENERATED BY DEFAULT 
             AS IDENTITY ( START WITH 1 ), 
    revtstmp BIGINT, 
    PRIMARY KEY ( rev ) 
)  
    
ALTER TABLE post_AUD ADD CONSTRAINT 
    FKllaf9w93qaiooguo8mfvwtwbg 
FOREIGN KEY ( REV ) REFERENCES revinfo

ALTER TABLE post_AUD ADD CONSTRAINT 
    FKmo46u9kx2pmomhkxbmctlbwmg 
FOREIGN KEY ( REVEND ) REFERENCES revinfo

The revinfo table is identical with the DefaultAuditStrategy, but the post_AUD features a new REVEND column which references the revinfo table and marks the last revision for which this entity snapshot was still valid.

Persisting an entity

When persisting the same Post entity, Hibernate generates the following SQL statements:

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence 1st edition', 1)
    
INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503116813359)

INSERT INTO post_AUD (REVTYPE, REVEND, title, id, REV)
VALUES (0, NULL(INTEGER), 'High-Performance Java Persistence 1st edition', 1, 1)

The REVEND column value is NULL, so this audit log record is, therefore, valid.

Updating an entity

When updating the Post entity, Hibernate executes the following SQL statements:

SELECT p.id as id1_0_0_, p.title as title2_0_0_ 
FROM   post p 
WHERE  p.id = 1

UPDATE post 
SET    title = 'High-Performance Java Persistence 2nd edition' 
WHERE  id = 1

INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503117067335)

INSERT INTO post_AUD (REVTYPE, REVEND, title, id, REV)
VALUES (1, NULL(INTEGER), 'High-Performance Java Persistence 2nd edition', 1, 2)

UPDATE post_AUD 
SET    REVEND = 2 
WHERE  id = 1 
       AND REV <> 2 
       AND REVEND IS NULL

Not only that a new Post entity audit log entry is added, but the previous one is updated to mark that it’s no longer valid.

Compared to the DefaultAuditStrategy, the ValidityAuditStrategy does more work while storing the audit logs because it needs to update the end of revision.

However, just like with a database index, this extra write-time overhead is insignificant compared to the read-time response time improvement.

Removing an entity

When deleting the Post entity, Hibernate generates the following SQL statements:

DELETE FROM post 
WHERE id = 1

INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503117987724)

INSERT INTO post_AUD (REVTYPE, REVEND, title, id, REV)
VALUES (2, NULL(INTEGER), NULL(VARCHAR), 1, 3)

UPDATE post_AUD 
SET    REVEND = 3 
WHERE  id = 1 
       AND REV <> 3 
       AND REVEND IS NULL

Just like with the Post entity update, the entity removal will also mark the end of revision for the previously stored audit log record.

Querying the entity snapshots

While fetching all Post entity state snapshots generates the same SQL query like DefaultAuditStrategy, it is when querying by a specific revision that the benefit of having the REVEND extra column will provide a better query execution plan:

SELECT p.id as id1_1_, p.REV as REV2_1_, 
       p.REVTYPE as REVTYPE3_1_, p.REVEND as REVEND4_1_, 
       p.title as title5_1_ 
FROM   post_AUD p 
WHERE  p.REV <= 1 
       AND p.REVTYPE <> 2 
       AND ( p.REVEND > 1 OR p.REVEND IS NULL)

Much better than using a correlated subquery! To speed up this type of query, we could add an index for the REV and REVEND columns, therefore, avoiding a sequential scan.

I'm running an online workshop on the 20-21 and 23-24 of November about High-Performance Java Persistence.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

Hibernate Envers is extremely easy to set up, making it a very compelling choice for a Hibernate-based application. Although the DefaultAuditStrategy does less work while executing the entity state modification transaction, the ValidityAuditStrategy is a much better choice when trying to speed up the entity snapshot retrievals.

Depending on your application requirements, you should pick the revision strategy that better suits you audit log data access patterns.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.