The best way to implement an audit log using Hibernate Envers

Introduction

As previously explained, CDC (Change Data Capture) is an essential step to extract change events from an OLTP application to make them available to other modules in an enterprise system (e.g. caches, data warehouse).

While Debezium is the most efficient way of doing CDC, it might be that you need a simpler solution in your project. Hibernate Envers is a Hibernate ORM extension that allows you to capture change events in no time.

In this article, we are going to see the most efficient way of using Hibernate Envers.

Maven dependency

Because Hibernate Envers is packaged as a separate dependency, if you want to use it, you need to declare the following Maven dependency:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-envers</artifactId>
    <version>${hibernate.version}</version>
</dependency>

@Audited

Now, after adding the hibernate-envers dependency, you need to instruct Hibernate which entities should be audited, and this can be done via the @Audited entity-level annotation.

@Entity(name = "Post")
@Table(name = "post")
@Audited
public static class Post {

    @Id
    private Long id;

    private String title;

    //Getters and setters omitted for brevity

    @Override
    public String toString() {
        return "Post{" +
                "id=" + id +
                ", title='" + title + '\'' +
                '}';
    }
}

Default audit logging strategy

Once you add the @Audited annotation to your entity and generate the database schema using the hbm2ddl tool, the following audit tables are going to be created:

CREATE TABLE post_AUD
(
    id      BIGINT NOT NULL,
    REV     INTEGER NOT NULL,
    REVTYPE TINYINT,
    title   VARCHAR(255),
    PRIMARY KEY ( id, REV )
)
    
CREATE TABLE revinfo 
( 
    rev      INTEGER GENERATED BY DEFAULT 
             AS IDENTITY ( START WITH 1 ), 
    revtstmp BIGINT, 
    PRIMARY KEY ( rev ) 
)   

ALTER TABLE post_aud 
ADD CONSTRAINT 
    fkllaf9w93qaiooguo8mfvwtwbg 
FOREIGN KEY ( REV ) REFERENCES revinfo 

Using hbm2ddl is not recommended for production environments so use it just for prototyping your database schema. Therefore, you should prefer using an automatic schema migration tool like Flyway.

Nevertheless, Hibernate expects to find the aforementioned audit tables in your database, otherwise, Envers will fail. So, make sure you add them to your migration scripts along with the actual entity tables.

The revinfo table stores the revision number and its epoch timestamp while the post_AUD table stores the entity snapshot at a particular revision.

In Envers terminology, a revision stands for a database transaction that has either inserted, update or deleted the audited entity. Each audited entity has a mirrored table that, by default, ends with the AUD suffix and which stores the state of the entity at the end of a particular revision.

The REVTYPE column stores the ordinal of the RevisionType Enum which encapsulates one of the following entity state modifications:

  • ADD – an INSERT SQL statement has created the entity in question
  • MOD – an UPDATE SQL statement has modified our entity
  • DEL – a DELETE SQL statement has removed the audited entity from the database table

Persisting an entity

When creating a Post entity and persisting into the currently running Persistence Context:

Post post = new Post();
post.setId( 1L );
post.setTitle( 
    "High-Performance Java Persistence 1st edition" 
);
entityManager.persist( post );

Hibernate generates the following three SQL INSERT statements:

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence 1st edition', 1)
    
INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503062974131)

INSERT INTO post_AUD (REVTYPE, title, id, REV) 
VALUES (0, 'High-Performance Java Persistence 1st edition', 1, 1)
  • The first statement creates the Post entity and is executed by Hibernate ORM.
  • The second statement is executed by Envers to create a new revision.
  • The third statement is also executed by Envers to capture the state of the Post entity at this revision.

Notice the REVTYPE column value is 0, matching the RevisionType.ADD enumeration value.

Updating an entity

When updating the previously created Post entity:

Post post = entityManager.find( Post.class, 1L );

post.setTitle( 
    "High-Performance Java Persistence 2nd edition" 
);

Hibernate generates the following SQL statements:

SELECT p.id as id1_0_0_, p.title as title2_0_0_ 
FROM   post p 
WHERE  p.id = 1

UPDATE post 
SET    title = 'High-Performance Java Persistence 2nd edition' 
WHERE  id = 1

INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503064508185)

INSERT INTO post_AUD (REVTYPE, title, id, REV) 
VALUES (1, 'High-Performance Java Persistence 2nd edition', 1, 2)
  • The first two statements are generated by Hibernate ORM to load and modify the entity.
  • The second statements are generated by Envers to capture the new revision and the new state of the Post entity.

Notice the REVTYPE column value is 1, matching the RevisionType.MOD enumeration value.

Removing an entity

When removing the Post entity:

entityManager.remove(
    entityManager.getReference( Post.class, 1L )
);

Hibernate generates the following SQL statements:

DELETE FROM post 
WHERE id = 1

INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503065294147)

INSERT INTO post_AUD (REVTYPE, title, id, REV) 
VALUES (2, NULL(VARCHAR), 1, 3)

After the entity is removed from the database table by Hibernate ORM, Envers inserts a new revision and all the properties, except for the entity identifier, are going to be set to null.

Notice the REVTYPE column value is 2, matching the RevisionType.DEL enumeration value.

Querying the entity snapshots

Hibernate Envers offers the AuditReaderFactory which takes a JPA EntityManager or a Hibernate Session instance and generates an AuditReader object.

The AuditReader can be used to query the audit log as demonstrated by the following example which extracts all the Post entity state snapshots:

List<Post> posts = AuditReaderFactory.get( entityManager )
.createQuery()
.forRevisionsOfEntity( Post.class, true, true )
.add( AuditEntity.id().eq( 1L ) )
.getResultList();

assertEquals( 3, posts.size() );

for ( int i = 0; i < posts.size(); i++ ) {
    LOGGER.info( 
        "Revision {} of Post entity: {}", 
        i + 1, 
        posts.get( i ) 
    );
}

When running the test case above, Hibernate generates the following output:

SELECT   p.id as id1_1_, p.REV as REV2_1_, 
         p.REVTYPE as REVTYPE3_1_, p.title as title4_1_ 
FROM     post_AUD p 
WHERE    p.id = 1 
ORDER BY p.REV ASC

-- Revision 1 of Post entity: Post{id=1, title='High-Performance Java Persistence 1st edition'}
-- Revision 2 of Post entity: Post{id=1, title='High-Performance Java Persistence 2nd edition'}
-- Revision 3 of Post entity: Post{id=1, title='null'}

So far, so good!

However, assuming that you have previously extracted the Post entity revisions:

List<Number> revisions = doInJPA( entityManager -> {
    return AuditReaderFactory.get( entityManager ).getRevisions(
        Post.class, 1L
    );
} );

If you want to load the Post entity snapshot as of a given revision:

Post post = (Post) AuditReaderFactory.get( entityManager )
.createQuery()
.forEntitiesAtRevision( Post.class, revisions.get( 0 ) )
.getSingleResult();

assertEquals(
    "High-Performance Java Persistence 1st edition", 
    post.getTitle()
);

Hibernate Envers is going to generate an SQL query like this one:

SELECT p1.id AS id1_1_,
       p1.REV AS REV2_1_,
       p1.REVTYPE AS REVTYPE3_1_,
       p1.title AS title4_1_
FROM   post_AUD p1
WHERE   
( 
    p1.REV IN 
    ( 
        SELECT MAX(p2.REV)
        FROM   post_AUD p2
        WHERE  p2.REV <= 1
               AND p1.id = p2.id 
    ) 
)
AND p1.REVTYPE <> 2

Well, that’s definitely not very efficient!

Queries like the one above are generated by the DefaultAuditStrategy and, for a very large audit log, they don’t perform that well.

Luckily, Envers offers the ValidityAuditStrategy to overcome this issue.

ValidityAuditStrategy to the rescue

To switch from the DefaultAuditStrategy to ValidityAuditStrategy, you have the provide the following Hibernate configuration property:

<property name="org.hibernate.envers.audit_strategy"
          value="org.hibernate.envers.strategy.ValidityAuditStrategy"
/>

Now, when generating the database schema using the hbm2ddl tool, the following audit tables are going to be created instead:

CREATE TABLE post_AUD
(
    id BIGINT NOT NULL ,
    REV INTEGER NOT NULL ,
    REVTYPE TINYINT ,
    REVEND INTEGER ,
    title VARCHAR(255) ,
    PRIMARY KEY ( id, REV )
)

CREATE TABLE revinfo 
( 
    rev      INTEGER GENERATED BY DEFAULT 
             AS IDENTITY ( START WITH 1 ), 
    revtstmp BIGINT, 
    PRIMARY KEY ( rev ) 
)  
    
ALTER TABLE post_AUD ADD CONSTRAINT 
    FKllaf9w93qaiooguo8mfvwtwbg 
FOREIGN KEY ( REV ) REFERENCES revinfo

ALTER TABLE post_AUD ADD CONSTRAINT 
    FKmo46u9kx2pmomhkxbmctlbwmg 
FOREIGN KEY ( REVEND ) REFERENCES revinfo

The revinfo table is identical with the DefaultAuditStrategy, but the post_AUD features a new REVEND column which references the revinfo table and marks the last revision for which this entity snapshot was still valid.

Persisting an entity

When persisting the same Post entity, Hibernate generates the following SQL statements:

INSERT INTO post (title, id) 
VALUES ('High-Performance Java Persistence 1st edition', 1)
    
INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503116813359)

INSERT INTO post_AUD (REVTYPE, REVEND, title, id, REV)
VALUES (0, NULL(INTEGER), 'High-Performance Java Persistence 1st edition', 1, 1)

The REVEND column value is NULL, so this audit log record is, therefore, valid.

Updating an entity

When updating the Post entity, Hibernate executes the following SQL statements:

SELECT p.id as id1_0_0_, p.title as title2_0_0_ 
FROM   post p 
WHERE  p.id = 1

UPDATE post 
SET    title = 'High-Performance Java Persistence 2nd edition' 
WHERE  id = 1

INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503117067335)

INSERT INTO post_AUD (REVTYPE, REVEND, title, id, REV)
VALUES (1, NULL(INTEGER), 'High-Performance Java Persistence 2nd edition', 1, 2)

UPDATE post_AUD 
SET    REVEND = 2 
WHERE  id = 1 
       AND REV <> 2 
       AND REVEND IS NULL

Not only that a new Post entity audit log entry is added, but the previous one is updated to mark that it’s no longer valid.

Compared to the DefaultAuditStrategy, the ValidityAuditStrategy does more work while storing the audit logs because it needs to update the end of revision.

However, just like with a database index, this extra write-time overhead is insignificant compared to the read-time response time improvement.

Removing an entity

When deleting the Post entity, Hibernate generates the following SQL statements:

DELETE FROM post 
WHERE id = 1

INSERT INTO REVINFO (REV, REVTSTMP) 
VALUES (default, 1503117987724)

INSERT INTO post_AUD (REVTYPE, REVEND, title, id, REV)
VALUES (2, NULL(INTEGER), NULL(VARCHAR), 1, 3)

UPDATE post_AUD 
SET    REVEND = 3 
WHERE  id = 1 
       AND REV <> 3 
       AND REVEND IS NULL

Just like with the Post entity update, the entity removal will also mark the end of revision for the previously stored audit log record.

Querying the entity snapshots

While fetching all Post entity state snapshots generates the same SQL query like DefaultAuditStrategy, it is when querying by a specific revision that the benefit of having the REVEND extra column will provide a better query execution plan:

SELECT p.id as id1_1_, p.REV as REV2_1_, 
       p.REVTYPE as REVTYPE3_1_, p.REVEND as REVEND4_1_, 
       p.title as title5_1_ 
FROM   post_AUD p 
WHERE  p.REV <= 1 
       AND p.REVTYPE <> 2 
       AND ( p.REVEND > 1 OR p.REVEND IS NULL)

Much better than using a correlated subquery! To speed up this type of query, we could add an index for the REV and REVEND columns, therefore, avoiding a sequential scan.

If you enjoyed this article, I bet you are going to love my book as well.

Conclusion

Hibernate Envers is extremely easy to set up, making it a very compelling choice for a Hibernate-based application. Although the DefaultAuditStrategy does less work while executing the entity state modification transaction, the ValidityAuditStrategy is a much better choice when trying to speed up the entity snapshot retrievals.

Depending on your application requirements, you should pick the revision strategy that better suits you audit log data access patterns.

If you liked this article, you might want to subscribe to my newsletter too.

Advertisements

6 thoughts on “The best way to implement an audit log using Hibernate Envers

  1. One question: isn’t it a better approach from the performance point of view to use the CDC mechanism available within the native database features ? In Oracle for example you con configure Asynchronous Oracle Streams for CDC with almost no overhead added to the transaction itself: when a DML operation is commited the changes are being read straight from the database redo log files. My point is that you should not replicate functionalities that are already available in the database. I’m making this point as I’ve seen a lot of developers that prefer the ‘black-box’ approach when they have to work with a database (any database) and afterwords they complain that “the data base is slow”.

  2. I admit, I did’t click on the link from the start and I thick it would be a good Idea to add “and parsing the redo log” at the original link of the post. My comment would have been different :). In many cases the trade-offs are made without knowing the full implications of the decisions and by the time you realized that you have traded performance and scalability for a lower initial cost it may be to late. Great blog by the way.

    1. If you don’t know the implications, the trade-off is ineffective, indeed. However, not every project requires a dedicated Apache Kafka and Zookeeper and running Ddebezium Kafka Connectors just to store the audit log. If you don’t have a large DB with many modifications that need to be propagated to many other sub-systems, then a simpler DB-based or application-based trigger solution is much more appealing.

  3. I noticed an inconsistency: You say “Notice the REV column value is 1, matching the RevisionType.ADD enumeration value.” (and similar for update/delete). However, it should read “REVTYPE column value is 0” (and similar for update/delete). At least this is what the SQL output tells me. Correct?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s