Building a Distributed Audit Log with YugabyteDB


Introduction

In this article, we are going to see the best way to build an audit log using YugabyteDB.

And what’s even more interesting is that this solution is almost identical on PostgreSQL. The only difference is how we build the audit log table PRIMARY KEY; everything else is exactly the same, demonstrating how easily you can migrate from PostgreSQL to YugabyteDB.

Why use an audit log

An audit log or audit trail is a data structure that captures all the modifications that occur in a given system.

For instance, the Redo Log in a database system is very similar to an audit log because it captures the logical or physical modifications that occurred while executing a given transaction.

However, while you could use a tool like Debezium to parse the Redo Log, the solution is quite complex and doesn’t allow you to capture additional information about the context in which a given modification has occurred.

For this reason, we are going to build our audit log using database triggers.

How to build an audit log using YugabyteDB

The best way to build a generic audit log using YugabyteDB is to define one audit_log table that captures all changes happening across all our database tables, as illustrated by the following diagram:

Audit Log Tables

In our case, we have two database tables, author and book, that we want to audit. So, for this purpose, we created the audit_log table to capture any modification happening in either of those two tables.
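
The author and book table definitions are not the focus of this article; for reference, here is a minimal sketch, with column types inferred from the entity mappings and JSON snapshots shown later in this article:

CREATE TABLE author (
    id bigint NOT NULL,
    first_name varchar(255),
    last_name varchar(255),
    country varchar(255),
    tax_treaty_claiming boolean,
    PRIMARY KEY (id)
);

CREATE TABLE book (
    id bigint NOT NULL,
    title varchar(255),
    author_id bigint REFERENCES author (id),
    price_in_cents integer,
    publisher varchar(255),
    PRIMARY KEY (id)
);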

The audit_log table is created using the following DDL statement:

CREATE TABLE IF NOT EXISTS audit_log (
    table_name varchar(255) NOT NULL,
    row_id bigint NOT NULL,
    old_row_data jsonb,
    new_row_data jsonb,
    dml_type dml_type NOT NULL,
    dml_timestamp timestamp NOT NULL,
    dml_created_by varchar(255) NOT NULL,
    trx_timestamp timestamp NOT NULL,
    PRIMARY KEY ((table_name, row_id) HASH, dml_type, dml_timestamp)
)
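
Note that the dml_type column uses a custom enumeration type, which needs to be created before the audit_log table. A minimal sketch, assuming a simple three-value enum that matches the captured DML operations:

CREATE TYPE dml_type AS ENUM ('INSERT', 'UPDATE', 'DELETE');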

The audit_log table columns are going to store the following information:

  • table_name – stores the name of the table for the record that has just been inserted, updated, or deleted.
  • row_id – stores the Primary Key value for the table record that has just been inserted, updated, or deleted.
  • old_row_data – stores the JSON representation of the table row state prior to applying the recorded DML operation.
  • new_row_data – stores the JSON representation of the table row state after applying the recorded DML operation.
  • dml_type – stores the associated DML operation type (e.g., INSERT, UPDATE, DELETE).
  • dml_timestamp – stores the timestamp of the DML statement that we have captured.
  • dml_created_by – stores the currently logged-in application user on whose behalf the DML statement was executed.
  • trx_timestamp – stores the timestamp of the transaction that enlisted the DML statement that we have captured.

The reason why the first part of the PRIMARY KEY is created as (table_name, row_id) HASH is that, by default, YugabyteDB auto-shards on the first column of the PRIMARY KEY unless HASH, ASC, or DESC is specified explicitly.

By hashing the table_name and row_id, we ensure that the audit_log records are going to be evenly distributed across all shards. Otherwise, we’d risk generating hotspots on the tablets associated with the tables that generate more audit log entries.

On the other hand, on PostgreSQL, the PRIMARY KEY would be created like this:

  PRIMARY KEY (table_name, row_id, dml_type, dml_timestamp)

That’s basically the only difference between YugabyteDB and PostgreSQL when implementing this audit log solution.

In order to capture the modifications that occur on the author and book tables, we have to define the following database triggers:

CREATE TRIGGER author_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON author
FOR EACH ROW EXECUTE FUNCTION audit_log_trigger_function()
	
CREATE TRIGGER book_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON book
FOR EACH ROW EXECUTE FUNCTION audit_log_trigger_function()

So, every time we execute an INSERT, UPDATE, or DELETE statement on an author or book table row, the audit_log_trigger_function database function is going to be executed.

The audit_log_trigger_function is a custom database function that we created like this:

CREATE OR REPLACE FUNCTION audit_log_trigger_function()
RETURNS trigger AS $body$
BEGIN
   if (TG_OP = 'INSERT') then
       INSERT INTO audit_log (
           table_name,
           row_id,
           old_row_data,
           new_row_data,
           dml_type,
           dml_timestamp,
           dml_created_by,
           trx_timestamp
       )
       VALUES(
           TG_TABLE_NAME,
           NEW.id,
           null,
           to_jsonb(NEW),
           'INSERT',
           statement_timestamp(),
           current_setting('var.logged_user'),
           transaction_timestamp()
       );
            
       RETURN NEW;
   elsif (TG_OP = 'UPDATE') then
       INSERT INTO audit_log (
           table_name,
           row_id,
           old_row_data,
           new_row_data,
           dml_type,
           dml_timestamp,
           dml_created_by,
           trx_timestamp
       )
       VALUES(
           TG_TABLE_NAME,
           NEW.id,
           to_jsonb(OLD),
           to_jsonb(NEW),
           'UPDATE',
           statement_timestamp(),
           current_setting('var.logged_user'),
           transaction_timestamp()
       );
            
       RETURN NEW;
   elsif (TG_OP = 'DELETE') then
       INSERT INTO audit_log (
           table_name,
           row_id,
           old_row_data,
           new_row_data,
           dml_type,
           dml_timestamp,
           dml_created_by,
           trx_timestamp
       )
       VALUES(
           TG_TABLE_NAME,
           OLD.id,
           to_jsonb(OLD),
           null,
           'DELETE',
           statement_timestamp(),
           current_setting('var.logged_user'),
           transaction_timestamp()
       );
       
       RETURN OLD;
   END IF;
END;
$body$
LANGUAGE plpgsql

No matter what DML operation was executed, we are always inserting a record in the audit_log table. The difference is how we capture the old and the new record state because the newly inserted record does not have an old state and the deleted row has no new state.

The var.logged_user is a session variable we are going to set in our application with the name of the currently logged-in application user. This can be done like this:

private void setCurrentLoggedUser(EntityManager entityManager) {
    Session session = entityManager.unwrap(Session.class);

    // Resolve the Hibernate Dialect so that the user name can be escaped
    // before being embedded in the SET LOCAL statement.
    Dialect dialect = session.getSessionFactory()
        .unwrap(SessionFactoryImplementor.class)
        .getJdbcServices()
        .getDialect();

    String loggedUser = ReflectionUtils.invokeMethod(
        dialect,
        "escapeLiteral",
        LoggedUser.get()
    );

    // SET LOCAL scopes the variable to the current transaction, so the audit
    // triggers can read it via current_setting('var.logged_user').
    // The update() method is a small JDBC helper that executes the given statement.
    session.doWork(connection -> {
        update(
            connection,
            String.format(
                "SET LOCAL var.logged_user = '%s'", loggedUser
            )
        );
    });
}
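
The LoggedUser utility is not shown here; a minimal sketch, assuming it is a simple ThreadLocal-based holder that the web layer populates (e.g., from a Servlet filter) before the data access code runs:

public class LoggedUser {

    private static final ThreadLocal<String> userHolder = new ThreadLocal<>();

    public static void logIn(String user) {
        userHolder.set(user);
    }

    public static void logOut() {
        userHolder.remove();
    }

    public static String get() {
        return userHolder.get();
    }
}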

That’s it!

Testing Time

To test how the audit logging works, we are going to use JPA and Hibernate, as this is the most common Java data access technology.

First, let’s create the Author entity that maps the author database table:

@Entity(name = "Author")
@Table(name = "author")
@DynamicUpdate
public class Author {

    @Id
    private Long id;

    @Column(name = "first_name")
    private String firstName;

    @Column(name = "last_name")
    private String lastName;

    private String country;

    @Column(name = "tax_treaty_claiming")
    private boolean taxTreatyClaiming;

    //Getters and setters omitted for brevity
}

By default, Hibernate uses a pre-defined UPDATE statement that includes all the columns, even the ones that haven’t been modified. However, this default UPDATE is not suitable when using YugabyteDB as it will affect write scalability.

Unlike PostgreSQL, which takes a row-level lock whenever a record is changed, YugabyteDB uses a column-level locking approach, and for this reason, it’s very important that the UPDATE statement includes only the columns that were modified by the current Persistence Context, as opposed to all columns.

By providing the @DynamicUpdate annotation, Hibernate will generate the UPDATE statement dynamically, including only the columns that are associated with the entity properties that have been changed.
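
For example, if only the taxTreatyClaiming property changes, the two strategies would produce statements along these lines (a sketch of the SQL Hibernate generates, not captured from an actual log):

-- Without @DynamicUpdate, every mapped column is rewritten
UPDATE author
SET country = ?, first_name = ?, last_name = ?, tax_treaty_claiming = ?
WHERE id = ?

-- With @DynamicUpdate, only the modified column is included
UPDATE author
SET tax_treaty_claiming = ?
WHERE id = ?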

For more details about this topic, check out this article.

And the book table is mapped as follows:

@Entity(name = "Book")
@Table(name = "book")
@DynamicUpdate
public class Book {

    @Id
    private Long id;

    private String title;

    @ManyToOne(fetch = FetchType.LAZY)
    private Author author;

    @Column(name = "price_in_cents")
    private int priceInCents;

    private String publisher;

    //Getters and setters omitted for brevity
}

With the entity mappings in place, we can start adding the following Author and Book entities:

setCurrentLoggedUser(entityManager);

Author author = new Author()
    .setId(1L)
    .setFirstName("Vlad")
    .setLastName("Mihalcea")
    .setCountry("România");

entityManager.persist(author);

entityManager.persist(
    new Book()
        .setId(1L)
        .setTitle("High-Performance Java Persistence 1st edition")
        .setPublisher("Amazon")
        .setPriceInCents(3990)
        .setAuthor(author)
);
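
To inspect the captured entries, a simple query over the audit_log table is enough, for example:

SELECT *
FROM audit_log
ORDER BY dml_timestamp;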

And when checking the audit_log table, we are going to see that two records have just been added:

| table_name | row_id | old_row_data | new_row_data                                                                                                                       | dml_type | dml_timestamp              | dml_created_by | trx_timestamp              |
|------------|--------|--------------|------------------------------------------------------------------------------------------------------------------------------------|----------|----------------------------|----------------|----------------------------|
| author     | 1      |              | {"id": 1, "country": "România", "last_name": "Mihalcea", "first_name": "Vlad", "tax_treaty_claiming": false}                       | INSERT   | 2022-10-20 19:15:37.313888 | Vlad Mihalcea  | 2022-10-20 19:15:37.252935 |
| book       | 1      |              | {"id": 1, "title": "High-Performance Java Persistence 1st edition", "author_id": 1, "publisher": "Amazon", "price_in_cents": 3990} | INSERT   | 2022-10-20 19:15:37.326810 | Vlad Mihalcea  | 2022-10-20 19:15:37.252935 |

Notice that the old_row_data is NULL because these audit log records are associated with INSERT statements.

When updating the two entities:

entityManager.find(Author.class, 1L)
    .setTaxTreatyClaiming(true);

entityManager.find(Book.class, 1L)
    .setPriceInCents(4499);

We see that two new audit log records are going to be added:

| table_name | row_id | old_row_data                                                                                                                       | new_row_data                                                                                                                       | dml_type | dml_timestamp              | dml_created_by | trx_timestamp              |
|------------|--------|------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|----------|----------------------------|----------------|----------------------------|
| author     | 1      |                                                                                                                                    | {"id": 1, "country": "România", "last_name": "Mihalcea", "first_name": "Vlad", "tax_treaty_claiming": false}                       | INSERT   | 2022-10-20 19:15:37.313888 | Vlad Mihalcea  | 2022-10-20 19:15:37.252935 |
| book       | 1      |                                                                                                                                    | {"id": 1, "title": "High-Performance Java Persistence 1st edition", "author_id": 1, "publisher": "Amazon", "price_in_cents": 3990} | INSERT   | 2022-10-20 19:15:37.326810 | Vlad Mihalcea  | 2022-10-20 19:15:37.252935 |
| author     | 1      | {"id": 1, "country": "România", "last_name": "Mihalcea", "first_name": "Vlad", "tax_treaty_claiming": false}                       | {"id": 1, "country": "România", "last_name": "Mihalcea", "first_name": "Vlad", "tax_treaty_claiming": true}                        | UPDATE   | 2022-10-20 19:52:51.688286 | Vlad Mihalcea  | 2022-10-20 19:52:51.656483 |
| book       | 1      | {"id": 1, "title": "High-Performance Java Persistence 1st edition", "author_id": 1, "publisher": "Amazon", "price_in_cents": 3990} | {"id": 1, "title": "High-Performance Java Persistence 1st edition", "author_id": 1, "publisher": "Amazon", "price_in_cents": 4499} | UPDATE   | 2022-10-20 19:52:51.696792 | Vlad Mihalcea  | 2022-10-20 19:52:51.656483 |

Notice that for UPDATE statements, the old_row_data JSON column captures the previous table record state while the new_row_data JSON column stores the state after the UPDATE was executed. This makes it easy to see how data has changed over time.

And, if we remove the Book entity:

entityManager.remove(
    entityManager.getReference(Book.class, 1L)
);

We can see that a new record will be added to the audit_log table:

| table_name | row_id | old_row_data                                                                                                                       | new_row_data                                                                                                                       | dml_type | dml_timestamp              | dml_created_by | trx_timestamp              |
|------------|--------|------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|----------|----------------------------|----------------|----------------------------|
| author     | 1      |                                                                                                                                    | {"id": 1, "country": "România", "last_name": "Mihalcea", "first_name": "Vlad", "tax_treaty_claiming": false}                       | INSERT   | 2022-10-20 19:15:37.313888 | Vlad Mihalcea  | 2022-10-20 19:15:37.252935 |
| book       | 1      |                                                                                                                                    | {"id": 1, "title": "High-Performance Java Persistence 1st edition", "author_id": 1, "publisher": "Amazon", "price_in_cents": 3990} | INSERT   | 2022-10-20 19:15:37.326810 | Vlad Mihalcea  | 2022-10-20 19:15:37.252935 |
| author     | 1      | {"id": 1, "country": "România", "last_name": "Mihalcea", "first_name": "Vlad", "tax_treaty_claiming": false}                       | {"id": 1, "country": "România", "last_name": "Mihalcea", "first_name": "Vlad", "tax_treaty_claiming": true}                        | UPDATE   | 2022-10-20 19:52:51.688286 | Vlad Mihalcea  | 2022-10-20 19:52:51.656483 |
| book       | 1      | {"id": 1, "title": "High-Performance Java Persistence 1st edition", "author_id": 1, "publisher": "Amazon", "price_in_cents": 3990} | {"id": 1, "title": "High-Performance Java Persistence 1st edition", "author_id": 1, "publisher": "Amazon", "price_in_cents": 4499} | UPDATE   | 2022-10-20 19:52:51.696792 | Vlad Mihalcea  | 2022-10-20 19:52:51.656483 |
| book       | 1      | {"id": 1, "title": "High-Performance Java Persistence 1st edition", "author_id": 1, "publisher": "Amazon", "price_in_cents": 4499} |                                                                                                                                    | DELETE   | 2022-10-20 19:56:35.847542 | Vlad Mihalcea  | 2022-10-20 19:56:35.833682 |

For DELETE statements, the audit log record contains only the old_row_data column, which captures the record state prior to being deleted.

And, just because the table row data is stored in JSON columns, it doesn’t mean we cannot transform it back to its relational table structure.

For instance, when executing the following SQL query:

SELECT
    row_id AS id,
    cast(new_row_data ->> 'price_in_cents' AS int) AS price_in_cents,
    new_row_data ->> 'publisher' AS publisher,
    new_row_data ->> 'title' AS title,
    new_row_data ->> 'author_id' AS author_id,
    dml_timestamp as version_timestamp
FROM
    audit_log
WHERE
    table_name = 'book'
ORDER BY dml_timestamp

We will get all the book table changes:

| id | price_in_cents | publisher | title                                         | author_id | version_timestamp          |
|----|----------------|-----------|-----------------------------------------------|-----------|----------------------------|
| 1  | 3990           | Amazon    | High-Performance Java Persistence 1st edition | 1         | 2022-10-20 19:15:37.326810 |
| 1  | 4499           | Amazon    | High-Performance Java Persistence 1st edition | 1         | 2022-10-20 19:52:51.696792 |
| 1  |                |           |                                               |           | 2022-10-20 19:56:35.847542 |

Awesome, right?


Conclusion

As you can see, adding an audit log in YugabyteDB is just as easy as doing it in PostgreSQL. Not only does YugabyteDB support JSONB columns, but database triggers and functions work in exactly the same way.

Using this audit log feature will allow you to track modifications across all database tables that need to be audited.

This research was funded by Yugabyte and conducted in accordance with the blog ethics policy.

While the article was written independently and reflects entirely my opinions and conclusions, the amount of work involved in making this article happen was compensated by Yugabyte.
