How to index JSON columns using MySQL

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, I’m going to explain how we can index JSON columns when using MySQL.

While other relational database systems provide GIN (Generalized Inverted Index) indexes, MySQL allows you to index a virtual column that mirrors the JSON path expression you are interested in indexing.

Database table

Let’s assume we have the following database book table:

| id | isbn           | properties                                                     |
|----|----------------|----------------------------------------------------------------|
| 1  | 978-9730228236 |{                                                               |
|    |                |   "price":44.99,                                               |
|    |                |   "title":"High-Performance Java Persistence",                 |
|    |                |   "author":"Vlad Mihalcea",                                    |
|    |                |   "reviews":[                                                  |
|    |                |   {                                                            |
|    |                |      "date":"2017-11-14",                                      |
|    |                |      "rating":5,                                               |
|    |                |      "review":"Excellent book to understand Java Persistence", |
|    |                |      "reviewer":"Cristiano"                                    |
|    |                |   },                                                           |
|    |                |   {                                                            |
|    |                |      "date":"2019-01-27",                                      |
|    |                |      "rating":5,                                               |
|    |                |      "review":"The best JPA ORM book out there",               |
|    |                |      "reviewer":"T.W"                                          |
|    |                |   },                                                           |
|    |                |   {                                                            |
|    |                |      "date":"2016-12-24",                                      |
|    |                |      "rating":4,                                               |
|    |                |      "review":"The most informative book",                     |
|    |                |      "reviewer":"Shaikh"                                       |
|    |                |   }                                                            |
|    |                |   ],                                                           |
|    |                |   "publisher":"Amazon"                                         |
|    |                |}                                                               |
|----|----------------|----------------------------------------------------------------|

The properties column type is json, so we can store JSON objects as book properties.

Querying MySQL JSON columns without an index

If we try to filter one record by its associated title attribute located inside the properties JSON object:

SELECT
  isbn
FROM book
WHERE
  properties ->> "$.title" = 'High-Performance Java Persistence'

A full-table scan will be used to filter all records found in the book table:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "50.25"
    },
    "table": {
      "table_name": "book",
      "access_type": "ALL",
      "rows_examined_per_scan": 500,
      "rows_produced_per_join": 500,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "0.25",
        "eval_cost": "50.00",
        "prefix_cost": "50.25",
        "data_read_per_join": "140K"
      },
      "used_columns": [
        "isbn",
        "properties"
      ],
      "attached_condition": "(
        json_unquote(
          json_extract(
            `high_performance_sql`.`book`.`properties`,'$.title'
          )
        ) = 'High-Performance Java Persistence'
      )"
    }
  }
}

The ALL access type means that all pages have been scanned, which is confirmed by the rows_examined_per_scan attribute as well since we only have 500 records in the book table.

Add a MySQL virtual column to index a JSON path expression

In MySQL, the only way to index a JSON path expression is to add a virtual column that mirrors the path expression in question and build an index on the virtual column.

For this reason, we are going to add the following title virtual column on our book table:

ALTER TABLE 
  book 
ADD COLUMN 
  title VARCHAR(50) 
GENERATED ALWAYS AS (
  properties ->> \"$.title\"
)

As you can see, the title column is mapped to the $.title path expression on the properties JSON column.

Next, we are going to add an index on the title column, like this:

CREATE INDEX book_title_idx ON book (title)

And, when re-running the previous SQL query, we now get the following execution plan:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "0.35"
    },
    "table": {
      "table_name": "book",
      "access_type": "ref",
      "possible_keys": [
        "book_title_idx"
      ],
      "key": "book_title_idx",
      "used_key_parts": [
        "title"
      ],
      "key_length": "203",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "0.25",
        "eval_cost": "0.10",
        "prefix_cost": "0.35",
        "data_read_per_join": "288"
      },
      "used_columns": [
        "isbn",
        "properties",
        "title"
      ]
    }
  }
}

Now that the key attribute references the book_title_idx index we’ve just created, we can see that the SQL query is using the index to filter the book records, which is also confirmed by the rows_examined_per_scan attribute since a single index row was scanned.

Updating the indexed JSON attribute

If you update the indexed JSON attribute:

UPDATE
  book
SET
  properties = JSON_SET(
    properties,
    '$.title',
    'High-Performance Java Persistence, 2nd edition'
  )
WHERE
  isbn = '978-9730228236'

You’ll see that the title virtual column is updated accordingly:

| id | isbn           | properties                                                     | title                                          |
|----|----------------|----------------------------------------------------------------|------------------------------------------------|
| 1  | 978-9730228236 |{                                                               | High-Performance Java Persistence, 2nd edition |
|    |                |   "price":44.99,                                               |                                                |
|    |                |   "title":"High-Performance Java Persistence, 2nd edition",    |                                                |
|    |                |   "author":"Vlad Mihalcea",                                    |                                                |
|    |                |   "reviews":[                                                  |                                                |
|    |                |   {                                                            |                                                |
|    |                |      "date":"2017-11-14",                                      |                                                |
|    |                |      "rating":5,                                               |                                                |
|    |                |      "review":"Excellent book to understand Java Persistence", |                                                |
|    |                |      "reviewer":"Cristiano"                                    |                                                |
|    |                |   },                                                           |                                                |
|    |                |   {                                                            |                                                |
|    |                |      "date":"2019-01-27",                                      |                                                |
|    |                |      "rating":5,                                               |                                                |
|    |                |      "review":"The best JPA ORM book out there",               |                                                |
|    |                |      "reviewer":"T.W"                                          |                                                |
|    |                |   },                                                           |                                                |
|    |                |   {                                                            |                                                |
|    |                |      "date":"2016-12-24",                                      |                                                |
|    |                |      "rating":4,                                               |                                                |
|    |                |      "review":"The most informative book",                     |                                                |
|    |                |      "reviewer":"Shaikh"                                       |                                                |
|    |                |   }                                                            |                                                |
|    |                |   ],                                                           |                                                |
|    |                |   "publisher":"Amazon"                                         |                                                |
|    |                |}                                                               |                                                |
|----|----------------|----------------------------------------------------------------|------------------------------------------------|

However, while you can change the indexed JSON attribute, you are not allowed to change the virtual column directly. If you try to do that, MySQL will throw an exception.

I'm running an online workshop on the 11th of October about High-Performance SQL.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

JSON column types can be very useful, especially if you’re implementing an audit log.

While adding a GIN index on the JSON columns is not allowed in MySQL, you can define a virtual column and add an index to it.

However, you need to make sure that the JSON attribute you’re using for indexing is highly selective, as otherwise, the index might not be used by the database engine when generating the SQL execution plan.

Transactions and Concurrency Control eBook

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.