How to index JSON columns using MySQL
Are you struggling with performance issues in your Spring, Jakarta EE, or Java EE application?
What if there were a tool that could automatically detect what caused performance issues in your JPA and Hibernate data access layer?
Wouldn’t it be awesome to have such a tool to watch your application and prevent performance issues during development, long before they affect production systems?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, Micronaut, or Play Framework.
So, rather than fixing performance issues in your production system on a Saturday night, you are better off using Hypersistence Optimizer to help you prevent those issues so that you can spend your time on the things that you love!
Introduction
In this article, I’m going to explain how we can index JSON columns when using MySQL.
While other relational database systems provide GIN (Generalized Inverted Index) indexes, MySQL allows you to index a virtual column that mirrors the JSON path expression you are interested in indexing.
Database table
Let’s assume we have the following database book table:
| id | isbn | properties |
|----|----------------|----------------------------------------------------------------|
| 1 | 978-9730228236 |{ |
| | | "price":44.99, |
| | | "title":"High-Performance Java Persistence", |
| | | "author":"Vlad Mihalcea", |
| | | "reviews":[ |
| | | { |
| | | "date":"2017-11-14", |
| | | "rating":5, |
| | | "review":"Excellent book to understand Java Persistence", |
| | | "reviewer":"Cristiano" |
| | | }, |
| | | { |
| | | "date":"2019-01-27", |
| | | "rating":5, |
| | | "review":"The best JPA ORM book out there", |
| | | "reviewer":"T.W" |
| | | }, |
| | | { |
| | | "date":"2016-12-24", |
| | | "rating":4, |
| | | "review":"The most informative book", |
| | | "reviewer":"Shaikh" |
| | | } |
| | | ], |
| | | "publisher":"Amazon" |
| | |} |
|----|----------------|----------------------------------------------------------------|
The properties column type is json, so we can store JSON objects as book properties.
Querying MySQL JSON columns without an index
If we try to filter one record by its associated title attribute located inside the properties JSON object:
SELECT isbn FROM book WHERE properties ->> "$.title" = 'High-Performance Java Persistence'
A full-table scan will be used to filter all records found in the book table:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "50.25"
},
"table": {
"table_name": "book",
"access_type": "ALL",
"rows_examined_per_scan": 500,
"rows_produced_per_join": 500,
"filtered": "100.00",
"cost_info": {
"read_cost": "0.25",
"eval_cost": "50.00",
"prefix_cost": "50.25",
"data_read_per_join": "140K"
},
"used_columns":[
"isbn",
"properties"
],
"attached_condition": "(
json_unquote(
json_extract(
high_performance_sql.book.properties,
'$.title'
)
) = 'High-Performance Java Persistence'
)"
}
}
}
The ALL access type means that all pages have been scanned, which is confirmed by the rows_examined_per_scan attribute as well since we only have 500 records in the book table.
Add a MySQL virtual column to index a JSON path expression
Prior to version 8.0.21, the only way to index a JSON path expression was to add a virtual column that mirrors the path expression in question and builds an index on the virtual column.
For this reason, we are going to add the following title virtual column on our book table:
ALTER TABLE book ADD COLUMN title VARCHAR(50) GENERATED ALWAYS AS ( properties ->> \"$.title\" )
As you can see, the title column is mapped to the $.title path expression on the properties JSON column.
Next, we are going to add an index on the title column, like this:
CREATE INDEX book_title_idx ON book (title)
And, when re-running the previous SQL query, we now get the following execution plan:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "0.35"
},
"table": {
"table_name": "book",
"access_type": "ref",
"possible_keys": [
"book_title_idx"
],
"key": "book_title_idx",
"used_key_parts": [
"title"
],
"key_length": "203",
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "0.25",
"eval_cost": "0.10",
"prefix_cost": "0.35",
"data_read_per_join": "288"
},
"used_columns": [
"isbn",
"properties",
"title"
]
}
}
}
Now that the key attribute references the book_title_idx index we’ve just created, we can see that the SQL query is using the index to filter the book records, which is also confirmed by the rows_examined_per_scan attribute since a single index row was scanned.
Updating the indexed JSON attribute
If you update the indexed JSON attribute:
UPDATE
book
SET
properties = JSON_SET(
properties,
'$.title',
'High-Performance Java Persistence, 2nd edition'
)
WHERE
isbn = '978-9730228236'
You’ll see that the title virtual column is updated accordingly:
| id | isbn | properties | title |
|----|----------------|----------------------------------------------------------------|------------------------------------------------|
| 1 | 978-9730228236 |{ | High-Performance Java Persistence, 2nd edition |
| | | "price":44.99, | |
| | | "title":"High-Performance Java Persistence, 2nd edition", | |
| | | "author":"Vlad Mihalcea", | |
| | | "reviews":[ | |
| | | { | |
| | | "date":"2017-11-14", | |
| | | "rating":5, | |
| | | "review":"Excellent book to understand Java Persistence", | |
| | | "reviewer":"Cristiano" | |
| | | }, | |
| | | { | |
| | | "date":"2019-01-27", | |
| | | "rating":5, | |
| | | "review":"The best JPA ORM book out there", | |
| | | "reviewer":"T.W" | |
| | | }, | |
| | | { | |
| | | "date":"2016-12-24", | |
| | | "rating":4, | |
| | | "review":"The most informative book", | |
| | | "reviewer":"Shaikh" | |
| | | } | |
| | | ], | |
| | | "publisher":"Amazon" | |
| | |} | |
|----|----------------|----------------------------------------------------------------|------------------------------------------------|
However, while you can change the indexed JSON attribute, you are not allowed to change the virtual column directly. If you try to do that, MySQL will throw an exception.
Starting with version 8.0.21, you can now use the
JSON_VALUEto index a given path expression.Check out the MySQL documentation for more details about how you can use this feature.
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
JSON column types can be very useful, especially if you’re implementing an audit log.
While adding a GIN index on the JSON columns is not allowed in MySQL, you can define a virtual column and add an index to it.
However, you need to make sure that the JSON attribute you’re using for indexing is highly selective, as otherwise, the index might not be used by the database engine when generating the SQL execution plan.






