How to index JSON columns using MySQL
Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
In this article, I’m going to explain how we can index JSON columns when using MySQL.
While other relational database systems provide GIN (Generalized Inverted Index) indexes, MySQL allows you to index a virtual column that mirrors the JSON path expression you are interested in indexing.
Database table
Let’s assume we have the following database book
table:
| id | isbn | properties | |----|----------------|----------------------------------------------------------------| | 1 | 978-9730228236 |{ | | | | "price":44.99, | | | | "title":"High-Performance Java Persistence", | | | | "author":"Vlad Mihalcea", | | | | "reviews":[ | | | | { | | | | "date":"2017-11-14", | | | | "rating":5, | | | | "review":"Excellent book to understand Java Persistence", | | | | "reviewer":"Cristiano" | | | | }, | | | | { | | | | "date":"2019-01-27", | | | | "rating":5, | | | | "review":"The best JPA ORM book out there", | | | | "reviewer":"T.W" | | | | }, | | | | { | | | | "date":"2016-12-24", | | | | "rating":4, | | | | "review":"The most informative book", | | | | "reviewer":"Shaikh" | | | | } | | | | ], | | | | "publisher":"Amazon" | | | |} | |----|----------------|----------------------------------------------------------------|
The properties
column type is json
, so we can store JSON objects as book properties.
Querying MySQL JSON columns without an index
If we try to filter one record by its associated title
attribute located inside the properties
JSON object:
SELECT isbn FROM book WHERE properties ->> "$.title" = 'High-Performance Java Persistence'
A full-table scan will be used to filter all records found in the book
table:
{ "query_block": { "select_id": 1, "cost_info": { "query_cost": "50.25" }, "table": { "table_name": "book", "access_type": "ALL", "rows_examined_per_scan": 500, "rows_produced_per_join": 500, "filtered": "100.00", "cost_info": { "read_cost": "0.25", "eval_cost": "50.00", "prefix_cost": "50.25", "data_read_per_join": "140K" }, "used_columns": [ "isbn", "properties" ], "attached_condition": "( json_unquote( json_extract( `high_performance_sql`.`book`.`properties`,'$.title' ) ) = 'High-Performance Java Persistence' )" } } }
The ALL
access type means that all pages have been scanned, which is confirmed by the rows_examined_per_scan
attribute as well since we only have 500 records in the book
table.
Add a MySQL virtual column to index a JSON path expression
In MySQL, the only way to index a JSON path expression is to add a virtual column that mirrors the path expression in question and build an index on the virtual column.
For this reason, we are going to add the following title
virtual column on our book
table:
ALTER TABLE book ADD COLUMN title VARCHAR(50) GENERATED ALWAYS AS ( properties ->> \"$.title\" )
As you can see, the title
column is mapped to the $.title
path expression on the properties
JSON column.
Next, we are going to add an index on the title
column, like this:
CREATE INDEX book_title_idx ON book (title)
And, when re-running the previous SQL query, we now get the following execution plan:
{ "query_block": { "select_id": 1, "cost_info": { "query_cost": "0.35" }, "table": { "table_name": "book", "access_type": "ref", "possible_keys": [ "book_title_idx" ], "key": "book_title_idx", "used_key_parts": [ "title" ], "key_length": "203", "ref": [ "const" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 1, "filtered": "100.00", "cost_info": { "read_cost": "0.25", "eval_cost": "0.10", "prefix_cost": "0.35", "data_read_per_join": "288" }, "used_columns": [ "isbn", "properties", "title" ] } } }
Now that the key
attribute references the book_title_idx
index we’ve just created, we can see that the SQL query is using the index to filter the book
records, which is also confirmed by the rows_examined_per_scan
attribute since a single index row was scanned.
Updating the indexed JSON attribute
If you update the indexed JSON attribute:
UPDATE book SET properties = JSON_SET( properties, '$.title', 'High-Performance Java Persistence, 2nd edition' ) WHERE isbn = '978-9730228236'
You’ll see that the title
virtual column is updated accordingly:
| id | isbn | properties | title | |----|----------------|----------------------------------------------------------------|------------------------------------------------| | 1 | 978-9730228236 |{ | High-Performance Java Persistence, 2nd edition | | | | "price":44.99, | | | | | "title":"High-Performance Java Persistence, 2nd edition", | | | | | "author":"Vlad Mihalcea", | | | | | "reviews":[ | | | | | { | | | | | "date":"2017-11-14", | | | | | "rating":5, | | | | | "review":"Excellent book to understand Java Persistence", | | | | | "reviewer":"Cristiano" | | | | | }, | | | | | { | | | | | "date":"2019-01-27", | | | | | "rating":5, | | | | | "review":"The best JPA ORM book out there", | | | | | "reviewer":"T.W" | | | | | }, | | | | | { | | | | | "date":"2016-12-24", | | | | | "rating":4, | | | | | "review":"The most informative book", | | | | | "reviewer":"Shaikh" | | | | | } | | | | | ], | | | | | "publisher":"Amazon" | | | | |} | | |----|----------------|----------------------------------------------------------------|------------------------------------------------|
However, while you can change the indexed JSON attribute, you are not allowed to change the virtual column directly. If you try to do that, MySQL will throw an exception.
I'm running an online workshop on the 11th of October about High-Performance SQL.If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
JSON column types can be very useful, especially if you’re implementing an audit log.
While adding a GIN index on the JSON columns is not allowed in MySQL, you can define a virtual column and add an index to it.
However, you need to make sure that the JSON attribute you’re using for indexing is highly selective, as otherwise, the index might not be used by the database engine when generating the SQL execution plan.
