JSON pattern matching with sed, perl and regular expressions
Why VIM?
Sooner or later, the day comes when your easy-to-use IDE can no longer handle huge files. Few editors can cope with very large files, such as production logs.
I recently had to analyze a 100 MB one-line JSON file, and once more VIM saved the day. VIM, like many other Unix utilities, is both tough and brilliant.
Unless you configure another editor, Git interactive rebase falls back to vi, which on most systems is VIM, so it’s worth knowing VIM.
Let’s see how easily you can pretty-print a JSON file with VIM. First, we will download a one-line JSON file from Reddit.
$ wget http://www.reddit.com/r/programming.json
--2014-01-24 12:21:04-- http://www.reddit.com/r/programming.json
Resolving www.reddit.com (www.reddit.com)... 77.232.217.122, 77.232.217.113
Connecting to www.reddit.com (www.reddit.com)|77.232.217.122|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28733 (28K) [application/json]
Saving to: `programming.json'
100%[======================================>] 28,733 --.-K/s in 0.03s
2014-01-24 12:21:04 (1021 KB/s) - `programming.json' saved [28733/28733]
This is what it looks like:
Pretty printing
Python comes with most Unix distributions (on newer systems the interpreter may only be available as python3), so the following VIM command does the trick:
:%!python -m json.tool
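The following minimal Python sketch illustrates roughly what that filter does. It is not the module’s actual source. It parses the whole one-line buffer and then re-serializes it with a four-space indent, which is the json.tool default. The `pretty_print` helper name is hypothetical.

```python
import json


def pretty_print(raw: str) -> str:
    """Rough equivalent of python -m json.tool: parse, then re-serialize indented."""
    data = json.loads(raw)  # the whole one-line document is parsed in one go
    return json.dumps(data, indent=4)  # 4 spaces per nesting level


if __name__ == "__main__":
    one_line = '{"kind": "Listing", "data": {"children": [{"data": {"domain": "github.com"}}]}}'
    print(pretty_print(one_line))
```

The text is parsed rather than reflowed, so malformed input raises json.JSONDecodeError instead of being indented.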
Let’s save the pretty-printed JSON file and put other Unix tools to work.
:w programming_pretty.json
Matching time
Let’s say we want to extract all “domain” values, which look like this:
"domain": "mameworld.info"
Sed to the rescue
Sed’s extended regular expressions have no lazy quantifier, so a negated bracket expression keeps the capture from running past the closing quote:
$ sed -nr 's/^.*"domain":\s*"([^"]*)".*$/\1/p' <programming_pretty.json | sort -u
blog.safaribooksonline.com
chadfowler.com
cyrille.rossant.net
dot.kde.org
evanmiller.org
fabiensanglard.net
galileo.phys.virginia.edu
github.com
halffull.org
ibuildings.nl
jaxenter.com
jobtipsforgeeks.com
kilncode.com
libtins.github.io
mameworld.info
miguelcamba.com
minuum.com
notes.tweakblogs.net
perfect-pentago.net
periscope.io
reuters.com
tech.blog.box.com
tmm1.net
vocalbit.com
youtube.com
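A Python sketch of the same line-oriented extraction makes each part of the pipeline visible. Each line is matched on its own. The value is captured with `[^"]*`, which is any run of non-quote characters. A set plays the role of `sort -u`. The `extract_domains` function and the sample lines are made up for illustration.

```python
import re
from typing import Iterable, List

# "domain", a colon, optional whitespace, then the quoted value.
# [^"]* cannot cross a double quote, so the capture stops at the closing quote.
DOMAIN_RE = re.compile(r'"domain":\s*"([^"]*)"')


def extract_domains(lines: Iterable[str]) -> List[str]:
    """Line-oriented extraction, like the sed one-liner piped to sort -u."""
    found = set()  # the set removes duplicates, like sort -u
    for line in lines:
        match = DOMAIN_RE.search(line)  # each line is examined on its own, as sed does
        if match:  # sed -n with the p flag prints only lines where the substitution happened
            found.add(match.group(1))
    return sorted(found)


if __name__ == "__main__":
    sample = [
        '            "domain": "mameworld.info", ',
        '            "title": "Some post", ',
        '            "domain": "github.com", ',
        '            "domain": "github.com", ',
    ]
    print(extract_domains(sample))  # ['github.com', 'mameworld.info']
```

A greedy `"(.*)"` would behave differently on a line such as `"domain": "a.com", "id": "x"`, where it would capture everything up to the last quote on the line. Python’s sorted() orders by code point, whereas sort follows the current locale, so mixed-case values may come out in a different order.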
Multi-line matching
Sed is line-oriented, and although it offers some multi-line support (the N command and the hold space), it’s no match for Perl. Let’s say I want to match all authors in the following JSON pattern:
"data": { "author": "justrelaxnow", }
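This is how I do it. The -0777 switch makes Perl read the whole file as a single record, so the pattern can span lines. The /s modifier lets . match newlines, and the /g modifier inside the while loop steps through every match in turn:
$ perl -0777 -n -e 'print "$2\n" while (m/("data":\s*\{.*?"author":\s*"(.*?)"(?:,|\s*\}).*?\},)/sg)' programming_pretty.json | sort -u
AmericanXer0
azth
bionicseraph
bit_shiftr
charles_the_hard
Gexos
jakubgarfield
johnwaterwood
joukoo
justrelaxnow
Kingvash
krets
mariuz
mopatches
nyphrex
pseudomind
rluecke3
sltkr
solidus-flux
steveklabnik1
sumstozero
swizec
vocalbit
Wolfspaw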
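The character after the author value deserves a closer look, because brackets and alternation are easy to confuse. Inside square brackets, `|` and `*` lose their special meaning. A bracket expression such as `[,|\s*\}]` therefore matches exactly one character. The alternation `(?:,|\s*\})` instead means "a comma, or optional whitespace followed by a closing brace". This Python sketch contrasts the two (Python and Perl agree on these rules). The `full` helper is hypothetical.

```python
import re

# Inside a bracket expression, "|" and "*" are ordinary characters:
# this class matches ONE character out of: , | whitespace * }
BRACKET = re.compile(r'[,|\s*\}]')

# Alternation: either a comma, or optional whitespace followed by "}"
ALTERNATION = re.compile(r'(?:,|\s*\})')


def full(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in [",", "|", "*", "\n    }"]:
        print(repr(text), full(BRACKET, text), full(ALTERNATION, text))
```

Only the alternation accepts a closing brace preceded by a newline, and only the bracket accepts a lone `|` or `*`. On pretty-printed Reddit data, a comma follows the author value, so both forms happen to find the same authors there.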
Conclusion
Unix tools are old school, and some of them were written forty years ago. The learning curve might be steep, but learning them is a great investment. A great software library stands the test of time, and Unix tools are a good reminder that tough jobs call for tough tools.
