Lightrun – the best way to debug production problems
Introduction
In this article, I’m going to present Lightrun, a very useful tool that helps me debug problems happening in production, which I discovered recently while developing RevoGain.
Lightrun is unlike any other tool I’ve used before, since it allows us to insert log entries dynamically at runtime, capture snapshots, and even inject metrics without changing our production code.
This is especially useful when investigating issues reported by clients, since we can investigate the problem while the user is performing the actions that reproduce it. Cool, right?
Getting Started with Lightrun
Setting up Lightrun is very easy and takes less than 5 minutes:
- Step 1: Install the Lightrun IntelliJ IDEA plugin, which works with both the Ultimate and the Community editions.
- Step 2: Create an account on the Lightrun App platform.
- Step 3: Install the Lightrun JVM agent, which is used to introspect our application. The Lightrun App platform provides instructions for setting up the agent according to your development and production system requirements.
- Step 4: Configure your application to use the Lightrun JVM agent.
In my case, since RevoGain is a Spring Boot application, I can attach the agent in my local Windows environment like this:
java -agentpath:%USER_HOME%/agent/lightrun_agent.dll ^
  -jar revogain-%REVOGAIN_VERSION%.jar
And for the production system, I can use this Linux-based command:
java -agentpath:$HOME/agent/lightrun_agent.so -jar revogain-$REVOGAIN_VERSION.jar
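After changing the startup command, it can be worth confirming that the -agentpath flag actually reached the JVM. The following is a minimal sketch using the standard RuntimeMXBean API. The AgentCheck class is hypothetical and not part of Lightrun, and it only checks that the flag was passed, not that the agent connected to the Lightrun platform.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of Lightrun): reports whether a native agent
// whose path contains the given file name was passed via -agentpath.
final class AgentCheck {

    static boolean hasNativeAgent(List<String> jvmArgs, String agentFileName) {
        for (String arg : jvmArgs) {
            // Native agents are passed as -agentpath:<path>[=<options>]
            if (arg.startsWith("-agentpath:") && arg.contains(agentFileName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // JVM options the process was started with (excludes the main class and program arguments)
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        boolean present = hasNativeAgent(jvmArgs, "lightrun_agent");
        System.out.println("Lightrun agent flag present: " + present);
    }
}
```

Matching on "lightrun_agent" covers both the Windows (.dll) and the Linux (.so) file names used in the commands above.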
Lightrun dynamic logging
A very common issue when parsing trading statements is that the trading balance doesn’t add up. This can happen because of operations that are not yet supported, or because either the statement file or the parsing logic is broken.
Debugging such issues requires having the trading statement, and, unfortunately, not all clients are willing to share it so that we can debug it locally. So, in these particular cases, adding a dynamic log entry is going to help us spot the problem while the user’s statement is being parsed.
So, let’s add a dynamic log entry that displays the calculated trading balance for a specific user:
The Format text field defines the message that’s going to be logged. The {calculatedBalance} placeholder is going to be replaced with the value of the calculatedBalance local variable when the method in question is executed.
The Condition text field allows us to define filtering criteria so that the message is logged only if the provided condition evaluates to true. In our case, we want to display this message only for the user with the identifier value of 1, as illustrated in the advanced log popup screenshot.
So, this log message is only going to be printed for the user whose id value is 1, while for other users, it will be ignored.
The Lightrun log messages are printed in the application log, but we can also pipe them to our IDE.
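Without Lightrun, getting the same output would require a hard-coded conditional log statement and a redeployment. To make the Format and Condition fields more concrete, here is a hypothetical sketch of that hand-written equivalent. The BalanceCalculator class, the calculateBalance method, its userId parameter, and the "Calculated balance:" message text are assumptions for illustration, not RevoGain’s actual code. Only the calculatedBalance variable and the id value of 1 come from the example above.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the statement-parsing code, not RevoGain's implementation.
final class BalanceCalculator {

    // Collected log lines, so the sketch can be checked without a logging framework.
    final List<String> log = new ArrayList<>();

    BigDecimal calculateBalance(long userId, List<BigDecimal> operationAmounts) {
        BigDecimal calculatedBalance = BigDecimal.ZERO;
        for (BigDecimal amount : operationAmounts) {
            calculatedBalance = calculatedBalance.add(amount);
            // Hard-coded equivalent of the dynamic log entry:
            // Condition: userId == 1, Format: "Calculated balance: {calculatedBalance}"
            if (userId == 1) {
                log.add("Calculated balance: " + calculatedBalance);
            }
        }
        return calculatedBalance;
    }

    public static void main(String[] args) {
        BalanceCalculator calculator = new BalanceCalculator();
        List<BigDecimal> amounts = List.of(new BigDecimal("100.00"), new BigDecimal("-25.50"));
        calculator.calculateBalance(1, amounts); // logged: the condition matches
        calculator.calculateBalance(2, amounts); // skipped: the condition does not match
        calculator.log.forEach(System.out::println);
    }
}
```

The difference is that Lightrun lets us attach and remove this log entry on the running application, while the version above would stay in the code base until the next release.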
Next, we can ask the user to import a new trading statement, and the calculatedBalance log entries are going to be printed in the Lightrun Console, as follows:
Brilliant!
Check out how the balance is being calculated based on each trading operation we parse from the statement. If the calculated balance doesn’t match the balance values provided by the statement, we can point the client to the operation that causes the issue so that they can inspect it as well.
Without Lightrun, we couldn’t simply attach a debugger to the production system, since hitting a breakpoint could halt the entire server, therefore affecting availability.
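Spotting the offending operation is essentially a reconciliation between the running balance and the balances reported by the statement. As a rough sketch, assuming each parsed operation carries its amount and the balance the statement reports after it (the BalanceReconciliation class, the Operation record, and its fields are hypothetical), the first mismatch could be located like this:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Optional;

final class BalanceReconciliation {

    // Hypothetical parsed operation: its amount and the balance the statement reports after it.
    record Operation(String description, BigDecimal amount, BigDecimal statementBalance) {}

    // Returns the first operation after which the calculated balance diverges from the statement.
    static Optional<Operation> firstMismatch(BigDecimal openingBalance, List<Operation> operations) {
        BigDecimal calculatedBalance = openingBalance;
        for (Operation operation : operations) {
            calculatedBalance = calculatedBalance.add(operation.amount());
            // compareTo ignores scale, so 10.0 and 10.00 are treated as equal
            if (calculatedBalance.compareTo(operation.statementBalance()) != 0) {
                return Optional.of(operation);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Operation> operations = List.of(
            new Operation("Deposit", new BigDecimal("1000.00"), new BigDecimal("1000.00")),
            new Operation("Buy AAPL", new BigDecimal("-150.25"), new BigDecimal("849.75")),
            new Operation("Unsupported fee", new BigDecimal("0.00"), new BigDecimal("847.75"))
        );
        firstMismatch(BigDecimal.ZERO, operations)
            .ifPresent(op -> System.out.println("First mismatch at: " + op.description()));
    }
}
```

In the sample data, the third operation stands in for one the parser doesn’t handle yet: it contributes 0.00 to the calculated balance while the statement reports a 2.00 deduction, so firstMismatch returns it.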
And that’s not all. Lightrun allows us to capture dynamic snapshots, as we will see in the next section.
Lightrun runtime snapshots
Another cool feature offered by Lightrun is the ability to capture runtime snapshots that contain both the stack trace and the variables available when the snapshot was taken.
Since RevoGain users are restricted to the countries where FastSpring, the external payment processor, currently operates, we want to investigate the cases when the user’s country cannot be resolved. For this reason, we are going to use the following Lightrun snapshot.
The Condition text field is used to activate the snapshot only when the country local variable is null, meaning the location cannot be fetched.
When trying to access the application from an IP address the GeoLocationService cannot process, we can see how Lightrun manages to capture the in-memory context at the time when the snapshot was created:
Notice the geoLocationDTO object that was captured at the moment when the country object couldn’t be resolved.
This is a very valuable feature since it allows us to collect a lot of information at once, rather than having to gather it through individual log entries.
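To appreciate what the snapshot saves us, here is a rough, hypothetical sketch of collecting similar context by hand: when the resolved country is null, it records the relevant local variables together with the current stack trace. The CountryResolver class, the resolveCountry method, and the GeoLocationDTO fields are assumptions made for illustration; the article doesn’t show the real GeoLocationService code, and Lightrun captures this context without any of it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-made "snapshot"; Lightrun captures this context without code changes.
final class CountryResolver {

    // Hypothetical DTO: countryCode is null when the geolocation lookup fails.
    record GeoLocationDTO(String ipAddress, String countryCode) {}

    // Each entry holds the local variables and the stack trace at capture time.
    final List<Map<String, Object>> snapshots = new ArrayList<>();

    String resolveCountry(GeoLocationDTO geoLocationDTO) {
        String country = geoLocationDTO == null ? null : geoLocationDTO.countryCode();
        // Hand-written equivalent of the snapshot condition: country == null
        if (country == null) {
            Map<String, Object> snapshot = new LinkedHashMap<>();
            snapshot.put("geoLocationDTO", geoLocationDTO);
            snapshot.put("country", country); // LinkedHashMap accepts null values
            snapshot.put("stackTrace", Thread.currentThread().getStackTrace());
            snapshots.add(snapshot);
        }
        return country;
    }

    public static void main(String[] args) {
        CountryResolver resolver = new CountryResolver();
        resolver.resolveCountry(new GeoLocationDTO("203.0.113.7", "RO"));
        // A private address, which a geolocation service typically cannot resolve
        resolver.resolveCountry(new GeoLocationDTO("10.0.0.1", null));
        System.out.println("Snapshots captured: " + resolver.snapshots.size()); // prints "Snapshots captured: 1"
    }
}
```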
Lightrun dynamic metrics
And we can also add metrics dynamically without changing the source code we are monitoring. For instance, I employ this feature to figure out how long it takes to validate email addresses using the Stop Forum Spam API.
The reason I’m validating email addresses is that there are a plethora of bots roaming the Internet, trying to flood our applications with useless accounts that consume space in the database.
Adding a duration metric using Lightrun is very easy and, just as with the dynamic logs and the runtime snapshots, we can do it directly from IntelliJ IDEA:
Now, every time a user registers, the isSpam method invocation is going to be intercepted and monitored by Lightrun, and we are going to get the call durations printed in the Lightrun console:
Awesome, right?
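Conceptually, a duration metric boils down to timing each invocation and aggregating the results. The sketch below is a hand-written equivalent, not Lightrun’s implementation: the DurationMetric and SpamCheckDemo classes are hypothetical, and the isSpam stub checks a hard-coded set instead of calling the Stop Forum Spam API, whose client code the article doesn’t show.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical manual equivalent of a duration metric; not thread-safe, a sketch only.
final class DurationMetric {
    private long count;
    private long totalNanos;
    private long maxNanos;

    // Runs the call, measuring its wall-clock duration even if it throws.
    <T> T time(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            count++;
            totalNanos += elapsed;
            maxNanos = Math.max(maxNanos, elapsed);
        }
    }

    long count() { return count; }
    double averageMillis() { return count == 0 ? 0 : totalNanos / 1_000_000.0 / count; }
    double maxMillis() { return maxNanos / 1_000_000.0; }
}

final class SpamCheckDemo {
    // Stub standing in for the real Stop Forum Spam lookup (an assumption for this sketch).
    static boolean isSpam(String email) {
        return Set.of("bot@spam.example").contains(email);
    }

    public static void main(String[] args) {
        DurationMetric metric = new DurationMetric();
        for (String email : new String[] {"alice@example.com", "bot@spam.example"}) {
            boolean spam = metric.time(() -> isSpam(email));
            System.out.println(email + " spam=" + spam);
        }
        System.out.printf("calls=%d, avg=%.3f ms, max=%.3f ms%n",
            metric.count(), metric.averageMillis(), metric.maxMillis());
    }
}
```

In a real web application, concurrent registrations would call isSpam from several threads, so hand-written timing code would also need thread-safe counters, which is one more thing Lightrun spares us from writing.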
Conclusion
Lightrun is easy to use yet very powerful, as we can inject logs, collect snapshots, or instrument our code with metrics without changing the production source code, which would otherwise require a redeployment. And that’s a big deal since I offer a live chat to my clients, so I can debug their production problems during our conversation. This allows me to provide a level of support to my clients that I couldn’t offer without a tool like Lightrun.
For this article, I used the Lightrun Free Tier, which is limited to 3 agents. However, since RevoGain is a majestic monolith, this is not an issue for me.
If you are using a microservice architecture and you wish to deploy more than 3 agents, then you will have to use the Professional edition instead.
This research was funded by Lightrun and conducted in accordance with the blog ethics policy.
While the article was written independently and reflects entirely my opinions and conclusions, the amount of work involved in making this article happen was compensated by Lightrun.
