21st century logging

I think logging should get more attention than we currently give it. When designing an application, a great deal of effort goes into modelling the customer business logic, making sure all use cases are covered and handled properly. The business model is mapped to a persistence store (be it an RDBMS or a NoSQL solution), and frameworks are chosen: web, middleware, batch jobs, and probably SLF4J with log4j or logback.

This has been the case for almost all applications I’ve been involved with, and logging was always a second-class citizen, relying on good old string-based logging frameworks.

But recently I have come to realize there is much more to logging than the current string-based logging systems offer. Especially if my system gets deployed in the cloud and takes advantage of auto-scaling, gathering text files and aggregating them in a common place smells like a hack.

In my latest application we implemented a notification mechanism that holds more complex information, since the string-based log wasn’t sufficient. I have to thank one of my colleagues, who opened my eyes when he said: “Notifications are at the heart of our application.” I had never thought of logging as the heart of any application. Business logic is the heart of the application, not logging. But there is a lot of truth in his words, since you can’t deploy something without a good mechanism for knowing whether your system is actually doing what it was meant to do.

So my notifications are complex objects (debug ones holding less data than error ones), and a NoSQL document database is a perfect store for our logs. A notification contains all sorts of data (a sketch follows the list):
– the current executing job,
– the source of data,
– the component where the log originated,
– exceptions being thrown,
– input arguments,
– the message history of the Spring Integration Message carrying our request.
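
To make this concrete, here is a minimal sketch of how such a notification might be persisted, assuming MongoDB as the document store (the post doesn’t name one); the collection layout, field names, and values are all hypothetical, using the MongoDB Java driver:

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    import java.util.Date;
    import java.util.List;

    public class NotificationLogger {

        public static void main(String[] args) {
            MongoCollection<Document> notifications = MongoClients
                .create("mongodb://localhost:27017")
                .getDatabase("logging")
                .getCollection("notifications");

            // An error-level notification; debug ones would carry fewer fields.
            Document notification = new Document("level", "ERROR")
                .append("createdOn", new Date())
                .append("job", "nightly-import")                      // current executing job
                .append("source", "partner-feed-42")                  // source of data
                .append("component", "PriceUpdateHandler")            // component where the log originated
                .append("exception", "java.net.SocketTimeoutException: Read timed out")
                .append("arguments", new Document("productId", 123L)  // input arguments
                    .append("retryCount", 3))
                .append("messageHistory",                             // Spring Integration message history
                    List.of("gateway", "router", "priceUpdateChannel"));

            notifications.insertOne(notification);
        }
    }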

Therefore, since I am able to store complex objects in a schema-less fashion, I am also able to query the logs, and the order in which they arrive doesn’t matter, since I can sort them by source and creation time. I can have a scheduled job generating alerts and reports when too many error entries are detected; a sketch of such a check follows.
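
Continuing the hypothetical MongoDB example above, the scheduled check might look something like this (the threshold and time window are made-up values):

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import org.bson.Document;

    import java.util.Date;

    public class ErrorAlertJob {

        private static final long ALERT_THRESHOLD = 50; // made-up threshold

        public static void main(String[] args) {
            MongoCollection<Document> notifications = MongoClients
                .create("mongodb://localhost:27017")
                .getDatabase("logging")
                .getCollection("notifications");

            // Count ERROR notifications created in the last five minutes.
            Date fiveMinutesAgo = new Date(System.currentTimeMillis() - 5 * 60 * 1000);
            long recentErrors = notifications.countDocuments(Filters.and(
                Filters.eq("level", "ERROR"),
                Filters.gte("createdOn", fiveMinutesAgo)));

            if (recentErrors > ALERT_THRESHOLD) {
                System.out.println("ALERT: " + recentErrors + " errors in the last 5 minutes");
            }
        }
    }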

This is a custom-built logging implementation, as we haven’t been using a dedicated framework for our notifications, but I get more value out of it than from classic string-based log files.

I still think log4j and logback are very good implementations, and we haven’t replaced them; we’ve only added an extra logging feature to overcome their limitations. But even with the new logback appenders, I still think the current string-based logs are way too simple for the requirements of production systems. And if you use them mostly for debugging purposes, while relying on additional monitoring solutions for production environments, then maybe it’s time to adopt a smart logging solution that works for both development and production environments.

If that was difficult to implement 10 years ago, when RDBMS ruled the storage world and file-based logging was a good trade-off, I think we now have the means to implement better logging frameworks. The current “string-based file logging” model might have been sufficient when our server scaled vertically on a single machine, but in a world of many horizontally distributed servers, this model requires extra processing.

Big players are already employing such new-generation logging systems: Facebook has Scribe, and LinkedIn built Kafka for log processing.

I really liked the LinkedIn solution, and it inspires me to reason about a new logging system working in a CQRS fashion, where log entries are events stored in a log database, and each event passes through a chain of handlers that update the current system state. This combines both logging and monitoring, and the monitoring queries go directly to a cached representation of the latest system state (a minimal sketch follows the list), which holds:

– alerts,
– status reports,
– monitoring views of the current system status.
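
To illustrate the idea, here is a minimal, hypothetical sketch of such an event-handler chain in plain Java; the event shape, handler names, and threshold are made up, and persisting the events to the log database is omitted:

    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Each log entry is an immutable event (hypothetical shape).
    record LogEvent(String source, String level, String message, long timestamp) {}

    // Handlers fold the event stream into a cached "latest system state".
    interface LogEventHandler {
        void on(LogEvent event, SystemState state);
    }

    // The read-side model that monitoring queries hit directly.
    class SystemState {
        final ConcurrentHashMap<String, AtomicLong> errorCountsBySource = new ConcurrentHashMap<>();
        volatile String lastAlert;
    }

    class ErrorCounter implements LogEventHandler {
        @Override
        public void on(LogEvent event, SystemState state) {
            if ("ERROR".equals(event.level())) {
                state.errorCountsBySource
                     .computeIfAbsent(event.source(), s -> new AtomicLong())
                     .incrementAndGet();
            }
        }
    }

    class AlertRaiser implements LogEventHandler {
        private static final long THRESHOLD = 100; // made-up value

        @Override
        public void on(LogEvent event, SystemState state) {
            AtomicLong count = state.errorCountsBySource.get(event.source());
            if (count != null && count.get() > THRESHOLD) {
                state.lastAlert = "Too many errors from " + event.source();
            }
        }
    }

    class LogEventBus {
        private final List<LogEventHandler> handlers;
        private final SystemState state = new SystemState();

        LogEventBus(List<LogEventHandler> handlers) {
            this.handlers = handlers;
        }

        // Write side: the event would also be persisted to the log database here.
        void publish(LogEvent event) {
            handlers.forEach(handler -> handler.on(event, state));
        }

        // Read side: monitoring reads the cached state, not the raw event log.
        SystemState currentState() {
            return state;
        }
    }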

How does this sound to you? Is it worth implementing such a solution? Should we start a new open-source, new-generation logging project?

If you have enjoyed reading my article and you’re looking forward to getting instant email notifications of my latest posts, you just need to follow my blog.

5 thoughts on “21st century logging”

  1. 1. Distributed logs contain a lot of useful information (monitoring, user actions, ads) and should be analyzed and processed both online and offline.
    2. Traditional tools, like RDBMSs, message queues, even NoSQL databases, cannot fulfill such high-throughput, low-latency requirements.
    3. Kafka and Scribe are the proper tools. Making logging more complicated than an RDBMS is useless. CQRS is not the proper theory.

    • Writing string logs to a file, then parsing them and extracting the monitoring info, sounds like CQRS to me. Kafka is great, but it doesn’t cover the storage part. Mongo can scale; e.g. Foursquare uses only Mongo, and they store millions of objects a day. For a production system you need smart logging; Scribe only solves the scaling part, and it’s still text-based. To me, text-based logging is like saving your business data in a file rather than a database.

  2. Hi-

    I’ve written this system for my current company. We upload gigabytes of text and binary medical documents every day, and we need to track all the activity: files, bytes, times, errors. We can’t forget files! We wrote a logging toolkit (Java) where every record is a JSON blob of ‘metrics’. Uploading a file might save a blob with the file name, current server, Java VM size stats, size of the file, time to upload, current thread, the user who uploaded, etc. This is all in one line of JSON text. All operations save some common fields and then unique fields for that operation.

    We then query these with Hive, a Hadoop project that does SQL-style queries over MapReduce. We can have a hundred machines storing logs on a distributed file system and run a query across all of it. Also, we store up-to-the-minute logs where Hive can see them directly, so we can run a query that sees the latest data.

    The point of a blob of fields is that a database query runs on a set. A DB query cannot rely on the fact that the records came in a sequence. Each record has to stand on its own with complete information.

    There are more details but I hope this gives a good picture of a database-oriented logging system.

    Lance Norskog
    lance.norskog@gmail.com

  3. Hi,

    We can’t ignore text files; after all, there are Linux logs, database logs, and so on. Even JMX resources can be transformed to JSON using JMXTrans, and then everything can be centralized in Hadoop. But setting up Hadoop for a small company running a medium-size application is going to be costly in terms of infrastructure and maintenance. Probably moving to the cloud can save some money, but I think Hadoop is the right tool for this job.

    Vlad

  4. Couldn’t agree more with the concept of logging centrally. In my opinion this is absolutely imperative for “stateless” applications where each request in a user “session” can potentially go to a different node in the application cluster.
    We have been logging into MongoDB for some time now. We keep the logging paradigm unchanged: we use slf4j backed by logback, with an appender that logs asynchronously to MongoDB. We also enforce some developer discipline by asking that logging be done not as a concatenated string but using {} placeholders where the values are substituted. That way the JSON has name=value pairs and querying becomes very easy (a sketch of that style follows).
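
    As an illustration of the placeholder discipline described above (a hypothetical logger and message, not the commenter’s actual code):

        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;

        public class OrderService {

            private static final Logger log = LoggerFactory.getLogger(OrderService.class);

            void placeOrder(long orderId, String userId) {
                // The {} placeholders keep the arguments as discrete values, so a
                // structured appender can store them as name=value pairs instead
                // of one opaque concatenated string.
                log.info("Order placed orderId={} userId={}", orderId, userId);

                // Avoid: log.info("Order placed " + orderId + " by " + userId);
            }
        }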
