Making Thread Dumps Intelligent

Making Thread Dumps Intelligent

Long back I had learnt about something called Log MDC, and I was a big fan of it. I was suddenly able to make sense of anything that happens in log files and pin-point to a specific log entry and find what’s right or wrong with it especially when it was about debugging a bug in production.

In 2013 I was commissioned to work on a project that was running through some troubled waters (combination of several things) and almost every week I had to go through several Java Thread Dumps trying to make sense what’s happening in the application to make it stop. Also, there were times when I had to have profilers like AppDynamic, jProfiler, jConsole all hooked up to the application trying to find what’s the issue, and more importantly what’s triggering the issue. jStack was one of the most helpful tools that I had worked with but the thread dumps being bumps had no contextual information that I could work with. I was stuck with seeing 10(s) of dumps with stack traces of what classes are causing the block but there was no information of what’s call and what inputs were causing the issues and it got frustrating very fast. Eventually we found the issues but they were mostly after several rounds of deep debugging the code with variety of data sets.

Once I was done with that project I swore that I will never find myself in that situation again. I Explored ways in which I can use something similar to Log4j’s NDC but have that in threads so that my dumps mean something. And i was able to find that I can change the ThreadName. And my next project I did use that very effectively. I recently came across an article that explains that concept very well. I am not going to rewrite everything they said, so here is a link to their blog post.

So last week I am starting a new project and as I get into coding the framework (using Spring 4.1 and Spring Boot), this is the first class I am writing for the application and ensuring that the filter gets into the code ASAP which not only helps us in post-production but also makes my development logs meaningful.

A copy of the code for both Log4j NDC, and setting up a ThreadName is below.

2 Comments on "Making Thread Dumps Intelligent"

    Hi Kapil,

    Great article, We did something same in ENOW using MDC where-in varnish set-up a unique identifier in header and same will logged by all downstream servers like webserver, AEM all. This helped us trace an error across all tech stacks.
    With-in the same system sometimes I have felt that it is as important to have meaningful error message to help debug faster- for example if code was expecting something and we don’t write that in log but throw a general exception, it could make debugging a little tough since in that case you will need to look at the code. In one of my project, even though I had request ID and tenant along with request URI, but since it was a author mistake (referring to a node which no more exist), with only those information I could not figure out the error unless I looked at the code. Developers should be guided on what is important to write.

    Hi Vineet, I know the feat you have achieved in ENOW. I spoke to Sruthi last week and he told me so. I am so happy that you were able to make it work (I recall that’s something we wanted to do since 2013 when ENOW all started).

    But, what I am saying here is something even better. This is something you can actually skip MDC and just rely on thread names to be logged + thread dumps will be so much more informative.

    And yes, you are right, end of the day logging is an art and there is no recipe or standard that we can tell people. When done incorrectly, none of this means nothing in production errors. Having said that, a clever LOG.MDC or a Thread Name can avoid a lot of heart burn.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: