Valuable Tools For Diagnostics Gathering and Troubleshooting

Introduction

The Oracle A-Team is often asked to help customers diagnose a wide range of JVM and SOA application issues.  Without fail, the customer will be asked for data regarding their application.  This is not application data, but rather data about the running application from the JVM’s perspective. The data we ask for normally includes Java thread dumps, garbage collection logs, Java heap dumps, and Java Flight Recorder recordings, one of our “new” favorites.

Once this information is available for analysis, there is a good chance that one of our team members will be able to home in on the underlying issues.  However, until such information is provided, the team is pretty much flying blind.

This article will provide a brief description of this diagnostic data and related files, how to collect them, and the tools available to work with them. Our team finds these diagnostic files and tools to be invaluable.  You should consider adding them to your toolbox to help your troubleshooting and performance tuning efforts.

Thread Dumps

A thread dump is a snapshot of the state of all the threads in the JVM process.  Each thread in the JVM is listed with a stack trace that shows its execution state.  A thread dump reveals what the Java application was doing at the time the dump was taken, as well as any other activity occurring in the JVM.  Since the Oracle SOA stack runs on WebLogic Server, the stack traces show what WebLogic Server is doing, such as handling incoming HTTP requests, authenticating a database adapter call, or dispatching a BPEL composite to perform some work.

It’s important to understand that a single thread dump provides only a small view of the activities within the JVM process.  To gain a better understanding of what is happening, it is necessary to take multiple thread dumps over a period of time.  The details vary with the type of issue being analyzed, but a typical recommendation is to take 3 to 5 thread dumps, 10 to 30 seconds apart.  Reviewing the thread snapshots across these intervals provides a bigger window into the behavior of the application.

How to Collect Java Thread Dumps

There are a number of ways to collect thread dumps from the JVM:

  • The most common method is to use the kill -3 <pid> command (on Unix), which sends the SIGQUIT signal to the JVM and causes it to write a thread dump to its standard output. In a clustered environment it is wise to create scripts that can be executed on each physical machine.

 

  • From the WebLogic server console, select the managed server in question, click on the “Monitor” tab then click on the “Threads” tab then click the “Dump Thread Stacks” button.

 

  • When using WebLogic Server, a thread dump can also be generated with WLST (see “How to take Thread Dumps with WLST”, Doc ID 1274713.1) or by accessing the administration server for the domain, selecting the server, and then requesting a thread dump.

 

  • On Windows, where kill -3 is not available, you can use the jstack command, which ships with the JDK and also works on Unix:

http://docs.oracle.com/javase/6/docs/technotes/tools/share/jstack.html
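
The recommendation to take several dumps a fixed interval apart is easy to script. The following is a minimal sketch for Unix, assuming the process ID of the JVM is already known. The function name take_thread_dumps and the DUMP_CMD override are hypothetical names introduced here for illustration, not part of any Oracle tooling.

```shell
#!/bin/sh
# Take a series of thread dumps from one JVM.
# Usage: take_thread_dumps <pid> [count] [interval_seconds]
# By default each dump is requested with "kill -3", so the dump itself is
# written by the JVM to its own standard output (typically the server's
# .out file), not to this script's output.
# DUMP_CMD is a hypothetical hook for substituting another command.
take_thread_dumps() {
    pid=$1
    count=${2:-5}
    interval=${3:-10}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "thread dump $i of $count for pid $pid at $(date '+%Y-%m-%d %H:%M:%S')"
        # Left unquoted on purpose so "kill -3" splits into command and argument.
        ${DUMP_CMD:-kill -3} "$pid" || return 1
        # No need to wait after the final dump.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, take_thread_dumps 12345 5 15 would request five dumps, 15 seconds apart, from process 12345 (a made-up pid). Setting DUMP_CMD=jstack and redirecting the function's output to a file should collect the dumps in that file instead, since jstack prints the dump to its own standard output.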

 

 

ThreadLogic

The ThreadLogic utility is a free tool you can download to assist in analyzing thread dumps taken from a JVM. (For the purposes of this discussion we will assume that WLS is running on the JVM.)  ThreadLogic can digest the log file containing the thread dump output.  It does a fair amount of the initial analysis for you, such as finding locks, the holders of locks, and fatal locking conditions.  If you’ve ever read a raw thread dump from a log file then you know it can be daunting, especially if you don’t know exactly what you are looking for.  ThreadLogic helps by recognizing the type of each thread and categorizing them so you can tell which threads are JVM threads, WLS threads, and “application” threads. In this context “application” means SOA, ADF, Service Bus, Coherence, etc.  In addition, ThreadLogic can process a series of thread dumps and perform a “diff” operation between them. This is helpful in determining what threads are doing over a period of time.

The utility will not give you the ultimate answer, but it can reduce the overall effort it takes to review the thread dumps.

You can find the ThreadLogic tool and download it at this location.

The following figures provide a sample of the ThreadLogic utility.  Figure 1 is the summary view for a selected thread dump.  Figure 2 demonstrates the view once a specific execute thread is selected.  Within the detail pane the advisories, as determined by ThreadLogic, and the stack trace of the selected execute thread are provided.

Figure 1 ThreadLogic Summary Page

Figure 2 ThreadLogic Thread Detail

 

Verbose Garbage Collection

Being able to review the garbage collection (GC) behavior of the JVM is critical to understanding whether or not GC issues are causing slowdowns, high CPU, or hangs when the JVM is not responding. By using the GC logs you can get detailed information on GC performance in order to determine whether the garbage collection algorithms or parameters should be changed.

Requesting the verbose output of the garbage collection has very little overhead.  The little bit of overhead added is well worth what is provided in return.  The following parameters should be added to the JVM (HotSpot) startup command line to ensure verbose garbage collection is captured.

  • -XX:+PrintGCDateStamps
  • -XX:+PrintGCDetails
  • -Xloggc:<gc_log_path>/${SERVER_NAME}.$(date +%s)_gc.log

 

For JRockit, use the following options:

  • -Xverbose:gc
  • -XverboseTimeStamp
  • -Xverboselog:<gc_log_path>/${SERVER_NAME}.$(date +%s)_gc.log

 

The parameters provided are strictly for the purpose of requesting garbage collection output.  These are to be in addition to the other JVM command line parameters. Note also that even though these GC log files are not large they should be managed, rotated, and cleaned up in the same manner as your other server logs so as not to overflow disk space.
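
As a rough illustration of how the HotSpot options might be added alongside the existing ones, the sketch below builds them in a shell function and appends them to JAVA_OPTIONS. It assumes the server is started from a shell script that passes JAVA_OPTIONS to the java command, as the standard WebLogic start scripts do. The function name gc_log_opts, the GC_LOG_DIR variable, and the fallback values are hypothetical.

```shell
#!/bin/sh
# Build the HotSpot GC logging options for one server.
# Usage: gc_log_opts <gc_log_dir> <server_name>
gc_log_opts() {
    gc_log_dir=$1
    server_name=$2
    # Epoch seconds make the log file name unique for each server start,
    # so a restart does not overwrite the previous GC log.
    stamp=$(date +%s)
    printf '%s %s %s\n' \
        "-XX:+PrintGCDateStamps" \
        "-XX:+PrintGCDetails" \
        "-Xloggc:${gc_log_dir}/${server_name}.${stamp}_gc.log"
}

# Append to whatever options are already set rather than replacing them.
# GC_LOG_DIR and the fallback values are placeholders for this sketch.
JAVA_OPTIONS="${JAVA_OPTIONS:-} $(gc_log_opts "${GC_LOG_DIR:-/tmp}" "${SERVER_NAME:-server1}")"
export JAVA_OPTIONS
```

Because each start produces a new file, old GC logs accumulate and need the same cleanup as the other server logs.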

Java Flight Recordings

Java Flight Recorder has been part of the JRockit JVM for many years and is an absolutely vital tool for identifying trouble spots in the JVM.  Beginning with JDK 1.7.0_40, the flight recorder features are also available in the Java HotSpot JVM.

The utility gathers detailed run-time information about how the Java Virtual Machine and the Java application are behaving. The data collected includes an execution profile, garbage collection statistics, optimization decisions, object allocation, heap statistics, thread details, and latency events for locks and I/O.  The utility provides very detailed metrics and performance information about JVM execution over a rolling time window. Overhead is extremely low because the monitoring functionality is built into the JVM and is not generated using bytecode instrumentation, as is the case with other profiling tools.

The Mission Control client is a graphical tool for analyzing the flight recording data that the JVM captures to a file. The client can also be attached to a running JVM to view real-time performance and profiling data.

To instruct the flight recorder to begin collecting this information and storing data for offline analysis, add the following JVM command line parameters for the HotSpot JVM.

  • -XX:+UnlockCommercialFeatures
  • -XX:+FlightRecorder
  • -XX:StartFlightRecording=maxage=<rolling amount of time>,filename=<filename>

 

Add the following command line parameter if you are running JRockit:

  • -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=<jfr_repos_path>,maxage=30m,dumponexit=true,dumponexitpath=<jfr_path>

 

When specifying the “rolling amount of time”, keep in mind that the recording must be long enough to span the time period you want to analyze. This is important in order to understand the behavior of the JVM and the application and to capture the specific events of interest.  The usual recommendation is to start with 30 minutes (maxage=30m).
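
The HotSpot options listed earlier can be assembled with a small helper so that the rolling window and the output file are set in one place. This is only a sketch: the function name jfr_opts is hypothetical, and the check on the maxage value (a whole number followed by s, m, h, or d) is a restriction of this sketch, not a statement of everything the JVM accepts.

```shell
#!/bin/sh
# Build the HotSpot flight recorder start options.
# Usage: jfr_opts <maxage, e.g. 30m> <filename>
jfr_opts() {
    maxage=${1:-30m}
    filename=$2
    # Strip one trailing unit letter; what remains must be all digits.
    num=${maxage%[smhd]}
    case "$num" in
        "" | *[!0-9]*) num="" ;;
    esac
    # Reject a bad number, a missing unit, or a missing file name.
    if [ -z "$num" ] || [ "$num" = "$maxage" ] || [ -z "$filename" ]; then
        echo "usage: jfr_opts <maxage such as 30m> <filename>" >&2
        return 1
    fi
    printf '%s %s %s\n' \
        "-XX:+UnlockCommercialFeatures" \
        "-XX:+FlightRecorder" \
        "-XX:StartFlightRecording=maxage=${maxage},filename=${filename}"
}
```

The output of jfr_opts 30m <filename> can then be appended to the existing JVM options in the server start script.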

Refer to the relevant Java HotSpot Flight Recorder or JRockit documentation for a complete understanding of the recording options and the Mission Control UI.

 

The flight recording offers many views into the JVM and application behavior.  The figures shown below are only a sampling: each is the summary page for one of the sections that the flight recorder provides.  Notice that there are several tabs at the bottom of each summary page.  Selecting a tab provides a deeper dive into the respective section.

 

Figure 3 JFR General Section

 

Figure 4 JFR Memory Section

Figure 5 JFR Code Section

Figure 6 JFR CPU/Threads Section

Figure 7 JFR Events Section

 

Heap Dump

The heap dump is invaluable when it comes to determining which Java objects are consuming the Java heap space.  This is critical to understand when an out of memory error occurs.  To ensure that a heap dump is generated when an OutOfMemoryError occurs, add the following parameters to the JVM command line.

  • -XX:+HeapDumpOnOutOfMemoryError
  • -XX:HeapDumpPath=<dump path>
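
A heap dump that cannot be written at the moment of failure is a lost opportunity, so it can be worth checking the dump location when the options are assembled. The sketch below does that before printing the two parameters; the function name heap_dump_opts is hypothetical, and the check is an added precaution in this sketch, not something the JVM requires.

```shell
#!/bin/sh
# Build the heap-dump-on-OutOfMemoryError options for a dump directory.
# Usage: heap_dump_opts <dump_dir>
heap_dump_opts() {
    dump_dir=$1
    # Fail early if the target directory is missing or not writable,
    # rather than finding out when the out of memory error occurs.
    if [ ! -d "$dump_dir" ] || [ ! -w "$dump_dir" ]; then
        echo "heap dump directory not usable: $dump_dir" >&2
        return 1
    fi
    printf '%s %s\n' \
        "-XX:+HeapDumpOnOutOfMemoryError" \
        "-XX:HeapDumpPath=${dump_dir}"
}
```

Heap dumps can be as large as the heap itself, so the directory should also have enough free space for at least one full dump.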

 

To get any value from the generated heap dump, you need an analysis tool such as the Memory Analyzer Tool (MAT).  The Memory Analyzer Tool is downloadable from here.   An overview of MAT and related techniques for Java heap dump analysis is beyond the scope of this post, but you can find more details on using MAT here.  The screenshots below show some of the heap dump views.

 

Figure 8 Heap Dump Overview

Figure 9 Heap Dump Histogram

Figure 10 Heap Dump Dominator Tree

Figure 11 Heap Dump Object Query Page

 

Summary

There are numerous free tools available to help identify Java application issues such as hung threads, blocking threads, slow application and JVM performance, garbage collection problems, and heap consumption issues.  Unfortunately, too many developers, admins, and architects either do not have these tools in their toolboxes or do not know how to use them.  Becoming familiar with them will let you troubleshoot and tune Java applications more effectively and more rapidly. It is highly recommended that you add them to your toolkit as soon as possible.

Comments

  1. Sherwood Zern says:

    When using the flight recording one may decide to obtain a flight recording from the command line. This is a benefit, since one does not have to modify the start scripts, which would require the restarting of the managed server(s).

    $ jcmd <pid> VM.unlock_commercial_features

    $ jcmd <pid> VM.check_commercial_features

    // Start the flight recording for the process identified by the pid, give the recording a name, and specify that it keeps a rolling 30 minutes of data.

    $ jcmd <pid> JFR.start name=my_recording maxage=30m

    // Dump the recording that has been collected thus far. A dump of the recording does not stop the recording. It only dumps what has been collected thus far.

    $ jcmd <pid> JFR.dump name=my_recording filename=filepath/filename.jfr

    // This actually stops the recording from collecting further data.
    $ jcmd <pid> JFR.stop name=my_recording
