Diagnosing performance issues front to back-end in WebLogic Server applications with Java Flight Recorder

Introduction

The Java Mission Control and Java Flight Recorder are relatively new tools that have extended greatly the diagnostics capabilities of the Java platform. They allow collecting an impressive amount of detailed runtime information about the JVM, with minimum performance impact, in a way that would have been hard to imagine a few years ago.

Java Mission Control is basically a set of tools that enables efficient and detailed analysis of the extensive data provided by Java Flight Recorder, which is the entity that lives in the JVM collecting a wide variety of runtime information. Java Flight Recorder used to be tightly integrated with the JRockit JVM, although it’s been bundled with the HotSpot JVM since Java 7 Update 40 release.

There is plenty of information out there about how to use both JMC and JFR in the form of blogs, videos and technical documentation, so I won’t cover that in much detail. The purpose of this article is to give a hint to developers unfamiliar with JFR on how to diagnose performance issues associated to an application flow triggered from the front-end.

 

Main Article

One of the most common scenarios that engineers working on applications deployed in WebLogic servers need to deal with is to diagnose a web application with poor performance.

Often, users complain about sluggishness after they click on a specific link of the application or as part of a specific operation. Also, it is common that these performance issues are not constant and happen rather randomly or intermittently.

Normally, getting a full picture of what could have gone wrong, from the front-end to the middle or back-end layers, requires a thorough analysis of all involved components. Depending on the logging capabilities or integrated diagnostic frameworks used by the application, the difficulty of debugging this way may vary, but in general, it becomes time consuming at the least.

If regular logging is only used, the debugger needs to correlate evidences of events and their timestamps in the log files from different components in order to get an idea of any potential bottlenecks.

Also, capturing fairly detailed performance information about an application is usually expensive, and typically requires enabling logging capabilities or using profiling tools based on the JVMPI/JVMTI interfaces, that may have a negative impact in performance as well.

Fortunately, Java Mission Control and Java Flight Recorder have made things much easier for everybody and have become the Holy Grail of Java application profiling, making it feasible to profile applications with virtually no performance degradation, which wasn’t possible a few years ago.

 

Capturing WebLogic event data with Java Flight Recorder

It is possible to integrate WebLogic and the Java Flight Recorder to collect event data from WebLogic containers, through the WebLogic Diagnostic Framework (WLDF). The overhead of enabling JFR and configure WLDF to generate WebLogic Server Diagnostics to be captured by JFR is minimal, and makes it ideal to be used in full-time basis, especially with production environments where it adds the greatest value.

Java Flight Recorder works with the concepts of events, which is the representation of a piece of data related to something that happened at a specific point in time.

When WebLogic is configured with the Oracle HotSpot JDK, JFR will be disabled by default and it will be necessary to enable it in one of the following ways:

 

1. By adding the following options in the java command that starts the WebLogic JVM:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

Since JFR is a commercial feature, you must specify ‘-XX:+UnlockCommercialFeatures’ before any other option, followed by ‘-XX:+FlightRecorder’ to enable JFR.

In the case of WebLogic, these options may be added to the startWebLogic.sh script that starts the Managed Server, in the JAVA_OPTIONS variable

Another alternative is adding them to the script: $DOMAIN_HOME/bin/setDomainEnv.sh

in the EXTRA_JAVA_OPTIONS variable.

For instance, if you wanted to start the server with a continuous recording, you can add the following parameters:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording= name=MyRecording,filename=myrecording.jfr

Continuous recordings have no specific duration and will keep recording until the JVM shuts down or until the recording is explicitly dumped through the jcmd ‘JFR.dump’ option. They are ideal for production systems, although it is possible to time-fix recordings by adding the ‘duration’ parameter to the jcmd command.

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording= duration=60s,name=MyRecording,filename=myrecording.jfr

Important: If you’re not familiar with editing these files, create a backup copy before making any changes.

 

2. By using the jcmd tool:

Jcmd is pretty flexible and has plenty of options, but the following command should suffice for starting a recording in the same fashion as the previous example:

$./jcmd <Managed_server_pid> JFR.start name=MyRecording filename=myrecording.jfr settings=default
For getting the PID of your managed server, you may get a list of running Java processes by running the jcmd command alone:
$ ./jcmd
You will see the Managed server process names and PID’s easy to identify as ‘weblogic.Server’.

 

3. By starting a recording from the Java Mission Control GUI:
It is also possible to use the Java Mission Control GUI itself, although for this alternative to work, you must start the Managed Server to be profiled with the ‘+XX:+UnlockCommercualFeatures’ and ‘+XX:+FlightRecorder’ parameters, as outlined in alternative #1. This method is as flexible as #2 in that it allows starting and stopping a recording on demand, in this case through the GUI.
One of the components available in the GUI is the JVM Browser, which you will see at the left side of the window when you open it:

JVM Browser

You can start a recording by following these steps:

– Start the application to be profiled with the following arguments to enable JFR:

-XX:+UnlockCommericalFeatures –XX:+FlightRecorder

– Start Java Mission Control by running $JAVA_HOME/bin/jcmd.

– You will see the JVM Browser at the left of the window. The JVM Browser is a component that shows a list of the JVM’s discovered in the host where the tool is running.
You can select the JVM you’d like to start JFR on and right-click on it, then select ‘Start Flight Recording’.
– In the next window, select ‘Timed Fixed Recording’ or ‘Continuous Recording’ depending on your needs.
If you need to create a duration-based recording, you can specify the recording time, otherwise, maximum age and maximum size options are available for continuous recordings.
– In the next couple of screens you can configure, optionally, advanced event capturing settings like ‘Method sampling’ (that defines how often JFR samples your methods), Thread Dump periodicity (every 1, 10 or 60s.), and Exceptions (errors only, or all errors and exceptions).

– Click on ‘Finish’.

– The recording will be created and once it’s done, it will be displayed in Mission Control.

Of course, there’s much more on how to start JFR in terms of options and capabilities it supports, but you can refer to the Java Flight Recorder Runtime Guide in the Java SE Documentation site and to the many tutorials available in the Internet.
It is possible to configure JFR to generate the recording after the JVM exits or after certain duration, after which a recording file will be generated as specified in the filename parameter.
It is important to mention that the amount of event data for the WebLogic Server captured by JFR can be adjusted for each event by configuring diagnostic volumes. Normally, the diagnostic volume control will be set to “Low”, which has minimal performance impact on the Server. It is possible to gather more detailed information by setting the volume to “Medium” or “High”, although the performance impact should be assessed (specially with “High”). For more information, refer to “Configuring WLDF Diagnostic Volume”:

 

The WebLogic Execution Context ID (ECID)

In Fusion Middleware, an Execution Context ID (ECID) is a unique identifier that is injected as a request enters the system, and that is associated with an execution flow, as it goes from front-end components like the Oracle HTTP Server to others like Oracle WebCache, multiple WebLogic servers, to the Oracle Database. It can be used to correlate events occurred in different layers that are associated to the execution flow of the same request.

The ECID will be tracked and will be used and displayed by tools like JMC and JFR as long as the WebLogic Diagnostic Framework is enabled in any format, like any of the following:

– By configuring WLDF settings in the Administration Console.
– By writing and running WLST scripts
– Programmatically, by using JMX and WLDF configuration MBeans.
– By editing the XML configuration files directly.

For the purposes of this article, one recommendation would be enabling a Diagnostic Context for the WebLogic servers you want to profile, or defining a Diagnostic Module that targets those servers. Both alternatives, along with many other advanced WLDF options, can be configured easily in the Administration Console.

The documentation on WLDF is very extensive and detailed so I won’t go into that in detail, but you can refer to the “Configure the WebLogic Diagnostic Framework” chapter in the Oracle WebLogic Server Administration Console Online Help for more information.

Once WLDF is enabled, an ECID will be created and initialized when a request enters the system (for example, when a client makes an HTTP request). The ECID will remain associated to the request, even as it crosses the JVM boundaries to other products (OHS, SOA Server, etc.).

We will use the ECID to correlate the events captured by JFR related to WebLogic.

 

Viewing WebLogic diagnostic data with Java Mission Control

We have referred to the Java Mission Control (JMC), which is the tool for viewing and analyzing the data collected by JFR.
JMC can be launched with the command $JAVA_HOME/bin/jmc, which starts a GUI application that you can use to open and view JFR recording files (that normally have .jfr extension).
However, in order to be able to see the data captured from WebLogic, we will need to install the WebLogic plug-in for JMC.
You can do so by following the next steps:

1. Open JMC by running $JAVA_HOME/bin/jmc
2. Once the GUI has been launched, go to the ‘Help’ menu and click on the option ‘Install new software’.
3. In the upcoming window, expand the ‘Flight Recorder plug-ins’ option from the list and select the ‘WebLogic Tab Pack’ checkbox.
4. Click ‘Next’ and follow the remaining instructions to install the plug-in.
5. Restart Java Mission Control.

Once these steps are completed, you should be able to see an additional ‘WebLogic’ tab in the left pane when you open a JFR recording with WebLogic event data, as follows:

 

JMC WebLogic tab 2

As you may have noticed, there are plenty of places in the recording where we can look for issues. Depending on the context we have of the problem, we may want to look at a specific sub-tab if we are suspicious about a particular type of problem (like the Database sub-tab).
The possible scenarios and use cases are countless, of course, so I’ll just focus on a couple of them that may give an idea of how to use the extensive information provided through the JMC GUI.

Scenario #1: Find expensive events from Servlet invocations

One possible scenario we may find is that we want to drill down from a particular URL invoked to the underlying layers where the performance issue could be. Application users may have told us about an URL particularly slow. In this case, the WebLogic plug-in lets us focus on events associated to one or more URI invocations.

You can follow the next steps:

1. Click on the WebLogic tab in the left pane of the JMC window.

2. Go to the ‘Servlets’ sub-tab. You will see the Servlets window containing different sections, including Servlet Operations, Servlet Invocations by URI and a chart at the top showing the peaks of occurring events.

Servlets subtab

3. Take a look at the ‘Servlet Invocations by URI’ panel at the bottom. It will contain a list of URI’s invoked during the recording, along with useful attributes like their sample count and average duration. If your application serves client HTTP requests through Servlets (as it is commonly the case) this is a place where you would search for the URI related to the particular action that you identify as poorly performing:

Servlets by URI

If the performance issue is actually located in your Java application and it really happens within the execution flow of the URL you suspect, you will see its corresponding URI showing up in the list with a high average duration.

4. Add the corresponding URI from the list to an Operative Set.

Operative Sets are a feature of Java Mission Control that allows grouping a set of events having some very specific properties. The user may pick specific events of interest from different tabs (like the WebLogic tab) and add them to the Operative Set, which can be reached and utilized from almost every tab in the JMC user interface, especially from the Events tab of the JMC GUI, from where they can be managed, analyzed or removed.

In order to create the Operative Set with the events related to the execution of a given URI invocation, follow the next steps:

4.1. Select the corresponding URI from the list and right-click to get the context menu.
4.2. Select ‘Operative Set’ -> ‘Add Related Events’

OperativeSet submenu 1

4.3. In the upcoming submenu, select ‘With ECID=<very long hexadecimal id>.. (First in Selection)’.

OperativeSet submenu 2

This will add to the Operative Set the events associated with the first ECID found with an invocation of the URI in question (with the very long hexadecimal ID value).

It is also possible to add all the events related to all the invocations of the indicated URI (even with different ECID’s) by selecting ‘With any ECID in Selection’ in this step. The same would work if option ‘With URI=<URI_being_selected>’ is selected.

After this, the events added to the Operative Set will be able to be viewed in the Events tab.

5. Go to the ‘Events’ tab in the left vertical pane and visualize the events data. You will see multiple sub-tabs for different interesting aspects of the Event data.

Events tab

 

 By default, the Events tab will show data for all the events in the recording. However, that can be limited in each tab to the data in our Operative Set by checking the ‘Show only Operative Set’ checkbox.

6. Search the ‘Event types’ tab, that appears in the left side of the JMC window and select the ‘WebLogic Server’ checkbox. This will include WebLogic events to be shown in the Events tab as well:

Event types WebLogic checkbox

7. Click on the ‘Log’ sub-tab. This sub-tab displays a table if the available events. When you select one of the events, all the information (attribute values) in the event will be shown to you in the lower half of the tab. Again, in this tab you can opt by only seeing what’s in the Operative Set by checking the same box:

Events Log subtab

8. Play with the data shown.

As you may notice, the event log table has several attributes of interest by default:

– Event type
– Start time
– End time
– Duration
– Thread
– Java Thread Id
– OS Thread Id

Depending on what you need, you may sort the table by any of these columns. For instance, you can sort by duration and check the event type that took longer to execute.

The following example shows a sequence of events sorted by duration. Since we are only showing events associated with the same ECID, it is possible that the events on top of the list are aggregating time from underlying events.

Events Log by Duration

Once you identify an event type that is particularly time consuming, you can select it and its corresponding attributes will show at the bottom (in the ‘Event attributes’ pane). Among them, you can see the executing class name, the method name, and the thread stack of the event with the corresponding line number of the running code.

Events Log details 1

In the previous example, I’ve selected an event of type ‘JDBC Statement execute’ with duration of 645 microseconds. Whilst in this case such duration is insignificant, lengths of magnitude of several milliseconds may be too high for production systems running those statements concurrently thousands of times for hundreds of users. Fortunately, JFR will provide enough details based on the event type, like the SQL statement and the JDBC invoking class (weblogic.jdbc.wrapper.PreparedStatement here), which may have negative performance repercussions if not chosen adequately.

We can also see the sequence of events that occurred before and after the event in question by keeping it selected, and then sorting by ‘Start time’. That will give you a clear picture of the flow of events and their duration.

Event Log details 2

Tip: You may chose different combinations depending on your specific use case. For instance, in the case of application components running concurrently with multiple users, you may want to check if all the executions had a similar issue. In that case, you could add all the events related to invocation executions for a given URI by selecting ‘With URI=<your_URI>’ in step 4.3.

 

Scenario #2: Find expensive executions by their ECID’s

If we are not sure of what URL/URI invocation ended up in a performance issue but have an estimated time when the issue happened, you can also search long running executions that occurred around such time and then find their associated events, with their respective durations and URI’s.

Follow these steps:

1. Click on the WebLogic tab in the left pane of the JMC window.
2. Click on the ‘ECIDs’ sub-tab.

WebLogic ECID subtab

3. You will see three main sections in the window (Events chart, ECIDs and ECID log).

4. In the ECIDs table, you may sort by ‘Start time’ and then locate the ones close to the time where you suspect the issue occurred. An alternative is to sort by the ‘Duration’ column and find the executions with the longest duration.

Events Log ECID by Duration

5. Select the ECID you’d like to see more about.

6. Look at the ECID log table at the bottom. It will show the sequence of events for that particular ECID, including their start and end time, their duration and their URI (if the URI column is not visible, you can enable it by right clicking on any event, then selecting ‘Visible Columns’ -> URI).

Event Log ECID details

Once you identify an event that looks suspicious, you can go to the ‘Events’ tab -> Log sub-tab and locate it, getting additional information about it like the Event thread stack, just like it’s explained in the scenario #1.

 

Conclusion

What I’ve shown here is the teeny tiny tip of a huge iceberg of possibilities brought by the Java Flight Recorder and Java Mission Control. They put together things that, in the past, needed to be looked up in different places from different tools, and make it incredibly easy to access key information about your system.

Nowadays, as application containers like WebLogic have become the cornerstone of modern critical business applications, having the capability of doing an efficient troubleshooting against the clock may be the difference between success and failure of enterprise applications in many cases.

As platforms and languages evolve, we see that advanced diagnostic features play a more core, relevant role. We can just imagine what is coming in years ahead, but for now there are still a lot of possibilities to be found with these two great Java features.

Happy diagnosing!

Add Your Comment