Forwarding Messages from WebLogic to TIBCO EMS

Introduction Messaging is one of those technologies that have been around for decades, but only recently people got aware of the important role that messaging plays to any software-based architecture. Any use case that requires messages to be exchanged between systems in a reliable, fault-tolerant and scalable way may leverage messaging to fulfill its needs. […]

Identity Cloud Services and Weblogic Federation with Virtual Users and Groups

Introduction Federation is a well-known pattern and has been discussed at length on this blog. Almost every vendor or cloud provider out there supports Federation and it’s been around for quite some time now. In this blog post, I will talk about Federation again, but this time in combination with Weblogic’s Virtual Users and Groups. […]

Transport Level Security (TLS) and Java

Know Which Versions of TLS are Supported in Recent Java Versions NOTE:  A more comprehensive examination of TLS and what to examine when setting up web service integrations in Oracle Cloud Saas extensions has been published.  See Transport Layer Security (TLS) and Web Service Connections in SaaS Integrations In the twenty-plus years of the Internet’s […]

Understanding listen ports and addresses to effectively troubleshoot Fusion Applications

Introduction: To communicate with any process, you need to know at least 3 things about it – IP address, port number and protocol. Fusion Applications comprises many running processes. End users communicate directly with only a handful of them, but all processes communicate with other processes to provide necessary services. Understanding various IP addresses and listen ports […]

WebLogic Server: Saving Disk Space in /tmp

Introduction Many WebLogic Server (WLS) implementations use JRockit 28 as the JVM implementation. JRockit 28 comes with the very useful JRockit Flight Recorder which helps in many troubleshooting situations. Problem In high volume WLS implementations with many domains and many managed servers the Flight Recorder could fill up the disk of the temporary file storage […]

Addition by Subtraction – A Quick Win in Performance

Recently, I worked with a customer where we would notice severe performance degradation over time within JCS (Java Cloud Service). The product used was Oracle Service Bus (OSB) where it would take around 2 seconds to process a request and send a response.  Over time, we would see the same request/response take over 1 minute!  If we […]

Caching in OSB 12c without Out-Of-Process Coherence Servers

Introduction One of the most popular use cases for Oracle Service Bus (OSB) is the mediation of synchronous service interactions. In this scenario, a client invokes the service through a proxy instead of the actual service endpoint, guaranteeing that the consumer is decoupled from the producer. This type of architecture allows producers to be changed […]

Understanding the use of “WebLogic Plugin Enabled”

Introduction: Requests to a WebLogic Server (WLS) usually go through a web server or a load balancer which serve as a proxy for the client requests. When the WLS requests are “front-ended” by either a web server or a load-balancer, the requests are handled via a plugin. It is important for WLS to be aware of the […]

Where did my MDB concurrency go?

Problem Description Recently, there have been numerous customers reporting a similar symptom with regards to MDB (Message Driven Bean) concurrency within WebLogic Server.  The complaint is that not all JMS messages are being consumed in a timely manner and in fact some messages look to be consumed in a serial fashion instead of concurrently. Here is […]

Diagnosing High CPU problems with Java Mission Control

How to diagnose high CPU problems with Java Mission Control? Recently we worked on couple of customer escalations where we have extensively used Java Mission Control to resolve high CPU problems. That prompted me to write an article on Java Flight Recording analysis using Java Mission Control (JMC) In this posting, I’m going to teach you […]

A crude JMS performance comparison between different WebLogic server persistence stores

When it comes to persisting service and subsystems data (i.e. JMS), WebLogic server offers customers a choice: filesystem or relational database using JDBC.  Persistence store has implications on WLS performance and systems management.  In this post, I will provide an unofficial JMS performance comparison using different persistence stores. PLEASE NOTE: This is not an official Oracle performance benchmark.  […]

WebLogic Server 12.1.2 and Coherence 12.1.2: New Technical Whitepapers

Several new technical whitepapers on WebLogic Server 12.1.2 and Coherence 12.1.2 have recently been released. In case you missed them, this post provides links to some of the more interesting ones.  You’ll need to log into to download these.   Oracle WebLogic: 12.1.1 Business Overview – PDF download This whitepaper provides an overview of […]

WebLogic Server Cluster Messaging Protocols

by Robert Patrick and Sabha Parameswaran WebLogic Server clusters form a loosely-federated group of managed servers that provide a model for applications to leverage for achieving scalability, load balancing, and failover. Each managed server manages its view of what servers are in the cluster and its own cluster-wide view of the JNDI tree. The cluster […]

Oracle Dynamic Monitoring Service – metricdump Log

Introduction Have you ever wondered how to control/configure the metricdump log files that accumulate in your WebLogic Server domains configured with Fusion Middleware?  This blog will help to answer that question. Main Article When running WebLogic Server instances configured with Fusion Middleware (e.g., SOA Suite), there are Dynamic Monitoring Service (DMS) metrics that are gathered every 3 […]

Log rotation for WebLogic Server (and friends)

If you have a number of WebLogic Server instances running, or applications that write a lot of information into the logs, you might find that your log files, and your stdout files start to eat up a lot of space … Continue reading

A Universal JMX Client for Weblogic – Monitoring BPEL Thread Pools in SOA 11g

Monitoring and optimizing BPEL Thread Pool utilization (and other metrics) is one of the key activities in performance tuning of BPEL/SOA based integrations. Although EM console provides some basic monitoring of the BPEL engine statistics, it is limited regarding the update interval, detail and the recording interval and cannot display historic data. Of course you can setup Grid Control 11g with its Repository, but this is in most cases too complex to setup just for monitoring during performance and load testing.

So, the idea came to create a tool which can easily record these statistics and export them to MS Excel or OpenOffice to create charts for the thread pool utilization over a time period (for example a whole load test execution).

All values of WLS or the SOA engine can be queried using the JMX MBean framework. I have designed the JMXClient to be able configure which MBeans should be queried by using a property file ( I decided to connect in this first release to only one Managed Server of WLS to record / export data. This means that if you have a WLS cluster, you need to start multiple JMX Clients to record the values of each node. (In a later release the JMX Client could be optimized to query all nodes automatically).

JMXClient can be used by downloading from the project page at Sourceforge.  (including JDeveloper project and sources)

After that you need to configure

  1. your connection properties, JAVA_HOME and WLS_HOME of  your WLS managed server of SOA in  jmxclient.bat (or
  2. the MBeans names, WLS Server name  and the attributes to record in classes/  (you can find the MBean names form the System MBean Browser in EM)

The syntax in jmxclient.bat is

java -cp classes;%INCLUDE_LIBS% jmxclient.JMXClient <server> <user> <password> -monitor 1000

For example

java -cp classes;%INCLUDE_LIBS% jmxclient.JMXClient 7001 weblogic welcome1 -monitor 1000

“1000” specifies the interval in milliseconds between the recording.

Then you can run it with

jmxclient > out.txt

Then simple import this text file using Excel or OpenOffice and a comma “,” as delimiter and create a line chart using line 2 as titles and lines 3 to end as data.

Let me first show a couple of results using JMXClient using the properties to record the BPEL thread pool statistics:

The following chart shows a scenario where the invoke thread pool is much too low (20) so that the queue of scheduled invocations waiting for a free thread is growing rapidly:


The second example shows a scenario where invoke and callback threads are within normal limits:


In the next posts I will show how to use JMXClient to record the BPEL process execution times or the number of messages in the AIA 11g JMS queues by simply exchanging the file…..!

Have fun,

DISCLAIMER: JMXClient is provided for free use “as is” without any support or warranty. Please provide enhancements or modifications you make yourself. Feedback is welcome using the comments feature of this blog.

PS: for the experts: the format of the file:

Every line contains 3 items separated by semicolon:

  1. the name of the MBean to query
  2. the attribute to query
  3. the title string which should be displayed for the column

Example for the bpel thread pools:

oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/invoke,type=soainfra_bpel_requests;active_maxValue;Invoke Active Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/invoke,type=soainfra_bpel_requests;scheduled_maxValue;Invoke Scheduled Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/invoke,type=soainfra_bpel_requests;scheduled_value;Invoke Scheduled Current
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/invoke,type=soainfra_bpel_requests;active_value;Invoke Active Value
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/invoke,type=soainfra_bpel_requests;threadCount_value;Invoke Threads Value
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/invoke,type=soainfra_bpel_requests;threadCount_maxValue;Invoke Threads Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/system,type=soainfra_bpel_requests;active_maxValue;System Active Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/system,type=soainfra_bpel_requests;scheduled_maxValue;System Scheduled Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/system,type=soainfra_bpel_requests;scheduled_value;System Scheduled Current
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/system,type=soainfra_bpel_requests;active_value;System Active value
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/system,type=soainfra_bpel_requests;threadCount_value;Invoke Threads Value
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/system,type=soainfra_bpel_requests;threadCount_maxValue;Invoke Threads Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/engine,type=soainfra_bpel_requests;active_maxValue;Engine Active Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/engine,type=soainfra_bpel_requests;scheduled_maxValue;Engine Scheduled Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/engine,type=soainfra_bpel_requests;scheduled_value;Engine Scheduled Current
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/engine,type=soainfra_bpel_requests;active_value;Engine Active value
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/engine,type=soainfra_bpel_requests;threadCount_value;Engine Threads Value
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/engine,type=soainfra_bpel_requests;threadCount_maxValue;Engine Threads Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/audit,type=soainfra_bpel_requests;active_maxValue;Audit Active Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/audit,type=soainfra_bpel_requests;scheduled_maxValue;Audit Scheduled Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/audit,type=soainfra_bpel_requests;scheduled_value;Audit Scheduled Max
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/audit,type=soainfra_bpel_requests;active_value;Audit Active value
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/audit,type=soainfra_bpel_requests;threadCount_value;Audit Threads Value
oracle.dms:Location=AdminServer,name=/soainfra/engines/bpel/requests/audit,type=soainfra_bpel_requests;threadCount_maxValue;Audit Threads Max

Starting a cluster

Recently, I have been involved in a number of discussions with people who are setting up clusters of various Fusion Middleware products, often on an Exalogic machine.  These discussions have led me to feel that it would be worth sharing … Continue reading

ThreadLogic version 0.95 available

  ThreadLogic version 0.95 is available for public download now. Additional Features Biggest change is support for externalizing the Advisories and group definitions. Users can use the pre-defined AdvisoryMap.xml to come up with custom …

Introducing ThreadLogic

Eric Gross and Sabha Parameswaran, from Oracle FMW Architects Team (The A-team) are happy to introduce ThreadLogic, an open source Thread Dump Analyzer tool, to Middleware community.Motivation behind ThreadLogic  The current set of TDA tools (Sam…

Analyzing Thread Dumps in Middleware – Part 4

This posting is the fourth and final section in the series Analyzing Thread Dumps in Middleware

In this section, we will see a new version of TDA (Thread Dump Analysis) tool developed by the author of this blog (Sabha Parameswaran in collaboration with his colleague, Eric Gross, also from Oracle A-Team) and its capabilities as well as some real world samples of thread dump analysis before concluding the series.

TDA A-Team – an Enhanced TDA (version 3.0)

As mentioned earlier, the current TDA tools (Samurai/TDA) don’t mine the data inside thread dumps or provide a more detailed view of what each thread is doing while just limiting themselves to reporting the state (locked/waiting/running) or the lock information. No mention of type of activity within a thread, should it be treated as normal or deserving a closer look? Can a pattern or anti-pattern be applied against them? Any possible optimizations? Are there any hot spots? Any classification of threads based on their execution cycles?

In order to fill some of the gaps in the existing TDA Tools as well as provide a detailed analysis of WebLogic Server specific thread dumps, the author of this blog (Sabha Parameswaran) decided to developed a custom TDA tool that can suggest set of actions and recommendations based on some predefined thread execution patterns.

In collaboration with Oracle A-team colleague, Eric Gross, the author decided to enhance the existing open source TDA version 2.2 instead of reinventing the wheel and leveraging the capabilities of base TDA to handle base parsing of the thread dump, the UI and Reporting. Eric Gross enhanced the TDA to fully support Sun Hotspot, JRockit (support was partial for JRockit in base TDA v2.2) and IBM JVM Thread dumps as well as adding support for custom categories in the TDA tool. Sabha Parameswaran enhanced the tool’s analytics capabilities – grouping of threads based on functionality and tagging of threads with advisories using pre-defined rules and patterns which can be extended to handle additional patterns.

We wish to thank Ingo Rockel, Robert Whitehurst and numerous others who had contributed to the original TDA which allowed us build on their work in delivering a more powerful tool for the entire Java community.

Once a thread dump is parsed and threads details are populated, each of the thread is then analyzed against matching advisories and tagged appropriately. The threads are also associated with specific Thread Groups based on functionality or thread group name.

Each of the advisory has a health level indicating severity of the issue found, pattern, name, keyword and related advice. 
Samples of advisories:
Thread Advisory Name Large # of WLS Muxer Threads
Health Level
Keyword WebLogicMuxerThreads
Description Large number of WLS Muxer Threads
Advice Reduce number of WLS Muxer Threads to under 4, use -Dweblogic.SocketReaders=NoOfThreads flag in command line


Thread Advisory Name Stuck Thread
Health Level
Keyword STUCK
Description Thread is Stuck, request taking very long time to finish
Advice Check why the thread or call is taking very long?? Is it blocked for unavailable or bad resource or contending for Lock? Can be ignored if it is doing repeat work in a loop (like adapter threads polling for events in a infinite loop)…
Thread Advisory Name Hot Spots
Health Level
Keyword HotCallPattern
Description Multiple Threads executing same code path
Advice Ensure there are no blocking locks or bottlenecks, sufficient resources are available,remote service being invoked is responsive and scaling well to handle increased load

Each of the advisory gets triggered based on either call execution patterns observed in the thread stack or presence of other conditions (thread blocked or multiple threads blocked for same lock can trigger BlockedThreads Advisory). Sometimes a thread might be tagged as IGNORE or NORMAL based on its execution logic or might be tagged more specifically as involved in JMS send or receive client or a Servlet thread.

The health levels (in descending of severity) are FATAL (meant for Deadlocks, STUCK, Finalizer blocked etc), WARNING, WATCH (worth watching), NORMAL and IGNORE. Based on the highest criticality of threads within a group, that health level gets promoted to the Thread Group’s health level and same is repeated at the thread dump level. There can be multiple advisories tagged to a Thread, Thread Group and Thread Dump.

Snapshot of Advisory Map

Advisory Map

Snapshot of Threads tagged with advisories in the thread dump

Threads in a thread dump tagged with Advisories/Health Levels

Thread Groups Summary

The threads are associated with thread groups based on the functionality or thread names. Additional patterns exists to tag other threads (like iWay Adapter, SAP, Tibco threads) and group them. The summary page reports on health level of the group, total number of threads, threads that are blocked, critical advisories found etc.

Thread Groups Summary

Critical Advisories per thread group
The critical advisories (at Warning/Fatal health levels) found in individual threads are then promoted to the the parent thread group and reported in the thread group summary page.

Critical Advisories for Thread Group

Thread Groups

One can see the thread groups are divided into two buckets – WLS and non-WLS related threads. The JVM threads, Coherence, LDAP and other unknown custom threads go under the non-WLS bucket while all the WLS, Muxer, Oracle, SOA, JMS, Oracle Adapter threads are all under the WLS bucket.


 Individual Thread tagging with Advisories

Clicking on the individual threads will display the advisories and thread stack.

Advisories and details at thread level

The details of the advisory will pop up on mouse over on the advisory links.
The Advisories are color coded and  details can be highlighted.

Colored advisories for individual threads

Sub-groups are also created within individual Thread Groups based on Warning Levels, Hot call patterns (multiple threads executing same code section), threads doing remote I/O (socket or db reads) etc.

Following snapshot shows example of a Hot call pattern where multiple threads appear to be executing the same code path (all are attempting to update the MBean Server with a new mbean).

Hot Call Pattern – multiple threads exhibiting similar code execution

Merging of threads across multiple thread dumps and reporting of progress in the thread state

Merge has been enhanced to report on the progress of the thread across the thread dumps. Based on the order of the thread dumps, the thread stack trace is compared for every consecutive thread dump.

Merged view showing progress information for individual threads  

Merged reporting of individual thread stack traces (exists in base TDA version 2.2)

Merged Thread stack traces across thread dumps

Merging can also be done across multiple thread dump log files (like in case of IBM which creates new log file containing the thread dump every time a request is issued).

Usability benefits of new TDA A-Team
Thanks to the advisories and health levels, its easy for users to quickly understand the usage patterns, hot spots, thread groups, as well as highlight the patterns or anti-patterns already implemented in the advisory list.

For example of an anti-pattern: a Servlet thread should not be waiting for an event to occur as this will translate to bad performance for end user. Similarly usage of synchronous jms consumers might be less performant compared to using async consumers. Too many WLS Muxer threads is not advisable. If WLS Muxer or Finalizer threads are blocked for unrelated locks, this will be a fatal condition. It would be okay to ignore STUCK warning issued by WLS Server for Pollers like AQ Adapter threads but not for other threads that are handling servlet request.

The thread groups help in bunching together related threads; so SOA Suite users can see how many BPEL Invoke and Engine threads are getting used, B2B users can see number of JMS consumers/producers, WLS users can look at condition and health of Muxer threads, similarly for JVM/Coherence/LDAP/other thread groups.

The merged report lets the user see at a glance the critical threads and check if they are progressing or not instead of  wading through numerous threads and associated thread dumps.

We hope this tool can really help both beginners and experts do their jobs more quickly and efficiently when it comes to thread dumps. 

Real World Sample of Thread Dump Analysis

A Customer was planning on a POC (Proof of Concept) of WebLogic Portal Server (WLP) on the new Oracle Exalogic Platform and measure its performance relative to their existing configurations. The test configuration used a cluster of WLP servers, communicating via Multicast for cluster membership and other service announcements. As soon as the servers were brought up, the servers became unresponsive. The server instances could not serve any request – neither traffic to Portal Application nor WLS Admin Server requests for status from the server instances. Following errors get getting thrown in the server: 

UnsyncCircularQueue$FullQueueException: Queue exceeded maximum capacity of: ‘65536’ elements. 

There were also frequent log messages of p13n cache getting updated and flushed repeatedly. The p13n cache is the Personalization cache used internally by Portal for handling Portal personalization. Each of the servers in the cluster are kept of changes in cache via multicast broadcasts. When the customer disabled the p13n cache, the server regained normalcy and responsiveness. But the p13n cache is a requirement for portal application functionality and its not possible to have it disabled.

The FullQueueException essentially indicated that the WLS’s internal request queue was getting overwhelmed with requests and its starting to ignore/drop new requests as its already has 64K requests waiting to be executed. These requests can be internal (like transaction timeouts, schedulers, async processing) or externally generated (like client http requests). The storm of requests coupled with p13n being a key presence for the issue indicated p13n cache is triggering an internal flood of requests to the server somehow. It was decided to take thread dumps to understand what the p13n cache is executing.

Analyzing the thread dumps threw light on a weird behavior.

    at java/lang/Thread.sleep(J)V(Native Method)
at weblogic/cluster/MulticastFragmentSocket.sendThrottled(
at weblogic/cluster/MulticastFragmentSocket.send(
^-- Holding lock: weblogic/cluster/MulticastFragmentSocket@0x1d4afd4c8[thin lock]
at weblogic/cluster/FragmentSocketWrapper.send(
at weblogic/cluster/MulticastSender.fragmentAndSend(
at weblogic/cluster/MulticastSender.send(
^-- Holding lock: weblogic/cluster/MulticastSender@0x1c7ce3628[thin lock]
at com/bea/p13n/cache/internal/system/SystemCacheManager.doFlushCluster(
at com/bea/p13n/cache/internal/system/SystemCacheManager.flushClusterKeys(
at com/bea/p13n/cache/CacheManager.flushKeys(

at com/bea/p13n/cache/internal/system/SystemCacheManager.doFlushLocal(
at com/bea/p13n/cache/internal/system/SystemCacheManager.flushLocalKeys(
at com/bea/p13n/cache/internal/system/CacheClusterMessage.execute(
at weblogic/cluster/MulticastReceiver$
at weblogic/work/ExecuteThread.execute(
at weblogic/work/

The above thread stack indicates the MulticastReceiver used by WLS internally to handle multicast announcements got some request/announcement that gets handled by the p13n cache. It appears to trigger a flush of local keys in the cache. But then it gets modified as a flush of cluster keys which then results in a MulticastSender sending out a cluster broadcast.

This essentially means for every announcement that lands on a server instance, it will resend the announcement to other members whcih then gets replicated for every member across the cluster leading to an infinite looping of the same message and contributing to a never ending multicast packet storm.

Analyzing the product code against the stack trace, it appeared that there was a mismatch between the CacheManager.flushKeys() implementation as the method code lines didn’t match. So, it was apparent that there was some variant form of CacheManager that was getting picked by the application while handling the cluster announcements.

    at java/lang/Thread.sleep(J)V(Native Method)
at weblogic/cluster/MulticastFragmentSocket.sendThrottled(
at weblogic/cluster/MulticastFragmentSocket.send(


^-- Holding lock: weblogic/cluster/MulticastFragmentSocket@0x1d4afd4c8[thin lock]
at weblogic/cluster/FragmentSocketWrapper.send(
at weblogic/cluster/MulticastSender.fragmentAndSend(
at weblogic/cluster/MulticastSender.send(

^-- Holding lock: weblogic/cluster/MulticastSender@0x1c7ce3628[thin lock]

at com/bea/p13n/cache/internal/system/SystemCacheManager.doFlushCluster(
at com/bea/p13n/cache/internal/system/SystemCacheManager.flushClusterKeys(
at com/bea/p13n/cache/CacheManager.flushKeys(

at com/bea/p13n/cache/internal/system/SystemCacheManager.doFlushLocal(
at com/bea/p13n/cache/internal/system/SystemCacheManager.flushLocalKeys(
at com/bea/p13n/cache/internal/system/CacheClusterMessage.execute(

at weblogic/cluster/MulticastReceiver$
at weblogic/work/ExecuteThread.execute(
at weblogic/work/

Using WLS ClassLoader Analysis Tool (CAT), we were able to drill down to where this particular CacheManager class was getting loaded from. It was packaged inside a jar that got added to the Portal EAR application libraries. Removing that particular jar containing the CacheManager resolved the problem. Later investigation revealed that the jar came from an older version of the Portal Product and should not have been used with the latest version of Portal that was being tested on Exalogic. The analysis of a single thread stack trace helped pin point the root cause for all the hang conditions in this scenario.

Limitations of Thread Dumps

Although Thread dumps can throw light on multiple things, there are also some limitations.

  • It is a snapshot of a running system
    • The system should have some representative load or exhibit symptoms when capturing thread dumps
    • The dumps are of no value if none of the above
  • It cannot show details on memory usage
  • It can show what is the pattern executed but not how many times it was executed previously (unless there are multiple threads dumps captured at periodic intervals).
  • It cannot show details of the request that is driving the code execution or which remote service is slow
    • Same set of framework code can get executed for SOA Suite BPEL Processes or OSB Proxies
    • we wont be able to determine which external service is blocking the response for a thread
    • Would require analysis of server logs/proxies/access logs and or heap dumps to identify slow services and then drill down from there or other forms of monitoring (EM/OSB Console monitoring statistics)
Thread Dump takeaways
Even if thread dump does not give a direct solution, it can definitely provide good pointers as to where the problem lies and what are the hotspots.
  • Finalizer doing heavy work (too many finalizable objects to clean up) in each successive thread dump would indicate memory handling issues (and possibly overuse/misuse of finalize – lazy cleanup or recreating additional objects during finalize instead of active deleting/cleaning up)
  • Busy GC Threads would indicate memory thrashing
  • Threads stuck on socket reads would indicate slow reads or backends not responding in a timely fashion or socket closures due to firewall or other bad socket handling
  • Threads stuck in socket connect indicates the backend service itself is unavailable or not listening at that address or wrong connection detail or connection recreation every time.
  • Threads stuck in database sql reads/executes can indicate db response issue or sql tuning issue
  • Threads blocked in opening files, jars, zips or checking on file attributes can indicate problem in reading the application bits or files on a remote storage or network mount point that’s will slow down server startup/deployment. One customer problem was related to server instance accessing WLS install & deployed application bits from a remote site that was about a hundred miles away.
  • Threads blocked on locks for synchronized code imply bottlenecks
    • Resolve the bottleneck by increasing the resources under contention
    • Avoid the synchronized calls entirely (ex: logging might be excessively synchronized, reduce logging levels or turn off logging ) whenever possible
    • Optimize the call execution to reduce lock time
  • Some of the issues might then require further investigation into environment (process, cpu, memory, io, network etc) or services/partners/actors/clients to resolve the issues
  • For cluster related or distributed system issues, thread dumps should be captured and analyzed from all related parties (cluster/services). 
Thread Dump Analysis is a really valuable and key weapon in every Java Developer and Architect’s arsenal when it comes to understanding, identifying and resolving performance or other issues. I hope this series (along with the new enhanced TDA tool) helps makes it easier and useful for people to troubleshoot, tune performance and get a better handle over complex Java/JEE systems.