This posting is the second section in the series Analyzing Thread Dumps in MiddlewareThis section details with when, how to capture and analyze thread dumps with special focus on WebLogic Application Server related thread dumps. Subsequent sections will deal with more real-world samples and tools for automatic analysis of Thread Dumps.
Everyone must have gone through periodic health checkups. As a starting point, Doctors always order for a blood test. Why this emphasis on blood test? Can't they just go by the patient's feedback or concerns? What is special about a blood test?
Blood Test helps via:
The Thread Dump is the equivalent of a blood test for a JVM. It is a state dump of the running JVM and all its parts that are executing at that moment in time.
Under what conditions should we capture thread dumps? Anytime or specific times? Capturing thread dumps are ideal for following conditions:
The cost associated with capturing thread dumps is close to near zero; Java Profilers and other tools can consume anywhere from 5 to 20% overhead while a thread dump is more of a snapshot of threads which requires no real heavy weight processing on part of the JVM. There can be some cost only if there are too many interrupts to the JVM within real short intervals (like dumps forever every second or so).
There are various options to capture thread dumps. JVM vendor tools allow capture and gather of thread dumps. Operating System interrupt signals to the process can also be used in generating thread dumps.
Sun Hotspot's jstack tool (under JAVA_HOME or JRE Home/bin) can generate thread dumps given the jvm process id. Similarly, jrcmd (from JRockit Mission Control) can generate thread dumps given a jvm process id and using the print_threads command. Both require to be run on the same operating system as the jvm process and both dump the output to the user shell.
In Unix, kill -3 <PID> or kill -SIGQUIT<PID> will generate thread dumps on the JVM process. But the output of the thread dump will go to the STDERR. Unless the STDERR was redirected to a log file, the thread dump cannot be saved when issued via kill signals. Ensure the process is always started and STDERR is redirected to a log (best practice to also have STDOUT redirected to same log file as STDERR).
In Windows, Ctrl-Break to the JVM process running in foreground can generate thread dumps. The primary limitation is the process needs to be running in the shell. Process Explorer in windows can also help in generating thread dumps but its much more problematic to get all the thread stacks in one shot. One has to wade through each thread and gets its stack. Another thing to keep in mind is, JVM might ignore interrupts if they were started with -Xrs flag.
WebLogic Application Server allows capture of thread dumps via additional options:
But generating thread dumps via Weblogic specific commands is not recommended as the JVM itself might be sluggish or hung and never respond to higher app level commands (including weblogic). Issuing thread dumps via the WLS Admin Console requires the Admin server to be first up and healthy and also in communication with the target server.
Sometimes, even the OS level or JVM vendor specific tools might not be able to generate thread dumps. The causes can include memory thrashing (Out Of Memory - OOM) and thread deaths in the JVM, the system itself under tough constraints (memory/cpu) or the process is unhealthy and cannot really respond to anything in predictable way.
Now that we can capture thread dumps, what to observe? Its always recommended to take multiple thread dumps at close intervals (5 or 6 dumps at 10-15 seconds intervals). Why? A thread dump is a snapshot of threads in execution or various states. Taking multiple thread dumps allows us to peek into the threads as they continue execution.
Compare threads based on thread id or name across successive thread dumps to check for change in execution. Observe change in thread execution across thread dumps. Change in the thread state and or stack content for a given thread across successive thread dumps implies there is progress while no change or absence of progress indicates either a warning condition or something benign.
So, what if a given thread is not showing progress between thread dumps? Its possible the thread is doing nothing and so has no change between dumps. Or its possible its waiting for an event to happen and continues in same state and stack appearance. These might be benign. Or it might be blocked for a lock and continues to languish in the same state as the owner of the lock is still holding on the same lock or someone else got the lock ahead of our target thread. Its good to figure out which of the conditions applies and rule out benign cases.
One can use the following as rule of thumb while analyzing Application Server side thread dumps. Some additional pointers for WebLogic Specific threads.
"schedulerFactoryBean_Worker-48" id=105 idx=0x1e4 tid=7392 prio=5 alive, waiting, native_blocked -- Waiting for notification on: org/quartz/simpl/SimpleThreadPool$WorkerThread@0xd5ddc168[fat lock] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method) at java/lang/Object.wait(J)V(Native Method) at org/quartz/simpl/SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:519) ^-- Lock released while waiting: org/quartz/simpl/SimpleThreadPool$WorkerThread@0xd5ddc168[fat lock] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method) -- end of trace "Agent ServerConnection" id=17 idx=0xa0 tid=7154 prio=5 alive, sleeping, native_waiting, daemon at java/lang/Thread.sleep(J)V(Native Method) at com/wily/introscope/agent/connection/ConnectionThread.sendData(ConnectionThread.java:312) at com/wily/introscope/agent/connection/ConnectionThread.run(ConnectionThread.java:65) at java/lang/Thread.run(Thread.java:662) at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method) -- end of trace
"DynamicListenThread[Default]" id=119 idx=0x218 tid=7465 prio=9 alive, in native, daemon at sun/nio/ch/ServerSocketChannelImpl.accept0(Ljava/io/FileDescriptor;Ljava/io/FileDescriptor;[Ljava/net/InetSocketAddress;)I(Native Method) at sun/nio/ch/ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145) ^-- Holding lock: java/lang/Object@0xbdc08428[thin lock] at weblogic/socket/WeblogicServerSocket.accept (WeblogicServerSocket.java:30) at weblogic/server/channels/DynamicListenThread$SocketAccepter.accept(DynamicListenThread.java:535) at weblogic/server/channels/DynamicListenThread$SocketAccepter.access$200(DynamicListenThread.java:417) at weblogic/server/channels/DynamicListenThread.run(DynamicListenThread.java:173) at java/lang/Thread.run(Thread.java:662)
If its WebLogic specific thread dump,
"[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" id=71 idx=0x15c tid=10158 prio=9 alive, waiting, native_blocked, daemon -- Waiting for notification on: weblogic/work/ExecuteThread@0x1755df6d8[fat lock] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method) at jrockit/vm/Locks.wait(Locks.java:1973)[inlined] at java/lang/Object.wait(Object.java:485)[inlined] at weblogic/work/ ExecuteThread.waitForRequest(ExecuteThread.java:160)[optimized] ^-- Lock released while waiting: weblogic/work/ExecuteThread@0x1755df6d8[fat lock] at weblogic/work/ExecuteThread.run(ExecuteThread.java:181) at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method) -- end of trace
"ExecuteThread: '4' for queue: 'weblogic.socket.Muxer'" id=31 idx=0xc8 tid=8777 prio=5 alive, blocked, native_blocked, daemon -- Blocked trying to get lock: java/lang/String@0x17674d0c8[fat lock] at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method) at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized] at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized] at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized] at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized] at weblogic/socket/EPollSocket<b>Muxer.processSockets(EPollSocketMuxer.java:153) at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29) at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42) at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145) at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117) at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method) -- end of trace ExecuteThread: '5' for queue: 'weblogic.socket.Muxer'" id=32 idx=0xcc tid=8778 prio=5 alive, native_blocked, daemon at jrockit/ext/epoll/EPoll.epollWait0(ILjava/nio/ByteBuffer;II)I(Native Method) at jrockit/ext/epoll/EPoll.epollWait(EPoll.java:115)[optimized] at weblogic/socket/EPollSocket/Muxer.processSockets(EPollSocketMuxer.java:156) ^-- Holding lock: java/lang/String@0x17674d0c8 [fat lock] at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29) at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42) at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145) at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117) at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method) -- end of trace
WebLogic tags threads internally as Active and Standby based on its internal workmanager and thread pooling implementation. There is another level of tagging based on a thread getting returned to the thread pool or not within a defined time period. If the thread does not get returned within a specified duration its tagged as Hogging and if it exceeds even higher limits (default of 10 minutes), its termed as STUCK. It means the thread is involved in a very long activity which requires some manual inspection.
The java thread stack content in a thread dump are always in text/ascii format enabling a quick read without any tools. Almost invariably, the thread stack is in English which lets the user read the package, classname and method call. These three apsects help us understand the general overview of the call the thread is executing, who invoked it and what it is invoking next thereby establishing the caller-callee chain as well as what is getting invoked. The method name gives us a general idea of what the invocation is about. All of these data points can show what is getting executed by the thread even if we haven't written the code or truly understand the implementation.
"[ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.rjvm.ResponseImpl@1b1d86 WAITING java.lang.Object.wait(Native Method) weblogic.rjvm.ResponseImpl.waitForData</b>(ResponseImpl.java:84) weblogic.rjvm.ResponseImpl.getTxContext(ResponseImpl.java:115) weblogic.rjvm.BasicOutboundRequest.sendReceive(BasicOutboundRequest.java:109) weblogic.rmi.internal.BasicRemoteRef.invoke(BasicRemoteRef.java:223) javax.management.remote.rmi.RMIConnectionImpl_1001_WLStub.getAttribute(Unknown Source) weblogic.management.remote.common.RMIConnectionWrapper$11.run(ClientProviderBase.java:531)…… weblogic.management.remote.common.RMIConnectionWrapper.getAttribute(ClientProviderBase.java:529) javax.management.remote.rmi.RMIConnectorRemoteMBeanServerConnection.getAttribute(RMIConnector.java:857) weblogic.management.mbeanservers.domainruntime.internal.ManagedMBeanServerConnection.getAttribute(ManagedMBeanServerConnection.java:281 weblogic.management.mbeanservers.domainruntime.internal.FederatedMBeanServerInterceptor.getAttribute(FederatedMBeanServerInterceptor.java:227 weblogic.management.jmx.mbeanserver.WLSMBeanServerInterceptorBase.getAttribute(WLSMBeanServerInterceptorBase.java:116) weblogic.management.jmx.mbeanserver.WLSMBeanServerInterceptorBase.getAttribute(WLSMBeanServerInterceptorBase.java:116)…… weblogic.management.jmx.mbeanserver.WLSMBeanServer.getAttribute(WLSMBeanServer.java:269) javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1387) javax.management.remote.rmi.RMIConnectionImpl.access$100(RMIConnectionImpl.java:81)…… weblogic.rmi.internal.ServerRequest.sendReceive(ServerRequest.java:174) weblogic.rmi.internal.BasicRemoteRef.invoke(BasicRemoteRef.java:223) javax.management.remote.rmi.RMIConnectionImpl_1001_WLStub.getAttribute(Unknown Source) javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:857) javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:175) weblogic.management.jmx.MBeanServerInvocationHandler.doInvoke(MBeanServerInvocationHandler.java:504) weblogic.management.jmx.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:380) $Proxy61.getName(Unknown Source) com.bea.console.utils.DeploymentUtils.getAggregatedState(DeploymentUtils.java:507) com.bea.console.utils.DeploymentUtils.getApplicationStatusString(DeploymentUtils.java:2042) com.bea.console.actions.app.DeploymentsControlTableAction.getCollection(DeploymentsControlTableAction.java:181)
In the above stack trace, we can see the thread belong to the "weblogic.kernel.Default (self-tuning)" group with Thread Name as "ExecuteThread: '4'" and is waiting for a notification on a lock. Some WLS Specific observations based on the stack:
ACTIVE ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aabc9820800 nid=0x7a7d runnable 0x000000004290d000 java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:199) at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:185) at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:202)- locked <0x00002aab7b441ee0> (a java.lang.Object) at sun.security.provider.NativePRNG$RandomIO.access$300(NativePRNG.java:108)at sun.security.provider.NativePRNG.engineGenerateSeed(NativePRNG.java:102) at java.security.SecureRandom.generateSeed(SecureRandom.java:495) at com.bea.security.utils.random.AbstractRandomData.ensureInittedAndSeeded(AbstractRandomData.java:83) - locked <0x00002aab6da12720> (a com.bea.security.utils.random.SecureRandomData) at com.bea.security.utils.random.AbstractRandomData.getRandomBytes(AbstractRandomData.java:97) - locked <0x00002aab6da12720> (a com.bea.security.utils.random.SecureRandomData) at com.bea.security.utils.random.AbstractRandomData.getRandomBytes(AbstractRandomData.java:92) at weblogic.management.servlet.ConnectionSigner.signConnection(ConnectionSigner.java:132) - locked <0x00002aaab0611688> (a java.lang.Class for weblogic.management.servlet.ConnectionSigner) at weblogic.ldap.EmbeddedLDAP.getInitialReplicaFromAdminServer(EmbeddedLDAP.java:1268) at weblogic.ldap.<b>EmbeddedLDAP.start(EmbeddedLDAP.java:221) at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
In the above thread stack, it shows WebLogic LDAP is just starting up and is attempting to generate a Random Seed Number using the entropy of the file system. This thread can stay in the same state if there is not much activity in the file system and the server might appear hung at startup.• Solution would be to add -Djava.security.egd=file:/dev/./urandom to java command line arguments to choose a urandom entropy scheme for random number generation for Linux.
Note: this is not advisable to be used for Production systems as the random number is not that truly random.
"[ACTIVE] ExecuteThread: '32' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x0000002c5ede5000 nid=0x2ca6 waiting for monitor entry [0x000000006a6cc000] java.lang.Thread.State: BLOCKED (on object monitor) at weblogic.messaging.kernel.internal.QueueImpl.addReader(QueueImpl.java:1082) - waiting to lock <0x0000002ac75d1b80> (a weblogic.messaging.kernel.internal.QueueImpl) at weblogic.messaging.kernel.internal.ReceiveRequestImpl.start(ReceiveRequestImpl.java:178) at weblogic.messaging.kernel.internal.ReceiveRequestImpl.<init>(ReceiveRequestImpl.java:86) at weblogic.messaging.kernel.internal.QueueImpl.receive(QueueImpl.java:841) at weblogic.jms.backend.BEConsumerImpl.blockingReceiveStart(BEConsumerImpl.java:1308) at weblogic.jms.backend.BEConsumerImpl.receive(BEConsumerImpl.java:1514) at weblogic.jms.backend.BEConsumerImpl.invoke(BEConsumerImpl.java:1224) at weblogic.messaging.dispatcher.Request.wrappedFiniteStateMachine(Request.java:961) at weblogic.messaging.dispatcher.DispatcherServerRef.invoke(DispatcherServerRef.java:276) at weblogic.messaging.dispatcher.DispatcherServerRef.handleRequest(DispatcherServerRef.java:141) at weblogic.messaging.dispatcher.DispatcherServerRef.access$000(DispatcherServerRef.java:34) at weblogic.messaging.dispatcher.DispatcherServerRef$2.run(DispatcherServerRef.java:111) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
This thread stack shows WLS adding a JMS Receiver to a Queue.
Following thread stack shows Oracle Service Bus (OSB) Http Proxy service blocked for response from an outbound web service callout and got tagged as STUCK as there was no progress for more than 10 minutes.
[STUCK] ExecuteThread: '53' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, waiting, priority=1, DAEMON&> -- Waiting for notification on: java.lang.Object@7ec6c98[fat lock] java.lang.Object.wait(Object.java:485) com.bea.wli.sb.pipeline.PipelineContextImpl$SynchronousListener.waitForResponse() com.bea.wli.sb.pipeline.PipelineContextImpl.dispatchSync() stages.transform.runtime.WsCalloutRuntimeStep$WsCalloutDispatcher.dispatch() stages.transform.runtime.WsCalloutRuntimeStep.processMessage() com.bea.wli.sb.pipeline.StatisticUpdaterRuntimeStep.processMessage() com.bea.wli.sb.stages.StageMetadataImpl$WrapperRuntimeStep.processMessage() com.bea.wli.sb.stages.impl.SequenceRuntimeStep.processMessage() com.bea.wli.sb.pipeline.PipelineStage.processMessage() com.bea.wli.sb.pipeline.PipelineContextImpl.execute() com.bea.wli.sb.pipeline.Pipeline.processMessage() com.bea.wli.sb.pipeline.PipelineContextImpl.execute() com.bea.wli.sb.pipeline.PipelineNode.doRequest() com.bea.wli.sb.pipeline.Node.processMessage() com.bea.wli.sb.pipeline.PipelineContextImpl.execute() com.bea.wli.sb.pipeline.Router.processMessage() com.bea.wli.sb.pipeline.MessageProcessor.processRequest() com.bea.wli.sb.pipeline.RouterManager$1.run() com.bea.wli.sb.pipeline.RouterManager$1.run() weblogic.security.acl.internal.AuthenticatedSubject.doAs() weblogic.security.service.SecurityManager.runAs() com.bea.wli.sb.security.WLSSecurityContextService.runAs() com.bea.wli.sb.pipeline.RouterManager.processMessage() com.bea.wli.sb.transports.TransportManagerImpl.receiveMessage() com.bea.wli.sb.transports.http.HttpTransportServlet$RequestHelper$1.run()
What constitutes a Hot Spot? Hot spot is a set of method calls that gets repeatedly invoked by busy running threads (not idle or dormant threads). It can be as simple healthy pattern like multiple threads executing requests to the same set of servlet or application layer or it can be a symptom of bottleneck if multiple threads are blocked waiting for the same lock.
Attempt to capture thread dumps and identify hot spots only after the server has been subjected to some decent loads by which time most of the code have been initialized and executed a few times.
For normal hot spots, try to identify possible optimizations in the code execution.
It might be that the thread is attempting to do a repeat look of JNDI resources or recreation of a resource like JMS Producer/consumer or InitialContext to a remote server.
Cache the JNDI resources, Context and producer/consumer objects. Pool the producers/consumers if possible, reuse the jms connections. Avoid repeated lookups. It can be as simple as resolving an address by doing InetAddress.getAllByName(), cache the results instead of repeat resolutions. Same way, cache XML Parsers or handlers to avoid repeat classloading of xml parsers/factories/handlers/implementations. While caching, ensure to avoid memory leaks if the instances can grow continuously.
As Goldfinger says to James Bond in "Goldfinger", Once is happenstance, Twice is coincidence and the third time is enemy action when it comes to threads blocked for same lock. Start navigating the chain of the locks, and try to identify the dependencies between the threads and the nature of locking and try to resolve them as suggested in Reducing Locks Section 1 of this series.
This entire exercise can be a bit like pealing onions as one layer of bottleneck might be masking or temporarily resolving a different bottleneck. It requires repetition of analyzing, fixing, testing and analyzing.
Sometimes a thread blocking others from obtaining a lock might itself be on blocked list for yet another lock. In those cases, one should navigate the chain to identify who and what is blocking it. If there are deadlocks (circular dependency of locks), the JVM might report the deadlock automatically. But its best practice to navigate and understand the chains as that will help in resolving and optimizing the behavior for the better.
Let take the following thread stack. The ExecuteThread: '58' is waiting to obtain lock on the java/util/TreeSet@0x4bba82c0
"[STUCK] ExecuteThread: '58' for queue: 'weblogic.kernel.Default (self-tuning)'" id=131807 idx=0x90 tid=927 prio=1 alive, blocked, native_blocked, daemon -- Blocked trying to get lock: java/util/TreeSet@0x4bba82c0[thin lock] at jrockit/vm/Locks.monitorEnter(Locks.java:2170)[optimized] at weblogic/common/resourcepool/ResourcePoolImpl$Group.getCheckRecord(ResourcePoolImpl.java:2369) at weblogic/common/resourcepool/ResourcePoolImpl$Group.checkHang(ResourcePoolImpl.java:2463) at weblogic/common/resourcepool/ResourcePoolImpl$Group.access$100(ResourcePoolImpl.java:2210) ........... at weblogic/common/resourcepool/ResourcePoolImpl.reserveResourceInternal(ResourcePoolImpl.java:450) at weblogic/common/resourcepool/ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:329) at weblogic/jdbc/common/internal/ConnectionPool.reserve(ConnectionPool.java:417) at weblogic/jdbc/common/internal/ConnectionPool.reserve(ConnectionPool.java:324) at weblogic/jdbc/common/internal/MultiPool.searchLoadBalance(MultiPool.java:312) at weblogic/jdbc/common/internal/MultiPool.findPool(MultiPool.java:180) at weblogic/jdbc/common/internal/ConnectionPoolManager.reserve(ConnectionPoolManager.java:89)
Search for the matching lock id among the rest of the threads. The lock is held by ExecuteThread '26' which itself is blocked for oracle/jdbc/driver/T4CConnection@0xf71f6938
"[STUCK] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)'" id=1495 idx=0x354 tid=7858 prio=1 alive, blocked, native_blocked, daemon -- Blocked trying to get lock: oracle/jdbc/driver/T4CConnection@0xf71f6938[thin lock] at jrockit/vm/Locks.monitorEnter(Locks.java:2170)[optimized] at oracle/jdbc/driver/OracleStatement.close(OracleStatement.java:1785) at oracle/jdbc/driver/OracleStatementWrapper.close(OracleStatementWrapper.java:83) at oracle/jdbc/driver/OraclePreparedStatementWrapper.close(OraclePreparedStatementWrapper.java:80) at weblogic/jdbc/common/internal/ConnectionEnv.initializeTest(ConnectionEnv.java:940) at weblogic/jdbc/common/internal/ConnectionEnv.destroyForFlush(ConnectionEnv.java:529) ^-- Holding lock: weblogic/jdbc/common/internal/ConnectionEnv@0xf71f6798[recursive] at weblogic/jdbc/common/internal/ConnectionEnv.destroy(ConnectionEnv.java:507) ^-- Holding lock: weblogic/jdbc/common/internal/ConnectionEnv@0xf71f6798[biased lock] at weblogic/common/resourcepool/ResourcePoolImpl.destroyResource(ResourcePoolImpl.java:1802) at weblogic/common/resourcepool/ResourcePoolImpl.access$500(ResourcePoolImpl.java:41) at weblogic/common/resourcepool/ResourcePoolImpl$Group.killAllConnectionsBeingTested(ResourcePoolImpl.java:2399) ^-- Holding lock: java/util/TreeSet@0x4bba82c0</b>[thin lock] at weblogic/common/resourcepool/ResourcePoolImpl$Group.destroyIdleResources(ResourcePoolImpl.java:2267) ^-- Holding lock: weblogic/jdbc/common/internal/GenericConnectionPool@0x5da625a8[thin lock] at weblogic/common/resourcepool/ResourcePoolImpl$Group.checkHang(ResourcePoolImpl.java:2475) at weblogic/common/resourcepool/ResourcePoolImpl$Group.access$100(ResourcePoolImpl.java:2210) at weblogic/common/resourcepool/ResourcePoolImpl.checkResource(ResourcePoolImpl.java:1677) at weblogic/common/resourcepool/ResourcePoolImpl.checkResource(ResourcePoolImpl.java:1662) at weblogic/common/resourcepool/ResourcePoolImpl.makeResources(ResourcePoolImpl.java:1268) at weblogic/common/resourcepool/ResourcePoolImpl.makeResources(ResourcePoolImpl.java:1166)
The oracle/jdbc/driver/T4CConnection@0xf71f6938 lock is held by yet another thread MDSPollingThread-owsm which is attempting to test the health of its jdbc connection (that it reserved from the datasource connection pool) by invoking some basic test on the database. The socket read from the DB appears to the one is either slow or every other thread kept getting the lock except for ExecuteThread 58 & 26 as both appear STUCK in the same position for 10 minutes or longer. Having multiple thread dumps will help confirm if the same threads are stuck in the same position or there was change in ownership of locks and these threads were just so unlucky in getting ownership of the locks.
"MDSPollingThread-[owsm, jdbc/mds/owsm]" id=94 idx=0x150 tid=2993 prio=5 alive, in native, daemon at jrockit/net/SocketNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BIII)I(Native Method) at jrockit/net/SocketNativeIO.socketRead(SocketNativeIO.java:32) at java/net/<b>SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I(SocketInputStream.java) at java/net/SocketInputStream.read(SocketInputStream.java:129) at oracle/net/nt/MetricsEnabledInputStream.read(TcpNTAdapter.java:718) at oracle/net/ns/Packet.receive(Packet.java:295) at oracle/net/ns/DataPacket.receive(DataPacket.java:106) at oracle/net/ns/NetInputStream.getNextPacket(NetInputStream.java:317) at oracle/net/ns/NetInputStream.read(NetInputStream.java:104) at oracle/jdbc/driver/T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:126) at oracle/jdbc/driver/T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:82) at oracle/jdbc/driver/T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1177) at oracle/jdbc/driver/T4CTTIfun.receive(T4CTTIfun.java:312) at oracle/jdbc/driver/T4CTTIfun.doRPC(T4CTTIfun.java:204) at oracle/jdbc/driver/T4C8Oall.doOALL(T4C8Oall.java:540) at oracle/jdbc/driver/T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1079) at oracle/jdbc/driver/OracleStatement.doExecuteWithTimeout(OracleStatement.java:1419) at oracle/jdbc/driver/OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3752) at oracle/jdbc/driver/OraclePreparedStatement.execute(OraclePreparedStatement.java:3937) ^-- Holding lock: oracle/jdbc/driver/T4CConnection@0xf71f6938[thin lock] at oracle/jdbc/driver/OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1535) at weblogic/jdbc/common/internal/<b>ConnectionEnv.testInternal(ConnectionEnv.java:873) at weblogic/jdbc/common/internal/ConnectionEnv.test(ConnectionEnv.java:541) at weblogic/common/resourcepool/ResourcePoolImpl.testResource(ResourcePoolImpl.java:2198)
The underlying cause for the whole hang was due to firewall dropping the socket connections between the server and database and leading to hung socket reads.
If the application uses java Concurrent Locks, it might be harder to identify who is holding a lock as there can be multiple threads waiting for notification on the lock but no obvious holder of lock. Sometimes the JVM itself report the lock chain. If not, enable -XX:+PrintConcurrentLocks in the java command line to get details on Concurrent Locks.
The overall health of JVM is based on the interaction of threads, resources, partners/systems/actors along with the OS, GC policy and hardware.
This section of the series discussed how to analyze thread dumps and navigate lock chains as well as some tips and pointers on what to look for and optimize. The next section will go into tools that can help automate thread dump analysis, some real world examples and limitations of thread dumps.