Java Tuning in a Nutshell – Part 1

While delivering a training recently, I got a request to put together a JVM tuning cheat sheet. Given the 50+ parameters available on the Sun HotSpot JVM, this request is understandable. The diagram below is what I came up with. I’ve tried to narrow the list down to the most important flags, the ones that solve 80% of JVM performance needs with 20% of the tuning effort. This article assumes basic JVM tuning knowledge: the different generations used in the Sun HotSpot JVM, the garbage collection algorithms available, and so on. Although it is intended primarily for enterprise-grade Oracle Fusion Middleware products, it applies to most server JVMs with large heaps hosted on server-class, multi-core machines.

This is not an exhaustive list, only low-hanging fruit. In fact, many JDK 1.6 users need no tuning at all: the JVM picks good defaults and ergonomics does a decent job. Follow this only if the default behavior is not good enough (for instance, frequent garbage collections, low throughput, or long GC pauses). In my experience, a non-trivial production topology with Oracle Fusion Middleware products often requires this level of tuning. This includes Oracle WebLogic Server (Java EE apps), Oracle Coherence, Oracle Service Bus, Oracle SOA Suite, BPM, AIA and other enterprise FMW apps running on the Sun HotSpot JVM.

I’ve used a mind map below to help visualize the relationships and dependencies between the various JVM tuning flags. In the diagram, the flags in black are the ones to try first; the ones in gray are optional; anything not covered here can be ignored!

I’ve categorized the flags into 4 groups:

  1. Garbage collection (GC): The garbage collection algorithm is one of the two mandatory tunables for Java performance tuning. Start with UseParallelOldGC. If GC pauses are not acceptable, switch to UseConcMarkSweepGC (it prioritizes low application pause times at the cost of raw application throughput). Specify ParallelGCThreads to limit GC threads (yes, limit: the default is usually too high for multiple WebLogic servers sharing a large, multi-core machine). Recommendations for values and other flags will be covered later.
  2. Heap tuning: This is the other mandatory tunable. I’m using ‘heap’ as an umbrella term for all Java memory spaces. Technically, Perm and Stack are not part of the Java heap in Sun HotSpot. Required flags in my tuning exercise are total heap size (Xmx, Xms), young generation size (Xmn) and permanent generation size (PermSize, MaxPermSize). Xss tuning is optional. I only use it when tuning a 32-bit, heap-constrained JVM, where reducing Xss squeezes memory out of native space so that more is available for Xmx. In any case, never set Xss below 128k for Fusion Middleware (the default is usually 512k to 1m depending on the OS).
  3. Logging: GC logging is mandatory only for the duration of the tuning exercise itself. However, due to its low overhead (typically only one line written per collection, which itself is relatively infrequent), it is highly recommended for production as well. Otherwise, you will not be able to make an educated tuning decision if/when things don’t work as expected.
  4. (Optional) Other Performance: These are only used for fine tuning when performance is the driver for the tuning exercise. Even then, try these only after GC and heap are well tuned to begin with.
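
To make the logging group concrete, here is a minimal Java sketch of what can be done with a GC log once it exists: count the Full GC entries and find the longest one. It assumes log lines in the style of HotSpot's -XX:+PrintGCDetails output on JDK 6, where each collection line ends with ", N secs]". The sample lines and the class name FullGcScan are made up for illustration, and other collectors or JDK versions may lay their lines out differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FullGcScan {
    // Assumed layout: a pause time is written as ", <seconds> secs]".
    private static final Pattern PAUSE = Pattern.compile(", (\\d+\\.\\d+) secs\\]");

    /** Number of log lines that report a Full GC. */
    static int countFullGc(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.contains("Full GC")) {
                count++;
            }
        }
        return count;
    }

    /** Longest Full GC pause in seconds, or 0.0 if no pause time was found. */
    static double longestFullGcPause(List<String> lines) {
        double longest = 0.0;
        for (String line : lines) {
            if (!line.contains("Full GC")) {
                continue;
            }
            // A line may carry nested timings; keep the largest value seen.
            Matcher m = PAUSE.matcher(line);
            while (m.find()) {
                longest = Math.max(longest, Double.parseDouble(m.group(1)));
            }
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative sample lines, not taken from a real server.
        List<String> lines = Arrays.asList(
            "12.345: [GC [PSYoungGen: 262144K->1024K(305856K)] 400000K->140000K(1004928K), 0.0123450 secs]",
            "98.765: [Full GC [PSYoungGen: 1024K->0K(305856K)] [PSOldGen: 690000K->210000K(699072K)] "
                + "691024K->210000K(1004928K) [PSPermGen: 65000K->65000K(131072K)], 1.2345670 secs]");
        if (args.length > 0) {
            // Optional: pass the path of a real GC log as the first argument.
            lines = Files.readAllLines(Paths.get(args[0]));
        }
        System.out.println("Full GCs: " + countFullGc(lines));
        System.out.println("Longest Full GC pause (secs): " + longestFullGcPause(lines));
    }
}
```

The tool itself needs Java 8 or later because of the file-reading call; the log it reads can come from any JVM whose output matches the assumed layout.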

The primary requirement that warrants JVM tuning in production Oracle Fusion Middleware is not performance, but rather unacceptable GC pauses. The culprit is almost always a Full GC that causes a long application pause. Symptoms include temporarily unresponsive servers, client session timeouts, etc. If you’re capturing GC logs using the flags in the diagram, a search for “Full GC” will show how many Full GCs occurred, how frequent they were and how long they took. Following the tunables in the diagram above, this is how you can solve the problem (I have highlighted the parameters to match those in the diagram):

Heap not sized correctly, causing Full GCs

1. -Xmx should be equal to -Xms. Growing from Xms to Xmx requires Full GCs to resize the heap. Set these to the same value if Full GCs are to be completely eliminated in production.

2. -XX:PermSize should be equal to -XX:MaxPermSize. Both params need to be specified and should have the same value. Otherwise, a Full GC is required for each Perm Gen resize while it grows up to MaxPermSize.

3. -XX:NewSize is specified but not equal to -XX:MaxNewSize. As with the other heap params, resizing the new/young gen requires a Full GC. The preferred approach is to avoid these two parameters and use -Xmn instead. This eliminates the problem, since setting, say, “-Xmn1g” is the same as setting “-XX:NewSize=1g -XX:MaxNewSize=1g”.

4. -XX:SurvivorRatio is specified but -XX:-UseAdaptiveSizePolicy is not. The SurvivorRatio you specify will not stick if AdaptiveSizePolicy is in effect. By default, the JVM adapts and overrides the value you specified based on runtime heuristics. Add -XX:-UseAdaptiveSizePolicy to disable adaptive sizing of generations (notice the ‘minus’ sign preceding UseAdaptiveSizePolicy).
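
The four heap rules above can be turned into a quick sanity check. The following is a rough Java sketch, not an official tool: it takes a list of JVM arguments and reports which of the rules are violated. The class name HeapFlagCheck is hypothetical, and the check compares flag values as text, so "2g" and "2048m" would be treated as different; write both sides the same way.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class HeapFlagCheck {
    /** Value following the given prefix (last occurrence wins), or null if absent. */
    static String valueOf(List<String> args, String prefix) {
        String value = null;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    /** True only when both values are present and textually equal. */
    private static boolean same(String a, String b) {
        return a != null && b != null && a.equalsIgnoreCase(b);
    }

    /** One warning per violated rule; an empty list means all four rules hold. */
    static List<String> check(List<String> args) {
        List<String> warnings = new ArrayList<String>();
        // Rule 1: -Xms and -Xmx both set, same value.
        if (!same(valueOf(args, "-Xms"), valueOf(args, "-Xmx"))) {
            warnings.add("Set -Xms and -Xmx to the same value");
        }
        // Rule 2: PermSize and MaxPermSize both set, same value.
        if (!same(valueOf(args, "-XX:PermSize="), valueOf(args, "-XX:MaxPermSize="))) {
            warnings.add("Set -XX:PermSize and -XX:MaxPermSize to the same value");
        }
        // Rule 3: NewSize/MaxNewSize, if used at all, must match (or use -Xmn).
        String newSize = valueOf(args, "-XX:NewSize=");
        String maxNewSize = valueOf(args, "-XX:MaxNewSize=");
        if ((newSize != null || maxNewSize != null) && !same(newSize, maxNewSize)) {
            warnings.add("NewSize and MaxNewSize differ; prefer -Xmn");
        }
        // Rule 4: SurvivorRatio only sticks when adaptive sizing is off.
        if (valueOf(args, "-XX:SurvivorRatio=") != null
                && !args.contains("-XX:-UseAdaptiveSizePolicy")) {
            warnings.add("SurvivorRatio set without -XX:-UseAdaptiveSizePolicy");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Inspect the flags this JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String warning : check(jvmArgs)) {
            System.out.println("WARNING: " + warning);
        }
    }
}
```

As an illustration only (the sizes are placeholders, not recommendations), an argument list such as -Xms2g -Xmx2g -Xmn512m -XX:PermSize=256m -XX:MaxPermSize=256m passes all four checks.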

Concurrent Mark Sweep GC not tuned correctly

-XX:+UseConcMarkSweepGC is almost always used when there is a strict latency requirement or Service Level Agreement (SLA) and long GC pauses are unacceptable. That is, the goal is to avoid Full GCs at all costs. However, there are many reasons why Full GCs could still occur:

1. Although UseConcMarkSweepGC is specified, CMS can and often will kick in too late, causing a Full GC when it can’t keep up. In other words, although CMS is collecting garbage, the application threads executing concurrently run out of heap for allocation because CMS couldn’t free garbage soon enough. At this point, the JVM stops all application threads and does a Full GC. This shows up as “concurrent mode failure” in GC logs. The reason is that the JVM dynamically computes when CMS should be initiated and adjusts this value based on runtime statistics. In production, however, load is often bursty, so the last dynamically computed initiation value can be wrong. To prevent this, provide a static initiation threshold: use -XX:CMSInitiatingOccupancyFraction (a percentage of old generation occupancy, not of the total heap) to tell the JVM at what point it should start a CMS cycle. A value between 40 and 70 usually works for most Fusion Middleware products. Start with the higher value (70) and tune down only if you still see the string “concurrent mode failure” in GC logs.

2. Secondly, always specify -XX:+UseCMSInitiatingOccupancyOnly when CMSInitiatingOccupancyFraction is used. Otherwise the value you specify does not stick (the JVM will change it dynamically on the fly again). This is very important and commonly missed.
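
To get a feel for what a given CMSInitiatingOccupancyFraction means in megabytes, the arithmetic can be sketched in a few lines of Java. The fraction is measured against old generation occupancy, and the sketch assumes the old generation is simply the total heap (-Xmx) minus the young generation (-Xmn). The heap sizes and the class name CmsThreshold are illustrative assumptions, not recommendations.

```java
final class CmsThreshold {
    /** Old generation size in MB, assuming old gen = total heap minus young gen. */
    static long oldGenMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    /** Old gen occupancy (MB, rounded down) at which a CMS cycle would start. */
    static long initiatingOccupancyMb(long heapMb, long youngMb, int fractionPercent) {
        if (fractionPercent < 0 || fractionPercent > 100) {
            throw new IllegalArgumentException("fraction must be between 0 and 100");
        }
        return oldGenMb(heapMb, youngMb) * fractionPercent / 100;
    }

    public static void main(String[] args) {
        // Hypothetical sizing: -Xms4g -Xmx4g -Xmn1g, so the old gen is 3072 MB.
        long heapMb = 4096;
        long youngMb = 1024;
        // Walk down from 70 toward 40, as suggested when tuning.
        for (int fraction = 70; fraction >= 40; fraction -= 10) {
            System.out.println("CMSInitiatingOccupancyFraction=" + fraction
                + " starts CMS at about " + initiatingOccupancyMb(heapMb, youngMb, fraction)
                + " MB of old gen in use");
        }
        // Both flags go together on the command line, for example:
        //   -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
        //   -XX:+UseCMSInitiatingOccupancyOnly
    }
}
```

Lowering the fraction starts CMS earlier and leaves more headroom for bursty allocation, at the cost of running concurrent cycles more often.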

Parallel GC not tuned correctly

I frequently see -XX:+UseParallelGC used instead of (or without) -XX:+UseParallelOldGC. Unlike UseParallelGC, UseParallelOldGC collects the old generation in parallel as well. In both cases, young gen (minor) collections are parallel. Having multiple threads do the old gen collection can reduce the overall Full GC pause. If no GC params are specified, UseParallelGC is usually the default (this may have changed in later versions of JDK 6), so it is safe to always specify UseParallelOldGC when throughput is the goal.

Rarely, no matter how well you tune your JVM, the heap eventually fills up and results in back-to-back Full GCs (again, use GC logs to guide you). If this is the case, there is a possibility that your code has introduced a memory/reference leak. To confirm, take a few heap dumps and compare them to see whether any particular object count keeps growing over time, even after GC completes. Again, this is very rare, so make sure you do your due diligence with JVM tuning first.
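
The comparison step can be sketched as follows. This is only an illustration of the idea, assuming you have already extracted per-class instance counts from each dump (for example from a heap dump analyzer or a class histogram such as the one jmap -histo prints). The class name HistogramDiff and the sample class names and counts are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class HistogramDiff {
    /**
     * Classes whose instance count rose in every successive snapshot.
     * Only classes present in the first snapshot are considered.
     */
    static List<String> steadilyGrowing(List<Map<String, Long>> snapshots) {
        List<String> result = new ArrayList<String>();
        if (snapshots.size() < 2) {
            return result; // nothing to compare
        }
        // TreeSet gives a stable, alphabetical order for the report.
        for (String cls : new TreeSet<String>(snapshots.get(0).keySet())) {
            boolean growing = true;
            for (int i = 1; i < snapshots.size() && growing; i++) {
                Long previous = snapshots.get(i - 1).get(cls);
                Long current = snapshots.get(i).get(cls);
                growing = previous != null && current != null && current > previous;
            }
            if (growing) {
                result.add(cls);
            }
        }
        return result;
    }

    private static Map<String, Long> snapshot(long sessionCaches, long strings) {
        Map<String, Long> counts = new HashMap<String, Long>();
        counts.put("com.example.SessionCache", sessionCaches); // hypothetical class
        counts.put("java.lang.String", strings);
        return counts;
    }

    public static void main(String[] args) {
        // Three made-up snapshots, each taken after a Full GC completed.
        List<Map<String, Long>> snapshots = new ArrayList<Map<String, Long>>();
        snapshots.add(snapshot(1000L, 20000L));
        snapshots.add(snapshot(5000L, 18000L));
        snapshots.add(snapshot(9000L, 21000L));
        System.out.println("Leak suspects: " + steadilyGrowing(snapshots));
    }
}
```

A class that grows across every snapshot is only a suspect, not proof of a leak; caches that are still warming up behave the same way, so compare snapshots taken under steady load.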
