Because Coherence cache servers are built on top of Java, they must deal with the heap size limitations that arise from garbage collection (GC). This is often at odds with the intended use case for Coherence, which is to manage large amounts of data in memory. In order to fully utilize the available memory on server machines while minimizing the number of JVMs required, we want to use the largest heap sizes that are possible without running into garbage collection problems (excessive pauses and/or CPU utilization). In practice, with modern JVMs (Java 1.6 or higher), heap sizes of 4-8GB are most common. 16GB heaps are much less common, and 32GB heaps are very rare. Thus, our focus is on basic tuning of JVMs with 4-8GB heaps. Similar considerations will also apply to cache clients that have large near caches (in fact, GC overhead for near caches will often be higher than for cache servers due to differences in how objects are stored in memory).
This article focuses on the HotSpot JVM as it is the most common JVM used for Coherence deployments. We’re not focused on getting optimal performance out of the JVM, but rather identifying a simple set of parameters that produce acceptable performance across a broad range of scenarios. Advanced JVM tuning (and especially version-specific tuning) is another area of expertise.
The most critical JVM options (highlighted in the Coherence production checklist) are the -server flag and setting the initial (-Xms) and maximum (-Xmx) heap sizes to the same value. Beyond these, a few other options can make a big difference.
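As a rough illustration of the heap-size rule above, the following sketch reads the arguments the running JVM was started with and reports whether -Xms and -Xmx resolve to the same number of bytes. The class and method names (HeapFlagCheck, parseSize, heapSizesMatch) are hypothetical helpers written for this example, not part of Coherence or the JDK. It only understands the k, m and g suffixes, and it assumes that the last occurrence of a repeated flag is the one that counts.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration only; not part of Coherence or the JDK.
final class HeapFlagCheck {

    // Converts a size such as "4g", "4096m" or "1048576" to bytes.
    // Only the k, m and g suffixes are handled in this sketch.
    static long parseSize(String text) {
        String s = text.trim().toLowerCase();
        char unit = s.charAt(s.length() - 1);
        long multiplier = 1L;
        if (unit == 'k') {
            multiplier = 1L << 10;
        } else if (unit == 'm') {
            multiplier = 1L << 20;
        } else if (unit == 'g') {
            multiplier = 1L << 30;
        }
        String digits = Character.isDigit(unit) ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    // Returns the size given by the last argument starting with the prefix, or -1 if absent.
    // Assumption: when a flag is repeated, the last occurrence is the effective one.
    static long sizeOf(List<String> jvmArgs, String prefix) {
        long result = -1L;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                result = parseSize(arg.substring(prefix.length()));
            }
        }
        return result;
    }

    // True only when both flags are present and resolve to the same byte count.
    static boolean heapSizesMatch(List<String> jvmArgs) {
        long xms = sizeOf(jvmArgs, "-Xms");
        long xmx = sizeOf(jvmArgs, "-Xmx");
        return xms > 0 && xms == xmx;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM on its command line.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Initial and maximum heap match: " + heapSizesMatch(jvmArgs));
    }
}
```

Run inside a cache server process (or with its startup arguments), a check like this could flag a misconfigured node before it joins the cluster.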
-XX:+UseConcMarkSweepGC or -XX:+UseParallelGC
The CMS collector is a popular option for reducing “stop the world” GC pauses. This feature, by itself, will often suffice if the only objective is to eliminate GC pauses. However, it also tends to consume a significant amount of processor resources so it will likely reduce overall system throughput. On top of this, there are often complex relationships between latency (which would be worsened by GC pauses) and throughput, so some experimentation may be required.
Using the Parallel collector instead of CMS will reduce processor use at the cost of some moderate GC pauses. With modern hardware, these pauses are usually acceptably short and infrequent for user-facing applications with heaps of 4-8GB.
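When experimenting with either collector, it helps to measure how much time the JVM actually spends collecting. The sketch below uses the standard java.lang.management API to list the active collectors and estimate GC time as a share of JVM uptime. The class name GcOverhead and the helper overheadPercent are made up for this example. The collection time reported by the JVM is an accumulated approximation, so treat the result as a rough indicator, not as a precise pause measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only.
final class GcOverhead {

    // Share of elapsed time spent in GC, as a percentage.
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long totalGcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when the collector does not report a time.
            long time = Math.max(0, gc.getCollectionTime());
            totalGcMillis += time;
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), time);
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("Approximate GC overhead: %.2f%%%n",
                overheadPercent(totalGcMillis, uptime));
    }
}
```

Comparing this figure under CMS and under the Parallel collector, with the caches fully populated, gives a first impression of the throughput cost of each choice.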
-XX:NewRatio=x
The new ratio sets the size of the tenured generation relative to the young generation, where new objects are allocated. For example, a value of 4 makes the tenured space four times the size of the young generation, leaving one fifth of the heap for new objects. Coherence tends to hold onto objects for considerably longer than is typical in a Java environment, so the default ratio is usually too low for a Coherence workload. The optimal setting requires some testing (with fully populated Coherence caches), but values in the range of 4-10 will usually be a better fit for cache servers (a slightly lower range suits cache clients, depending on the size and usage patterns of the near caches).
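The arithmetic behind the ratio can be sketched as follows: with -XX:NewRatio=N the tenured generation is N times the size of the young generation, so the young generation receives roughly heap / (N + 1). The class NewRatioMath and its methods are hypothetical names used only for this example, and the real JVM aligns generation sizes, so actual values may differ slightly.

```java
// Hypothetical helper for illustration only.
final class NewRatioMath {

    // Approximate young generation size: tenured is newRatio times young,
    // so young is heap / (newRatio + 1). The JVM may round this for alignment.
    static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1L);
    }

    // Whatever is not young generation is treated as tenured in this sketch.
    static long tenuredBytes(long heapBytes, int newRatio) {
        return heapBytes - youngGenBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 6L * 1024L * mb; // a 6GB heap, within the 4-8GB range discussed
        for (int ratio : new int[] {2, 4, 8, 10}) {
            System.out.printf("NewRatio=%d: young ~%d MB, tenured ~%d MB%n",
                    ratio,
                    youngGenBytes(heap, ratio) / mb,
                    tenuredBytes(heap, ratio) / mb);
        }
    }
}
```

Raising the ratio shifts memory toward the tenured generation, which is where long-lived cache entries end up.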
-XX:+UseParNewGC
Use a multi-threaded collector for the young generation. This is the young-generation collector that pairs with CMS.
There are other options that we occasionally see used (e.g. -XX:+UseLargePages, -XX:SurvivorRatio=6, -XX:MaxPermSize=128m, -XX:+CMSParallelRemarkEnabled, -XX:ParallelGCThreads=4), but most of the time the handful of options above will suffice. In general, tuning the memory management features of the JVM is very vendor-specific and version-specific, not to mention hardware-specific and OS-specific. Occasionally there are also entirely new collectors (such as G1, which is still fairly infrequently used in Coherence installations), and some settings are incompatible with each other (e.g. NewRatio being ignored when used with CMS on certain JVM versions). It is generally best to stick to the defaults as much as possible to avoid surprises, and to specify non-default behavior only when there is a clear benefit. This also reduces dependencies on a specific JVM offering or version, making ongoing maintenance simpler and less risky.