A quick performance tuning hint for high speed Exalogic SOA performance

This is a very quick observation on a simple performance tuning fix for SOA Suite on Exalogic.

The problem

When running on Exalogic, SOA Suite appears to grind to a halt under load. CPU on the SOA servers may or may not spike at this point. SOA may become completely unresponsive, or just be very slow. You may see 504 gateway timeout errors, servers in doubt in the admin server screen, or other symptoms of a “barely responding” SOA Suite system.

The resolution

Turn ON “Always use keep-alive” for the origin server in Oracle Traffic Director (this defaults to OFF). Information on how to enable it is at http://docs.oracle.com/cd/E23389_01/doc.11116/e21036/perf014.htm.

Note: This setting is hidden at the bottom of the advanced settings list for a ‘route’ on recent OTD installations.

The details

The recommended architecture for SOA Suite on Exalogic is to use Oracle Traffic Director to route web service callouts between SOA Composites. This allows for broad load distribution as well as some resiliency against failure. It also provides InfiniBand-class connection speeds and latency between SOA Composites, which is not a bad thing.

Oracle Traffic Director, as detailed in the documentation above, defaults to NOT using HTTP Keep Alive for PUT and POST requests to its origin servers. For SOA on Exalogic, the SOA Suite servers are considered origin servers.

SOA Suite requests are 99% POST requests. This means that every request between composites (if you are using the recommended setup) will generate a NEW HTTP connection at the remote SOA Server. This causes a buildup of “stale” connections, which are slow to garbage collect for technical reasons. Eventually, with sufficient load, the pileup becomes so great that SOA ends up stuck in a garbage collection loop and will either throw OutOfMemoryError or slow to a crawl. With “Always use keep-alive” turned ON, Oracle Traffic Director caches and reuses the same HTTP connection for multiple requests (it actually keeps a small pool of cached connections), the buildup doesn’t happen, and performance improves dramatically as a result.
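
To make the connection-per-request effect concrete, here is a rough sketch in Python. It is not OTD or SOA Suite code. It is a local stand-in that assumes a tiny HTTP/1.1 echo server in place of the SOA origin server and a plain HTTP client in place of Oracle Traffic Director. The names CountingServer, EchoHandler and count_connections are made up for this illustration. All it shows is how many TCP connections the server has to accept for the same number of POSTs, with and without connection reuse.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingServer(ThreadingHTTPServer):
    """Stand-in origin server that counts accepted TCP connections."""

    daemon_threads = True

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.connections_accepted = 0

    def get_request(self):
        # Called once per accepted TCP connection, not once per HTTP request.
        request = super().get_request()
        self.connections_accepted += 1
        return request


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent (keep-alive) connections

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(requests, reuse):
    """Send `requests` POSTs; return how many connections the server accepted."""
    server = CountingServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        shared = http.client.HTTPConnection(host, port) if reuse else None
        for _ in range(requests):
            # reuse=False mimics the default: a fresh connection for every POST.
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("POST", "/", body=b"<payload/>")
            conn.getresponse().read()
            if not reuse:
                conn.close()
        if reuse:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.connections_accepted


if __name__ == "__main__":
    print("new connection per POST:", count_connections(20, reuse=False))
    print("keep-alive (reused):    ", count_connections(20, reuse=True))
```

In this toy setup, 20 POSTs without reuse make the server accept 20 connections, while the same 20 POSTs over a kept-alive connection need only one. A real SOA server under load sees the same ratio at a much larger scale, which is where the stale connection pileup comes from.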
