The Ultimate Apache/OHS11g Tuning Guide for OAM11g WebGate

Introduction

OK, maybe “Ultimate” is stretching it, but it caught your eye, so you can be the judge. This post is part of a larger series on Oracle Access Manager 11g called Oracle Access Manager Academy. An index to the entire series with links to each of the separate posts is available here. Though OAM11g is in the title, this article should also apply to OAM12c (12.2.1.3.0), since it still uses 11g Webgates.

In my previous post, OAM 11g Webgate Tuning, I covered a number of Webgate parameters and some Apache/OHS directives to introduce some ideas on how to tune the Webgate. In this post I expand on the mystery of how Apache/OHS works with the Webgate plug-in, and show you how to tune Apache/OHS and Webgate together so that you not only significantly improve performance and throughput, but also potentially prevent big bottlenecks and mitigate the infamous Webgate error “The WebGate plug-in is unable to contact any Access Servers.”  Yes, bad OHS/Apache tuning can actually be a culprit for this error, even though most people immediately jump to the conclusion that the problem is a firewall, the network, or the Access Servers.

This article is only intended for Apache/OHS running in Worker-mode, which is the default for OHS11g and OHS12c and is available in Apache 2.2+ (stock Apache builds may default to a different MPM; running httpd -V shows which one yours uses). Worker-mode uses threads to serve requests, so it is able to serve a large number of requests using fewer resources. Worker-mode also provides more stability than Prefork by keeping multiple processes available, each with many threads.  An initial httpd.worker process, called the parent, is created first.  The parent is responsible for launching the child processes; more on this later.  As for Prefork-mode, the only reason you would use it is for compatibility of some sort, but when working with the Webgate plug-in I recommend Worker-mode.  Below is an example comparing Prefork and Worker.

 

Apache Prefork vs Worker Mode

 

 

 

 

Primer on Key Tuning OHS Directives and Webgate Parameters

Before diving into tuning, it is important to cover the basics of the Apache/OHS directives and Webgate parameters I will talk about, so you understand what each one is responsible for in its component. To start, the following are short explanations of the Apache/OHS directives covered in this article.

Apache/OHS Directives –

 

  • MaxClients –
    This parameter limits how many simultaneous requests Apache/OHS will accept. In stock Apache, if this parameter is not set, the default depends on the MPM: 256 for Prefork, and ServerLimit (16) multiplied by ThreadsPerChild (25), or 400, for Worker, unless Apache was compiled with different values. One approach to determining a value for MaxClients is to use the following formula, but this only tells you the largest MaxClients the web server could physically support.
         MaxClients ≈ (RAM – size_all_other_processes)/(size_apache_process)
    However, another reason to set MaxClients is to limit the incoming requests that get forwarded to an application that has its own limits, so the application is not overwhelmed. Once you determine your MaxClients, as a tip use the following formula to calculate ServerLimit and ThreadsPerChild; more on this later.
         MaxClients = ServerLimit * ThreadsPerChild
    See “TIP on Calculating ServerLimit” at the bottom of this article, which relates to how high you may be able to set your MaxClients value.  As a side note, in Apache 2.4 and OHS12c (based on Apache 2.4), MaxClients has been renamed MaxRequestWorkers, although the old name is still accepted. Going forward I will use MaxClients, so simply replace it with MaxRequestWorkers where that applies.
  • ServerLimit –
    This parameter sets a limit on how many httpd.worker processes can be spawned. There is always a parent httpd.worker that manages all the children, so if you run the command “ps -ef | grep 'httpd\|PID'” the parent will typically be the smallest PID and the rest are children; see the illustration below.
    HTTPD PIDS
  • ThreadsPerChild –
    This parameter determines how many threads each httpd.worker child gets. In worker mode, the threads are what Apache/OHS uses to process HTTP requests and from the Webgate perspective these threads are used to check Webgate cache or send requests to the Access Server for authentication and authorization operations.
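
As a rough sketch of the two MaxClients formulas above, the arithmetic can be scripted as follows. The function names and the sample numbers (8 GB of RAM, 2 GB used by other processes, 24 MB per Apache process, 16 processes of 64 threads) are assumptions for illustration only, not measurements from this article.

```shell
# Hypothetical helpers; names and sample values are illustrative only.

# MaxClients ~ (RAM - size_all_other_processes) / size_apache_process
# All arguments are in MB; integer division rounds the result down.
max_clients_by_memory() {
  local ram_mb=$1 other_mb=$2 apache_proc_mb=$3
  echo $(( (ram_mb - other_mb) / apache_proc_mb ))
}

# MaxClients = ServerLimit * ThreadsPerChild
max_clients() {
  local server_limit=$1 threads_per_child=$2
  echo $(( server_limit * threads_per_child ))
}

# Assumed: 8192 MB RAM, 2048 MB for everything else, 24 MB per Apache process
max_clients_by_memory 8192 2048 24

# Assumed: ServerLimit 16, ThreadsPerChild 64
max_clients 16 64
```

The first call gives the physical ceiling, the second gives the value implied by a chosen ServerLimit and ThreadsPerChild; the MaxClients you configure should not exceed either.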

Now that I have covered some important Apache/OHS directives, let’s go over some key Webgate parameters that relate to Apache/OHS tuning.  Though I am pointing out three Webgate parameters, in the interest of keeping this article from getting too long I will only focus on Max Connections later on.  The other parameters have already been covered pretty well in my previous OAM 11g Webgate Tuning article, but I still wanted to mention them as a refresher, and I hope you will check out the other article.

 

Webgate Parameters –

 

  • Max Connections –
    This parameter tells the Webgate how many OAP connections each httpd.worker process needs to open to the Access Server(s). Note that each httpd.worker process will open ALL of its OAP connections as soon as it is created; there is no ramp-up time. The connections are distributed across all primary Access Servers based on each Access Server’s setting for Number of Connections (OAM11gPS2) or Max Connections (OAM11gPS3+).
  • AAA Timeout Threshold –
    This parameter determines how long, in seconds, a Webgate thread will wait for a response on an OAP connection from an Access Server before giving up. It overrides the operating system TCP/IP timeout. If “-1” is used, the operating system network TCP/IP timeout will be used. If the response takes longer than this timeout, the Webgate will try another primary Access Server in the order of the list, and finally a secondary Access Server if one is configured. If the response takes twice as long as the timeout value, it will abandon the connection and try on a new connection.
  • User Defined client_request_retry_attempts –
    This is a special User Defined parameter that tells the Webgate thread to try X number of times on the same OAP connection before giving up; the default is 1. For example, if the AAA Timeout Threshold is set to 10, the Webgate thread will try the request on an OAP connection, wait about 10 seconds for a response, and if there is no answer it will try X number of times using the same connection before giving up. Sometimes this can be a better approach if the AAA Timeout Threshold is lower, like 5 seconds, because the request avoids the extra overhead of closing the connection and trying for a new one, especially if there are a lot of threads sharing very few OAP connections.

Now that I have some of the more detailed, boring information out of the way, let’s move on to the interesting things that brought you to this blog.  You may want to read through the next sections a couple of times so you can wrap your head around everything. I promise you this will really help your tuning approach to Apache/OHS servers running a Webgate.

 

How Apache/OHS and Webgate plug-in Work Together

Let’s start with the basics of how the Webgate plug-in works in conjunction with Apache/OHS. When an httpd.worker process is created, at a high level the following things happen:

 

  1. A number of OAP connections are immediately opened per the Webgate Max Connections.
  2. All OAP connections will be distributed across each primary Access Server per Number of Connections or Max Connections.
  3. A number of threads are created per ThreadsPerChild.

For example, in the following illustration there are two httpd.worker processes running and the Webgate Max Connections is set to 2, so each httpd.worker process will have 2 OAP connections open to the Access Server.  The number of OAP connections will only change as connections expire, are torn down, or are closed by the client, but the Webgate will be certain to keep Max Connections open for each httpd.worker process, so the count should be pretty consistent.  The illustration below is meant to show the logical architecture of the flow between an Apache/OHS server and an Access Server.

Understanding how Apache and Webgate work together
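
To make the connection count in this example concrete, here is a minimal sketch of the arithmetic: every running httpd.worker process opens Max Connections OAP connections, so the total from one web server is simply the product. The function name is hypothetical.

```shell
# Hypothetical helper: total OAP connections opened by one Apache/OHS server.
# Each httpd.worker child opens Max Connections connections on startup.
total_oap_connections() {
  local worker_processes=$1 max_connections=$2
  echo $(( worker_processes * max_connections ))
}

# Two httpd.worker processes with Max Connections set to 2
total_oap_connections 2 2
```

Passing ServerLimit as the first argument gives the largest number of OAP connections the web server can ever hold open, which is useful when sizing the Access Server side.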

 

Please see my previous article, OAM 11g Webgate Tuning, which goes deeper into how the OAP connections are distributed across multiple primary Access Servers based on the value each primary Access Server has configured for Number of Connections (OAM11gPS2) or Max Connections (OAM11gPS3+).

 

Tuning OHS/Apache and the Webgate by Example

I find the easiest way to explain complex tuning like this is by using an example.  I will first explain a bad approach to tuning, followed by a much better way; let’s start with the following illustration. The ServerLimit is set to 2, which means the maximum number of httpd.worker processes that Apache/OHS will create is 2. To be accurate, Apache/OHS will never create more than ServerLimit httpd.worker processes, even when the full MaxClients number of HTTP requests is hitting the web server.  As a side note, the directive StartServers tells Apache/OHS how many httpd.worker processes to create on startup. If we focus on the 2 httpd.worker processes, you will see in the illustration that each one has 2 OAP connections open to the Access Server.  This is because the Webgate Max Connections parameter is set to 2.  On the Apache/OHS server you will therefore see 4 OAP connections to the primary Access Server(s).

Apache Tuning that creates Webgate Contention

 

 

You can also see that ThreadsPerChild is set to 256, which means each httpd.worker child will have 256 threads.  Again, the threads are there to process HTTP requests.  It is important to understand that the threads within an httpd.worker process have to share all of that process’s OAP connections.  In this case we have 2 OAP connections shared by 256 threads within each httpd.worker process, and each OAP connection can only be used by one thread at a time.  For example, if a thread gets some work to check credentials or an authorization, it must first check whether an OAP connection is available. If so, it grabs the connection, sends the request to the Access Server, and waits for a response. Once it has the response, it puts the OAP connection back into the connection pool for another thread to use.

 

Immediately some concerns can be identified:

  1. Only 2 OAP connections for 256 threads may not be enough.
  2. Only 2 httpd.worker children for 512 MaxClients HTTP requests can be a bottleneck.
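
A back-of-the-envelope way to see the first concern: because an OAP connection serves one thread at a time, the pool puts a ceiling on how many Access Server round trips a single httpd.worker process can complete per second. The sketch below assumes an average round-trip time, which is an invented figure for illustration, and it ignores requests answered from the Webgate cache, which need no connection at all.

```shell
# Hypothetical helper: upper bound on OAP round trips per second for ONE
# httpd.worker process, assuming each connection carries one request at a time.
oap_capacity_per_second() {
  local max_connections=$1 round_trip_ms=$2
  echo $(( max_connections * 1000 / round_trip_ms ))
}

# 2 connections with an assumed 50 ms round trip
oap_capacity_per_second 2 50

# Same pool when the Access Server or LDAP slows down to an assumed 500 ms
oap_capacity_per_second 2 500
```

With 256 threads able to ask for a connection at once, a ceiling of a few dozen round trips per second leaves very little headroom, and it shrinks quickly as response times grow.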

Some may say, providing 256 threads for each httpd.worker process allows a lot of work to get done out of each httpd.worker and reduces the amount of memory needed on the web server thereby getting a lot of bang for your buck if you will.  That sounds fair, but let’s see what happens.

 

Low HTTP Traffic has Little Impact on Users

In the next illustration HTTP traffic begins to increase, though it is still reasonable, with no spikes or peak load. The traffic is enough for Apache/OHS to create its maximum of 2 httpd.worker processes per the ServerLimit. Since traffic is not heavy and each thread gets a quick response between the Webgate and Access Server, no problems bubble up. So far so good.

 

Low HTTP Traffic Shows No Impact from Bad Apache and Webgate Tuning

 

Peak Loads Increase HTTP Traffic and an Impact is Observed

Suddenly, HTTP traffic increases to peak load or there is a burst of heavy traffic, and all 256 threads in each httpd.worker are engaged, which means all 256 threads put demand on only 2 OAP connections. In the following illustration one of the httpd.worker processes has a thread (in red) that has been waiting for some time on a response from the Access Server. This is no surprise, since all 256 threads have to share only 2 OAP connections. Unless each request is returned within milliseconds, congestion can quickly happen as illustrated. I pointed out earlier that an OAP connection can only be used by one thread at a time. If the httpd.worker process has to juggle hundreds of threads over only a few OAP connections, that is a big problem. This approach to tuning is, in my opinion, a recipe for disaster.

Heavy HTTP Traffic Immediately Bubbles-up Causing Congestion

In fact, in some scenarios it does not even take heavy HTTP traffic for threads to pile up in a queue waiting on an OAP connection. Some delays in LDAP, network hiccups, etc. can be enough, because you only have 2 OAP connections for all 256 threads.  Unfortunately, the impact is amplified from the user’s perspective because they will quickly experience authentication and authorization problems. The operations team at first blush will think something is wrong with the Access Servers, because the Webgate logs will churn up errors that say “The WebGate plug-in is unable to contact any Access Servers.” (I mentioned this in the introduction).  One reason you would see this error is that a thread exceeds the AAA Timeout Threshold while waiting on a response from the Access Server, and as the thread queue piles up you will see many more of these errors for as long as the AAA Timeout Threshold is exceeded.  Hopefully, this gives you a basic idea of the problems caused by a very high ThreadsPerChild per httpd.worker process over a small number of Max Connections; on top of that, having too few httpd.worker processes works against better performance too.
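
To check whether a web server is actually hitting this condition, one simple option is to count how often the error text shows up in the Webgate log. The sketch below is only an illustration: the function name is hypothetical and the log path in the usage comment is a placeholder you would replace with the location of your own Webgate log.

```shell
# Hypothetical helper: count lines in a Webgate log that contain the
# "unable to contact any Access Servers" error.
count_webgate_errors() {
  local log_file=$1
  # -F matches the text literally, -c prints the number of matching lines.
  # grep exits non-zero when the count is 0, so "|| true" keeps the
  # function usable in scripts that run with "set -e".
  grep -c -F 'The WebGate plug-in is unable to contact any Access Servers' "$log_file" || true
}

# Example (placeholder path, adjust to your installation):
# count_webgate_errors /path/to/webgate/oblog.log
```

Running it before and after a load test and comparing the two numbers gives the error count for that test run.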

 

A Better Approach to Tuning Apache/OHS for the Webgate

There is a better way! I will walk with you, Grasshopper.  In this section I will walk you through various combinations of tuning parameter values that lead you to balance the Yin and the Yang of Apache/OHS and the Webgate and bring harmony.

My methodology was to use JMeter (see Part 1: How To Load Test OAM11g using Apache JMeter) as a load testing tool against a basic OAM11g and OHS11g setup.  I measured results from each test while systematically tweaking the Apache/OHS and Webgate values I mentioned earlier. I kept each load test consistently heavy by running thousands of logins and authorizations per minute, which applied a pretty decent amount of stress on the system. To make sure my results were not tainted, I restarted all the components between tests so the results would not be skewed by things like caching, memory, etc.  Each test was repeated with the same tuning values after a restart to be sure the results looked fairly consistent.

Let’s look at the first test run…

TEST RUN1 —

In RUN 1, I decided to keep MaxClients at 1024 and Max Connections at 8 for all load tests, while tweaking ServerLimit and ThreadsPerChild. After several iterative tests I discovered that as ServerLimit increased while ThreadsPerChild was reduced, I was able to get better throughput…up to a point. Interestingly, as ServerLimit continued to be adjusted higher and ThreadsPerChild inversely reduced, performance began to degrade, authentication failures started, and errors started showing up in the Webgate logs. Referencing table RUN 1 below, TEST5, TEST6, TEST7, and TEST8 had the best results, though TEST6 seemed to provide the best results overall (darker green is better, whereas red is worse).

TEST Table

TEST RUN2 —

In RUN 2, I kept MaxClients the same as in RUN 1 to keep things consistent across the different tests. However, in TEST RUN2 I took the best ServerLimit and ThreadsPerChild combinations from RUN 1 and only tweaked the Webgate Max Connections value. After running load tests and repeating them for good measure while systematically changing the Webgate Max Connections, the best results came from tests TEST5D, TEST6C, TEST7C, and TEST8A; refer to the table RUN 2 above with the columns highlighted in green…green is better.  I then filtered out the best combinations of ServerLimit, ThreadsPerChild, and MaxClients, shown in the above table labeled TOP 4. My best result was from test TEST6C.  It not only had the best throughput, but also the best CPU and zero errors, so TEST6C was the ultimate winner. Again, green is better, like a Grasshopper.

 

In Summary of Balanced Apache/OHS and Webgate Tuning

I summarized the principles behind what I determined in the following graph. With MaxClients held equal, there is a sweet spot where ServerLimit, ThreadsPerChild, and Webgate Max Connections come into a balance that results in the best performance, throughput, and CPU, and the fewest errors. The graph illustrates how the throughput curve improves as ThreadsPerChild is lowered while ServerLimit and Webgate Max Connections are increased, but there is a point where those same adjustments, if continued in the same direction, begin to degrade performance. Basically, the Yin and Yang forces are balanced by tweaking ServerLimit, ThreadsPerChild, and Webgate Max Connections together.

Apache/OHS and Webgate Tuning Principles Graph

 

I am sure the burning question is, Master, what are the best tuning values for these parameters?  Because variables in a deployment such as load, types of authorization policies, backend Identity Stores, number of Access Servers, number of web servers, network stability and responsiveness, load balancers, etc. can all have an impact on your tuning, the values I tell you to use may not be perfect for you.  That said, based on these principles of how Apache/OHS and the Webgate plug-in work together, and how ThreadsPerChild, ServerLimit, and Webgate Max Connections can dramatically impact throughput and performance, I feel you should have a good starting point using the following recommendations:

 

  • Max Connections –
    Start with values from 8 to 16. If you have more ThreadsPerChild, the best approach is to add more connections by increasing Max Connections, but increasing it too high will have a negative impact.
  • ThreadsPerChild –
    Start with values between 32 and 64. Increasing ThreadsPerChild beyond 64 tends to appeal to Apache/OHS admins because they see it as getting more work out of each httpd.worker process, but it is not so good for the Webgate plug-in, and going too high causes significant congestion.
  • ServerLimit –
    Once you determine your MaxClients, the formula I would start with is as follows:
         MaxClients / ThreadsPerChild = ServerLimit
    For example if MaxClients is 2048 and ThreadsPerChild is 64, ServerLimit would be 32 ( 2048 / 64 = 32 ).
  • TEST, TEST, TEST –
    Please use my numbers as a guideline, but you need, need, NEED to use an environment that simulates production as closely as possible and load test with 1x, 2x, and even 4x loads so that normal loads, peak loads, and even burst load conditions are simulated.
  • For Each Test, do the following –
    a. Adjust the three parameters based on my guidance: ThreadsPerChild, ServerLimit, and Max Connections
    b. Record CPU for Apache/OHS, Access Servers, and LDAP
    c. Record the parameter values and results of each test so you can compare.
    d. Look for Webgate “The WebGate plug-in is unable to contact any Access Servers.” errors in the oblog.log files and record the count after each test run. Adjust your tuning as needed to mitigate these errors.
  • Implement your final tuning values into production based on what you learned and monitor.
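
The ServerLimit formula from the list above can be turned into a small sketch that also prints a worker MPM fragment for httpd.conf. This is an illustration under assumptions, not a drop-in configuration: the function names are hypothetical, the division is rounded up so that ServerLimit multiplied by ThreadsPerChild never falls below MaxClients, and ThreadLimit is included only because stock Apache will not accept a ThreadsPerChild higher than ThreadLimit. On Apache 2.4 and OHS12c, MaxClients would be written as MaxRequestWorkers.

```shell
# Hypothetical helpers for deriving ServerLimit and printing a config sketch.

# ServerLimit = MaxClients / ThreadsPerChild, rounded up.
server_limit() {
  local max_clients=$1 threads_per_child=$2
  echo $(( (max_clients + threads_per_child - 1) / threads_per_child ))
}

# Print a worker MPM fragment built from MaxClients and ThreadsPerChild.
worker_mpm_fragment() {
  local max_clients=$1 threads_per_child=$2
  local limit
  limit=$(server_limit "$max_clients" "$threads_per_child")
  cat <<EOF
<IfModule mpm_worker_module>
    ServerLimit      $limit
    ThreadLimit      $threads_per_child
    ThreadsPerChild  $threads_per_child
    MaxClients       $max_clients
</IfModule>
EOF
}

# The example from the list: MaxClients 2048, ThreadsPerChild 64
worker_mpm_fragment 2048 64
```

For the example values this prints a ServerLimit of 32, matching the calculation in the list. Treat the fragment as a starting point to compare against your existing httpd.conf rather than something to paste in blindly.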

 

The next illustration shows an example of a more balanced approach to tuning the Apache/OHS directives and the Webgate parameter to get better throughput and performance even at extreme peak loads.  Using 64 ThreadsPerChild with 8 Max Connections gives a little more breathing room for the threads in each httpd.worker process versus 256 threads competing for 2 OAP connections. Increasing the ServerLimit to allow more httpd.worker processes may increase the amount of memory Apache/OHS needs, but it at least helps the incoming HTTP requests be more evenly distributed across workers as loads suddenly peak or even burst.

Balanced Apache Tuning Improves Performance and Throughput

Using these exact values may not be your optimum, but it is a better start than the alternative I started with.
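
One quick way to compare the two tuning approaches is to look at how many threads have to share each OAP connection inside a single httpd.worker process. The sketch below just does that division for the two configurations discussed in this article; the function name is hypothetical.

```shell
# Hypothetical helper: threads competing for each OAP connection
# inside a single httpd.worker process.
threads_per_connection() {
  local threads_per_child=$1 max_connections=$2
  echo $(( threads_per_child / max_connections ))
}

echo "256 threads, 2 connections: $(threads_per_connection 256 2) threads per connection"
echo "64 threads, 8 connections: $(threads_per_connection 64 8) threads per connection"
```

The ratio drops from 128 to 8 threads per connection, which is the breathing room the balanced illustration is showing. It is only a rough indicator, so use it to rank candidate settings before load testing, not as a replacement for the tests.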

 

TIP on Calculating ServerLimit:

Before you settle on the ServerLimit, it would be a good idea to get an average on how much memory each httpd.worker process is using so that there is enough memory for Apache/OHS and other things so you don’t run into a lot of swapping.

Running the following Linux command on an OHS server during a test will output your httpd.worker processes and include some important details you can leverage, like the RES column, which gives you the resident memory that each httpd.worker process is using. If you are using Apache, you may need to change “httpd.worker” to “httpd”.

 

    top -c -p $(pgrep -d',' -f httpd.worker) -n 1

 

Top output from command line

The following Linux command, run on an OHS server during a load test (a base load test is enough to get this information), prints the average memory each httpd.worker process is using. Multiply that average by your ServerLimit to get an idea of how much memory your Apache/OHS server will need when all httpd.worker processes have been created; consider it the maximum.

 

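    # Average resident memory (RSS, column 6 of ps aux) per httpd.worker, in MB.
    # The [h] keeps grep from matching its own process.
    ps aux | grep '[h]ttpd.worker' | \
    awk '{print $6/1024;}' | \
    awk '{avg += ($1 - avg) / NR;} \
    END {print avg " MB";}'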

    oracle$ 22.3346 MB

 

Based on my list of httpd.worker processes in the screenshot the output was 22.3346MB, so if I multiplied 64 ServerLimit by 22.3346MB I would need about 1.4GB of RAM for Apache/OHS when all 64 httpd.worker processes are running.
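
The same multiplication can be scripted so it is easy to rerun for different ServerLimit values. This is a small sketch with a hypothetical function name; awk is used because shell arithmetic only handles integers.

```shell
# Hypothetical helper: estimated memory (MB) for all httpd.worker processes
# when ServerLimit of them are running.
httpd_memory_mb() {
  local server_limit=$1 avg_mb=$2
  awk -v n="$server_limit" -v mb="$avg_mb" 'BEGIN { printf "%.1f\n", n * mb }'
}

# ServerLimit 64 with the 22.3346 MB average measured above
httpd_memory_mb 64 22.3346
```

For these inputs the result is 1429.4 MB, the roughly 1.4GB quoted above. Leave headroom on top of that figure for the operating system and anything else running on the web server.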

 

Summary

Hopefully this article helps you understand how Apache/OHS works with the Webgate in regard to OAP connections to the Access Server(s), and gives you a basic direction for tuning both Apache/OHS and the Webgate to optimize throughput and performance and to mitigate the “The WebGate plug-in is unable to contact any Access Servers.” error.  Consider reviewing my previous article, OAM 11g Webgate Tuning, to make sure you understand some of the other important Webgate parameter knobs to complete your tuning exercise.  It is said that a moth that lives too close to the flame leads a short life.  The moth here is bad Apache/OHS and Webgate tuning. Let’s hope your Apache/OHS and Webgate tuning is not too close to the flame and that you instead find Yin and Yang using this advice. Good luck!
