OK, maybe "Ultimate" is stretching it, but it caught your eye, so you can be the judge. This post is part of a larger series on Oracle Access Manager 11g called Oracle Access Manager Academy. An index to the entire series with links to each of the separate posts is available here. Though OAM 11g is in the title, this article should also apply to OAM 12c (12.2.1.3.0), since it still uses 11g Webgates.
In my previous post, OAM 11g Webgate Tuning, I covered a number of Webgate parameters and some Apache/OHS directives to give you some ideas on how to tune the Webgate. In this post I expand on the mystery of how Apache/OHS works with the Webgate plug-in, and show you the best way to tune Apache/OHS and Webgate together so that you not only significantly improve performance and throughput, but also potentially prevent big bottlenecks and mitigate the infamous Webgate error “The WebGate plug-in is unable to contact any Access Servers.” Yes, bad OHS/Apache tuning can actually be a culprit for that error, even though most people immediately jump to the conclusion that the problem is a firewall, the network, or even the Access Servers.
This article is only intended for Apache/OHS running in Worker-mode, which is the default for Apache 2.2+, OHS11g, and OHS12c. Worker-mode uses threads to serve requests, so it is able to serve a large number of requests using fewer resources. Worker-mode also provides more stability than Prefork-mode by keeping multiple processes available, each with many threads. An initial httpd.worker process, called the parent, is created first. The parent is responsible for launching the child processes; more on this later. As for Prefork-mode, the only reason you would use it is for compatibility of some sort, but when working with the Webgate plug-in I recommend Worker-mode. Below is an example comparing Prefork and Worker.
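Before tuning anything, it can help to confirm which MPM a server is really running. The following is a minimal sketch, assuming the `httpd` binary (or `httpd.worker` on OHS, with the OHS environment set) is on your PATH. The helper name `mpm_name` is made up for illustration; it only parses the "Server MPM" line that `httpd -V` prints.

```shell
# Hypothetical helper: read `httpd -V` style output on stdin and print
# the MPM name in lowercase (for example "worker" or "prefork").
mpm_name() {
  awk -F': *' '/^Server MPM:/ { print tolower($2) }'
}

# Usage on a real server (binary name and path are assumptions):
#   httpd -V 2>/dev/null | mpm_name
```

If this prints anything other than `worker`, the rest of this article does not apply as written.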
Before diving into tuning, it is important to cover the basics of the Apache/OHS directives and Webgate parameters I will talk about, so you can understand what each is responsible for in its respective component. To start, the following are short explanations of each Apache/OHS directive covered in this article for the purposes of Apache/OHS tuning.
Now that I have covered some important Apache/OHS directives, let’s go over some key Webgate parameters that relate to Apache/OHS tuning. Though I am pointing out three Webgate parameters, in the interest of keeping this article from getting too long I will only focus on Max Connections later on. The other parameters are already covered pretty well in my previous OAM 11g Webgate Tuning article, but I still wanted to mention them as a refresher and in the hope that you check out the other article.
Now that I have some of the more detailed, boring information out of the way, let's move on to the interesting things that brought you to this blog. You may want to read through the next sections a couple of times so you can wrap your head around everything. I promise this will really help your approach to tuning Apache/OHS servers running a Webgate.
Let's start with the basics of how the Webgate plug-in works in conjunction with Apache/OHS. When an httpd.worker process is created, a few things happen at a high level:
For example, in the following illustration there are two httpd.worker processes running and the Webgate Max Connections is set to 2, so each httpd.worker process will have 2 OAP connections open to the Access Server. The number of OAP connections will only change as connections expire, are torn down, or are closed by the client, but the Webgate will be certain to keep Max Connections connections open for each httpd.worker process, so the count should be pretty consistent. The illustration below is meant to show the logical architecture of the flow between an Apache/OHS server and an Access Server.
Please see my previous article, OAM 11g Webgate Tuning, which goes deeper into how the OAP connections can be distributed across multiple primary Access Servers based on the value each primary Access Server has configured for Number of Connections (OAM11gPS2) or Max Connections (OAM11gPS3+).
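To check these connection counts on a running web server, one option is to count established TCP connections to the Access Server's OAP port. This is a rough sketch, not an Oracle-provided tool: the helper name `count_oap_connections` is hypothetical, port 5575 is only assumed here as the Access Server proxy port (use whatever your environment is configured with), and the addresses in the demo are made up.

```shell
# Hypothetical helper: read `netstat -an` style lines on stdin and count
# ESTABLISHED connections whose foreign address ends in :<port>.
count_oap_connections() {
  port=$1
  awk -v p=":$port" '
    $6 == "ESTABLISHED" && substr($5, length($5) - length(p) + 1) == p { n++ }
    END { print n + 0 }'
}

# Demo with made-up netstat lines: 2 processes x 2 connections each.
printf '%s\n' \
  'tcp 0 0 10.0.0.5:41001 10.0.0.9:5575 ESTABLISHED' \
  'tcp 0 0 10.0.0.5:41002 10.0.0.9:5575 ESTABLISHED' \
  'tcp 0 0 10.0.0.5:41003 10.0.0.9:5575 ESTABLISHED' \
  'tcp 0 0 10.0.0.5:41004 10.0.0.9:5575 ESTABLISHED' \
  'tcp 0 0 10.0.0.5:41005 10.0.0.9:7777 ESTABLISHED' |
  count_oap_connections 5575

# On a real server (port is an assumption):
#   netstat -an | count_oap_connections 5575
```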
I find the easiest way to explain complex tuning like this is with an example. I will first explain a bad approach to tuning, followed by a much better way; let’s start with the following illustration. The ServerLimit is set to 2, which means the maximum number of httpd.worker processes that Apache/OHS will create is 2. To be accurate, Apache/OHS will never create more than ServerLimit httpd.worker processes, even when the maximum number of HTTP requests allowed by MaxClients is hitting the web server. As a side note, the StartServers directive tells Apache/OHS how many httpd.worker processes to create on startup. If we focus on the 2 httpd.worker processes, you will see in the illustration that each one has 2 OAP connections open to the Access Server. This is because the Webgate Max Connections parameter is set to 2. On the Apache/OHS server you will therefore see at least 4 OAP connections to the primary Access Server(s).
You can also see that ThreadsPerChild is set to 256, which means each httpd.worker child will have 256 threads. Again, the threads are there to process HTTP requests. It is important to understand that all the threads within an httpd.worker process have to share that process's OAP connections. In this case we have 2 OAP connections shared by 256 threads within each httpd.worker process, and each OAP connection can only be used by one thread at a time. For example, if a thread gets some work to check credentials or an authorization, it must first check whether an OAP connection is available. If so, it grabs the connection, sends the request to the Access Server, and waits for a response. Once it has the response, it puts the OAP connection back into the connection pool for another thread to use.
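The connection count in this example is simple multiplication: one pool of Max Connections per httpd.worker process. As a minimal sketch (the function name `oap_connections` is made up for illustration, and it assumes every process allowed by ServerLimit has been started):

```shell
# Expected OAP connections opened by one web server once all
# httpd.worker processes are running.
oap_connections() {
  server_limit=$1      # Apache/OHS ServerLimit
  max_connections=$2   # Webgate Max Connections
  echo $(( server_limit * max_connections ))
}

oap_connections 2 2    # the example above: prints 4
```

Keep this product in mind when you raise ServerLimit later, since every extra process opens its own pool of connections against the Access Server(s).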
Immediately some concerns can be identified:
Some may say that providing 256 threads for each httpd.worker process lets each httpd.worker get a lot of work done and reduces the amount of memory needed on the web server, thereby getting a lot of bang for your buck, if you will. That sounds fair, but let’s see what happens.
In the next illustration HTTP traffic begins to increase, though it is still reasonable, with no spikes or peak load. The traffic is enough for Apache/OHS to create its maximum of 2 httpd.worker processes per the ServerLimit. Since traffic is not heavy and each thread gets a quick response between the Webgate and the Access Server, no problems bubble up. So far so good.
Suddenly, HTTP traffic increases to peak load or there is a burst of heavy traffic, and all 256 threads of each httpd.worker are engaged, which means all 256 threads put demand on only 2 OAP connections. In the following illustration one of the httpd.worker processes has a thread (in red) that has been waiting for some time on a response from the Access Server. This is no surprise, since all 256 threads have to share only 2 OAP connections. Unless each request is returned within milliseconds, congestion can quickly happen, as illustrated. I pointed out earlier that an OAP connection can only be used by one thread at a time. If the httpd.worker process has to juggle hundreds of threads over only a few OAP connections, that is a big problem. In my opinion, this approach to tuning is a recipe for disaster.
In fact, in some scenarios it does not even take heavy HTTP traffic to get threads piled up in a queue waiting on an OAP connection. All it could take is some delay in LDAP, a network hiccup, etc., because you only have 2 OAP connections for all 256 threads. Unfortunately, the impact is amplified from the users' perspective because they will quickly experience authentication and authorization problems. The operations team at first blush will think something is wrong with the Access Servers because the Webgate logs will churn up errors that say "The WebGate plug-in is unable to contact any Access Servers", which I mentioned in the introduction. One reason you would see this error is that a thread exceeds the AAA Timeout Threshold while waiting on a response from the Access Server, and as the thread queue piles up you will see many more of these errors for as long as the AAA Timeout Threshold is exceeded. Hopefully, this gives you a basic idea of the problems that come from tuning a very high ThreadsPerChild per httpd.worker process over a small number of Max Connections; on top of that, having too few httpd.worker processes works against better performance too.
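To get a feel for how quickly the queue grows, here is a back-of-the-envelope sketch. It is a simplification, not a measurement: it assumes every thread in one httpd.worker process needs an OAP connection at the same moment and that every Access Server round trip takes the same time. The function name and the round-trip times are hypothetical.

```shell
# Rough worst-case wait (in ms) for the last thread in line inside one
# httpd.worker process, before its own request is even sent.
worst_case_wait_ms() {
  threads=$1         # ThreadsPerChild
  connections=$2     # Webgate Max Connections
  round_trip_ms=$3   # hypothetical Access Server response time
  # Rounds of requests served ahead of the last thread:
  # ceil(threads / connections) - 1
  rounds_ahead=$(( (threads + connections - 1) / connections - 1 ))
  echo $(( rounds_ahead * round_trip_ms ))
}

worst_case_wait_ms 256 2 50    # prints 6350
worst_case_wait_ms 256 2 100   # prints 12700
```

Compare the result with whatever AAA Timeout Threshold you have configured: under this model, a small slowdown per request (50 ms to 100 ms) doubles the wait for the threads at the back of the queue.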
There is a better way! I will walk with you, Grasshopper. In this section I will walk you through various combinations of tuning parameter values to lead you to balance the Yin and the Yang of Apache/OHS and the Webgate and bring harmony.
My methodology was to use JMeter (see Part 1: How To Load Test OAM11g using Apache JMeter) as a load testing tool against a basic OAM11g and OHS11g setup. I measured results from each test while systematically tweaking the Apache/OHS and Webgate values I mentioned earlier. I kept each load test consistently heavy by running thousands of logins and authorizations per minute, which applied a pretty decent amount of stress on the system. To make sure my results were not tainted, I restarted all the components between tests so they would not be skewed by things like caching, memory, etc. After the restart, each test was repeated with the same tuning changes to be sure the results looked fairly consistent.
Let’s look at the first test run…
In RUN 1, I decided to keep MaxClients at 1024 and Max Connections at 8 for all load tests, while tweaking ServerLimit and ThreadsPerChild. After several iterative tests I discovered that as ServerLimit increased while ThreadsPerChild was reduced, I was able to get better throughput, up to a point. Interestingly, as ServerLimit continued to be adjusted higher and ThreadsPerChild inversely reduced, performance began to degrade, authentication failures started, and errors started showing up in the Webgate logs. Referencing table RUN 1 below, TEST5, TEST6, TEST7, and TEST8 had the best results, though TEST6 seemed to provide the best results overall (darker green is better, whereas red is worse).
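Holding MaxClients constant while trading ServerLimit against ThreadsPerChild works because, in Worker-mode, MaxClients is bounded by ServerLimit multiplied by ThreadsPerChild. The sketch below only enumerates pairs whose product equals a given MaxClients by halving the thread count each step; it does not reproduce the RUN 1 table, and the function name is made up for illustration.

```shell
# List ServerLimit / ThreadsPerChild pairs whose product is MaxClients.
# The thread count is halved on each step, starting from MaxClients.
pairs_for_max_clients() {
  max_clients=$1
  threads=$max_clients
  while [ "$threads" -ge 1 ]; do
    # Keep only thread counts that divide MaxClients evenly.
    if [ $(( max_clients % threads )) -eq 0 ]; then
      echo "ServerLimit=$(( max_clients / threads )) ThreadsPerChild=$threads"
    fi
    threads=$(( threads / 2 ))
  done
}

pairs_for_max_clients 1024
```

For 1024 this prints eleven candidate pairs, from one process with 1024 threads down to 1024 processes with one thread each; the useful range for testing sits somewhere in the middle.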
In RUN 2, I kept MaxClients the same as in RUN 1 to keep things consistent across the different types of testing. However, in RUN 2 I took the best ServerLimit and ThreadsPerChild combinations from RUN 1 and only tweaked the Webgate Max Connections value. After running load tests and repeating them for good measure while systematically changing the Webgate Max Connections, the best results came from tests TEST5D, TEST6C, TEST7C, and TEST8A; refer to the table RUN 2 above with the columns highlighted in green (green is better). I then filtered out the best combinations of ServerLimit, ThreadsPerChild, and MaxClients, shown in the above table labeled TOP 4. My best result was from test TEST6C. It not only had the best throughput, but also the best CPU usage and zero errors, so TEST6C was the ultimate winner. Again, green is better, like a Grasshopper.
I summarized the principles behind what I determined in the following graph. With MaxClients being equal, there is a sweet spot where ServerLimit, ThreadsPerChild, and Webgate Max Connections come into a balance that results in the best performance, throughput, and CPU usage, and the fewest errors. The graph illustrates how the throughput curve improves as ThreadsPerChild is lowered while ServerLimit and Webgate Max Connections are increased, but there is a point where those same adjustments, if continued in the same direction, begin to degrade performance. Basically, the Yin and Yang forces are balanced by tweaking ServerLimit, ThreadsPerChild, and Webgate Max Connections together.
I am sure the burning question is, Master, what are the best tuning values for these parameters? Variables in a deployment such as load, types of authorization policies, backend Identity Stores, number of Access Servers, number of web servers, network stability and responsiveness, load balancers, etc. can all have an impact on your tuning, so the values I tell you to use may not be perfect for you. That said, based on these principles of how Apache/OHS and the Webgate plug-in work together, and how ThreadsPerChild, ServerLimit, and Webgate Max Connections can dramatically impact throughput and performance, I feel the following recommendations should give you a good starting point:
The next illustration shows an example of a more balanced approach to tuning the Apache/OHS directives and the Webgate parameter to get better throughput and performance even at extreme peak loads. Using 64 ThreadsPerChild with 8 Max Connections gives a little more breathing room to all the threads on each httpd.worker process, versus 256 threads competing for 2 OAP connections. Increasing the ServerLimit to allow more httpd.worker processes may increase the amount of memory Apache/OHS needs, but it at least helps the incoming HTTP requests be distributed across more workers as loads suddenly peak or burst.
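The "breathing room" can be put into a number by dividing the threads in a process by the OAP connections they share. A minimal sketch, with a made-up function name, using the two configurations discussed in this article:

```shell
# Threads competing for each OAP connection inside one httpd.worker process.
threads_per_connection() {
  threads_per_child=$1   # Apache/OHS ThreadsPerChild
  max_connections=$2     # Webgate Max Connections
  echo $(( threads_per_child / max_connections ))
}

threads_per_connection 256 2   # bad example: prints 128
threads_per_connection 64 8    # balanced example: prints 8
```

The ratio is only a rough indicator, but a drop from 128 threads per connection to 8 shows why the balanced setup queues far less under a burst.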
Using these exact values may not be your optimum, but it is a better start than the alternative I started with.
Before you settle on the ServerLimit, it is a good idea to get an average of how much memory each httpd.worker process uses, so that there is enough memory for Apache/OHS and everything else and you don’t run into a lot of swapping.
Running the following Linux command on an OHS server during a test will list your httpd.worker processes along with some important details you can use, such as the RES column, which gives the resident memory each httpd.worker process is using. If you are using Apache, you may need to change “httpd.worker” to “httpd”.
top -c -p $(pgrep -d',' -f httpd.worker) -n 1
If you run the following Linux command on an OHS server during a load test (a baseline load test is enough to get this information), it prints the average memory each httpd.worker process is using. Multiply that average by your ServerLimit to get an idea of how much memory your Apache/OHS server will need when all httpd.worker processes have been created; consider it the maximum.
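# The [h] bracket keeps grep from matching its own process line,
# which would otherwise skew the average.
ps aux | grep '[h]ttpd.worker' | \
awk '{print $6/1024;}' | \
awk '{avg += ($1 - avg) / NR;} \
END {print avg " MB";}'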
oracle$ 22.3346 MB
Based on my list of httpd.worker processes in the screenshot, the output was 22.3346 MB, so if I multiply a ServerLimit of 64 by 22.3346 MB, I would need about 1.4 GB of RAM for Apache/OHS when all 64 httpd.worker processes are running.
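The same multiplication can be scripted so it is easy to repeat with your own numbers. This is a small sketch with a hypothetical function name; the inputs are the average from the command above and the ServerLimit you are considering.

```shell
# Estimate the memory needed when every httpd.worker process allowed by
# ServerLimit is running: average resident size (MB) x ServerLimit.
estimate_total_mb() {
  avg_mb=$1         # average RES per httpd.worker process, in MB
  server_limit=$2   # Apache/OHS ServerLimit
  awk -v avg="$avg_mb" -v n="$server_limit" \
    'BEGIN { printf "%.1f MB\n", avg * n }'
}

estimate_total_mb 22.3346 64   # prints 1429.4 MB
```

Treat the result as a ceiling for the httpd.worker processes only, and leave headroom for the operating system and anything else running on the host.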
Hopefully this article helps you understand how Apache/OHS works with the Webgate in regard to OAP connections to the Access Server(s), and gives you a basic direction for tuning Apache/OHS and the Webgate together in order to optimize throughput and performance and to mitigate the error "The WebGate plug-in is unable to contact any Access Servers." I would also review my previous article, OAM 11g Webgate Tuning, to make sure you understand some of the other important Webgate parameter knobs and complete your tuning exercise. It is said that a moth that lives too close to the flame leads a short life, the moth being bad Apache/OHS and Webgate tuning. Let's hope your Apache/OHS and Webgate tuning is not too close to the flame and that you instead find Yin and Yang using this advice. Good luck!