Many Oracle Access Management 11g customers opt to deploy a combination of Oracle Access Manager and Oracle Adaptive Access Manager using the Advanced Integration option. This combination of product features can provide strong, adaptive authentication and fraud mitigation for online applications. In this post, we examine a number of strategies for configuring the connectivity between these components in order to provide scalability and high availability for production deployments.
The information in this post applies to the 11g R2 versions of OAAM and OAM only (at the time of writing, 220.127.116.11, 18.104.22.168 and 22.214.171.124).
Before continuing, readers are advised to consult Appendix C of the Oracle Fusion Middleware Integration Guide for Oracle Identity Management Suite (126.96.36.199 release here) to familiarize themselves with the features, benefits and configuration steps of the Advanced Integration option. This post concentrates only on the parameters that control the OAP communication pool between OAAM and OAM.
When OAM and OAAM are deployed using the Advanced Integration pattern, the two product components play different roles during the authentication process. Through the use of the OAAM Authentication Scheme in OAM, the process of collecting credentials (and thus handling the entire authentication flow with the user's browser) is handled by OAAM. The actual authentication (or, in fact, credential validation) step is still performed by OAM via a back-channel OAP (Oracle Access Protocol) call from OAAM. OAAM uses its configured logic to collect the username and password from the user, with the aid of virtual strong authentication devices, fraud detection rules and the like. Once it has collected these credentials, it uses an embedded OAM Access SDK client (or custom AccessGate) to pass them to the OAM server. OAM validates the credentials against its configured LDAP identity store and returns the result to OAAM. Should the authentication succeed, OAAM then generates a Delegated Authentication Protocol (DAP) token and redirects the user back to OAM with this token in order to create the necessary OAM session.
Because every authentication depends on this back-channel call, production deployments can only achieve sufficient performance and availability if the OAP connection mechanism between OAAM and OAM is correctly configured to meet the applicable requirements.
Unlike OAM webgates, which are completely configurable via the webgate profile in the OAM console (which in turn generates the ObAccessClient.xml file), OAM Access SDK clients (such as OAAM) do not use the webgate profile for anything other than basic authentication to the OAM server. What this means is that while the webgate ID and password are important, OAAM will essentially ignore any other settings on the webgate profile - in particular, those settings controlling the number of primary and secondary OAP connections that should be created against each OAM server, which allow for load balancing and high availability when configuring webgates. Instead, OAAM's connection pool is configured via a number of OAAM properties, which provide somewhat less flexibility in terms of support for load balancing. We'll explore these properties below, before discussing a number of strategies that can be used to ensure a production-ready deployment. Please also see Appendix C of the Oracle Fusion Middleware Administrator's Guide for Oracle Adaptive Access Manager (188.8.131.52 release here).
IAMSuiteAgent and should not be changed.
Perusing the above properties, the immediate observation is that only a single primary and a single secondary OAM server can be specified. This is of limited use for large-scale production deployments, where requests from OAAM typically need to be load balanced across a number of OAM servers. Below, we explore a number of options that can work.
In a deployment where the number of OAAM nodes matches the number of OAM nodes exactly, a fairly sensible and robust load balancing approach is simply to allocate a single primary and a single secondary OAM server to each OAAM server. This can be achieved by overriding the deployment-wide oaam.uio.oam.host and oaam.uio.oam.secondary.host settings on each individual OAAM host. To do this, first delete the applicable property values from the OAAM database via the OAAM console. Then pass a unique value to each OAAM server instance at startup via a Java system property, e.g.
-Doaam.uio.oam.host=<primary_host_name> and -Doaam.uio.oam.secondary.host=<secondary_host_name>
Consider a deployment comprising two OAAM hosts (Host A and Host B) and two further OAM hosts (Host C and Host D). Using this approach, Host A would be configured with the following settings:
oaam.uio.oam.host: Host C and oaam.uio.oam.secondary.host: Host D
while Host B would be configured with
oaam.uio.oam.host: Host D and oaam.uio.oam.secondary.host: Host C
This configuration would ensure that both OAM hosts received an equivalent number of connections, thus providing load balancing, while also providing resilience in case either OAM server should become unavailable.
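The pairing above generalizes to any deployment with equal numbers of OAAM and OAM nodes by rotating through the OAM host list. The following is a minimal planning sketch, not part of any Oracle tooling: the function names and the hostA to hostD labels are illustrative assumptions, and only the two property names come from the text above.

```python
def assign_oam_hosts(oaam_hosts, oam_hosts):
    """Give each OAAM host one primary and one secondary OAM host by rotation."""
    if len(oaam_hosts) != len(oam_hosts):
        raise ValueError("this scheme assumes equal numbers of OAAM and OAM hosts")
    if len(oam_hosts) < 2:
        raise ValueError("at least two OAM hosts are needed for a distinct secondary")
    n = len(oam_hosts)
    plan = {}
    for i, oaam in enumerate(oaam_hosts):
        plan[oaam] = {
            # Primary: the OAM host at the same position in the list.
            "oaam.uio.oam.host": oam_hosts[i],
            # Secondary: the next OAM host, wrapping around at the end.
            "oaam.uio.oam.secondary.host": oam_hosts[(i + 1) % n],
        }
    return plan


def jvm_flags(settings):
    """Render the settings as -D flags for the OAAM server start command."""
    return " ".join(f"-D{key}={value}" for key, value in settings.items())


# Hypothetical host labels matching the two-by-two example in the text.
plan = assign_oam_hosts(["hostA", "hostB"], ["hostC", "hostD"])
for oaam, settings in plan.items():
    print(oaam, jvm_flags(settings))
# hostA -Doaam.uio.oam.host=hostC -Doaam.uio.oam.secondary.host=hostD
# hostB -Doaam.uio.oam.host=hostD -Doaam.uio.oam.secondary.host=hostC
```

With this rotation, every OAM host is the primary for exactly one OAAM host and the secondary for exactly one other, which is what keeps the connection load even.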
This approach, though, would suffer from a number of drawbacks, including the following:
The second option is similar to the first, in that it allows for the definition of a single primary and a single secondary OAM server for each OAAM server. In this case, though, rather than overriding domain-wide property values, the approach is to use virtual hostnames to define the OAM servers.
For example, we would define the following:
We would then use the /etc/hosts file on each OAAM node to define exactly which physical OAM server IP address the virtual hostnames oam-primary and oam-secondary should resolve to. In our above scenario, OAAM HOST A would have entries in its hosts file mapping oam-primary to the IP address for OAM Host C and oam-secondary to the IP address for OAM Host D. HOST B would instead map oam-primary to the IP address for OAM Host D and oam-secondary to the IP address for OAM Host C.
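As a rough illustration of these per-node mappings, the sketch below renders the hosts-file lines for each OAAM node. The IP addresses are placeholders from the 192.0.2.0/24 documentation range, and the function and node names are assumptions made for this example; only the virtual hostnames oam-primary and oam-secondary come from the text.

```python
def hosts_entries(primary_ip, secondary_ip):
    """Return the /etc/hosts lines that one OAAM node needs."""
    return [
        f"{primary_ip}\toam-primary",
        f"{secondary_ip}\toam-secondary",
    ]


# Placeholder addresses (assumptions): OAM Host C and OAM Host D.
OAM_HOST_C = "192.0.2.13"
OAM_HOST_D = "192.0.2.14"

# OAAM Host A prefers Host C; OAAM Host B prefers Host D.
per_node = {
    "oaam-host-a": hosts_entries(OAM_HOST_C, OAM_HOST_D),
    "oaam-host-b": hosts_entries(OAM_HOST_D, OAM_HOST_C),
}

for node, lines in per_node.items():
    print(f"# entries for /etc/hosts on {node}")
    print("\n".join(lines))
```

Every OAAM node keeps the same two property values (the virtual hostnames), and only the hosts file differs from node to node.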
In cases where OAAM and OAM servers are co-located on the same hardware, we can use a shortcut and specify "localhost" as the oaam.uio.oam.host value.
This approach provides essentially the same benefits as the first option and incurs the same drawbacks, with the possible exception that it may prove somewhat easier to manage in production. In particular, the fact that any of the virtual mappings could be changed dynamically (without needing to restart OAAM) would be a definite advantage of this strategy.
Perhaps the most obvious solution to this problem is to insert some form of external load balancer between OAAM and OAM. In this case, OAAM is configured such that the oaam.uio.oam.host property points to the address of the load balancer, which then in turn distributes requests to the OAM servers according to whatever algorithm is desired. In this scenario, it does not even make sense to define the oaam.uio.oam.secondary.host property (unless there is a second, redundant load balancer in place) since it's assumed that the load balancer itself will only route requests to active OAM nodes.
This approach has a number of benefits when compared to options 1 and 2 above, including the following:
These benefits do come at a cost, however, in terms of increased complexity within the deployment. There will obviously also be a physical cost to procuring and commissioning the necessary load balancing device.
In addition, some caveats need to be mentioned at this point.
Firstly, while it may seem an obvious point, it's worth remembering that OAP is a long-lived, TCP-based protocol and thus the load balancer used must be able to handle such a protocol. OAP is not HTTP, so an HTTP-only load balancer cannot be used here.
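Before pointing OAAM at a load balancer, it can be useful to confirm that a plain TCP connection to the OAP listener passes through it. The sketch below does only that: it opens and closes a TCP connection and does not speak OAP, so a success says nothing about the OAP handshake itself. The hostname in the comment is hypothetical, and port 5575 is assumed to be the OAM proxy port in use; substitute the values from your own deployment.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (hypothetical load balancer address, assumed OAP port):
#   tcp_reachable("oam-lb.example.com", 5575)
```

Running the check from each OAAM node, rather than from an administrator's workstation, exercises the same network path that the OAAM connection pool will use.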
The fact that OAP connections are long-lived can introduce some unforeseen complications, like the ones described in this excellent post by Chris Johnson. Unless the load balancer is able to dynamically rebalance connections, it is possible that an OAM server outage could result in an unbalanced connection load even after the troublesome server is brought back online. The only way to mitigate this situation would be to perform a managed rolling restart of the OAAM cluster once all the OAM servers are up again.
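The imbalance is easy to see in a toy model. The simulation below is a simplified sketch under stated assumptions: the load balancer uses a least-connections policy, connections are only re-established when their server fails, and nothing rebalances them afterwards. The server names are illustrative and the code does not model OAAM's actual pool behaviour.

```python
def least_loaded(counts, candidates):
    """Pick the candidate with the fewest connections (ties broken by name)."""
    return min(candidates, key=lambda server: (counts[server], server))


def simulate_outage(num_connections, servers, failed):
    """Return connection counts before an outage and after the server returns."""
    counts = {server: 0 for server in servers}
    connections = []
    for _ in range(num_connections):
        target = least_loaded(counts, servers)
        connections.append(target)
        counts[target] += 1
    before = dict(counts)

    # Outage: connections to the failed server are re-opened on the survivors.
    survivors = [server for server in servers if server != failed]
    for i, server in enumerate(connections):
        if server == failed:
            counts[failed] -= 1
            target = least_loaded(counts, survivors)
            connections[i] = target
            counts[target] += 1

    # Recovery: the failed server is back, but the long-lived connections
    # stay where they are, so the counts do not change.
    after = dict(counts)
    return before, after


before, after = simulate_outage(10, ["oamC", "oamD"], "oamD")
print("before outage: ", before)  # {'oamC': 5, 'oamD': 5}
print("after recovery:", after)   # {'oamC': 10, 'oamD': 0}
```

In this model the recovered server receives no traffic until connections are recycled, which is why a rolling restart of the OAAM cluster restores the balance.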
The comments in this blog post about connection timeouts are also applicable; it is best to configure the load balancer so as not to time out idle/long-lived connections if possible. If not, these timeouts should be set for as long as possible, since we do not have the equivalent of the webgate "Max Session Time" parameter available through OAAM's configuration properties. If it is not possible to avoid connection timeouts, then as a mitigation, be sure to set the oaam.oam.oamclient.periodForWatcher property to a low enough value, to increase the likelihood that the OAAM pool watcher will detect and re-establish a timed-out connection before a real client request attempts to use it.
While there is obviously no perfect answer or one-size-fits-all solution here, the most sensible approach may well be to combine the above options; a number of the more unpleasant side effects caused by load balancing OAP can be avoided by using a direct host connection (either option 1 or 2) for the primary OAM server connection. If a load balancer is available, it could be used as the secondary, thus allowing the solution to scale beyond two nodes without compromising availability.
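One way to picture the combined approach is as a per-node settings map in which the primary is a direct OAM host and the secondary is always the load balancer. This is only a planning sketch: the function name, the host labels and the load balancer address are hypothetical, and only the two property names come from the text.

```python
def combined_settings(direct_primary_by_oaam, load_balancer_host):
    """Direct primary per OAAM node, with the load balancer as shared secondary."""
    if not load_balancer_host:
        raise ValueError("a load balancer host is required for the secondary")
    return {
        oaam: {
            "oaam.uio.oam.host": primary,
            "oaam.uio.oam.secondary.host": load_balancer_host,
        }
        for oaam, primary in direct_primary_by_oaam.items()
    }


# Hypothetical names: two OAAM nodes, each with its own direct primary.
settings = combined_settings(
    {"hostA": "hostC", "hostB": "hostD"},
    "oam-lb.example.com",
)
for oaam, props in settings.items():
    print(oaam, props)
```

Because the secondary no longer names a specific OAM server, adding OAM nodes behind the load balancer does not require touching the OAAM configuration.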