Disaster Recovery and Business Continuity are key requirements for most business-critical on-premises Fusion Applications environments. Disaster Recovery for FA can be broken down into two layers that have to be synchronized to achieve full recoverability: shared storage and databases.
The solution described here supports a simple full-site switchover or failover and will satisfy most business requirements. Partial failover is not part of this example. Currently only symmetric topologies are supported. This means the sites (Primary and Standby) have to be identical in every aspect: operating system release, patches, users (incl. UIDs), groups (incl. GIDs), file system permissions, directory structures etc. Hardware and VMs also have to be deployed symmetrically, meaning the same amount of hardware has to be available on the primary and standby site.
This approach is an Active-Passive deployment: while the applications are active on the primary site, the servers on the secondary site are running but the applications are not active, and only the database is running in standby mode. If the environment uses load balancers, these have to be configured to serve the correct site as well. This has to be included in the switchover/failover process, but is outside the scope of this document.
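The symmetry requirement for users and groups can be checked before any switchover test. The following is a minimal sketch, assuming the passwd files of both sites have been collected as text; the function names and the sample `oracle` account with its UIDs are illustrative assumptions, not part of the certified procedure.

```python
from typing import Dict, List, Tuple


def parse_passwd(text: str) -> Dict[str, Tuple[int, int]]:
    """Map user name -> (UID, GID) from passwd-formatted text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        users[fields[0]] = (int(fields[2]), int(fields[3]))
    return users


def find_mismatches(primary: str, standby: str, accounts: List[str]) -> List[str]:
    """Describe every listed account that is missing or differs between sites."""
    p, s = parse_passwd(primary), parse_passwd(standby)
    problems = []
    for name in accounts:
        if name not in p or name not in s:
            problems.append(f"{name}: missing on one site")
        elif p[name] != s[name]:
            problems.append(f"{name}: primary {p[name]} != standby {s[name]}")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data: the UID differs on the standby site.
    primary_passwd = "oracle:x:1001:1001::/home/oracle:/bin/bash\n"
    standby_passwd = "oracle:x:1002:1001::/home/oracle:/bin/bash\n"
    for problem in find_mismatches(primary_passwd, standby_passwd, ["oracle"]):
        print(problem)
```

The same comparison pattern could be applied to group files, patch inventories, or directory listings; only users are shown here to keep the sketch short.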
This approach assumes that the entire Fusion Applications stack, including Identity Management, has been deployed on shared storage. If this is not the case, additional factors have to be taken into consideration; for example, if the WebTier has been provisioned in the DMZ, it has to be handled separately as well.
The following diagram shows the basic architecture that is the basis for most Fusion Applications Disaster Recovery scenarios.
Oracle Data Guard is used to keep the databases used by FA and IDM in sync between the primary and standby sites. Generally, best practices already established within the organization can be used to satisfy the requirements regarding data loss etc. If no organizational standards are in place, please refer to the Oracle Database MAA best practices.
Database Host Name Aliases have to be set up in order to allow a simple switch from the primary site to the secondary site. These ensure that the applications only access the databases at the site in which they are deployed.
For example, two database hosts are deployed: fadbhost1.example.com on the primary site and stbyfadbhost1.example.com on the standby site. The replication between the two databases has already been established using the company's best practice. Both hosts get the alias fadb.example.com in the respective /etc/hosts or local DNS servers (primary site or secondary site). This allows the alias fadb.example.com to always be resolved to the host at the local site. Requests from the primary site go to fadbhost1.example.com and requests from the secondary site go to stbyfadbhost1.example.com. This allows the application to switch over to the other site without the need to reconfigure the data sources.
|Site||Database Host Name||Alias||Database Connect String|
This has to be done for all databases – FA & IDM (OID/OIM). More details on Database Host Name Aliases can be found here: https://docs.oracle.com/cd/E29542_01/doc.1111/e15250/design_consid.htm#ASDRG421
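The alias setup described above can be verified per site with a small check. This is only a sketch: it parses hosts-file text and confirms that the alias shares an IP address with the expected local database host. The IP addresses (taken from the documentation ranges) and the function names are assumptions for illustration; the host names are the ones used in the example above.

```python
from typing import Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every host name and alias in hosts-file text to its IP address."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # keep the first entry for a name
    return table


def alias_points_to(hosts_text: str, alias: str, expected_host: str) -> bool:
    """True if the alias resolves to the same IP as the expected local host."""
    table = parse_hosts(hosts_text)
    alias_ip = table.get(alias.lower())
    return alias_ip is not None and alias_ip == table.get(expected_host.lower())


# Hypothetical hosts-file content for each site.
PRIMARY_HOSTS = "192.0.2.10 fadbhost1.example.com fadbhost1 fadb.example.com\n"
STANDBY_HOSTS = "198.51.100.10 stbyfadbhost1.example.com stbyfadbhost1 fadb.example.com\n"

if __name__ == "__main__":
    print(alias_points_to(PRIMARY_HOSTS, "fadb.example.com", "fadbhost1.example.com"))
    print(alias_points_to(STANDBY_HOSTS, "fadb.example.com", "stbyfadbhost1.example.com"))
```

If the aliases are served by local DNS instead of /etc/hosts, the same idea applies, but the lookup would have to be run on a host at each site.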
Additionally, SID names and instance names must be the same for all database peers at the primary and standby site. The same applies to service names, listener port numbers, and entries in TNSNAMES.ORA, where applicable.
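One way to catch a mismatch early is to record these settings for each database peer and compare them field by field. The sketch below assumes the values have already been gathered by hand or by script; the `DbEndpoint` type and all sample values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class DbEndpoint:
    """Settings that must match between primary and standby peers."""
    sid: str
    instance_name: str
    service_name: str
    listener_port: int


def differing_settings(primary: DbEndpoint, standby: DbEndpoint) -> List[str]:
    """Return the names of all settings that differ between the two peers."""
    return [f.name for f in fields(DbEndpoint)
            if getattr(primary, f.name) != getattr(standby, f.name)]


if __name__ == "__main__":
    # Hypothetical values: the standby listener uses a different port.
    primary = DbEndpoint("fadb", "fadb", "fadb.example.com", 1521)
    standby = DbEndpoint("fadb", "fadb", "fadb.example.com", 1522)
    print(differing_settings(primary, standby))
```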
For simplified management of the solution, it is recommended to implement role-based services for the databases. Role-based services start the correct database services on the correct database hosts after a failover or switchover. Using role-based services reduces the management effort required for a manual switchover.
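As an illustration of role-based services, the sketch below only builds (and prints) the `srvctl add service` command for each site; it does not execute anything. It assumes the 11.2 short-option syntax (`-d`, `-s`, `-l`) and a database managed by Oracle Restart; RAC databases need additional options, and the option names should be checked against the installed release. The database unique names and the service name are hypothetical.

```python
from typing import List

ALLOWED_ROLES = {"PRIMARY", "PHYSICAL_STANDBY", "LOGICAL_STANDBY", "SNAPSHOT_STANDBY"}


def add_role_service_cmd(db_unique_name: str, service: str,
                         role: str = "PRIMARY") -> List[str]:
    """Build the argument list that defines a service tied to a database role."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role}")
    return ["srvctl", "add", "service",
            "-d", db_unique_name, "-s", service, "-l", role]


if __name__ == "__main__":
    # The same service is defined on both peers with the PRIMARY role, so it
    # is only started on whichever database currently holds that role.
    for db in ("fadb_prim", "fadb_stby"):  # hypothetical DB unique names
        print(" ".join(add_role_service_cmd(db, "fusion_svc")))
```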
The shared storage requirements for Fusion Applications are discussed in the Planning Chapter of the Installation Documentation. This example assumes that the local storage option has not been chosen during provisioning, meaning APPLICATIONS_BASE and APPLICATIONS_CONFIG are completely installed under one mount point, e.g. /u01. To enable Disaster Recovery, storage replication or mirroring must be enabled for the shared file system for the entire FA and IDM deployment.
The replication has to be done at the storage hardware level; please see the documentation of your storage vendor for details. Configuration details for the Oracle ZFS Storage Appliance are available in the appliance documentation. If you are not sure whether you have the correct configuration, please contact Oracle Support.
Replication of the file system is mandatory for Disaster Recovery. There are multiple dependencies on the file system, and losing them can lead to data loss and inconsistencies. Depending on how the environment is used, these can include:
- Oracle Secure Files or File-based persistent stores
- Oracle Imaging and Process Management Files
- Oracle WebLogic Server JMS and T-Logs
- Oracle BPM JMS Persistent Store
- Oracle BI Repository (RPD) and Oracle BI Presentation catalog (Web Catalog)
A switchover allows the application to be moved from one site to the other. This includes databases and all other components – the following diagram shows the steps involved in this procedure.
The only reliable way to make sure that the switchover/failover works properly is to test it thoroughly. As a best practice, switchover and failover tests should be run on a regular basis to ensure that the procedure works as expected.
It is vital that the databases and file systems are switched over to the same point in time to avoid data loss, i.e. the database cannot be further ahead than the file system and vice versa. This is especially important in case of a failover, where files or the databases have to be restored. If there is a discrepancy between your file system and your database, please contact Oracle Support to rectify the issue.
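The point-in-time requirement can be expressed as a simple comparison of two recovery timestamps. The sketch below assumes that the database recovery point and the time of the last replicated storage snapshot have already been obtained by other means; the function name and the optional tolerance parameter are illustrative assumptions, not an Oracle-defined check.

```python
from datetime import datetime, timedelta, timezone


def describe_gap(db_time: datetime, fs_time: datetime,
                 tolerance: timedelta = timedelta(0)) -> str:
    """Report whether database and file system share the same recovery point."""
    gap = db_time - fs_time
    if abs(gap) <= tolerance:
        return "consistent"
    if gap > timedelta(0):
        return f"database is ahead of file system by {gap}"
    return f"file system is ahead of database by {-gap}"


if __name__ == "__main__":
    # Hypothetical recovery points, five minutes apart.
    db_point = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    fs_point = datetime(2024, 1, 1, 11, 55, tzinfo=timezone.utc)
    print(describe_gap(db_point, fs_point))
```

A check like this only detects a discrepancy; resolving one still requires the steps described above.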
The following diagram shows the state after a switchover.
Oracle Site Guard can optionally be used to automate the whole Disaster Recovery process. Site Guard is part of Oracle Enterprise Manager Cloud Control 12c. Basically, Site Guard can be used to tie all the scripted parts of the solution together, which makes the procedure easier to manage and can reduce downtime in case of a disaster. The screenshot below shows an example of how this could be implemented: if configured properly, all it takes is the click of a button to switch over to the other site. Detailed information can be found in the Oracle Site Guard documentation.
Please note that it is required to implement separate Site Guard operations for Failover in both directions as well as Switchover in both directions.
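To make the resulting set of operations explicit, the short sketch below enumerates every combination of operation type and direction for two sites. The site labels and the function name are assumptions; this only lists what needs to be defined and does not call any Site Guard interface.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def required_operation_plans(
    sites: Sequence[str],
    operations: Sequence[str] = ("switchover", "failover"),
) -> List[Tuple[str, str, str]]:
    """List (operation, source site, target site) for every direction."""
    return [(op, src, dst)
            for op in operations
            for src, dst in permutations(sites, 2)]


if __name__ == "__main__":
    for plan in required_operation_plans(("primary", "standby")):
        print(plan)
```

For two sites this yields four entries, one per operation type and direction.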
Oracle Fusion Application Disaster Recovery 11.x Certified Configuration (Doc ID 1668687.1)
Configure High Availability And Disaster Recovery For Oracle Fusion Applications (Doc ID 1574038.1)