Introducing Disaster Recovery into a Fusion Applications environment can increase management complexity. Oracle VM offers reliable solutions that simplify ongoing maintenance and the switchover/failover processes. The solution discussed here is an example deployment of Fusion Applications as an active-passive environment across two datacenters. It complements the previously published article "Disaster Recovery for On-Premise Fusion Applications".
This approach applies to all Oracle VM deployments that use NFS- or SAN-based storage repositories. The underlying storage array must provide storage replication. The performance and latency requirements for this part of the DR solution are much less strict, because Fusion Applications stores its application data mostly in the database and, to a certain extent, on shared storage.
An integral prerequisite for a working Disaster Recovery solution for Fusion Applications is the proper maintenance of the servers/VMs and their operating systems. For example, if a kernel patch is installed on the primary site, the same patch must also be installed on the secondary site so that switchover/failover works properly. This requirement extends to user management, configuration changes, patching, and more, in order to satisfy the symmetrical deployment requirement and minimize risk during switchover/failover.
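One way to verify the symmetrical deployment requirement is to compare the installed package inventories of a primary VM and its secondary counterpart. The following is a minimal Python sketch, assuming you have already collected one package string per line for each host (for example, the output of `rpm -qa` saved per host); the function name `diff_inventories` and the sample package strings are hypothetical.

```python
"""Compare package inventories of a primary VM and its secondary counterpart.

Assumption: each inventory is a list of package strings, e.g. one line per
package as printed by `rpm -qa` on each host. Collecting them is out of scope.
"""


def diff_inventories(primary, secondary):
    """Return (only_on_primary, only_on_secondary) as sorted lists."""
    p, s = set(primary), set(secondary)
    return sorted(p - s), sorted(s - p)


if __name__ == "__main__":
    # Hypothetical sample data: the secondary is missing a kernel update.
    primary = ["kernel-uek-4.1.12-124.48.6.el6uek", "openssl-1.0.1e-58.el6"]
    secondary = ["kernel-uek-4.1.12-124.48.3.el6uek", "openssl-1.0.1e-58.el6"]
    only_p, only_s = diff_inventories(primary, secondary)
    if only_p or only_s:
        print("Sites are NOT symmetrical")
        print("  only on primary:  ", only_p)
        print("  only on secondary:", only_s)
    else:
        print("Package inventories match")
```

Any non-empty result flags a host pair that should be brought back in line before the next switchover test.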
Storage Repository Replication
Oracle VM, in conjunction with the underlying storage technology, offers an effective way to cut down on this type of maintenance in virtualized environments. This approach replicates the entire virtual disks of all VMs to the secondary site. In case of a disaster at the primary site, all VMs can simply be started at the secondary site, and Fusion Applications can resume service without having to reconfigure the environment. The switchover/failover process takes longer than with an active-active Oracle VM approach, in which the VMs on the standby side are already running (but Fusion Applications is not). However, the resulting cost and maintenance reduction makes this solution ideal for environments with a less critical recovery time objective.
Both the switchover and the failover process are very similar to the standard Fusion Applications Disaster Recovery process. The only additions are the steps to start and/or stop Oracle VM Manager and the Oracle VMs at each site. This diagram describes the flow of the switchover operation.
If the primary site is offline after the disaster, the failover process must be executed. This process is essentially a subset of the switchover process. As soon as the primary site is available again, storage replication can be re-enabled to mirror the storage repository back to the primary site. This part of the procedure is described in the storage vendor's documentation.
After the disaster, the environment at the secondary site operates completely autonomously from the primary site. It is important to make sure that the environment at the primary site does not restart when the site comes back online; this can be achieved with additional tools such as Oracle Site Guard.
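The relationship between switchover and failover can be modeled as an ordered runbook in which failover is the subset of steps that do not require the primary site. The following Python sketch is illustrative only: the step names and their order are an assumed reading of the process described above, not an official Oracle runbook, and the `Step` type and `failover_steps` function are hypothetical.

```python
"""Model switchover and failover as ordered runbooks.

Assumption: the step names and their order below are an illustrative reading of
the process described above, not an official Oracle runbook. Primary-side steps
are flagged so that failover can be derived as a subset of switchover.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    needs_primary: bool  # True if the step runs on the (possibly lost) primary


SWITCHOVER = [
    Step("Stop Fusion Applications on primary", True),
    Step("Stop Fusion Applications VMs on primary", True),
    Step("Stop Oracle VM Manager on primary", True),
    Step("Transition database roles with Data Guard", False),
    Step("Make replicated storage repository writable on secondary", False),
    Step("Start Oracle VM Manager on secondary", False),
    Step("Start Fusion Applications VMs on secondary", False),
    Step("Start Fusion Applications on secondary", False),
]


def failover_steps(switchover):
    """Failover: the switchover steps that do not need the primary site."""
    return [s for s in switchover if not s.needs_primary]


if __name__ == "__main__":
    for i, step in enumerate(failover_steps(SWITCHOVER), start=1):
        print(f"{i}. {step.name}")
```

Keeping a single ordered list and deriving failover from it avoids maintaining two runbooks that can drift apart.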
Oracle VM Implementation Considerations
To implement this Disaster Recovery solution with minimal problems, it is recommended to run multiple independent Oracle VM Manager instances with the same UUID in an active-passive configuration. To retrieve the UUID of an Oracle VM Manager, log in and click Help > About. Alternatively, the UUID is stored in /u01/app/oracle/ovm-manager-3/.config.
This can easily be achieved by following the steps described in the documentation section ‘3.7. Running Oracle VM Manager as a Virtual Machine’: create a second VM at the secondary site, using the same steps as at the primary site, to deploy the second Oracle VM Manager. Unless you have a separate management network per site, make sure not to run both Oracle VM Managers at the same time to avoid conflicts.
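Because both managers must share the same UUID, a quick check is to compare the UUID entries of the two `.config` files. The following Python sketch assumes the file consists of `KEY=VALUE` lines including a `UUID` entry; verify this against the file on your own installation. The helper names `parse_config`, `read_uuid`, and `same_uuid` are hypothetical.

```python
"""Check that two Oracle VM Manager instances share the same UUID.

Assumption: the .config file holds KEY=VALUE lines including a UUID entry.
Verify this against the file on your own installation before relying on it.
"""

from pathlib import Path

CONFIG_PATH = "/u01/app/oracle/ovm-manager-3/.config"


def parse_config(text):
    """Parse KEY=VALUE lines into a dict, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def read_uuid(path=CONFIG_PATH):
    """Return the UUID entry from an Oracle VM Manager .config file."""
    return parse_config(Path(path).read_text())["UUID"]


def same_uuid(primary_text, secondary_text):
    """True if both config texts contain the same, non-empty UUID."""
    a = parse_config(primary_text).get("UUID")
    b = parse_config(secondary_text).get("UUID")
    return bool(a) and a == b
```

In practice you would copy the `.config` file from each manager VM and pass both contents to `same_uuid`.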
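The rule of never running both Oracle VM Managers at once (without separate management networks) can be enforced as a pre-start guard. This is a minimal Python sketch; how each manager's running state is obtained is site-specific and is assumed here to be available as a plain dict, and the function `can_start` and the manager names are hypothetical.

```python
"""Guard against running both Oracle VM Managers at the same time.

Assumption: how you obtain each manager's running state (e.g. from your
monitoring tool) is site-specific; here it is passed in as a plain dict
mapping a manager name to True (running) or False (stopped).
"""


def can_start(manager, running, separate_mgmt_networks=False):
    """True if starting `manager` would not create a concurrent pair.

    With separate management networks per site, concurrent managers are
    tolerated; otherwise every other manager must be stopped first.
    """
    others_up = any(up for name, up in running.items() if name != manager)
    return separate_mgmt_networks or not others_up


if __name__ == "__main__":
    state = {"ovmm-primary": True, "ovmm-secondary": False}
    print(can_start("ovmm-secondary", state))  # primary still up
```

Calling such a guard from the start script of the standby manager VM makes the active-passive rule explicit instead of relying on operator discipline.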
The server pools at the primary and secondary sites are independent of each other. For simplified management, you can discover and register both pools (the one at the primary site and the one at the secondary site) in both Oracle VM Managers. Make sure that each storage repository is presented to the correct server pool; the easiest way to achieve this is to present each repository only to the relevant server pool.
If clustering is enabled at the server pool level, note that the pool file system does not need to be replicated; only the actual storage repositories need to be replicated across datacenters. In addition, the LUN WWIDs must be identical across both datacenters.
For further simplification, the Oracle VM Manager repository can be deployed across both datacenters using Data Guard, which removes additional rediscovery work after a switchover.
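Presenting each repository only to its own server pool is easy to audit once the presentations are exported. The following Python sketch assumes the repository-to-pool presentations and the intended "home" pool of each repository are available as plain dicts; the function `misplaced_presentations` and all repository and pool names are hypothetical.

```python
"""Check that each storage repository is presented only to its own site's pool.

Assumption: the repository-to-pool presentations have been exported from
Oracle VM Manager into plain dicts; the names below are hypothetical.
"""


def misplaced_presentations(presented_to, home_pool):
    """Return (repository, pool) pairs presented to a pool other than its home.

    presented_to: repository name -> set of server pool names it is presented to
    home_pool:    repository name -> the one server pool that should see it
    A repository missing from home_pool has all of its presentations flagged.
    """
    problems = []
    for repo, pools in sorted(presented_to.items()):
        for pool in sorted(pools):
            if pool != home_pool.get(repo):
                problems.append((repo, pool))
    return problems


if __name__ == "__main__":
    presented = {
        "fa_repo_primary": {"pool_primary"},
        "fa_repo_secondary": {"pool_primary", "pool_secondary"},
    }
    home = {"fa_repo_primary": "pool_primary", "fa_repo_secondary": "pool_secondary"}
    print(misplaced_presentations(presented, home))
```

An empty list means every repository is visible only to the server pool it belongs to.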
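The identical-WWID requirement can be checked by comparing the WWIDs seen at each site, excluding the non-replicated pool file system LUNs. This minimal Python sketch assumes the WWIDs have already been collected per site into sets of strings (for example, from `multipath -ll` output on each Oracle VM Server); the function `wwid_mismatches` and the sample values are hypothetical.

```python
"""Compare LUN WWIDs between the two datacenters.

Assumption: WWIDs have been collected per site (for example from the
multipath output on each Oracle VM Server) into sets of strings.
Pool file system LUNs are not replicated, so they can be excluded.
"""


def wwid_mismatches(primary, secondary, exclude=()):
    """Return sorted WWIDs present at only one site, ignoring `exclude`."""
    skip = set(exclude)
    p = set(primary) - skip
    s = set(secondary) - skip
    return sorted(p ^ s)  # symmetric difference: seen at one site only


if __name__ == "__main__":
    primary = {"wwid-repo-1", "wwid-repo-2", "wwid-poolfs-a"}
    secondary = {"wwid-repo-1", "wwid-repo-2", "wwid-poolfs-b"}
    print(wwid_mismatches(primary, secondary, exclude={"wwid-poolfs-a", "wwid-poolfs-b"}))
```

Any WWID reported here belongs to a repository LUN that will not be recognized consistently after a switchover.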
Disaster Recovery for On-Premise Fusion Applications (http://www.ateam-oracle.com/disaster-recovery-for-on-premise-fusion-applications/)
Whitepaper: Oracle VM 3: Overview of Disaster Recovery Solutions (http://www.oracle.com/technetwork/server-storage/vm/ovm3-disaster-recovery-1872591.pdf)