ISV Implementation Details - Part 4B – Linux Clustering with Keepalived (VRRP)

June 8, 2020 | 7 minute read
Javier Ramirez
Principal Cloud Solution Architect

This is the fourth blog (Part 4B) in a series about the ISV Architecture Validated Design. It focuses on the Failover Implementation section.

This blog series contains the following topics:

  1. ISV Home Page
  2. ISV Architecture Validated Design 
    • Requirements, Design, Solution
    • Life of a packet
    • High Availability (HA) Concepts
  3. Core Implementation
  4. Failover Implementation – You can choose from the two options below for implementation
  5. Operations
    • How to add a customer to an existing POD
    • How to create a new POD
    • References, key files and commands

 

Overview

This architecture follows the ISV Architecture, but it uses keepalived for the failover implementation instead of Pacemaker and Corosync (Part 4A). Keepalived monitors the interfaces of the virtual routers (VRs), and a scripting tool such as Python or the OCI CLI moves the Virtual IP (VIP) between the two VRs.

Protocols like HSRP (Cisco) or VRRP (IETF) are used to monitor the state of a router cluster. By default the routers use multicast to monitor each other, but multicast is not available in the cloud. Keepalived, which uses the VRRP protocol, has an option to use unicast instead of multicast, which makes it usable in the cloud.

Keepalived is open source software that is packaged with Linux. The purpose of this architecture is to provide an alternate way to deploy the ISV Architecture. You could use this software in a production environment, but keep in mind that it is open source: new code releases or patches might cause this setup to stop working. For ISVs/MSPs providing service to their customers, it is recommended to use commercial virtual routing software from routing vendors like Cisco, Juniper, Fortinet, etc., as long as they support cloud deployments. Such software has additional features to track objects, logging and visibility capabilities, and technical support in case of problems or bugs.

Implementation

The diagram below represents this lab and is used throughout this blog to guide you through the implementation. All VCNs are in the same region, and all subnets are regional.

Please refer to Part 3A - OCI - Routing and Security Lists and Part 3B - OCI vRouter - OCI Shape, OS choices, VNIC and Secondary IP address creation of the ISV architecture before proceeding with the Keepalived configuration.

 

Keepalived

Now that you have the two VRs configured with the proper VNICs, IP addressing, and routing, the next step is to install keepalived. Keepalived will track the interfaces on each VR and will elect one VR as master and the other as backup using VRRP.

  • You need sudo access to install the application
  • Install keepalived on both VRs
yum install -y keepalived
  • Move the default keepalived config to a backup file
mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bkp
  • Now that you have moved the original configuration to a backup file, create a new keepalived.conf file with a text editor such as vi, using the command vi /etc/keepalived/keepalived.conf. With this configuration keepalived tracks all the VNICs in the VR, and if any of them fails, the master role moves to the other VR. Remember to add any additional VNICs you have. The config file below has no configuration for the VIP, because in the cloud the VIP is handled differently: you need scripting tools that interact with the cloud to move the VIP from one VR to the other. When a VR transitions to the master state (for example, after keepalived detects a failure on the other VR), keepalived calls a script to move the VIPs. This is defined by the last statement in the config, "notify_master /root/claim-vips-notify-master.sh"
  • In the config below, priority determines which VR is the master. The VR with the higher priority becomes the master.
  • VRRP is configured on the primary interface only (ens3)
  • unicast_src_ip and unicast_peer are important for VRRP to create the peering relationship.
  • The configuration uses authentication between the two VRs. In the example below I'm using "ISV-test" as the password. Whenever you want to change it, you need to edit the config file on both VRs and restart the service.
  • To learn more, visit the keepalived website
VR1
global_defs {
    enable_script_security
    script_user root
}
#VRRP configuration for the VNIC in the transit subnet
vrrp_instance ISV {
    state MASTER
    interface ens3
    track_interface {
        ens3
        ens5
        ens6
    }
    virtual_router_id 51
    priority 200
    unicast_src_ip 172.20.136.130
    unicast_peer {
        172.20.136.131
    }
    authentication {
        auth_type PASS
        auth_pass ISV-test
    }
    notify_master /root/claim-vips-notify-master.sh
}
VR2
global_defs {
    enable_script_security
    script_user root
}
#VRRP configuration for the VNIC in the transit subnet
vrrp_instance ISV {
    state BACKUP
    interface ens3
    track_interface {
        ens3
        ens5
        ens6
    }
    virtual_router_id 51
    priority 100
    unicast_src_ip 172.20.136.131
    unicast_peer {
        172.20.136.130
    }
    authentication {
        auth_type PASS
        auth_pass ISV-test
    }
    notify_master /root/claim-vips-notify-master.sh
}
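The two files have to mirror each other: same virtual_router_id and password, swapped unicast addresses, and different priorities. One way to compare them is to print only those settings on each VR. The helper below is a minimal sketch and not part of keepalived. The function name summarize_vrrp_conf is hypothetical, and it assumes the one-setting-per-line layout used in the configs above.

```shell
# Print the VRRP settings that must match (or mirror) between VR1 and VR2.
# Argument: path to a keepalived config (defaults to the standard location).
summarize_vrrp_conf() {
    local conf="${1:-/etc/keepalived/keepalived.conf}"
    awk '
        # Single-value settings: print as name=value.
        $1 == "state" || $1 == "virtual_router_id" ||
        $1 == "priority" || $1 == "unicast_src_ip" { print $1 "=" $2 }
        # Peer addresses live inside the unicast_peer { ... } block.
        $1 == "unicast_peer"      { in_peer = 1; next }
        in_peer && $1 == "}"      { in_peer = 0; next }
        in_peer && NF             { print "unicast_peer=" $1 }
    ' "$conf"
}
```

Running summarize_vrrp_conf on both VRs and comparing the output side by side makes a mismatched virtual_router_id or a wrong peer address easy to spot.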
 
  • Start the keepalived service
service keepalived start
  • Check the keepalived status with service keepalived status. When you run this command on both VRs, per the config above VR1 should say "Entering MASTER STATE" while VR2 should say "Entering BACKUP STATE"
  • Make the keepalived service start at boot
chkconfig keepalived on

You can use the commands below to manage keepalived

service keepalived stop
service keepalived start
service keepalived status
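When the status output contains several state transitions, only the last one matters. The sketch below pulls the most recent state out of log text. The function name last_vrrp_state is hypothetical, and it assumes that a FAULT transition is logged in the same "Entering ... STATE" form as the MASTER and BACKUP messages quoted above.

```shell
# Print the most recent VRRP state (MASTER, BACKUP or FAULT) found in
# log text read from standard input. Prints nothing if no state is found.
last_vrrp_state() {
    grep -oE 'Entering (MASTER|BACKUP|FAULT) STATE' |
        tail -n 1 |
        awk '{ print $2 }'
}
```

For example, service keepalived status | last_vrrp_state would print a single word on each VR, provided the status output includes the recent log lines.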

 

Identity

You need to grant access rights to the two VRs so that the scripts can interact with the cloud and move the VIPs. In the next steps you will create the IAM policy using Instance Principals. Instance Principals is an IAM service feature that enables instances to be authorized actors (or principals) that perform actions on service resources. Each compute instance has its own identity, and it authenticates using the certificates that are added to it. These certificates are automatically created, assigned to instances, and rotated, so you do not need to distribute credentials to your hosts or rotate them yourself.

  • Create a Dynamic Group in the Oracle Console. A Dynamic Group can be created by explicitly adding objects or by using tags. In this case, because there are only two objects, it is easier to specify them explicitly.
    • Select Identity from the main menu and select Dynamic Groups
    • Click Create Dynamic Group and add the VRs
    • Use the OCID for each VR
      • VR1 OCID - ocid1.instance.oc1…………pruscpq
      • VR2 OCID - ocid1.instance.oc1……….d4ebx24q
Create Dynamic Group
Name: oci-vip-DG
Matching rules

ANY {instance.id = 'ocid1.instance.oc1…………pruscpq', instance.id = 'ocid1.instance.oc1……….d4ebx24q'}
  • Create a Policy in the Oracle Console
    • Select Identity from the main menu and select Policies
    • Click the Create Policy button
    • The statements in the policy use the Dynamic Group created in the previous step and the compartment name where the VRs are deployed
      • Compartment – ISV for our example
Create Policy
Name: oci-vip-p
Policy statements

Allow dynamic-group oci-vip-DG to use private-ips in compartment ISV
Allow dynamic-group oci-vip-DG to use vnics in compartment ISV
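The matching rule and the policy statements follow a fixed pattern, so they can be generated instead of typed by hand, which helps avoid quoting mistakes. The two functions below are a sketch with hypothetical names (dynamic_group_rule, vip_policy_statements). They only print text to paste into the console and do not call any OCI API.

```shell
# Build a dynamic-group matching rule that lists each instance OCID explicitly.
# Arguments: one or more instance OCIDs.
dynamic_group_rule() {
    local rule="" ocid
    for ocid in "$@"; do
        # Append ", " between entries, but not before the first one.
        rule="${rule:+$rule, }instance.id = '$ocid'"
    done
    printf 'ANY {%s}\n' "$rule"
}

# Print the two policy statements used in this blog.
# Arguments: dynamic group name, compartment name.
vip_policy_statements() {
    local group="$1" compartment="$2"
    printf 'Allow dynamic-group %s to use private-ips in compartment %s\n' "$group" "$compartment"
    printf 'Allow dynamic-group %s to use vnics in compartment %s\n' "$group" "$compartment"
}
```

Calling dynamic_group_rule with the two VR OCIDs, and vip_policy_statements oci-vip-DG ISV, reproduces the rule and statements shown above.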

 

Scripting

As stated above, keepalived can't move the VIPs in the cloud because the VIPs are owned by the cloud, so you need an external application that can interact with the cloud. This can be accomplished with a scripting tool such as Python, the OCI CLI, or some other tool. Keepalived executes claim-vips-notify-master.sh when a VR becomes the master, which happens when it detects a failure on the other VR. You need to create this file based on the scripting tool you use to perform the task.

You might need to collect the OCIDs for the VNICs and VIPs of each VR, depending on the scripting tool used. When a VR becomes the master, it executes the script to claim the VIPs for all the interfaces.

Python

Refer to the OCI Python SDK documentation for examples of how Python can interact with the OCI API.

OCI CLI

You can use the OCI CLI to move the VIPs. It should be installed on both VRs, because each VR executes a command to move the VIPs when it is elected master. The OCI CLI is simple to use. If you decide to use it, there is no need to create a config file as stated in the public documentation, because this setup uses instance principals.

To install the OCI CLI, your VM needs access to the Internet via an Internet gateway or NAT gateway. Open a terminal window and run the following command, accepting the default prompts.

bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"

To test the OCI CLI, run the command below. Note the use of --auth instance_principal at the end of the command, which allows the use of Instance Principals with the OCI API. If it works, the command returns the Object Storage namespace of your tenancy, which for older tenancies is the same as the tenancy name. The OCI CLI uses your primary VNIC (ens3) to reach the OCI API endpoints, so make sure its subnet has a route through an Internet gateway or a NAT gateway and that DNS is enabled.

oci os ns get --auth instance_principal

You can use the command below to move the VIP for each VNIC on the VR.

oci network vnic assign-private-ip --vnic-id <VNIC OCID> --ip-address <VIP IP address> --unassign-if-already-assigned --auth instance_principal

Create a file named claim-vips-notify-master.sh on each VR. Each file contains the commands to claim the VIPs for all the VNICs of that VR. The OCIDs in the commands are the VNIC OCIDs of the VR on which the file resides.

VR1 claim-vips-notify-master.sh
#!/bin/bash
oci network vnic assign-private-ip --vnic-id ocid1.vnic.oc1………diqtaxjs6yoa --ip-address 172.20.136.132 --unassign-if-already-assigned --auth instance_principal

oci network vnic assign-private-ip --vnic-id ocid1.vnic.oc1………ftksweyvhegg --ip-address 10.0.0.4 --unassign-if-already-assigned --auth instance_principal

oci network vnic assign-private-ip --vnic-id ocid1.vnic.oc1………5jdgolrn5fhhg --ip-address 10.0.0.36 --unassign-if-already-assigned --auth instance_principal
VR2 claim-vips-notify-master.sh
#!/bin/bash
oci network vnic assign-private-ip --vnic-id ocid1.vnic.oc1………kd8fnhrudjhfo --ip-address 172.20.136.132 --unassign-if-already-assigned --auth instance_principal

oci network vnic assign-private-ip --vnic-id ocid1.vnic.oc1………jsdnshdi42k94 --ip-address 10.0.0.4 --unassign-if-already-assigned --auth instance_principal

oci network vnic assign-private-ip --vnic-id ocid1.vnic.oc1………ks9d8gjdhdekl4 --ip-address 10.0.0.36 --unassign-if-already-assigned --auth instance_principal
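The scripts above run the three commands unconditionally and leave no trace when one of them fails. A slightly more defensive variant is sketched below. It is not the script used in this lab: the names OCI_BIN, VIP_MAP, log and claim_vips are hypothetical, the VNIC OCIDs are placeholders, and the default path /root/bin/oci assumes the installer's default location for the root user, which you should verify on your VRs.

```shell
#!/bin/bash
# Sketch of /root/claim-vips-notify-master.sh for one VR.

# ASSUMPTION: the CLI installer placed the executable in /root/bin.
# An absolute path avoids depending on the PATH that keepalived provides.
OCI_BIN="${OCI_BIN:-/root/bin/oci}"

# One "VNIC-OCID VIP" pair per line. The OCIDs are placeholders:
# replace them with the VNIC OCIDs of THIS VR.
VIP_MAP='
<VNIC-OCID-1> 172.20.136.132
<VNIC-OCID-2> 10.0.0.4
<VNIC-OCID-3> 10.0.0.36
'

# Send messages to syslog, or to stderr when logger is unavailable.
log() {
    logger -t claim-vips "$*" 2>/dev/null || echo "claim-vips: $*" >&2
}

# Claim every VIP listed in VIP_MAP; returns 1 if any assignment fails.
claim_vips() {
    local vnic_id vip rc=0
    if ! command -v "$OCI_BIN" >/dev/null 2>&1; then
        log "OCI CLI not found at $OCI_BIN"
        return 1
    fi
    while read -r vnic_id vip; do
        [ -n "$vnic_id" ] || continue   # skip blank lines
        # </dev/null keeps the CLI from consuming the rest of VIP_MAP.
        if "$OCI_BIN" network vnic assign-private-ip \
                --vnic-id "$vnic_id" --ip-address "$vip" \
                --unassign-if-already-assigned \
                --auth instance_principal </dev/null >/dev/null; then
            log "claimed $vip"
        else
            log "FAILED to claim $vip"
            rc=1
        fi
    done <<EOF
$VIP_MAP
EOF
    return "$rc"
}

claim_vips || log "one or more VIPs could not be claimed"
```

Whichever version you use, make the file executable (chmod +x /root/claim-vips-notify-master.sh) so that keepalived can run it. The loop continues after a failed assignment, so one bad OCID does not prevent the remaining VIPs from moving.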
 

Testing

  • To test, open your Oracle Console, navigate to the VR1 instance, select VNICs, select the VR1 VNIC, and select IP Addresses in the left menu
  • You should see two IP addresses assigned to this VNIC: the primary IP and the VIP
  • Open two terminal windows, one to VR1 and another to VR2
  • Execute the command service keepalived status
  • VR1 should be in MASTER state and VR2 should be in BACKUP state
  • On VR1, shut down interface ens5 using the command ifconfig ens5 down
  • After a couple of seconds, note how the VIP disappears from the screen in the Oracle Console. If this does not happen, review the script to make sure it is interacting with the OCI API and can move the VIP
  • If you run service keepalived status on both VRs again, VR1 should be in FAULT state and VR2 should be in MASTER state. If this is not the case, review the keepalived configuration to make sure both VRs are configured properly
  • Enable ens5 again using the command ifconfig ens5 up
  • In the console, the VIP should appear again
  • If you run service keepalived status on both VRs again, VR1 should be in MASTER state and VR2 should be in BACKUP state
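The console check in the steps above can also be done from a terminal on either VR. The sketch below asks the OCI API which private IPs are attached to a VNIC and reports whether the VIP is among them. The function name vnic_has_ip and the OCI_BIN variable are hypothetical. It assumes the CLI's private-ip list command with the global --query option, and that the instance principal is allowed to list private IPs under the policy created earlier.

```shell
# Name or path of the OCI CLI executable (override if it is not on PATH).
OCI_BIN="${OCI_BIN:-oci}"

# Succeeds (exit status 0) when the given IP is currently assigned to the
# given VNIC. Arguments: VNIC OCID, IP address.
vnic_has_ip() {
    local vnic_id="$1" ip="$2"
    # The query returns a JSON array of quoted addresses; matching the
    # quotes as well prevents 10.0.0.4 from matching 10.0.0.40.
    "$OCI_BIN" network private-ip list --vnic-id "$vnic_id" \
            --auth instance_principal \
            --query 'data[]."ip-address"' </dev/null |
        grep -qF "\"$ip\""
}
```

For example, vnic_has_ip "<VNIC OCID>" 172.20.136.132 && echo "VIP is here" run before and after ifconfig ens5 down should show the VIP leaving the VR1 VNIC and arriving on the VR2 VNIC.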

 

Next

You have successfully created the failover mechanism between the two VRs. Next, check the upcoming blog about Operations, which will be Part 5 of this set of blogs for the ISV Architecture.

 
