Cisco CSR1000v High Availability (HA) Deployment in OCI

July 1, 2020 | 23 minute read
Javier Ramirez
Principal Cloud Solution Architect

Overview

This blog provides guidance on deploying two Cisco CSR1000v routers in a High Availability (HA) configuration in Oracle Cloud Infrastructure (OCI).

Warning: At the time of this writing, Cisco does not natively offer a service or solution to create a true HA relationship between two Cisco CSR1000v routers. This blog demonstrates a methodology for customers to build an HA deployment on OCI, but the customer is responsible for support. This is not an Oracle-supported or Cisco-supported configuration. Also note that Cisco CSR1000v is not available in Oracle Marketplace.

This document guides you through the components and tools used to monitor the health of the two routers and, based on conditions, to use the OCI Command Line Interface (CLI) to update the route tables assigned to the subnets within a Virtual Cloud Network (VCN).

This document assumes you have extensive knowledge of Cisco IOS software and of the protocols (EIGRP, HSRP, VRRP, GRE) and commands (ping, ip sla, track, event) used in this document.

Important: When instantiating the two virtual routers, make sure the primary VNIC is in a subnet with access to the Internet, because this interface is used to reach the OCI API.

The Cisco CSR 1000v is a virtual version of the Cisco CSR1000 appliance. The software can be installed in a Virtual Machine (VM). Cisco CSR1000v is currently not in Oracle Marketplace, so you need to get an image from Cisco and read the prerequisites and hardware requirements to install it. Choose the proper Oracle VM shape based on the performance requirements and the number of interfaces needed for the solution.

HA Overview

For this blog the objective is for Green-VM to be able to communicate with Blue-VM. In a traditional deployment using HSRP or VRRP, a Virtual IP (VIP) is created between the two routers and the route table for the hub subnet points to the VIP. HSRP or VRRP monitors the health of the routers and moves the VIP to the active router when it detects problems, as depicted in the diagram below.

You cannot use HSRP or VRRP between the two routers because those protocols use multicast to communicate between the members of the group, and multicast is NOT supported in the cloud. For this reason, a process is required to monitor the health of the two routers and switch between two route tables in OCI: use Route-CSR1 when CSR1 is healthy and Route-CSR2 when CSR1 has failed. One route table is assigned to the hub subnet at all times; which one depends on the health of the routers, as depicted in the diagram below. This blog describes two options for HA: a) GRE Tunnel and b) IP SLA.

As mentioned at the beginning of this document, Cisco does not offer a true HA solution for deploying Cisco CSR1000v in OCI. What can be done is to monitor (track) key elements of the virtual routers, such as interfaces, and, based on an event, trigger a script that makes changes in the cloud. There are three parts to this process:

  1. Monitoring – As stated above, there are two options to monitor elements of the virtual routers. They are explained in detail in the following sections.
  2. Triggering – Cisco Embedded Event Manager (EEM) executes a script that interacts with Oracle Cloud, based on an event from the monitoring process.
  3. Scripting – Several tools (the guest shell, the OCI CLI, and instance principals) are used to call the OCI API and change the route table assigned to the subnet within the VCN so that CSR1 or CSR2 is the next hop.

Prerequisites

Guest Shell

The Cisco CSR1000v has a built-in Linux container, called the guest shell, that runs the CentOS 7 operating system. It can be used to run Linux-based applications. For more information, see the Cisco documentation. The OCI CLI will be installed in this container to make calls to the OCI API.

The diagram below represents how the guest shell interacts with the router and the rest of the network. On the router side there is a VirtualPortGroup interface, which is the default gateway for the container. Within the container there is an Ethernet interface. Keep in mind that the IP addresses of the VirtualPortGroup and the container are local to the CSR; the cloud and the outside world have no knowledge of these IPs. When the container needs to talk to something outside the router, its traffic is NATed to the IP address of the primary interface of the router (GigabitEthernet1).

Enable the guest shell

Once the two virtual routers are instantiated within OCI, the next step is to configure and check the guest shell. This blog was validated with Cisco IOS XE Software, Version 16.09.04. It should work in future releases, but it is subject to change by Cisco.

Note: These tasks are performed on both routers.

1. Log into the router

2. Check if the IOx subsystem is running. Depending on the image you get from Cisco, the guest shell might already be preconfigured but not active.

CSR2#show iox-service 

IOx Infrastructure Summary:
---------------------------
IOx service (CAF)    : Not Running 
IOx service (HA)     : Not Running 
IOx service (IOxman) : Not Running 
Libvirtd             : Running

3. Start IOX if not running

CSR2#conf t
CSR2(config)#iox

4. Check again to see if the service is running

CSR2#show iox-service 

IOx Infrastructure Summary:
---------------------------
IOx service (CAF)    : Running 
IOx service (HA)     : Not Running 
IOx service (IOxman) : Running 
Libvirtd             : Running 

CSR2#

5. Before executing the commands below, check whether they are already in your configuration. If not, proceed to configure the guest shell components:

a. Update GigabitEthernet1 interface configuration

CSR2#conf t
CSR2(config)#interface GigabitEthernet1
CSR2(config-if)#ip nat outside

b. Create the Virtual Port Group 0

CSR2(config)#interface VirtualPortGroup0
CSR2(config-if)#ip address 10.1.1.1 255.255.255.0
CSR2(config-if)#ip nat inside
CSR2(config-if)#no mop enabled
CSR2(config-if)#no mop sysid

c. Configure the NAT process

CSR2(config)#ip access-list standard GS_NAT_ACL
CSR2(config-std-nacl)#permit 10.1.1.0 0.0.0.255
CSR2(config)#ip nat inside source list GS_NAT_ACL interface GigabitEthernet1 overload

d. Tie the guest shell interface to the virtual port group

CSR2(config)#app-hosting appid guestshell
CSR2(config-app-hosting)# app-vnic gateway0 virtualportgroup 0 guest-interface 0
CSR2(config-app-hosting)# guest-ipaddress 10.1.1.2 netmask 255.255.255.0
CSR2(config-app-hosting)# name-server0 208.67.222.222

e. Enable the guest shell. Note that this is done from privileged EXEC mode, not from configuration mode. The command takes a couple of minutes to complete, so be patient.

CSR2#guestshell enable

6. Switch to the guest shell. This is a Linux container where you can run Linux commands. To exit back to the router environment use the exit command

CSR2#guestshell run bash

Instance Principals

Instance principals allow a VM to make API calls to OCI without a stored user credential or configuration file. To enable instance principals, first create a Dynamic Group and then a policy. This is done from the Oracle Console. To complete this section you need the OCIDs for both CSRs.

1. Log into the Oracle Console

2. Select the hamburger menu on the top left corner, select Identity, select Dynamic Groups

3. Click Create Dynamic Group and create a dynamic group called CSR-DG and enter the information as shown on the picture below. For this step you need to collect the OCIDs for both of your routers.

CSR1 - ocid1.instance.oc1..........37ds2zt62kmcwmzvukv32ljq
CSR2 - ocid1.instance.oc1..........cgrf5bt67miptn2tny2gd4y6g55a

Matching Rule
ANY {instance.id = 'ocid1.instance.oc1..........37ds2zt62kmcwmzvukv32ljq',
instance.id = 'ocid1.instance.oc1..........cgrf5bt67miptn2tny2gd4y6g55a'}
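
The matching rule is easy to break with a stray curly quote or a missing comma. As a minimal sketch, the helper below assembles the rule text from a list of instance OCIDs so it can be pasted into the console. The function name build_matching_rule and the placeholder OCIDs are assumptions made for illustration; they are not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ANY {...} matching rule for the given instance OCIDs.
build_matching_rule() {
  local rule="ANY {" sep="" ocid
  for ocid in "$@"; do
    # Straight single quotes only; curly quotes would not be valid in the rule.
    rule+="${sep}instance.id = '${ocid}'"
    sep=", "
  done
  printf '%s}\n' "$rule"
}

# Placeholder OCIDs; substitute the real OCIDs of CSR1 and CSR2.
build_matching_rule "ocid1.instance.oc1..csr1-placeholder" "ocid1.instance.oc1..csr2-placeholder"
```

The script only prints text; creating the dynamic group is still done in the Oracle Console as described in this section.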

4. Click Identity on the top left corner

5. Select Policies

6. Click Create Policy, create a policy called CSRs-Policy, and enter the statements below. The compartment is the specific compartment where the subnet and the route tables reside.

allow dynamic-group CSR-DG to manage subnets in compartment <compartment name>
allow dynamic-group CSR-DG to manage route-tables in compartment <compartment name>

7. Move to the next section

OCI CLI Installation

Now that the guest shell is enabled on the router, the next step is to install the OCI CLI tool. This tool will allow the guest shell to interface with the Oracle Cloud via the OCI API using the instance principal configured above.

Note: These tasks are performed on both routers

 

1. Install OCI CLI

a. Go to the Oracle CLI public documentation, select Quickstart, copy the link for Linux

b. Log into the router and enter the guest shell

c. Paste the link copied previously and install OCI CLI. Accept all the defaults

If Python 2.7 is already installed, the OCI CLI still uses it, and the guest shell already has Python v2.7.5 installed. If no Python is installed, the installation process installs version 3.x. During the installation you may see an error message about the Python version, which can be safely ignored.

d. Restart the shell following the instructions at the end of the OCI CLI installation process

2. Do not follow the rest of the configuration for OCI CLI. There is NO need for a config file for OCI CLI as this process will use instance principals for authentication.

3. Test the OCI CLI by running the command below from the guest shell. For the test to succeed, GigabitEthernet1 must be able to reach the Internet, and every security list in the path must allow this traffic. You might need to open security rules, update route tables, and create an Internet Gateway or NAT Gateway within the VCN via the Oracle Console.

[guestshell@guestshell ~]$ oci os ns get --auth instance_principal

4. If your configuration is done properly you will get a message similar to this where you can see your tenancy name

/home/guestshell/lib/oracle-cli/lib/python2.7/site-packages/cryptography/
hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: 
Support for your Python version is deprecated. The next version of 
cryptography will remove support. Please upgrade to a release 
(2.7.7+) that supports hmac.compare_digest as soon as possible.
  utils.PersistentlyDeprecated2018,
{
  "data": "your-tenancy-name"
}
[guestshell@guestshell ~]$

5. In the home directory (/home/guestshell) create a file called test-oci-cli and enter the command above. This way, any time you want to test the OCI CLI you can run this file instead of remembering the whole command.

[guestshell@guestshell ~]$ vi test-oci-cli
oci os ns get --auth instance_principal

HA Implementation Process

Scripting

As mentioned in the HA Overview section, there are three parts to the process (Monitoring, Triggering, and Scripting). Let's define the scripting mechanism first, because it is the same for both options presented in this document and it is useful to have it configured to validate that the OCI CLI and instance principals are working properly. To perform this task you need to collect the OCIDs for the hub subnet (green) and the two route tables (Route-CSR1 and Route-CSR2). The diagram below shows the content of the two route tables.

Note: These tasks are performed on both routers.

1. Log into the Oracle Console

2. Create the route tables Route-CSR1 and Route-CSR2. Note that the destination is the same in both; only the target is different. Route-CSR1 uses router CSR1 as the gateway and Route-CSR2 uses router CSR2 as the gateway.

3. Copy the OCIDs for these two route tables and for the hub subnet, as they are required to build the script.

4. Log into your router

5. Switch to the guest shell

guestshell run bash

6. In the home directory (/home/guestshell) create two files called Route-CSR1 and Route-CSR2. Each file contains the OCI CLI command that assigns a route table to the hub subnet (green) so traffic from this subnet can reach the Spoke VCN via the proper CSR. Each file holds a single command; it is wrapped across multiple lines below only for formatting. Create both files on each router.

Hub Subnet - ocid1.subnet.oc1…….5t4waaxugcbu7amfkbenoa
Route-CSR1 - ocid1.routetable.oc1………pg6onzg4zevkqhh3dqlbl4r4a
Route-CSR2 - ocid1.routetable.oc1………6g3yhhcynvo4att4yewoetsimbtha

The actual command is shown below, where <subnet OCID> is the hub subnet and <route OCID> is the route table to be assigned. The trailing backslash continues the command on the next line.

oci network subnet update --subnet-id <subnet OCID> \
--route-table-id <route OCID> --auth instance_principal

[guestshell@guestshell ~]$ vi Route-CSR1
oci network subnet update --subnet-id ocid1.subnet.oc1…….5t4waaxugcbu7amfkbenoa \
--route-table-id ocid1.routetable.oc1………pg6onzg4zevkqhh3dqlbl4r4a --auth instance_principal

[guestshell@guestshell ~]$ vi Route-CSR2
oci network subnet update --subnet-id ocid1.subnet.oc1…….5t4waaxugcbu7amfkbenoa \
--route-table-id ocid1.routetable.oc1………6g3yhhcynvo4att4yewoetsimbtha --auth instance_principal
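
The two files differ only in the route table OCID, so one parameterized script could replace both. The following is a minimal sketch under that assumption, not the method used in this blog. The names route_table_ocid, build_update_cmd, HUB_SUBNET_OCID, and APPLY, as well as all OCIDs, are hypothetical placeholders. It uses only the OCI CLI options already shown above and defaults to a dry run that prints the command.

```shell
#!/usr/bin/env bash
# Sketch: one script for both route tables. All OCIDs below are placeholders.
HUB_SUBNET_OCID="ocid1.subnet.oc1..hub-placeholder"

# Map a short router name to its route table OCID (hypothetical values).
route_table_ocid() {
  case "$1" in
    csr1) echo "ocid1.routetable.oc1..csr1-placeholder" ;;
    csr2) echo "ocid1.routetable.oc1..csr2-placeholder" ;;
    *)    return 1 ;;
  esac
}

# Print the OCI CLI command that assigns the chosen route table to the hub subnet.
build_update_cmd() {
  local rt
  rt="$(route_table_ocid "$1")" || { echo "unknown router: $1" >&2; return 1; }
  echo "oci network subnet update --subnet-id ${HUB_SUBNET_OCID} --route-table-id ${rt} --auth instance_principal"
}

# Dry run by default: only show the command. Set APPLY=1 to execute it.
if [ -n "${1:-}" ]; then
  cmd="$(build_update_cmd "$1")" || exit 1
  if [ "${APPLY:-0}" = "1" ]; then $cmd; else echo "$cmd"; fi
fi
```

Saved under a hypothetical file name such as set-route, running bash set-route csr2 prints the command, and APPLY=1 bash set-route csr2 runs it. The rest of this blog keeps the two separate files, Route-CSR1 and Route-CSR2.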

7. Test the OCI CLI to make sure it is still working by running the command below. You should see your tenancy name, as in the test performed when the OCI CLI was installed.

bash test-oci-cli

8. To test the two new files and make sure the route table is assigned properly, log into the Oracle Console, select Networking from the hamburger menu, select Virtual Cloud Networks, select the proper VCN, and select the subnet where the route tables will be applied, in this case the hub subnet. At this point Route-CSR1 is assigned to the hub subnet.

Run the Route-CSR2 script from the guest shell

[guestshell@guestshell ~]$ bash Route-CSR2
/home/guestshell/lib/oracle-cli/lib/python2.7/site-packages/cryptography
/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: 
Support for your Python version is deprecated. The next version of 
cryptography will remove support. Please upgrade to a release 
(2.7.7+) that supports hmac.compare_digest as soon as possible.
  utils.PersistentlyDeprecated2018,
{
  "data": {
    "availability-domain": null, 
    "cidr-block": "10.228.208.0/28", 
    "compartment-id": "ocid1.compartment.oc1……….x4f4ey6mgrdrgynvzzq", 
    "defined-tags": {}, 
    "dhcp-options-id": "ocid1.dhcpoptions.oc1……….xev6afdb33ijut7eccua", 
    "display-name": "Hub", 
    "dns-label": "f...", 
    "freeform-tags": {}, 
    "id": "ocid1.subnet.oc1…….5t4waaxugcbu7amfkbenoa", 
    "ipv6-cidr-block": null, 
    "ipv6-public-cidr-block": null, 
    "ipv6-virtual-router-ip": null, 
    "lifecycle-state": "AVAILABLE", 
    "prohibit-public-ip-on-vnic": false, 
    "route-table-id": "ocid1.routetable.oc1………6g3yhhcynvo4att4yewoetsimbtha", 
    "security-list-ids": [
      "ocid1.securitylist.oc1………..ewpv5ts3ldw323cx6x7alm62q"
    ], 
    "subnet-domain-name": "…….oraclevcn.com", 
    "time-created": "2020-04-17T15:23:22.351000+00:00", 
    "vcn-id": "ocid1.vcn.oc1……….4frktwgtgq2jxsba7iccqcojp6psvqkq", 
    "virtual-router-ip": "10.228.208.1", 
    "virtual-router-mac": "00:00:17:71:B7:F7"
  }, 
  "etag": "6d029fef"
}
[guestshell@guestshell ~]$
Refresh the Oracle Console; Route-CSR2 is now assigned to the hub subnet.

Run Route-CSR1 script from the guest shell to test and make sure both commands work properly

[guestshell@guestshell ~]$ bash Route-CSR1
/home/guestshell/lib/oracle-cli/lib/python2.7/site-packages/cryptography
/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: 
Support for your Python version is deprecated. The next version of 
cryptography will remove support. Please upgrade to a release 
(2.7.7+) that supports hmac.compare_digest as soon as possible.
  utils.PersistentlyDeprecated2018,
{
  "data": {
    "availability-domain": null, 
    "cidr-block": "10.228.208.0/28", 
    "compartment-id": "ocid1.compartment.oc1……….f4ey6mgrdrgynvzzq", 
    "defined-tags": {}, 
    "dhcp-options-id": "ocid1.dhcpoptions.oc1………….vqxev6afdb33ijut7eccua", 
    "display-name": "Hub", 
    "dns-label": "f...", 
    "freeform-tags": {}, 
    "id": "ocid1.subnet.oc1…….5t4waaxugcbu7amfkbenoa", 
    "ipv6-cidr-block": null, 
    "ipv6-public-cidr-block": null, 
    "ipv6-virtual-router-ip": null, 
    "lifecycle-state": "AVAILABLE", 
    "prohibit-public-ip-on-vnic": false, 
    "route-table-id": "ocid1.routetable.oc1………pg6onzg4zevkqhh3dqlbl4r4a", 
    "security-list-ids": [
      "ocid1.securitylist.oc1………pv5ts3ldw323cx6x7alm62q"
    ], 
    "subnet-domain-name": "…….oraclevcn.com", 
    "time-created": "2020-04-17T15:23:22.351000+00:00", 
    "vcn-id": "ocid1.vcn.oc1………ktwgtgq2jxsba7iccqcojp6psvqkq", 
    "virtual-router-ip": "10.228.208.1", 
    "virtual-router-mac": "00:00:17:71:B7:F7"
  }, 
  "etag": "50923f1c"
}
[guestshell@guestshell ~]$
Refresh the Oracle Console; the route table assigned is now Route-CSR1.

9. If you are not able to change the route table assigned to the hub subnet as shown above, check the following:

  • Check the instance principal configuration to make sure the correct OCIDs for the routers are entered in the Dynamic Group
  • Check that the policy has the proper statements and names the compartment that contains the hub subnet and route tables
  • Check the OCIDs in the Route-CSR1 and Route-CSR2 files and that the commands are entered correctly
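
A common mistake behind the last check is pasting an OCID of the wrong resource type, for example a route table OCID where the subnet OCID belongs. The sketch below tests only the type prefix of an OCID. The function name check_ocid and the placeholder values are assumptions for illustration, and a passing check does not prove that the resource exists or that the policy allows access.

```shell
#!/usr/bin/env bash
# Sketch: verify that an OCID starts with the expected resource-type prefix.
check_ocid() {
  local expected_type="$1" ocid="$2"
  case "$ocid" in
    "ocid1.${expected_type}."*) return 0 ;;
    *) echo "expected an ocid1.${expected_type} OCID, got: ${ocid}" >&2; return 1 ;;
  esac
}

# Placeholder values: the second call fails because a route table OCID
# was supplied where a subnet OCID is expected.
check_ocid routetable "ocid1.routetable.oc1..placeholder" && echo "route table OCID looks plausible"
check_ocid subnet "ocid1.routetable.oc1..placeholder" || echo "subnet OCID looks wrong"
```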

HA Options

Warning: This is not a true HA solution, because you could run into situations where the two routers are not 100% healthy and, depending on the outcome of the monitoring, triggering, and scripting process, traffic might be redirected to an unhealthy router.

Now that everything is in place, let's move on to the two available options. This section describes the monitoring and triggering process for each option.

Option 1 - GRE Tunnel

Cisco has a couple of documents (see the reference section) on deploying this in a cloud environment. The solution is to create a GRE tunnel between the two routers and run EIGRP across the tunnel with BFD enabled for fast convergence. The logic for this option is that when the tunnel goes down, a message is generated in the log, and EEM uses that message to trigger the route table change via the scripts previously created.

From the tests performed, this option is not ideal because it only monitors the tunnel between the two routers. If any other interface on a router goes down, this solution will not detect it, resulting in an unhealthy router still being selected as the master. Before proceeding with the configuration for this option, review the second option to see if it is a better fit for your solution.

1. Create a GRE tunnel. The tunnel destination is the IP address of GigabitEthernet1 of the other router

CSR1
interface Tunnel1
 ip address 192.168.101.1 255.255.255.252
 bfd interval 500 min_rx 500 multiplier 3
 tunnel source GigabitEthernet1
 tunnel destination 10.228.208.58

CSR2
interface Tunnel1
 ip address 192.168.101.2 255.255.255.252
 bfd interval 500 min_rx 500 multiplier 3
 tunnel source GigabitEthernet1
 tunnel destination 10.228.208.55
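
The bfd interval line controls how quickly a dead tunnel is noticed. As a rough worked example, assuming both routers keep the 500 ms values shown above and ignoring processing delays, the detection time is approximately the interval multiplied by the multiplier. The function name bfd_detect_ms is a hypothetical helper for the arithmetic only.

```shell
#!/usr/bin/env bash
# Rough BFD detection-time estimate in milliseconds: interval x multiplier.
bfd_detect_ms() {
  local interval_ms="$1" multiplier="$2"
  echo $(( interval_ms * multiplier ))
}

bfd_detect_ms 500 3   # prints 1500
```

With these values a tunnel failure should be detected in roughly 1.5 seconds; the time for EEM to run the script and for OCI to apply the route table change comes on top of that.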

2. Check your tunnel is up

CSR2#ping 192.168.101.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.101.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
CSR2#

CSR2#show ip int br
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet1       10.228.208.58   YES DHCP   up                    up      
GigabitEthernet2       10.228.208.8    YES manual up                    up      
Tunnel1                192.168.101.2   YES manual up                    up      
VirtualPortGroup0      10.1.1.1        YES NVRAM  up                    up      
CSR2#

3. Enable EIGRP and BFD over the tunnel interface on both routers

router eigrp 1
 bfd all-interfaces
 network 192.168.101.0 0.0.0.3

4. Check BFD is working. You should be able to see the neighbor through the tunnel

CSR1#show bfd neighbors 
IPv4 Sessions
NeighAddr                              LD/RD         RH/RS     State     Int
192.168.101.2                        4097/4097       Up        Up        Tu1
CSR1#

CSR2#show bfd neighbors 
IPv4 Sessions
NeighAddr                              LD/RD         RH/RS     State     Int
192.168.101.1                        4097/4097       Up        Up        Tu1
CSR2#

5. Configure EEM to look for a message – In this option EEM is the engine that performs the monitoring, as it looks for a particular message in the log. As a test, shut down interface Tunnel1 on CSR1 and check the logs on CSR2. This is the first message that appears:

*Jun 15 18:04:19.810: %BFDFSM-6-BFD_SESS_DOWN: BFD-SYSLOG: BFD session ld:4097 
handle:1,is going Down Reason: ECHO FAILURE

6. Configure EEM on each router to execute the script (triggering) when the BFD message above appears in the log. Action 2.0 in the config below executes the proper script to update the route table associated to the hub subnet

CSR1
event manager applet OCI-CLI
event syslog pattern "BFDFSM-6-BFD_SESS_DOWN"
action 1.0 cli command "enable"
action 2.0 cli command "guestshell run bash Route-CSR1"
action 3.0 cli command "exit"
action 4.0 cli command "end"
end

CSR2
event manager applet OCI-CLI
event syslog pattern "BFDFSM-6-BFD_SESS_DOWN"
action 1.0 cli command "enable"
action 2.0 cli command "guestshell run bash Route-CSR2"
action 3.0 cli command "exit"
action 4.0 cli command "end"
end

7. For testing, start with CSR1 as the active path. Log into the Oracle Console and check the route table assigned to the hub subnet; it should be Route-CSR1. If it is not, assign Route-CSR1 to it to match this blog.

8. For testing, shut down interface Tunnel1 on CSR1. CSR2 should detect the BFD session going down, execute the script, and change the route table to Route-CSR2.

!BFD neighbor is  down

CSR1#show bfd neighbors 
CSR1#

CSR2#show log
*Jun 15 18:31:24.590: %BFDFSM-6-BFD_SESS_DOWN: BFD-SYSLOG: BFD session ld:4097 handle:
 1,is going Down Reason: ECHO FAILURE
*Jun 15 18:31:24.591: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed,  ld:4097 
 neigh proc:EIGRP, handle:1 act
*Jun 15 18:31:24.592: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 192.168.101.1 (Tunnel1)
 is down: BFD peer down notified
CSR2#

!BFD neighbor is  down
CSR2#show bfd neighbors 
CSR2#

9. Once interface Tunnel1 is back online, routing does not move back to CSR1. There is no preempt option in this solution. You can move it back manually from either CSR by running the command below, but first analyze whether there is any advantage in moving the route back to CSR1.

guestshell run bash Route-CSR1

10. A drawback of this option is that if another interface on the active router goes down, the other router is not aware of it and will not move the route. For example, if you shut down GigabitEthernet2 on CSR1, CSR2 does not see any messages, because nothing is monitoring the other interfaces on the neighbor router. This solution works only if interface Tunnel1 goes down or GigabitEthernet1 goes down, because the tunnel is built over GigabitEthernet1. You could also trigger the script on the message “LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2”, but then the script is executed from the router where the failure occurred, and you would have to monitor all the interfaces. If that is the case, then what is the point of monitoring the tunnel?

Option 2 - IP SLA

This option uses IP SLA to ping the interfaces of one router from the other router. If any of the pings fail, the router performing the pings executes the script to update the route table. The pings are performed from both routers, as each router monitors the other one. The track command is used to monitor the state of IP SLA, and EEM is used to watch the state of the track statements and make the call to the script.

As with the previous option, this is not a perfect solution, as there might be cases where a router's own problems cause its IP SLA probe to fail. For this reason this option also tracks the local interface, to make sure the failure is not caused by the local interface going down.

Note: The script is executed only when the IP SLA probe to the remote router is down and the local interface is still up.

 

1. Configure IP SLA and tracking – This is the monitoring part of this solution. The objective is to track several elements. For each interface, configure an IP SLA operation that pings the IP address of the corresponding interface on the other router, then use the track command to check the status of that IP SLA operation. This is done for every interface on the router and is the first element tracked. Next, track the status of the interface generating the ping, to make sure the problem is not caused by the local router. This is the second element tracked. It is done on all the interfaces except the primary interface (GigabitEthernet1), because if the primary interface is down the router cannot communicate with the OCI API, so there is no point in tracking it locally. The configuration uses standard timers and settings; you can modify them based on your requirements.

CSR1
!This is the IP SLA and track for the red interface (primary).

ip sla 2
 icmp-echo 10.228.208.58 source-interface GigabitEthernet1
 frequency 10
ip sla schedule 2 life forever start-time now
!
track 2 ip sla 2 reachability
 delay down 8 up 10

CSR2
!This is the IP SLA and track for the red interface (primary).

ip sla 2
 icmp-echo 10.228.208.55 source-interface GigabitEthernet1
 frequency 10
ip sla schedule 2 life forever start-time now
!
track 2 ip sla 2 reachability
 delay down 8 up 10

For the other interfaces, as mentioned above, two elements are tracked. These two elements are joined under track 4, which is a Boolean list. The concept of this list is somewhat confusing; see the reference section at the end of the blog for the options and results of the Boolean list.

CSR1
! This is the IP SLA and track to ping GigabitEthernet2 (green interface) on CSR2

ip sla 1
 icmp-echo 10.228.208.8 source-interface GigabitEthernet2
 frequency 10
ip sla schedule 1 life forever start-time now

track 1 ip sla 1 reachability
 delay down 8 up 10
!
! This tracks the source interface for IP SLA 1 which is GigabitEthernet2

track 3 interface GigabitEthernet2 ip routing
!
! This track joins track 1 and track 3 so only when track 1 is down and track 3 is still up the script is called

track 4 list boolean and
 object 1 not
 object 3

CSR2
! This is the IP SLA and track to ping GigabitEthernet2 (green interface) on CSR1

ip sla 1
 icmp-echo 10.228.208.3 source-interface GigabitEthernet2
 frequency 10
ip sla schedule 1 life forever start-time now

track 1 ip sla 1 reachability
 delay down 8 up 10
!
! This tracks the source interface for IP SLA 1 which is GigabitEthernet2

track 3 interface GigabitEthernet2 ip routing
!
! This track joins track 1 and track 3 so only when track 1 is down and track 3 is still up the script is called

track 4 list boolean and
 object 1 not
 object 3
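
Because the Boolean list is easy to misread, it can help to evaluate it by hand. The sketch below models a two-object "track list boolean and" in bash: each object contributes true when it is Up, the optional "not" inverts it, and the list is Up only when both terms are true. The function names term and track_and are hypothetical; this is a model of the logic for reasoning about it, not something that runs on the router.

```shell
#!/usr/bin/env bash
# Sketch: evaluate a two-object "track list boolean and" expression.

# Echo 1 if the (optionally negated) object counts as true, otherwise 0.
term() {
  local state="$1" negate="$2" value=0
  if [ "$state" = "Up" ]; then value=1; fi
  if [ "$negate" = "not" ]; then value=$(( 1 - value )); fi
  echo "$value"
}

# Arguments: object 1 state, "not" or "" for object 1,
#            object 3 state, "not" or "" for object 3.
track_and() {
  local a b
  a="$(term "$1" "$2")"
  b="$(term "$3" "$4")"
  if [ "$a" = 1 ] && [ "$b" = 1 ]; then echo Up; else echo Down; fi
}

# Expression used above: "object 1 not" AND "object 3".
track_and Up   not Up ""   # normal state, prints Down
track_and Down not Up ""   # remote ping failed, local interface up, prints Up
```

With the configuration above, track 4 is therefore Up only when the ping to the other router has failed while the local GigabitEthernet2 is still up.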

2. Configure EEM to execute the script if the IP SLAs fail. This is the triggering mechanism in the solution. Action 2.0 executes the script via the guest shell.

Note that each interface has its own route tables. This lab focuses on GigabitEthernet2, and the route tables were created for it. In a real deployment you also need two route tables for GigabitEthernet1. The applets below for the main interface call a script (G1-Route-CSR1 or G1-Route-CSR2) only for the purpose of showing the process.

CSR1
!EEM for the main interface

event manager applet PING-DOWN-Main
event track 2 state down
action 1.0 cli command "enable"
action 2.0 cli command "guestshell run bash G1-Route-CSR1"
action 3.0 cli command "exit"
action 4.0 cli command "end"

!EEM for track 4 for GigabitEthernet2

event manager applet PING-DOWN
 event track 4 state up
 action 1.0 cli command "enable"
 action 2.0 cli command "guestshell run bash Route-CSR1"
 action 3.0 cli command "exit"
 action 4.0 cli command "end"

CSR2
!EEM for the main interface

event manager applet PING-DOWN-Main
event track 2 state down
action 1.0 cli command "enable"
action 2.0 cli command "guestshell run bash G1-Route-CSR2"
action 3.0 cli command "exit"
action 4.0 cli command "end"

!EEM for track 4 for GigabitEthernet2

event manager applet PING-DOWN
 event track 4 state up
 action 1.0 cli command "enable"
 action 2.0 cli command "guestshell run bash Route-CSR2"
 action 3.0 cli command "exit"
 action 4.0 cli command "end"

3. For testing, start with Route-CSR2 assigned to the hub subnet. Shut down interface GigabitEthernet2 on CSR2. IP SLA 1 on CSR1 fails while GigabitEthernet2 on CSR1 is still up, which triggers EEM to change the route table to Route-CSR1.

!Before the test

CSR1#show track
Track 1
  IP SLA 1 reachability
  Reachability is Up
    62 changes, last change 00:07:20
  Delay up 10 secs, down 8 secs
  Latest operation return code: OK
  Latest RTT (millisecs) 1
  Tracked by:
    Track List 4
Track 3
  Interface GigabitEthernet2 ip routing
  IP routing is Up
    11 changes, last change 00:07:46
  Tracked by:
    Track List 4
Track 4
  List boolean and
  Boolean AND is Down -> Default state
    7 changes, last change 00:07:19
    object 1 not Up
    object 3 Up
  Tracked by:
    EEM applet PING-DOWN 
CSR1#

CSR2#conf t
CSR2(config)#int gi2
CSR2(config-if)#shut
CSR2(config-if)#

CSR1#show track
Track 1
  IP SLA 1 reachability
  Reachability is Down
    63 changes, last change 00:00:07
  Delay up 10 secs, down 8 secs
  Latest operation return code: Timeout
  Tracked by:
    Track List 4
Track 3
  Interface GigabitEthernet2 ip routing
  IP routing is Up
    11 changes, last change 00:08:46
  Tracked by:
    Track List 4
Track 4
  List boolean and
  Boolean AND is Up -> With the up state will trigger the script
    8 changes, last change 00:00:06
    object 1 not Down
    object 3 Up
  Tracked by:
    EEM applet PING-DOWN 
CSR1#

!These are messages that appear in the log when the state of the track function changes

*Jun 16 23:32:26.904: %TRACK-6-STATE: 1 ip sla 1 reachability Up -> Down
*Jun 16 23:32:27.136: %TRACK-6-STATE: 4 list boolean and Down -> Up

4. The route is successfully moved to CSR1.

5. Note: When GigabitEthernet2 on CSR2 comes back online, CSR2 executes the script because track 4 goes to the up state. These are the messages seen on CSR2. This means the route moves back to Route-CSR2. The effect is similar to a preempt option, but it is a side effect of the tracking algorithm, not by design.

*Jun 16 23:30:19.902: %TRACK-6-STATE: 3 interface Gi2 ip routing Up -> Down
*Jun 16 23:30:21.900: %LINK-5-CHANGED: Interface GigabitEthernet2, changed state to 
 administratively down
*Jun 16 23:30:22.900: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2, changed 
 state to down
*Jun 16 23:30:40.367: %TRACK-6-STATE: 1 ip sla 1 reachability Up -> Down
*Jun 16 23:32:26.629: %TRACK-6-STATE: 3 interface Gi2 ip routing Down -> Up
*Jun 16 23:32:27.201: %TRACK-6-STATE: 4 list boolean and Down -> Up
*Jun 16 23:32:28.056: %LINK-3-UPDOWN: Interface GigabitEthernet2, changed state to up
*Jun 16 23:32:29.057: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2, changed 
 state to up
*Jun 16 23:32:52.370: %TRACK-6-STATE: 1 ip sla 1 reachability Down -> Up
*Jun 16 23:32:53.215: %TRACK-6-STATE: 4 list boolean and Up -> Down

As mentioned above, this process is not perfect, but it gives you an automatic option for failover. It takes 10 to 15 seconds until EEM kicks in, because of the IP SLA frequency and the delay configured on the track statements. You can reduce the time by tuning the delay options available for the track commands.

Conclusion

As mentioned throughout the document, this is not the perfect solution for HA in the cloud with Cisco CSR1000v, as it is an external process rather than a process native to IOS. For other cloud providers, Cisco has developed a connector that interacts with the cloud directly, and it uses the GRE tunnel solution. Perhaps this could also be resolved if Cisco supported unicast for VRRP.

Reference

Track Boolean Command

The track boolean command has two options, OR and AND. In this case the AND option is used because we need a specific condition to match. For the IP SLA tracking, the script only needs to be executed when the ping fails (object 1 - track 1) and the local interface is still up (object 3 - track 3).

The table below shows the different combinations for this command, which helps to find the right setting. The state of track 4 shown is for CSR1.

  • First Column – The configuration combination for track 4
  • Second Column – The normal state of track 4 when everything is up and running
  • Third Column – Test 1, the state of track 4 as soon as GigabitEthernet2 is shut down on CSR2
  • Fourth Column – Test 2, the state of track 4 as soon as GigabitEthernet2 is shut down on CSR1, while GigabitEthernet2 on CSR2 is still up.
  • Fifth Column – Test 2, the state of track 4 after the delay timer has expired for track 1 (it fails)

The highlighted option (object 1 not, object 3) is the one we need: track 4 is Up only when track 1 has failed (Down) and track 3 has not failed (Up), and that is the condition we monitor via EEM.

Expression 1

track 4 list boolean and
 object 1 not
 object 3 not

Scenario                                      Boolean AND  object 1    object 3    Changes
Normal                                        Down         not Up      not Up      1 change, last change 00:15:47
Failure: CSR2 G2 is shut                      Down         not Down    not Up      1 change, last change 00:16:55
CSR1 G2 is shut, CSR2 G2 is up                Down         not Up      not Down    3 changes, last change 00:00:18
A couple of seconds later, track 1 goes down  Up           not Down    not Down    4 changes, last change 00:00:09

Expression 2

track 4 list boolean and
 object 1
 object 3 not

Scenario                                      Boolean AND  object 1    object 3    Changes
Normal                                        Down         Up          not Up      1 change, last change 00:00:58
Failure: CSR2 G2 is shut                      Down         Down        not Up      1 change, last change 00:01:47
CSR1 G2 is shut, CSR2 G2 is up                Up           Up          not Down    2 changes, last change 00:00:02
A couple of seconds later, track 1 goes down  Down         Down        not Down    3 changes, last change 00:00:50

Expression 3

track 4 list boolean and
 object 1
 object 3

Scenario                                      Boolean AND  object 1    object 3    Changes
Normal                                        Up           Up          Up          2 changes, last change 00:00:10
Failure: CSR2 G2 is shut                      Down         Down        Up          3 changes, last change 00:00:01
CSR1 G2 is shut, CSR2 G2 is up                Down         Up          Down        5 changes, last change 00:00:04
A couple of seconds later, track 1 goes down  Down         Down        Down        5 changes, last change 00:00:42

Expression 4 (the highlighted option)

track 4 list boolean and
 object 1 not
 object 3

Scenario                                      Boolean AND  object 1    object 3    Changes
Normal                                        Down         not Up      Up          3 changes, last change 00:00:54
Failure: CSR2 G2 is shut                      Up           not Down    Up          4 changes, last change 00:00:01
CSR1 G2 is shut, CSR2 G2 is up                Down         not Up      Down        5 changes, last change 00:00:16
A couple of seconds later, track 1 goes down  Down         not Down    Down        5 changes, last change 00:00:51
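The combinations in the table above can also be derived without a router. The sketch below is a plain truth-table enumeration, not Cisco's implementation: it models "track 4 list boolean and" with an optional "not" on each object and evaluates it for the four test columns. The scenario labels and function names are hypothetical.

```python
from itertools import product

def track4(obj1_not: bool, obj3_not: bool, track1_up: bool, track3_up: bool) -> bool:
    """Evaluate a boolean AND list over track 1 and track 3, each optionally negated."""
    term1 = (not track1_up) if obj1_not else track1_up
    term3 = (not track3_up) if obj3_not else track3_up
    return term1 and term3

# The four table columns as (label, track 1 is Up, track 3 is Up), seen from CSR1
SCENARIOS = [
    ("normal", True, True),
    ("CSR2 G2 shut", False, True),
    ("CSR1 G2 shut", True, False),
    ("CSR1 G2 shut, track 1 down", False, False),
]

def truth_table():
    """Map (object 1 negated?, object 3 negated?) to the track 4 state per scenario."""
    table = {}
    for obj1_not, obj3_not in product([True, False], repeat=2):
        table[(obj1_not, obj3_not)] = tuple(
            track4(obj1_not, obj3_not, t1, t3) for _, t1, t3 in SCENARIOS
        )
    return table

table = truth_table()
for (obj1_not, obj3_not), row in table.items():
    name = f"object 1{' not' if obj1_not else ''}, object 3{' not' if obj3_not else ''}"
    states = ", ".join("Up" if up else "Down" for up in row)
    print(f"{name}: {states}")
```

Only the combination "object 1 not, object 3" is Up in the second column alone, that is, when the remote failure is detected while the local interface is still up. That is why it is the option used to trigger EEM.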

 

Cisco References

https://www.cisco.com/c/en/us/td/docs/routers/csr1000/software/azu/b_csr1000config-azure/b_csr1000config-azure_chapter_01001.html#id_74411

https://www.cisco.com/c/en/us/support/docs/cloud-systems-management/prime-access-registrar/213601-csr1000v-ha-redundancy-deployment-guide.html

