Best Practices from Oracle Development's A‑Team

Solaris on Exalogic - Effect of VNIC over eoib0 and eoib1

Leo Yuen
Cloud Solutions Architect

Introduction

This post explains why probe-based failure detection is required when VNICs are created over eoib0 and eoib1 on a compute node running Solaris.

Main Article

There are many reasons for a customer to create VNICs over eoib0 and eoib1 on a compute node running Solaris. Two typical examples are:

  • compute node needs to connect to a VLAN over the EoIB network
  • there are containers running on the compute node that require 10GbE connectivity

In a previous blog entry we discussed why transitive probe-based failure detection is required; the focus there was on the link between the IB gateway and the customer's 10GbE infrastructure.

In fact, if VNICs are created over eoib0 and eoib1, there is a chance that bond1 will not fail over even if the link between the compute node and the IB gateway goes down!

Here is a simple test to illustrate this scenario:

First, create a VNIC over eoib0 using the following command:

root@el01cn01:~# dladm create-vnic -l eoib0 vnic0

This is what the IPMP groups look like:

root@el01cn01:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       yes     bond1       --mb---   up        disabled  ok
eoib1       no      bond1       is-----   up        disabled  ok
bond0_0     yes     bond0       --mb---   up        disabled  ok
bond0_1     no      bond0       is-----   up        disabled  ok

Then we take down the link between the compute node and the IB gateway where eoib0 is located. This is what we get:

root@el01cn01:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       yes     bond1       --mb---   up        disabled  ok
eoib1       no      bond1       is-----   up        disabled  ok
bond0_0     no      bond0       -------   down      disabled  failed
bond0_1     yes     bond0       -smb---   up        disabled  ok

Notice that bond0 has failed over but bond1 has not. Even though the LINK status of eoib0 is still up, eoib0 has actually lost connectivity to the 10GbE network.
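Rows like the eoib0 one above (LINK up, PROBE disabled) are the ones where a masked failure can go unnoticed. As a rough sketch, they can be picked out of the ipmpstat -i output with a small filter. The function name probe_disabled_ifaces is hypothetical, and the column positions are assumed to match the output shown in this post:

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris): reads `ipmpstat -i` output on
# stdin and prints "<interface> <group>" for every row whose LINK column is
# "up" while the PROBE column is "disabled".
# Assumes the column layout shown in this post:
#   INTERFACE ACTIVE GROUP FLAGS LINK PROBE STATE
probe_disabled_ifaces() {
    # The header row is skipped implicitly: its LINK field is "LINK", not "up".
    awk '$5 == "up" && $6 == "disabled" { print $1, $3 }'
}

# Usage on a compute node:
#   ipmpstat -i | probe_disabled_ifaces
```

Any interface this prints relies on link-based detection only, so its group will not fail over unless the operating system sees the link itself go down.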

bond1 does not fail over because of the vnic0 that we created over eoib0. From the operating system's point of view, the link between eoib0 and vnic0 is still up, so no failover of bond1 occurs.

This is another good reason why probe-based failure detection is required.
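For reference, here is a minimal sketch of how transitive probing is typically switched on in Oracle Solaris 11. The SMF service and property names below are taken from the Solaris 11 IPMP documentation as best recalled and are not part of the test above, so treat them as assumptions and verify them against your release before use:

```shell
# Assumption: Oracle Solaris 11, IPMP managed by the svc:/network/ipmp service.
# Enable transitive probing (no test addresses are needed for this mode).
svccfg -s svc:/network/ipmp setprop config/transitive-probing=true

# Make the running service pick up the changed property.
svcadm refresh svc:/network/ipmp:default

# Re-check the interfaces; the PROBE column should no longer read "disabled".
ipmpstat -i
```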

 

 
