Best Practices from Oracle Development's A‑Team

Solaris on Exalogic - Transitive Probe-based Failure Detection

Leo Yuen
Cloud Solutions Architect


This blog entry describes how to enable transitive probe-based failure detection for IPMP groups on Exalogic compute nodes running Solaris 11.1.

Main Article

On Exalogic, regardless of which supported operating system a compute node is running, the node relies on the IB gateways (NM2-GW) to provide both internal (IPoIB) and external (EoIB) network connectivity. Each compute node is physically connected to two IB gateways by copper cables, and each IB gateway is in turn connected to the customer's 10GbE infrastructure, typically a layer 2 switch.

By default, only link-based failure detection is enabled for IPMP groups on compute nodes running Solaris. This default applies both to X2-2 compute nodes that have been upgraded from Solaris 11 Express to Solaris 11.1 and to X3-2 compute nodes, where Solaris 11.1 can be installed directly.

The limitation of link-based failure detection is that it cannot detect a failure of the link between an IB gateway and the customer's infrastructure. Even if that link goes down, bond1 (the IPMP group for the external EoIB interfaces) does not fail over, so the compute node loses its 10GbE connectivity.

In fact, there is a scenario in which bond1 does not fail over even when the link between the compute node and the IB gateway fails, but that is a topic for another blog entry.

For customers running Solaris 11 Express, the solution is to enable Probe-based Failure Detection. The downside is that it requires two additional IP addresses (test addresses) for each IPMP group, which can be a challenge for customers who are short of IP addresses.

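Solaris 11.1 offers a better solution, Transitive Probe-based Failure Detection, which does not require additional IP addresses to be assigned to the IPMP group members. Instead of dedicated test addresses, one interface in each group probes the target system using the group's data address, and the other members are tested by probing that interface.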

To enable Transitive Probe-based Failure Detection, run the following commands on a compute node:

# svccfg -s svc:/network/ipmp setprop config/transitive-probing=true
# svcadm refresh svc:/network/ipmp:default
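The two commands can be wrapped in a small function that also reads the property back to confirm it was set. This is a minimal sketch, assuming a POSIX sh on Solaris 11.1 run as root; the function name enable_transitive_probing is hypothetical, and the read-back uses the standard svcprop command, assuming it reports the refreshed value.

```shell
#!/bin/sh
# Minimal sketch (assumption: Solaris 11.1, run as root).
# enable_transitive_probing is a hypothetical helper name.

FMRI=svc:/network/ipmp:default

enable_transitive_probing() {
    # Set the property at the service level, as in the commands above.
    svccfg -s svc:/network/ipmp setprop config/transitive-probing=true || return 1

    # Refresh the instance so the running service picks up the new value.
    svcadm refresh "$FMRI" || return 1

    # Read the property back from the instance; succeed only if it is "true".
    value=$(svcprop -p config/transitive-probing "$FMRI") || return 1
    [ "$value" = "true" ]
}
```

For example: enable_transitive_probing && echo "transitive probing enabled"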

If a default gateway is already configured for bond1, it is used as the target system; otherwise, you need to create a host route to the particular system that you would like to probe.
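For the second case, the Solaris documentation describes adding a persistent static host route to the chosen target with the route command. The following is a minimal sketch, assuming an IPv4 target that is directly reachable on bond1's subnet; add_probe_target is a hypothetical helper name, and 192.0.2.10 is only a documentation placeholder address.

```shell
#!/bin/sh
# Minimal sketch (assumptions: IPv4 target directly reachable on bond1's
# subnet; add_probe_target is a hypothetical helper name).

add_probe_target() {
    target_ip=$1

    # Reject an empty argument or anything other than digits and dots.
    case $target_ip in
        "" | *[!0-9.]*)
            echo "usage: add_probe_target IPv4-address" >&2
            return 2
            ;;
    esac

    # -p makes the route persistent across reboots. Destination and gateway
    # are the same address because the target is reached directly.
    route -p add -host "$target_ip" "$target_ip" -static
}
```

For example, as root: add_probe_target 192.0.2.10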

To check whether Transitive Probe-based Failure Detection is working, run the following command:

root@el01cn01:~# ipmpstat -t
INTERFACE   MODE       TESTADDR            TARGETS
eoib1       transitive <eoib1>             <eoib0>
eoib0       routes     el01cn01-pub
bond0_1     transitive <bond0_1>           <bond0_0>
bond0_0     multicast  el01cn01-priv       el01cn05-priv el01cn04-priv el01cn02-priv el01cn06-priv el01sn-priv
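In this output, eoib1 and bond0_1 report the transitive mode, while eoib0 and bond0_0 probe their targets by routes and multicast respectively. The same check can be scripted. ipmpstat can emit machine-parsable output with -P together with a field list given by -o; the sketch below, assuming that format (one "interface:mode" pair per line), prints the probe mode of a single interface. probe_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch: print the probe MODE of one interface from parsable
# ipmpstat output (assumption: "interface:mode" lines as produced by
#   ipmpstat -t -P -o INTERFACE,MODE).
# probe_mode is a hypothetical helper name.

probe_mode() {
    awk -F: -v ifname="$1" '
        # Print the MODE field of the matching interface.
        $1 == ifname { print $2; found = 1 }
        # Exit 1 if the interface never appeared in the input.
        END { exit (found ? 0 : 1) }
    '
}
```

On the node shown above, ipmpstat -t -P -o INTERFACE,MODE | probe_mode eoib1 would be expected to print transitive.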

See the official Solaris documentation on how to specify a target system for probe-based failure detection.

