It has been quite some time since I last wrote anything related to IPSec (by the way, one of my favorite topics), but a few weeks ago I handled some cases involving IPSec asymmetric traffic. Specifically, my cases used IPSec connections with static routing (so, no BGP involved at all). As we know, for redundancy purposes both IPSec tunnels forming the IPSec Connection were configured and in the UP state. That is what we recommend, and it is part of the connectivity best practices.
After my customers configured the IPSec tunnels, the traffic started to work intermittently and the connections hung after some time of activity. At first sight the entire OCI configuration looked good, without any points that needed immediate attention or reconfiguration. That is fine, but it did not solve the issue. A quick action that we tried (I need to be honest and say that I suspected this from the very beginning) was to disable one of the IPSec tunnels in the connection and verify whether the traffic from on-premises to OCI was restored. And yes, it was restored.
As a side note, other issues such as PMTUD can also trigger intermittent traffic or connection hangs, but they show a slightly different behavior. Disabling one of the IPSec tunnels in the connection resolved our issue, but we lost redundancy. We do not want to lose redundancy, right?
So, that being said, what can we do to resolve our issue without losing redundancy? Let’s find out.
We have discussed how OCI manages asymmetric traffic, and we know that it is allowed. So, we might think that if we have two IPSec tunnels in the UP state and static routing configured without ECMP, OCI can use either tunnel to initiate a session to on-premises or to respond to traffic initiated from on-premises hosts to OCI VMs, because asymmetric IPSec traffic is allowed on the OCI side.
Statement: However, with IPSec and static routing, when both IPSec tunnels are in the UP state on DRGv2, asymmetric IPSec traffic cannot occur on the OCI side.
We are not considering here the combination of IPSec and FastConnect, where in certain circumstances FastConnect is preferred: the traffic might arrive in OCI over the IPSec tunnel and OCI can respond over FastConnect. In that case asymmetric traffic will occur simply because OCI permits it.
Going back to our above statement, let’s verify and demonstrate it.
Our very simple scenario consists of one IPSec Connection with two IPSec tunnels in the UP state, using static routing to the on-premises destination 192.168.0.0/24. As we know, static routes cannot be configured per tunnel, only per IPSec Connection, so the same destination will be reachable over both IPSec tunnels, but not at the same time (if ECMP is disabled, which is the default state).
Importing the IPSec routes into the VCN attachment route table will result in:

Alright, above we have a very important piece of information. The DRG prefers to use Tunnel1 (marked as the Active route) to send VCN-initiated connections to on-premises or to respond to them. Tunnel2 will be used only if Tunnel1 goes down, because at that point Tunnel2 will become the Active one.
So, how could IPSec asymmetric traffic occur on the OCI side, since at all times one tunnel is Active and the other one is marked as Conflict? This is the first piece of evidence that, in this scenario, asymmetric IPSec traffic cannot happen on the OCI side.
For sure, one piece of evidence is not enough; a second one is truly necessary. For 24 hours I ran a test consisting of multiple sessions initiated from the VCN to on-premises and vice versa, with a packet capture running on the CPE side on Tunnel2. The goal was to verify whether Tunnel2 is ever used by the DRG (as you guessed, I tweaked the routing on the CPE side to use only Tunnel1 and to keep any other traffic off Tunnel2, leaving only the potential traffic generated by the DRG):
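The Active/Conflict behavior described above can be pictured with a small Python model. This is only a sketch under stated assumptions: the tie-break used here (the first tunnel in list order that is up wins) and the "Down" label are hypothetical, chosen to reproduce the result seen in this scenario, and are not a description of how the DRG actually selects the Active route.

```python
from dataclasses import dataclass


@dataclass
class TunnelRoute:
    tunnel: str   # tunnel name, e.g. "Tunnel1"
    prefix: str   # static route destination, e.g. "192.168.0.0/24"
    up: bool      # whether the IPSec tunnel is in the UP state


def mark_routes(routes):
    """Return {tunnel: status} with at most one Active route per prefix.

    Assumption (hypothetical tie-break): the first UP tunnel in list order
    becomes Active; any other UP tunnel with the same prefix is Conflict.
    """
    status = {}
    active_prefixes = set()
    for route in routes:
        if not route.up:
            status[route.tunnel] = "Down"      # label is illustrative only
        elif route.prefix in active_prefixes:
            status[route.tunnel] = "Conflict"  # same prefix already has an Active route
        else:
            status[route.tunnel] = "Active"
            active_prefixes.add(route.prefix)
    return status


both_up = [
    TunnelRoute("Tunnel1", "192.168.0.0/24", up=True),
    TunnelRoute("Tunnel2", "192.168.0.0/24", up=True),
]
tunnel1_down = [
    TunnelRoute("Tunnel1", "192.168.0.0/24", up=False),
    TunnelRoute("Tunnel2", "192.168.0.0/24", up=True),
]

print(mark_routes(both_up))
print(mark_routes(tunnel1_down))
```

In this model the DRG side only ever has a single egress tunnel for the on-premises prefix, which is the point of the first piece of evidence.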

After 24 hours our CPE tcpdump filter had not captured a single packet, because the DRG does not use the IPSec tunnel marked as Conflict. This is the second and last piece of evidence, and it demonstrates that IPSec asymmetric traffic cannot be triggered by the DRG when using two IPSec tunnels with static routing as in the case depicted above.
It is obvious now that asymmetric IPSec traffic with static routing is triggered by the on-premises CPE. How? That is really easy: the CPE might send traffic to the OCI VCN via Tunnel2 and OCI will respond via Tunnel1 (as per the Active route). This is IPSec asymmetric traffic, and if the on-premises CPE is a firewall (it mostly will be), it will drop it by default. Some configurations can be applied on the firewall CPE to disable the IPSec asymmetric traffic check, but in most cases customers do not want to allow asymmetric traffic for security reasons, which is perfectly fine.
The preferred solution is to migrate to BGP over IPSec, but if this is not an option then we need to implement a very simple and reliable configuration on the CPE: more specific versus less specific routing over the two IPSec tunnels. In a nutshell, the customer will configure more specific routes to the OCI VCN over Tunnel1 (to match the OCI Active tunnel) and a less specific or summary route over Tunnel2.
In this way, if Tunnel1 is up and running it will always be preferred by both the DRG and the CPE, and asymmetric IPSec traffic will become just a story to tell.
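To make the failure mode concrete, here is a minimal Python sketch of the check a strict stateful firewall CPE might apply. The function name and the `allow_asymmetric` switch are hypothetical and stand in for whatever vendor-specific setting disables the check; real firewalls track much more state than the tunnel name.

```python
def firewall_verdict(request_tunnel: str, reply_tunnel: str,
                     allow_asymmetric: bool = False) -> str:
    """Return 'accept' or 'drop' for a reply packet.

    request_tunnel: tunnel the CPE used to send the request to OCI.
    reply_tunnel:   tunnel on which the reply from OCI arrived.
    allow_asymmetric: hypothetical knob modelling a disabled asymmetry check.
    """
    symmetric = request_tunnel == reply_tunnel
    if symmetric or allow_asymmetric:
        return "accept"
    return "drop"


# CPE sends via Tunnel2, the DRG answers via its Active route on Tunnel1.
print(firewall_verdict("Tunnel2", "Tunnel1"))
# CPE sends via Tunnel1, the DRG answers via Tunnel1.
print(firewall_verdict("Tunnel1", "Tunnel1"))
```

The first call models the broken sessions from the customer cases, the second one the healthy path where both sides agree on Tunnel1.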
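The more specific versus less specific idea relies on longest-prefix matching on the CPE, which can be sketched in Python with the standard `ipaddress` module. The VCN CIDR (10.0.0.0/24) and the summary (10.0.0.0/16) are assumptions made up for illustration, since the post does not state them, and `pick_tunnel` is a hypothetical helper, not a CPE command.

```python
import ipaddress

# Hypothetical CPE static routes towards OCI: (prefix, tunnel).
CPE_ROUTES = [
    ("10.0.0.0/24", "Tunnel1"),  # more specific: assumed VCN CIDR
    ("10.0.0.0/16", "Tunnel2"),  # less specific: assumed summary route
]


def pick_tunnel(destination, routes, tunnel_up):
    """Return the tunnel chosen by longest-prefix match, or None.

    Routes whose tunnel is down are ignored, as a CPE would normally
    withdraw a static route when its tunnel interface goes down.
    """
    dest = ipaddress.ip_address(destination)
    candidates = []
    for prefix, tunnel in routes:
        network = ipaddress.ip_network(prefix)
        if tunnel_up.get(tunnel, False) and dest in network:
            candidates.append((network.prefixlen, tunnel))
    if not candidates:
        return None
    # The longest prefix (largest prefix length) wins.
    return max(candidates)[1]


print(pick_tunnel("10.0.0.10", CPE_ROUTES, {"Tunnel1": True, "Tunnel2": True}))
print(pick_tunnel("10.0.0.10", CPE_ROUTES, {"Tunnel1": False, "Tunnel2": True}))
```

With both tunnels up the /24 over Tunnel1 wins, matching the DRG Active route; when Tunnel1 goes down only the summary over Tunnel2 remains, which is also the tunnel the DRG fails over to.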
