A routing scenario, is an asymmetric traffic path allowed in OCI? – Part Two

Some time ago we discussed an important topic: whether asymmetric traffic is allowed in OCI. After analyzing that case, we concluded that the asymmetric traffic specific to that use case is permitted in OCI.

One question still remains: does OCI permit asymmetric traffic in every scenario? I will present a second case of asymmetric traffic and, as we go forward, we will determine whether it is permitted. This very interesting case was raised by one of our customers a few weeks back.

The network topology I will use for the explanation is a simplified version of the customer's, but it is enough for our scope:

network topology

The VM in the Shared Services VCN (subnet 10.0.0.0/29, IP address 10.0.0.2) needs to reach the API Gateway in the API Gateway VCN (subnet 10.1.0.0/25, IP address 10.1.0.20), passing through the OCI NFW at 10.0.0.12, which sits in a different subnet of the same Shared Services VCN. At first sight this looks like a very simple case.

Next, I will present the routing configuration that was in place when I received the case.

a) Subnet 10.0.0.0/29 uses a route table with the following route rule:

rt1

This rule is needed because the requirement is that all traffic be inspected by the OCI NFW.

b) Subnet 10.0.0.8/29, where the OCI NFW is deployed, uses a route table with the following route rule:

rt2

After the OCI NFW analyzes the traffic, and if the traffic matches a permit rule, the NFW sends it to the DRG for forwarding.

c) The traffic arrives at the Shared Services DRG VCN attachment, and the DRG uses the following rule in the attachment route table to send the traffic to the API Gateway at 10.1.0.20:

rt3

So far, everything looks fine and the API Gateway should receive the traffic.
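The forward path in a) to c) is a chain of route lookups, each hop consulting its own route table. The sketch below walks that chain with longest-prefix matching. The actual rules are only shown in the screenshots, so the destinations and targets here (a default route to the NFW, a default route to the DRG, and 10.1.0.0/25 toward the API Gateway VCN attachment) are assumptions, and the table and hop names are hypothetical labels.

```python
import ipaddress

# Hypothetical reconstruction of the rules in a), b) and c); the real rules
# were shown as screenshots, so CIDRs and targets are assumptions.
ROUTE_TABLES = {
    "subnet-10.0.0.0/29": [("0.0.0.0/0", "nfw-10.0.0.12")],
    "nfw-subnet-10.0.0.8/29": [("0.0.0.0/0", "drg")],
    "drg-shared-services-attachment": [("10.1.0.0/25", "api-gw-vcn-attachment")],
}

# Which route table is consulted once the packet reaches a given hop.
NEXT_TABLE = {
    "nfw-10.0.0.12": "nfw-subnet-10.0.0.8/29",
    "drg": "drg-shared-services-attachment",
}


def lookup(table, dst):
    """Longest-prefix match of dst against the (cidr, target) rules of a table."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTE_TABLES[table]
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def trace(first_table, dst):
    """Follow route lookups hop by hop until no further table applies."""
    hops, table = [], first_table
    while table is not None:
        target = lookup(table, dst)
        if target is None:
            break
        hops.append(target)
        table = NEXT_TABLE.get(target)
    return hops


print(trace("subnet-10.0.0.0/29", "10.1.0.20"))
# ['nfw-10.0.0.12', 'drg', 'api-gw-vcn-attachment']
```

Under these assumptions the request path is VM, NFW, DRG, API Gateway VCN attachment, which is why the forward direction looks correct.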

d) The return traffic from the API Gateway uses the API Gateway DRG VCN attachment route table with the following route rule:

rt4

e) For the traffic to reach 10.0.0.2 via the OCI NFW, the DRG uses the DRG VCN route table with the following route rule:

rt5

In theory, the default route in the DRG VCN route table above should send all traffic to the OCI NFW.

Even though the routing configuration looks correct, the customer complained that sessions to the API Gateway were not being established. Now, let’s verify what is happening.

connection hangs

It seems that at a certain point in the packet exchange, the connection simply hangs. The curl and telnet commands confirm that the connection does not respond any further. The customer reported that curl stops responding after the TLS Client Hello, which matches what we see above.

A natural next step is to check the OCI NFW logs to see whether the policy was matched and what the connection looks like through the OCI NFW:

nfw log1

After analyzing the OCI NFW logs, one thing caught our attention: the bytes received counter is 0. We can see bytes sent, but nothing received back. Since the DRG uses the VCN route table above, with a default route that sends all traffic to the OCI NFW, the bytes received counter should not be 0; however, it is.

To continue troubleshooting and gain more visibility, let’s create a VM in the same subnet as the API Gateway and run some tests. The VM used for troubleshooting has the IP address 10.1.0.105, as depicted in the network topology above.

Initiating a connection from 10.0.0.2 to the newly created VM at 10.1.0.105 and running tcpdump on both machines, we find some very valuable information:

traffic1

During the TCP three-way handshake (3W-H), we observe that the destination VM receives the SYN segment and responds with a SYN-ACK, which the source receives. The source then sends the ACK to complete the 3W-H. The issue revealed by tcpdump is that this final ACK segment never reaches the destination, so the destination keeps retransmitting the SYN-ACK. It is now clear that the TCP 3W-H does not complete and the traffic on port 22 cannot start (if the telnet command cannot be closed with CTRL+C, that is an indication that the TCP 3W-H is not able to complete).
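Reading the destination-side capture comes down to counting which handshake segments arrived and which were sent. The sketch below classifies a handshake from (src, dst, flags) tuples using tcpdump's flag notation ("S" for SYN, "S." for SYN-ACK, "." for ACK). The packet list is a hypothetical reconstruction of the pattern described above, not the customer's actual capture, and `handshake_status` is an illustrative helper, not a tcpdump feature.

```python
from collections import Counter

# Hypothetical packets as seen by tcpdump on the destination VM, written as
# (src, dst, flags) with tcpdump flag notation: "S" = SYN, "S." = SYN-ACK,
# "." = ACK. They mirror the pattern described above, not the real capture.
CAPTURE = [
    ("10.0.0.2", "10.1.0.105", "S"),
    ("10.1.0.105", "10.0.0.2", "S."),
    ("10.1.0.105", "10.0.0.2", "S."),  # SYN-ACK retransmitted
    ("10.1.0.105", "10.0.0.2", "S."),  # SYN-ACK retransmitted again
]


def handshake_status(packets, client, server):
    """Classify a TCP three-way handshake from the server-side capture."""
    seen = Counter()
    for src, dst, flags in packets:
        if (src, dst, flags) == (client, server, "S"):
            seen["syn"] += 1
        elif (src, dst, flags) == (server, client, "S."):
            seen["synack"] += 1
        elif (src, dst, flags) == (client, server, "."):
            seen["ack"] += 1
    if not seen["syn"]:
        return "no SYN received"
    if not seen["synack"]:
        return "SYN received, no SYN-ACK sent"
    if not seen["ack"]:
        return f"incomplete: SYN-ACK sent {seen['synack']} times, final ACK never received"
    return "complete"


print(handshake_status(CAPTURE, "10.0.0.2", "10.1.0.105"))
# incomplete: SYN-ACK sent 3 times, final ACK never received
```

Repeated SYN-ACKs with no ACK from the client on the server side, while the client-side capture shows the ACK being sent, is the signature that the ACK is lost somewhere on the path.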

What, then, prevents the ACK segment from reaching the destination, and where is it dropped? The answer is the OCI NFW, due to the asymmetric traffic pattern. Here are the details:

topology_1

The two green lines, 1 and 2, are the TCP SYN and the TCP SYN-ACK. Packet 2 is very important: it does not go through the NFW but is forwarded directly by the DRG to 10.0.0.2, so the NFW never sees it. When the source sends the third packet (the red one, the ACK) to complete the 3W-H, that packet goes to the NFW, and the NFW drops it. We call this scenario an incomplete TCP handshake.
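Why a firewall drops that ACK can be shown with a toy stateful model that tracks each flow's handshake and only accepts the client's ACK after it has seen the server's SYN-ACK. This is a deliberately simplified illustration, not how the OCI NFW is implemented; the `StatefulFirewall` class and its state names are hypothetical.

```python
# A deliberately simplified stateful-firewall model (not the OCI NFW
# implementation): it tracks each TCP flow's handshake and only accepts the
# client's final ACK if it previously saw the server's SYN-ACK.
# Flags use tcpdump notation: "S" = SYN, "S." = SYN-ACK, "." = ACK.


class StatefulFirewall:
    def __init__(self):
        self.flows = {}  # (client_ip, server_ip) -> handshake state

    def inspect(self, src, dst, flags):
        key, reverse = (src, dst), (dst, src)
        if flags == "S":
            # A client opens a new flow.
            self.flows[key] = "SYN_SEEN"
            return "allow"
        if flags == "S." and self.flows.get(reverse) == "SYN_SEEN":
            # The server answers a SYN we saw; the flow is keyed client-first.
            self.flows[reverse] = "SYNACK_SEEN"
            return "allow"
        if flags == "." and self.flows.get(key) in ("SYNACK_SEEN", "ESTABLISHED"):
            self.flows[key] = "ESTABLISHED"
            return "allow"
        return "drop"  # packet does not fit any known handshake state


def handshake(synack_through_firewall):
    fw = StatefulFirewall()
    verdicts = [fw.inspect("10.0.0.2", "10.1.0.105", "S")]
    if synack_through_firewall:
        verdicts.append(fw.inspect("10.1.0.105", "10.0.0.2", "S."))
    verdicts.append(fw.inspect("10.0.0.2", "10.1.0.105", "."))
    return verdicts


print(handshake(synack_through_firewall=False))  # ['allow', 'drop']
print(handshake(synack_through_firewall=True))   # ['allow', 'allow', 'allow']
```

With the SYN-ACK bypassing the firewall, the ACK arrives for a flow still in the SYN_SEEN state and is dropped; when both directions traverse the firewall, the handshake completes.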

An incomplete TCP handshake is a 3W-H that does not complete successfully. For example, 10.0.0.2 sends a SYN, 10.1.0.105 responds with a SYN-ACK, but the ACK from 10.0.0.2 never reaches 10.1.0.105, which is left in a half-open state, waiting for the final ACK.

We have made important progress toward resolving the case, but one question remains: why does the OCI NFW not receive the TCP SYN-ACK, which causes the ACK to be dropped? Answering it resolves the case entirely. We already have the answer, but it is not obvious. Let’s revisit point e) from above:

e) For the traffic to reach the OCI NFW, the DRG uses the DRG VCN route table with the following route rule:

rt5_1

Looking at the route rule above, we might think that the DRG sends all traffic to the OCI NFW at 10.0.0.12 regardless of destination, including traffic destined to 10.0.0.2.

That is true in most cases, but not here. Why? Because of the specifics of this network topology: the source VM is in the same VCN as the OCI NFW. If you replace the source VM with, for example, a public LBaaS in that VCN, the result is the same.

How the DRG processes traffic in this scenario (with only the default route configured in the DRG VCN route table):

1. For any destination outside the Shared Services VCN, the DRG uses the default route and sends the traffic through the OCI NFW;

2. For any destination inside the Shared Services VCN, the DRG sends the packet directly to the destination VM, and the OCI NFW is not used as next hop.
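One way to picture points 1 and 2 is longest-prefix matching in which the VCN's own CIDR behaves like an implicit local route that beats the default route for intra-VCN destinations. This is a simplified mental model of the behavior described above, not OCI's documented algorithm. The Shared Services VCN CIDR (10.0.0.0/24) and the outside destination 192.168.10.10 are hypothetical, since the article only shows the two /29 subnets.

```python
import ipaddress

# Hypothetical Shared Services VCN CIDR; the article only shows its two /29 subnets.
VCN_CIDR = ipaddress.ip_network("10.0.0.0/24")
NFW_IP = "10.0.0.12"

# The DRG VCN route table as described in point e): a single default route.
ORIGINAL_RULES = [("0.0.0.0/0", NFW_IP)]


def drg_next_hop(dst, rules):
    """Simplified model of points 1 and 2.

    The VCN CIDR acts like an implicit local route, so longest-prefix match
    delivers intra-VCN destinations directly unless a more specific rule exists.
    """
    addr = ipaddress.ip_address(dst)
    candidates = [(VCN_CIDR, "direct to VM")]
    candidates += [(ipaddress.ip_network(cidr), hop) for cidr, hop in rules]
    matching = [(net, hop) for net, hop in candidates if addr in net]
    if not matching:
        return None
    return max(matching, key=lambda m: m[0].prefixlen)[1]


print(drg_next_hop("192.168.10.10", ORIGINAL_RULES))  # 10.0.0.12 (point 1)
print(drg_next_hop("10.0.0.2", ORIGINAL_RULES))       # direct to VM (point 2)
```

In this model the returning SYN-ACK for 10.0.0.2 matches the /24 local route before the /0 default route, which is exactly why it bypasses the NFW.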

Both points are by DRG design and are not considered issues. However, how can we force the DRG to send traffic to the OCI NFW even when the destination subnet is in the same VCN as the OCI NFW? We need to configure a more specific route for the 10.0.0.0/29 subnet, as shown below (this falls under the intra-VCN traffic inspection design):

rt6
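A minimal longest-prefix-match sketch of the corrected DRG VCN route table, assuming rt6 adds 10.0.0.0/29 with the NFW at 10.0.0.12 as target and modeling a hypothetical 10.0.0.0/24 Shared Services VCN CIDR as an implicit local route. It shows that 10.0.0.2 now resolves to the NFW, while the NFW's own subnet (10.0.0.8/29) is not covered by the new rule and stays local in this model.

```python
import ipaddress

# Hypothetical Shared Services VCN CIDR (the article only shows its /29 subnets).
VCN_CIDR = "10.0.0.0/24"
NFW_IP = "10.0.0.12"

# Assumed content of the corrected DRG VCN route table: the original default
# route plus the more specific rule for the source subnet.
FIXED_RULES = [("0.0.0.0/0", NFW_IP), ("10.0.0.0/29", NFW_IP)]


def next_hop(dst, rules, vcn_cidr=VCN_CIDR):
    """Longest-prefix match, with the VCN CIDR modeled as an implicit local route."""
    addr = ipaddress.ip_address(dst)
    routes = [(ipaddress.ip_network(vcn_cidr), "local")]
    routes += [(ipaddress.ip_network(cidr), hop) for cidr, hop in rules]
    matching = [(net, hop) for net, hop in routes if addr in net]
    if not matching:
        return None
    return max(matching, key=lambda m: m[0].prefixlen)[1]


for dst in ("10.0.0.2", "10.0.0.12", "192.168.10.10"):
    print(dst, "->", next_hop(dst, FIXED_RULES))
# 10.0.0.2 -> 10.0.0.12
# 10.0.0.12 -> local
# 192.168.10.10 -> 10.0.0.12
```

Scoping the rule to the source subnet rather than the whole VCN keeps traffic addressed to the firewall subnet itself from being pointed back at the firewall.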

Now, let’s try again and see whether we can reach the API Gateway:

connection ok

Checking the OCI NFW logs to verify the bytes received counter (previously 0):

log2

It is now crystal clear that the OCI NFW receives the TCP SYN-ACK, accepts the ACK, and from there the traffic flows normally. Our problem is resolved; it was just a matter of a single route rule. Enjoy!

Andrei Stoian

Master Principal Cloud Architect | North America Cloud Engineering
