Depending on the layer, you might need to involve the carrier, the data center (DC) provider, and OCI, because each party owns a piece of the connection to troubleshoot.
The diagram below shows the two FastConnect options and the physical portion of each. For both options, Oracle provides a Letter of Authorization (LOA) in the Oracle Console, which the customer uses to order the circuit or cross-connect so that it is delivered to the proper termination point on the Oracle side.
Make sure the physical connectivity of the circuit is 100% provisioned end to end before continuing.
● For a cross-connect, the data center provider will run a fiber from the customer's cage to Oracle's cage (green line).
● When deploying a circuit (blue line), several entities each deliver a portion of the circuit, and all of them have to finish provisioning their portion before the circuit can be fully functional. For example, if the long-haul carrier completes its portion but the last-mile provider or the cross-connect is not done, the circuit is not ready. Once all parties have confirmed the circuit is provisioned, performed some initial tests, and handed the circuit to the customer, the first step is to verify physical connectivity end to end.
To verify physical connectivity, the customer needs to review the interface/port on the on-prem CPE (router or switch) and check its up/down status. The interface/port also reports details from the transceiver (optics hardware) that show whether it is receiving signal from the other end of the circuit.
The customer can also check the light levels on the Oracle side in the Oracle Console. Under FastConnect, select Cross-connect; the information page on the right side will show a picture like this:
If the light levels are not good or strong enough, the customer needs to work with the provider to check the circuit and solve any provisioning issues; otherwise data transmission will be affected.
At this layer we are troubleshooting the virtual circuit that needs to be created over FastConnect. The virtual circuit can be private or public. This is only possible if the physical circuit (Layer 1) is healthy and set up properly. At this layer the VLAN information is the most important detail, and both ends need to use the same values. In a long-haul circuit the VLAN information is provided by the carrier/customer; the customer needs to enter it when creating the virtual circuit in the Oracle Console and configure it on the on-prem CPE. If the circuit is clean, the customer and Oracle will see each other's MAC address, learn it, and update the ARP table accordingly on both sides so that the upper layers can start working.
With FastConnect through a third party, the carrier serves multiple customers and will use Q-in-Q to tunnel each customer's traffic within its network, probably tagging the traffic with a different VLAN ID. The on-prem CPE and the Oracle router perform VLAN tagging using 802.1Q encapsulation before sending traffic to the carrier.
With FastConnect colocation, each virtual circuit created over the cross-connect needs to be tagged with a VLAN using 802.1Q encapsulation.
Once Layer 2 is working, your next step is to troubleshoot the Network layer. At this layer we are talking about IP addresses, so your first test is to use ping and make sure you can reach the other end of the connection (on-prem CPE to Oracle Edge). A ping test also gives the customer the latency between the two endpoints. This information is very useful for making sure that latency will not affect any application that will use FastConnect. Remember to record this information as the baseline; in the future, if there are issues with the circuit, you can return to it to verify how the circuit was working when deployed. When configuring the virtual circuit in the Oracle Console, the customer or Oracle provides a /30 or /31 subnet, depending on whether it is a private or public virtual circuit, to address the BGP peers. It is a good idea to use these IPs to test Layer 3 connectivity. Once this is confirmed, the customer can configure the BGP session and make sure the peering relationship is established and both ends exchange route information.
If this layer tests successfully, you could also do some initial tests from a host on-prem to a host within your Virtual Cloud Network (VCN), but you could run into issues at the next layer.
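The latency baseline can be captured with a small shell helper. The sketch below assumes a Linux host with the iputils version of ping, whose summary line reads "rtt min/avg/max/mdev = ..."; the function name avg_rtt_ms and the address 10.0.0.1 are hypothetical placeholders, not values taken from the Oracle Console.

```shell
# Extract the average round-trip time (ms) from ping's summary line.
# Assumes the Linux iputils format: "rtt min/avg/max/mdev = a/b/c/d ms".
avg_rtt_ms() {
  awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Example (hypothetical peer address; replace it with the Oracle-side IP
# of the /30 or /31 assigned to the virtual circuit):
#   ping -c 20 10.0.0.1 | avg_rtt_ms
```

Saving the printed value together with the date of the test gives a simple reference point for later comparisons.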
If Layer 3 connectivity is successful, the next step is to troubleshoot the Transport layer. At this layer we are talking about TCP, UDP, ICMP and other protocols. As mentioned at the end of the previous section, you can perform a simple test from a host on-prem to a host in your VCN. Ping might not work initially because traffic is blocked on either end; you could also try SSH. If end-to-end connectivity is not working, here is a list of items to check:
1) Is the traffic allowed? - Check at both ends that firewalls or access lists, and security lists or network security groups on the Oracle side, allow the traffic between on-prem and OCI. Make sure the ports are open for the applications in both directions. VMs within OCI also have a firewall that can block traffic. Check iptables or firewalld to see whether it is allowing or blocking the traffic.
2) Routing - Check routing on both ends of FastConnect. On-prem, the CPE should be advertising the OCI routes to the on-prem network. On the OCI side, the subnet route table needs a route for the on-prem networks pointing to the DRG.
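The checks above can be sketched from a Linux host as follows. This is a minimal example under stated assumptions: bash and the coreutils timeout command are available, check_tcp_port is a hypothetical helper name, and 10.20.20.20 stands in for a host in the VCN.

```shell
# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
check_tcp_port() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example usage (hypothetical address):
#   check_tcp_port 10.20.20.20 22 && echo "SSH reachable" || echo "SSH blocked or unreachable"
#
# On the OCI instance, list the local firewall rules:
#   sudo iptables -L -n -v          # iptables rules with packet counters
#   sudo firewall-cmd --list-all    # settings of the active firewalld zone
#
# Show which route and interface the host uses to reach the far end:
#   ip route get 10.20.20.20
```

A failed port check does not say which device dropped the traffic, so it is still necessary to review the firewalls, security lists, and route tables on both sides.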
The customer can also perform additional tests before moving the circuit into production to make sure the circuit is solid and provides the contracted capacity. The customer can use iPerf to test the capacity of the circuit. If you run into performance issues with the circuit, work with the carrier to clean the circuit or provide the contracted bandwidth.
To perform an iPerf test, the customer requires two hosts or VMs, one at each end of the connection. Set up one side as the server and the other as the client, and use different options to load traffic onto the circuit for testing. The customer can modify the window size, the traffic type (TCP or UDP), and send single or multiple streams of traffic. This test provides a good baseline of the capabilities of the circuit, which can be used later as a reference in case there are problems with the circuit.
For iPerf to work, the firewalls and security lists need to allow the traffic. By default, iPerf 2 uses TCP or UDP port 5001, while iperf3 uses port 5201; the examples below set port 5001 explicitly with -p.
Here are some options that can be used with iPerf; this is not the complete list.
-p Specifies the port to use
-t The time in seconds to transmit for
-P The number of simultaneous connections to make to the server. Default is 1
-u Use UDP rather than TCP
-b Set target bandwidth to n bits/sec (default 1 Mbit/sec for UDP, unlimited for TCP)
For example, assuming on-prem is the client and OCI is the server side:
● Connect to server 10.20.20.20 on TCP port 5001 for 120s and send 10 streams
iperf3 -c 10.20.20.20 -p 5001 -t 120 -P 10
● Connect to server 10.20.20.20 on UDP port 5001 for 120s with a bit rate of 100 Mbit/s
iperf3 -c 10.20.20.20 -u -p 5001 -t 120 -b 100m
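The client commands above need a matching listener on the OCI side. A minimal sketch, assuming iperf3 is installed on the server host; the function name start_iperf3_server is hypothetical.

```shell
# Start an iperf3 server on the given port. The default here is 5001 to
# match the client examples; iperf3's own default port would be 5201.
start_iperf3_server() {
  iperf3 -s -p "${1:-5001}"
}

# Example: run on the OCI host before starting the client tests.
#   start_iperf3_server 5001
```

The iperf3 server does not need the -u option for UDP tests, because the client negotiates the test over a TCP control connection on the same port. For that reason TCP on the chosen port has to be allowed even when only UDP is being tested.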
Performance can also be affected by the MTU size of the connection, especially during file transfers. The customer can modify the MTU size of the connection. OCI VMs are set to a 9000-byte MTU by default. Windows hosts are usually set to a 1500-byte MTU, and circuits are typically set to a 1500-byte MTU. The customer will need to check the on-prem network to find the maximum MTU it supports and, if it can support more, make the necessary changes to increase the MTU along the full path. On-prem hosts also need to be checked to see what they are set to, as both ends will by default negotiate down to the smallest MTU in the path. The customer also needs to ask the carrier to increase the MTU on the circuit if the carrier's network supports it.
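One way to confirm the usable MTU along the path is to send pings with the Don't Fragment bit set. The sketch below assumes a Linux host with the iputils version of ping (option -M do) and IPv4; the helper names and the address 10.20.20.20 are hypothetical.

```shell
# ICMP payload size that produces an IPv4 packet of exactly <mtu> bytes:
# subtract 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Send three pings with the Don't Fragment bit set (Linux iputils: -M do).
# Replies mean the path carries <mtu>-byte packets without fragmentation.
# Arguments: <host> <mtu>
probe_path_mtu() {
  ping -c 3 -M do -s "$(icmp_payload_for_mtu "$2")" "$1"
}

# Example (hypothetical VCN host):
#   probe_path_mtu 10.20.20.20 1500   # sends 1472-byte payloads
#   probe_path_mtu 10.20.20.20 9000   # sends 8972-byte payloads
#   ip link show                      # shows the MTU set on local interfaces
```

If the 1500-byte probe succeeds and the 9000-byte probe fails, some device in the path is limited to a smaller MTU than the hosts at the ends. ICMP also has to be allowed end to end for this test to be meaningful.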
As you move through the different layers, you should be able to troubleshoot the circuit to make it functional and establish connectivity between on-prem and OCI. Sometimes multiple issues are present within a layer, so make sure there are no other issues before moving to the next layer. If you have redundant circuits, troubleshooting should be performed on each circuit. You also need to check your routing to make sure the tests you are performing are flowing through the FastConnect in question.
Make sure you always check the latest documentation and guides for FastConnect in our public documentation.