In this article I want to focus on troubleshooting a load balancer (LB) connection.
The topology has a public LB in a public subnet, and the back-ends are in a private subnet of the same VCN. This topology is depicted below.
The LB has an HTTP listener on port 80, and the health checks against the back-ends can be done over HTTP or TCP.
The health checks are sent from the internal IP addresses of the LB nodes (primary and secondary), which belong to the subnet that was assigned during provisioning. The LB takes the next available IP addresses from the subnet. Unfortunately, we do not know those addresses when we provision the LB, and we can't reserve them. A good practice is to dedicate a small subnet to the LB, so that the internal IP addresses are predictable.
The only way to get the internal IP addresses of the LB is to run tcpdump on the back-end servers and listen for the health-check packets. In our case this is TCP port 80. Please note that on a VM based on Oracle Linux you will also see HTTP traffic to 169.254.169.254. This traffic is used to deliver the VM metrics to the control plane.
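To make the "small subnet" idea concrete, here is a minimal sketch that lists the addresses a small subnet could hand out. It assumes that the platform reserves the first two addresses and the last address of every subnet, and the helper name usable_ips and the /29 prefix are hypothetical choices for illustration, not something taken from the setup in this article.

```shell
# List the addresses of a subnet that could be assigned to resources.
# Assumption: the first two addresses and the last one are reserved.
# The body runs in a subshell, so the IFS change does not leak out.
usable_ips() (
  cidr=$1
  base=${cidr%/*}      # dotted address part, e.g. 192.168.20.0
  prefix=${cidr#*/}    # prefix length, e.g. 29
  IFS=.
  set -- $base         # split the address into its four octets
  unset IFS
  size=$(( 1 << (32 - $prefix) ))
  # Align to the network address in case a host address was passed.
  net=$(( (($1 << 24) | ($2 << 16) | ($3 << 8) | $4) & ~($size - 1) ))
  i=$(( $net + 2 ))
  last=$(( $net + $size - 2 ))
  while [ "$i" -le "$last" ]; do
    printf '%d.%d.%d.%d\n' \
      $(( ($i >> 24) & 255 )) $(( ($i >> 16) & 255 )) \
      $(( ($i >> 8) & 255 )) $(( $i & 255 ))
    i=$(( $i + 1 ))
  done
)

# Hypothetical /29 subnet dedicated to the LB.
usable_ips 192.168.20.0/29
```

Under these assumptions a /29 leaves only five candidate addresses, so the two used by the LB nodes are easy to narrow down even before capturing any traffic.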
If we do not see any traffic on the health-check port, we need to check the security lists/NSGs and the local firewall on the instance.
Check the NSG attached to the LB for a rule that permits egress traffic towards the back-end subnet. In the security list attached to the subnet where the back-end server is located, check that traffic on port 80 is permitted.
On the VM, check the local firewall. In my case it is firewalld.
Identify the active zone:
firewall-cmd --get-active-zones
List the configuration for the active zone:
firewall-cmd --list-all --zone=public
Add the http service to both the runtime and the permanent configuration:
firewall-cmd --zone=public --add-service=http
firewall-cmd --zone=public --permanent --add-service=http
Run tcpdump again, this time excluding the host used for metrics:
tcpdump -nni ens3 port 80 and not host 169.254.169.254
From the capture you can see that the private IP addresses of the load balancer are 192.168.20.3 and 192.168.20.4.
To troubleshoot the traffic processed by the LB, besides using tcpdump on the back-end server, we can also use the OCI logs. They can be enabled as follows:
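Reading the source addresses off a busy capture by eye can be tedious. The following is a minimal sketch that reduces tcpdump text output to the unique source IPs that sent packets to the health-check port. The function name health_check_sources is hypothetical, and the sketch assumes the usual one-line IPv4 format printed by tcpdump -nn (timestamp, IP, source.port, >, destination.port:).

```shell
# Print the unique source IPs of packets sent to a given destination port.
# Reads tcpdump -nn text output on stdin; the port defaults to 80.
health_check_sources() (
  port=${1:-80}
  awk -v port="$port" '
    $2 == "IP" && $4 == ">" {
      dst = $5
      sub(/:$/, "", dst)              # drop the trailing colon
      n = split(dst, d, ".")          # four octets plus the port
      if (n == 5 && d[5] == port) {
        split($3, s, ".")             # source is also octets plus port
        print s[1] "." s[2] "." s[3] "." s[4]
      }
    }
  ' | sort -u
)

# Example use on the back-end (stops after 50 packets):
#   tcpdump -l -nni ens3 -c 50 port 80 and not host 169.254.169.254 \
#     | health_check_sources 80
```

Replies from the back-end are ignored because their destination port is not 80. Any other host that talks to port 80 on the back-end would show up in the list as well, so the result still needs a quick sanity check.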
Navigate to Logging > Log Management > Log Groups and create a log group:
Open the group that was just created and navigate to Logs. Click "Enable Log", choose the Load Balancer service, and select the LB as the resource:
Navigate to Log Management > Log Group > Logs and notice the access logs from the load balancer:
There are three log entries for each access. If we expand each entry, we can see what was done each time. You will see the IP address of the back-end that processed the request, the IP address of the LB, the IP address of the client, and the type of the request.
In the screenshot below you can see that the client requested "/", the root of the website.
In the screenshot below you can see that the client requested "nginx-logo.png".
If we look in the nginx folder we can see the following files:
When the client requests "/", the web server returns the content of index.html. Looking at the HTML code inside this file, we can see a section that mentions the png file that we saw in the access logs from the LB.
alt="[ Powered by nginx ]"
width="121" height="32" /></a>
alt="[ Powered by Fedora ]"
width="88" height="31" /></a>
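To cross-check the access logs against the page without reading the HTML by hand, a small helper can list every file the page references through a src attribute. This is only a sketch: the function name referenced_assets is hypothetical, the path to index.html depends on your nginx installation, and the pattern only matches double-quoted src values.

```shell
# Print every value of a src="..." attribute found in an HTML file,
# one per line, without duplicates.
referenced_assets() {
  grep -o 'src="[^"]*"' "$1" | sed 's/^src="//; s/"$//' | sort -u
}

# Example use (replace the path with the location of your index.html):
#   referenced_assets /path/to/index.html
```

Each name printed this way should appear as a separate request in the LB access logs when a client loads the page for the first time.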
By examining the logs, we can identify the exact files that were accessed and the back-end server that responded to each request.
In this post I showed a method to identify the private IP addresses of the load balancer and explored the access logs of the LB.