To communicate with any process, you need to know at least three things about it: its IP address, port number and protocol. Fusion Applications comprises many running processes. End users communicate directly with only a handful of them, but all processes communicate with other processes to provide necessary services. Understanding the various IP addresses and listen ports is essential for troubleshooting communication between components and for identifying where a problem lies.
We will start by describing some of the key concepts used in this post. We will then demonstrate their relationships and a way to get details about them so that you can use them while troubleshooting.
IP address - an identifier for a device on a TCP/IP network. Servers usually have one or more Network Interface Cards (NICs). In the simplest configuration, each NIC has one IP address. This is usually referred to as the physical IP address and is also the IP address mapped to the network hostname in the Domain Name System (DNS).
Virtual IP (VIP) - an IP address that does not correspond to a physical network interface. You can assign multiple IP addresses to a single NIC. These virtual interfaces show up as eth0:1, eth0:2 etc. The reason to use a VIP instead of a physical IP is easier portability. In case of hardware failure, you can assign the VIP to a different server and bring your application up there.
Host Name - the name assigned to a server. This name is usually defined in corporate and optionally in public DNS and maps to the physical IP address of the server. In FA, we refer to two different types of hostnames - physical and abstract.
Physical hostname is the one defined in DNS and recognized across the network.
Abstract hostname is like a nickname or an alias - you can assign multiple nicknames to the same IP address. For example, a person may officially be recognized as Richard, but his friends may call him Rick, Dick or even Richie. In FA, the abstract hostname to IP address mapping is defined in the hosts file instead of DNS, so that the mapping can be kept private to a particular instance of Fusion Applications.
Listen Port - serves as an endpoint in an operating system for many types of communication. It is not a hardware device, but a logical construct that identifies a service or process. As an example, an HTTP server listens on port 80 by default. A listen port is tied to one process on a device: in general, no two processes can listen on the same combination of IP address, port and protocol at the same time.
Ephemeral Port - a short-lived port for communications allocated automatically from a predefined range by the IP software. When a process makes a connection to another process on its listen port, the originating process is assigned a temporary port. From the point of view of the listening process, this temporary port appears as the port in the foreign address.
Think of a telephone communication. If you have to reach an individual and you do not wish to use your personal number since you want it to be available for incoming calls, you can use any available public telephone to make the call. This public phone also has a telephone number. It remains busy so long as you are on the call. As soon as you hang up, it becomes available for use by any other individual. Ephemeral ports are the equivalent of public telephones in this example.
Listen Address - the combination of IP address and listen port. A process can listen on various IP addresses, but usually on only one port. E.g., DB listener listens on port 1521 by default. In the most common configuration, a process either listens on a single IP address, or all available IP addresses.
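The relationship between a listen port and an ephemeral port can be seen with a small loopback experiment. The following is a minimal sketch in Python, not something taken from FA itself: the function name show_ephemeral_port is hypothetical, and the listener uses port 0 only so that the operating system picks a free port for the demo.

```python
import socket

def show_ephemeral_port():
    """Open a listener and a client on loopback and return both views of the connection."""
    # Listener: bind to the loopback address. Port 0 asks the OS for any free port,
    # which stands in for a real listen port such as 1521 in this demo.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listen_address = listener.getsockname()

    # Client: we never choose a local port. The OS assigns an ephemeral one on connect.
    client = socket.create_connection(listen_address)
    server_side, peer = listener.accept()

    result = {
        "listen_address": listen_address,      # (IP, listen port)
        "client_local": client.getsockname(),  # (IP, ephemeral port)
        "server_foreign": peer,                # what the listener sees as the foreign address
    }
    for sock in (server_side, client, listener):
        sock.close()
    return result

info = show_ephemeral_port()
print(info)
```

The client's local address and the listener's foreign address are the same pair, which is the "public telephone" of the analogy above.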
Let us see how to use a useful and simple command "netstat" to identify and understand the addresses and their relationships.
In the simplest usage, netstat prints the following columns:
|Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name|
The important ones for this post are the following:
Proto: Protocol used for this port
Local Address: The address of the local end of the socket. For a listening socket, this is the listen address. The first part is the IP address; second is the port number
Foreign Address: The address of the remote end point of this socket. Think of it as the phone number of the public telephone you used to call your contact. Similar to the listen address, the first part is the IP address; second is the port number
State: State can have various values. The important ones are:
LISTEN: The socket is listening for incoming connections. Foreign address is not relevant for this line
ESTABLISHED: The socket has an established connection. Foreign address is the address of the remote end point of the socket.
CLOSE_WAIT: The remote end has shut down, waiting for the socket to close.
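If you need to process netstat output in a script, the columns above can be split into named fields. This is a rough sketch that assumes the TCP line layout shown above (seven whitespace-separated columns, as produced with the -p option); the function name parse_netstat_line and the dictionary keys are hypothetical.

```python
def parse_netstat_line(line):
    """Split one TCP line of 'netstat -anp' output into named fields."""
    proto, recv_q, send_q, local, foreign, state, program = line.split(None, 6)
    # The port is whatever follows the last colon. Splitting from the right
    # also works for IPv6-style addresses such as ':::10214'.
    local_ip, local_port = local.rsplit(":", 1)
    foreign_ip, foreign_port = foreign.rsplit(":", 1)
    return {
        "proto": proto,
        "local_ip": local_ip,
        "local_port": local_port,
        "foreign_ip": foreign_ip,
        "foreign_port": foreign_port,  # '*' for a LISTEN socket
        "state": state,
        "program": program.strip(),    # PID/Program name
    }

row = parse_netstat_line("tcp 0 0 10.228.136.10:10622 0.0.0.0:* LISTEN 3501/httpd.worker")
print(row["local_ip"], row["local_port"], row["state"])
```

Ports are kept as strings because the foreign port of a LISTEN socket is reported as "*".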
Let us now look at some example outputs of the netstat command. In the first example below, let us look at some listen sockets:
| Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
1. tcp 0 0 :::10214 :::* LISTEN 4229/java
2. tcp 0 0 0.0.0.0:10206 0.0.0.0:* LISTEN 4230/nqsserver
3. tcp 0 0 10.228.136.10:10622 0.0.0.0:* LISTEN 3501/httpd.worker
All 3 of the lines above show that the respective processes are listening for an incoming request on a particular address. This is evident from the value under the "State" column.
In lines 1 and 2, the processes are listening on all available IP addresses. This shows up in netstat as either "0.0.0.0" or "::" (the IPv6 form of the same wildcard, which appears as ":::" once the port separator is added). The listen ports are 10214 and 10206 for java and nqsserver respectively. This means you can connect to these programs using any IP address that is bound to any interface on that server, so long as you use the right port number.
In line 3, the process is listening on port 10622 only on IP address 10.228.136.10. This means that any request coming on a different IP address will not reach the process, even if the IP address is of the same host.
This is particularly important when you use VIP for enabling dynamic failover of various components, such as SOA Server. In this case, telnet to physical_ip:port_number will fail, but to virtual_ip:port_number will succeed.
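The rule described above can be written down as a small check. This is only an illustrative sketch: the function name accepts_connections_on is hypothetical, 10.228.136.11 is a made-up second address on the same host, and treating "::" as also covering IPv4 addresses assumes the usual dual-stack behaviour implied by line 1 of the example.

```python
WILDCARDS = {"0.0.0.0", "::"}

def _normalize(ip):
    # netstat may show an IPv4 address in IPv4-mapped IPv6 form (::ffff:a.b.c.d)
    return ip[len("::ffff:"):] if ip.startswith("::ffff:") else ip

def accepts_connections_on(listen_ip, target_ip):
    """Return True if a socket listening on listen_ip is reachable via target_ip."""
    if listen_ip in WILDCARDS:
        return True  # listening on all IP addresses of the host
    return _normalize(listen_ip) == _normalize(target_ip)

# Line 2 of the example: wildcard listener, any local IP works.
print(accepts_connections_on("0.0.0.0", "10.228.136.11"))
# Line 3 of the example: bound to one IP only, another IP of the same host fails.
print(accepts_connections_on("10.228.136.10", "10.228.136.11"))
```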
Let's now look at another example of some established connections:
| Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
1. tcp 0 0 10.228.136.10:10212 10.228.136.10:39709 ESTABLISHED 4232/nqsclustercont
2. tcp 0 0 10.228.136.10:12927 10.233.24.88:5575 ESTABLISHED 3524/httpd.worker
3. tcp 0 0 ::ffff:10.228.136.10:29024 ::ffff:10.233.24.88:1521 ESTABLISHED 19757/java
Established connections show details of both ends of the socket. Based on our analogy above, "Local Address" and "Foreign Address" are 2 ends of a telephone connection. However, just by looking at a single line of netstat output for a socket, it cannot be determined which one is the listen port and which one is the connection requestor.
Please note that "Foreign address" doesn't have to be a different host, or even a different IP address. It is simply the other end of socket connection and hence is a different process, which may be running on the same host or a different one. The local address always refers to an IP address on the same host.
To determine which one is the listen port and which one is ephemeral port, you need to determine if there is a LISTEN socket for that particular IP address. To do that, make sure you are on the host that the IP address is tied to, and run the following command:
|netstat -anp | grep <local_address_port_number> | grep LISTEN|
If it returns a LISTEN socket, then that is the process to which a client is connected. The client information can be found by running a similar command on the host referred to in "Foreign Address."
If a LISTEN socket is not returned, the Foreign Address refers to the LISTEN socket and this line of netstat output refers to a client connecting to a remote process.
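The same decision can be sketched in Python for netstat output that has already been captured on the local host. The function name connection_role is hypothetical, and the LISTEN line for port 10212 in the sample is an assumption added for illustration; it does not appear in the output above. Unlike the grep pipeline, this sketch compares the port exactly rather than matching it anywhere in the line.

```python
def _port(address):
    # The port is the text after the last colon of an address column.
    return address.rsplit(":", 1)[1]

def connection_role(established_line, netstat_lines):
    """Return 'server' if the local port of the connection has a LISTEN socket, else 'client'."""
    local_port = _port(established_line.split()[3])
    for line in netstat_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "LISTEN" and _port(fields[3]) == local_port:
            return "server"
    return "client"

sample = [
    # Hypothetical LISTEN line, assumed for this example only.
    "tcp 0 0 0.0.0.0:10212 0.0.0.0:* LISTEN 4232/nqsclustercont",
    "tcp 0 0 10.228.136.10:10212 10.228.136.10:39709 ESTABLISHED 4232/nqsclustercont",
    "tcp 0 0 ::ffff:10.228.136.10:29024 ::ffff:10.233.24.88:1521 ESTABLISHED 19757/java",
]
print(connection_role(sample[1], sample))
print(connection_role(sample[2], sample))
```

Under these assumptions, the first connection is one that a client made to the local listener, and the second is a local java process acting as a client of a remote listener on port 1521.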
Let's now try to apply this understanding to troubleshoot a simple problem - unable to access a web page.
You are trying to access the WebLogic administration console of CommonDomain and are unable to do so. To access the page, you type a URL similar to:
Note: The values used in this scenario are for demonstration purposes. You should substitute the appropriate values based on your environment.
The first step is to identify which component has a problem. For that you need to understand the components involved. In a typical enterprise deployment of FA, a Load Balancer (LBR) sits in front of the HTTP Server, which in turn communicates with the WebLogic servers. Since the console application is deployed on the AdminServer, HTTP Server in turn talks to AdminServer of CommonDomain.
Here is a graphical representation of this flow:
As you can see, the request flows through the LBR and the HTTP Server before reaching the AdminServer.
When you are unable to access a web page, the following are some of the common reasons:
1. Problem with name resolution
2. Problem with network layer preventing communication between 2 components
3. One or more components are down or unresponsive
We can use some basic tools to identify where the problem is. These tools are ping, telnet, netstat, lsof and ps. Once we have identified which component has a problem, we can figure out what is causing it. In this post, we will keep our focus on finding which component has a problem.
So let us walk through the request flow:
1. Check the name resolution of the hostname/VIP in the URL (in our example, common-internal.mycompany.com). Use the ping utility to determine whether you can resolve the name and contact the IP address. Since it is your browser that is trying to contact the server, run the ping utility on the desktop or device from which you are accessing this URL.
Ping may not get a reply if ICMP is disabled, but it will still show the IP address that the name resolves to. Make sure this is the correct IP address. If it is not, the problem is with name resolution. If it is the correct IP address, please move to the next step.
2. Make sure the port you are trying to reach on the hostname/VIP is reachable. As in #1 above, run the telnet utility on the device from which you are trying to access the URL.
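When ping is not available, the name resolution part of this step can also be checked with a few lines of Python. This is a minimal sketch using the standard socket module; the function name resolve is hypothetical, and "localhost" is used only so that the example runs anywhere. In practice you would pass the hostname from your URL.

```python
import socket

def resolve(hostname):
    """Return the sorted, unique IP addresses that hostname resolves to ([] if none)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []  # the name could not be resolved
    # Each entry ends with a socket address whose first element is the IP address.
    return sorted({info[4][0] for info in infos})

print(resolve("localhost"))
```

This uses the same resolver configuration as other programs on the machine, so entries in the hosts file (such as FA abstract hostnames) are normally taken into account as well.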
|telnet common-internal.mycompany.com 80|
If telnet is unsuccessful, it means LBR cannot be reached on the HTTP port. This could be due to 2 reasons:
a. LBR is not listening or is down
b. Firewall or network issues are stopping this communication
If telnet is successful, please move to the next step.
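If telnet is not installed on the device, a TCP connection attempt from Python gives the same yes/no answer. This is a sketch under the assumption that only reachability matters here; the function name port_is_reachable and the 3 second timeout are arbitrary choices.

```python
import socket

def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False

# Example call (values from the scenario above):
# port_is_reachable("common-internal.mycompany.com", 80)
```

As with telnet, a False result does not tell you whether the listener is down or whether a firewall is blocking the traffic; it only tells you that the connection could not be made from this device.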
3. The next step is to make sure LBR is configured properly and is routing requests to the HTTP Server(s) on the right port. Since LBR is a component outside of the FA stack and is usually managed by the network/security team, troubleshooting it is outside of the scope of this document. Make sure the team managing LBR confirms the configuration as well as reachability of the web server.
4. Another way to rule out the LBR, and to determine whether the problem lies within the FA stack or in the network communication between its components, is to access the URL directly through the web server. The initial URL we used resolved to the LBR. To bypass it, change the hostname in the URL to the hostname of one of the HTTP servers, and change the port number to that of the listen address of the appropriate Virtual Host of the HTTP Server.
The virtual host configuration is stored in one of the files under $INSTANCE_HOME/config/OHS/<component_name>/moduleconf directory. These files are named FusionVirtualHost_<domain_short_form>.conf, where domain_short_form is fs for CommonDomain, hcm for HCMDomain, and so on.
Since common-internal is used for CommonDomain, we will look at FusionVirtualHost_fs.conf. The first few lines in this file specify the listen addresses (one for HTTP requests and one for HTTPS):
|## Fusion Applications Virtual Host Configuration
The <VirtualHost> section specifies the VirtualHost configuration and this can be used to identify the mapping between LBR port and HTTP server port:
|#Internal virtual host for fs
<VirtualHost fusionhost.mycompany.com:10613 >
So common-internal.mycompany.com maps to fusionhost.mycompany.com:10613.
Now we can change the URL and try to access it directly. The new URL will be http://fusionhost.mycompany.com:10613/console. Please note that your organization may block direct access to servers on ports other than SSH from the desktops. In this case, you can access this URL from a browser running on the HTTP server itself.
If this URL works, the problem is with the components in front of the HTTP server, namely the LBR, the desktop and the network between them.
If this URL also doesn't work, please move to the next step.
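The VirtualHost lookup used in step 4 can be scripted if you have many configuration files to check. This is a rough sketch that only recognises the host:port form of the directive shown above; the function name virtual_host_addresses is hypothetical, and the sample text contains just the two lines quoted in this post.

```python
import re

# Matches directives of the form: <VirtualHost host:port >
VIRTUAL_HOST = re.compile(r"<VirtualHost\s+([^\s>:]+):(\d+)\s*>")

def virtual_host_addresses(conf_text):
    """Return (host, port) pairs for every <VirtualHost host:port> directive found."""
    return [(host, int(port)) for host, port in VIRTUAL_HOST.findall(conf_text)]

sample_conf = """#Internal virtual host for fs
<VirtualHost fusionhost.mycompany.com:10613 >
"""

for host, port in virtual_host_addresses(sample_conf):
    print("http://%s:%d/console" % (host, port))
```

For the sample text, this prints the direct URL used in step 4.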
5. Now we need to check whether Oracle HTTP Server (OHS) is working or not.
a. Check if it is running. Use "opmnctl status -l"
b. Check if it is listening on the port of interest - in this case, port 10613.
|-bash-3.2$ netstat -anp | grep 10613 | grep LISTEN
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 10.228.136.10:10613 0.0.0.0:* LISTEN 3501/httpd.worker
If the above commands are unsuccessful, we need to troubleshoot why OHS is not running.
If the above commands are successful, please move to the next step.
6. Now let's focus our attention on the final component of the flow - WebLogic AdminServer. The first step is to identify the listen address. By default, the FA provisioning engine configures the AdminServer of a domain to listen on the IP address tied to the hostname (physical or abstract) specified in the response file created using the provisioning wizard. The Enterprise Deployment Guide for FA recommends changing this listen address to a VIP so that AdminServer can be manually failed over in a highly available environment. Similar steps are recommended for automatic migration of the SOA Servers of each domain and of BI Server. Once the listen address has been changed, the HTTP Server configuration needs to be edited to point to the new address.
So let's determine if HTTP Server is configured properly and can communicate with the WebLogic Server - in this case AdminServer of the CommonDomain.
First, determine where requests for the Location "/console" are routed. For CommonDomain, open FusionVirtualHost_fs.conf and look for "/console" under the internal virtual host section.
|## Context roots for application consoleapp
<Location /console >
In our example, we have configured HTTP Server to direct incoming requests for "/console" to WLS running on host fusionhost.mycompany.com and port 7001.
First, we will verify connectivity to AdminServer.
Please make sure that this hostname maps to the right IP address.
Now use telnet to connect to AdminServer:
|telnet fusionhost.mycompany.com 7001|
If telnet succeeds, make sure port 7001 is not accidentally in use by a different process. To do so, log in to the host where AdminServer is supposed to run and issue the following command:
|netstat -anp | grep 7001 | grep LISTEN|
|-bash-3.2$ netstat -anp | grep 7001 | grep LISTEN
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 ::ffff:10.228.136.10:7001 :::* LISTEN 14853/java
Make sure the process returned by the above command is actually the AdminServer.
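To make that check less error-prone, the PID and program name can be pulled out of the LISTEN line and then looked up with ps. This is a sketch, assuming netstat output in the format shown above; the function name listeners_on_port is hypothetical. It matches the port exactly, whereas grep 7001 would also match a port such as 17001.

```python
def listeners_on_port(netstat_output, port):
    """Return (listen_ip, pid, program) for each LISTEN socket on the given port."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split(None, 6)
        # Skip notices and anything that is not a LISTEN line with a PID/Program column.
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        listen_ip, listen_port = fields[3].rsplit(":", 1)
        if listen_port != str(port):
            continue
        pid_text, _, program = fields[6].strip().partition("/")
        # netstat prints '-' when it cannot identify the owning process.
        pid = int(pid_text) if pid_text.isdigit() else None
        found.append((listen_ip, pid, program))
    return found

output = """(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 ::ffff:10.228.136.10:7001 :::* LISTEN 14853/java"""

print(listeners_on_port(output, 7001))
```

With the PID in hand (14853 in the sample output), ps can be used to confirm that the java process is the AdminServer and not some other server.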
If telnet fails, then it could point to the following:
a. HTTP Server is not configured to point to the correct listen address. Look at AdminServer configuration and make sure the Listen Address matches with HTTP server configuration.
b. AdminServer is either not running or not responding. Look at the AdminServer logs to determine whether it is healthy.
c. Network issues or firewall is blocking the communication.
This concludes our troubleshooting of a failed web page request. As you can see, understanding listen addresses and rudimentary network tools is important in troubleshooting communication issues in Fusion Applications. This knowledge can also be applied to non-HTTP requests and to products other than FA, including non-Oracle products.