Understanding connectivity issues in an ODI environment requires a good grasp of the ODI architecture, as well a good understanding of where to look for information when such problems arise. We will review all these elements to help with diagnosis and resolution.
How to Understand and Diagnose ODI Connectivity Issues
We will start with a quick review of the ODI architecture to make sure that all components are in place when we will look for connection issues. Then we will look at connection issues that can be experienced on the agent or studio side.
When considering ODI connectivity issues, we have to take into consideration all the elements represented in Figure 1:
Figure 1: Components of the ODI architecture
We will start with a quick review of these components:
ODI repository: this is the foundation of the ODI architecture. If the connection to the repository fails, nothing else will work. The ODI documentation has very detailed recommendations to make sure that the repository is always available, in particular here: http://docs.oracle.com/cd/E25054_01/core.1111/e10106/odi.htm#BGBGJGGA
ODI Agent: the agent is the orchestrator of all executions. As such, the agent will not work if it cannot connect to the repository. Whether you are using a standalone Agent or a JEE agent, the necessary JDBC drivers must be in place to connect to all sources and targets no matter what mechanisms are used to transfer data to and from these systems: these drivers will be used to send code (SQL code or scripts) that the different systems must execute. For ODI 11.1.1.x the drivers for the standalone agents will be under the /drivers directory under the agent installation folder. ODI also ships with DataDirect JDBC drivers. For ODI 11.1.1.x, these drivers are installed under %ODI_HOME%/odi_misc.
ODI Studio: the studio will not work if it cannot connect to the repository. In addition, some developers will connect directly to some source and target components directly from the Studio, if only to reverse-engineer metadata or to view data from specific tables. For these operations, the necessary JDBC drivers must be in place for source and target systems. Custom drivers for the studio must be installed under %APPDATA%\odi\oracledi\userlib
Source and target databases: no particular requirements as long as they are accessible to the Agent (and Studio if needed).
Firewalls: it is recommended to identify all firewalls that control traffic to and from the different elements described here (repository, agents, studio, sources and targets) in order to accelerate troubleshooting in case of connectivity issues.
Connectivity issues will manifest themselves either on the studio side or on the agent side. The following step by step instructions will help us make sure that connectivity is possible to and from all the different components.
All the necessary connections for the Studio to operate properly are illustrated in Figure 2 below.
Figure 2: Required connections for ODI Studio
To validate that all connections are working properly, run the following tests:
Test that you can connect to the repository: if you start the studio and it refuses to connect, then either the database hosting the repository is down, or there is a network issue preventing you from connecting.
Test the connection to the agent from the Topology navigator to make sure that the agent is up and running and that you can connect to that agent from the Studio, as shown in figure 3 below.
Figure 3: Testing the agent from the Topology navigator
To check whether you can connect directly to the source and target databases, try to view data from the models or interfaces (warning: choose tables with a limited number of records or it will take a while for the rows to be displayed) as illustrated in figure 4.
Figure 4: Right-click on a Datastore to view data
All the necessary connections for the Agent (JEE or Standalone) to operate properly are illustrated in Figure 5 below.
Figure 5: Required connections for ODI Agents
To validate that all connections are working properly, run the following tests:
Test the connection to the agent from Topology (as we saw earlier in figure 3) to make sure that it is up and running and that you can connect to that agent from the Studio. If that test fails, make sure that the agent is up and running, that it was started on the port number defined in Topology and that no firewall prevents the connection between the Studio and the agent.
Make sure that the agent can connect back to the repository. An easy test is to create a very basic package that only contains a “variable refresh” that will run a simple query against that database (for instance “select sysdate from dual” on Oracle). Generate a scenario from that package and run this scenario with the agent. Use the ODI Operator to view the ODI logs: as long as something appears for this execution (successful or not –all we are validating at this point is the connection to the repository) then the agent does indeed connect properly to the repository.
Can the agent connect to the source and target databases? You can test the connection to all physical servers from Topology and run that test through the agent: in the Topology navigator, open the definition of a physical data server, and click the Test Connection button as shown in figure 6.
Figure 6: Test database connection
Then when prompted for the connection test, select the agent in the drop down as shown in figure 7.
Figure 7: Using an agent to test the database connection
This will validate that the agent can connect to the database. You will have to repeat this test for every database in the environment.
The most efficient data transfers will be performed with database utilities and direct connections from server to server, even if these operations are orchestrated by the ODI Agent. For these techniques to work, it is imperative that the source and target servers can see one another. It is recommended to try and ping the target servers from the source servers, and to ping the source servers from the target servers. This will guarantee that data can move from server to server in the most efficient manner.
Connectivity to flat files can also present some challenges. In this case, typical connectivity issues all come down to privilege issues. Whether you are trying to access a file from the Studio or with an agent, the user that started to program (Studio or agent) must have sufficient privileges on the Operating System to access the file. An easy way to validate this is to connect to the operating system using that same user name and to try and edit the file from a command line. If this operation fails, then ODI access to the file will be similarly restricted.
Testing that you can connect from one element of the architecture to the next may not be enough to guarantee that all connectivity issues are at bay. Any of the connections we have validated above can be severed at any point in time: firewalls and databases can be configured to timeout when a connection looks idle for a while and these timeouts will just end-up in a connection loss from an ETL perspective.
Regardless of which connection is lost, understanding what information to look for is key to understanding what is happening.
The most frustrating connection loss is when the connection between the agent and the repository is severed by a 3rd party component (Firewall or database). If the agent loses the connection to the repository, there is no way for it to update the logs in that repository and indicate that there is a connection issue. SQL code that is being executed by the databases will continue to run. It typically looks like the agent “hangs”. The agent will try to re-connect to the repository, but if that connection is gone for good, you may eventually have to restart the agent. When the agent restarts, it will mark the stale sessions as bad (this restart process is described in details in section 7.2.1 and 7.2.2 of the ODI documentation available here: http://docs.oracle.com/cd/E25054_01/core.1111/e10106/odi.htm#BGBEGCBB)
One way to identify such connectivity problems is to monitor the agent stderr output. Note that pinging the agent (from ODI Topology with the “Agent Test” or with the ODI tool ODIPingAgent) can still be successful: all these tests are validating is that the agent is up and running, but they do not validate that the agent can connect to the repository. In our current use case, the agent is indeed up and running.
Running a small scenario, similar to the one we have described earlier when testing the agent’s connectivity to the repository will be a better validation of the ability of the agent to connect to the repository and the databases.
This type of problem is easier to diagnose since error messages will be properly reported in the ODI logs (the agent can report the errors to the repository in this case).
The error messages will differ from one JDBC driver to the next, but can typically contain one of these errors:
The connection does not exist. Connection reset by peer. (…) before connection was unexpectedly lost.
These messages are usually issued by the JDBC drivers and then relayed as such by ODI. Here is an example of such a message from an AS400 JDBC driver:
ODI-1241: Oracle Data Integrator tool execution fails. Caused By: java.sql.SQLException: The connection does not exist. at com.ibm.as400.access.JDError.throwSQLException(JDError.java:382) at com.ibm.as400.access.AS400JDBCConnection.checkOpen(AS400JDBCConnection.java:394) at com.ibm.as400.access.AS400JDBCConnection.sendAndReceive(AS400JDBCConnection.java:2570) at com.ibm.as400.access.AS400JDBCStatement.close(AS400JDBCStatement.java:434) at com.ibm.as400.access.AS400JDBCPreparedStatement.close(AS400JDBCPreparedStatement.java:436) at com.sunopsis.sql.SnpsQuery.close(SnpsQuery.java:386) at com.sunopsis.dwg.tools.WaitForData.actionExecute(WaitForData.java:691)
Note the prefix to the error messages:
com.ibm.as400.access.AS400JDBCConnection This indicates that the error has been identified by an AS400 JDBC driver. And that driver reports that it just lost connection to the database itself…
The most common reasons for network connection loss are:
Firewalls that drop the connection: most firewalls sever connections after a certain amount of idle time (defaults of 10 and 30 minutes are observed quite regularly). Since the ODI agent will send SQL code to be executed by the database and wait for the completion of that code before issuing any other commands, it is very possible that the connection remains idle for some time.
The same is true with database timeout parameters: many databases will sever what looks like an idle connection. In an ETL process, ODI may run some code on the source system, then on the target system, and then try and cleanup some temporary data on the source system once everything has been committed to the target database (this will be the case for some CDC implementations for instance). If the integration time on the target side is superior to the database timeout on the source side, the source connection will be long gone when ODI will try and do its house cleaning.
If you are interested in learning more on this subject, Oracle support has a very comprehensive guide to troubleshoot connectivity errors. Look for note ID 850014.1 at http://support.oracle.com.
For more ODI best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit “Oracle A-Team Chronicles for ODI”.