Installing Oracle Data Integrator (ODI) on Amazon Elastic MapReduce (EMR)

Introduction

This article demonstrates how to install Oracle Data Integrator (ODI) on the Amazon Elastic MapReduce (EMR) cloud service.  Amazon EMR is a big data cloud service that is part of Amazon Web Services (AWS).

ODI is well documented to run on both the Cloudera and Hortonworks distributions of Hadoop.  ODI can also run on the distributions of Hadoop found on the Amazon EMR cloud service.  This is the second of four articles that show how to install, configure, and use ODI on the Amazon EMR cloud service; the companion articles are listed under “ODI Related Cloud Articles” at the end of this article.

 

For a demonstration of how to leverage ODI on Amazon EMR, go to “Webcast: Leveraging Oracle Data Integrator (ODI) with Amazon Elastic MapReduce (EMR).”  Additionally, an ODI 12.2.1 repository with examples of how to leverage ODI with Amazon EMR can be found at “ODI Repository Sample for Amazon Elastic MapReduce (EMR).”

 

Installing Oracle Data Integrator (ODI) on Amazon Elastic MapReduce (EMR)

 

Prior to installing ODI on the Amazon EMR cloud service, users must prepare and configure the Amazon EMR cluster for the ODI installation.  To prepare the Amazon EMR cloud service for ODI installation, go to “Preparing Amazon Elastic MapReduce (EMR) for Oracle Data Integrator (ODI).”

Once the Amazon EMR cloud service has been prepared for ODI, download the ODI and Java SDK installation files from Oracle.  Go to the Oracle Data Integrator Downloads page and download the ODI installation files.  Then go to the Oracle Java Downloads page and download the Oracle Java SDK installation file, specifically the Java SE Development Kit tar.gz file.  For a complete list of Oracle certified Java versions for ODI, go to “Oracle Fusion Middleware Certification Matrix for Oracle Data Integrator.”

Install an SSH File Transfer Protocol (SFTP) tool such as FileZilla, and use it to transfer the ODI and Java SDK installation files to the master node of the EMR cluster.  Also copy any additional jar files, such as the JDBC driver file (ojdbc6.jar) that Sqoop requires to connect to SQL databases such as the Oracle database.  Figure 1 below shows the ODI and the Java SDK installation files.

 

Figure 1 - Copying Binaries to the Amazon EMR Master Node

Note 1 - Copying Binaries to the Amazon EMR Master Node

 

When using SFTP tools to transfer files into the master node of the EMR cluster, a PuTTY private key file (.ppk) is required.  Configure the SFTP connection with the PuTTY private key file before attempting to connect and transfer files into the master node of the EMR cluster.

Using the SFTP tool, log in to the master node of the EMR cluster and transfer the ODI and Java SDK installation files as shown in Figure 1 above.  Log in as the hadoop user of the EMR cluster; the hadoop user does not require a password.
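
For readers who prefer a command line to a graphical SFTP tool, the transfer can be sketched with scp.  This is a minimal sketch under assumptions, not the method used in the article: the function name copy_to_master is hypothetical, the key is assumed to be an OpenSSH .pem key pair file rather than a PuTTY .ppk file, and the target directory is assumed to be the hadoop home directory.

```shell
# Hypothetical helper: copy one or more local files to the hadoop home
# directory on the EMR master node.
#   $1      = key pair file in OpenSSH (.pem) format
#   $2      = Master Public DNS of the EMR cluster
#   $3...   = local files to copy (installers, JDBC driver jars)
copy_to_master() {
  key="$1"
  host="$2"
  shift 2
  scp -i "$key" "$@" "hadoop@$host:/home/hadoop/"
}
```

A call would pass the key pair file, the Master Public DNS, and the files to copy, for example copy_to_master ODIKeyPair.pem followed by the DNS name and ojdbc6.jar.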

Configuring Sqoop in the Master Node of the EMR Cluster

 

Amazon EMR 4.5.0 includes the Sqoop utility.  ODI users can use Sqoop to transfer data between Amazon RDS and Amazon EMR.  In order to use ODI with the Amazon distribution of Sqoop, two configurations must be performed:  add the location of Sqoop in the .bash_profile file of the hadoop user, and add the location of the Amazon Java in the sqoop-env.sh file.

Log in to the master node of the EMR cluster, and add the location of the Sqoop tool to the .bash_profile file of the hadoop user as shown in Figure 2 below.

 

Figure 2 - Adding Sqoop in bash_profile
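
The profile change can be sketched as a small function.  This is an illustration under assumptions: the function name add_sqoop_to_profile and the SQOOP_HOME variable are hypothetical, and the Sqoop location defaults to /usr/lib/sqoop because that is where this article finds the Sqoop conf and lib directories.

```shell
# Hypothetical helper: append Sqoop entries to a profile file once.
#   $1 = profile file to edit (for example /home/hadoop/.bash_profile)
#   $2 = Sqoop install directory (assumed default: /usr/lib/sqoop)
add_sqoop_to_profile() {
  profile="$1"
  sqoop_home="${2:-/usr/lib/sqoop}"
  # Skip the append when the entries are already present.
  if ! grep -q 'SQOOP_HOME' "$profile" 2>/dev/null; then
    {
      printf 'export SQOOP_HOME=%s\n' "$sqoop_home"
      printf 'export PATH=$PATH:$SQOOP_HOME/bin\n'
    } >> "$profile"
  fi
}
```

On the master node, add_sqoop_to_profile /home/hadoop/.bash_profile would apply the change; log in again or source the file for it to take effect.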

Modify the Sqoop configuration file /usr/lib/sqoop/conf/sqoop-env.sh and set JAVA_HOME to the Amazon distribution of Java as shown in Figure 3 below.  This step is required so that the Sqoop tool launches with the Amazon distribution of Java.  Note that the ODI Agent uses the Oracle Java SDK to launch ODI tasks, but the Sqoop tool must use the Amazon distribution of Java to run successfully.  The following example shows how to open sqoop-env.sh for editing:

sudo vi /usr/lib/sqoop/conf/sqoop-env.sh

 

Figure 3 - Setting Java Home in the Sqoop Config File
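
The article does not state the path of the Amazon distribution of Java.  One way to find a candidate value for JAVA_HOME, sketched here under the assumption that the java command on the master node still resolves to the Amazon Java (that is, before the Oracle Java SDK is added to the PATH), is to resolve the java executable and strip the trailing /bin/java.  The function name java_home_of is hypothetical.

```shell
# Hypothetical helper: print the Java home directory that owns a java
# executable by resolving symlinks and dropping the trailing /bin/java.
#   $1 = path to a java executable (may be a symlink)
java_home_of() {
  real_java=$(readlink -f "$1") || return 1
  dirname "$(dirname "$real_java")"
}
```

Running java_home_of "$(command -v java)" on the master node prints a directory that can be reviewed and then set as JAVA_HOME in sqoop-env.sh.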

Copy any necessary JDBC driver files to the /usr/lib/sqoop/lib directory.  Sqoop requires these driver files in order to connect to SQL databases such as the Oracle database.  The following example copies the ojdbc6.jar file from the hadoop home directory to the Sqoop lib directory:

sudo cp /home/hadoop/ojdbc6.jar /usr/lib/sqoop/lib

 

Installing X-Window Software to Access the EMR Master Node

 

Installing ODI on the master node of the EMR cluster requires X-Window software.  The X-Window software lets programs such as the ODI installer run on the EMR master node while their display is forwarded to a client computer.  This strategy is known as X11 forwarding.  Because X11 forwarding traffic can be tunneled over SSH, users can securely access the master node of the EMR cluster and install ODI.

The X11 forwarding strategy can be used to install the ODI binaries, create an ODI repository, and configure the ODI agent in the master node of the EMR cluster.  Additionally, the X11 forwarding strategy can be used to run the ODI Studio in the master node as well.  This strategy offers performance benefits when accessing the ODI repository and the ODI Studio from a remote computer.

Before using the X-Window emulator, use PuTTY to log in to the master node of the EMR cluster and install the X11 Authority package as shown in Figure 4 below.  The X11 Authority package enables X11 forwarding between the master node of the EMR cluster and the X-Window client.

 

Figure 4 – Installing X11 in the Amazon EMR Master Node
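
A minimal sketch of this installation step follows.  It assumes that the master node runs a yum-based Amazon Linux image and that the X11 Authority package is named xorg-x11-xauth; both are assumptions rather than details taken from the figure, and the function name ensure_xauth is hypothetical.

```shell
# Hypothetical helper, to be run on the EMR master node.
# Installs the xauth utility only when it is not already available.
# Assumes yum and the package name xorg-x11-xauth.
ensure_xauth() {
  if command -v xauth >/dev/null 2>&1; then
    echo "xauth already installed"
  else
    sudo yum install -y xorg-x11-xauth
  fi
}
```

Calling ensure_xauth after logging in with PuTTY either confirms that xauth is present or runs the installation.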

Once the X11 Authority installation is complete, choose X-Window emulator software such as Cygwin/X and install it on your client computer.  Launch the X-Window emulator on your client computer, and open an X-Window terminal to access the master node of the EMR cluster.  To start X11 forwarding, export the DISPLAY of the X-Window terminal and use SSH to connect to the master node as shown in Figure 5 below.  Specify the location and name of the key pair file.  Here is an example of how to connect to the master node using SSH:

ssh -X hadoop@ec2-54-200-52-29.us-west-2.compute.amazonaws.com -i ODIKeyPair.pem

 

Figure 5 – Accessing the Amazon EMR Master Node with an X11 Terminal

 

Installing ODI in the Master Node of the EMR Cluster

 

Using an X-Window terminal, log in to the master node of the EMR cluster and identify the Amazon Elastic Block Store (EBS) volume where the ODI binaries will be installed, as shown in Figure 6 below.  It is recommended to add an EBS volume to the master node of the Amazon EMR cluster to host the ODI binaries.  For additional information on how to add an EBS volume, go to “Preparing Amazon Elastic MapReduce (EMR) for Oracle Data Integrator (ODI).”

Use the Linux command df -h to identify the EBS volume.

 

Figure 6 – Locating the EBS Volume Drive

Create a directory in this EBS volume, and copy the ODI and Oracle Java installation files from the hadoop home directory to the EBS volume directory.  Then, install the Oracle Java SDK in the new EBS volume directory.  The following example installs the Oracle Java SDK file called jdk-8u51-linux-x64.gz:

tar zxvf jdk-8u51-linux-x64.gz
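
After the archive is extracted, the shell that will run the ODI installer needs to point at the Oracle Java SDK.  The sketch below assumes a hypothetical EBS directory, /mnt/odi, held in a hypothetical variable ODI_BASE; the directory name jdk1.8.0_51 is the one that the JDK 8u51 archive unpacks into.

```shell
# Hypothetical EBS directory that holds the extracted JDK; adjust to your volume.
ODI_BASE="${ODI_BASE:-/mnt/odi}"

# The JDK 8u51 archive unpacks into a directory named jdk1.8.0_51.
export JAVA_HOME="$ODI_BASE/jdk1.8.0_51"
export PATH="$JAVA_HOME/bin:$PATH"

# Report the active Java only if the Oracle JDK is really there.
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "Oracle JDK not found at $JAVA_HOME" >&2
fi
```

These settings apply only to the current terminal session, so they need to be repeated in any new X-Window terminal that launches an Oracle tool.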

 

To install ODI, follow the ODI installation instructions found at “Oracle Fusion Middleware Installing and Configuring Oracle Data Integrator.”  Use an X-Window terminal to run the ODI installer on the master node of the EMR cluster, and forward the screens of the ODI installer to the client computer as shown in Figure 7 below.  For an example of how to use an X-Window terminal to log in to the master node of the EMR cluster, go to the following section of this article: “Installing X-Window Software to Access the EMR Master Node.”

 

Figure 7 – Installing ODI in the Amazon EMR Master Node

When installing ODI, select the Standalone Installation type as shown in Figure 8 below.  Ensure that ODI is installed on the new EBS volume.

 

Figure 8 – Selecting the ODI Standalone Installation Type

 

Once ODI has been installed, return to the AWS console and add a new outbound rule that allows the master node to access the RDS database instance created in “Preparing Amazon Elastic MapReduce (EMR) for Oracle Data Integrator (ODI).”  Using the security group of the master node, add a new outbound RDS rule as shown in Figure 9 below.

 

Figure 9 - Outbound RDS Security Rule for the Master Node

Proceed to create an ODI repository by launching the Oracle Repository Creation Utility (RCU).  Before launching RCU, set the Java environment to the Oracle Java SDK.  Use the RDS database instance created in “Preparing Amazon Elastic MapReduce (EMR) for Oracle Data Integrator (ODI)” to host the ODI repository and ODI schemas.

Once the ODI repository is created, launch the Fusion Middleware Configuration Wizard to create a new ODI domain and configure an ODI standalone agent on the master node of the EMR cluster as shown in Figure 10 below.

 

Figure 10 – Installing the ODI Standalone Agent

When configuring the ODI standalone agent, specify the Master Public DNS of the EMR cluster as shown in Figure 11 below.  The Master Public DNS can be found on the Elastic MapReduce page of the EMR cluster.

 

Figure 11 – Configuring the ODI Standalone Agent

Log in to the new ODI repository, and select the ODI Topology.  In the ODI Topology, create the physical ODI standalone agent.  For the agent’s host, specify the Master Public DNS of the EMR cluster as well.  Launch the ODI standalone agent on the master node of the EMR cluster, and then test the agent from the ODI Topology.
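
Launching the standalone agent from a terminal on the master node can be sketched as follows.  The sketch assumes an ODI 12c standalone domain whose bin directory contains agent.sh, and an agent named OracleDIAgent1; the function name start_odi_agent, the domain location, and the agent name are assumptions to replace with the values chosen in the Configuration Wizard.

```shell
# Hypothetical helper: start an ODI standalone agent in the foreground.
#   $1 = ODI domain home created by the Configuration Wizard
#   $2 = agent name (assumed default: OracleDIAgent1)
start_odi_agent() {
  domain_home="$1"
  agent_name="${2:-OracleDIAgent1}"
  # Fail early when the domain does not contain the agent start script.
  if [ ! -x "$domain_home/bin/agent.sh" ]; then
    echo "agent.sh not found under $domain_home/bin" >&2
    return 1
  fi
  "$domain_home/bin/agent.sh" "-NAME=$agent_name"
}
```

The agent name passed here should match the physical agent defined in the ODI Topology so that the Topology test reaches the running agent.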

For additional information on how to create an ODI master and work repository, go to “Creating the ODI Master and Work Repository Schemas.”  For additional information on how to create an ODI standalone agent, go to “Configuring the Standalone Domain for the Standalone Agent.”  For information on how to create an Oracle database instance on Amazon RDS, go to “Preparing Amazon Elastic MapReduce (EMR) for Oracle Data Integrator (ODI)”, under section “Creating the Amazon RDS Instance.”

 

 

Conclusion

 

ODI is well documented to run on both the Cloudera and Hortonworks distributions of Hadoop.  ODI can also run on the distributions of Hadoop found on the Amazon EMR cloud service.  This article demonstrates how to install ODI on the Amazon Elastic MapReduce (EMR) cloud service.

For more Oracle Data Integrator best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit “Oracle A-Team Chronicles for Oracle Data Integrator (ODI).”

ODI Related Cloud Articles

 

Preparing Amazon Elastic MapReduce (EMR) for Oracle Data Integrator (ODI)

Configuring Oracle Data Integrator (ODI) for Amazon Elastic MapReduce (EMR)

Using Oracle Data Integrator (ODI) with Amazon Elastic MapReduce (EMR)

Webcast: Leveraging Oracle Data Integrator (ODI) with Amazon Elastic MapReduce (EMR)

ODI Repository Sample for Amazon Elastic MapReduce (EMR)

Integrating Oracle Data Integrator (ODI) On-Premise with Oracle Cloud Services
