
Best Practices from Oracle Development's A‑Team

Configuring Oracle Data Integrator for Oracle Big Data Cloud: Environment Configuration

Introduction

 

This article discusses how to configure Oracle Data Integrator (ODI) for Oracle Big Data Cloud (BDC): specifically, how to configure the BDC environment for ODI, so users can design Big Data workloads with ODI and execute them on BDC. ODI offers out-of-the-box integration with Big Data technologies such as Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and Apache Kafka, among others. ODI supports both major distributions of Hadoop: Hortonworks Data Platform (HDP) and Cloudera Enterprise Data Hub (CDH). Additionally, ODI can be used with other Hadoop distributions such as Amazon Elastic MapReduce (EMR).

For additional information on how to use ODI with BDC, see “Using Oracle Data Integrator with Oracle Big Data Cloud.” A pre-recorded demonstration that supports this discussion can be found in the Oracle Data Integration webcast “Mastering Oracle Data Integrator with Big Data Cloud.”

 

HDFS Directories for ODI

 

In order to execute Big Data workloads with ODI on BDC, users must configure the Big Data data servers such as Hadoop, HDFS, Hive, Pig, and Spark, among others. These data servers are configured in the ODI Topology. Some of these data servers, such as Hadoop and HDFS, require access to HDFS directories. If these directories do not exist, they must be created before configuring the ODI data servers. The following two sections show how to create the required HDFS directories for ODI on BDC. This configuration must be done before configuring the ODI data servers in the ODI Topology.
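
As a quick check, the hadoop fs -test command can be used to find out whether a given HDFS directory already exists before attempting to create it. The directory name below is just an example; adjust it to your environment:

hadoop fs -test -d /user/oracle/odi && echo "directory exists" || echo "directory missing"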

 

HDFS Initialization Directory

 

ODI requires the configuration of an ODI Hadoop data server in order to execute ODI Big Data workloads; this configuration is done in the ODI Topology. The ODI Hadoop data server stores configuration and metadata of the Big Data cluster in an HDFS directory. This HDFS directory, known as the ODI HDFS Root, must exist before configuring and initializing the ODI Hadoop data server. If this HDFS directory does not exist, create it. For instance, to create a new HDFS directory called /user/oracle/odi, run the following OS commands:

 

sudo su hdfs
hadoop fs -mkdir /user/oracle/odi
hadoop fs -chown -R oracle:oracle /user/oracle/odi

 

The above OS commands create the new HDFS directory and change its ownership to the oracle user.
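
To verify the result, list the parent directory and confirm that the new directory is owned by the oracle user (a quick sanity check; the exact output format varies by Hadoop version):

hadoop fs -ls /user/oracle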

 

Other HDFS Directories

 

ODI requires the configuration of an ODI HDFS data server in order to manage HDFS files. The HDFS file system can also be used by ODI as a staging area. In either case, the HDFS directory must exist before configuring the ODI HDFS data server. If the HDFS directory does not exist, create it. For instance, to create a new HDFS directory called /user/oracle/staging, run the following OS commands:

 

sudo su hdfs
hadoop fs -mkdir /user/oracle/staging
hadoop fs -chown -R oracle:oracle /user/oracle/staging

 

The above OS commands create the new HDFS directory and change its ownership to the oracle user. If required, use the above commands to create additional HDFS directories.
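
If several additional directories are needed, hadoop fs -mkdir -p can create them in a single command. The directory names below are hypothetical examples only:

sudo su hdfs
hadoop fs -mkdir -p /user/oracle/staging/input /user/oracle/staging/output
hadoop fs -chown -R oracle:oracle /user/oracle/staging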

 

Installing Sqoop on the Big Data Cloud Instance

 

At the time of writing, Sqoop was not part of the BDC toolset. If users want to use Sqoop with ODI to load SQL data into BDC, Table 1, below, lists the instructions and sample commands that can be used to install Sqoop on BDC.

Download and install Sqoop on the same node where the ODI Standalone agent is installed. Also, download the necessary JDBC driver for the SQL database that Sqoop will use as the source or target database; Sqoop requires a JDBC driver in order to connect to the SQL database. Table 1, below, uses the JDBC driver (ojdbc7.jar) for the Oracle database.

 

Sqoop Installation Steps
sudo su root

wget -c http://www.eu.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /usr/lib/sqoop
chown -R oracle:oracle /usr/lib/sqoop
cd /usr/lib/sqoop/conf
cp sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh

Modify the following in the sqoop-env.sh file:

------------------------------------------------------------------------------------
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/hdp/current/hadoop-client

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/hdp/current/hadoop-mapreduce-client

#Set the path to where bin/hbase is available
export HBASE_HOME=/usr/hdp/current/hbase-client

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/hdp/current/hive-client

#Set the path to where the zookeeper config dir is
export ZOOKEEPER_HOME=/usr/hdp/current/zookeeper-client
export ZOOCFGDIR=/usr/hdp/current/zookeeper-client/conf

#Set the path for $HCAT_HOME
export HCAT_HOME=/usr/hdp/current/hive-webhcat
------------------------------------------------------------------------------------

Save your changes.

Download and copy the JDBC driver for your SQL database (e.g., Oracle) into /usr/lib/sqoop/lib:

cd /usr/lib/sqoop/lib
cp /tmp/ojdbc7.jar .
chown oracle:oracle /usr/lib/sqoop/lib/ojdbc7.jar

Finally, set the Sqoop environment variables:

export SQOOP_HOME=/usr/lib/sqoop
export PATH=$SQOOP_HOME/bin:$PATH

 Table 1 – Sqoop Installation Instructions for Big Data Cloud
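
Once the installation is complete, a quick sanity check confirms that Sqoop runs and can reach the source database through the JDBC driver. The connection string, host name, and user name below are hypothetical placeholders; substitute the values for your environment:

sqoop version
sqoop list-tables --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL --username scott -P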

 

 

BDC Operating System Variables Required for ODI

 

In order to successfully execute Big Data workloads with ODI on BDC, the ODI Standalone agent needs access to the Hadoop libraries, JAR files, and tools found on BDC. Table 2, below, shows an example of how to set the OS environment variables so that the ODI Standalone agent can access the Hadoop environment on BDC. Using the OS command line of the BDC node where the ODI Standalone agent will be installed, add these variables to the .bashrc profile file of the oracle user.

 

 

Hadoop Environment Variables for ODI

HADOOP_HOME=/usr/hdp/2.4.2.0-258/hadoop
HADOOP_CONF_DIR=/usr/hdp/2.4.2.0-258/hadoop/conf
HADOOP_MAPRED_HOME=/usr/hdp/2.4.2.0-258/hadoop-mapreduce
HCAT_HOME=/usr/hdp/2.4.2.0-258/hive-hcatalog
HIVE_HOME=/usr/hdp/2.4.2.0-258/hive
HIVE_CONF_DIR=/etc/hive/conf
SPARK_JAR=hdfs://bigdataodi.compute.oraclecloud.internal:8020/usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar
PIG_CLASSPATH=:/usr/hdp/2.4.2.0-258/pig/lib/jython-standalone-2.5.3.jar
ODI_ADDITIONAL_CLASSPATH='/usr/hdp/2.4.2.0-258/hive/lib/*:/usr/hdp/2.4.2.0-258/hadoop/client/*:/usr/hdp/2.4.2.0-258/hadoop/conf'
ODI_HIVE_SESSION_JARS=/usr/hdp/2.4.2.0-258/hive/lib/hive-contrib.jar
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hive-hcatalog-core*.jar:\
$HCAT_HOME/share/hive-hcatalog/hcatalog-pig-adapter*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/lib/jdo-api-*.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf:\
$HADOOP_HOME/lib/slf4j-api-*.jar
export PIG_OPTS=-Dhive.metastore.uris=thrift://bigdataodi.compute.oraclecloud.internal:9083
export SQOOP_HOME=/usr/lib/sqoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
PATH=$SQOOP_HOME/bin:$PATH

Table 2 – Big Data Cloud Hadoop Environment Variables for ODI

 

Copy and paste the content of Table 2, above, into a text editor such as vi, and change the Hadoop version, the compute node name, and the port number. Also, verify the directories for each variable and adjust the directory names if required. Then, source the variables before launching the ODI Standalone agent. The ODI Standalone agent installation is discussed in the next section of this article.
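
For instance, after the variables have been added to the .bashrc file of the oracle user, they can be sourced and spot-checked as follows (a minimal sketch; the expected values are the ones configured in Table 2):

sudo su - oracle
source ~/.bashrc
echo $HADOOP_HOME
hadoop version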

 

Installing the ODI Standalone Agent on BDC

 

In order to execute Big Data workloads with ODI on BDC, users must install and configure at least one ODI Standalone agent on the BDC cluster. The ODI Standalone agent can be installed on one node of the BDC cluster. The ODI Standalone agent must have access to all the Hadoop tools and features of the BDC cluster so that ODI workloads can run successfully.

Use the following steps to install the ODI Standalone agent:

  • Using the opc user of the BDC instance, log in to the master node of the BDC cluster, and install ODI.
  • BDC provides a directory called /u01/bdcsce where users can install additional utilities and software such as ODI; use this directory for your ODI installation.
  • Once the agent installation is complete, it is recommended to run the ODI agent as the oracle user already configured on the BDC instance. Thus, change the ownership of the ODI directory to the oracle user. For instance, if the directory is called middleware, change the ownership of this directory as follows:

chgrp -R oracle /u01/bdcsce/middleware
chown -R oracle /u01/bdcsce/middleware

 

  • When installing the ODI agent on BDC, follow all the default prompts of the Fusion Middleware Installer. Make a note of the agent’s name and its directory location.

 

  • Launch the ODI agent as follows:

sudo su oracle
cd /u01/bdcsce/middleware/user_projects/domains/odi/bin
nohup ./agent.sh -NAME=BigDataODIAgent1 >/dev/null 2>&1 &
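
To confirm that the agent is running, check for the agent process; optionally, ping the agent over HTTP. The port number 20910 below is the ODI default and is an assumption here; use the port chosen during the agent’s domain configuration:

ps -ef | grep agent.sh | grep -v grep
curl http://localhost:20910/oraclediagent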

 

Proceed to configure the ODI Topology for the following BDC technologies: Hadoop, Hive, Spark, Pig, and HDFS. For information on how to perform this task, see “Configuring Oracle Data Integrator for Big Data Cloud: Topology Configuration.”

 

Conclusion

 

ODI offers out-of-the-box integration with Big Data technologies such as Apache Hadoop, Apache Spark, Apache Hive, and Apache Pig, among others. ODI supports both major distributions of Hadoop: Hortonworks Data Platform (HDP) and Cloudera Enterprise Data Hub (CDH). Additionally, ODI can be used with other Hadoop distributions such as Amazon Elastic MapReduce (EMR). This article discussed how to configure the BDC environment for ODI.

For more Oracle Data Integrator best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit Oracle A-Team Chronicles for Oracle Data Integrator (ODI).

 

ODI Related Articles

Using Oracle Data Integrator with Oracle Big Data Cloud

Configuring Oracle Data Integrator for Big Data Cloud: Standard Configuration

Configuring Oracle Data Integrator for Big Data Cloud: High-Availability Configuration

Configuring Oracle Data Integrator for Big Data Cloud: Topology Configuration

Webcast: “Mastering Oracle Data Integrator with Big Data Cloud Service - Compute Edition.”
