This article discusses how to configure Oracle Data Integrator (ODI) for Oracle Big Data Cloud (BDC): specifically, how to prepare the BDC environment for ODI so that users can design Big Data workloads with ODI and execute them on BDC. ODI offers out-of-the-box integration with Big Data technologies such as Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and Apache Kafka, among others. ODI supports both major distributions of Hadoop, Hortonworks Data Platform (HDP) and Cloudera Enterprise Data Hub (CDH), and can also be used with other Hadoop distributions such as Amazon Elastic MapReduce (EMR).
For additional information on how to use ODI with BDC, go to “Using Oracle Data Integrator with Oracle Big Data Cloud.” A recorded demonstration that supports this discussion can be found in the Oracle Data Integration webcast “Mastering Oracle Data Integrator with Big Data Cloud.”
In order to execute Big Data workloads with ODI on BDC, users must configure the Big Data data servers such as Hadoop, HDFS, Hive, Pig, and Spark, among others. These data servers are configured in the ODI Topology. Some of them, such as Hadoop and HDFS, require access to HDFS directories; if those directories do not exist, they must be created before configuring the ODI data servers. The following two sections show how to create the required HDFS directories for ODI on BDC. This must be done before configuring the data servers in the ODI Topology.
ODI requires the configuration of an ODI Hadoop data server in order to execute Big Data workloads; this configuration is done in the ODI Topology. The ODI Hadoop data server stores configuration and metadata of the Big Data cluster in an HDFS directory. This HDFS directory, known as the ODI HDFS Root, must exist before the ODI Hadoop data server is configured and initialized. If it does not exist, create it. For instance, to create a new HDFS directory called /user/oracle/odi, run the following OS commands:
sudo su hdfs
hadoop fs -mkdir -p /user/oracle/odi   # -p also creates any missing parent directories
hadoop fs -chown -R oracle:oracle /user/oracle/odi
The above OS commands create the new HDFS directory and assign ownership of it to the oracle user.
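To confirm that the directory exists with the expected ownership, list its parent directory; the /user/oracle/odi entry should show oracle as both owner and group:

hadoop fs -ls /user/oracle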
ODI requires the configuration of an ODI HDFS data server in order to manage HDFS files. ODI can also use the HDFS file system as a staging area, so the staging HDFS directory must exist before the ODI HDFS data server is configured. If it does not exist, create it. For instance, to create a new HDFS directory called /user/oracle/staging, run the following OS commands:
sudo su hdfs
hadoop fs -mkdir -p /user/oracle/staging
hadoop fs -chown -R oracle:oracle /user/oracle/staging
The above OS commands create the new HDFS directory and assign ownership of it to the oracle user. If required, use the same commands to create additional HDFS directories.
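When several work directories are needed, a short shell loop keeps the create-and-chown pattern consistent. The directory names below are hypothetical examples only:

sudo su hdfs
for dir in /user/oracle/staging /user/oracle/landing; do   # example paths; substitute your own
  hadoop fs -mkdir -p "$dir"
  hadoop fs -chown -R oracle:oracle "$dir"
done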
When this document was written, Sqoop was not part of the BDC toolset. Users who want to use Sqoop with ODI to load SQL data into BDC can follow the instructions and sample commands in Table 1, below, to install Sqoop on BDC:
Download and install Sqoop on the same node where the ODI Standalone agent is installed. Also download the JDBC driver for the SQL database that Sqoop will use as the source or target database; Sqoop requires a JDBC driver in order to connect to the SQL database. Table 1, below, uses the JDBC driver for the Oracle database (ojdbc7.jar).
Sqoop Installation Steps
sudo su root
wget -c http://www.eu.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /usr/lib/sqoop
chown -R oracle:oracle /usr/lib/sqoop
cd /usr/lib/sqoop/conf
cp sqoop-env-template.sh sqoop-env.sh
Modify the following entries in the sqoop-env.sh file (a filled-in sketch appears after Table 1):
#Set path to where bin/hadoop is available
#Set path to where hadoop-*-core.jar is available
#set the path to where bin/hbase is available
#Set the path to where bin/hive is available
#Set the path for where zookeper config dir is
#Set the path for $HCAT_HOME
Save your changes.
Download and copy the JDBC driver for your SQL database (e.g., ojdbc7.jar for Oracle) into /usr/lib/sqoop/lib:
cp /tmp/ojdbc7.jar /usr/lib/sqoop/lib/
chown -R oracle:oracle /usr/lib/sqoop/lib/ojdbc7.jar
Table 1 – Sqoop Installation Instructions for Big Data Cloud
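For reference, a filled-in sqoop-env.sh might look like the following sketch. The paths assume an HDP-style layout on the BDC node; treat every value as a placeholder to verify against your cluster:

export HADOOP_COMMON_HOME=/usr/hdp/current/hadoop-client           # where bin/hadoop is available (verify on your node)
export HADOOP_MAPRED_HOME=/usr/hdp/current/hadoop-mapreduce-client
export HBASE_HOME=/usr/hdp/current/hbase-client                    # only if HBase is installed
export HIVE_HOME=/usr/hdp/current/hive-client                      # where bin/hive is available
export ZOOCFGDIR=/etc/zookeeper/conf                               # ZooKeeper configuration directory
export HCAT_HOME=/usr/hdp/current/hive-webhcat                     # only if HCatalog is used

Once the JDBC driver is in place, a simple import can confirm that the installation works end to end. The connection string, credentials, table, and target directory below are hypothetical:

# Replace the host, service name, user, password file, and table with your own values
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username SCOTT \
  --password-file /user/oracle/.sqoop.pwd \
  --table EMP \
  --target-dir /user/oracle/staging/emp \
  --num-mappers 1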
In order to successfully execute Big Data workloads with ODI on BDC, the ODI Standalone agent needs access to the Hadoop libraries, JAR files, and tools found on BDC. Table 2, below, shows an example of how to set the OS environment variables so that the ODI Standalone agent can access the Hadoop environment on BDC. Using the OS command line of the BDC node where the ODI Standalone agent will be installed, add these variables to the .bashrc profile file of the oracle user.
Hadoop Environment Variables for ODI
Table 2 – Big Data Cloud Hadoop Environment Variables for ODI
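As an illustration of the kind of entries such a table contains, the following sketch assumes an HDP-based BDC node; every version number, path, host name, and port is a placeholder that must be adjusted:

# Placeholder values; adjust the Hadoop version, directories, compute node name, and port
export HADOOP_HOME=/usr/hdp/2.4.2.0-258/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=/usr/hdp/2.4.2.0-258/hive
export PIG_HOME=/usr/hdp/2.4.2.0-258/pig
export SPARK_HOME=/usr/hdp/2.4.2.0-258/spark
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PIG_HOME/bin:$SPARK_HOME/bin
export HADOOP_CLASSPATH=$(hadoop classpath)
export HIVE_SERVER_JDBC_URL=jdbc:hive2://bdcnode1.example.com:10000/default   # hypothetical node name and HiveServer2 port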
Copy and paste the content of Table 2, above, into a text editor such as vi, and change the Hadoop version, the compute node name, and the port number. Also verify the directories for each variable, and adjust the directory names if required. Then source the variables before launching the ODI Standalone agent. The ODI Standalone agent installation is discussed in the next section of this article.
In order to execute Big Data workloads with ODI on BDC, users must install and configure at least one ODI Standalone agent on the BDC cluster. The agent can be installed on one node of the cluster, and it must have access to all the Hadoop tools and features of the BDC cluster so that ODI workloads can run successfully.
Use the following OS commands to assign ownership of the ODI middleware home to the oracle user and to start the ODI Standalone agent:

chgrp -R oracle /u01/bdcsce/middleware   # as root: set the group of the middleware home
chown -R oracle /u01/bdcsce/middleware   # as root: set the owner of the middleware home
sudo su oracle
nohup ./agent.sh -NAME=BigDataODIAgent1 >/dev/null 2>&1 &   # run from the agent domain's bin directory
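To verify that the agent process started, check for it from the OS command line; the agent can also be tested from the Physical Agent editor in the ODI Studio Topology:

ps -ef | grep -i agent.sh | grep -v grep   # the BigDataODIAgent1 process should be listed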
Proceed to configure the ODI Topology for the following BDC technologies: Hadoop, Hive, Spark, Pig, and HDFS. For information on how to perform this task, go to “Configuring Oracle Data Integrator for Big Data Cloud: Topology Configuration.”
ODI offers out-of-the-box integration with Big Data technologies such as Apache Hadoop, Apache Spark, Apache Hive, and Apache Pig, among others. ODI supports both major distributions of Hadoop, Hortonworks Data Platform (HDP) and Cloudera Enterprise Data Hub (CDH), and can also be used with other Hadoop distributions such as Amazon Elastic MapReduce (EMR). This article discussed how to configure the BDC environment for ODI.
For more Oracle Data Integrator best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit “Oracle A-Team Chronicles for Oracle Data Integrator (ODI).”