This article discusses how to configure Oracle Data Integrator (ODI) for Oracle Big Data Cloud (BDC) using the ODI Enterprise or High-Availability installation. ODI offers out-of-the-box integration with Big Data technologies such as Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and Apache Kafka, among others. ODI supports two major distributions of Hadoop: Hortonworks Data Platform (HDP) and Cloudera Enterprise Data Hub (CDH). ODI can also be used on other distributions of Hadoop, such as Amazon Elastic MapReduce (EMR).
For additional information on how to use ODI with BDC, see “Using Oracle Data Integrator with Oracle Big Data Cloud.” A recorded demonstration that supports this discussion is available in the Oracle Data Integration webcast “Mastering Oracle Data Integrator with Big Data Cloud.”
To use ODI with BDC, users can install and configure ODI in one of two ways: ODI Standalone or ODI with High-Availability. The ODI Standalone configuration requires installing and configuring the ODI Standalone agent on an instance of BDC. The ODI with High-Availability configuration extends the ODI Standalone configuration by adding the ODI J2EE agent as an orchestrator for Big Data workloads. The following sections of this article describe how to install and configure ODI on BDC using the ODI High-Availability configuration.
For additional information on how to install and configure ODI for BDC using the ODI Standalone (Standard) configuration, see “Configuring Oracle Data Integrator for Big Data Cloud: Standard Configuration.”
The ODI High-Availability configuration is an extension of the ODI Standalone configuration. Under this configuration, both ODI agents, the Standalone agent and the J2EE agent, are installed and configured on two separate cloud services to achieve high availability.
The ODI Standalone agent is installed and configured on an instance of BDC – at least two ODI Standalone agents are recommended. The ODI J2EE agent is installed and configured on an instance of the Oracle Java Cloud Service (JCS) – at least two WebLogic managed servers (one ODI J2EE agent on each managed server) are recommended. The ODI High-Availability configuration allows users to submit both Big Data and non-Big Data ODI workloads directly to the load balancer on JCS. The load balancer distributes the ODI workloads among the J2EE agents on JCS. If a workload is a Big Data workload, the J2EE agent sends it to the Standalone agent on BDC for execution. Figure 1, below, illustrates the ODI High-Availability configuration for Big Data Cloud.
The ODI High-Availability configuration for BDC, shown in Figure 1, below, requires an on-premises license of Oracle Data Integrator for Big Data. Thus, users must download the ODI installer from the Oracle Middleware Data Integrator Download Site. The ODI Cloud Service (ODICS) found on Java Cloud Service cannot be used for this configuration.
Figure 1 – Configuring ODI High-Availability for Big Data Cloud
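The routing shown in Figure 1 can be sketched in a few lines of Python. This is only an illustrative model, not ODI code: the `Workload` class, the `route` function, and the agent names are hypothetical, and round-robin selection is an assumption, since the actual distribution policy depends on how the JCS load balancer is configured.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Workload:
    """Hypothetical stand-in for an ODI workload submitted to the load balancer."""
    name: str
    is_big_data: bool


def route(
    workloads: Sequence[Workload],
    j2ee_agents: Sequence[str],
    standalone_agents: Sequence[str],
) -> List[Tuple[str, str, str]]:
    """Return (workload, orchestrating J2EE agent, executing agent) triples.

    Assumption: both the load balancer and the hand-off to the Standalone
    agents use simple round-robin selection.
    """
    next_j2ee = cycle(j2ee_agents)              # load balancer on JCS
    next_standalone = cycle(standalone_agents)  # Standalone agents on BDC
    plan = []
    for workload in workloads:
        orchestrator = next(next_j2ee)
        if workload.is_big_data:
            # Big Data workloads are handed to a Standalone agent on BDC.
            executor = next(next_standalone)
        else:
            # Other workloads run on the J2EE agent that received them.
            executor = orchestrator
        plan.append((workload.name, orchestrator, executor))
    return plan
```

With two J2EE agents and two Standalone agents, every workload enters through a J2EE agent, and only the Big Data workloads reach BDC. The sketch also shows why at least two agents of each kind are recommended: with a single agent in either list, that agent becomes a single point of failure.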
To install and configure ODI for BDC, using the ODI High-Availability configuration, follow these steps:
To host an ODI repository, users must provision an instance of a SQL database. Oracle Cloud offers MySQL Cloud Service and Oracle Database Cloud Service. The following instructions use the Oracle Database Cloud Service to host the ODI repository:
ODI Studio is the user interface that ODI provides for ETL development. It is recommended to install ODI Studio on a compute resource such as Oracle Compute Classic. Doing so keeps the ODI Studio installation independent of the ODI agent installation and makes it easier to scale as more developers join the ETL project. Use the following instructions to provision an instance of Oracle Compute Classic and install ODI Studio on this instance:
To configure ODI for high availability, use the following instructions to provision an instance of JCS and install the ODI J2EE agents:
Use the following instructions to provision an instance of BDC and install the ODI Standalone agent:
The ODI High-Availability configuration requires additional access rules between cloud instances so that the instances can communicate with each other. For instance, the ODI Standalone agent on BDC must access the ODI repository on DBCS. Also, the ODI J2EE agent must access the BDC instance in order to orchestrate Big Data workloads with the ODI Standalone agent. Follow these instructions to configure additional access rules between cloud instances:
Once the access rules have been configured, users can launch ODI Studio on Compute Classic and start their ETL development work.
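Before starting development, it can be useful to confirm that the access rules actually allow the required connections. The following Python sketch performs a plain TCP reachability check from whichever instance it runs on. The host names are placeholders, and the ports (1521 for the database listener, 20910 for the ODI Standalone agent) are common defaults that are assumed here; substitute the values used in your own environment.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple

Target = Tuple[str, str, int]  # (label, host, port)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_targets(
    targets: Iterable[Target],
    probe: Callable[[str, int], bool] = can_connect,
) -> Dict[str, bool]:
    """Probe each target and map its label to the result."""
    return {label: probe(host, port) for label, host, port in targets}


# Hypothetical targets; each list must be checked from the instance it names.
FROM_BDC_STANDALONE_AGENT = [("ODI repository on DBCS", "dbcs.example.com", 1521)]
FROM_JCS_J2EE_AGENT = [("ODI Standalone agent on BDC", "bdc.example.com", 20910)]
```

For example, calling `check_targets(FROM_BDC_STANDALONE_AGENT)` on the BDC instance returns a dictionary that maps each label to `True` or `False`. A successful connection only shows that the port is reachable through the access rules; it does not verify credentials or confirm that the agent or database behind the port is healthy.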
ODI offers out-of-the-box integration with Big Data technologies such as Apache Hadoop, Apache Spark, Apache Hive, Apache Pig, and Apache Kafka, among others. ODI supports two major distributions of Hadoop: Hortonworks Data Platform (HDP) and Cloudera Enterprise Data Hub (CDH). ODI can also be used on other distributions of Hadoop, such as Amazon Elastic MapReduce (EMR). This article discussed how to configure ODI with BDC using the ODI High-Availability configuration.
For more Oracle Data Integrator best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit “Oracle A-team Chronicles for Oracle Data Integrator (ODI).”