With this blog, we aim to guide Oracle customers through the ETL/ELT tools available on Oracle Cloud Infrastructure (OCI). Whether you're a developer, data engineer, or an IT decision-maker, selecting the right tool is crucial for optimizing your data workflows.
We'll delve into the intricacies of various ETL/ELT tools offered by OCI. By understanding the strengths and weaknesses of each tool.
At Oracle, we offer three main ETL/ELT tools tailored to various data integration needs:
While our primary focus is on comparing ETL/ELT tools, one of the most frequently asked questions around how Oracle Data Integrator (ODI) is licensed on Oracle Cloud.
On OCI, you can find ODI on the Marketplace under the name "ODI Classic". Deployment and installation are facilitated through an automatic Terraform script tailored to your configuration requirements. Within the Marketplace, you'll encounter two distinct versions: the Free edition and the BYOL (Bring Your Own License) option.
The Free version is suitable for users whose primary target is the Oracle Databases Cloud Service (Example: Autonomous Database, DB Systems, ExaCS, ExaCC). It provides essential functionality without the need for an additional license. Conversely, the BYOL version provides to users who already possess an ODI license and wish to leverage it within the cloud environment or customers that want to use non-oracle targets.
You can find the license details on Oracle PaaS and IaaS Universal Credits Service Descriptions document. On page 92, you find the note 9:
9: This Cloud Service is an eligible target for the replication of data using Oracle Data Integrator, which is available on the Oracle Cloud Marketplace.
Another common question, it's around the data transforms deployment. For that question, you can find this blog wrote by Jayant Mahto the ODI and Data Transforms Product Manager. In order to keep it simple, we are just going to compare Data Transforms as one unique product, without specifying the deployment model.
When comparing the architecture of ODI and OCI Data Integration, it's essential to understand their fundamental differences and how they influence various aspects of data processing.
To simplify and only for architecture proposes, when I mention ODI I'm including the ODI Classic and Data Transforms since both shared the same architecture. Also, I'm not going to detail every single piece of ODI, and simplify as much as I can.
ODI works on a repository-based architecture. It relies on two repositories: the work repository and the master repository, where all metadata and configurations are stored. During the execution of ETL/ELT processes, ODI's performance heavily depends on the underlying database where the SQL code is executed. While this architecture offers familiarity and robustness, factors such as database performance and latency can significantly impact overall performance.
In contrast, OCI Data Integration leverages Apache Spark, an open-source unified analytics engine for large scale data processing. Managed by Oracle, OCI Data Integration abstracts away the complexities of Spark, eliminating the need for customers to possess expertise in languages such as Scala, Java or Python. When executing ETL/ELT processes, the code is sent to a Spark Cluster, where it is queued and subsequently executed. This architecture offers scalability and agility, enabling seamless processing of large volumes of data while reducing the burden on customers to manage infrastructure and resources.
Below is a table comparing the features of the three ETL/ELT tools: ODI, OCI Data Integration, and Oracle Data Transforms.
Features |
ODI Classic Marketplace |
Data Transforms |
OCI-Data Integration |
---|---|---|---|
User Interface |
Studio (Thick client) |
Browser-based |
Browser-based |
Capability |
ELT/ETL push-down |
ELT Push-down |
Powered by ETL or E-LT push-down |
Learning Curve |
Steep learning curve |
User friendly. Easier than ODI Classic and OCI DI. |
User friendly. UI can be improved. |
Scalability |
It doesn’t scale automatically |
It doesn’t scale automatically |
It doesn’t scale automatically |
Deployment Model |
Marketplace image deployment – Customer managed |
Fully Managed |
Fully Managed |
Integration Approach |
Declarative design, low code approach. |
Graphical low-code approach |
Graphical low-code approach |
Hosting Environment |
OCI |
OCI |
OCI |
Connectivity |
65 supported sources and targets. However, the license is free when the target is an Oracle DB Cloud. |
More than 100 connectors. Check this link to know what is supported as a source and target. |
SaaS Apps Heterogeneous Cloud Databases, Data Lakes, On-prem sources, Big Data Target: Oracle DB on cloud, MySQL on cloud and Hive. |
Data Load Targets |
65 supported sources and targets. However, the license is free when the target is an Oracle DB Cloud. |
Amazon Aurora, Amazon Redshift, Azure SQL Database, Azure Synapse Analytics, Cassandra, DataStax, EnterpriseDB, FinancialForce, Hypersonic SQL, IBM DB2 Hosted, IBM DB2 UDB, IBM DB2 Warehouse, IBM DB2/400, Informix, Microsoft Dynamics 365, Microsoft SharePoint, Microsoft SQL Server, Mongo DB, MySQL, MySQL Heatwave,Oracle DB, Oracle Analytics Cloud, Oracle Business Intelligence Cloud (BICC) Connector, Oracle EBS, Oracle Fusion ERP, Oracle Fusion Sales, Oracle Fusion Service, Oracle GoldenGate – OCI, Oracle Marketing Cloud, Oracle NetSuite, Oracle Object Storage, Oracle People Soft, Oracle Sales Cloud, Oracle Service Cloud, Oracle SIEBEL, PostgreSQL, Salesforce.com, Snowflake, Sybase As Anywhere, Sybase as Enterprise, Sybase AS IQ, Veeva CRM. |
Oracle DB on cloud, MySQL, Hive, Object Storage. |
BICC integration |
Yes |
Yes |
Yes |
Oracle Managed |
No |
Yes |
Yes |
Native integration with Oracle Cloud Infrastructure and SaaS |
No |
No |
It can integrate with OCI Data Flow, Data Catalog, OCI Functions and OCI Lake. |
Data Transformations |
Expression, Aggregate, Distinct, Filter, Join, Union, Minus, Intersect, Lookup, Pivot, Sort, Split, Subquery, Filter, Table Function, Unpivot, Spatial and Graph functions. |
Expression,Aggregate, Distinct,,Filter,Join, Union, Minus, Intersect, Lookup, Pivot, Sort, Split, Subquery, Filter, Table Function, Unpivot, Spatial and Graph functions. |
Expression,Aggregate, Distinct,,Filter,Join, Union, Minus, Intersect, Lookup, Pivot, Sort, Split, Filter, Unpivot, Spatial and Graph functions. OCI Functions, Flatten operator. |
Error Handling |
ODI Classic has Check Knowledge Modules(CKM) that allow you to manage data integrity errors and logging them in error tables. On packages you can define the behavior of the process when it fails. Example: Retry x times, send an email, stop immediately, ignore and continue the process. |
No out of the box error handling mechanism for data integrity. |
No out of the box error handling mechanism for data integrity. |
Extensibility and Customizations |
ODI Knowledge Modules (KM) are highly customizable and can be changed to fufil specif requirements. |
Not possible. |
Not possible. |
Automation and Scripting |
In ODI you can use Groovy that uses ODI SDK. Example: Automation and creation of ODI mappings can be achieved through groovy scripts. Other supported languages are supported on ODI procedures (SQL, ODI Tools, Jython, Groovy, NetRexx, and Java BeanShell). |
Python SDK (roadmap) |
OCI DI has a rich set of Rest APIs that can be integrated in 3rd party tools. SDK(Java, Python, Typescript, Javascript, .NET, Go and Ruby), CLI and PL/SQL SDK.
|
Orchestration |
ODI classic has workflows where ETL/ELT process can be orchestrated and scheduled. Dependcy across processes can be managed. Loops can be implemented through variables and packages. |
Data Transforms has workflows where Data Flows can be orchestrated and scheduled. |
OCI DI Pipelines provide orchestration and scheduling capabilities. However, it still has some limitations for complex dependencies. Example: For each. Pipelines are really slow and we have limits on number tasks inside of a pipeline (19). |
Data Lineage |
No* *OEMM that is under CA can provide it. |
No. |
It’s integrated with OCI Data Catalog. |
Variables |
Yes. |
Not supported yet (it is planned and as of May 2024 is on the Roadmap: until released this could change) |
You can use parameters, but it doesn’t have the same functionality as variables in ODI Classic. |
Execution Monitoring |
ODI Operator tab contains all the logs. |
ODT Logs tab contains all the logs |
Limited logs. It doesn’t contain the code (SQL) |
Maintenance |
Customer has to upgrade and maintain the VM instance. |
Fully Managed. |
Fully Managed. |
Send emails |
Yes. |
No* *It can use database packages to send emails |
No* *It can use database packages to send emails |
Custom JDBC |
Yes |
No |
No |
Call Rest APIs |
ODI supports rest APIs with some limitations. Oauth2 is not supported among other authorization protocols. |
Not out of the box. As a workaround, it can use the SQL component on Workflows and call rest APIs using PL/SQL. |
Only supports Resource principal as authentication method. Very rich support for Rest API call. · Success condition · Polling and termination condition for a no-wait REST calls |
Rest APIs as a source |
ODI supports rest APIs but with some limitations. Oauth2 is not supported among other authorization protocols. It requires to save the output of rest call on CSV file and they use it as a source in ODI mapping. It’s not an automated process. |
No |
Yes, it supports basic and Oauth2 authentication. Currently, manifest file is mandatory. |
HA configuration |
Yes (manual) |
Oracle Managed |
Oracle Managed |
Disaster Recovery configuration |
Yes (manual) |
No (Check Oracle SLA) |
No (Check Oracle SLA) |
Version Control |
Yes. ODI Classic can be integrated with GitHub and Apache Subversion. |
No |
No (roadmap). It supports export and import of projects. |
Network customization |
Yes |
No |
Yes |
Licensing |
The the license is free when the target is an Oracle DB Cloud. If any other target non-Oracle DB cloud is used, then on-premise licence for ODI is required. The customer pays the VM. |
There is no license. The customer pays the ECPUs assigned to VM. |
3 metrics:
|
Load Data |
Local files, DataBase & Cloud Storage |
DataBase & Cloud Storage |
DataBase & Cloud Storage |
Supported Objects |
CSV, JSON (limited) |
CSV |
CSV, JSON, Parquet, Avro, XLSX (Excel) |
Supported Compression |
Not supported out of the box |
Not supported out of the box |
Gzip, Bzip, Deflate, Lz4, Snappy |
Integration Flexibility |
Integration with SaaS(BICC) |
Integrated with Autonomous database and SaaS (BICC). |
Native integration with Oracle Cloud Infrastructure and SaaS (BICC) |
Security and Compliance |
ODI Security and LDAP. |
Integrated with autonomous. Need to review. |
Integrated with OCI IAM for authentication and authorization Uses OCI Vault to store and encrypt sensitive information like passwords, wallet files etc.Regulatory compliance: FedRAMP High, HIPAA, HITRUST CSF, PCI DSS, SOC1, 2 &3 |
Low-code approach |
Yes |
Yes |
Yes |
Automated schema drift protection |
No |
No |
Yes |
Reusable templates and dataflows |
Yes |
No |
Yes (Limited) |
Data flow validation |
Yes |
Yes |
Yes |
Automatic ETL or E-LT method selection |
No |
No |
Yes
|
We hope that you have enjoyed this blog and that it can help you decide to choose the right ETL/ELT tool for your use cases.
Elói is a Cloud Solution Architect for Data Integration and Analytics. He has experience in developing ELT / ETL projects and delivering analytics solutions to multiple customers across different industries.
The Oracle A-Team is a central, outbound, highly technical team of enterprise architects, solution specialists, and software engineers.
Jérôme is a Solution Architect for the A-Team with a focus on Data Integration, Big Data and Analytics.
Next Post