Oracle Data Integrator Best Practices: Performing Incremental Updates in the Oracle Autonomous Data Warehouse Cloud

 Introduction

This article discusses the best practices that users should follow when using Oracle Data Integration Platform Cloud (DIPC) – specifically Oracle Data Integrator (ODI) – to perform incremental updates in the Oracle Autonomous Data Warehouse Cloud Service (ADWC).

ADWC is a fully-managed and pre-configured Oracle database cloud service that is tuned and optimized for data warehouse workloads.  As an autonomous cloud service, ADWC does not require database administration or database tuning – the service automatically optimizes the database for high-performance queries; thus, users do not need to consider any details about parallelism, partitioning, indexing, or compression.  ADWC is built upon the Oracle Database, hence, business intelligence (BI) applications and data integration tools that support the Oracle Database also support ADWC.  ADWC is designed as a load-and-go cloud service – users provision the service, define tables, load data, and then run their queries.

Technology Stack Overview

To perform data loads in ADWC, Oracle offers various tools such as ADWC Packages, Data Pump, and SQL*Loader, among others.  For customers wanting to utilize these ADWC tools and integrate data with other data stores (on-premises or in the Cloud), Oracle offers DIPC.  DIPC is a cloud-based unified data platform that offers batch data processing, real-time data streaming, enterprise data quality, and data governance capabilities.  The batch data-processing technology that DIPC uses is ODI, the Oracle EL-T (Extract-Load and Transform) data integration tool.  To perform self-service data discovery, predictive analytics, and business intelligence reporting in ADWC, Oracle offers Oracle Analytics CloudFigure 1, below, shows these three Oracle cloud services:  DIPC, ADWC, and OAC.  In this example, DIPC, specifically ODI, is used to perform batch data loads in ADWC.  OAC is then used for business intelligence (BI) analysis and end-user ad hoc queries.

 

Figure 1 - Data Integration Platform Cloud, Autonomous Data Warehouse Cloud and Analytics Cloud

Figure 1 – Data Integration Platform Cloud, Autonomous Data Warehouse Cloud and Analytics Cloud

 

ODI offers Knowledge Modules (KMs) to perform data loads, data transformations, and data validation.  KMs are code templates designed to perform a specific task in the overall data integration process.  KMs are technology-specific, and they are optimized for each of the technologies supported by ODI.  Out-of-the-box, ODI provides numerous KMs to extract, load, and transform data in the Oracle database.  Since ADWC is based on the Oracle database, ODI users can use KMs that are based on the Oracle technology to load and transform data in ADWC.  KMs fall into six categories:  Reverse-Engineering, Check, Loading, Integration, Journalizing, and Service.  This article addresses the Integration KMs that provide the best performance options when performing incremental updates with ODI in ADWC.

The examples discussed in this article can also be used with other flavors of ODI, including Oracle Data Integrator (ODI) 12c, and Oracle Data Integrator Cloud Service (ODICS).  This article is one of three articles that discuss best practices when using ODI with ADWC:

Oracle Data Integrator Best Practices: Loading External Datasets into the Oracle Autonomous Data Warehouse Cloud Service

Oracle Data Integrator Best Practices: Performing the Initial Data Load of a Data Warehouse in the Oracle Autonomous Data Warehouse Cloud Service

Using Oracle Data Integrator to Perform Incremental Updates in the Autonomous Data Warehouse Cloud Service

 

The incremental-update operations of an ADWC data warehouse includes adding new records and updating existing records on dimensions, facts, and aggregates.  The new rows and the records to be updated are usually located on on-premises data sources, other database cloud services, or on cloud-based object stores such as Oracle Object Store or Amazon S3.  A staging area (database schema) is usually created on ADWC to load and transform data before changes are applied to the ADWC data warehouse.  The staging area is then used as the source dataset for the incremental-update operations of the ADWC data warehouse.  When performing incremental-update operations in an ADWC data warehouse, ODI recommends two Integration Knowledge Modules (IKMs):  IKM Oracle Merge, and IKM Oracle UpdateTable 1, below, shows these two IKMs.

 

Table 1 - ODI Integration Knowledge Modules for Incremental Updates in ADWC

Table 1 – ODI Integration Knowledge Modules for Incremental Updates in ADWC

The IKM Oracle Merge offers the best performance option when both operations – inserts and updates – are required.  This IKM uses the Oracle MERGE statement to select rows from an ADWC source table and perform the insert and the update operations with a single SQL statement.  Figure 2, below, shows an example.

 

Figure 2 - Incremental Upload of an ADWC Table with ODI

Figure 2 – Incremental Upload of an ADWC Table with ODI

Figure 2 above, shows how the IKM Oracle Merge performs the incremental update operation of a warehouse dimension called w_product_d.  In this example, the IKM uses a condition to determine whether to update or to insert into the dimension table.   The condition is based on the natural key, prod_id, of the dimension table.  When there is no match, new rows are inserted into the dimension table using a database sequence.  The database sequence has been previously created by the user.  When there is a match, the update operation updates the target columns of the dimension table with the new values.

The IKM Oracle Merge can also be used for updates-only operations.  Alternatively, ODI users can use the IKM Oracle Update for updates-only operations, but the IKM Oracle Merge offers the best performance when updating tables in an ADWC data warehouse.  The IKM Oracle Update uses the Oracle UPDATE statement to perform the update operation based on an update key, and it uses subqueries to detect the rows that needs to be updated.  Figure 3, below shows an example of how the IKM Oracle Update performs an update operation for the w_product_d dimension:

 

Figure 3 - Updates-Only Operation with the IKM Oracle Update

Figure 3 – Updates-Only Operation with the IKM Oracle Update

The IKM Oracle Update can be used for small target tables where updates are only required.  However, as a best practice, it is recommended to test both the IKM Oracle Merge and the IKM Oracle Update to determine which IKM provides the best performance.

 

Conclusion

This article discusses the best practices that users should follow when using Oracle Data Integration Platform – specifically Oracle Data Integrator – to perform incremental updates in an ADWC data warehouse.  For more Oracle Data Integrator best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit Oracle A-team Chronicles for Oracle Data Integrator (ODI).”

ODI Related Articles

Oracle Data Integrator Best Practices: Loading External Datasets into the Oracle Autonomous Data Warehouse Cloud Service

Oracle Data Integrator Best Practices: Performing the Initial Data Load of a Data Warehouse in the Oracle Autonomous Data Warehouse Cloud Service

Oracle Data Integrator (ODI) 12c

Oracle Autonomous Data Warehouse Cloud Service (ADWC)

Oracle Data Integration Platform Cloud (DIPC)

Oracle Analytics Cloud

Oracle Data Integrator Knowledge Modules (KMs)

Oracle Object Store

Oracle SQL*Loader

Oracle Data Pump

Oracle DBMS_CLOUD Utilities

Oracle Database SQL Language Reference Guide

Data Warehouse Late Arriving Dimensions

Connecting Oracle Data Integrator Studio to the Autonomous Data Warehouse Cloud

Connecting Oracle Data Integrator Studio to the Exadata Express Cloud

Database Cloud Service (DBCS)

Integrating Oracle Data Integrator (ODI) On-Premises with Cloud Services

Using ODI Loading Knowledge Modules on the Oracle Database Cloud Service (DBCS)

Oracle Data Integrator Best Practices:  Using Check-Knowledge Modules On Both On-Premises and on Cloud Services

Oracle Data Integrator Best Practices:  Using Loading Knowledge Modules on Both On-Premises and on Cloud Services

 

 

Add Your Comment