Oracle Data Integrator Best Practices: Performing the Initial Data Load of a Data Warehouse in the Oracle Autonomous Data Warehouse Cloud

Introduction

This article discusses the best practices that users should follow when using Oracle Data Integration Platform Cloud (DIPC) – specifically Oracle Data Integrator (ODI) – to perform the initial data load of a data warehouse in the Oracle Autonomous Data Warehouse Cloud (ADWC).

ADWC is a fully-managed and pre-configured Oracle database cloud service that is tuned and optimized for data warehouse workloads.  As an autonomous cloud service, ADWC does not require database administration or database tuning – the service automatically optimizes the database for high-performance queries; thus, users do not need to consider any details about parallelism, partitioning, indexing, or compression.  ADWC is built upon the Oracle Database, hence, business intelligence (BI) applications and data integration tools that support Oracle Database also support ADWC.  ADWC is designed as a load-and-go cloud service – users provision the service, define tables, load data, and then run queries.

 

Technology Stack Overview

 

To perform data loads in ADWC, Oracle offers DIPC.  DIPC is a cloud-based unified platform that offers batch data processing, real-time data streaming, enterprise data quality, and data governance capabilities.  The batch data-processing technology that DIPC uses is ODI, the Oracle EL-T (Extract-Load and Transform) data integration tool.  To perform self-service data discovery, predictive analytics, and business intelligence reporting in ADWC, Oracle offers Oracle Analytics CloudFigure 1, below, shows these three Oracle cloud services:  DIPC, ADWC, and OAC.  In this example, DIPC, specifically ODI, is used to perform batch data loads in ADWCS.  OAC is then used for business intelligence (BI) analysis and end-user ad hoc queries.

 

 

Figure 1 - Data Integration Platform, Autonomous Data Warehouse, and Analytics Cloud

Figure 1 – Data Integration Platform, Autonomous Data Warehouse, and Analytics Cloud

 

ODI offers Knowledge Modules (KMs) to perform data loads, data transformations, and data validation.  KMs are code templates designed to perform a specific task in the overall data integration process.  KMs are technology-specific, and they are optimized for each of the technologies supported by ODI.  Out-of-the-box, ODI provides numerous KMs to extract, load, and transform data in the Oracle database.  Since ADWC is based on the Oracle database, ODI users can design workloads and use the Oracle KMs to perform data loads in ADWC.  KMs fall into six categories:  Reverse-Engineering, Check, Loading, Integration, Journalizing, and Service.  This article addresses the Integration KMs that provide the best performance options when performing the initial data load of a data warehouse in ADWCS with ODI.

The examples discussed in this article can also be used with other flavors of ODI, including Oracle Data Integrator (ODI) 12c, and Oracle Data Integrator Cloud Service (ODICS).  This article is one of three articles that discuss best practices when using ODI with ADWCS:

Oracle Data Integrator Best Practices: Performing Incremental Updates in the Oracle Autonomous Data Warehouse Cloud

Oracle Data Integrator Best Practices: Loading External Datasets into the Oracle Autonomous Data Warehouse Cloud

 

Using ODI to Perform Initial Data Loads in ADWC

 

The initial data load of a data warehouse in ADWC includes populating warehouse tables such as dimensions, facts, and aggregates.  The sources for these warehouse tables are usually located on on-premises data sources, other database cloud services, or on cloud object stores such as Oracle Object Store or Amazon S3.  A landing zone or staging area is usually created in ADWC to load and transform data from these data sources before inserting data into the ADWC data warehouse.  Using ODI, users can populate data from these data sources into this staging area using Oracle tools such as SQL*Loader, Data Pump, and DBMS_CLOUD Utilities, among others.  This staging area can then be used as the source data area for the initial data load of a data warehouse in ADWC.  When performing an initial data load of a data warehouse in ADWC, ODI recommends two IKMs:  IKM Oracle Insert, and IKM Oracle Multi-Insert.  These IKMs support data insert operations to load data directly into a single table or into multiple tables.  Table 1, below, shows these two IKMs:

 

 

Table 1 - Integration Knowledge Modules for Initial Data Loads in ADWCS

Table 1 – Integration Knowledge Modules for Initial Data Loads in ADWC

 

For a single table insert operation, the IKM Oracle Insert offers the best performance option when executing an initial data load from an ADWC source dataset into an ADWC target table.  The IKM Oracle Insert uses the INSERT INTO clause described on the Oracle Database SQL Language Reference Guide to specify the target table into which the database is to insert data.  The source and the target tables can be in the same schema or in different schemas of an ADWC instance.  By default, the IKM Oracle Insert offers options to specify database hints such as APPEND, so users can tell the ADWC optimizer to perform a direct-path insert operation.  Figure 2, below illustrates an example.

 

 

Figure 2 - Initial Data Load of a Dimension Table

Figure 2 – Initial Data Load of a Dimension Table

 

When executing the initial data load of a fact table, the IKM Oracle Insert also offers the best performance option.  This IKM can perform the fastest initial data load of a fact table.   ODI supports ANSI SQL and Oracle SQL syntax; therefore, the IKM Oracle Insert can generate SQL code on either syntax.  Additionally, ODI supports inner and outer joins conditions; thus, users can perform the initial data load of a fact table that requires outer join conditions with late arriving dimensionsFigure 3, below, shows the code generated by the IKM Oracle Insert for the initial data load of a fact table in ADWC.  In this example, the user has selected to use the Oracle SQL syntax, and both the customer dimension (w_customer_d) and the product dimension (w_product_d) are late arriving dimensions.

 

 

Figure 3 - Initial Data Load of a Fact Table

Figure 3 – Initial Data Load of a Fact Table

 

For the initial data load of an aggregate warehouse table, ODI recommends to use the ODI Aggregate Component in conjunction with the IKM Oracle Insert.  This combination provides the best performance option when executing an initial data load of a warehouse aggregate table in ADWC.  The following example on Figure 4, below, illustrates how ODI aggregates data using the ODI Aggregate Component and the IKM Oracle Insert:

 

 

Figure 4 - Initial Data Load of an Aggregate Table

Figure 4 – Initial Data Load of an Aggregate Table

 

In this example, illustrated on Figure 4, above, the aggregate table (w_cust_summary_a) is populated using three source tables:  a fact table (w_orders_f), and two dimension tables (w_customer_d and w_date_d).

For a multi-table insert operation – where multiple rows can be added into one or multiple tables with a single INSERT statement – the IKM Oracle Multi-Insert provides the best performance option.  The IKM Oracle Multi-Insert uses the INSERT ALL clause to insert data into one or multiple tables using only one SQL statement.  Figure 5, below, illustrates an example.

 

 

Figure 5 - Initial Data Load of a Multi-Table Insert Operation

Figure 5 – Initial Data Load of a Multi-Table Insert Operation

 

Figure 5, above, shows how the IKM Oracle Multi-Insert generates an INSERT ALL SQL statement to perform a multi-insert operation for two target tables:  w_large_orders_f, and w_small_orders_f.  The multi-insert operation selects data from a single source dataset:  w_orders_f.  Small orders with a revenue amount less than 100 are inserted into w_small_orders_f, and large orders with a revenue amount greater than 2500 are inserted into w_large_orders_f.

 

 

Setting the Cache Size of an Oracle Database Sequence for an Initial Data Load

 

In a data warehouse, dimensions require unique surrogate keys in order to maintain referential integrity between dimensions and facts tables.  These surrogate keys are usually populated when the dimensions are populated with new records.  It is a common practice to use Oracle database sequences to populate the surrogate keys of warehouse dimensions.  During the initial data load of a large warehouse dimension, if the surrogate key of the dimension is populated with an Oracle sequence, it is recommended to set the cache size of the Oracle sequence to a large size.  This will significantly improve the initial data load of a large dimension.  The cache size can then be adjusted for incremental-update operations.  By default, when an Oracle sequence is created in ADWC, the cache size of the sequence is set to 2.

The following example shows how to create an Oracle sequence for a large dimension in ADWC.  The cache size for this sequence is 50,000.  The initial data load of this dimension is 3 million rows:

 

CREATE SEQUENCE “SEQ_W_CUSTOMER_D” MINVALUE 1000 MAXVALUE 9999999999999 INCREMENT BY 1 START WITH 1000 CACHE 50000 NOORDER NOCYCLE ;

Conclusion

This article discusses the best practices that users should follow when using Oracle Data Integration Platform – specifically Oracle Data Integrator – to perform the initial data load of a data warehouse in ADWC.  For more Oracle Data Integrator best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit Oracle A-team Chronicles for Oracle Data Integrator (ODI).”

ODI Related Articles

Oracle Data Integrator Best Practices: Performing Incremental Updates in the Oracle Autonomous Data Warehouse Cloud

Oracle Data Integrator Best Practices: Loading External Datasets into the Oracle Autonomous Data Warehouse Cloud

Oracle Data Integrator (ODI) 12c

Oracle Autonomous Data Warehouse Cloud Service (ADWCS)

Oracle Data Integration Platform Cloud (DIPC)

Oracle Analytics Cloud

Oracle Data Integrator Knowledge Modules (KMs)

Oracle Object Store

Oracle SQL*Loader

Oracle Data Pump

Oracle DBMS_CLOUD Utilities

Oracle Database SQL Language Reference Guide

Data Warehouse Late Arriving Dimensions

Connecting Oracle Data Integrator Studio to the Autonomous Data Warehouse Cloud

Connecting Oracle Data Integrator Studio to the Exadata Express Cloud

Database Cloud Service (DBCS)

Integrating Oracle Data Integrator (ODI) On-Premises with Cloud Services

Using ODI Loading Knowledge Modules on the Oracle Database Cloud Service (DBCS)

Oracle Data Integrator Best Practices:  Using Check-Knowledge Modules On Both On-Premises and on Cloud Services

Oracle Data Integrator Best Practices:  Using Loading Knowledge Modules on Both On-Premises and on Cloud Services

 

Add Your Comment