Oracle Data Integrator Best Practices: Performing the Initial Data Load of a Data Warehouse in the Oracle Autonomous Data Warehouse Cloud

March 29, 2018 | 7 minute read
Text Size 100%:

Validation

Content validated on 4/14/2021 with

  • ODI Version 12.2.1.4.200721.1540

  • ADW Version Oracle Database 19c Enterprise Edition Release - Production Version 19.5.0.0.0

Introduction

This article discusses the best practices that users should follow when using Oracle Data Integrator (ODI) Marketplace - to perform the initial data load of a data warehouse in the Oracle Autonomous Data Warehouse Cloud (ADW).

ADW is a fully-managed and pre-configured Oracle database cloud service that is tuned and optimized for data warehouse workloads.  As an autonomous cloud service, ADW does not require database administration or database tuning – the service automatically optimizes the database for high-performance queries; thus, users do not need to consider any details about parallelism, partitioning, indexing, or compression.  ADW is built upon the Oracle Database, hence, business intelligence (BI) applications and data integration tools that support Oracle Database also support ADW.  ADW is designed as a load-and-go cloud service – users provision the service, define tables, load data, and then run queries.

 

Technology Stack Overview

 

To perform data loads in ADW, Oracle offers Oracle Data Integrator (ODI) Marketplace.  ODI Marketplace is a cloud-based platform that offers batch data processing capabilities. To perform self-service data discovery, predictive analytics, and business intelligence reporting in ADW, Oracle offers Oracle Analytics CloudFigure 1, below, shows these three Oracle cloud services:  ODI Marketplace, ADW, and OAC.  In this example, ODI is used to perform batch data loads in ADW.  OAC is then used for business intelligence (BI) analysis and end-user ad hoc queries.

 

 

Figure 1 - Data Integration Platform, Autonomous Data Warehouse, and Analytics Cloud

 

ODI offers Knowledge Modules (KMs) to perform data loads, data transformations, and data validation.  KMs are code templates designed to perform a specific task in the overall data integration process.  KMs are technology-specific, and they are optimized for each of the technologies supported by ODI.  Out-of-the-box, ODI provides numerous KMs to extract, load, and transform data in the Oracle database.  Since ADWC is based on the Oracle database, ODI users can design workloads and use the Oracle KMs to perform data loads in ADW.  KMs fall into six categories:  Reverse-Engineering, Check, Loading, Integration, Journalizing, and Service.  This article addresses the Integration KMs that provide the best performance options when performing the initial data load of a data warehouse in ADW with ODI.

This article is one of three articles that discuss best practices when using ODI with ADW:

Oracle Data Integrator Best Practices: Performing Incremental Updates in the Oracle Autonomous Data Warehouse Cloud

Oracle Data Integrator Best Practices: Loading External Datasets into the Oracle Autonomous Data Warehouse Cloud

 

Using ODI to Perform Initial Data Loads in ADW

 

The initial data load of a data warehouse in ADW includes populating warehouse tables such as dimensions, facts, and aggregates.  The sources for these warehouse tables are usually located on on-premises data sources, other database cloud services, or on cloud object stores such as Oracle Object Storage or Amazon S3.  A landing zone or staging area is usually created in ADW to load and transform data from these data sources before inserting data into the ADW data warehouse.  Using ODI, users can populate data from these data sources into this staging area using Oracle tools such as SQL*Loader, Data Pump, and DBMS_CLOUD Utilities, among others.  This staging area can then be used as the source data area for the initial data load of a data warehouse in ADWC.  When performing an initial data load of a data warehouse in ADW, ODI recommends two IKMs:  IKM Oracle Insert, and IKM Oracle Multi-Insert.  These IKMs support data insert operations to load data directly into a single table or into multiple tables.  Table 1, below, shows these two IKMs:

 

 

Table 1 - Integration Knowledge Modules for Initial Data Loads in ADWCS

Table 1 - Integration Knowledge Modules for Initial Data Loads in ADWC

 

For a single table insert operation, the IKM Oracle Insert offers the best performance option when executing an initial data load from an ADW source dataset into an ADW target table.  The IKM Oracle Insert uses the INSERT INTO clause described on the Oracle Database SQL Language Reference Guide to specify the target table into which the database is to insert data.  The source and the target tables can be in the same schema or in different schemas of an ADW instance.  By default, the IKM Oracle Insert offers options to specify database hints such as APPEND, so users can tell the ADW optimizer to perform a direct-path insert operation.  Figure 2, below illustrates an example.

 

 

Figure 2 - Initial Data Load of a Dimension Table

Figure 2 - Initial Data Load of a Dimension Table

 

When executing the initial data load of a fact table, the IKM Oracle Insert also offers the best performance option.  This IKM can perform the fastest initial data load of a fact table.   ODI supports ANSI SQL and Oracle SQL syntax; therefore, the IKM Oracle Insert can generate SQL code on either syntax.  Additionally, ODI supports inner and outer joins conditions; thus, users can perform the initial data load of a fact table that requires outer join conditions with late arriving dimensionsFigure 3, below, shows the code generated by the IKM Oracle Insert for the initial data load of a fact table in ADW.  In this example, the user has selected to use the Oracle SQL syntax, and both the customer dimension (w_customer_d) and the product dimension (w_product_d) are late arriving dimensions.

 

 

Figure 3 - Initial Data Load of a Fact Table

Figure 3 - Initial Data Load of a Fact Table

 

For the initial data load of an aggregate warehouse table, ODI recommends to use the ODI Aggregate Component in conjunction with the IKM Oracle Insert.  This combination provides the best performance option when executing an initial data load of a warehouse aggregate table in ADW.  The following example on Figure 4, below, illustrates how ODI aggregates data using the ODI Aggregate Component and the IKM Oracle Insert:

 

 

Figure 4 - Initial Data Load of an Aggregate Table

Figure 4 - Initial Data Load of an Aggregate Table

 

In this example, illustrated on Figure 4, above, the aggregate table (w_cust_summary_a) is populated using three source tables:  a fact table (w_orders_f), and two dimension tables (w_customer_d and w_date_d).

For a multi-table insert operation – where multiple rows can be added into one or multiple tables with a single INSERT statement - the IKM Oracle Multi-Insert provides the best performance option.  The IKM Oracle Multi-Insert uses the INSERT ALL clause to insert data into one or multiple tables using only one SQL statement.  Figure 5, below, illustrates an example.

 

 

Figure 5 - Initial Data Load of a Multi-Table Insert Operation

Figure 5 - Initial Data Load of a Multi-Table Insert Operation[/caption

 

Figure 5, above, shows how the IKM Oracle Multi-Insert generates an INSERT ALL SQL statement to perform a multi-insert operation for two target tables:  w_large_orders_f, and w_small_orders_f.  The multi-insert operation selects data from a single source dataset:  w_orders_f.  Small orders with a revenue amount less than 100 are inserted into w_small_orders_f, and large orders with a revenue amount greater than 2500 are inserted into w_large_orders_f.

 

 

Setting the Cache Size of an Oracle Database Sequence for an Initial Data Load

 

In a data warehouse, dimensions require unique surrogate keys in order to maintain referential integrity between dimensions and facts tables.  These surrogate keys are usually populated when the dimensions are populated with new records.  It is a common practice to use Oracle database sequences to populate the surrogate keys of warehouse dimensions.  During the initial data load of a large warehouse dimension, if the surrogate key of the dimension is populated with an Oracle sequence, it is recommended to set the cache size of the Oracle sequence to a large size.  This will significantly improve the initial data load of a large dimension.  The cache size can then be adjusted for incremental-update operations.  By default, when an Oracle sequence is created in ADW, the cache size of the sequence is set to 2.

The following example shows how to create an Oracle sequence for a large dimension in ADW.  The cache size for this sequence is 50,000.  The initial data load of this dimension is 3 million rows:

 

CREATE SEQUENCE "SEQ_W_CUSTOMER_D" MINVALUE 1000 MAXVALUE 9999999999999 INCREMENT BY 1 START WITH 1000 CACHE 50000 NOORDER NOCYCLE ;

 

Conclusion

This article discusses the best practices that users should follow when using Oracle Data Integrator Marketplace- to perform the initial data load of a data warehouse in ADW.  For more Oracle Data Integrator best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit Oracle A-team Chronicles for Oracle Data Integrator (ODI).”

Benjamin Perez-Goytia


Previous Post

Functions-as-a-Service - What SOA Should Have Been

Sherwood Zern | 7 min read

Next Post


Intro to Graphs at Oracle

Michael J. Sullivan | 9 min read