Oracle Data Integrator Best Practices: Using Check Knowledge Modules on both On-Premises and Cloud Computing

July 21, 2017 | 9 minute read

Validation

Content validated on 4/14/2021 with

  • ODI Version 12.2.1.4.200721.1540

Introduction

 

This article discusses best practices for selecting and using Oracle Data Integrator (ODI) Check Knowledge Modules (CKMs) on both on-premises data servers and cloud data services.  In traditional ETL tools, ETL developers must design steps in each data flow to validate data and remove the records that do not pass validation.  Designing and implementing these validation steps can increase the overall time it takes to implement an ETL project.  In ODI, Check Knowledge Modules (CKMs) are code templates that can be used to validate the source dataset of an ODI mapping during its execution.  CKMs can also validate the content of the datastores in an ODI model.

When using CKMs, ODI developers do not need to implement steps to validate a source dataset - CKMs already provide a predefined set of tasks to perform data validation.  Thus, CKMs can reduce the time it takes to design and implement data validation in an ETL project.  CKMs are recommended in cases where a source dataset may contain invalid records and these invalid records must be removed before or after they are loaded into a target datastore.  CKMs should be used wisely - the steps to validate data may increase the overall time it takes to execute an ODI mapping.

CKMs are one of seven categories of Knowledge Modules (KMs) found in ODI.  The other categories of KMs are: Loading, Reverse-Engineering, Integration, Extract, Journalizing, and Service.  This article focuses on the selection and use of CKMs.  If you would like to learn more about other types of KMs, go to “Oracle Data Integrator (ODI) Knowledge Modules (KMs).”

 

Using Check Knowledge Modules

 

In ODI, users can import metadata from a data server into an ODI model using the ODI Reverse-Engineering process.  The import operation populates the names of the selected datastores – along with their attributes – from the data server into the ODI model.  If the datastores from the data server contain data constraints such as keys, references, and conditions, then these data constraints are imported into the ODI model as well.  In the ODI model, users can define additional data constraints, and these data constraints do not need to physically exist in the data server – they can exist in the ODI repository only.  When an ODI mapping is executed, the CKM uses the data constraints of the target datastore to enforce data integrity.  CKMs can also be used to validate and enforce data integrity in an ODI model.  The next sections of this article discuss how to use CKMs in both ODI mappings and ODI models.

 

Using Check Knowledge Modules in ODI Mappings

 

When an ODI mapping is executed, a flow-dataset is created to hold the selected records from the source datastores of the mapping.  The flow-dataset can be stored in a temporary table or it can be uploaded directly into the target datastore of the mapping.  If the flow-dataset is stored in a temporary table, a CKM can be used to validate the content of the temporary table.  This data validation occurs before data is loaded into the target datastore.  This strategy is known as Flow Control.    However, if the flow-dataset is loaded directly into the target datastore, then a CKM can be used to validate the target datastore instead.  This data validation occurs after data is loaded into the target datastore.  This strategy is known as Static Control.

In an ODI mapping, CKMs are assigned to target datastores - one CKM per datastore.  CKMs are optional and should be assigned to target datastores only if data validation is required.  CKMs are assigned in the Physical Design section of an ODI mapping, specifically in the Properties section of the target datastore.  Figure 1 below shows the physical design of an ODI mapping:

 

 

Figure 1 - Physical Design of an ODI Mapping


 

In this example (Figure 1 above), the physical design of the mapping contains a single target datastore, W_ORDERS_F.  This target datastore is an Oracle table that requires data validation before data is loaded into it.  The table has data constraints such as a primary key, foreign keys, and check constraints.  Thus, a CKM can be used to ensure that the records found in the flow-dataset of the mapping meet the constraints of this target datastore.  Since this target datastore is an Oracle table, the CKM Oracle is used for this data validation.  Figure 2 below shows the KM options for the CKM Oracle:

 

Figure 2 - Check Knowledge Module for the Oracle Technology


The CKM Oracle, shown in Figure 2 above, checks that records found in the flow-dataset of the mapping are consistent with the defined constraints of the target datastore.  The validation includes primary key, foreign key, and check constraints.  If invalid records are found, they are copied from the flow-dataset into an error table.  The invalid records are then removed from the flow-dataset, and only valid records are loaded into the target datastore.  The error table is created during the execution of the mapping.
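To illustrate what happens under the covers, the following is a simplified sketch - not the actual generated code - of the kind of SQL a CKM produces during Flow Control.  The flow table I$_W_ORDERS_F, the error table E$_W_ORDERS_F, and the ORDER_ID primary key column are assumed for the example:

    -- Copy rows that violate the primary key from the flow table into the error table
    INSERT INTO E$_W_ORDERS_F (ORDER_ID, ERR_MESS)
    SELECT ORDER_ID, 'Primary key PK_W_ORDERS_F violated'
    FROM   I$_W_ORDERS_F
    WHERE  ORDER_ID IN (SELECT ORDER_ID
                        FROM   I$_W_ORDERS_F
                        GROUP  BY ORDER_ID
                        HAVING COUNT(*) > 1);

    -- Remove the invalid rows from the flow table so that only valid rows
    -- are loaded into the target datastore W_ORDERS_F
    DELETE FROM I$_W_ORDERS_F
    WHERE  ORDER_ID IN (SELECT ORDER_ID FROM E$_W_ORDERS_F);

A similar pair of statements is produced for each constraint defined on the target datastore.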

 

Tip 1 - Using Check Knowledge Modules in ODI

 

In an ODI mapping, the invocation of the CKM is controlled by the Integration Knowledge Module (IKM) assigned to the target datastore.  The IKM must offer one of two options in order to invoke the CKM: FLOW_CONTROL and STATIC_CONTROL.  These two options are available in a select number of IKMs.  The FLOW_CONTROL option performs data validation against the flow-dataset that has been stored in a temporary table.  The temporary table, known as the I$ table, is created by the IKM when this option is selected.  The STATIC_CONTROL option loads the flow-dataset directly into the target datastore; the I$ table is not created with this option.  This option validates data against the target datastore once data is loaded into the table.  Figure 3 below shows these options in the IKM Oracle Incremental Update:

 

 

Figure 3 - IKM Flow Control Option and Static Control Option


 

By default, the FLOW_CONTROL option is enabled in this IKM, as shown in Figure 3 above.  Thus, when using this IKM, a CKM must be selected.

 

Using Check Knowledge Modules in ODI Models

 

CKMs can also be used to validate and remove invalid records from datastores in an ODI model.  When using a CKM to check an ODI model, the CKM uses the error table created by the ODI mapping during execution to look for invalid records.  For each invalid record found in the error table, the CKM searches for the same record in the datastore where it was originally inserted; if it is found, it is deleted.  Figure 4 below shows how to select and use a CKM in an ODI model: first, select the Control option; then, select a CKM based on the technology of the ODI model; finally, select the Check Model option to validate the content of the datastores in the ODI model.

 

Figure 4 - Using Check Knowledge Modules in ODI Models

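As a minimal sketch of the removal step described above - using the same hypothetical table and column names as the earlier example - the cleanup amounts to deleting from the checked datastore any record that also appears in its error table:

    -- Delete from the checked datastore the records that were flagged as invalid
    DELETE FROM W_ORDERS_F
    WHERE  ORDER_ID IN (SELECT ORDER_ID FROM E$_W_ORDERS_F);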

 

Tip 2 - Using Check Knowledge Modules in ODI

 

Check Knowledge Modules Best Practices

 

CKMs are useful knowledge modules that can be used to enforce data integrity during the execution of an ODI mapping.  CKMs can also be used to validate data in an ODI model.  However, data validation can add time to the overall execution of an ODI mapping.  Thus, when selecting and using CKMs in ODI, the following best practices should be observed:

  • Before selecting and using CKMs, first ensure that the data constraints of the datastores that require data validation have been added into the ODI model.  Data constraints can be added into an ODI model using the ODI Reverse-Engineering process, or manually by the user.
  • Use a CKM if the flow-dataset of a mapping requires data validation and invalid records cannot be inserted into the target datastore of the mapping.  Also, use a CKM if the target datastore of the mapping allows for invalid records, but the invalid records must be removed after the data upload operation.
  • Use a CKM to prevent mapping-execution failures due to invalid records.  Use a CKM if the target datastore of the mapping has data constraints and invalid records cannot be loaded – ODI mappings that follow this practice should never fail due to invalid data.
  • Use additional data validation techniques such as data conversion functions to capture and fix invalid data during the execution of an ODI mapping.  These functions may already be available in your technology, so a CKM may not be needed to validate data.  For instance, Oracle offers a variety of SQL functions that can be used to capture and fix invalid data, such as the NVL function, which replaces null values with valid values.  Figure 5 below shows an example:

 

Figure 5 - Fixing Data Value Issues with the ODI Expression Editor


 

  • In this example (Figure 5 above), the Oracle NVL function is used to validate and fix invalid values for three attributes: ADDRESS, CITY, and STATE.  The NVL function replaces a null value with a user-defined string; here, the replacement string is stored in an ODI variable called DefaultNADescription.  A sketch of these expressions follows the two points below.  This strategy offers two benefits:

1. Invalid records are captured and fixed before creating the flow-dataset of the mapping.

2. If these attributes are the only attributes requiring data validation, then a CKM is not required.
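The expressions behind Figure 5 would look roughly like the following.  The source alias SRC_CUSTOMERS is assumed, and the ODI variable is written in quotes with the # prefix because it is substituted as text at run time:

    -- Target attribute expressions in the ODI Expression Editor (illustrative)
    NVL(SRC_CUSTOMERS.ADDRESS, '#DefaultNADescription')
    NVL(SRC_CUSTOMERS.CITY,    '#DefaultNADescription')
    NVL(SRC_CUSTOMERS.STATE,   '#DefaultNADescription')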

 

 

Tip 3 - Using Check Knowledge Modules in ODI

 

  • Use ODI User Functions to design reusable data validation rules.  ODI user functions improve code sharing and reusability, and they facilitate the maintenance of data validation rules.  When using ODI user functions, the same data validation rule can be used in more than one ODI mapping; a sketch follows this list.  For more information on how to use ODI user functions, go to “Working with ODI User Functions.”
  • CKMs reject invalid records from a source dataset, but they do not fix them.  Create a process to fix and recycle rejected records.  Once the rejected records have been fixed, select an IKM that can recycle the fixed records - a select number of IKMs have options to recycle rejected records from previous mapping executions.  For instance, the IKM Oracle Incremental Update has an option called RECYCLE_ERRORS that can be used to recycle rejected records.
  • CKMs can be used on technologies such as Oracle, Teradata, and Hive.  Some of these technologies are now available as cloud data services, and CKMs can validate data on those cloud services as well.  For SQL databases, ODI offers the CKM SQL, which can be used on both on-premises and cloud databases.  For Oracle, ODI offers the CKM Oracle, which can be used on both on-premises Oracle databases and Oracle cloud database services.
  • When designing ODI mappings to perform data upload operations between on-premises data servers and cloud data services, users can use CKMs to validate data on either platform.  The location of the data validation should be based on performance and available resources.
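As a rough sketch of a reusable validation rule (the function name, syntax, and implementation below are illustrative), an ODI user function could wrap the null-handling expression shown earlier, with an Oracle implementation based on NVL:

    -- User function definition (illustrative)
    --   Name:   NULLVALUE
    --   Syntax: NULLVALUE($(value), $(default))

    -- Oracle implementation of the user function
    NVL($(value), $(default))

    -- Usage in a mapping expression
    NULLVALUE(SRC_CUSTOMERS.ADDRESS, '#DefaultNADescription')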

For additional information on how to use ODI check knowledge modules, go to “Using Oracle Data Integrator (ODI) Check Knowledge Modules.”

 

Conclusion

CKMs are code templates that can be used to perform data validation during the execution of an ODI mapping.  CKMs can also be used to validate the content of the datastores in an ODI model.  When using CKMs, ODI developers do not need to implement steps to validate a source dataset - CKMs already provide a predefined set of tasks to perform data validation.  Thus, CKMs can reduce the time it takes to develop and implement data validation rules in an ETL project.  CKMs should be used wisely - the steps to validate data may increase the overall time it takes to execute an ODI mapping.

CKMs offer a comprehensive set of features to validate data on both on-premises data servers and cloud data services.  However, data validation and data quality may require additional data governance.  Consider using other Oracle products and services to perform data management, data profiling, data auditing, and data cleansing.  For additional information on Oracle data quality products, go to “Overview of Oracle Enterprise Data Quality.”

Benjamin Perez-Goytia

