Content validated on 4/14/2021 with
ODI Version 12.2.1.4.200721.1540
This article discusses the best practices for selecting and using Oracle Data Integrator (ODI) Check Knowledge Modules (CKMs) on both on-premises data servers and cloud data services. In traditional ETL tools, when an ETL data flow is designed, ETL developers must design steps to perform data validation and remove the records that do not pass validation. The task of designing and implementing steps to validate data can increase the overall time it takes to implement an ETL project. In ODI, Check Knowledge Modules (CKMs) are code templates that can be used to validate the source dataset of an ODI mapping during its execution. CKMs can also validate the content of the datastores in an ODI model.
When using CKMs, ODI developers do not need to implement steps to validate a source dataset - CKMs already provide a predefined set of tasks to perform data validation. Thus, CKMs can reduce the time it takes to design and implement data validation in an ETL project. CKMs are recommended in cases where a source dataset may contain invalid records and these invalid records must be removed before or after they are loaded into a target datastore. CKMs should be used wisely - the steps to validate data may increase the overall time that takes to execute an ODI mapping.
CHKs are one of seven categories of Knowledge Modules (KMs) found in ODI. Other categories of KMs include: Loading, Reverse-Engineering, Integration, Extract, Journalizing, and Service. This article focuses on the selection and use of CKMs. If you would like to learn more about other types of KMs, go to “Oracle Data Integrator (ODI) Knowledge Modules (KMs).”
In ODI, users can import metadata from a data server into an ODI model using the ODI Reverse-Engineering process. The import operation populates the name of the selected datastores – along with their attributes – from the data server into the ODI model. If the datastores from the data server contain data constraints such as keys, references, and conditions, then these data constraints are imported into the ODI model as well. In the ODI model, users can define additional data constraints, and these data constraints do not need to physically exist in the data server – they can exist in the ODI repository only. When an ODI mapping is executed, the CKM uses the data constraints of the target datastore to enforce data integrity. CKMs can also be used to validate and enforce data integrity in an ODI model. The next sections of this article discuss how to use CKMs in both ODI mappings and ODI models.
When an ODI mapping is executed, a flow-dataset is created to hold the selected records from the source datastores of the mapping. The flow-dataset can be stored in a temporary table or it can be uploaded directly into the target datastore of the mapping. If the flow-dataset is stored in a temporary table, a CKM can be used to validate the content of the temporary table. This data validation occurs before data is loaded into the target datastore. This strategy is known as Flow Control. However, if the flow-dataset is loaded directly into the target datastore, then a CKM can be used to validate the target datastore instead. This data validation occurs after data is loaded into the target datastore. This strategy is known as Static Control.
In an ODI mapping, CKMs are assigned to target datastores - one CKM per datastore. CKMs are optional and they should be assigned to target datastores only if data validation is required. CKMs are assigned in the Physical Design section of an ODI mapping, specifically, in the Properties section of the target datastore. Figure 1 below, shows the physical design of an ODI mapping:
Figure 1 - Physical Design of an ODI Mapping
In this example, Figure 1 above, the physical design of this mapping contains a single target datastore, W_ORDERS_F. This target datastore is an Oracle table that requires data validation before data is loaded into this table. This table has data constraints such as primary key, foreign keys, and checks. Thus, a CKM can be used to ensure that the records found in the flow-dataset of the mapping meet the constraints of this target datastore. Since this target datastore is an Oracle table, the CKM Oracle is used for this data validation. Figure 2 below shows the KM options for the CKM Oracle:
Figure 2 - Check Knowledge Module for the Oracle Technology
The CKM Oracle, on Figure 2 above, checks that records found in the flow-dataset of the mapping are consistent with the defined constraints of the target datastore. The validation includes primary key, foreign keys, and checks. If invalid records are found, they are copied from the flow-dataset into an error table. Then, the invalid records are removed from the flow-dataset and only valid records are loaded into the target datastore. The error table is created during the execution of the mapping.
In an ODI mapping, the invocation of the CKM is controlled by the Integration Knowledge Module (IKM) assigned to the target datastore. The IKM must have one of two options in order to invoke the CKM: FLOW_CONTROL, and STATIC_CONTROL. These two options are available in a selected number of IKMs. The FLOW_CONTROL option performs data validation against the flow-dataset that has been stored in a temporary table. The temporary table, known as the I$ table, is created by the IKM if this option is selected. The STATIC_CONTROL option loads the flow-dataset directly into the target datastore. The I$ table is not created with this option. This option validates data against the target datastore once data is loaded into the table. Figure 3, below, shows these options in the IKM Oracle Incremental Update:
Figure 3 - IKM Flow Control Option and Static Control Option
By default, the FLOW_CONTROL option is enabled inthis IKM, as shown on Figure 3 above. Thus, when using this IKM, a CKM must be selected.
CKMs can also be used to validate and remove invalid records from datastores in an ODI model. When using a CKM to check an ODI model, the CKM uses the error table created by the ODI mapping during execution to look for invalid records. For each invalid record found in the error table, the CKM searches for the same invalid record in the datastore where it was originally inserted, and if it is found, it is deleted. Figure 4 below shows how to select and use a CKM in an ODI model. First, select the Control option. Then, select a CKM based on the technology of the ODI model. Select the Check Model option to validate the content of the datastores in the ODI model.
Figure 4 - Using Check Knowledge Modules in ODI Models
CKMs are useful knowledge modules that can be used to enforce data integrity during the execution of an ODI mapping. CKMs can also be used to validate data in an ODI model. However, data validation can add additional time to the overall execution of an ODI mapping. Thus, when selecting and using CKMs in ODI, the following best practices should be observed:
Figure 5 - Fixing Data Value Issues with the ODI Expression Editor
1. Invalid records are captured and fixed before creating the flow-dataset of the mapping.
2. If these attributes are the only attributes requiring data validation, then a CKM is not required.
For additional information on how to use ODI check knowledge modules, go to “Using Oracle Data Integrator (ODI) Check Knowledge Modules.”
CKMs are code templates that can be used to perform data validation during the execution of an ODI mapping. CKMs can also be used to validate the content of the datastores in an ODI model. When using CKMs, ODI developers do not need to implement steps to validate a source dataset - CKMs already provide a predefined set of tasks to perform data validation. Thus, CKMs can reduce the time it takes to develop and implement data validation rules in an ETL project. CKMs should be used wisely - the steps to validate data may increase the overall time that takes to execute an ODI mapping.
CKMs offer a comprehensive set of features to validate data on both on-premises data servers and cloud data services. However, data validation and data quality may require additional data governance. Consider using other Oracle products and services to perform data management, data profiling, data auditing, and data cleansing. For additional information on Oracle data quality products, go to “Overview of Oracle Enterprise Data Quality.”
Previous Post
Next Post