This article illustrates how to implement early arriving facts in Oracle Data Integrator (ODI). This article has two parts. Part I gives you an overview of a proof-of-concept created in ODI to implement an early arriving fact. Part II includes step-by-step instructions on how to implement an early arriving fact using Oracle database functions, ODI lookups, ODI user functions, and ODI custom knowledge modules. Also, Part II focuses on creating a reusable solution in ODI to effectively manage any type of early arriving fact or late arriving dimension.
If you wish to skip Part I of this article and would like to read about Part II, go to "Implementing Early Arriving Facts in ODI, Part II: Implementation Steps".
NOTE: The screenshots were originally designed for ODI 11g, but early arriving facts can also be implemented on ODI 12c using the same methodology described on this blog.
Implementing Early Arriving Facts in ODI, Part I: Proof of Concept Overview
Ralph Kimball, the leading visionary in the data warehouse industry, defines an early arriving fact as follow:
"An early arriving fact takes place when the activity measurement arrives at the data warehouse without its full context. In other words, the statuses of the dimensions attached to the activity measurement are ambiguous or unknown for some period of time."
In Extract-Transform-Load (ETL) terminology, an early arriving fact is when the ETL process performs a look up of a surrogate key in a dimension table using the natural key of a fact table, and no value is returned because the dimension record doesn't exist yet.
An early arriving fact is also known as a late arriving dimension because the dimensional member will arrive after the activity measurement.
In this article, both terms are being used: “early arriving fact” and “late arriving dimension.” But, they both refer to the same data warehouse event.
For example, the term “early arriving fact” will be used in this article in sections where the topic of discussion is a warehouse fact table. However, the term “late arriving dimension” will be used instead to describe activities that relate to a warehouse dimension table.
The best way to learn how to implement an early arriving fact in ODI is to develop a proof-of-concept, also known as a POC.
The POC for this article includes two main components: ODI repository and two database schemas. The database schemas include sample data, which can be used for testing purposes. If you would like to download a copy of this POC, please go to “ODI Repository Sample for Early Arriving Facts”.
The following sections describe all the components of this POC.
The data warehouse environment for this POC is divided into two main areas: the data mart and the staging area. The data mart is a star schema with 3 dimensions and 1 fact table. The staging area is the source data area for dimensions and fact tables.
Figure 1: Warehouse ETL Process Flow
The data mart for this POC includes three dimensions: customer, product, and status. There is only one fact: orders.
The customer dimension, defined in the sample database schema as W_CUSTOMER_D, is a Type-2 dimension. Type 2 dimensions are also known as slowly changing dimensions: they store and manage both current and historical data over time. This dimension is also a late arriving dimension: a delay on adding a new customer or updating an attribute of an existing customer may occur in this dimension. Also, it is possible that facts containing new customers may arrive in the data warehouse before adding the new customers into this dimension.
The product dimension, defined in the sample database schema as W_PRODUCT_D, is a Type-1 dimension. No history is being maintained about products; new products get added and existing data gets updated with changes from the source system. This dimension is a late arriving dimension, too. When new products are added in the source system, facts containing the new products may arrive in the data warehouse before adding the new products into this dimension.
The status dimension, defined in the sample database schema as W_STATUS_D, is a Type-1 dimension. This dimension is always current with data from the source system. No delays are expected between source system changes and the content of this dimension, and no early arriving facts are expected in regards to this dimension.
The orders fact contains measured activity about customer orders. This fact is considered an early arriving fact; its activity measurement may include customers and products that have not yet been added into the customer and product dimensions respectively.
The staging area includes 4 staging tables: customer, product, status, and orders.
Customer is the source table for the Customer Dimension. Product is the source table for the Product Dimension, and status is the source table for the Status Dimension. The Orders table is the source for the Orders Fact. The Orders table has the early arriving facts.
The ODI Repository consists of the following metadata:
Part II of this article will discuss in greater detail how to build the Warehouse.W_ORDERS_F interface. Figure 2 shows the interfaces used for this ODI project:
Figure 2: ODI Interfaces
This ODI project includes a series of ODI variables to manage default values such as dimension flags, dimension default values, and default date formats. Most of the variables are refreshed by selecting the corresponding default value from a table called W_DEFAULT_VALUES.
There are 2 ODI user functions for this project: LATE_CUSTOMER and LATE_PRODUCT. These user functions will be used in the mapping of the fact interface. Part II of this article dedicates a section to showing how to create these functions and build expressions with them in the fact interface.
ODI Knowledge Modules
Two ODI Integration Knowledge Modules (IKM) are used for this project:
Figure 3 shows the ODI variables, user functions, and knowledge modules used for this project.
Figure 3: ODI variables, functions, and knowledge modules
There are three ODI models: Dimensions, Facts, and Staging. Dimensions and Facts are both using the same logical schema called “Warehouse”. Model Staging uses a logical schema called “Staging”.
Figure 4 shows the organization of the ODI Models.
Figure 4: ODI Models
The ODI topology contains one physical architecture: Oracle. There is only one data server: Warehouse. The Warehouse and Staging areas are both in the same physical database, so only one ODI data sever is required.
Figure 5: ODI Topology Manager
An early arriving fact is a common predicament in a data warehouse. In this article, we covered the main components of a proof-of-concept (POC) created in ODI to address this issue. I strongly recommend downloading a copy of this POC, which can be found at this location: “ODI Repository Sample for Early Arriving Facts”. Part II of this article demonstrates how to implement some of the components of this POC. It is a great way to understand how to use ODI to address this issue and create a reusable solution for any type of early arriving fact and late arriving dimension. If you would like to read more about this subject, please go to "Implementing Early Arriving Facts in ODI, Part II: Implementation Steps".
For more ODI best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit “Oracle A-Team Chronicles for ODI”.