This article includes step by step instructions on how to implement an early arriving fact using database functions, ODI lookups, ODI user functions, and ODI custom knowledge modules. In this article, I focus on creating a reusable solution in ODI that can be utilized for other early arriving facts in your data warehouse.
This is Part II of a two-parts article. Part I gives you an overview of the proof-of-concept (POC) I created in ODI to implement an early arriving fact. To read about Part I, go to "Implementing Early Arriving Facts in ODI, Part I: Proof of Concept Overview". To download a copy of the Proof of Concept (POC) I created for this article, please go to “ODI Repository Sample for Early Arriving Facts.”
NOTE: The screenshots were originally designed for ODI 11g, but early arriving facts can also be implemented on ODI 12c using the same methodology described on this blog.
Implementing Early Arriving Facts in ODI, Part II: Implementation Steps
My POC includes the following implementation steps:
Our first step is to create a series of database functions to handle late arriving dimensions. Since my database technology is Oracle, I will implement these functions using PL/SQL. My goal is to write the minimum amount of PL/SQL code required to effectively manage the late arriving dimensions. The rest of the logic will be handled by ODI. As I explained in part I of this article, there are 2 late arriving dimensions: customer dimension (type 2), and product dimension (type 1).
The logic to handle the missing dimension record is the same. Each function will take one parameter: the missing dimensional natural key. Each function will return one value: the warehouse key or surrogate key of the new dimension record. The functions will only be invoked if an early arriving fact record is detected.
Figure 1 illustrates the PL/SQL package definition for our 2 late arriving dimension functions:
Figure 1: PL/SQL Package Definition
Figure 2 illustrates the actual implementation of the PL/SQL function that I developed to handle the customer late arriving dimension. I am using the default table (W_DEFAULT_VALUES) to populate the values, but a list of hard-coded values can be used too.
Figure 2: PL/SQL function LATE_DIMS.D_CUSTOMER_DIM
The SQL exception of “-00001” is needed in case another interface, running at the same time, inserts the late arriving dimension record.
ODI Lookups is a great feature in ODI 11g that can be used to implement warehouse dimensions in interfaces where the target datastore is a warehouse fact table.
For more information on how to define Source Datastores and Lookups in ODI 11g, please see: “Defining Source Datastores and Lookups in ODI”
In our discussion, we are going to model our 3 dimensions (Customer, Status, and Product) as source lookups for the Orders Fact interface (Warehouse.W_ORDERS_F). The Source Datastore Area of our Orders Fact interface will be modeled as follow:
Figure 3 shows how to add the driving table called “ORDERS” (the staging table) into the source area of our Orders Fact Interface:
1. Locate your driving table in the “Models” section of ODI, and drag it into the source area of your interface.
2. In the Target area of your interface, map column ORDER_NUM with ORDERS.ORDER_NUM.
3. In the Source area of your interface, select the Lookup option to add your first lookup table.
Figure 3: Adding the Driving Table
The first lookup we would like to create is the Status Dimension. Figure 4 shows how to add the Status Dimension as an ODI lookup.
1. Locate and select the Status Dimension in the Lookup Table area.
2. Select Next.
Figure 4: Selecting the Status Dimension as Lookup
Proceed to create the Lookup condition as shown in Figure 5:
1. Select ORDER_STATUS from the source table (the driving table).
2. Select STATUS_CD from the Lookup table.
3. Select Join to create the lookup condition.
4. Set Staging as your “Execute On” option.
5. Select “SQL left-outer join in the from clause” as your Lookup Type.
6. Select Finish.
Figure 5: Lookup Condition
Figure 6 shows the Status Dimension Lookup in the source area of the Orders Fact interface.
Figure 6: Status Dimension Lookup
Repeat the same steps for the Product Dimension.
When creating a Lookup condition for our Customer dimension, an additional filter must be configured. Customer is a slowly changing dimension (Type 2), which means we must add an additional condition to filter by the active record of the customer. In our Customer dimension, the active record of a customer is the row where CURRENT_FLAG is equal to 1. Figure 7 shows how to add this filter:
1. In the Lookup Wizard screen, manually type the following condition in the Lookup Condition box:
AND W_CUSTOMER_D.CURRENT_FLAG = 1
2. The entire lookup condition must read as follow:
ORDERS.CUST_ID=W_CUSTOMER_D.CUST_ID AND W_CUSTOMER_D.CURRENT_FLAG = 1
Figure 7: Adding the Current Flag Filter
Figure 8 shows how the lookup tables and the driving table have been modeled in the Orders Fact interface (Warehouse.W_ORDERS_F):
Figure 8: Lookup and Driving table for interface "Warehouse.W_ORDERS_F"
Now that we have modeled our dimensions as ODI lookups, and implemented database functions to correctly handle late arriving dimensions, the next step is to map the dimension foreign keys of our early arriving fact table. Also, we need to find an efficient way to invoke the late arriving dimension functions when we encounter early arriving fact records.
In this section, I am going to illustrate how you can build and use ODI user functions to correctly map your fact keys and invoke the late arriving dimension functions when necessary. We can then use these ODI user functions in other interfaces that need to handle other early arriving facts.
The first step is to construct a database expression that correctly handles the early arriving fact record. I like to use a “SQL Case statement” to handle this case. As an example, let’s construct a SQL case statement that invokes the customer late arriving dimension function when an early fact record arrives and the customer does not exist in the customer dimension:
when W_CUSTOMER_D.ROW_WID is null then LATE_DIMS.D_CUSTOMER_DIM(ORDERS.CUST_ID)
ROW_WID (the customer surrogate key) is NULL when the customer does not exist in the customer dimension: a case of an early arriving fact record. In this case, the late arriving customer function is invoked to insert the missing customer. The function then returns the new customer surrogate key. If the customer already exists in the customer dimension, then the existing customer surrogate key is used.
Now, let proceed to implement the above SQL case statement as an ODI User function:
1. Select User Functions from the ODI project, right-click, and select “New User Function”.
2. Enter the function Name, and enter a new Group called “Late Arriving Dimensions” (you can call it anything).
3. In the Syntax box, enter the name of the function, and the parameters that the function will accept. In our example, we are using 2 parameters: SurrogateKey, and NaturalKey. Follow the syntax as shown on Figure 9.
Figure 9: Creating a new ODI user function
Let’s now implement the actual function as illustrated in Figure 10:
1. Select the “Implementations” tab
2. Select the “+” sign to create a new implementation
3. Enter the SQL case statement as follow:
WHEN $(SurrogateKey) IS NULL THEN LATE_DIMS.D_CUSTOMER_DIM($(NaturalKey))
4. Select “Oracle” as the “Linked Technology”. Select any other technology where SQL case statements are supported.
5. Select OK, and save your new ODI user function.
Figure 10: Implementation Syntax of function "LATE_CUSTOMER"
Repeat the previous steps again to create the ODI user function that will be used to map the Product Foreign Key of the Early Arriving Fact table. Figure 11 illustrates the implementation of this function:
Figure 11: Implementation of function "LATE_PRODUCT"
Now, we can use our new ODI user functions to map both the customer foreign key, and the product foreign key of our early arriving fact table.
Figure 12 shows the mapping section of our early arriving fact interface (Warehouse.W_ORDERS_F). Let’s proceed to map these columns:
1. Select “CUST_WID” column mapping.
2. Select the Expression Editor. We can directly type the expression in the Mapping Property box, but it is easier to construct the expression with the ODI Expression Editor.
Figure 12: Mapping the early arriving fact table
Figure 13 shows how to construct your expression with the Expression Editor window:
1. Locate the Project User Functions created in our previous section.
2. Double click on the user function called “LATE_CUSTOMER”. The Expression editor box should be filled with the actual function syntax. Replace the parameter names of the function with the actual column names as follow:
EXPRESSION: LATE_CUSTOMER(W_CUSTOMER_D.ROW_WID, ORDERS.CUST_ID)
3. Select Apply.
Figure 13: Mapping the Customer Surrogate Key
Repeat the steps to map PRODUCT_WID. Figure 14 shows the mapping of PRODUCT_WID:
Figure 14: Mapping the Product Surrogate Key
The last fact key we need to map is the STATUS_WID. Since the Status dimension is not a late arriving dimension, and we don’t expect to see any unknown statuses from early arriving facts, we map W_ORDERS_F.STATUS_WID with W_STATUS_D.ROW_WID. Figure 15 shows this mapping:
Figure 15: Mapping the Status Surrogate Key
Finally, let's map the rest of our early arriving fact columns as follow:
1. REVENUE_AMT is our measure, and it should be mapped to: ROUND(ORDERS.PRICE * ORDERS.QUANTITY,2)
2. CREATED_DT, and UPDATED_DT should be mapped to current date and time: SYSDATE.
Figure 16 shows the full mapping of our early arriving fact interface:
Figure 16: Full Mapping of the Early Arriving Fact Interface
Our mappings are now complete. Let’s run the Early Arriving fact interface and verify how ODI evaluates the ODI user functions. Figure 17 shows an example of how the “Insert flow into I$ table” step (from the IKM Oracle Incremental Update) looks like when running the Orders Fact interface. You can see how ODI substitutes the user functions (LATE_CUSTOMER, and LATE_PRODUCT) with the actual code: the CASE statements:
Figure 17: ODI user function substitution, IKM step "Insert of the I$ table"
This POC requires the modification of the IKM Oracle Incremental Update. Here is why we need to modify the KM:
To properly manage this restriction, we are going to create a new KM option in the IKM Oracle Incremental Update that will allow us to only execute step “sub-select inline view” if the KM option is set to “True”. By default, the option will be set to “True”, but for the Orders Fact interface, the option will be set to “false”.
Figure 18: INLINE_VIEW option for IKM Oracle Incremental Update
Save your new KM Option. Select the “Details” tab of the KM, and open the “Sub-Select Inline view” step as shown in Figure 19.
Figure 19: Modifying Step “sub-select inline view”
At the bottom of the step, expand “Options” as illustrated in the Figure 20. Unselect the “Always Execute” option and only check the “INLINE_VIEW” option. This mean that the step will only be executed if INLINE _VIEW option is set to “True” (which by default, this is how we implemented the option). The idea is that the behavior of the KM will not change. But for our Orders fact interface we are going to set this option to “false”.
Figure 20: Configuring the Execution of the INLINE_VIEW option
Finally, Notice that I renamed the KM by adding at the end “(Inline View Option)”. Save all your changes.
Close and open the Orders fact interface again (Warehouse.W_ORDERS_F). Go to the Flow tab of the interface, and select the new INLINE_VIEW option, and set it to “false” as shown Figure 21:
Figure 21: Setting to "false" the INLINE_VIEW option
Save your changes, and proceed to test your interface.
In order to test your early arriving fact, you must download and install the “ODI Repository Sample for Early Arriving Facts” file. Here are the steps to test:
1. There are 3 records in the MY_STG_AREA.ORDERS staging table.
2. Order “12-234-2344” and “12-230-1111” are early arriving facts:
- Customer #1234 does not exist in the customer dimension
- Product #A201DE does not exist in the product dimension.
3. Figure 22 illustrates the content of the MY_STG_AREA.ORDERS staging table:
Figure 22: Content of Orders Staging Table
4. Run the “Warehouse.W_ORDERS_F” interface, and verify the following:
a. The MY_WAREHOUSE.W_ORDERS_F table should contain 3 new records:
Figure 23: Orders Fact records
b. The MY_WAREHOUSE.W_CUSTOMER_D table should have a new record: the late arriving customer (CUST_ID #1234):
Figure 24: Late Arriving Customer record
c. The MY_WAREHOUSE.W_PRODUCT_D table should have a new record: the late arriving product (product #A201DE):
Figure 25: Late Arriving Product record
5. Table MY_STG_AREA.CUSTOMER contains the late arriving customer #1234 with its full attributes. Also, table MY_STG_AREA.PRODUCT contains the late arriving product #A201DE with its full attributes.
6. Run both “Warehouse.W_CUSTOMER_D (Type 2)” and “Warehouse.W_PRODUCT_D (Type 1)” interfaces, and verify the following:
a. The MY_WAREHOUSE.W_CUSTOMER_D should have a new record for customer #1234 with its full attributes (ROW_WID=134). The old record (ROW_WID=132) has been historized.
Figure 26: New customer record
b. The MY_WAREHOUSE.W_PRODUCT_D has been updated. Product #A201DE has been updated with the correct product line, and product description.
Figure 27: Product record update
7. Finally, run again the “Warehouse.W_ORDERS_F” interface, and verify that for customer #1234, the customer warehouse ID (CUST_WID) in the fact table has been updated with the latest warehouse ID. In my example the latest customer warehouse ID is 134:
Figure 28: Fact table with updated customer warehouse IDs
An early arriving fact is a common predicament in a data warehouse. In this article we learned how to address this issue using ODI. Our solution included a combination of features in both ODI and the database engine.
For more ODI best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit “Oracle A-Team Chronicles for ODI”.