Implementing Early Arriving Facts in ODI, Part II: Implementation Steps

Introduction

This article includes step by step instructions on how to implement an early arriving fact using database functions, ODI lookups, ODI user functions, and ODI custom knowledge modules.  In this article, I focus on creating a reusable solution in ODI that can be utilized for other early arriving facts in your data warehouse.

This is Part II of a two-parts article.  Part I gives you an overview of the proof-of-concept (POC) I created in ODI to implement an early arriving fact.  To read about Part I, go to “Implementing Early Arriving Facts in ODI, Part I: Proof of Concept Overview“.  To download a copy of the Proof of Concept (POC) I created for this article, please go to  “ODI Repository Sample for Early Arriving Facts”.

Implementing Early Arriving Facts in ODI, Part II: Implementation Steps

Implementation Steps

My POC includes the following implementation steps:

  • Creating database functions to handle late arriving dimensions – I will demonstrate how to implement these functions using PL/SQL.  I will keep to a minimum the code that is required and let ODI handle the rest of my ETL process flow.  I am proposing the development of 2 functions:  (1) customer function, and (2) product function.  We don’t need a function to handle the status dimension, since we are not expecting late arriving records for this dimension.
  •  Implementing warehouse dimensions as ODI lookups – ODI lookups is one of the best features of ODI 11g.  I am going to demonstrate how to implement all of my dimensions as ODI lookups for my Early Arriving Orders Fact Interface.   For the Customer Slowly Changing Dimension, I will add an additional condition to ensure that my Customer lookup filters data by the active row of each customer.
  • Creating ODI user functions to manage early arriving facts – Another great feature of ODI.  I plan to encapsulate the database functions into ODI function, so I can add additional business rules, and re-use them in any other early arriving fact table.  This section of this article will give you a clear overview of the benefits of using ODI user functions.
  • Mapping Warehouse Keys of the Fact Table with ODI user functions – In this section, I am going to illustrate how to map warehouse keys of the fact table with ODI user functions in our Early Arriving Orders Fact Interface.
  • Modifying “IKM Oracle Incremental Update” – This KM will be modified to make optional step “sub-select inline view”.  This step is not required by the Orders Fact interface.
  • Testing – Finally, we will test our POC, and validate our test cases.

Step #1: Creating database functions to handle Late Arriving Dimensions

Our first step is to create a series of database functions to handle late arriving dimensions.   Since my database technology is Oracle, I will implement these functions using PL/SQL.  My goal is to write the minimum amount of PL/SQL code required to effectively manage the late arriving dimensions.  The rest of the logic will be handled by ODI.  As I explained in part I of this article, there are 2 late arriving dimensions: customer dimension (type 2), and product dimension (type 1).

The logic to handle the missing dimension record is the same.  Each function will take one parameter:  the missing dimensional natural key.  Each function will return one value:  the warehouse key or surrogate key of the new dimension record.  The functions will only be invoked if an early arriving fact record is detected.

Figure 1 illustrates the PL/SQL package definition for our 2 late arriving dimension functions:

Package Late Dims - Definition

Figure 1: PL/SQL Package Definition

 

Figure 2 illustrates the actual implementation of the PL/SQL function that I developed to handle the customer late arriving dimension.  I am using the default table (W_DEFAULT_VALUES) to populate the values, but a list of hard-coded values can be used too.

Customer Fuction

Figure 2: PL/SQL function LATE_DIMS.D_CUSTOMER_DIM

 

The SQL exception of “-00001” is needed in case another interface, running at the same time, inserts the late arriving dimension record.

Step #2: Implementing Warehouse Dimensions as ODI Lookups

ODI Lookups is a great feature in ODI 11g that can be used to implement warehouse dimensions in interfaces where the target datastore is a warehouse fact table.

For more information on how to define Source Datastores and Lookups in ODI 11g, please see:  “Defining Source Datastores and Lookups in ODI

In our discussion, we are going to model our 3 dimensions (Customer, Status, and Product) as source lookups for the Orders Fact interface (Warehouse.W_ORDERS_F).  The Source Datastore Area of our Orders Fact interface will be modeled as follow:

  • One Driving Table:  ORDERS.  This is a staging table, and it is the main source of the Orders Fact table.  It is called the “driving” table.
  • Three Lookup Tables:  Status, Product, and Customer dimensions.

Figure 3 shows how to add the driving table called “ORDERS” (the staging table) into the source area of our Orders Fact Interface:

1. Locate your driving table in the “Models” section of ODI, and drag it into the source area of your interface.
2. In the Target area of your interface, map column ORDER_NUM with ORDERS.ORDER_NUM.
3. In the Source area of your interface, select the Lookup option to add your first lookup table.

Fact Interface - Adding A Lookup - Step 1

Figure 3: Adding the Driving Table

 

The first lookup we would like to create is the Status Dimension.  Figure 4 shows how to add the Status Dimension as an ODI lookup.

1. Locate and select the Status Dimension in the Lookup Table area.
2. Select Next.

 Fact Interface - Adding A Lookup - Step 2

Figure 4: Selecting the Status Dimension as Lookup

 

Proceed to create the Lookup condition as shown in Figure 5:

1. Select ORDER_STATUS from the source table (the driving table).
2. Select STATUS_CD from the Lookup table.
3. Select Join to create the lookup condition.
4. Set Staging as your “Execute On” option.
5. Select “SQL left-outer join in the from clause” as your Lookup Type.
6. Select Finish.

 Fact Interface - Adding A Lookup - Step 3

Figure 5: Lookup Condition

 

Figure 6 shows the Status Dimension Lookup in the source area of the Orders Fact interface.

Fact Interface - Adding A Lookup - Step 4

Figure 6: Status Dimension Lookup

 

Repeat the same steps for the Product Dimension.

When creating a Lookup condition for our Customer dimension, an additional filter must be configured.  Customer is a slowly changing dimension (Type 2), which means we must add an additional condition to filter by the active record of the customer.  In our Customer dimension, the active record of a customer is the row where CURRENT_FLAG is equal to 1.  Figure 7 shows how to add this filter:

1. In the Lookup Wizard screen, manually type the following condition in the Lookup Condition box:

AND W_CUSTOMER_D.CURRENT_FLAG = 1

2. The entire lookup condition must read as follow:

ORDERS.CUST_ID=W_CUSTOMER_D.CUST_ID AND W_CUSTOMER_D.CURRENT_FLAG = 1

Customer Lookup - Current Flag Filter

Figure 7: Adding the Current Flag Filter

 

Figure 8 shows how the lookup tables and the driving table have been modeled in the Orders Fact interface (Warehouse.W_ORDERS_F):

Lookups & Driving Table

Figure 8: Lookup and Driving table for interface “Warehouse.W_ORDERS_F”

 

Step #3: Creating ODI user functions to manage early arriving facts

Now that we have modeled our dimensions as ODI lookups, and implemented database functions to correctly handle late arriving dimensions, the next step is to map the dimension foreign keys of our early arriving fact table.  Also, we need to find an efficient way to invoke the late arriving dimension functions when we encounter early arriving fact records.

In this section, I am going to illustrate how you can build and use ODI user functions to correctly map your fact keys and invoke the late arriving dimension functions when necessary.  We can then use these ODI user functions in other interfaces that need to handle other early arriving facts.

The first step is to construct a database expression that correctly handles the early arriving fact record.  I like to use a “SQL Case statement” to handle this case.  As an example, let’s construct a SQL case statement that invokes the customer late arriving dimension function when an early fact record arrives and the customer does not exist in the customer dimension:

case

     when W_CUSTOMER_D.ROW_WID is null then LATE_DIMS.D_CUSTOMER_DIM(ORDERS.CUST_ID)

     else W_CUSTOMER_D.ROW_WID

end

ROW_WID (the customer surrogate key) is NULL when the customer does not exist in the customer dimension: a case of an early arriving fact record.  In this case, the late arriving customer function is invoked to insert the missing customer.  The function then returns the new customer surrogate key.  If the customer already exists in the customer dimension, then the existing customer surrogate key is used.

Now, let proceed to implement the above SQL case statement as an ODI User function:

1. Select User Functions from the ODI project, right-click, and select “New User Function”.
2. Enter the function Name, and enter a new Group called “Late Arriving Dimensions” (you can call it anything).
3. In the Syntax box, enter the name of the function, and the parameters that the function will accept. In our example, we are using 2 parameters: SurrogateKey, and NaturalKey. Follow the syntax as shown on Figure 9.

LATE_CUSTOMER($(SurrogateKey),$(NaturalKey))

User Function - Customer

Figure 9: Creating a new ODI user function

 

Let’s now implement the actual function as illustrated in Figure 10:

1. Select the “Implementations” tab
2. Select the “+” sign to create a new implementation
3. Enter the SQL case statement as follow:

CASE

     WHEN $(SurrogateKey) IS NULL THEN LATE_DIMS.D_CUSTOMER_DIM($(NaturalKey))

     ELSE $(SurrogateKey)

END

4. Select “Oracle” as the “Linked Technology”. Select any other technology where SQL case statements are supported.
5. Select OK, and save your new ODI user function.

User Function - Customer - Implementation

Figure 10: Implementation Syntax of function “LATE_CUSTOMER”

 

Repeat the previous steps again to create the ODI user function that will be used to map the Product Foreign Key of the Early Arriving Fact table.  Figure 11 illustrates the implementation of this function:

User Function - Product - Implementation

Figure 11: Implementation of function “LATE_PRODUCT”

 

Step #4:  Mapping Warehouse Keys of the Fact Table with ODI user functions

Now, we can use our new ODI user functions to map both the customer foreign key, and the product foreign key of our early arriving fact table.

Figure 12 shows the mapping section of our early arriving fact interface (Warehouse.W_ORDERS_F).  Let’s proceed to map these columns:

1. Select “CUST_WID” column mapping.
2. Select the Expression Editor. We can directly type the expression in the Mapping Property box, but it is easier to construct the expression with the ODI Expression Editor.

Mapping For Customer

Figure 12: Mapping the early arriving fact table

 

Figure 13 shows how to construct your expression with the Expression Editor window:

1. Locate the Project User Functions created in our previous section.
2. Double click on the user function called “LATE_CUSTOMER”. The Expression editor box should be filled with the actual function syntax. Replace the parameter names of the function with the actual column names as follow:

SYNTAX: LATE_CUSTOMER($(SurrogateKey),$(NaturalKey))

EXPRESSION: LATE_CUSTOMER(W_CUSTOMER_D.ROW_WID, ORDERS.CUST_ID)

3. Select Apply.

 Mapping For Customer - Expression

Figure 13: Mapping the Customer Surrogate Key

 

Repeat the steps to map PRODUCT_WID.  Figure 14 shows the mapping of PRODUCT_WID:

Calling Product Function Within Mapping

Figure 14: Mapping the Product Surrogate Key

 

The last fact key we need to map is the STATUS_WID.  Since the Status dimension is not a late arriving dimension, and we don’t expect to see any unknown statuses from early arriving facts, we map W_ORDERS_F.STATUS_WID with W_STATUS_D.ROW_WID.  Figure 15 shows this mapping:

Setting Status Surrogate Key

Figure 15: Mapping the Status Surrogate Key

 

Finally, let’s map the rest of our early arriving fact columns as follow:

1. REVENUE_AMT is our measure, and it should be mapped to: ROUND(ORDERS.PRICE * ORDERS.QUANTITY,2)

2. CREATED_DT, and UPDATED_DT should be mapped to current date and time: SYSDATE.

Figure 16 shows the full mapping of our early arriving fact interface:

Full Mapping

Figure 16: Full Mapping of the Early Arriving Fact Interface

 

Our mappings are now complete.  Let’s run the Early Arriving fact interface and verify how ODI evaluates the ODI user functions.  Figure 17 shows an example of how the “Insert flow into I$ table” step (from the IKM Oracle Incremental Update) looks like when running the Orders Fact interface.  You can see how ODI substitutes the user functions (LATE_CUSTOMER, and LATE_PRODUCT) with the actual code: the CASE statements:

Insert Flow Step Log

Figure 17: ODI user function substitution, IKM step “Insert of the I$ table”

 

Step #5: Modifying “IKM Oracle Incremental Update”

This POC requires the modification of the IKM Oracle Incremental Update.  Here is why we need to modify the KM:

    • In our Order Fact interface, columns CUST_WID, and STATUS_WID have been mapped to ODI user functions LATE_CUSTOMER, and LATE_PRODUCT respectively.  These ODI user functions will invoke the PL/SQL package called “LATE_DIMS”.  This package will execute an INSERT DML operation when a warehouse key is not found in the Customer or Product dimension.
    • In Oracle database, DML (INSERT, UPDATE, and DELETE) operations are not allowed inside SELECT statements.  Step “sub-select inline view” of the IKM Oracle Incremental Update will fail due to this DML operation restriction.  This step builds a sub-select statement for another interface that wants to use the current interface as a source.  In our Orders Fact interface, we don’t need this step.

To properly manage this restriction, we are going to create a new KM option in the IKM Oracle Incremental Update that will allow us to only execute step  “sub-select inline view”  if the KM option is set to “True”.  By default, the option will be set to “True”, but for the Orders Fact interface, the option will be set to “false”.

Create a new option in the KM called “INLINE_VIEW” of type “Check Box” with a default value of “True” as illustrated on Figure 18.
Adding the Inline_View Option in the IKM

Figure 18: INLINE_VIEW option for IKM Oracle Incremental Update

 

Save your new KM Option.  Select the “Details” tab of the KM, and open the “Sub-Select Inline view” step as shown in Figure 19.

Modifying the KM

Figure 19: Modifying Step “sub-select inline view”

 

At the bottom of the step, expand “Options” as illustrated in the Figure 20.  Unselect the “Always Execute” option and only check the “INLINE_VIEW” option.  This mean that the step will only be executed if INLINE _VIEW option is set to “True” (which by default, this is how we implemented the option).  The idea is that the behavior of the KM will not change.  But for our Orders fact interface we are going to set this option to “false”.

Modifying the KM Inline_View Step - Options

Figure 20: Configuring the Execution of the INLINE_VIEW option

 

Finally, Notice that I renamed the KM by adding at the end “(Inline View Option)”.  Save all your changes.

Close and open the Orders fact interface again (Warehouse.W_ORDERS_F).  Go to the Flow tab of the interface, and select the new INLINE_VIEW option, and set it to “false” as shown Figure 21:

Changing the KM Inline_View Option to False

Figure 21: Setting to “false” the INLINE_VIEW option

Save your changes, and proceed to test your interface.

Testing Your Orders Fact Interface

In order to test your early arriving fact, you must download and install the “ODI Repository Sample for Early Arriving Facts” file.  Here are the steps to test:

1. There are 3 records in the MY_STG_AREA.ORDERS staging table.

2. Order “12-234-2344” and “12-230-1111” are early arriving facts:

- Customer #1234 does not exist in the customer dimension

- Product #A201DE does not exist in the product dimension.

3. Figure 22 illustrates the content of the MY_STG_AREA.ORDERS staging table:

Early Arriving Fact Records

Figure 22: Content of Orders Staging Table

 

4. Run the “Warehouse.W_ORDERS_F” interface, and verify the following:

a. The MY_WAREHOUSE.W_ORDERS_F table should contain 3 new records:

Fact with 3 new records

Figure 23: Orders Fact records

 

b. The MY_WAREHOUSE.W_CUSTOMER_D table should have a new record:  the late arriving customer (CUST_ID #1234):

late arriving customer

Figure 24: Late Arriving Customer record

 

c. The MY_WAREHOUSE.W_PRODUCT_D table should have a new record: the late arriving product (product #A201DE):

late arriving product

Figure 25: Late Arriving Product record

 

5. Table MY_STG_AREA.CUSTOMER contains the late arriving customer #1234 with its full attributes.  Also, table MY_STG_AREA.PRODUCT contains the late arriving product #A201DE with its full attributes.

6. Run both “Warehouse.W_CUSTOMER_D (Type 2)” and “Warehouse.W_PRODUCT_D (Type 1)” interfaces, and verify the following:

a. The MY_WAREHOUSE.W_CUSTOMER_D should have a new record for customer #1234 with its full attributes (ROW_WID=134).  The old record (ROW_WID=132) has been historized.

new customer record

Figure 26: New customer record

 

b. The MY_WAREHOUSE.W_PRODUCT_D has been updated.  Product #A201DE has been updated with the correct product line, and product description.

updated product record

Figure 27: Product record update

 

7. Finally, run again the “Warehouse.W_ORDERS_F” interface, and verify that for customer #1234, the customer warehouse ID (CUST_WID) in the fact table has been updated with the latest warehouse ID.  In my example the latest customer warehouse ID is 134:

new fact surrogate key

Figure 28: Fact table with updated customer warehouse IDs

 

Conclusion

An early arriving fact is a common predicament in a data warehouse.  In this article we learned how to address this issue using ODI.  Our solution included a combination of features in both ODI and the database engine.

For more ODI best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit Oracle A-Team Chronicles for ODI.

 

Add Your Comment