The Data Sync tool provides the ability to extract from both on-premise, and cloud data sources, and to load that data into BI Cloud Service (BICS), and other relational databases. In the recent 2.2 release of Data Sync, the functionality to run Post Load SQL and Stored Procedures was added.
Currently this functionality is only available for Oracle DBCS or Oracle DB target databases - it will NOT work for a Schema Service database target - although this article provides details of a workaround when the target is a schema service database.
This article will walk through an example to set up both a post load SQL command, and to execute a stored procedure on the target database.
Be sure to download and install the latest version of the Data Sync Tool from OTN through this link.
Data Sync requires JDK8. You can download that through this link.
(Note - customers still using BI Cloud Service (BICS) and loading into the schema service database, can not use this version of Data Sync. The version for BICS is available to download here, but does not have all the functionality of the OAC / OAAC version.)
For further instructions on configuring Data Sync, see this article. If a previous version of Data Sync is being upgraded, use the documentation on OTN.
This article will present a simple use case that can be expanded for real world load scenarios.
A Post Load Processing session will be set up to run both a SQL statement, and a stored procedure on the target database once the load has completed.
In this example, a summary table will be loaded with a row of data once the underlying fact table has been refreshed in BICS. Because the summary table only exists in the target database, we need to create it as a target in data sync.
1. Under 'Project' / 'Target Tables/ Data Sets', select 'New'
In this example the summary table is called 'AUDIT_EVENT_SUMMARY', and consists of just 2 fields.
An 'AUDIT_RECORD_COUNT' numeric field, and a 'CAPTURE_DATE' date field.
2. Create the fields as shown, then 'Save'
Now that we have the target table defined, we can set up the post-load SQL, and the stored procedure.
1. From 'Project' / 'Post Load Processing' select 'New'
2. Enter an appropriate name, hit 'Save', then select the 'SQL Source Tables' tab
Data Sync offers the ability to execute the SQL and Stored Procedure either at the end of the entire load process, or after the load completion of one or more individual tables. This is controlled within the 'SQL Source Tables' section.
If the post load processing is to be run after all tables have been loaded, then no source tables need to be added. If this section is left empty, then by default the data sync tool will run the post load processing only after all tables are loaded.
If the post load processing can be run after one or more tables have been loaded, then that dependency can be set up here.
3. Select 'Add/Remove' and then the 'Go' search button to generate a list of table sources being used.
In this example we will trigger the load after the fact table ('AUDIT_EVENT_DBAAS') has been loaded.
4. Select the table, then hit 'Add', and finally 'Save' to close out of the screen.
There is a 'SQL Target Tables' tab as well. This is useful if the target table needs to be truncated as part of the update process.
Truncating and reloading tables with indexes and large record volumes can result in performance issues. The data sync tool will handle this by having the target database perform the following steps:
If the target table is always going to be loaded incrementally, then select the 'Truncate for Full Load' check box, else 'Truncate Always'.
For demonstration purposes, we will select our target summary table.
5. Select 'Add/Remove'
6. Select 'Go' to list the available target tables
7. Select the table(s) and 'Add'. Then chose the appropriate option as to whether to 'Truncate Always' or only 'Truncate For Full Load'.
The next steps will be used to define the SQL and Stored Procedure.
8. Select 'OK' to return to the 'Edit' tab, hit 'Save', and then select the radio button within the 'SQL(s)/Stored Procedure(s)' box
9. In the next screen select 'Add', enter an appropriate name, and then select whether this step is to run a 'SQL' statement, or a 'Stored Procedure'. In this first example we will set up a post load SQL command.
10. There is also the option to run this post load process on just an 'Initial Load', an 'Incremental Load' or 'Both'. In this example we select 'Both'.
11. In the section below, as shown, enter the valid SQL statement to be run on the target database. In this case a single row is added to the summary table that we had created previously.
12. Click 'OK' to return to the previous screen.
To create a Stored Procedure follow similar steps. In this example we will set up the post load processing entry to run both the SQL, and a Stored Procedure.
13. Select 'Add', enter a suitable name, and select the 'Stored Procedure' type.
14. Enter the name of the procedure in the entry box. You do not need to type in 'execute' - the data sync tool will take care of that - just enter the name of the stored procedure, then click 'OK' and 'OK' again to exit out of the Post Load Processing set-up.
When the Job is next run, the SQL and Procedure will be run after the fact table has been loaded.
It is possible to set up multiple post load processes, with different dependencies. Each will be run independently once the source tables defined have been loaded.
This article walked through the steps to create a Post Load SQL and Stored Procedure within the Data Sync tool.
For other A-Team articles about BICS and Data Sync, click here.