The Data Sync tool provides the ability to extract from both on-premise, and cloud data sources, and to load that data into BI Cloud Service (BICS), and other relational databases. In the recent 2.2 release of Data Sync, the functionality to run Post Session SQL and Stored Procedures was added. This allows, for instance, the Data Sync tool to call a stored procedure to update summary tables, and materialized views, in the target databases once the underlying data load has been completed.
As of the time of writing, this functionality is only available when the target database is an Oracle DBCS or standalone Oracle database. It does NOT work with the standard BICS Schema Service target database.
This article provides steps for a viable workaround to run post session commands in a Schema Service target.
Be sure to download and install the latest version of the Data Sync Tool from OTN through this link.
Data Sync requires JDK8. You can download that through this link.
(Note - customers still using BI Cloud Service (BICS) and loading into the schema service database, can not use this version of Data Sync. The version for BICS is available to download here, but does not have all the functionality of the OAC / OAAC version.)
For further instructions on configuring Data Sync, see this article. If a previous version of Data Sync is being upgraded, use the documentation on OTN.
Once the main data load has been completed, a single row will be inserted into a status table in the schema service database. That will trigger the stored procedure to be run.
This solution will provide two triggering methods. The choice of which to use will depend on the type of stored procedure that needs to be run once the data load has completed.
The current version of the Data Sync tool does not allow us to control the order that the load steps occur in. This means that we do not have the ability to make sure that the status table - that will trigger the stored procedure - is only loaded after all other table loads are complete.
As a workaround we will use 2 jobs. The first will load the data. Once that finishes, the second job will be triggered. This will load the single row into the status table, and that will trigger the post-load stored procedure to be run.
For this demonstration, a simple target table ‘DS_LOAD_STATUS’ will be created in the Schema Service database with 2 fields – a ‘LOAD_STATUS’ and ‘STATUS_DATE’. The make-up of this table is not important. The main point is that a table needs to exist in the schema service database that can be loaded last. The two different trigger methods will be discussed next, but both will use the existence of a new row in this DS_LOAD_STATUS table to trigger the post session stored procedure.
1. This example SQL can be run in the ‘SQL Workshop’ tool within Apex for the Schema Service database accompanying the BICS environment to create the DS_LOAD_STATUS table.
CREATE table "DS_LOAD_STATUS" (
Two different methods are shown below. Method 2 will work for all cases. Method 1, which is slightly simpler, will work for specific cases.
If the post session stored procedure does not include any DDL statements (for example, truncate, drop, create indexes, tables, etc) - so it is using only 'select', 'insert', 'update' and 'delete' commands - then the simplest method is to create an On-Insert trigger on the status table. When a row is added, the trigger fires, and the stored procedure is run.
In this case, it is assumed that a stored procedure, named ‘POST_SESSION_STEPS’, has already been created.
The following SQL will create the ‘on insert’ trigger against the DS_LOAD_STATUS table so that after a row is inserted, this stored procedure is called.
create or replace trigger "DS_LOAD_TRIGGER_SP"
insert on "DS_LOAD_STATUS"
for each row
If the stored procedure does use DDL statements, then the use of a table on-insert trigger may not run smoothly. In that case a scheduled database job will be created, which will look for a new row in the status table. Once the new row is recognized, this job will execute the post load stored procedure.
Once again it is assumed that a stored procedure named ‘POST_SESSION_STEPS’, has already been created.
This process contains two steps. Firstly a short stored procedure is created which will evaluate a condition - in this case whether a new row has been recently added to the status table - and if the condition is true, it will execute the the main stored procedure.
This SQL below creates this procedure called 'CHECK_POST_SESSION_CONDITION' which will check to see if a new row has been added to the DS_LOAD_STATUS table within the last 5 minutes.
create or replace procedure CHECK_POST_SESSION_CONDITION as
select count (*) from DS_LOAD_STATUS
where STATUS_DATE > sysdate - interval '5' minute; -- checking for row inserted in last 5 minutes
IF V_ROW_COUNT >= 1 THEN
POST_SESSION_STEPS; -- post session procedure
The final step is to create a scheduled job that runs every 5 minutes checking the condition above.
begin CLOUD_SCHEDULER.CREATE_JOB (
JOB_NAME => 'POST_SESSION_DS_LOAD_JOB',
JOB_TYPE => 'STORED_PROCEDURE',
JOB_ACTION => 'CHECK_POST_SESSION_CONDITION', -- run the CHECK_POST_SESSION_CONDITION procedure
REPEAT_INTERVAL => 'freq=minutely; interval=5' ); -- run job every 5 minutes
All remaining steps will be carried out in the environment where data sync is installed.
In this scenario, a Data Sync Job already exists which will load the desired data into the BICS schema service database and is named ‘Main_Load’.
If this job has never been run, run it now. A successful load is important so that the ‘Signal’ file can be created. This is the mechanism that will be used to trigger the second job, that will then load the status table, that will in turn trigger the post-load process.
We need to create a new Project for the second job.
3. Do this by selecting ‘Projects’ from the ‘File’ menu.
4. Chose an appropriate name
In this example, the target table with its trigger was created in the steps 1 and 2. We need to set up this table as a target for data sync to load to.
5. Under ‘Project’, select ‘Target Tables/Data Sets’ and then ‘New’. In the table name enter the exact name of the existing target table - in this case ‘DS_LOAD_STATUS'.
6. Select the 'Table Columns' sub-tab, and enter the column names and correct data types to match what was created in step 1.
We also need to define a source to create the data for this DS_LOAD_STATUS table. If a suitable table already exists in the source database, that may be used. In this example we will base the data on a SQL statement.
7. Under ‘Project’ / ‘Relational Data’ select ‘Data from SQL’. Provide a name for the source, and select to load into an existing target. Use the search drop down to select the ‘DS_LOAD_STATUS’ table created in the step 1. Select the source connection and enter the SQL.
In this case it is a simple select statement that will return one row, with a value of ‘LOAD_COMPLETE’ for the LOAD_STATUS field, and the current time and date, for the STATUS_DATE.
sysdate as STATUS_DATE,
‘LOAD_COMPLETE’ as LOAD_STATUS
8. Select the newly created source, and then edit the Load Strategy. In this case, because it’s a status table, we have chosen to always append the new row, and never delete existing data.
9. Give the Job a suitable name in the ‘Jobs’ / ‘Jobs’ menu area, and then ‘Run’ the job.
Make sure the job runs successfully before continuing.
The Data Sync tool creates ‘Signal’ files whenever a job starts and successfully finishes. These files are stored in the /log/jobSignal sub-directory. Take a look in this directory.
In our case we see 4 files, as this image shows. The important one for our purpose is the one that shows when the Main_Load job has completed. In this case that Signal File is named ‘Main_Load_CompletedSignal.txt’. This is the file we will have Data Sync check for, and when it finds it, it will trigger the second job.
To set up Data Sync to automatically trigger a job, we need to edit the ‘on_demand_job.xml’ file in the /conf-shared directory.
10. Open this file with a text editor.
11. An entry needs to be added to the <OnDemandMonitors> section.
The syntax is:
<TriggerFile job=$JOB_NAME file=$FILE_TO_TRIGGER_JOB></TriggerFile>
In this example the full syntax will be:
<TriggerFile job="POST_LOAD_JOB" file="C:\Users\oracle\Desktop\BICSDataSync_V2_2\log\jobSignal\Main_Load_CompletedSignal.txt"> </TriggerFile>
12. Change the pollingIntervalInMinutes to the desired check interval. In this case we set it to 1, so that Data Sync will check for the existence of the Signal file every minute. The entry should look similar to this.
13. Save the updated on_demand_job.xml
14. Test the process is working.
Re-Open the Original Project and run the Main_Load job. Monitor the jobSignal directory. Shortly after finishing the Main_Job, the Signal file, in this case ‘Main_Load_CompletedSignal.txt’ is found. The Data Sync tool deletes the file so that the process will not run again, and starts the POST_LOAD_JOB created in step 9.
15. As an additional check, go to the schema service database in Apex, and confirm that the DS_LOAD_STATUS table has had a new entry added, and that the ‘post-load’ stored procedure has been successfully run.
This article walked through an approach to run a post-load stored procedure with the Data Sync tool and a schema service database target.
For other A-Team articles about BICS and Data Sync, click here.