* This blog was last tested on Oracle Analytics Cloud Data Sync (2.6.1) by Jay Pearson (Oracle A-Team) + ADW 19c *
The Data Sync tool provides the ability to extract from both on-premise and cloud data sources, and other relational databases.
In the past, if you needed to transform the data as part of the load process, the only options were to do it in a SQL statement against the source database, or as a post-session command, or procedure against the target database once the data had been loaded.
Since Data Sync 2.3, functionality has been added which provides the ability to transform the data as part of the load process. This is particularly useful when the source database does not support transformations, or has limited transformation ability - such as flat files or SaaS sources.
The process uses the target Oracle database as the transformation engine, which means all Oracle functions can be leveraged as part of the data transformation and preparation process. Existing data attributes can be modified, and new data attributes added.
There are some caveats:
- The Target database must be an Oracle Database. For example a Public Oracle Database in OCI such as Autonomous databases (ADB) or DBaaS. It does not work with the legacy/limited DBCS-Schema (BICS) database, nor does it support non-Oracle database targets.
- DVCS datasets are also not supported.
Limitations aside, if the target is DBaaS or an Oracle database, then this new functionality adds a lot of options for transforming data and makes Data Sync a much more powerful tool.
This article will walk through steps to set up some example transformations.
In a related article, this A-Team blog walks through the process of using similar functionality to perform Lookups during a load, which provides the ability to enhance the data being loaded by including existing primary and foreign keys, or other lookup information, from the target. A link to that article is here.
Be sure to download and install the latest version of the Data Sync Tool from OTN through this link.
Data Sync requires JDK8. You can download that through this link.
(Note - customers still using the legacy/limited DBCS-Schema database, can not use this version of Data Sync. The version for BICS is available to download here, but does not have all the functionality of the OAC / OAAC version.)
For further instructions on configuring Data Sync, see this article. If a previous version of Data Sync is being upgraded, use the documentation on OTN.
Data Sync makes use of the target Oracle database to do the transformations. If the transformations are simple - for instance replacing null values with a default value, then this is done as part of the insert. If the transformation includes SQL expressions, then a temporary table is created in the target database and the original data loaded there. Then a SQL expression is run on the target database to transform the data and to load the true target table.
To improve performance, the data is loaded in small chunks (10,000 rows is the default). These chunks are loaded and transformed into the target, and then the next rows are loaded. In testing, this approach was shown to improve overall load times, sometimes considerably.
In this way Data Sync adds true ELT functionality - 'Extract Load and Transform'.
This article will walk through 2 different scenarios.
1. Updating the value of an existing column in the source data so that it is transformed as it is loaded into the target database
2. Adding new target fields that did not exist in the source, but were instead created from calculations based on one or more of the source columns.
In this example a file based csv file is the source, but the steps are similar for other data sources. The file used is available for download here for testing (Sample_File.csv)
1. Open the file and examine the data.
In a text or spreadsheet editor, open the Sample_File.csv. Notice the company name is lower case. In this first example we are going to able the SQL command 'initcap' to change the data to have the initial letter of each word capitalized. This is a very simple use case, but more complex SQL expressions could be just as easily be used.
2. Set up the Source Table
a. Under Project -> Data Loads -> New -> Add new source object -> OK -> Select Source = File source -> OK -> Choose a file -> Browse file location -> Next -> Provide Logical Name.
b. Take the default values in the 'Import Options' and then 'next'.
c. Give the target table a suitable name.
3. Create the Transformation
a. Under 'Project' -> Data Loads -> select sample_file data -> Column Mapping tab
b. In the 'Target Expression' column for the 'Company' column, select the small icon at the far right of the column (as shown below) which will open up the 'Expression' editor.
You can key directly in the SQL here, or you can use the Expression editor to select the columns and generate the expression that way.
for instance - you could key this in directly:
or use the expression editor to bring in the column name:
Both will work
c. Click 'OK' to close the editor, and then the 'Save' menu icon:
We will test this mapping in the next section.
In this example 2 new calculated fields will be created.
The first 'Days_Order_to_Bill' is a calculated field that calculates the number of days between the 'bill_day_dt' and 'order_day_dt' fields in the source file
The second 'Profit_Margin' calculates the profit margin based on the product category and revenue.
1. Create the new Target Columns
a. Under 'Project' -> Data Loads -> Click Target Object -> Target Columns -> Click New
b. Create 2 new fields.
PROFIT_MARGIN as a NUMBER of length 38, with precision of 5
DAYS_ORDER_TO_BILL as a NUMBER, of length 38.
as shown below, then hit 'SAVE' -> Close window
2. Create the ELT Mapping
a. From Project -> Data Loads -> select the file row -> Column Mapping tab -> Click on Manage Target Columns.
b. Select 'Add unmapped column' menu option to present the fields that exist in the Target table, but that are not yet part of the source or ELT.
c. Select the two columns created previously in the 'Available Columns' and then move them over to 'Select Columns', and then 'OK'
This will add the 2 columns to the Column Mapping section.
In this example we are going to build target expressions for both new target columns, but if you merely wanted to pass through one of the source columns into the new target - then you could select that column in the 'Source Column Name' drop down that corresponds to the new target field.
d. To create the first SQL statement, select the elipse in the far right of 'Target Expression' for the 'DAYS_ORDER_TO_BILL' field as shown:
e. Create a transformation that subtracts the ORDER_DAY_DT from the BILL_DAY_DT.
This could be keyed in directly as:
ORDER_DAY_DT - BILL_DAY_DT
or by using the expression editor to create the expression:
"%%ETL_TRANSFORMATIONS"."BILL_DAY_DT" - "%%ETL_TRANSFORMATIONS"."ORDER_DAY_DT"
Both will work. Then select 'OK'
f. Open the expression editor for the PROFIT_MARGIN field, and enter the expression.
In this case we are using a CASE statement to create a calculation of a percentage of the REVENUE based on the value of the Product Line of Business field (PROD_LIB)
g. Click 'OK' to close the editor once complete.
case when "%%ETL_TRANSFORMATIONS"."PROD_LOB" = 'communications' then "%%ETL_TRANSFORMATIONS"."REVENUE" * 0.3
when "%%ETL_TRANSFORMATIONS"."PROD_LOB" in ('digital','electronics','tv') then "%%ETL_TRANSFORMATIONS"."REVENUE" * 0.2
else "%%ETL_TRANSFORMATIONS"."REVENUE" * 0.1
It is possible to use fields only in the 'stage' area and not bring into the target.
In the example above, if we wanted to use the BILL_DAY_DT and ORDER_DAY_DT only for the calculation of DAYS_ORDER_TO_BILL, but did not need those fields persisted in the target, then the 'Stage_Only' check mark in the 'Target Tables / Data Sets' / 'Table Columns' could be used.
The target table would then be built without them.
3. Validate the Transformations
Data Sync can validate the expressions to confirm that the syntax is correct.
a. Highlight the table in 'File Targets' and right click to bring up the available options.
b. Select 'Validate Transformations'
If the expressions are valid, a message similar to the one below will be shown.
If the expressions are not valid, then details will be provided in the log file referenced. In that case, examine the log, then correct the expressions and re-validate.
This log file also contains the SQL that Data Sync will uses to perform the ELT, so can be a useful reference
4. Run the Job
Confirm that the mapping worked as expected.
In this case the 3 transformations created in the first 2 steps are shown below.
The COMPANY name had 'initcap' applied to it as the data was loaded.
New fields, DAYS_ORDER_TO_BILL and PROFIT_MARGIN were created and populated.
In addition to the new ELT functionality, Data Sync also added 3 new variables that can be included as part of a load to add metadata or new functionality to loads. These 3 variables are:
%%DML_CODE - which provides the DML used for each row, being it INSERT or UPDATE. It gets a value of 'I' or 'U' depending on if the record got inserted/updated. The target column to take this value should be CHAR(1) data type. This value can be loaded into the target table in the database and can aid trouble-shooting.
%%UPSERT_TIME - which captures the precise time the target row was updated or inserted. The target column should be of 'DATE' format. Again, this can be loaded into the target table.
%%SURROGATE_KEY - which can be used as a lookup to get a surrogate key, particularly useful in joins. It should be a NUMBER field.
They are available in the expression editor in the dropdown under 'Default':
This article walked through two approaches for creating true ELT transformations with the Data Sync tool. While both examples were simple, far more complex SQL expressions can be used, which gives Data Sync powerful ELT capabilities.
For other A-Team articles about OAC and Data Sync, click here.