
Best Practices from Oracle Development's A‑Team

Configuring the Data Sync Tool for BI Cloud Service (BICS)

For other A-Team articles about BICS and Data Sync, click here

Introduction

The Data Sync Tool provides the ability to synchronize data from on-premise sources to the BI Cloud Service (BICS).  The tool can source from both relational databases and flat files, and it can be used for one-off loads or incrementally to keep BICS in sync with on-premise data.

This article will cover the installation and configuration steps, and then walk through the creation of data loads including load strategies, scheduling, and monitoring jobs.

 

Main Article

This article can be read in its entirety, or used as a reference guide for a specific function.  The following topics are covered, and the links below can be used to go directly to each topic.

 

Table of Contents

Downloading and Installing

Starting the Tool

Configuration

System Preferences
E-Mail Configuration

Concepts for the Data Sync Tool

Connections

BICS and File Source Connections
New RDBMS Source
DBaaS Instance as Target

Projects

Data from Table
SQL Source
File Source

Load Strategies

Target Table Options
Rolling Delete Days

Running and Scheduling Jobs

Monitoring and Troubleshooting Jobs

Reloading Data

 

Downloading and Installing

The Data Sync tool is available on OTN, download it here.

It is a Java application, and requires a Java 1.7 JDK and JDBC drivers.  A Java Runtime Environment (JRE) is not sufficient.

Once downloaded and unzipped, locate 'config.bat' (Windows) or 'config.sh' (Linux/UNIX).  Open the file in a text editor, and set JAVA_HOME to the path of the JDK on the machine where the application will be run.

Windows (config.bat):

set JAVA_HOME="C:\Program Files\Java\jdk1.7.0_71"

Linux/UNIX (config.sh):

export JAVA_HOME=/u01/app/java/jdk

The Data Sync tool uses JDBC to connect to source databases.  If you intend to load BICS from a relational source, make sure the JDBC drivers match the version of the source on-premise database.  The tool ships with Oracle JDBC version 11.2.x.  If the source database is different, obtain the correct JDBC drivers for that version of the database, copy them into the relevant \lib directory of the Data Sync tool, and overwrite the versions that are there.  The table below lists the driver files that need to be obtained and copied for each database vendor.

 

Vendor: JDBC driver file(s)
Oracle: ojdbc6.jar
MySQL: mysql-connector-java*.jar
Microsoft SQL Server: sqljdbc.jar
DB2: db2java.zip
TimesTen: ttjdbc6.jar, orai18n.jar, timestenjmsxla.jar, jms.jar, javax.jms.jar
Teradata: terajdbc4.jar, log4j.jar, teradata.jar, tdgssjava.jar, tdgssconfig.jar

 

Starting the Tool

To start the Data Sync tool, run the datasync.bat command for Windows, or datasync.sh for Linux.  These commands can be found in the root of the unzipped directory.

The first time the tool is run, a configuration wizard will take you through the setup.  On subsequent runs, the tool recognizes that the wizard has already been run and opens the existing configuration.

 

Configuration

 

System Preferences

The System Preferences panel can be opened from the 'Views' menu.

Within the preferences are a number of options, including the default folder where flat files will be searched for, and the number of log files that will be kept before they are purged.

An explanation for each option is available in the 'Comments' column.


 

E-Mail Configuration

The Data Sync tool can be set up to send e-mail notifications about jobs.  This is done under the Tools / E-Mail menu.

1. Select the 'Recipients' sub-menu and add one or more e-mail addresses.

2. Under the Setup menu, add the details of the e-mail server and connection details.


3. Click the 'Send Test Email' button to confirm the settings.  The test e-mail will be sent to the recipients defined in the first step.

 

Concepts for the Data Sync Tool

The tool has three main sections.

Connections - where the source and target connections are defined.

Project - a workspace where the source files and targets are defined, as well as their attributes.  This is also where the load strategy for each table is defined.

Jobs - a unit of work that can be used to upload the data from one or more sources defined in the project.  A project can have one or more jobs.  This is also where the status of data loads can be found and examined.

Each section can be reached by selecting the corresponding button in the main menu bar:


 

Connections

 

BICS and File Source Connections

There are two default connections: 'TARGET', which is the BICS cloud instance that will be loaded, and 'File Source', which is the connection for file sources.

Do not delete or rename either entry; they are referenced internally by the tool.

The 'TARGET' connection must be edited to point to the BICS cloud instance.  Under Connections, select 'TARGET' and edit the details.  The Connection Type should be left as 'Oracle (BICS)'.  Enter a username and password for a user with the 'Data Loader' application role.  The URL is the same as the URL used to reach the BICS /analytics portal, but without the "/analytics" suffix.


If this URL is not known, it can be located on the Oracle Business Intelligence Cloud Service page within My Services.  Copy that URL, and remove the "/analytics" suffix.


Select 'Test Connection' to confirm connectivity to the BICS target.


 

New RDBMS Source

To add a new RDBMS as a source, select 'New', then name it.  Select the correct connection type for the database vendor, such as an Oracle DB, and enter the connection details.


Test the connection to confirm the settings.

 

DBaaS Instance as Target

A DBaaS instance can be used as the target for the Data Sync tool.  This can be done in two ways.

1. The connection can be changed in the BI Cloud Service Console, by selecting the option to 'Use a different cloud database'.  If this is done, the standard 'TARGET' connection outlined above will automatically load all data into the DBaaS instance.


2. In some circumstances it may be preferable to keep the BICS connection unchanged, but have the Data Sync tool upload selected data to the DBaaS instance.  To do that, create a new 'Source / Target' under the Connections section.  The connection should be of the 'Oracle (Thin)' type, using the Service Name, Host, Port, and a user with rights to create and load tables in the DBaaS instance.


Projects

Once the source and target connections have been set up, the next step is to define the tables, SQL, or files to be used as the sources, and the table(s) that will be loaded in BICS.  This is done under the 'Project' section.

 

Data from Table

To source from a table in the on-premise RDBMS created in the previous section, select the 'Data From Table' option under the Project / Relational Data section.

 


First, select the Data Source where the table is located (1), then click 'Search Tables' to bring up a list of the available tables (2).  Next, select the table or tables to be imported by checking the 'Import' checkbox (3), and finally click the 'Import Tables' button (4).


 

SQL Source

Instead of using a table, it may be preferable to run a SQL statement against the source database.  The data may come from multiple tables, or the source data may be at a more detailed level, so aggregate functions are needed to reduce the data set and improve performance in BICS.

Select the 'Data from SQL' option.  The query requires a logical name, and the results can be persisted either in a new table or in an existing table.  If 'Create a table' is selected, the column definitions are inferred from the SQL structure.  If an existing table is selected, any new columns from the SQL will be added to its list of columns.
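For example, a SQL source could pre-aggregate detailed on-premise data before it is uploaded.  The table and column names below are illustrative, not from the tool itself; this is the kind of query a user might enter:

```sql
-- Illustrative only: aggregate order-line detail so BICS stores
-- one row per customer per month instead of one row per order line.
SELECT customer_id,
       TRUNC(order_date, 'MM') AS order_month,
       SUM(order_amount)       AS total_amount,
       COUNT(*)                AS order_count
FROM   orders
GROUP  BY customer_id, TRUNC(order_date, 'MM')
```

Reducing the data set in this way shrinks both the upload time and the volume of data BICS has to scan at query time.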

 


 

File Source 

A similar process is used for setting up a file source.  Select the 'New' button under the Project / File Data section.

 


Enter the details of the source file location, and then choose the Target Table option.  Again, there is an option to load an existing table, or to create a new one.  Select the delimiter for the file, the timestamp format, and the number of lines the tool will sample to estimate field sizes and types.  If the first row contains headers, indicate that here; if not, the column names will need to be entered on the following screen.  If the file contains duplicates, there is an option to remove them if required.

 


The next screen provides options to choose the fields to be imported from the flat file, as well as some basic data transformations such as changing the case of a field, converting to a number, or trimming spaces.  The tool makes its best guess at each data type by sampling up to the first 10,000 rows of the file, but the user should check that each field has been classified correctly.  Special characters are removed from column headers, and the names are converted to upper case.  Column names longer than 30 characters are truncated.

If the 'Remove Duplicates' option was selected on the previous screen, the 'Update Rows Match' column is used to select the fields that make each record unique.


If everything is set correctly and there are no data errors, a 'Success' notification will be displayed.  If there is a problem, the error message will provide details of the problem.

When errors are encountered, a 'bad' file is created in the log directory with the convention of <logical name>.bad.

The bad file contains the line number of each bad record, the record itself, and the problems encountered while parsing it.

Once the bad records have been identified, they must be fixed in the original file and the process re-run.  The file will only be loaded once there are no errors.

 

Load Strategies

There are multiple load strategies that can be used with the Data Sync Tool.

For Data from Table, or Data from SQL the load strategy can be found and changed within the 'Edit' properties of the Connection under the Project Section of the tool:


For a File source it can be found under the 'File Targets' tab by selecting the 'Load Strategy' option:


The load strategies available are:

Replace Data in Table: this will truncate the target table on each run and then reload it.  If the target table has indexes, they will be dropped prior to the load and recreated afterwards.

Append Data in Table: data will be appended to the target table.  No checks will be done to see if the data already exists in the target.

Update Table: the first time this is run, the target table (if it exists) will be truncated; indexes are removed before the load and recreated afterwards.  There are two options, which can be used independently or together.

1. Add new records - the target will be checked to see if the key exists - if it does not, then the row will be added

2. Update Existing Records - the target will be checked to see if the key exists and an update date in the source system is more recent than the date the job was last run.  If that is the case, the record in the target will be updated.

For the Update Table strategy the tool requires two fields: a unique key and an update date.

In this example, the COUNTRY_ID is the unique key, and the M_LAST_UPDATE_DTTM is the field that the source system uses to specify when that record was last updated.

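Conceptually, the 'Update Table' strategy behaves like the merge below.  This is a sketch only, not the SQL the tool actually issues; the table names, the COUNTRY_NAME column, and the :last_refresh_date placeholder are hypothetical, while COUNTRY_ID and M_LAST_UPDATE_DTTM come from the example above:

```sql
-- Conceptual sketch only; Data Sync does not execute this statement verbatim.
-- :last_refresh_date stands in for the date the job last ran.
MERGE INTO countries_target t
USING countries_source s
  ON (t.country_id = s.country_id)
WHEN NOT MATCHED THEN
  -- Option 1: add new records whose key is not yet in the target
  INSERT (country_id, country_name, m_last_update_dttm)
  VALUES (s.country_id, s.country_name, s.m_last_update_dttm)
WHEN MATCHED THEN
  -- Option 2: update existing records changed since the last run
  UPDATE SET t.country_name      = s.country_name,
             t.m_last_update_dttm = s.m_last_update_dttm
  WHERE s.m_last_update_dttm > :last_refresh_date
```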

There is also a 'Never Delete Data' option.  This prevents the truncate and index removal from running for the 'Replace Data in Table' option, and also on the first run of the 'Update Table' load strategy.

 

Target Table Options

Check the 'Target Tables' section, which has a row for each file or RDBMS table source.  Select a table; in the options below it, the table name, columns, properties, and indexes can be edited if necessary.

If the target tables were created by another process - perhaps from a manual load through the File Data Loader, or through the Apex tool - there may be issues when the Data Sync tool tries to truncate the table and remove and recreate indexes, or with existing constraints on the table in BICS that do not match the constraints set up in the Data Sync tool.

To fix this, have the Data Sync tool drop and recreate the target table itself.  Under the 'Target Tables' section, right-click the table name, and then select the 'Drop/Create/Alter Tables' option.

 


Several options will be presented.  Select 'Create New' and also 'Drop Tables'.  This will drop the table created previously by the other process and recreate it with the tool.

 


 

Rolling Delete Days

If the data set in BICS is large, performance can be improved by purging data that has reached a certain age.  The age is calculated from the date used in the filter for the Incremental Update.  For example, with the value set to 30, any rows updated in the source table more than 30 days ago will be deleted.

 

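The effect of a 30-day rolling delete is equivalent to the statement below.  Again this is a conceptual sketch, not the tool's actual SQL; the table name is illustrative, and M_LAST_UPDATE_DTTM is the update-date column from the earlier example:

```sql
-- Conceptual equivalent of a 30-day rolling delete.
DELETE FROM countries_target
WHERE m_last_update_dttm < SYSDATE - 30
```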

 

Running and Scheduling Jobs

Once the source tables or files and their load strategies have been defined, the job can be run by clicking the 'Run Job' button in the toolbar.

Jobs can also be scheduled to run from the 'Job Schedules' sub tab.

Within the properties of the job, the number of retries can be set.  By default this is null (0).


 

Monitoring and Troubleshooting Jobs

To monitor jobs, select the 'Jobs' tab.


While the job is running, it can be viewed on the 'Current Jobs' tab.  The 'Auto Refresh' option can be set to refresh the status automatically.

Once a job has finished - either successfully, or with an error - it is moved to the 'History' tab.

 


The record at the top shows a summary of the job.  The Tasks tab below it lists one line item per data flow defined by the user, and the Details tab elaborates on those line items.  Highlighting a job in the top section lists the tasks for that job in the bottom section, with details of the rows successfully loaded.

 


Logs are also saved to the file system under the 'log' sub-directory of the BICSDataSync folder.

If a task fails, it may be necessary to re-run the whole job.  Before running it again, right-click the failed job name and select 'Mark As Completed'.


 

Reloading Data

Should it be necessary to do a full reload of BICS, this can be achieved in two ways.

To reload all tables, go to 'Tools / Re-extract and load all data'.  A confirmation box will open.


To reload a single table, go to Connections, then the 'Refresh Dates' tab.  Select the table to be refreshed, and select the 'Re-Extract Data' option.


A further option box will be presented, where either all data or data from a specified date can be reloaded.  Make the appropriate choice and then select 'OK'.  The subsequent run will extract all the chosen data from the source table and reload the target table.

 

Summary

This article walked through the steps to download and configure the Data Sync tool, and to set up and monitor data loads in BICS.  For further information, see the documentation on OTN.

