The Oracle Fusion-based SaaS offerings provide an interesting tool to extract data and store it in CSV format on a shared resource like the Fusion SaaS built-in UCM server or a Storage Cloud: BI Cloud Connector (BICC). The extracted data can be copied from these cloud resources and downloaded to local resources before post-processing operations like transformations and loads into Data Warehouses or other databases are initiated. This built-in tool is perhaps not widely known, and for those readers of this blog article who are interested in playing with it, the first challenge might be the download of these data extracts.
With this in mind, I will give a short introduction to this tool, followed by a focus on the file download capabilities for these data extracts. If there is enough interest, I will discuss other features and functions in more detail in future blog postings. For readers who are interested in lower-level details, I've provided some scripts that make downloading the data extracts easier.
The high-level architecture of BICC is shown in this drawing:
As shown above, BICC reads the data from the Fusion Apps database and creates CSV files in a specific technical notation: every CSV file roughly represents a specific database view. With every data extraction run, a snapshot of these objects is written into the corresponding CSV files.
Here are some more bullet points to characterize the extract of data via BICC:
The extracted data can be used to feed Data Warehouses, create Data Lakes, or load selected Fusion Apps objects into a custom database. The figure below gives a more detailed overview of the data extraction before the loader processes are initiated.
The steps above give a brief overview of the extraction process:
The extracted data are formatted as comma-separated CSV files and look like this in a text editor
or look like this after an import into Excel or OpenOffice Calc:
The entry point for using BICC is a SaaS Cloud URL looking like this:
https://<SaaS Cloud Host>:<SaaS Cloud Port>/biacm
A sample URL could look like this: https://xxxx.bi.us2.oraclecloud.com/biacm. It is important to mention that this URL format might vary now and in the future according to the web server configuration of the FA SaaS offerings.
It is worth mentioning that this URL provides access to the BI Configuration pages, which require special administrator privileges. Setting up BI Cloud Extracts is not a regular user's task. This means there are two ways to administer these data extracts:
After successful login the BICC home page will look like this:
The Overview page shows the existing offerings that can be used for data extracts. This information usually covers the product families of a Fusion Apps installation. When pointing the mouse at a single offering's bar, a tooltip shows summary information about the number of data stores.
Configuring and running data extracts is bundled under the menu category Cloud Extract Administration. It provides menu items that allow configuring
Finally, the Preferences menu category provides some options for specific user settings like Language, Accessibility and Regional parameter configuration.
Before running a cloud extract, the scope of such a job must be defined first. By default, no Data Store in any Offering is chosen. This scope setting is global: whenever a batch job runs a data extraction, the settings from this setup are applied. In other words, changing the scope will have an impact on the next run – regardless of whether that run is scheduled weeks or months in the future or is an ad-hoc activity started the next moment.
After choosing a BIA Offering in the upper page region, all associated Data Stores are enabled by default in the dependent lower page frame. This configuration UI allows detailed modifications such as enabling/disabling individual Data Stores, resetting single Data Stores to a full data extract, or disabling the Effective Date. It is also worth mentioning that disabling a BIA Offering and saving will remove all changes previously made for the assigned Data Stores. A subsequent enabling will start with the Data Stores at their default settings.
Save and leave the page after configuring the required data extract. Once done, the global extraction settings are active and the next run will extract data as configured.
In the next step, the data extraction batch job can be scheduled or initiated via the menu item Manage Extract Schedules.
The scheduling options range from Daily via Weekly and Monthly to Annual runs. One more option, called Simple, will run once at a dedicated date and time for ad-hoc data extractions.
These jobs are scheduled by Fusion Apps ESS and might run for a long time depending on the scope (chosen BIA Offerings, number of Data Sources) and the choice of Full vs. Incremental extracts. The resulting CSV files are compressed in ZIP format and stored, according to the Cloud Extract Storage configuration, either on an external Storage Cloud or on the Fusion Apps built-in UCM host. Depending on the scope, the output might become very large – up to dozens of GB.
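Because of this potential output size, it can be worth checking the free space on the receiving filesystem before starting a download. A minimal sketch – the 50 GB threshold is only an assumed example value, not a BICC requirement:

```shell
# Check free space on the target filesystem before starting a download.
# The 50 GB threshold is an assumed example value only.
required_kb=$((50 * 1024 * 1024))
avail_kb=$(df -Pk . | awk 'NR == 2 {print $4}')
if [ "$avail_kb" -lt "$required_kb" ]; then
  echo "warning: only ${avail_kb} KB available"
else
  echo "sufficient space: ${avail_kb} KB available"
fi
```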
The extraction output location is configured via the menu item Configure External Storage. Here a global decision must be made: store the output files on an internal resource that comes with the SaaS offering (UCM), or store them on an external Storage Cloud.
Both choices have their benefits, and the decision for one or the other option might result from the following considerations:
Similar to the Configuration of Extract, this is a global setting. It can be switched at any time and becomes active with the next extraction run.
Via the UI for configuring Storage Cloud as the file destination, the following parameters are collected:
Once the requested data has been entered, the configuration should be tested via the corresponding button. If no issue exists, the configuration must be saved and is valid from then on for all subsequent data extraction runs. It can be changed and switched at any time.
UCM provides a convenient method to store the data extraction files in the SaaS environment itself. The UCM server is part of the technology stack in Fusion Apps and is used for multiple purposes, for example as a staging area for file imports/exports. The configuration via the BICC UI is pretty straightforward and done by entering the following information:
No special user credentials are required for this connection, as the permissions of the logged-in SaaS user account are used for all access activities.
The values for Staging Folder Configuration can't be changed – they are just listed on the page.
After finishing a data extraction, the resulting files are available in the configured Storage Cloud Service and Container. One file set consists of a Manifest file and a CSV file per data source. All files are stored flat (no sub-directories) in the given container and appear that way in a web browser. The Manifest file has the following naming format:
MANIFEST-<Storage Date in format YYYYMMDD-HH24MISS>.MF.
This file contains all the information required to download the related CSV files. Although a download from Storage Cloud could be initiated per file via the client's web browser, that approach is not practicable in reality due to the usually massive number of files.
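To give an idea how the timestamp part can be pulled out of such a Manifest file name in a script, here is a small sketch using plain shell parameter expansion (the sample file name is made up following the format above):

```shell
# Extract the <date>-<time> stamp from a Manifest file name of the form
# MANIFEST-YYYYMMDD-HH24MISS.MF (sample name below is made up).
mf="MANIFEST-20170818-170424.MF"
stamp="${mf#MANIFEST-}"   # drop the leading "MANIFEST-" -> 20170818-170424.MF
stamp="${stamp%.MF}"      # drop the trailing ".MF"      -> 20170818-170424
echo "$stamp"
```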
As an alternative to downloading every file manually, a shell script under Linux will help us here. If Storage Cloud is not mounted via NFS (not handled in this blog), such a script can use the File Transfer Manager Library for Storage Cloud as shown below. For easier use, the Java library has been bundled with the script for download.
The file listing in the Manifest file has the following structure (sample):
PLV_KEY=FUSION_12_0
file_crmanalyticsam_productgroupanalyticsam_productgroupitemassign-batch76017394-20170818_170424.zip;33114078f758038a6ced8e74775905d1
file_crmanalyticsam_productgroupanalyticsam_productgroup-batch76017394-20170818_170424.zip;d56adec1d7830c297b17cf32ad0d5159
. . .
While the first line describes the Fusion Apps version (here 12_0), the rest of the file lists the exported and zipped CSV files in the following notation:
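Splitting such a record into its file name and MD5 checksum parts takes one awk call per field; a small standalone sketch (using a sample record from the listing above):

```shell
# Split one manifest record of the form <zip file name>;<md5 checksum>
line="file_crmanalyticsam_productgroupanalyticsam_productgroup-batch76017394-20170818_170424.zip;d56adec1d7830c297b17cf32ad0d5159"
fileName=$(echo "$line" | awk -F';' '{print $1}')
fileChkSum=$(echo "$line" | awk -F';' '{print $2}')
echo "file: $fileName"
echo "md5 : $fileChkSum"
```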
The download script starts by reading and downloading all Manifest files in a loop. In the current version, the standard download directory is located one directory above the script and is called export. According to the date and time information in the Manifest file name, a new subdirectory export/<date_time> is created if it does not exist yet. If a previous download has completed successfully, there is no need to download those files again; the indicator is the existence of a file export/<date_time>/download.complete. It's important to remember that an initial full data extraction might produce dozens of GB of data, so the receiving filesystem needs enough free space available. In the same context, it's worth remembering that a successfully completed download should not be executed again due to the time and resources it would consume.
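The skip logic described above boils down to a simple marker-file check; a standalone sketch, demonstrated on a throwaway temp directory instead of the real export tree:

```shell
# Sketch of the download.complete marker logic, run against a temp directory.
exportDir=$(mktemp -d)/20170818_170424
mkdir -p "$exportDir"
if [ ! -f "$exportDir/download.complete" ]; then
  echo "downloading into $exportDir"
  # ... actual download would happen here ...
  date +%Y%m%d%H%M%S > "$exportDir/download.complete"
else
  echo "skipping, already downloaded on $(cat "$exportDir/download.complete")"
fi
```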
The script takes its connection information for Storage Cloud from a file config/ftmcli.properties:
#saving authkey
#Mon Aug 21 17:25:31 CEST 2017
segment-size=100
user=<user_name>
service=Storage
identity-domain=xxxxx
auth-key=K7/ZpAUj7Fb/SiVHw1HzAMfnilTUIBkD0yfSqJt2TXk\=
retries=5
storage-class=Standard
auth-url=https\://<oracle_domain>.storage.oraclecloud.com
https=//<storage_console>.us.oraclecloud.com
max-threads=15
The auth-key above must be generated once as described in this documentation, section Setting Up Configuration Files. You should follow the instructions to create a new configuration file and then place it in the sub-directory config. This is a one-time configuration step that must be done before the script can run.
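Since a missing key in this properties file only shows up at runtime, a quick sanity check can save a failed run; a sketch, demonstrated on a generated sample file so it is self-contained:

```shell
# Verify that an ftmcli-style properties file contains the expected keys.
# A sample file is generated here just to make the sketch self-contained.
cfg=$(mktemp)
printf '%s\n' 'user=jdoe' 'service=Storage' 'identity-domain=xxxxx' \
  'auth-url=https\://example.storage.oraclecloud.com' 'auth-key=abc=' > "$cfg"
missing=0
for key in user service identity-domain auth-url auth-key; do
  grep -q "^${key}=" "$cfg" || { echo "missing key: $key"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "configuration looks complete"
```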
The script in detail is shown below. It is worth mentioning that a Java runtime environment must be installed and the necessary environment variables like PATH must point to the java executable. The script does not check these environment variables and will fail if they are not set correctly.
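Because the script itself performs no such checks, a small preflight before the first run can help; a sketch checking the tools the script relies on (the tool list is taken from the script's own commands):

```shell
# Preflight: verify the commands the download script depends on are in PATH.
for tool in java awk md5sum tr; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: NOT FOUND - please fix PATH or install it"
  fi
done
```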
#!/bin/bash
# Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved
#
# ujanke: 2017/08/30 initial release
#
# This script searches for MANIFEST.MF files in Storage Cloud. If a data extraction has been
# previously completely downloaded it will skip those. Files are downloaded into a directory
# ../export/<Data Extraction Date>. After a complete download a file named "download.complete"
# is created in that directory. It is taken as the indicator for a completed download.
# The script reads the content of every downloaded MANIFEST.MF file and downloads the zipped
# data extraction files belonging to this data extraction run. As the MANIFEST.MF file lists
# the MD5 checksum for every single file, this script also compares these values to validate
# the correct download.
#
# Please change the value for variable BIContainer below or change this script to pass the
# container name as a parameter.
#
# Deletion of downloaded data extracts is not handled by this script.
#
export ToolDir=`dirname $0`
export BIContainer=UJ_BIConsole

mkdir -p logs

java -jar ${ToolDir}/ftmcli.jar list --properties-file ${ToolDir}/config/ftmcli.properties ${BIContainer} > logs/fileList
tr -d '\r' < logs/fileList > logs/cleanNames

numRows=`grep -c MANIFEST logs/cleanNames`
echo "Number of Rows: ${numRows}"
if [ ${numRows} -eq 0 ]
then
    echo "No data available! Finishing ..."
    exit 255
else
    echo "Running the download of recent BIACM extracts from Storage Cloud for ${numRows} record(s) ..."
fi

if [ ! -d ../export ]
then
    mkdir ../export
fi
rm -f logs/manifestDownload.txt logs/fileDownload.txt

for maniFest in `grep MANIFEST logs/cleanNames`
do
    currDocDate=`echo ${maniFest} | awk -F\- '{print $2}' | awk -F\. '{print $1}'`
    if [ ! -d ../export/${currDocDate} ]
    then
        mkdir ../export/${currDocDate}
    fi
    if [ ! -f ../export/${currDocDate}/download.complete ]
    then
        echo " >> Saving Files from Extraction Date : ${currDocDate}"
        echo " >> Processing Manifest File          : ${maniFest}"
        java -jar ${ToolDir}/ftmcli.jar download --properties-file ${ToolDir}/config/ftmcli.properties ${BIContainer} ${maniFest} > logs/manifestDownload.txt 2>&1
        mv ${maniFest} ../export/${currDocDate}
        firstRecord=1
        for fileInfo in `cat ../export/${currDocDate}/${maniFest}`
        do
            # skip the first record (PLV_KEY header line)
            if [ ${firstRecord} -lt 1 ]
            then
                fileName=`echo ${fileInfo} | awk -F\; '{print $1}'`
                fileChkSum=`echo ${fileInfo} | awk -F\; '{print $2}'`
                java -jar ${ToolDir}/ftmcli.jar download --properties-file ${ToolDir}/config/ftmcli.properties ${BIContainer} ${fileName} > logs/fileDownload.txt 2>&1
                downloadedMD5Sum=`md5sum -b ${fileName} | awk '{print $1}'`
                mv ${fileName} ../export/${currDocDate}
                if [ "${downloadedMD5Sum}" = "${fileChkSum}" ]
                then
                    echo " >> File ${fileName} downloaded and validated!"
                else
                    echo " >> !!! Error when processing file ${fileName}: MD5 chk sums are not identical! Exiting the script!"
                    echo "    FileName=${fileName}, FileChkSum=${fileChkSum}, downloadedMD5Sum=${downloadedMD5Sum}"
                    exit 1
                fi
            fi
            firstRecord=0
        done
        echo "`date +%Y%m%d%H%M%S`" > ../export/${currDocDate}/download.complete
    else
        echo " >> Skipping Files from Extraction Date ${currDocDate} as previously downloaded on `cat ../export/${currDocDate}/download.complete` ..."
    fi
    echo "-----------------------------------------------------------------------------------------"
    echo ""
done
echo " >> Finished saving Files from Storage Cloud ..."
After downloading each ZIP file, its MD5 checksum is computed and compared with the value from the Manifest file. If both values are equal, the next file from the list in the Manifest is downloaded. If the checksums differ, the script stops with a corresponding error message.
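This validation step can be tried out in isolation. The sketch below computes the md5sum of a small sample file and compares it against a known checksum (the well-known MD5 of the five bytes "hello"):

```shell
# Standalone demo of the MD5 validation step used by the script.
sample=$(mktemp)
printf 'hello' > "$sample"                       # 5 bytes, no newline
expected="5d41402abc4b2a76b9719d911017c592"      # known MD5 of "hello"
actual=$(md5sum -b "$sample" | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
  echo "checksum validated"
else
  echo "checksum mismatch: $actual"
  exit 1
fi
```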
The execution of this script may take a significant time depending on the network connection speed and the amount of data. As said before, the amount of data varies with the scope (number of chosen offerings and data sources) and the type of extraction (full vs. incremental). After completion, all files are located in the directory ../export/<date_time> and are ready for customer post-processing activities: unzipping, loading and transforming (ELT) via ODI or other tools.
As mentioned earlier in this document, the alternative option for storing data extractions is the built-in UCM server, whose UI can be accessed via the URL https://<fusion_server>:<port>/cs. The principles for data storage are similar to Storage Cloud, but not completely identical. In this case a Manifest file is also written, with a file listing of the data extraction files belonging to the extraction run.
In contrast to Storage Cloud, the search for BICC-generated files is not as straightforward. Via the UCM UI, a Search page must be opened and Advanced Search chosen as shown in the screenshot below. To locate all BICC-generated files, the value “OBIAImport” or “OBIA*” must be entered in the UI field Security Group. The result might end up with thousands of files if none of the previous runs have been deleted. Therefore, it is easier to also search for Manifest files by entering “MANIFEST.MF” in the UI field Title. This lists all existing Manifest files produced by BICC. As we see here, the naming convention also differs between UCM and Storage Cloud: since the unique file identifier in UCM is the Document ID, many files can have the same Title and no distinction via date and time is required.
A broader search via Security Group only will produce a file listing as shown below – a mix of Manifest and data extraction files. As with Storage Cloud, all files can be downloaded manually via the UI, but this would be a questionable strategy due to the sheer number of files per data extraction, which might reach some thousands. Here too, a self-written script using the WebCenter Content Document Transfer Utility makes sense; the Java library can be downloaded from here.
The content of a MANIFEST.MF file looks like this:
PLV_KEY=FUSION_12_0
crmanalyticsam_productgroupanalyticsam_productgroupitemassign;1981208;b20bb2c44155e96b1b746fefd2ffa2bc
fscmtopmodelam_finlelegalentitiesam_legalentityprimaryledgerpvo;1981205;2ab03cb400d504c1c677a5fb83a104ad
fnd_flex_labels;1981206;c62a2e71a78389daabf303dc59eeca5e
crm_flex_labels;1981207;280ed4c51d70b854b3e97a9546c7e531
. . .
While the first line, same as in the Manifest file for Storage Cloud, describes the Fusion Apps version (here 12_0), the rest of the file lists the exported and zipped CSV files in the following notation:
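Compared to the Storage Cloud manifest, these records carry three fields; splitting them works the same way, as this small sketch shows (sample record taken from the listing above):

```shell
# Split one UCM manifest record of the form <name>;<UCM Document ID>;<md5 checksum>
line="fnd_flex_labels;1981206;c62a2e71a78389daabf303dc59eeca5e"
nextFileName=$(echo "$line" | awk -F';' '{print $1}')
nextDocId=$(echo "$line" | awk -F';' '{print $2}')
nextMD5Val=$(echo "$line" | awk -F';' '{print $3}')
echo "name=$nextFileName docId=$nextDocId md5=$nextMD5Val"
```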
The sample download script takes its connection information for UCM from the file connection.properties as used by the RIDC Java library:
url=https://<fusion-apps-host>/cs/idcplg
username=<username to access UCM like biadmin>
password=xxxx
policy=oracle/wss_username_token_client_policy
As in the script for the Storage Cloud download, this script starts by reading and downloading all Manifest files in a loop. Here too, the standard download directory is located one directory above the script and is called export. Based on the unique UCM Document ID of the Manifest file, a new subdirectory export/<MANIFEST.MF DocID> is created if it does not exist yet. If a previous download has completed successfully, there is no need to download those files again; here the indicator is the existence of a file export/<MANIFEST.MF DocID>/download.complete. Again, it's important to remember that an initial full data extraction might produce dozens of GB of data, so the receiving filesystem needs enough free space. And here too, it's worth remembering that a successfully completed download should not be executed again due to the time and resources it would consume.
Important: A more recent version of this download script was updated in a later blog (U.Janke, Apr 25 2018)
Please find it here: BI Cloud Connector – Deletion of Data Extract Files on UCM
#!/bin/bash
# Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved
#
# ujanke: 2017/08/30 initial release
#
# This script searches for MANIFEST.MF files in UCM. If a data extraction has been previously
# completely downloaded it will skip those. Files are downloaded into a directory
# ../export/<UCM DocID of MANIFEST.MF>. After a complete download a file named
# "download.complete" is created in that directory. It is taken as the indicator for a
# completed download.
# The script reads the content of every downloaded MANIFEST.MF file and downloads the zipped
# data extraction files belonging to this data extraction run. As the MANIFEST.MF file lists
# the MD5 checksum for every single file, this script also compares these values to validate
# the correct download.
#
# Deletion of downloaded data extracts is not handled by this script.
#
mkdir -p logs

java -jar ridc/oracle.ucm.fa_client_11.1.1.jar SearchTool \
    --SortField=dID \
    --SortOrder=asc \
    --quiet --propertiesFile=./connection.properties \
    --log_file_name=logs/manifestSearchResults.txt \
    --simpleRIDCLogging \
    --ResultCount=2000 \
    --dDocTitle="MANIFEST.MF" \
    --dSecurityGroup="OBIAImport" \
    --delimiter=, \
    --fields=dID,dDocName,dInDate,dOriginalName,VaultFileSize --defaultFields=false > logs/manifestSearchResults.log 2> logs/manifestSearchResults.err

numRows=`cat logs/manifestSearchResults.txt | wc -l`
numRecs=`cat logs/manifestSearchResults.txt | awk -v lastRow="$numRows" '{ if (FNR == lastRow) printf("%d\n", $1); }'`
echo "Number of Rows: ${numRecs}"
if [ ${numRecs} -eq 0 ]
then
    echo "No data available! Finishing ..."
    exit 255
else
    echo "Running the download of recent BIACM extracts for ${numRecs} record(s) ..."
fi

i=1
while [ $i -le $numRecs ]
do
    currDocId=`cat logs/manifestSearchResults.txt | awk -v "current=$i" '{ if (NR == (2 + current)) print $0 }' | awk -F, '{print $1}'`
    currDocDate=`cat logs/manifestSearchResults.txt | awk -v "current=$i" '{ if (NR == (2 + current)) print $0 }' | awk -F, '{print $3}'`
    echo " >> Saving Manifest file with UCM Doc ID : ${currDocId}"
    echo " >> Saving Files from Extraction Date    : ${currDocDate}"
    if [ ! -d ../export/${currDocId} ]
    then
        mkdir -p ../export/${currDocId}
    fi
    if [ ! -f ../export/${currDocId}/download.complete ]
    then
        echo "    Will extract data into directory `pwd`/../export/${currDocId}"
        java -jar ridc/oracle.ucm.fa_client_11.1.1.jar DownloadTool \
            --propertiesFile=./connection.properties \
            --log_file_name=logs/manifestDownload.txt \
            --outputDir=../export/$currDocId --md5 --md5lowercase \
            --dID=$currDocId --RevisionSelectionMethod Latest
        # skip the first record (PLV_KEY header line) of the manifest
        for a in `cat ../export/$currDocId/MANIFEST.MF | awk '{ if (NR > 1) print $0}'`
        do
            nextFileName=`echo $a | awk -F\; '{print $1}'`
            nextDocId=`echo $a | awk -F\; '{print $2}'`
            nextMD5Val=`echo $a | awk -F\; '{print $3}'`
            echo " >> Downloading file ${nextFileName} with Document ID ${nextDocId} ..."
            java -jar ridc/oracle.ucm.fa_client_11.1.1.jar DownloadTool \
                --propertiesFile=./connection.properties \
                --log_file_name=logs/fileDownload.txt \
                --outputDir=../export/$currDocId --md5 --md5lowercase \
                --dID=$nextDocId --RevisionSelectionMethod Latest
            downldMD5=`cat logs/fileDownload.txt | grep MD5 | awk -F= '{print $2}' | awk -F\] '{print $1}'`
            if [ "$nextMD5Val" != "$downldMD5" ]
            then
                echo "Error: MD5 checksum value for downloaded file is incorrect!"
                echo "Exiting!"
                exit 1
            fi
        done
        echo "`date +%Y%m%d%H%M%S`" > ../export/$currDocId/download.complete
        echo "    Done! Data Extract downloaded into directory `pwd`/../export/${currDocId}!"
    else
        echo "    Not downloading extract data into directory `pwd`/../export/${currDocId} as previously downloaded on `cat ../export/${currDocId}/download.complete`"
    fi
    echo "-----------------------------------------------------------------------------------------"
    echo ""
    i=`expr $i + 1`
done
The underlying RIDC library reports the MD5 checksum automatically when the corresponding parameter is passed. This MD5 checksum is determined and compared with the value from the Manifest file. If both values are equal, the next file from the list in the Manifest is downloaded. If the checksums differ, the script stops with a corresponding error message.
As for the download script from Storage Cloud, the execution of this script may take a significant time depending on the network connection speed and the amount of data. Here too, the amount of data varies with the scope (number of chosen offerings and data sources) and the type of extraction (full vs. incremental). After completion, all files are located in the directory ../export/<MANIFEST.MF DocID>/ and are available for customer post-processing activities: unzipping, loading and transforming (ELT) via ODI or other tools.
So far, this article has described the principles of creating and downloading data extraction files. It's obvious that over time the number of files and the occupied storage increase significantly. That will require periodic cleaning operations in both storage systems – UCM as well as Storage Cloud.
Both storage systems have different methods of deleting files: the RIDC Java interface for UCM and the File Transfer Manager Library for Storage Cloud.
Important: A more detailed explanation about the file deletion in UCM was shared in a later blog (U.Janke, Apr 25 2018)
Please find it here: BI Cloud Connector – Deletion of Data Extract Files on UCM
A general description of UCM file deletion operations can be found in the standard documentation. While that documentation describes a Web Service call, an alternative solution in Java is also possible by using the RIDC library, as shown below:
String ucmRidcConnectionUrl = "<value>";
String username = "<value>";
String password = "<value>";

IdcClientManager clientManager = new IdcClientManager();
IdcClient idcClient = clientManager.createClient(ucmRidcConnectionUrl);
IdcContext idcContext = new IdcContext(username, password);

DataBinder binder = idcClient.createBinder();
binder.putLocal("IdcService", "DELETE_DOC");
binder.putLocal("dID", <parameter value for Document ID>);
ServiceResponse response = idcClient.sendRequest(idcContext, binder);
Such a Java program can be executed inside a shell script and take the Document ID of the UCM file as a parameter. This value exists, as shown before, inside the Manifest file, and a complete deletion is possible based on this information.
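Such a wrapper could collect the Document IDs directly from the Manifest file; a dry-run sketch (the manifest content is generated here for the demo, and the Java class name in the comment is purely an assumption standing in for a compiled version of the RIDC snippet above):

```shell
# Dry-run sketch: collect the UCM Document IDs (2nd field) from a UCM-style
# MANIFEST.MF; the commented-out Java invocation is hypothetical.
mf=$(mktemp)
printf '%s\n' 'PLV_KEY=FUSION_12_0' \
  'fnd_flex_labels;1981206;c62a2e71a78389daabf303dc59eeca5e' \
  'crm_flex_labels;1981207;280ed4c51d70b854b3e97a9546c7e531' > "$mf"
docIds=$(awk -F';' 'NR > 1 {print $2}' "$mf")   # skip the PLV_KEY header line
for docId in $docIds; do
  echo "would delete UCM document $docId"
  # java -cp <ridc classpath> DeleteUcmFile "$docId"   # hypothetical class name
done
```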
In case Storage Cloud is not mounted as an NFS share, we have the option to delete the files via the delete operation provided in the File Transfer Manager Library for Storage Cloud. As described in the standard Oracle documentation, the delete operation allows file deletion according to the information in the Manifest file. This can run in a self-written shell script.
This post was meant to be a basic introduction to the topic of exporting BI Analytics data from Fusion Apps in CSV file format. As mentioned at the top of this article, this document should help with the first steps with BI Cloud Connector: initiating extractions and downloading the resulting compressed CSV files.
Once the first step is done, you might get some ideas on how to continue with this data extraction – for Data Warehouse import activities or any other usage of the exported data. It's planned to provide more information about the BI Cloud Connector tool in future blogs, mainly about the built-in features and how they can be used for a customized data extraction.
More information can be found here:
Many thanks to Rohan Lobo and Sireesha Mandava for reviewing and all the useful feedback!