Since the end of 2018, SaaS Cloud provides a new feature to track the status of a data extraction run in BI Cloud Connector (BICC). Every single run to extract data or keys produces a JSON file with log information that can be used for monitoring purposes.
The naming convention per file looks as shown below:
The Schedule ID is used as the main grouping element for extraction runs. Every extraction run, whether a single ad-hoc run, a single scheduled run, or part of a series of periodic runs performed more or less frequently, has a unique Schedule ID and Request ID. Only periodically executed runs share the same Schedule ID while using a changing Request ID. The link between a Schedule and a Request can be mapped between the file name and the corresponding runtime information as shown in the BICC UI below.
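As an illustration of that mapping, the small JavaScript sketch below derives both IDs from a file name. The exact naming pattern may vary by release; the pattern assumed here (EXTRACT_STATUS_DATA_SCHEDULE_&lt;scheduleId&gt;_REQUEST_&lt;requestId&gt;.JSON) is an assumption for demonstration purposes only and should be verified against the files in your own UCM instance.

```javascript
// Hypothetical sketch: derive Schedule ID and Request ID from an EXTRACT
// status file name. The naming pattern below is an assumption, not an
// official specification - verify it against real files in UCM.
function parseExtractFileName(fileName) {
    var match = fileName.match(/^EXTRACT_STATUS_.*SCHEDULE_(\d+)_REQUEST_(\d+)\.JSON$/i);
    if (match === null) return null;
    return { scheduleId: match[1], requestId: match[2] };
}

// Example with a made-up file name following the assumed pattern:
var ids = parseExtractFileName("EXTRACT_STATUS_DATA_SCHEDULE_12345_REQUEST_67890.JSON");
console.log(ids.scheduleId + " / " + ids.requestId);
```

Recurring runs would yield several files sharing the same scheduleId but different requestId values, which matches the grouping behavior described above.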
This JSON log file is created for every extraction run, independently of whether the run resulted in an error or was successful. In case of an error there will be no extraction output (MANIFEST file and zip files), while the JSON file will contain an error message pointing to the root cause of the issue. Statistic data for every single View Object in a dedicated extraction run are listed as follows:
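For orientation, the snippet below sketches what reading one such per-VO statistics record might look like in JavaScript. The field names used here (name, status, runDate, rowsProcessed, errorMessage) are illustrative assumptions derived from the descriptions in this blog, not an authoritative schema; the actual fields in your release may differ.

```javascript
// Hypothetical sketch of one per-VO EXTRACT status record. All field names
// are assumptions for illustration and must be checked against a real file.
var rawJson = '{"name":"SampleVO","status":"SUCCESS",' +
              '"runDate":"2019-04-08 10:15:30.123",' +
              '"rowsProcessed":1500,"errorMessage":""}';
var rec = JSON.parse(rawJson);

// A monitoring script would typically branch on the status field:
if (rec.status !== "SUCCESS") {
    console.log("Extraction failed: " + rec.errorMessage);
} else {
    console.log(rec.name + ": " + rec.rowsProcessed + " rows extracted");
}
```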
The fields above cover the current releases 18 and 19. Changes might happen in the future by adding more fields, while the removal of fields is highly unlikely.
It is important to mention that these EXTRACT JSON files can only be related to a dedicated extraction process, not to the output files listed in MANIFEST.MF. This means that in case of an error no output files are created, as already mentioned. The reason for the failed extraction can be found in the EXTRACT JSON log file.
In UCM, the log files can be searched with a method similar to the one used for the MANIFEST files:
As mentioned in the beginning, the files in the result set can be linked to the corresponding Schedule and Request ID via the naming convention.
In the following sections of this blog, the downloading, accumulation, and visualization of this information will be explained in more detail. The data provided by these log files can be used for long-term BICC runtime analysis and continuous monitoring. It is a feature for end users outside of CloudOps to maintain their own statistics, and it gives them the ability to create reports on historical data over time.
As these EXTRACT JSON log files are presumably quite small, the script keeps things simple: every run searches for all EXTRACT status files in UCM, and only files that do not yet exist in the download directory are downloaded. The script takes two parameters:
The current script version defines a variable named subDirExtractStatus which names a sub-directory under the download location where the EXTRACT JSON log files will be stored. If the directory or sub-directory does not exist, the script will create it.
Storing all EXTRACT JSON files in one single directory is required because the subsequent reading, parsing, and rendering of this information requires all files to exist uniquely in one place.
#!/bin/bash
# Copyright (c) 2019 Oracle and/or its affiliates. All rights reserved
#
# ujanke: 2019/04/08 initial release
#
# This script searches for EXTRACT_STATUS*.JSON files in UCM. If a data extraction has been run
# (independently of success or error status) these status files will exist. This feature is
# available since release 19.
# By this script files will be downloaded into a directory <export>/EXTRACT_STATUS.
# If a file has already been downloaded by a previous run this specific file won't be downloaded again!
#
# IMPORTANT: You have to set the environment variable UCMRidcDir below to the correct directory
# containing the RIDC JAR files from Webcenter Document Transfer Utility before you run this script!
#
# Two parameters are expected by this script:
# 1) link to file with UCM connection properties (documented in Webcenter Document Transfer Utility docs)
# 2) export directory
#
export UCMRidcDir=/usr/local/lib/ridc
export subDirExtractStatus=EXTRACT_STATUS

if [ $# -lt 2 ]
then
    echo "Wrong number of arguments!"
    echo "Usage: $0 <UCM connection properties file> <directory>"
    echo "       <UCM connection properties file> : file with connection information to UCM by using the RIDC library"
    echo "       <directory> : directory location for download of BICC files from UCM! Files will be extracted under sub-directory ${subDirExtractStatus}"
    exit 1
else
    propFile=$1
    exportDir=$2
fi

if [ ! -f ${propFile} ]
then
    echo "Error: UCM connection properties file ${propFile} is non-existing ..."
    exit 1
fi

if [ ! -d ${exportDir} ]
then
    echo "Creating export directory ${exportDir} as non-existing ..."
    mkdir ${exportDir} 2>/dev/null
    dirCreationSuccess=$?
    if [ ${dirCreationSuccess} -ne 0 ]
    then
        echo "Error: Directory ${exportDir} can't be created!"
        exit 1
    fi
fi

if [ ! -d ${exportDir}/${subDirExtractStatus} ]
then
    echo "Creating export directory ${exportDir}/${subDirExtractStatus} as non-existing ..."
    mkdir ${exportDir}/${subDirExtractStatus} 2>/dev/null
    dirCreationSuccess=$?
    if [ ${dirCreationSuccess} -ne 0 ]
    then
        echo "Error: Directory ${exportDir}/${subDirExtractStatus} can't be created!"
        exit 1
    fi
fi

java -jar ${UCMRidcDir}/oracle.ucm.fa_client_11.1.1.jar SearchTool \
    --propertiesFile=${propFile} --ping --quiet --silent > /dev/null 2>&1
connSuccess=$?
if [ ${connSuccess} -ne 0 ]
then
    echo "Error: Connection to UCM server failed! Please check your connection information in ${propFile}"
    exit 1
fi

java -jar ${UCMRidcDir}/oracle.ucm.fa_client_11.1.1.jar SearchTool \
    --SortField=dID \
    --SortOrder=asc \
    --quiet --propertiesFile=${propFile} \
    --log_file_name=/tmp/extractSearchResults_$$.txt \
    --simpleRIDCLogging \
    --dDocTitle="EXTRACT_STATUS*" \
    --dSecurityGroup="OBIAImport" \
    --delimiter=, \
    --fields=dID,dDocName,dInDate,dOriginalName,VaultFileSize \
    --defaultFields=false >/tmp/extractSearchResults_$$.log 2>/tmp/extractSearchResults_$$.err

numRows=`cat /tmp/extractSearchResults_$$.txt | wc -l`
numRecs=`cat /tmp/extractSearchResults_$$.txt | awk -v lastRow="$numRows" '{ if (FNR == lastRow) printf("%d\n", $1); }'`
echo "Number of Rows: ${numRecs}"
if [ ${numRecs} -eq 0 ]
then
    echo "No data available! Finishing ..."
    exit 255
else
    echo "Running the download of existing BICC extract status log files for ${numRecs} record(s) ..."
fi

i=1
while [ $i -le ${numRecs} ]
do
    currDocId=`cat /tmp/extractSearchResults_$$.txt | awk -v "current=$i" '{ if (NR == (2 + current)) print $0 }' | awk -F, '{print $1}'`
    currDocDate=`cat /tmp/extractSearchResults_$$.txt | awk -v "current=$i" '{ if (NR == (2 + current)) print $0 }' | awk -F, '{print $3}'`
    currDocFileName=`cat /tmp/extractSearchResults_$$.txt | awk -v "current=$i" '{ if (NR == (2 + current)) print $0 }' | awk -F, '{print $4}'`
    echo " >> Saving EXTRACT file with UCM Doc ID : ${currDocId}"
    echo " >> Saving EXTRACT file with Name       : ${currDocFileName}"
    echo " >> Saving Files from Extraction Date   : ${currDocDate}"
    if [ -f ${exportDir}/${subDirExtractStatus}/${currDocFileName} ]
    then
        echo "    Will skip extraction of file ${exportDir}/${subDirExtractStatus}/${currDocFileName} as file is existing!"
    else
        echo "    Will extract file ${currDocFileName} into directory ${exportDir}/${subDirExtractStatus}"
        java -jar ${UCMRidcDir}/oracle.ucm.fa_client_11.1.1.jar DownloadTool \
            --propertiesFile=${propFile} \
            --log_file_name=logs/extractDownload.txt \
            --outputDir=${exportDir}/${subDirExtractStatus} --md5 --md5lowercase \
            --dID=${currDocId} --RevisionSelectionMethod Latest
    fi
    echo "    Done! Data Extract Status downloading into directory ${exportDir} finished!"
    echo "-----------------------------------------------------------------------------------------"
    echo ""
    i=`expr $i + 1`
done
The value of the variable UCMRidcDir in the script above probably has to be changed to reflect the customer-specific location of the RIDC library.
When running, the output of this script will look as shown below:
In case of a failure, there might be only a partial download, or no download at all, of these extraction log files. In such a situation it makes sense to solve the issue first and then run the script again. Files from a partial download in the target directory will not harm any BICC functionality; they will only give an incomplete view on the statistics page described in the sample app below, which can be corrected at any time.
The following description of the further processing of the downloaded EXTRACT log files must be seen as a design study or a suggested proof of concept. The starting point for the proposed solution is the following consideration:
In case of increased interest in alternate samples with enhanced UI handling and automation for continuously downloading log information, we will consider another blog post covering these topics. Please express your interest in the comment section below or by sending an email to the author.
As an alternative approach, it would also be possible to upload the data to a standard reporting and analytics tool like Oracle Analytics Cloud. For this purpose, the processed data can be transferred to such a service, and no self-written UI would be necessary. This solution won't be discussed here and would also be a topic for another article if requested.
The proposed solution uses Node.js as a framework to control the reading and accumulation of log data. The files are opened in the directory that was used for downloading by the previously executed shell script, as described above.
Once the data from the log files has been fully read and processed, internal JavaScript object arrays with accumulated information are created and filled. The details are explained further down in this blog.
As a final step, this accumulated information is rendered as HTML output, and the various reports are assigned to multiple request handlers. Finally, the Express.js app is started on a configurable port, and the accumulated log information is accessible from a web browser.
As mentioned, the intention of this article is to show a sample with a minimum of technical requirements. Install Node.js for the according OS as provided on the homepage:
https://nodejs.org
Once Node.js has been installed, the Express.js package has to be installed as well. Note that the fs module used by the script is part of the Node.js core and does not require a separate installation. The command to install Express.js is listed below:
npm install express --save
Further editing of the JavaScript source code is best done in an IDE like Apache NetBeans. To install this development environment, the software can be downloaded from this URL:
https://netbeans.apache.org/download
Some screenshots of the sample code opened in the IDE are shown below:
The source code for the Node.js script processBICCLogs.js can be downloaded via this link. It is not necessary to create a Node.js project in NetBeans for simple editing; just opening the file in the IDE is sufficient. An exception would be adding larger extensions to the current code, like UI enhancements or a migration to Oracle JET. In that case it would make sense to set up a project and embed the existing source code into that solution.
Like the download script, this JavaScript code runs on the command line. It takes one mandatory and two optional parameters:
If no arguments are passed the message below will appear with some usage information:
$ node processBICCLogs.js
Error: insufficient number of arguments!
Usage: node processBICCLogs.js <File Path with EXTRACT JSON Log Files> [port=value between 1024 and 65535] [log=y|Y]
       port=value -> web port - default value = 3000 [optional]
       log=y|Y    -> console log y|Y - default value = no [optional]
If the log flag was passed to the program, the content of the internal arrays will be dumped to standard output. This can serve as a check of the data processed internally before it is rendered as HTML reports.
As shown in the screenshots below, the script output will be either very minimal (1) or quite massive (2), providing details of the BICC log data that has been processed. It is important to mention that the script keeps running in the same shell until interrupted. To regain control of the shell, it must be put into the background. For continuous processing of these log statistics, the shell script for downloading the log files and the JavaScript code for processing them can be scheduled for a restart, wrapped by another script (not part of this blog).
Once started, the HTML reports can be opened in a web browser by opening the following URL:
http://<localhost or hostname>:<port number>
As mentioned before the HTML formatting has been kept as simple as possible. The web page contains an HTML table on top of the page and an iFrame to hold the report data.
The entry page shows by default the content activated by clicking the leftmost button, called Statistics Summary, providing the accumulated statistics for this collection of log information. The columns in this report give details about the time the report was created, the Oldest/Newest Extraction Date, the Total Number of extracted rows in BICC jobs, the Number of Schedules, Number of Jobs, Number of Processed Files and Number of Processed VO's. A sample of such an output is shown below.
The next button in line is called Extracts By Run Date. This report provides BICC extract statistics grouped by the Run Date. The underlying field contains the Run Date value including fractions of a second, but the grouping uses the value without the fractional part and shows all VO's extracted at the same time, down to the second. The timestamp including the fractional part is provided in another column named Timestamp. The screenshot below shows the output of this report.
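The grouping key described above can be sketched in a few lines of JavaScript: the helper simply strips everything after the decimal point before using the value as a grouping key. The timestamp format used here is an assumption for illustration.

```javascript
// Strip the fractional second from a run-date timestamp so that all VO
// extractions started in the same second share one grouping key.
// The input format ("YYYY-MM-DD HH:MM:SS.fff") is an assumption.
function runDateGroupKey(timeStamp) {
    var dotPos = timeStamp.indexOf(".");
    return (dotPos >= 0) ? timeStamp.substring(0, dotPos) : timeStamp;
}

console.log(runDateGroupKey("2019-04-08 10:15:30.123")); // "2019-04-08 10:15:30"
```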
The third button in line is named Extracts by VO and uses the VO names as a grouping element. This report contains similar columns as described for the previous report.
The screenshot below shows the report activated by the button named Extracts by Schedule. This report uses a two-level grouping: the first-level group is the Schedule ID of the BICC extract, and the second-level group is the corresponding Request ID. Every extraction run is assigned to exactly one Schedule ID and one Request ID. If this is a single run (immediate or scheduled, but not recurring), only one record is reported. In case of recurring jobs, the report will contain multiple Request ID records per Schedule ID.
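The two-level grouping described above can be sketched as a nested map: Schedule ID on the first level, Request ID on the second. The record shape used below is an illustrative assumption, not the actual internal structure of the sample program.

```javascript
// Sketch of the two-level grouping: records are grouped by Schedule ID first,
// then by Request ID. The record fields used here are illustrative assumptions.
function groupBySchedule(records) {
    var grouping = {};
    records.forEach(function (rec) {
        if (!grouping[rec.scheduleId]) grouping[rec.scheduleId] = {};
        if (!grouping[rec.scheduleId][rec.requestId]) grouping[rec.scheduleId][rec.requestId] = [];
        grouping[rec.scheduleId][rec.requestId].push(rec);
    });
    return grouping;
}

// A recurring schedule (id 100) with two requests, and a single ad-hoc run:
var g = groupBySchedule([
    { scheduleId: "100", requestId: "1", name: "VO_A" },
    { scheduleId: "100", requestId: "2", name: "VO_A" },
    { scheduleId: "200", requestId: "7", name: "VO_B" }
]);
console.log(Object.keys(g["100"]).length); // two Request IDs for schedule 100
```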
One report for failed BICC extracts uses the VO Name as a grouping element and appears after clicking the button Failed by VO, as shown in the screenshot below. In contrast to the other reports, the Error Message is shown here. We can presume this field will be filled with significant information about a failed extraction run.
The rightmost button, named Failed by Date, activates a report listing BICC extract runs with a status of FAILURE, ordered by Run Date. A sample output is shown in the screenshot below.
All the data shown in these various reports is of a static nature, i.e. the values are read from the files previously downloaded from UCM. To refresh the data, another download of the log files from UCM and a restart of this script are required. As mentioned above, such a task could be scheduled on a regular basis, either at the operating system level or by a tool. Alternatively, the JavaScript code could be extended to initiate the download script and refresh the data as well.
In this section, the code design and structure of the Node.js program processBICCLogs.js is explained in more detail. In the absence of inline documentation, this description should help readers adapt the source code for their own extensions or modifications if required.
The entire functionality and logic to handle the BICC log data and to format the HTML output is encapsulated in a class called extractStats. This class contains the following structures to hold the data:
The member variables above are static after initialization and are not refreshed in the current code. Such an enhancement would require extensions which have not been implemented here in order to keep the code compact.
The following methods are used as central services inside this class:
The following methods starting with the term fill in their name are used to feed the specific arrays used for reporting with data:
Finally, the existing methods starting with the term render… are used to prepare and format the HTML output for each data array:
The functionality of Express.js to start the web server with the formatted HTML output is described in the next section.
The method visualizeStatsData() in class extractStats is responsible for preparing the request handlers and starting the web server provided by Express.js.
Here are some explanations of the code essentials provided below. Each of these six reports has its own request handler registered via an app.get() call. For the index/root page (app.get('/', ...)), the HTML code was added inline in the method visualizeStatsData(). On top of the page, an HTML table contains HTML forms, each with a button to call a specific report. Beneath that part, an iFrame exists as a flexible region used as a target for the specific HTML reports registered via the request handlers. Some HTML formatting is encapsulated in the member variables rootHtmlHeader and htmlHeader inside the class extractStats.
…
app.get('/reportStats', (req, res) => res.send(
    this.htmlHeader + reportStatsTable + "</body></html>"));
app.get('/byDate', (req, res) => res.send(
    this.htmlHeader + voDateTable + "</body></html>"));
app.get('/byName', (req, res) => res.send(
    this.htmlHeader + voNameTable + "</body></html>"));
app.get('/bySchedID', (req, res) => res.send(
    this.htmlHeader + voScheduleTable + "</body></html>"));
app.get('/byFailStatusVO', (req, res) => res.send(
    this.htmlHeader + voFailedByVOTable + "</body></html>"));
app.get('/byFailStatusDate', (req, res) => res.send(
    this.htmlHeader + voFailedByDateTable + "</body></html>"));
app.get('/', (req, res) => res.send(
    "<html>" + this.rootHtmlHeader + "<body>" +
    "<div class=\"submit\">" +
    "  <table>" +
    "    <tr>" +
    "      <td>" +
    "        <form>" +
    "          <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/reportStats')\">Statistics Summary</button>" +
    "        </form>" +
    "      </td>" +
    "      <td>" +
    "        <form>" +
    "          <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/byDate')\">Extracts by Run Date</button>" +
    "        </form>" +
    "      </td>" +
    "      <td>" +
    "        <form>" +
    "          <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/byName')\">Extracts by VO</button>" +
    "        </form>" +
    "      </td>" +
    "      <td>" +
    "        <form>" +
    "          <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/bySchedID')\">Extracts by Schedule</button>" +
    "        </form>" +
    "      </td>" +
    "      <td>" +
    "        <form>" +
    "          <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/byFailStatusVO')\">Failed by VO</button>" +
    "        </form>" +
    "      </td>" +
    "      <td>" +
    "        <form>" +
    "          <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/byFailStatusDate')\">Failed by Date</button>" +
    "        </form>" +
    "      </td>" +
    "    </tr>" +
    "  </table>" +
    "</div>" +
    "<hr>" +
    "<iframe id=\"if1\" style=\"border: 0; width:100%;\" src=\"http://localhost:" + appPort + "/reportStats\">" +
    "</iframe>" +
    "</body></html>"));
…
In the code above, a static value localhost is used as the host name in the URLs. This must be seen as a pragmatic restriction to keep the code simple, and it means that this solution currently only works on the same host where the Node.js program is executed. The challenge for adopters of this code is to add logic for a variable hostname that would ideally be passed as a command line argument to the program. The next section below describes the usage of the main() function; such a hostname parameter could be added in a straightforward fashion.
Alternatively, there are other options to make the output available across a network, like hard-coding a hostname in the program (not recommended!) or reading the hostname from a config file or similar. As said, this logic has been omitted only for the sake of keeping the code as compact as possible.
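A possible sketch of such an extension is shown below. It mirrors the existing port parsing in main() and would accept a hypothetical host=&lt;name&gt; command line argument; both the parameter name and the default value are assumptions, not part of the original code.

```javascript
// Hypothetical extension sketch: parse an additional "host=<name>" command
// line argument in the same style as the existing "port=" handling in main().
// Parameter name and default value are assumptions for illustration.
function parseHostArg(args) {
    var appHost = "localhost";               // current hard-coded default
    for (var x = 0; x < args.length; x++) {
        var equalPos = args[x].toString().indexOf("=", 0);
        if (equalPos > 0 && args[x].substring(0, equalPos) === "host") {
            appHost = args[x].substring(equalPos + 1);
        }
    }
    return appHost;
}

console.log(parseHostArg(["port=8080", "host=myserver.example.com"])); // "myserver.example.com"
```

The returned value would then replace the literal localhost when the button URLs and the iFrame source are assembled.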
This code uses a main() function to initiate the extractStats object and to call the methods
As Node.js works in an asynchronous manner, the entire processing of the log file data must be kept within the callback that loops over and reads the EXTRACT JSON files.
var fs = require('fs');   // file system module, loaded at the top of the script

function main(myArgs) {
    var debugFlag = false;
    var appPort = 3000;
    var arrNum = 0;
    var jsonInput = new Array();

    if (myArgs.length < 3) {
        console.log("Error: insufficient number of arguments!");
        console.log("Usage: " + myArgs[0] + " " + myArgs[1] +
            " <File Path with EXTRACT JSON Log Files> [port=value between 1024 and 65535] [log=y|Y]");
        console.log("       port=value -> web port - default value = 3000 [optional]");
        console.log("       log=y|Y    -> console log y|Y - default value = no [optional]");
        process.exit();
    }
    var filePath = myArgs[2];
    for (var x = 3; x < myArgs.length; x++) {
        var equalPos = myArgs[x].toString().indexOf("=", 0);
        var paramName, paramVal;
        if (equalPos > 0) {
            paramName = myArgs[x].toString().substring(0, equalPos);
            paramVal = myArgs[x].toString().substring(equalPos + 1, myArgs[x].toString().length);
            if (paramName === "port") {
                if (paramVal.match(/^[0-9]+$/) !== null) {
                    if ((paramVal < 1024) || (paramVal > 65535))
                        console.log("Hint: parameter <port> has an invalid value=" + paramVal +
                            " - using default value 3000 instead!");
                    else
                        appPort = paramVal;
                }
            }
            if (paramName === "log") {
                if (paramVal.toString().toUpperCase() === 'Y')
                    debugFlag = true;
                else
                    debugFlag = false;
            }
        }
    }
    fs.readdir(filePath, function (err, files) {
        if (err) {
            return console.log('Unable to scan directory: ' + err);
        }
        files.forEach(function (file) {
            jsonInput.length = arrNum + 1;
            var fullFileName = filePath.toString() + "/" + file.toString();
            jsonInput[arrNum] = JSON.parse(fs.readFileSync(fullFileName, 'utf8'));
            var recVal = jsonInput[arrNum];
            extractStats.handleData(recVal, arrNum, file.toString());
            arrNum++;
        });
        extractStats.createArrays();
        if (debugFlag)
            extractStats.logData();
        extractStats.visualizeStatsData(appPort);
    });
}

main(process.argv);
Calling the main() function with the command line arguments initiates the creation of the extractStats object. Only one argument is mandatory: the directory name where the EXTRACT JSON files can be found. Another, optional parameter is the web port; if no port argument is passed, the default value of 3000 is used. Finally, a third, also optional argument may be passed to enable the output of log information; log output is disabled by default. Some code lines in main() validate these parameters before further processing and before the extractStats object is initialized. As stated before, the extension for dynamic handling of the hostname could be added by passing one more command line argument with an appropriate value.
The intention of this article is to introduce the audience to some recent features in BICC that provide dedicated monitoring information per extraction job. The focus of the solution and sample code is clearly on reading the files and processing the log data. The visualization of the data has been kept as simple as possible, with some restrictions on UI flexibility.
The provided code for the download script and the processing is runnable and can be used out of the box with the given features. On the other hand, this sample can be adapted by readers for further modifications and enhancements. As mentioned before, this can be done by migrating the tech stack to an Oracle solution (Oracle JET, OKE, etc.) or by adding enhanced features like continuous downloading and processing of EXTRACT JSON files.