
Best Practices from Oracle Development's A‑Team

BI Cloud Connector – Processing Extraction Log Files

Introduction

EXTRACT JSON Files

Since the end of 2018, SaaS Cloud provides a feature to track the status of a data extraction run in BI Cloud Connector (BICC). Every single run to extract data or keys produces a JSON file with log information that can be used for monitoring purposes.

The naming convention for these files looks as shown below:

  • EXTRACT_STATUS_DATA_SCHEDULE_<Schedule ID>_REQUEST_<Request ID>.JSON
    • BICC Extract log file for VO data, created if this output has been chosen in the run definition
    • Sample: EXTRACT_STATUS_DATA_SCHEDULE_1664130_REQUEST_1698030.JSON
  • EXTRACT_STATUS_PRIMARY_KEYS_SCHEDULE_<Schedule ID>_REQUEST_<Request ID>.JSON
    • BICC Extract log file for VO Primary Keys, created if this extract option has been activated in the definition of a run
    • Sample: EXTRACT_STATUS_PRIMARY_KEYS_SCHEDULE_1492486_REQUEST_1492486.JSON

The Schedule ID is used as the main grouping element for extraction runs. Every extraction run, whether a single ad-hoc run, a single scheduled run or part of a series of periodically performed runs, has a unique Schedule ID and Request ID. Only periodically executed runs share the same Schedule ID while using a changing Request ID. The link between a Schedule and a Request can be mapped between the file name and the according run time information as shown in the BICC UI below.
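
The mapping between a file name and its Schedule and Request ID can be sketched in Javascript. The helper below is hypothetical and not part of BICC or the sample code; it simply applies the naming convention shown above:

```javascript
// Hypothetical helper: derive Schedule ID and Request ID from an
// EXTRACT JSON file name following the BICC naming convention.
function parseExtractFileName(fileName) {
    // Matches both the DATA and the PRIMARY_KEYS variants
    var match = fileName.match(
        /^EXTRACT_STATUS_(?:DATA|PRIMARY_KEYS)_SCHEDULE_(\d+)_REQUEST_(\d+)\.JSON$/);
    if (match === null)
        return null;
    return { scheduleId: match[1], requestId: match[2] };
}

var ids = parseExtractFileName("EXTRACT_STATUS_DATA_SCHEDULE_1664130_REQUEST_1698030.JSON");
console.log(ids.scheduleId + " / " + ids.requestId);  // 1664130 / 1698030
```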

This JSON log file is created for every extraction run, independently of whether the run resulted in an error or was successful. In case of an error there will be no extraction output (MANIFEST file and zip files), while the JSON file will contain an error message pointing to the root cause of the issue. Statistics for every single View Object in a dedicated extraction run are listed as follows:

  • name: View Object (VO) Name
  • status: Extraction status for this VO – either SUCCESS or FAILURE
  • errorMessage: Detailed description of the issue in case of a FAILURE
  • runDate: Timestamp for the run date of this extraction – down to the fraction of a second
  • queryDurationInSec: Time in seconds to perform the query to obtain the VO data
  • extractDurationInSec: Time in seconds to read the data from DB
  • uploadDurationInSec: Time in seconds to upload the output file to storage server
  • totalDurationInSec: Total amount of time to query, extract and upload the VO data
  • rowCount: Number of rows for this VO
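
To illustrate these fields, the snippet below parses a hand-crafted sample entry. The VO name and all values are invented for illustration, and the exact top-level structure of a real EXTRACT JSON file is not reproduced here:

```javascript
// Invented sample of one VO entry using the fields listed above;
// real files contain one such entry per extracted View Object.
var sampleEntry = JSON.parse('{' +
    '"name": "FscmTopModelAM.AnalyticsServiceAM.CurrencyPVO",' +
    '"status": "SUCCESS",' +
    '"errorMessage": "",' +
    '"runDate": "2019-04-08 06:00:01.123",' +
    '"queryDurationInSec": 2,' +
    '"extractDurationInSec": 5,' +
    '"uploadDurationInSec": 1,' +
    '"totalDurationInSec": 8,' +
    '"rowCount": 1500' +
'}');
console.log(sampleEntry.name + ": " + sampleEntry.status);
```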

The fields above cover the current releases 18 and 19 – future changes might add more fields, while the removal of fields is highly unlikely.

It is important to mention that these EXTRACT JSON files can only be related to a dedicated extraction process, but not to the output files as listed in MANIFEST.MF. This means that in case of an error there won't be any output files created, as already mentioned. The reason for the failed extraction will be found in the EXTRACT JSON log file.

In UCM the log files can be searched by a similar method as the MANIFEST files:

  • Security Group: OBIA*
  • Title: EXTRACT*

As mentioned in the beginning, the files in the result set can be linked to the according Schedule ID and Request ID by the naming convention.

The following sections of this blog explain the downloading, cumulation and visualization of this information in more detail. The data provided by these log files can be used for long-term BICC run-time analysis and continuous monitoring purposes. It is a feature for end users outside of CloudOps to maintain their own statistics and gives the ability to create reports using historical data over time.

Downloading of EXTRACT JSON Log Files

This blog describes the downloading of the EXTRACT JSON log files from UCM in a similar manner as previously shown in other A-Team Chronicles articles: Downloading BICC extracts from UCM and Deletion of BICC Data Extract Files on UCM. The script below also uses the Webcenter Document Transfer Utility as in those samples. File storage on external servers won't be handled here, as the effort to download files via standard protocols like NFS is lower compared to UCM.

Downloading files from UCM requires some more attention for two reasons:

  1. UCM is the default location in BICC and doesn’t require additional licenses
  2. It is a system to store documents and needs some special coding for downloading the extract log files

The script published here is very similar to the scripts published in former blogs. It is a Shell Script running in bash, as most of these tasks will probably run on Linux servers. Automation of this script can be implemented by using standard Linux features like cron, if no more sophisticated job scheduling tools are in use. For users who'd like to use alternative scripting languages like Python, a migration of the script logic should be possible.

Script Usage

As these EXTRACT JSON log files are presumably quite small, this script implements no sophisticated download bookkeeping: files already present in the target directory are simply skipped until they are removed from UCM. The script takes two parameters:

  1. Location and name of the connection.properties file
  2. Top directory as download location

In the current script version a variable named subDirExtractStatus defines a sub-directory under the download location where the EXTRACT JSON log files will be stored. If the directory and sub-directory do not exist, the script will create them.

Storing all the EXTRACT JSON files in one single directory is required, as the subsequent reading, parsing and rendering of this information requires all files to exist uniquely in one place.

#!/bin/bash
# Copyright (c) 2019 Oracle and/or its affiliates. All rights reserved
#
# ujanke: 2019/04/08 initial release
#
# This script searches for EXTRACT_STATUS*.JSON files in UCM. If a data extraction has been run (independently of success or error status)
# these status files will exist. This feature is available since release 19
# By this script files will be downloaded into a directory <export>/EXTRACT_STATUS.
# If a file has already been downloaded by a previous run this specific file won't be downloaded again!
#
# IMPORTANT: You have to set the environment variable UCMRidcDir in line 20 below to the correct directory containing the RIDC JAR files from
# Webcenter Document Transfer Utility before you run this script!
#
# Two parameters are expected by this script:
#       1) link to file with UCM connection properties (documented in Webcenter Document Transfer Utility docs)
#       2) export directory
#
#

export UCMRidcDir=/usr/local/lib/ridc
export subDirExtractStatus=EXTRACT_STATUS

if [ $# -lt 2 ]
then
        echo "Wrong number of arguments!"
        echo "Usage: $0 <UCM connection properties file> <directory>"
        echo "       <UCM connection properties file>  :  file with connection information to UCM by using the RIDC library"
        echo "       <directory>                :  directory location for download of BICC files from UCM! Files will be extracted under sub-directory ${subDirExtractStatus}"
        exit 1
else
        propFile=$1
        exportDir=$2
fi

if [ ! -f $propFile ]
then
       echo "Error: UCM connection properties file ${propFile} is non-existing ..."
       exit 1
fi

if [ ! -d ${exportDir} ]
then
       echo "Creating export directory ${exportDir} as non-existing ..."
       mkdir ${exportDir} 2>/dev/null
       dirCreationSuccess=$?

       if [ ${dirCreationSuccess} -ne 0 ]
       then
              echo "Error: Directory ${exportDir} can't be created!"
              exit 1
       fi
fi

if [ ! -d ${exportDir}/${subDirExtractStatus} ]
then
       echo "Creating export directory ${exportDir}/${subDirExtractStatus} as non-existing ..."
       mkdir ${exportDir}/${subDirExtractStatus} 2>/dev/null
       dirCreationSuccess=$?

       if [ ${dirCreationSuccess} -ne 0 ]
       then
              echo "Error: Directory ${exportDir}/${subDirExtractStatus} can't be created!"
              exit 1
       fi
fi

java -jar ${UCMRidcDir}/oracle.ucm.fa_client_11.1.1.jar SearchTool --propertiesFile=${propFile} --ping --quiet --silent > /dev/null 2>&1

connSuccess=$?

if [ ${connSuccess} -ne 0 ]
then
       echo "Error: Connection to UCM server failed! Please check your connection information in ${propFile}"
       exit 1
fi

java -jar ${UCMRidcDir}/oracle.ucm.fa_client_11.1.1.jar SearchTool \
--SortField=dID \
--SortOrder=asc \
--quiet --propertiesFile=${propFile} \
--log_file_name=/tmp/extractSearchResults_$$.txt \
--simpleRIDCLogging \
--dDocTitle="EXTRACT_STATUS*" \
--dSecurityGroup="OBIAImport" \
--delimiter=, \
--fields=dID,dDocName,dInDate,dOriginalName,VaultFileSize --defaultFields=false >/tmp/extractSearchResults_$$.log 2>/tmp/extractSearchResults_$$.err

numRows=`cat /tmp/extractSearchResults_$$.txt | wc -l`
numRecs=`cat /tmp/extractSearchResults_$$.txt | awk -v lastRow="$numRows" '{ if (FNR == lastRow) printf("%d\n", $1); }'`

echo "Number of Rows: ${numRecs}"

if [ ${numRecs} -eq 0 ]
then
    echo "No data available! Finishing ..."
    exit 255
else
    echo "Running the download of existing BICC extract status log files for ${numRecs} record(s) ..."
fi

i=1

while [ $i -le ${numRecs} ]
do
    currDocId=`cat /tmp/extractSearchResults_$$.txt | awk -v "current=$i" '{ if (NR == (2 + current)) print $0 }' | awk -F, '{print $1}'`
    currDocDate=`cat /tmp/extractSearchResults_$$.txt | awk -v "current=$i" '{ if (NR == (2 + current)) print $0 }' | awk -F, '{print $3}'`
    currDocFileName=`cat /tmp/extractSearchResults_$$.txt | awk -v "current=$i" '{ if (NR == (2 + current)) print $0 }' | awk -F, '{print $4}'`

    echo "   >> Saving EXTRACT file with UCM Doc ID    : ${currDocId}"
    echo "   >> Saving EXTRACT file with Name          : ${currDocFileName}"
    echo "   >> Saving Files from Extraction Date      : ${currDocDate}"

    if [ -f ${exportDir}/${subDirExtractStatus}/${currDocFileName} ]
    then
        echo "     Will skip extraction of file ${exportDir}/${subDirExtractStatus}/${currDocFileName} as file is existing!"
    else
        echo "     Will extract file ${currDocFileName} into directory ${exportDir}/${subDirExtractStatus}"
        java -jar ${UCMRidcDir}/oracle.ucm.fa_client_11.1.1.jar DownloadTool \
                --propertiesFile=${propFile} \
                --log_file_name=logs/extractDownload.txt \
                --outputDir=${exportDir}/${subDirExtractStatus} --md5 --md5lowercase \
                --dID=$currDocId --RevisionSelectionMethod Latest
    fi

    echo "   Done! Data Extract Status downloading into directory ${exportDir} finished!"
    echo "-----------------------------------------------------------------------------------------"
    echo ""

    i=`expr $i + 1`
done

The value of variable UCMRidcDir in line 20 above will probably have to be changed to reflect the customer-specific location of the RIDC library.

When running, the output of this script will look as shown below:

In case of a failure there might be only a partial download, or no download at all, of these extraction log files. In such a situation it makes sense to solve the issue first and then run the script again. Files from a partial download remaining in the target directory will not harm any functionality of BICC. They will only give an incomplete view on the statistics page as described in the sample app below, which can be corrected at any time.

Processing and Visualising of Log Information

The following description of the further processing of downloaded EXTRACT log files must be seen as a design study or a suggested proof of concept. The starting point for the proposed solution are the following considerations:

  • As the file format comes as JSON a further processing via Javascript (JS) would be most appropriate
  • The app is self-containing and will demonstrate how to read the EXTRACT log files from filesystem as previously downloaded
  • The visualization part has been kept as simple as possible by
    • Running a local web server that renders the processed data as reports to be opened in a browser
    • Keeping the UI simple and using HTML in a very limited fashion inline in the code – no usage of platforms like React.js, Knockout.js, Webpack.js, Oracle JET, Oracle VBCS or any other available solution was intended
    • Keeping the dependencies low, which simplifies the installation and supports focusing on the essentials, meaning the reading of log files and the evaluation of the information they contain
    • Embedding the entire functionality into one single Javascript file and object for better readability
    • Providing no advanced UI usability other than showing a simple HTML report – searching and printing capabilities can be used within the browser
    • Implementing no cross-linking between values in the UI, with the intention to keep the code as readable as possible
  • The provided solution can run as a micro-service and be wrapped by a batch job scheduler that restarts it and initiates the reloading and re-rendering of the information
  • Alternatively, the available source code can be downloaded, improved and migrated to any other platform as needed, including Oracle solutions such as Oracle JET

In case of increased interest in alternative samples with enhanced UI handling and automation for continuous downloading of log information, we will consider another blog covering these topics. Please express your interest in the comment section below or by sending an email to the author.

As an entirely different approach, it would also be possible to upload the data to a standard reporting and analytical tool like Oracle Analytics Cloud. For this purpose, the processed data can be transferred to such a service and no self-written UI would be necessary. This solution won't be discussed here and would also be a topic for another article if requested.

Architectural sample design

The proposed solution uses Node.js as a framework to control the reading and cumulation of log data. The log files will be opened in the directory that was used for downloading by the previously executed shell script as described above.

Once the data from the log files have been fully read and processed, some internal JS object arrays with cumulated information have been created and filled. The details are explained further down in this blog.

As a final step this cumulated information is rendered as HTML output, and the various reports are assigned to multiple request handlers. Finally, the Express.js app is started on a configurable port and the cumulated log information is accessible from a web browser.

Installation of the Web Components

As mentioned, the intention of this article is to show a sample with a minimum of technical requirements. Install Node.js for the according OS as provided on the homepage:

https://nodejs.org

Once Node.js has been installed, the two following modules are used:

  1. fs as a module to access the file system for reading the log files – this is part of the Node.js core and requires no installation
  2. express as a fast and minimalist web framework for Node

The command to install the express package is listed below:

npm install express --save

Further editing of the Javascript source code might be done best in an IDE like Apache NetBeans. To install this development environment the software can be downloaded from this URL:

https://netbeans.apache.org/download

Some screenshots of the sample code as opened in the IDE are shown below:

The source code of the Node.js script processBICCLogs.js can be downloaded via this link. It is not necessary to create a Node.js project in NetBeans for some simple editing – just opening the file in the IDE is sufficient. An exception would be the attempt to add larger extensions to the current code, like UI enhancements or a migration to Oracle JET. Here it would make sense to set up an according project and to embed the existing source code into this solution.

Using the solution

Same as the download script, this Javascript code runs on the command line. It takes two parameters:

  1. the directory where the EXTRACT.JSON files reside
  2. the network port the Express.js Web Server listens on – default value is 3000

If no arguments are passed the message below will appear with some usage information:

$ node processBICCLogs.js
Error: insufficient number of arguments!
Usage: node processBICCLogs.js <File Path with EXTRACT JSON Log Files> [port=value between 1024 and 65535] [log=y|Y]
	port=value  ->  web port - default value = 3000 [optional] 
	log=y|Y     ->  console log y|Y - default value = no [optional] 

If the flag for log information is passed to the program, the content of the internal arrays will be dumped to standard output. This can be used to verify the internally processed data before it is rendered as HTML reports.

As shown in the screenshots below, the script output will either be very minimal (1) or quite verbose (2), providing details of the BICC log data that has been processed. It is important to mention that the script will keep running in the same shell until interrupted. To take back control of the shell it must be put into the background. For continuous processing of these log statistics, the shell script for downloading the log files and the Javascript code for processing them can be scheduled for a restart, wrapped by another script (not part of this blog).

Once started, the HTML reports can be opened in a web browser by opening the following URL:

http://<localhost or hostname>:<port number>

As mentioned before the HTML formatting has been kept as simple as possible. The web page contains an HTML table on top of the page and an iFrame to hold the report data.

By default the entry page shows the content activated by clicking the leftmost button, called Statistics Summary, providing the cumulated statistics for this collection of log information. The columns in this report give details about the time the report was created, the Oldest/Newest Extraction Date, the Total Number of extracted rows in BICC jobs, the Number of Schedules, Number of Jobs, Number of Processed Files and Number of Processed VO's. A sample of such an output is shown below.

The next button in line is called Extracts By Run Date. This report provides BICC extract statistics grouped by Run Date. The field contains the Run Date value including the fraction of a second, but the grouping uses the value without the fraction of a second and shows all VO's extracted at the same time, down to a second. The timestamp including the fraction of a second is provided in another column named Timestamp. The screenshot below shows the output of this report.
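
The grouping key for this report can be illustrated with a small sketch that strips the fraction of a second from a runDate value (the timestamp format is assumed for illustration):

```javascript
// Illustrative grouping key for the "Extracts By Run Date" report:
// drop everything after the seconds, i.e. the fraction of a second.
// The "YYYY-MM-DD HH:MM:SS.fff" format is an assumption for this sketch.
function runDateGroupKey(runDate) {
    return runDate.split(".")[0];
}

console.log(runDateGroupKey("2019-04-08 06:00:01.123"));  // 2019-04-08 06:00:01
```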

The third button in line is named Extracts by VO and uses the VO names as a grouping element. This report contains similar columns as described for the previous report.

The screenshot below shows the report activated by the button named Extracts by Schedule. This report uses a two-level grouping: the first level group is the Schedule ID of the BICC extract and the second level group is the according Request ID. Every extraction run is assigned to exactly one Schedule ID and one Request ID. If this is a single run (immediate, scheduled, but not recurring) there will be only one record reported. In case of recurring jobs, the report will contain multiple Request ID records per Schedule ID.

One report for failed BICC extracts uses the VO Name as a grouping element and appears after clicking the button Failed by VO, as shown in the screenshot below. In contrast to the other reports, the Error Message is shown here. We can presume this field will be filled with significant information about a failed extraction run.

The rightmost button, named Failed by Date, activates a report listing BICC extract runs with status FAILURE, ordered by Run Date. A sample output is shown in the screenshot below.

All the data shown in these various reports is of a static nature. The values are read from the files that were downloaded from UCM before. To refresh the data, another download of the log files from UCM and a restart of this script is required. As mentioned above, such a task could be scheduled on a regular basis either on operating system level or by any scheduling tool. Alternatively, the Javascript code could be extended to initiate the download script and refresh the data as well.

Code Design

In this section the code design and structure of the Node.js program processBICCLogs.js is explained in more detail. In the absence of inline documentation, this description should help to adapt the source code for extensions or modifications if required.

Class extractStats

The entire functionality and logic to handle the BICC log data and to format the HTML output is encapsulated in a class called extractStats. This class contains the following structures to hold the data:

  • voArr
    • Array that keeps all EXTRACT JSON entries as flattened data once being read from the according file
  • voNameArr
    • Array that holds the cumulated log data grouped by the VO Names
    • Very first entry in Array contains the values shown later in HTML report as table header
  • voRunDateArr
    • Array that holds the cumulated log data grouped by Run Date (Year to Second)
    • Very first entry in Array contains the values shown later in HTML report as table header
  • voFailedArrByDate
    • Array that holds the log data for failed extraction runs grouped by Run Date
    • Very first entry in Array contains the values shown later in HTML report as table header
  • voFailedArrByVO
    • Array that holds the log data for failed extraction runs grouped by VO Name
    • Very first entry in Array contains the values shown later in HTML report as table header
  • voJobIDArr
    • Array that uses a grouping of cumulated log data by Schedule ID on first and Request ID on second level
    • Very first entry in Array contains the values shown later in HTML report as table header
  • reportStats
    • cumulated statistics about all entries being read from the EXTRACT.JSON files
    • Very first entry in Array contains the values shown later in HTML report as table header

These member variables above are static after initialization and won’t be refreshed in the current code. Such an enhancement would require some extensions which have not been implemented here in order to keep the code in a compact format.

The following methods are used as central services inside this class:

  • createArrays()
    • this is a central member function that iterates through the member variable voArr and calls the according fill… member functions to add cumulated values into the specific arrays including the specific grouping by various parameters
  • getScheduleID()
    • this function takes the specific EXTRACT-JSON filename and determines the Schedule ID from that name
  • getRequestID()
    • this function takes the specific EXTRACT-JSON filename and determines the Request ID from that name
  • handleData()
    • this is the central member function that determines the dedicated values from the JSON structures as provided per every single file and adds entries to the central member variable voArr
  • logData()
    • this member function will dump all data structures to console output if log flag has been set to Yes when executing the program

The following methods starting with the term fill in their name are used to feed the specific arrays used for reporting with data:

  • fillBatchJobArr()
    • this method cumulates and copies the data from voArr into voJobIDArr including a grouping by Schedule ID and Request ID
  • fillFailedArrays()
    • this method cumulates and copies the data from voArr into voFailedArrByDate and voFailedArrByVO including the according grouping
  • fillReportStatsArr()
    • this method cumulates the data from voArr into the statistic overview array reportStats
  • fillRunDateArr()
    • this method cumulates and copies the data from voArr into voRunDateArr including the grouping by Run Date
  • fillVONameArr()
    • this method cumulates and copies the data from voArr into voNameArr including the grouping by VO Name
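
The cumulation idea behind these fill… methods can be sketched as follows. This is not the original code from processBICCLogs.js, just a minimal illustration that groups flattened log entries by VO name and sums up row counts and durations:

```javascript
// Minimal sketch of the cumulation performed by the fill... methods:
// group flattened log entries by VO name and accumulate the metrics.
// Field names follow the EXTRACT JSON format; the sample data is invented.
function groupByVOName(entries) {
    var groups = {};
    entries.forEach(function (e) {
        if (!groups[e.name])
            groups[e.name] = { name: e.name, rowCount: 0, totalDurationInSec: 0, runs: 0 };
        groups[e.name].rowCount += e.rowCount;
        groups[e.name].totalDurationInSec += e.totalDurationInSec;
        groups[e.name].runs++;
    });
    return groups;
}

var grouped = groupByVOName([
    { name: "VO_A", rowCount: 100, totalDurationInSec: 5 },
    { name: "VO_A", rowCount: 50,  totalDurationInSec: 3 },
    { name: "VO_B", rowCount: 10,  totalDurationInSec: 1 }
]);
console.log(grouped["VO_A"].rowCount);  // 150
```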

Finally, the existing methods starting with the term render… are used to prepare and format the HTML output for each data array:

  • renderBatchJobArr()
    • creates a single string that contains the formatted output containing the HTML table from data in voJobIDArr
  • renderFailedArrByDate()
    • creates a single string that contains the formatted output containing the HTML table from data in voFailedArrByDate
  • renderFailedArrByVO()
    • creates a single string that contains the formatted output containing the HTML table from data in voFailedArrByVO
  • renderReportStats()
    • creates a single string that contains the formatted output containing the HTML table from data in reportStats
  • renderRunDateArr()
    • creates a single string that contains the formatted output containing the HTML table from data in voRunDateArr
  • renderVONameArr()
    • creates a single string that contains the formatted output containing the HTML table from data in voNameArr

The functionality of Express.js to start the web server by using the formatted HTML output is described in next section.

Express.js usage

The method visualizeStatsData() in class extractStats is responsible to prepare the request handlers and to execute the web server as provided by Express.js.

Here are some explanations of the code essentials provided below. Each of these six reports has its own request handler registered via an app.get() call. For the index/root page (app.get('/',…)) the HTML code was added inline in the method visualizeStatsData(). On top of the page an HTML table contains HTML forms, each with a button to call a specific report. Beneath that part an iFrame exists as a flexible region used as a target for the specific HTML reports registered via the request handlers. Some HTML formatting is encapsulated in the member variables rootHtmlHeader and htmlHeader inside class extractStats.

…
app.get('/reportStats', (req, res) => res.send( this.htmlHeader + reportStatsTable + "</body></html>"));
app.get('/byDate', (req, res) => res.send( this.htmlHeader + voDateTable + "</body></html>"));
app.get('/byName', (req, res) => res.send(  this.htmlHeader + voNameTable + "</body></html>"));
app.get('/bySchedID', (req, res) => res.send(  this.htmlHeader + voScheduleTable + "</body></html>"));
app.get('/byFailStatusVO', (req, res) => res.send( this.htmlHeader + voFailedByVOTable + "</body></html>"));
app.get('/byFailStatusDate', (req, res) => res.send( this.htmlHeader + voFailedByDateTable + "</body></html>"));
app.get('/', (req, res) => res.send( "<html>" + this.rootHtmlHeader + "<body>" +
       "<div class=\"submit\">" +
       "<table>" +
       "             <tr>" +
       "                    <td>" +
       "                           <form>" +
       "                                   <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/reportStats')\">Statistics Summary</button>" +
       "                            </form>" +
       "                     </td>" +
       "                     <td>" +
       "                            <form>" +
       "                                   <button type=\"button\" class=\"submit\" onClick=\"javascript:showStats('http://localhost:" + appPort + "/byDate')\">Extracts by Run Date</button>" +
       "                            </form>" +
       "                     </td>" +
       "                     <td>" +
       "                           <form>" +
       "                                   <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/byName')\">Extracts by VO</button>" +
       "                            </form>" +
       "                     </td>" +
       "                     <td>" +
       "                           <form>" +
       "                                   <button type=\"button\" class=\"submit\" onclick=\"showStats('http://localhost:" + appPort + "/bySchedID')\">Extracts by Schedule</button>" +
       "                            </form>" +
       "                     </td>" +
       "                     <td>" +
       "                            <form>" +
       "                                   <button type=\"button\" class=\"submit\" onClick=\"javascript:showStats('http://localhost:" + appPort + "/byFailStatusVO')\">Failed by VO</button>" +
       "                            </form>" +
       "                     </td>" +
       "                     <td>" +
       "                            <form>" +
       "                                   <button type=\"button\" class=\"submit\" onClick=\"javascript:showStats('http://localhost:" + appPort + "/byFailStatusDate')\">Failed by Date</button>" +
       "                            </form>" +
       "                     </td>" +
       "             </tr>" +
       "      </table>" +
       "</div>" +
       "      <hr>" +
       "      <iframe id=\"if1\" style=\"border: 0; width:100%;\" src=\"http://localhost:" + appPort + "/reportStats\">" +
       "      </iframe>" +
       "</body></html>"));
…

In the code above a static value localhost is used as the host name in the URLs. This is a pragmatic restriction to keep the code simple and means that the solution currently only works on the same host where the Node.js program is executed. The challenge for adopters of this code is to add a variable hostname that would ideally be passed as a command line argument to the program. The next section describes the usage of the main() function – such a hostname parameter could be added there in a straightforward fashion.

Alternatively there are other options to make the output accessible across a network, like hard-coding a hostname in the program (not recommended!) or reading the hostname from a config file or similar. As said: only for the reason of keeping the code as compact as possible this logic hasn't been added.
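
A minimal sketch of such an extension could mirror the existing port=… handling in main(). The function name and the host= argument are hypothetical, not part of the published script:

```javascript
// Hypothetical extension: accept an additional "host=<name>" command line
// argument, falling back to localhost, mirroring the port=... parsing in main().
function parseHostArg(args) {
    var host = "localhost";
    for (var x = 0; x < args.length; x++) {
        var equalPos = args[x].indexOf("=");
        if (equalPos > 0 && args[x].substring(0, equalPos) === "host")
            host = args[x].substring(equalPos + 1);
    }
    return host;
}

console.log(parseHostArg(["port=8080", "host=bicc-stats.example.com"]));  // bicc-stats.example.com
console.log(parseHostArg(["log=y"]));                                     // localhost
```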

Usage of main() function

This code uses a main() function to initiate the extractStats object and to call the methods

  • handleData() – initializing the arrays in extractStats as described above
  • logData() – optional if log flag had been set to Yes
  • visualizeStatsData() – formatting the HTML output and starting the Express.js based web server as described above

As Node.js works in an asynchronous manner, the entire processing of the log file data must be kept within the fs.readdir callback that loops over the EXTRACT.JSON files.

function main(myArgs)
{
    var debugFlag = false;
    var appPort = 3000;
    var arrNum = 0;
    var jsonInput = new Array();

    if (myArgs.length < 3)
    {
        console.log("Error: insufficient number of arguments!");
        console.log( "Usage: " + myArgs[0] + " " + myArgs[1] + " <File Path with EXTRACT JSON Log Files> [port=value between 1024 and 65535] [log=y|Y]");
        console.log( "          port=value  -> web port - default value = 3000 [optional]");
        console.log( "          log=y|Y     ->  console log y|Y - default value = no [optional]");
        process.exit();
    }
       
    var filePath = myArgs[2];

    for(var x = 3; x < myArgs.length; x++){
        var equalPos = myArgs[x].toString().indexOf("=", 0);
        var paramName, paramVal;
        if(equalPos > 0)
        {
            paramName = myArgs[x].toString().substring(0, equalPos);
            paramVal = myArgs[x].toString().substring(equalPos + 1, myArgs[x].toString().length);

            if (paramName === "port")
            {
                if (paramVal.match(/^[0-9]+$/) !== null )
                {
                    if( (paramVal < 1024) || (paramVal > 65535) )
                        console.log("Hint: parameter <port> has an invalid value=" + paramVal + " - using default value 3000 instead!");
                    else
                        appPort = paramVal;
                };
            }

            if(paramName === "log")
            {
                if(paramVal.toString().toUpperCase() === 'Y')
                    debugFlag = true;
                else
                    debugFlag = false;
            }
        }
    }

    fs.readdir(filePath, function (err, files) {
        if (err) {
            return console.log('Unable to scan directory: ' + err);
        }

        files.forEach(function (file) {
            jsonInput.length =  arrNum + 1;
            var fullFileName = filePath.toString() + "/" + file.toString();
            jsonInput[arrNum] = JSON.parse(fs.readFileSync(fullFileName, 'utf8'));
            var recVal = jsonInput[arrNum];
            extractStats.handleData(recVal, arrNum, file.toString());
            arrNum++;
        });

        extractStats.createArrays();

        if(debugFlag)
            extractStats.logData();

        extractStats.visualizeStatsData(appPort);
    });
}

main(process.argv);

Calling the main() function and passing the command line arguments initiates the creation of the extractStats object. There is only one mandatory argument: the directory name where the EXTRACT.JSON files can be found. Another parameter is the web port, which is handled as optional. If no value is passed, the default of 3000 is taken. Finally, a third argument might be passed to enable the output of log information. It is also an optional parameter, and log output is disabled by default. Some code lines in main() validate these parameters at the beginning, before further processing initiates the extractStats object. As stated before, the extension for a dynamic handling of the hostname could be added by passing one more command line argument with an according value.

Conclusion

The intention of this article is to introduce the audience to some recent features in BICC that provide dedicated monitoring information per extraction job. The focus of the solution and sample code is clearly set on the reading of files and the processing of log data. The visualization of the data has been kept as simple as possible, with some restrictions on the UI flexibility.

The provided code for the download script and the processing is runnable and can be used out of the box with the given features. On the other hand, this sample can be adapted by readers for further modifications and enhancements. As mentioned before, this can be done by migrating the tech stack to an Oracle solution (Oracle JET, OKE etc.) or by adding enhanced features like continuous downloading and processing of EXTRACT.JSON files.
