Introduction

Enterprise Scheduler Service (ESS) is the backbone of batch and background processing in Oracle Fusion applications. Monitoring ESS jobs is critical to ensure business continuity, data freshness, SLA adherence, and operational confidence.

Enterprise customers rely on thousands of Oracle Fusion ESS jobs to run payroll, financial close, integrations, billing, and compliance workloads.

Traditional ESS monitoring techniques include:

  • ESS Scheduled Processes UI (reactive, manual)
  • ESS Scheduler REST APIs (operational but limited analytics)
  • Database queries on ESS history tables (powerful but not operationalized)

However, these approaches lack trend analysis and proactive risk detection: failures are detected only after business impact, performance degradation goes unnoticed for weeks, and SLA breaches are handled reactively rather than prevented.

This leads to outages, escalations, and audit risks.

Objective

The objective of the ESS Job Health & Risk Observatory is to provide an industry-aligned, proactive, and decision-oriented monitoring framework.

  • Detect failures, delays, and anomalies early
  • Provide SLA and performance intelligence
  • Reduce noise and focus on actionable risks

The ESS Job Health & Risk Observatory changes Fusion ESS monitoring from simple job tracking into actionable operational intelligence. Instead of reacting to failures, customers gain clear visibility into current job health, emerging risks, and future failure trends using key operational metrics (e.g. average runtime, success rate, failure rate, runtime volatility, failure probability, SLA breach probability, stuck job probability, runtime regression, and job risk score).

With more than 30 purpose-built widgets and queries, the dashboard reveals patterns that are not visible in standard ESS views. These include runtime drift, recurring failures, schedule congestion, SLA exposure, and abnormal execution behavior.

A key capability of the observatory is its rule-based detection and alarm framework. Detection rules can be defined for any event or risk condition, such as repeated failures, abnormal duration spikes, missed schedules, or SLA breaches. These rules drive automatic alerts and alarms, ensuring teams are notified early and consistently when risk thresholds are crossed.

By combining real-time execution signals with historical trends and behavioral analysis, teams can detect issues early, reduce alert noise, and focus only on what matters. This enables a shift from firefighting to prevention, with better confidence in scheduling, capacity planning, and SLA commitments.

When enriched with OCI Log Analytics AI (LoganAI), the same insights support anomaly detection and faster root-cause analysis.

This dashboard delivers derived intelligence, not raw data, giving customers foresight instead of hindsight.

In this blog, we explain how this dashboard can be designed and implemented using the underlying tools and analytical techniques that make predictive ESS observability possible.

ESS Job Health & Risk Observatory


ESS Job Health Detailed Info (Drill Down Report)


Key Metrics

Each metric answers a concrete operational question and maps directly to an action; this is explained in detail in section 1.3, ESS Job Health Signal Calculations.

| Metric | What It Represents | Why It Is Useful for Customers |
|---|---|---|
| Average Runtime Minutes | Average runtime of successful executions within the monitoring window. | Short-term performance tracking; drift detection |
| Standard Deviation Runtime Minutes | Standard deviation of job runtimes within the monitoring window; measures how much a job’s runtime varies from run to run. | Low if a job always takes about the same time, high if runs swing between fast and very slow; it tells you runtime consistency, not speed |
| Average Runtime Baseline Minutes | Typical job runtime: the long-term historical average runtime. | Baseline for planning and benchmarking |
| P95 Runtime Minutes | Expected worst-case runtime: the upper runtime bound for normal behavior. | SLA design, alert thresholds, capacity sizing |
| Runtime Trend Slope | Direction and strength of runtime change over time: is the job getting slower? | Detects gradual degradation not visible in single runs; early warning before SLAs or failures occur; essential for predictive observability |
| Runtime Volatility Index | Degree of runtime variability for a job: is runtime becoming unpredictable? | High volatility = unstable job behavior; helps explain why SLAs are breached intermittently; critical for confidence scoring and alert trustworthiness |
| Runtime Regression Status | Detects performance degradation between consecutive runs. | Early signal of data growth; detects degradation before SLA breach |
| Predicted Failure Probability | Likelihood that the next execution of an ESS job will fail. | Predicts risk before failure happens; avoids reacting only after incidents; helps prioritize proactive remediation for high-risk jobs even if the last run succeeded |
| SLA Breach Probability | Probability that a job’s runtime exceeds its historical SLA baseline: how likely is this job to run longer than what is considered normal for it? | Identifies jobs that are silently drifting toward SLA breaches; useful for capacity planning and SLA governance |
| Stuck Job Probability | Probability that a currently running job is “stuck”. | Highlights jobs that may never complete; enables early termination or escalation; prevents resource starvation and cascading delays |
| Composite Job Risk Score | Unified risk indicator combining multiple failure dimensions: a single number summarizing job health. | Single, explainable risk number for dashboards; enables sorting, bucketing, and alert thresholds; avoids alert fatigue by combining signals |
| Confidence Score Weighted | A numeric confidence score (0–1) used internally for threshold tuning, alert gating, and observability dashboards. | Allows fine-grained analysis beyond LOW / MEDIUM / HIGH; enables future adaptive thresholds |
| Confidence Level (LOW / MEDIUM / HIGH) | Trustworthiness of computed metrics: how reliable the prediction is. | Prevents false alerts on sparse data; allows alerts only when signals are reliable; increases customer trust in observability outputs |


High Level Architecture

Data Flow

Fusion Applications → BI Publisher → OIC REST API → OCI Management Agent → OCI Log Analytics → Widgets/Dashboards → Detection Rules → Alarms


Detailed Steps

1 – BI-Layer


The BI Analytics–driven approach is intentionally chosen because it enables historical baselining with statistical modeling.

This solution does not replace ESS APIs; instead, it elevates ESS monitoring from reactive status checks to predictive job health intelligence.

BI is explicitly designed to extract aggregated job data at controlled intervals and return only pre-computed summaries, ensuring negligible performance impact on Fusion. Because data volume stays low, this approach enables:

  • Time-series analytics
  • Historical analysis
  • Scheduled batch data access
  • Derived metrics


1.1- Report Columns

Column Legend – Computation Scope

  • Windowed → Uses :P_FROM_TIMESTAMP to :P_TO_TIMESTAMP
  • Cumulative (as of TO) → Uses data up to :P_TO_TIMESTAMP
  • Derived → Parsed / inferred metadata
  • Calculated → Computed using formulas, statistics, or rules
Job Identity & Metadata

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| JOB_IDENTIFIER | Unique ESS job technical key | Primary identifier, joins | Derived | ESS_REQUEST_HISTORY.DEFINITION |
| JOB_NAME | Human-readable job name | UI display | Derived | Extracted from definition path |
| APPLICATION | Fusion application | App filtering | Derived | ESS metadata |
| PRODUCT | Fusion product | Product analytics | Derived | ESS metadata |
| REQUEST_TYPE | Job request type | Execution classification | Derived | ESS request type |
| PROCESSGROUP | ESS process group | Capacity analysis | Derived | ESS metadata |
| SUBMITTER | Job submitter | Audit & RCA | Derived | ESS metadata |
| SCHEDULED | Scheduled vs ad-hoc | Execution behavior | Derived | ESS flag |
| JOB_DEDUPE_KEY | Unique job-run key | Alert deduplication | Derived | Request ID \|\| DEFINITION |
Execution Timing & Request Details

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| LAST_RUN_TIMESTAMP | Latest execution start | Freshness checks | Windowed | Max PROCESSSTART |
| LAST_RUN_END_TIME | Latest execution end | Runtime calc | Windowed | PROCESSEND |
| REQUEST_START_TIME | Requested start time | Schedule adherence | Windowed | REQUESTEDSTART |
| REQUEST_EVENT_TIME | Analytics event time | Incremental ingestion | Windowed | Alias of last run time |
| REQUEST_COMPLETE_TIME | Completion timestamp | SLA checks | Windowed | COMPLETEDTIME |
| REQUESTID | Latest request ID | Drill-down | Windowed | Highest request ID |
| PARENTREQUESTID | Parent request | Job hierarchy | Derived | ESS metadata |
Success & Runtime History

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| LAST_SUCCESS_TIMESTAMP | Most recent success | Reliability signal | Cumulative (as of TO) | Max success COMPLETEDTIME |
| LAST_RUNTIME_MINUTES | Last success runtime | Performance tracking | Cumulative | (END − START) × 1440 |
| PREVIOUS_RUNTIME_MINUTES | Prior success runtime | Regression detection | Cumulative | 2nd latest success |
| RUNTIME_REGRESSION_STATUS | Runtime degradation | Predictive alerts | Calculated | Compares last vs previous using regression factor |
Baseline & Statistical Metrics

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| AVG_RUNTIME_BASELINE_MIN | Historical mean runtime | SLA baseline | Cumulative | AVG(success runtimes) |
| P95_RUNTIME_BASELINE_MIN | 95th percentile runtime | SLA threshold | Cumulative | PERCENTILE_CONT(0.95) |
| AVG_RUNTIME_MINUTES | Avg runtime (window) | Trend analysis | Windowed | AVG(success only) |
| STDDEV_RUNTIME_MINUTES | Runtime deviation | Stability analysis | Windowed | STDDEV(success only) |
| RUNTIME_VOLATILITY_INDEX | Runtime instability | Risk scoring | Calculated | STDDEV / AVG |
| RUNTIME_TREND_SLOPE_PER_RUN | Runtime trend | Capacity planning | Cumulative | Linear regression slope |
Execution & Failure Metrics

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| TOTAL_RUNS_CUMULATIVE | Total executions | Job criticality | Windowed | COUNT(1) |
| FAILED_RUNS | Failures | Reliability | Windowed | STATE IN (10,15,18,19) |
| SUCCEEDED_RUNS | Success count | Confidence | Windowed | STATE IN (12, 17) |
| RUNNING_RUNS | Active runs | Load | Windowed | STATE = 3 |
| BLOCKED_RUNS | Blocked jobs | Dependency | Windowed | STATE = 5 |
| CANCELLED_RUNS | Cancelled | Operational | Windowed | STATE IN (7,9) |
| WARNING_RUNS | Warning | Confidence | Windowed | STATE = 11 |
| QUEUE_PRESSURE | Queue | Reliability | Windowed | STATE IN (1,2,6,13,14,20) |
| FAILURES_LAST_24_HOURS | 24h failures | Immediate alerting | Calculated | Rolling 24h window |
| FAILURES_LAST_7_DAYS | 7-day failures | Trend analysis | Calculated | Rolling 7-day window |
| ADHOC_COUNT | Ad-hoc count | Load distribution | Windowed | Jobs executed via ad-hoc submissions |
| SCHEDULE_COUNT | Scheduled count | Load distribution | Windowed | Jobs executed via scheduled submissions |
Running & Stuck Job Indicators

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| CURRENTLY_RUNNING_COUNT | Active job count | Capacity | Cumulative | Count of active states |
| MAX_RUNNING_DURATION_HOURS | Longest runtime | Hung job detection | Calculated | Elapsed runtime |
| STUCK_JOB_PROBABILITY | Stuck likelihood | Critical alerts | Calculated | Compared vs P90 / P95 |
Predictive & Risk Analytics

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| PREDICTED_FAILURE_PROBABILITY | Next-run failure risk | Proactive alerting | Calculated | Bayesian smoothing |
| SLA_BREACH_PROBABILITY | SLA violation chance | SLA alerts | Calculated | % runs > P95 |
| JOB_RISK_SCORE | Composite risk score | Alert prioritization | Calculated | Weighted multi-factor formula |
| CONFIDENCE_LEVEL | Signal reliability | Noise suppression | Calculated | Run count + volatility + trend |
| CONFIDENCE_SCORE_WEIGHTED | Numeric confidence | Threshold tuning | Calculated | Weighted scoring formula |
Scheduling Intelligence

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| SCHEDULE_START_TIME | Schedule start | SLA alignment | Derived | Parsed from XML |
| SCHEDULE_END_TIME | Schedule end | Overdue detection | Derived | Parsed from XML |
| SCHEDULE_FREQUENCY | Frequency | Load analysis | Derived | Parsed from XML |
| SCHEDULE_INTERVAL | Interval | High-frequency detection | Derived | Parsed numeric |
| SCHEDULE_ENDED_FLAG | Schedule ended | Alert suppression | Calculated | ESS terminal state |
| SCHEDULE_OVERDUE_FLAG | Schedule overdue | Critical alert | Calculated | End < now & not ended |
| HIGH_FREQUENCY_SCHEDULE_FLAG | Very frequent jobs | Load/cost alerts | Calculated | MINUTELY / HOURLY ≤ 60 |
Request Diagnostics

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| LAST_REQUEST_ID | Latest request | Drill-down | Cumulative | MAX(requestid) |
| LAST_SUCCESS_REQUEST_ID | Last success | RCA | Cumulative | MAX success |
| LAST_FAILED_REQUEST_ID | Last failure | RCA | Cumulative | MAX failure |
| LONGEST_RUNNING_REQUEST_ID | Longest active run | Hung job debug | Calculated | Max runtime |
Alert Decision Outputs

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| ALERT_FLAG | Alert decision | OCI alarms | Calculated | Rule engine |
| ALERT_REASON | Root reason | Operator clarity | Calculated | Failure / SLA / Risk |
| ALERT_SEVERITY | Severity level | Escalation | Calculated | Risk-based severity |
Meta Fields

| Column Name | Description | How It Is Used | Computation Scope | Calculation / Derivation |
|---|---|---|---|---|
| META_FROM_TIMESTAMP | Query start | API traceability | Input | Parameter |
| META_TO_TIMESTAMP | Query end | API traceability | Input | Parameter |
| META_GENERATED_AT | Execution time | Audit | Calculated | SYSTIMESTAMP UTC |
| META_RECORD_COUNT | Record count | Pagination | Calculated | COUNT(*) OVER() |

1.2 – Query Parameters

The ESS Job Health & Risk Observatory uses a small set of tunable control parameters to balance sensitivity, accuracy, and operational noise.

These parameters do not change raw execution data; they influence how predictive intelligence and alerts are derived from it.

They can be adjusted to align with specific risk appetite, job criticality, and data maturity.

The parameters below have default values but can be set per specific needs. They do not affect job execution and do not increase ESS or Fusion load, because they operate on aggregated, periodic snapshots; this design ensures predictive intelligence without performance impact on Fusion applications.

| Parameter | Description | Default | Meaning / Impact | How Used in Query |
|---|---|---|---|---|
| :P_FROM_TIMESTAMP | Monitoring window start | Required | Defines time window lower bound | Filters windowed metrics |
| :P_TO_TIMESTAMP | Monitoring window end | Required | Snapshot time for metrics | Anchor for all calculations |
| :P_REGRESSION_FACTOR | Runtime regression sensitivity | 1.5 | Detects runtime degradation | Last vs previous runtime |
| :P_MIN_SUCCESS_RUNS | Minimum success runs | 3 | Confidence eligibility | Confidence calculation |
| :P_STUCK_JOB_PROBABILITY | Stuck alert threshold | 0.8 | Hung job detection | Alert rules |
| :P_JOB_RISK_SCORE | Risk alert threshold | 0.6 | High-risk jobs | Alert rules |
| :P_SLA_BREACH_PROBABILITY | SLA alert threshold | 0.4 | SLA risk tolerance | Alert rules |

P_REGRESSION_FACTOR
Purpose: Controls sensitivity to runtime regressions between consecutive successful runs.
What it means: Defines how much slower a job must run (compared to its previous successful run) before it is classified as a runtime regression.
Default behavior: The default value (1.5) means the job must run 50% longer than its previous run to be flagged.
When to tune:

  • Lower value (e.g. 1.2):
    Detects smaller performance degradations earlier (more sensitive).
  • Higher value (e.g. 2.0):
    Flags only severe regressions (less noise).
  • Recommended usage: Keep default unless jobs are highly latency-sensitive.

P_MIN_SUCCESS_RUNS
Purpose: Ensures statistical confidence before applying predictive logic.
What it means: Minimum number of successful executions required before runtime trend analysis, volatility calculation, and confidence scoring are considered reliable.
Default behavior: Jobs with fewer than this number are marked LOW confidence.
When to tune:

  • Lower value:
    Useful in new environments with limited history.
  • Higher value:
    Improves accuracy in mature, high-volume systems.
  • Recommended usage: Do not reduce below 3 unless absolutely necessary.

P_STUCK_JOB_PROBABILITY
Purpose: Threshold for flagging a running job as potentially stuck.
What it means: Represents how far the current runtime has exceeded historical norms (the P95 baseline).
Default behavior: Alerts are triggered only when the probability crosses a high threshold (0.8 by default).
When to tune:

  • Lower value:
    Early warning for long-running jobs.
  • Higher value:
    Alerts only when jobs are clearly abnormal.
  • Recommended usage: Tune lower for business-critical batch windows.

P_JOB_RISK_SCORE
Purpose: Global threshold for composite job risk.
What it means: Defines when a job’s combined risk (failure probability + SLA risk + volatility + trend + stuck risk) becomes actionable.
Default behavior: Typical alert threshold around 0.6.
When to tune:

  • Lower value:
    Aggressive alerting, proactive operations.
  • Higher value:
    Focus only on high-impact risks.
  • Recommended usage: Adjust per environment maturity and alert-fatigue tolerance.

P_SLA_BREACH_PROBABILITY
Purpose: Controls when predicted SLA violations generate alerts.
What it means:
Probability that a job’s runtime will exceed its historical SLA baseline (P95).
Default behavior:
Alerts triggered when probability crosses a moderate threshold (e.g. 0.4).
When to tune:

  • Lower value:
    SLA-driven environments with strict deadlines.
  • Higher value:
    Non-critical or flexible batch workloads.
  • Recommended usage: Align with business SLA definitions rather than technical limits.

1.3 – ESS Job Health Signal Calculations

This section explains how each analytical signal is derived, why it exists, and how it contributes to job health and alerting decisions.

1.3.1 – AVG_RUNTIME_MINUTES

Average runtime of successful executions within the monitoring window.
How It Is Calculated
AVG(runtime_minutes) WHERE state = SUCCESS AND PROCESSSTART between FROM and TO
Usage

  • Short-term performance tracking
  • Drift detection

1.3.2 – STDDEV_RUNTIME_MINUTES

Runtime variability within the window.
How It Is Calculated
STDDEV(runtime_minutes) WHERE state = SUCCESS
Usage

  • Stability analysis
  • Volatility detection

1.3.3 – AVG_RUNTIME_BASELINE_MIN

Long-term historical average runtime.
How It Is Calculated
AVG(runtime_minutes) WHERE state = SUCCESS AND PROCESSSTART <= TO_TIMESTAMP
Usage

  • SLA baseline
  • Performance expectations

1.3.4 – P95_RUNTIME_BASELINE_MIN

Upper runtime bound for normal behavior.
How It Is Calculated
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY runtime_minutes)
Usage

  • SLA thresholds
  • Stuck job detection
  • SLA breach probability
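As an illustration, the linear-interpolation semantics of SQL's PERCENTILE_CONT can be sketched in a few lines of Python (the runtimes are made-up sample values):

```python
def percentile_cont(values, p):
    """Linear-interpolation percentile, mirroring SQL PERCENTILE_CONT(p)."""
    xs = sorted(values)
    if not xs:
        return None
    rn = p * (len(xs) - 1)          # fractional row number, 0-based
    lo = int(rn)                    # lower neighbour index
    frac = rn - lo                  # interpolation weight
    if lo + 1 >= len(xs):
        return float(xs[-1])
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

# Five typical runs plus one outlier (minutes); P95 interpolates between 12 and 35
p95 = percentile_cont([10, 10, 11, 11, 12, 35], 0.95)
```

Because P95 interpolates toward the outlier rather than taking the maximum, a single abnormal run widens the "normal" band without dominating it.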

1.3.5 – RUNTIME_TREND_SLOPE_PER_RUN

Indicates whether job runtime is:
• Increasing
• Decreasing
• Stable
How It Is Calculated
REGR_SLOPE(runtime_minutes, run_sequence)
Interpretation

| Slope | Meaning |
|---|---|
| ≈ 0 | Stable |
| > 0 | Runtime increasing |
| < 0 | Runtime improving |

Usage

  • Detects slow degradation
  • Enables capacity planning

DERIVED STABILITY & BEHAVIOR SIGNALS

1.3.6 – RUNTIME_VOLATILITY_INDEX

How unstable job runtime is over time.

How It Is Calculated
STDDEV_RUNTIME / AVG_RUNTIME

• Normalized
• Capped at 1.0 in downstream usage
Interpretation

| Value | Meaning |
|---|---|
| < 0.2 | Very stable |
| 0.2 – 0.5 | Moderate variance |
| > 0.5 | Highly unstable |
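The ratio-with-cap behavior can be sketched as follows (sample runtimes are hypothetical):

```python
import statistics

def runtime_volatility_index(runtimes):
    """STDDEV / AVG, capped at 1.0 as in downstream usage."""
    if len(runtimes) < 2:
        return 0.0
    avg = statistics.mean(runtimes)
    if avg == 0:
        return 0.0
    return min(statistics.stdev(runtimes) / avg, 1.0)

stable = runtime_volatility_index([10, 10.2, 9.9, 10.1])   # well under 0.2: very stable
unstable = runtime_volatility_index([2, 30, 5, 40])        # above 0.5: highly unstable
```

Dividing by the mean normalizes the index, so a 5-minute job and a 5-hour job can be compared on the same 0–1 scale.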

1.3.7 – RUNTIME_REGRESSION_STATUS

Detects performance degradation between consecutive runs.
How It Is Calculated
IF previous_runtime IS NULL
  → INSUFFICIENT_HISTORY
ELSE IF
  last_runtime > previous_runtime × regression_factor
  → RUNTIME_REGRESSION
ELSE  → NORMAL
(Default regression factor = 1.5)

Usage

  • Early signal of data growth

Detects degradation before SLA breach
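The classification rule above translates directly into a small function:

```python
def runtime_regression_status(last_runtime, previous_runtime, regression_factor=1.5):
    """Classify the last successful run against the previous one, per the rule above."""
    if previous_runtime is None:
        return "INSUFFICIENT_HISTORY"
    if last_runtime > previous_runtime * regression_factor:
        return "RUNTIME_REGRESSION"
    return "NORMAL"

# 16 min vs a 10-min previous run is 60% slower, beyond the default 1.5x factor
status = runtime_regression_status(16, 10)
```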

PROBABILITY SIGNALS

1.3.8 – PREDICTED_FAILURE_PROBABILITY

Likelihood that the next execution will fail.
How It Is Calculated
failed_runs / total_runs
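The column reference earlier mentions Bayesian smoothing; one way to reconcile that with the raw ratio is to add Beta-prior pseudo-counts, which stabilize the estimate for jobs with little history. The alpha/beta values here are a hypothetical choice, not values taken from the report:

```python
def predicted_failure_probability(failed_runs, total_runs, alpha=0, beta=0):
    """failed_runs / total_runs; with alpha/beta pseudo-counts this becomes a
    Beta-prior (Laplace-style) smoothed estimate."""
    if total_runs + alpha + beta == 0:
        return 0.0
    return (failed_runs + alpha) / (total_runs + alpha + beta)

raw = predicted_failure_probability(2, 8)                        # plain ratio
smoothed = predicted_failure_probability(2, 8, alpha=1, beta=1)  # pulled toward 0.5
```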

1.3.9 – SLA_BREACH_PROBABILITY

Represents the likelihood that future executions of an ESS job will exceed its historical SLA threshold

SLA_BREACH_PROBABILITY Meaning

| Value | Meaning |
|---|---|
| < 0.1 | Healthy |
| 0.1 – 0.3 | Watch |
| 0.3 – 0.5 | High risk |
| > 0.5 | Very likely to breach |

How It Is Calculated
IF runtime_minutes > P95_RUNTIME_BASELINE_MIN → SLA breach
SLA_BREACH_PROBABILITY = (Number of successful runs exceeding P95) ÷ (Total successful runs)

Usage

  • Identifies jobs trending toward SLA failure
  • Enables proactive scaling or scheduling changes
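The calculation above is a simple breach ratio, sketched here with hypothetical runtimes:

```python
def sla_breach_probability(success_runtimes, p95_baseline):
    """Share of successful runs whose runtime exceeded the P95 baseline."""
    if not success_runtimes:
        return 0.0
    breaches = sum(1 for r in success_runtimes if r > p95_baseline)
    return breaches / len(success_runtimes)

# 2 of 5 successful runs ran past a 20-minute P95 baseline: 0.4, i.e. "high risk"
prob = sla_breach_probability([10, 11, 12, 30, 31], 20)
```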

1.3.10 – STUCK_JOB_PROBABILITY

Probability that a currently running job is hung.
How It Is Calculated
Compares current runtime against historical percentiles:

| Condition | Probability |
|---|---|
| Runtime > P95 | 0.9 |
| Runtime > P90 | 0.6 |
| Else | 0.1 |

Usage

  • Percentile-based (job-specific)
  • No hard-coded SLA assumptions
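The percentile-band rule from the table above, as a small function:

```python
def stuck_job_probability(current_runtime, p90_baseline, p95_baseline):
    """Percentile-banded stuck probability per the condition table above."""
    if current_runtime > p95_baseline:
        return 0.9
    if current_runtime > p90_baseline:
        return 0.6
    return 0.1

# A run at 50 min against 20/30-min P90/P95 baselines is almost certainly stuck
prob = stuck_job_probability(50, 20, 30)
```

Because the P90/P95 baselines come from each job's own history, no global SLA has to be hard-coded.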

1.3.11 – JOB_RISK_SCORE

A composite health risk score representing the probability that a job will:
• Fail
• Miss SLA
• Become unstable
• Get stuck

Components & Weights

SignalWeight
Failure probability35%
SLA breach probability25%
Runtime volatility15%
Runtime trend magnitude15%
Stuck job probability10%

Formula (Conceptual)
JOB_RISK_SCORE = 0.35 × failure_probability + 0.25 × sla_breach_probability + 0.15 × volatility_index + 0.15 × abs(runtime_trend_slope) + 0.10 × stuck_probability

Usage

  • Single, explainable health number
  • Easy alert prioritization
  • Works across job types and frequencies
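The weighted formula above, sketched in Python. Capping volatility and trend magnitude at 1.0 keeps every term in the 0–1 range; the capping is an assumption consistent with the volatility section earlier:

```python
def job_risk_score(failure_p, sla_breach_p, volatility_index, trend_slope, stuck_p):
    """Weighted composite risk per the components-and-weights table above."""
    return (0.35 * failure_p
            + 0.25 * sla_breach_p
            + 0.15 * min(volatility_index, 1.0)
            + 0.15 * min(abs(trend_slope), 1.0)
            + 0.10 * stuck_p)

# Moderately risky inputs land well below a 0.6 alert threshold
score = job_risk_score(0.4, 0.3, 0.2, 0.05, 0.1)
```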

CONFIDENCE SIGNAL

1.3.12 – CONFIDENCE_SCORE_WEIGHTED

A numeric confidence score (0–1) used internally for:
• Threshold tuning
• Alert gating
• Observability & dashboards

How It Is Calculated
LEAST(total_runs / 10, 1) * 0.5
  + (1 – LEAST(stddev_runtime / avg_runtime, 1)) * 0.3
  + CASE WHEN trend_exists THEN 0.2 ELSE 0 END

1.3.13 – CONFIDENCE_LEVEL

Indicates how reliable the computed health signals are for a job.
This prevents false positives for:
• New jobs
• Infrequently run jobs
• Highly unstable runtime patterns

Confidence Components & Weights

| Component | Weight | Meaning |
|---|---|---|
| Execution history depth | 50% | More runs = higher confidence |
| Runtime stability | 30% | Lower variance = higher confidence |
| Trend availability | 20% | Trend present = higher confidence |

Score Range Mapping

| Score Range | CONFIDENCE_LEVEL |
|---|---|
| ≥ 0.75 | HIGH |
| 0.40 – 0.74 | MEDIUM |
| < 0.40 | LOW |

Why This Works

  • Ensures predictions are based on enough data
  • Automatically adapts as job history grows
  • Eliminates alert noise for new jobs
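The weighted score from 1.3.12 and the range mapping from 1.3.13, combined into one sketch:

```python
def confidence_score_weighted(total_runs, stddev_runtime, avg_runtime, trend_exists):
    """50% history depth + 30% runtime stability + 20% trend availability."""
    history = min(total_runs / 10, 1) * 0.5
    stability = 0.0
    if avg_runtime:
        stability = (1 - min(stddev_runtime / avg_runtime, 1)) * 0.3
    trend = 0.2 if trend_exists else 0.0
    return history + stability + trend

def confidence_level(score):
    """Map the weighted score to LOW / MEDIUM / HIGH per the range table."""
    if score >= 0.75:
        return "HIGH"
    if score >= 0.40:
        return "MEDIUM"
    return "LOW"

# 20 stable runs with a trend vs. 2 volatile runs without one
mature = confidence_level(confidence_score_weighted(20, 1, 10, True))
sparse = confidence_level(confidence_score_weighted(2, 5, 10, False))
```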

1.4 –  Alert Engine – Decision Logic & Design

The ESS Job Alert Engine evaluates each job’s execution health using a rule-based decision framework enriched with predictive analytics.

The engine combines:

  • Hard operational signals (failures, stuck jobs, overdue schedules)
  • Soft predictive signals (runtime regression, risk score, SLA breach probability)
  • Confidence gating to avoid false positives

The output is a single actionable signal:

  • ALERT_FLAG = 'Y' → Attention required
  • ALERT_FLAG = 'N' → Job considered healthy

Alert Engine Design Principles

  1. Hard signals always win
    • If a job is failing or stuck, alert immediately
  2. Predictive alerts require confidence
    • Noisy signals are suppressed when historical data is weak
  3. Single primary alert reason
    • Operators should see why an alert fired
  4. Severity reflects operational impact
    • Not all alerts deserve the same urgency

1.4.1 – ALERT_FLAG – How the Alert Is Triggered

ALERT_FLAG is set to 'Y' if ANY of the following conditions are met.

A. Hard Signals (Immediate Alert)

These signals indicate current operational impact and do not require confidence checks.

A.1 Recent Failures

FAILURES_LAST_24_HOURS > 0
  • Job has failed at least once in the last 24 hours

Alert Behavior Severity = CRITICAL

A.2 Stuck Job Detection

STUCK_JOB_PROBABILITY >= :P_STUCK_JOB_PROBABILITY

(Default threshold = 0.8)

Meaning

  • Running job has exceeded historical P95 runtime
  • Estimates the likelihood that a currently running job is abnormally over-running relative to its historical behavior, indicating a real-time execution risk that may require operational intervention
  • Answers, for a job that is currently running: how likely is it that it will never complete normally? Real-time, forward-looking, and state-aware
  • Strong indicator of hung threads, deadlocks, or data issues

Alert Behavior Severity = CRITICAL

A.3 Overdue Schedule

SCHEDULE_OVERDUE_FLAG = 'Y'

Meaning

  • Scheduled job did not complete before its defined end time
  • SLA breach or missed business cutoff

Alert Behavior Severity = HIGH

B. Predictive / Soft Signals (Confidence-Gated)

These signals predict future or latent risk and require sufficient historical confidence.

B.1 Confidence Gate

CONFIDENCE_LEVEL IN ('MEDIUM', 'HIGH')

Meaning

  • Prevents alerting on new or unstable jobs
  • Reduces false positives

B.2 Runtime Regression

RUNTIME_REGRESSION_STATUS = 'RUNTIME_REGRESSION'

Meaning

  • Last runtime exceeds previous runtime by a regression factor
  • Early sign of data growth or performance degradation

Alert Behavior Alert only if confidence is MEDIUM or HIGH

B.3 High Composite Risk Score

JOB_RISK_SCORE >= :P_JOB_RISK_SCORE

(Default threshold = 0.6)

Meaning

  • Combined risk across failures, SLA, volatility, trends, and stuck probability
  • Represents overall job health deterioration

B.4 SLA Breach Probability

SLA_BREACH_PROBABILITY >= :P_SLA_BREACH_PROBABILITY

(Default threshold = 0.4)

  • Significant likelihood that future runs will exceed SLA
  • Measures how often the job has historically violated its expected runtime SLA, a stable indicator of long-term performance risk
  • Answers: how likely is this job to run longer than what is considered normal for it?
  • Enables proactive alerting before SLA violations occur

ALERT_FLAG Logic

ALERT_FLAG = 'Y'
IF FAILURES_LAST_24_HOURS > 0
   OR STUCK_JOB_PROBABILITY >= threshold
   OR SCHEDULE_OVERDUE_FLAG = 'Y'
   OR ( CONFIDENCE_LEVEL IN ('MEDIUM', 'HIGH')
        AND ( RUNTIME_REGRESSION_STATUS = 'RUNTIME_REGRESSION'
              OR JOB_RISK_SCORE >= threshold
              OR SLA_BREACH_PROBABILITY >= threshold ) )
ELSE 'N'
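The decision logic can be sketched as a small rule engine. The dictionary keys are illustrative lowercase versions of the report columns; thresholds default to the section 1.2 parameter defaults:

```python
def alert_flag(job, stuck_thr=0.8, risk_thr=0.6, sla_thr=0.4):
    """ALERT_FLAG: hard signals fire unconditionally; predictive (soft)
    signals are gated on MEDIUM/HIGH confidence."""
    hard = (job["failures_last_24_hours"] > 0
            or job["stuck_job_probability"] >= stuck_thr
            or job["schedule_overdue_flag"] == "Y")
    soft = (job["confidence_level"] in ("MEDIUM", "HIGH")
            and (job["runtime_regression_status"] == "RUNTIME_REGRESSION"
                 or job["job_risk_score"] >= risk_thr
                 or job["sla_breach_probability"] >= sla_thr))
    return "Y" if hard or soft else "N"

healthy = {"failures_last_24_hours": 0, "stuck_job_probability": 0.1,
           "schedule_overdue_flag": "N", "confidence_level": "HIGH",
           "runtime_regression_status": "NORMAL", "job_risk_score": 0.2,
           "sla_breach_probability": 0.1}
# A high risk score alone is suppressed when confidence is LOW (the confidence gate)
risky_but_new = dict(healthy, job_risk_score=0.9, confidence_level="LOW")
```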

1.4.2 – ALERT_REASON

How the Primary Reason Is Selected – Only one reason is emitted to keep alerts actionable, and the most operationally urgent reason is always selected first.

Precedence Order (Top Wins)

  1. FAILURE
  2. STUCK
  3. SCHEDULE_OVERDUE
  4. RUNTIME_REGRESSION
  5. HIGH_RISK_SCORE
  6. SLA_RISK

CASE
  WHEN FAILURES_LAST_24_HOURS > 0 THEN 'FAILURE'
  WHEN STUCK_JOB_PROBABILITY >= threshold THEN 'STUCK'
  WHEN SCHEDULE_OVERDUE_FLAG = 'Y' THEN 'SCHEDULE_OVERDUE'
  WHEN RUNTIME_REGRESSION_STATUS = 'RUNTIME_REGRESSION' THEN 'RUNTIME_REGRESSION'
  WHEN JOB_RISK_SCORE >= threshold THEN 'HIGH_RISK_SCORE'
  WHEN SLA_BREACH_PROBABILITY >= threshold THEN 'SLA_RISK'
END

1.4.3 – ALERT_SEVERITY

How Severity Is Determined – Severity reflects business impact and urgency.

| Condition | Severity |
|---|---|
| Recent failures | CRITICAL |
| Stuck job | CRITICAL |
| Schedule overdue | HIGH |
| Risk score ≥ 0.75 | HIGH |
| Risk score ≥ 0.60 | MEDIUM |
| Predictive signals only | MEDIUM |
| No alert | LOW |

CASE
  WHEN FAILURES_LAST_24_HOURS > 0 THEN 'CRITICAL'
  WHEN STUCK_JOB_PROBABILITY >= threshold THEN 'CRITICAL'
  WHEN SCHEDULE_OVERDUE_FLAG = 'Y' THEN 'HIGH'
  WHEN JOB_RISK_SCORE >= 0.75 THEN 'HIGH'
  WHEN JOB_RISK_SCORE >= 0.6 THEN 'MEDIUM'
  ELSE 'LOW'
END
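Reason precedence (1.4.2) and severity mapping (1.4.3) can be folded into one sketch. Assigning MEDIUM to predictive-only reasons follows the "Predictive signals only" row of the severity table; the field names are illustrative:

```python
def alert_reason_and_severity(job, stuck_thr=0.8, risk_thr=0.6, sla_thr=0.4):
    """Single primary reason (precedence order) with risk-based severity."""
    if job["failures_last_24_hours"] > 0:
        return "FAILURE", "CRITICAL"
    if job["stuck_job_probability"] >= stuck_thr:
        return "STUCK", "CRITICAL"
    if job["schedule_overdue_flag"] == "Y":
        return "SCHEDULE_OVERDUE", "HIGH"
    if job["runtime_regression_status"] == "RUNTIME_REGRESSION":
        return "RUNTIME_REGRESSION", "MEDIUM"
    if job["job_risk_score"] >= 0.75:
        return "HIGH_RISK_SCORE", "HIGH"
    if job["job_risk_score"] >= risk_thr:
        return "HIGH_RISK_SCORE", "MEDIUM"
    if job["sla_breach_probability"] >= sla_thr:
        return "SLA_RISK", "MEDIUM"
    return None, "LOW"

stuck = {"failures_last_24_hours": 0, "stuck_job_probability": 0.95,
         "schedule_overdue_flag": "N", "runtime_regression_status": "NORMAL",
         "job_risk_score": 0.2, "sla_breach_probability": 0.1}
```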

 
1.5 –  Report Design

These signals intentionally combine short-term windowed behavior with long-term historical baselines, ensuring that alerts are both timely and statistically grounded.

1.5.1 –  Create BI Report


2 – OIC Layer

  • Expose BI report via REST facade
  • Normalize output
  • Enforce security


2.1 – Prerequisites

Fusion BIP Report

  • Data model uses SQL query
  • Report returns XML output
  • Report is tested in Fusion UI

OIC Connectivity

We used OAuth 2.0 for a stronger security posture; the same can be achieved via basic authentication.

Schemas Available

  • XSD for:
    • BIP SOAP response
    • Stage File read operation
  • JSON request/response
    • Request parameters
    • Response payload.

Step 2.2 : Create App Driven Integration

  1. Navigate to Integrations → Insight → Integrations
  2. Click Create → App Driven Orchestration
  3. Provide:
    • Integration Name
    • Identifier
    • Package
  4. Enable REST Adapter as trigger

This REST endpoint becomes the public API for your BIP report. Keep a note of this endpoint, as it will be used as the Log Endpoint URL when creating the Source in Log Analytics.

Step 2.3 : Configure REST Trigger (GetESSHealthInfo)

Define REST Contract

Step 2.4: Map REST Request to BIP SOAP Request

  1. Add Map action after trigger
  2. Source: REST request
  3. Target: BIP runReport SOAP request

Mapping includes:

  • Report Path
  • Report Name
  • Report Parameters
  • Output Format = xml
  • P_FROM_TIMESTAMP
  • P_TO_TIMESTAMP
  • P_REGRESSION_FACTOR
  • P_MIN_SUCCESS_RUNS
  • P_STUCK_JOB_PROBABILITY
  • P_JOB_RISK_SCORE
  • P_SLA_BREACH_PROBABILITY

Step 2.5: Invoke BIP SOAP Service

  1. Add Invoke action
  2. Use Fusion Applications Adapter defined in prerequisites
  3. Select BI Publisher Service
  4. Operation: runReport

SOAP response is returned with:

  • reportBytes (Base64 encoded XML)

Step 2.6: Write SOAP XML to Stage File (Opaque Handling)

  1. Add Map → Stage File (Write)
  2. Provide:
    • File name (dynamic or static)
    • Directory
  3. Map:
    • Base64 decoded reportBytes → file content
  4. Use the Opaque Schema XSD, required because the decoded report content is written as raw (opaque) bytes rather than structured data at this step

Step 2.7: Read Stage File using Structured XSD

  • Add Stage File → Read File
  • Associate Custom XSD that matches BIP XML structure
  • OIC interprets XML as structured data

Step 2.8: Map BIP XML to REST Response

  • Add Map action
    • Source:
  • Stage file read output (XSD-based)
    • Target:
  • REST response XSD

Step 2.9: Return REST Response

  • Final map sends transformed payload back
  • REST client receives:
    • Clean JSON

Step 2.10: Activate Integration & Test API


3 – OCI Logging Analytics


3.1 – Prerequisites

Process flow:

Management Agent
 ↓ Calls OIC REST API (POST)
 ↓ OAuth token fetched using Source Credential
 ↓ JSON parsed by Fusion_ESS_Job_Health_JSON_Parser
 ↓ Records stored in Log Group
 ↓ Entity = Fusion_ESS_Job_Health → Dashboards / Alerts


STEP 3.1 – Create Management Agent

We need to provision a compute VM instance to host the Management Agent. This agent will poll the REST endpoint for continuous log collection.

VM Creation

  • Name – fusion-mgmt-agent-01 (You can keep name as per your conventions)
  • Shape: VM.Standard.E4.Flex
  • OS: Oracle Linux 8
  • Network:
    • Private subnet preferred
    • Internet access or NAT required
  • Open outbound HTTPS (443)
  • Ensure the Management Agent is added

Follow the guidance below for VM configuration:

Compute

  • VM.Standard.E5.Flex
  • 1–2 OCPUs
  • 12–16 GB RAM
  • Oracle Linux 9
  • Add – Management Agent , Optionally OS Management Hub Agent

Select Management Agent under Oracle Cloud Agent; optionally add the OS Management Hub Agent to automate management and to monitor OS updates and patches on the instance.
If you choose the OS Management Hub Agent, select a Profile that matches the OS details; if none exists, you can create a new profile.

Network – Select a VCN that allows ingress access on port 22 for SSH access to the compute instance.

Download the key pair so you can use it to connect to the compute instance.

Finally, click Create to submit the provisioning request and wait for the instance to become active.

Keep a note of the Management Agent name and public IP (in our case, fusion-mgmt-agent-01); these will be used later when creating the Entity.

STEP 3.2 – Update Management Agent

Update the Management Agent to enable the REST API collection method.

Open a terminal and SSH to the compute instance:

ssh -i /path/to/your_private_key.pem opc@PUBLIC_IP_OF_COMPUTE

Validate the agent is running; the command below should return active.

systemctl is-active oracle-cloud-agent

Start an interactive Bash shell as the oracle-cloud-agent user:

sudo -u oracle-cloud-agent /bin/bash

Go to the agent configuration directory:

cd /var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/config

Edit Agent Configuration File to Enable REST API Collection.

vi emd.properties 

In vi, enter insert mode and append the following lines at the end of the file to allow REST API–based log collection:

loganalytics.rest_api.enable_oci_api=true 
loganalytics.rest_api.report_interval=600

Verify that the lines were added:

grep loganalytics.rest_api emd.properties

Restart the Management Agent so the change takes effect

sudo systemctl restart oracle-cloud-agent 

Validate that the agent is active and running:

sudo systemctl status oracle-cloud-agent
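The edit above can also be done non-interactively. The sketch below, in Python, appends each property only if it is missing, so re-running is safe; it operates on a scratch copy here, whereas on the agent the real file is emd.properties under the config directory shown above, edited as the oracle-cloud-agent user.

```python
# Idempotent sketch of the emd.properties edit, run against a scratch copy.
import pathlib
import tempfile

PROPS = [
    "loganalytics.rest_api.enable_oci_api=true",
    "loganalytics.rest_api.report_interval=600",
]

# Stand-in for /var/lib/oracle-cloud-agent/.../agent_inst/config/emd.properties
conf = pathlib.Path(tempfile.mkdtemp()) / "emd.properties"
conf.write_text("agent.existing.property=1\n")

lines = conf.read_text().splitlines()
for prop in PROPS:
    if prop not in lines:          # append only if not already present
        lines.append(prop)
conf.write_text("\n".join(lines) + "\n")

print(conf.read_text())
```

Remember to restart oracle-cloud-agent afterwards, exactly as in the step above.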

STEP 3.3 – Create Log Group

Provides logical isolation for ESS monitoring data.

OCI Console → Log Analytics → Log Groups → Create
Name: Fusion_ESS_Job_Health_Log_Group

STEP 3.4 – Create Log Parser

Parses the ESS Job Health REST API response from OIC.
Log Analytics → Administration → Parsers → Create Parser → JSON Parser

Field | Value
Name | Fusion_ESS_Job_Health_JSON_Parser_V5
Type | JSON
Example Log Content | Sample JSON REST response
Log Entry JSON Path | $.jobs
Log Entry Type | ARRAYOBJECT

Sample JSON REST response structure –

REST API response
 ├── meta             ignored by parser
 └── jobs[]           each element = 1 log entry
       ├── runtime
       ├── executionStats
       ├── alert
       └── schedule
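The effect of these parser settings can be sketched in Python: with the Log Entry JSON Path pointing at the jobs array, the meta block is skipped and each array element becomes one log entry. The payload below is a trimmed, illustrative sample, not a real response.

```python
# Illustrative: how "Log Entry JSON Path = $.jobs" splits one REST response
# into per-job log entries. Job names and values are made up.
import json

response = json.loads("""
{
  "meta": { "recordCount": 2 },
  "jobs": [
    { "jobName": "ImportPayrollJob", "runtime": { "lastRuntimeMinutes": 4.2 } },
    { "jobName": "GLPostingJob",     "runtime": { "lastRuntimeMinutes": 11.0 } }
  ]
}
""")

entries = response["jobs"]        # the Log Entry JSON Path selects this array
for entry in entries:             # each element = one log entry in Log Analytics
    print(entry["jobName"], entry["runtime"]["lastRuntimeMinutes"])
```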

Note – Map the attribute "requestEventTime" to the Log Analytics standard Time field; Log Analytics uses this column to collect data incrementally over time intervals.

Refer to the mapping below –

JSON Path | Log Analytics Field | Data Type | Creation Type
$.jobIdentifier | jobIdentifier | STRING | User-created
$.jobName | jobName | STRING | User-created
$.application | FusionApplication | STRING | User-created
$.product | FusionProduct | STRING | User-created
$.requestType | requestType | STRING | User-created
$.requestEventTime | Time | TIMESTAMP | Oracle-defined
$.requestStartTime | FusionStartTime | TIMESTAMP | User-created
$.requestCompleteTime | FusionCompletetime | TIMESTAMP | User-created
$.lastRunTimestamp | FusionlastRunTimestamp | TIMESTAMP | User-created
$.lastRunEndTime | FusionLastRunEndTime | TIMESTAMP | User-created
$.lastSuccessTimestamp | FusionlastSuccessTimestamp | TIMESTAMP | User-created
$.runtime.lastRuntimeMinutes | lastRuntimeMinutes | DOUBLE | User-created
$.runtime.previousRuntimeMinutes | previousRuntimeMinutes | DOUBLE | User-created
$.runtime.avgRuntimeBaselineMinutes | avgRuntimeBaselineMinutes | DOUBLE | User-created
$.runtime.p95RuntimeBaselineMinutes | p95RuntimeBaselineMinutes | DOUBLE | User-created
$.runtime.runtimeVolatilityIndex | runtimeVolatilityIndex | DOUBLE | User-created
$.runtime.runtimeTrendSlopePerRun | runtimeTrendSlopePerRun | DOUBLE | User-created
$.runtime.runtimeRegressionStatus | runtimeRegressionStatus | STRING | User-created
$.executionStats.totalRuns | totalRuns | LONG | User-created
$.executionStats.succeededRuns | succeededRuns | LONG | User-created
$.executionStats.failedRuns | failedRuns | LONG | User-created
$.executionStats.runningRuns | runningRuns | LONG | User-created
$.executionStats.blockedRuns | blockedRuns | LONG | User-created
$.executionStats.cancelledRuns | cancelledRuns | LONG | User-created
$.executionStats.warningRuns | warningRuns | LONG | User-created
$.executionStats.adhocCount | adhocCount | LONG | User-created
$.executionStats.scheduledCount | scheduledCount | LONG | User-created
$.executionStats.queuePressure | queuePressure | LONG | User-created
$.executionStats.avgRuntimeMinutes | avgRuntimeMinutes | DOUBLE | User-created
$.executionStats.stddevRuntimeMinutes | stddevRuntimeMinutes | DOUBLE | User-created
$.failureMetrics.failuresLast24Hours | failuresLast24Hours | LONG | User-created
$.failureMetrics.failuresLast7Days | failuresLast7Days | LONG | User-created
$.failureMetrics.predictedFailureProbability | predictedFailureProbability | DOUBLE | User-created
$.slaMetrics.slaBreachProbability | slaBreachProbability | DOUBLE | User-created
$.runningState.currentlyRunningCount | currentlyRunningCount | LONG | User-created
$.runningState.maxRunningDurationHours | maxRunningDurationHours | DOUBLE | User-created
$.runningState.stuckJobProbability | stuckJobProbability | DOUBLE | User-created
$.runningState.longestRunningRequestId | longestRunningRequestId | STRING | User-created
$.confidence.confidenceLevel | confidenceLevel | STRING | User-created
$.confidence.confidenceScoreWeighted | confidenceScoreWeighted | DOUBLE | User-created
$.risk.jobRiskScore | jobRiskScore | DOUBLE | User-created
$.schedule.scheduleStartTime | scheduleStartTime | TIMESTAMP | User-created
$.schedule.scheduleEndTime | scheduleEndTime | TIMESTAMP | User-created
$.schedule.scheduleFrequency | scheduleFrequency | STRING | User-created
$.schedule.scheduleInterval | scheduleInterval | LONG | User-created
$.schedule.scheduleEndedFlag | scheduleEndedFlag | STRING | User-created
$.schedule.scheduleOverdueFlag | scheduleOverdueFlag | STRING | User-created
$.schedule.highFrequencyScheduleFlag | highFrequencyScheduleFlag | STRING | User-created
$.requests.lastRequestId | lastRequestId | LONG | User-created
$.requests.lastSuccessRequestId | lastSuccessRequestId | LONG | User-created
$.requests.lastFailedRequestId | lastFailedRequestId | LONG | User-created
$.requests.RequestId | FusionRequestID | LONG | User-created
$.requests.parentRequestId | FusionParentRequestID | LONG | User-created
$.requests.processGroup | FusionProcessGroup | STRING | User-created
$.requests.submitter | FusionSubmitter | STRING | User-created
$.alert.alertFlag | alertFlag | STRING | User-created
$.alert.alertSeverity | JobAlertSeverity | STRING | User-created
$.alert.alertReason | alertReason | STRING | User-created
$.additionalInfo.attributeName1 | attributeName1 | STRING | User-created
$.additionalInfo.attributeVal1 | attributeVal1 | STRING | User-created
$.additionalInfo.attributeName2 | attributeName2 | STRING | User-created
$.additionalInfo.attributeVal2 | attributeVal2 | STRING | User-created
$.additionalInfo.attributeName3 | attributeName3 | STRING | User-created
$.additionalInfo.attributeVal3 | attributeVal3 | STRING | User-created
$.additionalInfo.attributeName4 | attributeName4 | STRING | User-created
$.additionalInfo.attributeVal4 | attributeVal4 | STRING | User-created
$.additionalInfo.attributeName5 | attributeName5 | STRING | User-created
$.additionalInfo.attributeVal5 | attributeVal5 | STRING | User-created

STEP 3.5 – Create Log Source

We will create a Logging Analytics REST API source.

For how to set credentials and time parameters as bind variables, refer to
https://docs.oracle.com/en-us/iaas/log-analytics/doc/rest-api-log-collection.html

Log Analytics → Administration → Sources → Create Source

Field | Value
Name | Fusion_ESS_Job_Health_Source_V30
Source Type | REST API
Entity Types | OIC Integration
Parser | Fusion_ESS_Job_Health_JSON_Parser_V5

Log Endpoint

Log URL – the OIC API endpoint; refer to Step 2.2: Create App Driven Integration.
For example – https://<OICURL>/ic/api/integration/v1/flows/rest/GETESSHEALTHINFO/1.0/ess-jobs/health

Credentials

Field | Value
Log Credentials Type | Dynamic Token (OAuth 2.0)
Token Endpoint Name | FA-CREDS
Token Endpoint URL | https://<idcsURL>/oauth2/v1/token
Grant Type | client_credentials
Scope | https://<OICURL>:urn:opc:resource:consumer::all

Method – POST

Request Payload –

In P_FROM_TIMESTAMP and P_TO_TIMESTAMP we pass the Log Analytics standard macros START_TIME and CURR_TIME with millisecond precision in the date format, since many ESS jobs complete within the same second but differ at the millisecond level.

Millisecond precision matters for accuracy and for avoiding overlaps – we do not want the same job data collected again on subsequent runs. Please review how START_TIME is computed; it is key to understanding how incremental data is collected on each run over a moving time window.

The default value of START_TIME is 30 days earlier than the current timestamp.

For the remaining parameters, refer to Section 1.2 – Query Parameters for how they are set.

{
  "P_FROM_TIMESTAMP": "{START_TIME:yyyy-MM-dd'T'HH:mm:ss.SSS}",
  "P_TO_TIMESTAMP": "{CURR_TIME:yyyy-MM-dd'T'HH:mm:ss.SSS}",
  "P_REGRESSION_FACTOR": 1.5,
  "P_MIN_SUCCESS_RUNS": 3,
  "P_STUCK_JOB_PROBABILITY": 0.8,
  "P_JOB_RISK_SCORE": 0.6,
  "P_SLA_BREACH_PROBABILITY": 0.4
}
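The moving-window behavior described above can be sketched in Python. Each collection cycle covers the half-open window from START_TIME up to CURR_TIME; the next cycle starts exactly where the previous one ended, so a record timestamped to the millisecond is collected exactly once. The start time and interval below are illustrative.

```python
# Sketch of the incremental collection window with millisecond precision.
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%f"   # strftime gives microseconds; we trim to ms

def windows(first_start: datetime, interval: timedelta, cycles: int):
    """Yield (P_FROM_TIMESTAMP, P_TO_TIMESTAMP) values for successive runs."""
    start = first_start
    for _ in range(cycles):
        end = start + interval
        yield (start.strftime(FMT)[:-3], end.strftime(FMT)[:-3])
        start = end   # next window begins where this one ended -> no overlap

runs = list(windows(datetime(2025, 1, 1, 8, 0, 0), timedelta(minutes=10), 3))
for frm, to in runs:
    print(frm, "->", to)
```

Because each window's end becomes the next window's start, no timestamp is ever covered by two windows, which is why the millisecond precision in the payload prevents duplicate collection.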

Agent Collection Properties

Set Duplicate to True – this removes any duplicates collected over time.
Set the scheduler to 5 or 10 minutes per your requirements; avoid large intervals so you keep good near-real-time monitoring and low per-run data volume.

STEP 3.6 – Create Credential File for REST Access

In this additional step, the OAuth 2.0 confidential app client ID and secret are saved as credentials inside the Management Agent, since we do not want them exposed in the UI.

This OAuth 2.0 client is a confidential app configured to access OIC endpoints securely.
Refer to https://docs.oracle.com/en/cloud/paas/application-integration/integrations-user/protect-integrations-rest-endpoint-oauth.html

Configure the OAuth 2.0 credential file on the agent. Open a terminal and SSH to the compute instance:

ssh -i /path/to/your_private_key.pem opc@PUBLIC_IP_OF_COMPUTE

Create FA-CREDS.json

vi /tmp/FA-CREDS.json

Add the information below, replacing the values of HTTPSUserName and HTTPSPassword with your OAuth 2.0 client ID and secret.

[
 {
      "source": "lacollector.la_rest_api",
      "name": "FA-CREDS",
      "type": "HTTPSBasicAuthCreds",
      "description": "These are HTTPS (BasicAuth) credentials.",
      "properties": 
      [
          {
              "name": "HTTPSUserName",
              "value": "oauth2.0clientid"
          },
          {
              "name": "HTTPSPassword",
              "value": "oauth2.0clientsecret""
          }
      ]
  }
]
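A stray quote or comma in the credential file makes the upsert fail, so it is worth validating the JSON locally before uploading. The sketch below runs against a demo copy; on the agent, point the same check at /tmp/FA-CREDS.json.

```python
# Validate the credential JSON before uploading it to the Management Agent.
# json.loads raises an exception on any syntax error (e.g. a stray quote).
import json
import pathlib
import tempfile

cred_file = pathlib.Path(tempfile.mkdtemp()) / "FA-CREDS.json"  # demo copy
cred_file.write_text("""
[
  {
    "source": "lacollector.la_rest_api",
    "name": "FA-CREDS",
    "type": "HTTPSBasicAuthCreds",
    "properties": [
      { "name": "HTTPSUserName", "value": "oauth2.0clientid" },
      { "name": "HTTPSPassword", "value": "oauth2.0clientsecret" }
    ]
  }
]
""")

creds = json.loads(cred_file.read_text())   # raises ValueError if malformed
print("JSON OK:", creds[0]["name"])
```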

Upload Credentials –

cat /tmp/FA-CREDS.json | sudo -u oracle-cloud-agent /var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/bin/credential_mgmt.sh -o upsertCredentials -s logan

It should return the message – Credentials Uploaded.

STEP 3.7 – Create Entity

The Entity sits on the Management Agent and is associated with the Source to pull data.

Property Name | Property Value
Name | Fusion_ESS_Job_Health
Entity Type | OIC Integration
Management Agent | fusion-mgmt-agent-01
Source Type | REST API

Then add new properties:

Producer | ESS
SourceType | ESS_JOB_HEALTH
application | FusionInstanceName

STEP 3.8 – Associate Source to Entity

Go to the Entity and click Add Association to associate the Source with the Entity.

STEP 3.9 – Verify if Entity has started polling REST API

Go to Log Explorer and run the search query below; data should start flowing in shortly, usually within the polling interval (i.e., 5 minutes in our case).

'Log Source' = Fusion_ESS_Job_Health_Source_V30 | timestats count as logrecords by 'Log Source' | sort -logrecords


If data does not flow in, troubleshoot whether the credentials are working and check for other failures; look for Management Agent log-collection warnings on the source or entity page.
https://docs.oracle.com/en-us/iaas/log-analytics/doc/view-agent-collection-warnings.html


4 – Create Widgets & OCI Dashboards


Once data starts flowing into Log Analytics, we can start building widgets.

Below are the widgets used to build the OCI Dashboard – ESS Job Health & Risk Observatory, which tracks health, failures, runtime trends, and risks, and provides unified operational, statistical, and predictive visibility into Fusion ESS jobs by combining windowed execution signals with long-term behavioral risk indicators.

Once these widgets are created, they can be saved to an OCI dashboard.
OCI Logging Analytics dashboards support automatic refresh at configurable intervals. You can set the refresh interval (e.g., every 1, 5, or 15 minutes) depending on your log ingestion frequency.

Build widgets as saved queries – Visualize Data Using Charts and Controls
https://docs.oracle.com/en-us/iaas/log-analytics/doc/visualize-data-using-charts-controls.html

The widgets below were developed to build the ESS Job Health & Risk Observatory; you can use the log queries to build the same widgets and convert them into an OCI dashboard.

Widget 1 – ESS_TotalJobRuns (Tile)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats sum(totalRuns) as TotalJobRuns
Description: Total number of job runs in the given time range.

Widget 2 – ESS_Accuracy_Rate (Tile)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats sum(succeededRuns) as success, sum(warningRuns) as warning, sum(totalRuns) as total | eval ESS_Accuracy_Rate = round((success * 1 + warning * 0.7) / total * 100, 2) | stats max(ESS_Accuracy_Rate) as ESS_Accuracy_Rate
Description: How reliably and predictably are my ESS jobs executing?

Widget 3 – ESS_FailuresRate (Tile)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats sum(failedRuns) as failed, sum(totalRuns) as total | eval ESS_Failure_Rate = round((failed * 1) / total * 100, 2) | stats max(ESS_Failure_Rate) as ESS_Failure_Rate
Description: Percentage of jobs failing.

Widget 4 – ESS_Execution_Outcome_Summary (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats sum(succeededRuns) as success, sum(failedRuns) as failed, sum(cancelledRuns) as cancelled, sum(warningRuns) as warning, sum(runningRuns) as running, sum(blockedRuns) as blocked, sum(queuePressure) as queue
Description: Aggregates execution results across all jobs. Why it matters: reliability posture; CIS: failure monitoring.

Widget 5 – ESS_LatestRuntimeSnapShotTrend (Line)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | timestats avg(lastRuntimeMinutes) as avgRuntime
Description: Shows average runtime evolution over time. Why it matters: capacity and performance planning; early degradation visibility.

Widget 6 – ESS_FailureSignals (Line)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where failedRuns > 0 | timestats sum(failedRuns) as failureEvents
Description: How many job failures occurred per time bucket?

Widget 7 – ESS_Job_Alerts (Pie)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where alertReason != '' | stats count as alertCount by alertReason
Description: Shows why alerts are firing; helps justify the predictive model to stakeholders.

Widget 8 – ESS_TopJobs (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats sum(totalRuns) as runs by jobName | sort -runs | head 10
Description: Which jobs run the most (volume)? Ranks ESS jobs by total execution volume within the selected time range.

Widget 9 – ESS_TopSuccessfullJobs (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats sum(succeededRuns) as succeededRuns, sum(totalRuns) as totalRuns by jobName | eval successPct = (succeededRuns * 100) / totalRuns | where successPct >= 99 | sort -successPct | head 10
Description: Highlights ESS jobs with consistently high success rates.

Widget 10 – ESS_TopFailingJobs (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats sum(failedRuns) as failedRuns, sum(totalRuns) as totalRuns by jobName | eval errorRate = (failedRuns * 100) / totalRuns | where totalRuns > 0 | where errorRate > 1 | sort -errorRate | head 20
Description: Jobs that are failing, calculated as failed runs / total runs.

Widget 11 – ESS_Top10RiskyJobs (Horizontal Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats max(jobRiskScore) as jobRiskScore, max(slaBreachProbability) as slaRisk, max(predictedFailureProbability) as predictedFailureRisk, max(runtimeVolatilityIndex) as volatility by jobName, jobIdentifier | sort -jobRiskScore | top 10 jobRiskScore
Description: Shows the top 10 risky jobs, ranked by the composite jobRiskScore derived from: failure probability, SLA risk, runtime volatility, trend slope, and stuck probability.

Widget 12 – ESS_Longest_Running_Jobs (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats max(lastRuntimeMinutes) as runtime by jobName | sort -runtime | top 10 runtime
Description: Identifies jobs with the highest runtime. Why it matters: bottleneck analysis; capacity planning.

Widget 13 – ESS_High_Frequency_Scheduled_Jobs (Horizontal Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where highFrequencyScheduleFlag = Y | stats sum(totalRuns) as totalRuns by jobName | sort -totalRuns
Description: Shows jobs that are scheduled very frequently.

Widget 14 – ESS_High_Volume_Runs (Sunburst)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where totalRuns > 10000 | stats sum(totalRuns) as totalRuns by jobName | sort -totalRuns
Description: Shows jobs with 20k+ runs, MEDIUM confidence, and low visible failures. These are latent risks: a small defect × massive volume = systemic outage.

Widget 15 – ESS_Runtime_Volatility (Horizontal Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats avg(runtimeVolatilityIndex) as volatility by jobName | top 10 volatility
Description: Identifies jobs with unstable runtimes using runtimeVolatilityIndex = stddev / mean. Why it matters: predicts failure likelihood; CIS: anomaly detection.

Widget 16 – ESS_Jobs_LikelytoFail (Horizontal Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where predictedFailureProbability >= 0.1 | stats max(predictedFailureProbability) as predictedFailureProbability by jobName | sort -predictedFailureProbability | top 10 predictedFailureProbability
Description: Top ten jobs likely to fail, with predictedFailureProbability >= 0.1.

Widget 17 – ESS_SLA_Risk_Jobs (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where slaBreachProbability >= 0.2 | stats max(slaBreachProbability) as slaBreachProbability, max(avgRuntimeMinutes) as avgRuntimeMinutes, max(p95RuntimeBaselineMinutes) as p95Baseline by jobName, jobIdentifier | sort -slaBreachProbability
Description: Average SLA risk per job.

Widget 18 – ESS_Silent_Risk_Jobs (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where failuresLast24Hours = 0 | where jobRiskScore >= 0.6 or slaBreachProbability >= 0.4 | stats max(jobRiskScore) as jobRiskScore, max(slaBreachProbability) as slaBreachProbability by jobName | sort -jobRiskScore, slaBreachProbability
Description: Jobs showing high predicted risk or SLA breach probability despite zero recent failures.

Widget 19 – ESS_Runtime_vs_P95 (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats avg(lastRuntimeMinutes) as actual, avg(p95RuntimeBaselineMinutes) as baseline by jobName | eval deviation = actual - baseline | sort -deviation
Description: Compares actual runtime vs the historical P95 baseline. Why it matters: SLA drift detection; CIS: performance thresholds.

Widget 20 – ESS_Jobs_By_Application (Horizontal Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats count by FusionApplication | sort -Count
Description: Aggregates jobs by Fusion application/module.

Widget 21 – ESS_Jobs_By_Submitter (Horizontal Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats count as runs by FusionSubmitter | sort -runs
Description: Identifies noisy service accounts, rogue integrations, and overactive modules. CIS relevance: accountability and traceability of automated jobs.

Widget 22 – ESS_Jobs_By_ProcessGroups (Pie)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats count by FusionProcessGroup
Description: Shows ESS cluster pressure; helps infra teams correlate CPU spikes, JVM GC, and SOA congestion.

Widget 23 – ESS_Jobs_ScheduledvsAdhoc (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | stats sum(scheduledCount) as SCHEDULED, sum(adhocCount) as ADHOC
Description: Why it matters: detects uncontrolled ad-hoc usage; supports governance and change management.

Widget 24 – ESS_Risk_Distribution (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | eval riskBucket = if(jobRiskScore >= 0.75, High, if(jobRiskScore >= 0.6, Medium, if(jobRiskScore >= 0.4, Low, 'Very Low'))) | where failuresLast24Hours > 0 or failuresLast7Days > 0 | stats count as jobCount, avg(slaBreachProbability) as avgSlaRisk, avg(runtimeVolatilityIndex) as avgVolatility by riskBucket
Description: Shows where ESS job risk is concentrated, and whether that risk is driven by instability or SLA exposure, allowing remediation to focus where it matters most.

Widget 25 – ESS_Runtime_Regression_Drivers (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | eval regressionDriver = if(succeededRuns < 3, 'Insufficient History', runtimeVolatilityIndex > 1, 'High Runtime Variance', runtimeTrendSlopePerRun > 0, 'Upward Runtime Trend', lastRuntimeMinutes > previousRuntimeMinutes, 'Recent Runtime Spike', Stable) | stats count as jobCount by runtimeRegressionStatus, regressionDriver | sort runtimeRegressionStatus, -jobCount
Description: Explains why jobs are NORMAL vs REGRESSED vs INSUFFICIENT.

Widget 26 – ESS_Runtime_Regression_Distribution (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | fields jobName, runtimeRegressionStatus, succeededRuns, lastRuntimeMinutes, previousRuntimeMinutes, runtimeVolatilityIndex, runtimeTrendSlopePerRun, jobRiskScore | stats count as jobCount, avg(succeededRuns) as avgSucceededRuns, avg(lastRuntimeMinutes) as avgLastRuntime, avg(previousRuntimeMinutes) as avgPreviousRuntime, avg(runtimeVolatilityIndex) as avgVolatility, avg(runtimeTrendSlopePerRun) as avgTrendSlope, avg(jobRiskScore) as avgRiskScore by runtimeRegressionStatus | sort -jobCount
Description: Classifies jobs as NORMAL / RUNTIME_REGRESSION based on lastRuntime > previousRuntime × regressionFactor. Why it matters: detects performance decay; CIS: baseline deviation monitoring.

Widget 27 – ESS_Confidence_Distribution (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | fields jobName, confidenceLevel, succeededRuns, runtimeVolatilityIndex, runtimeTrendSlopePerRun, jobRiskScore | stats count as jobCount, avg(succeededRuns) as avgRuns, avg(runtimeVolatilityIndex) as avgVolatility, avg(jobRiskScore) as avgRisk by confidenceLevel | eval confidenceOrder = if(confidenceLevel = HIGH, 1, if(confidenceLevel = MEDIUM, 2, 3)) | sort confidenceOrder | fields -confidenceOrder
Description: Distribution of model confidence.

Widget 28 – ESS_Confidence_Gap_Jobs (Bar)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where confidenceLevel = "LOW" | fields jobName, succeededRuns, runtimeVolatilityIndex, runtimeTrendSlopePerRun
Description: Identifies jobs where alerting and predictions are unreliable due to insufficient run history or unstable metrics.

Widget 29 – ESS_Job_Activity_Heatmap (Heatmap)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | timestats sum(totalRuns) as executions by jobName
Description: Which jobs are active, and when? Shows when each ESS job had activity signals over time. Each cell indicates the presence of job execution activity in that time bucket, not the exact run count.

Widget 30 – ESS_Active_Alerts (Table)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where alertFlag = Y | fields alertFlag, JobAlertSeverity, alertReason, jobName, jobIdentifier, FusionApplication, FusionProduct, requestType, confidenceLevel, jobRiskScore, predictedFailureProbability, slaBreachProbability, stuckJobProbability, runtimeRegressionStatus, failuresLast24Hours, failuresLast7Days, lastRunTimestamp, FusionLastRunEndTime, FusionlastRunTimestamp, runningRuns, maxRunningDurationHours, FusionRequestID, lastRequestId, lastFailedRequestId, lastSuccessRequestId, longestRunningRequestId, scheduleOverdueFlag, highFrequencyScheduleFlag, -Entity, -'Entity Type', -'Host Name (Server)', -'Problem Priority', -Label, -'Log Source'
Description: Lists all jobs currently in alert state. Why it matters: SOC / Ops action list; CIS: incident detection.

Widget 31 – ESS_Job_Detail_Table (Table)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | fields jobName, FusionApplication, FusionProduct, requestType, FusionSubmitter, alertFlag, alertReason, JobAlertSeverity, lastRequestId, lastSuccessRequestId, lastFailedRequestId, failuresLast24Hours, failuresLast7Days, lastRuntimeMinutes, FusionlastRunTimestamp, FusionlastSuccessTimestamp, confidenceLevel, jobRiskScore, slaBreachProbability, stuckJobProbability, runtimeRegressionStatus, runtimeVolatilityIndex, runtimeTrendSlopePerRun | fields -Entity, -'Entity Type', -'Host Name (Server)', -'Problem Priority', -Label, -'Log Source'
Description: Single pane of glass for investigation. Why it matters: root cause analysis; CIS: audit and traceability.

Widget 32 – ESS_Active_Alerts_DetectionRule (TreeMap)
Query: 'Log Source' = Fusion_ESS_Job_Health_Source_V30 | where alertFlag = Y | stats count(alertFlag) as alertCount by jobName, JobAlertSeverity
Description: Used for detection rule – FusionESSActiveAlertsDetectionRule.
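The weighted accuracy formula (widget 2) and the failure-rate formula (widget 3) can be checked in plain Python. The run counts below are made-up sample values; the 0.7 weighting of warning runs matches the widget query.

```python
# Reproduce the widget 2 and widget 3 arithmetic outside Log Analytics.
def accuracy_rate(succeeded: int, warning: int, total: int) -> float:
    """Weighted accuracy: full credit for successes, 0.7 for warnings."""
    return round((succeeded * 1 + warning * 0.7) / total * 100, 2)

def failure_rate(failed: int, total: int) -> float:
    """Plain failure percentage."""
    return round(failed / total * 100, 2)

print(accuracy_rate(succeeded=940, warning=40, total=1000))  # → 96.8
print(failure_rate(failed=20, total=1000))                   # → 2.0
```

Note that a job mix with many warning runs can show a high raw success count but a noticeably lower weighted accuracy, which is exactly what the 0.7 factor is meant to surface.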


5 – Create Detection Rules


Use the saved search ESS_Active_Alerts_DetectionRule to build a detection rule, which creates metrics in the OCI Monitoring service that are eventually used by an OCI alarm.

Property Name | Property Value
Name | FusionESSActiveAlertsDetectionRule
Rule Type | Scheduled search
Target Service | Monitoring
Metric Namespace | fusionessjobalerts
Resource Group | fusionessjobalerts
Metric Name | fusionessjobalerts

Similarly, you can create a detection rule for any relevant event you want to be notified about, using the widgets (saved searches) above or your own, for example:

  • slaBreachProbability >= 0.4
  • failuresLast24Hours = 0 AND (jobRiskScore >= 0.6 OR slaBreachProbability >= 0.4)
  • To monitor failures of a particular critical ESS job (jobName = '<JobName>')


6 – Create OCI Alarms


Build alarms using the alert flag, which captures various health conditions such as:
– Runtime regression
– Failure spikes
– Missed critical jobs

Property Name | Property Value
Alarm Name | ESSActiveAlerts
Metric Namespace | fusionessjobalerts
Resource Group | fusionessjobalerts
Metric Name | fusionessjobalerts
Interval | 1 Minute
Statistic | Max
Trigger Rule | Operator: greater than or equal to; Value: 1
Trigger Delay | 1 Minute
Alarm Severity | Critical
Alarm Body | ESS Job Health Alert. One or more ESS jobs have entered an ALERT state. Severity: CRITICAL. Metric: ESSActiveAlerts (threshold >= 1). Affected job details are included in this notification under Metric Dimensions (jobName, JobAlertSeverity). Action: open the ESS Active Alerts report for full diagnostics, including request IDs, root cause, risk scores, and SLA impact. Details report: ESS_Active_Alerts <URL of widget – ESS_Active_Alerts>
Destination Service | Notifications
Topic | <Topic created for notification>
Notification Subject | ESS Job Health Alert!
Message Grouping | Split notifications per metric
Message Format | Send formatted message

Alarm Notifications –


7 – Analyze data using LoganAI


AI analysis using LoganAI can be performed on the collected ESS job data from Log Explorer.

Traditional queries tell us what happened. LoganAI lets us ask why it happened, what will happen next, and where to focus, directly on Fusion ESS operational data.

Search Query –

'Log Source' = Fusion_ESS_Job_Health_Source_V30 | timestats count as logrecords by 'Log Source' | sort -logrecords

Sample Questions –

  • Summarize the overall health of ESS job executions
  • What are the top 3 operational risks identified from ESS job data?
  • Which is the longest-running job?
  • Which jobs have high runtime variability (runtime volatility)?
  • What jobs exceed their expected SLA runtime thresholds?
  • Which Fusion application generates the most ESS job executions (HCM, ERP, SCM)?
  • Are there early warning indicators before job failures occur?
  • Which jobs should be proactively monitored more closely?


Conclusion


The ESS Job Health & Risk Observatory represents an industry-grade monitoring solution. By combining BI-driven analytics, OCI Log Analytics observability, and AI features (LoganAI), it shifts ESS monitoring from reactive reporting to proactive risk management.

This design aligns with SRE best practices and scales with enterprise complexity.

This solution turns ESS from a black box into a predictable, measurable, and governable system: it converts raw ESS execution data into actionable operational intelligence.

This is not just monitoring – it is predictive analytics:

  • Failure Probability – Predicts likelihood of next-run failure with low noise
  • Runtime Regression Detection – Identifies jobs slowing down vs historical P95 baselines
  • Trend Slope Analysis – Detects long-term degradation before outages occur
  • Composite Job Risk Scoring – Single score combining failures, SLA risk, volatility, and trend
  • Explainable Alerts – Every alert includes why it fired (failure, SLA risk, regression, stuck)

We are no longer asking why did this fail?
We are asking what will fail next, and why?


References

  1. Oracle MOS: Sample SQL Queries for Monitoring ESS (Doc ID 2820161.1)
  2. Oracle ESS Scheduler REST API Documentation
  3. Fusion Apps Observability by Collecting Enterprise Scheduler ESS Logs Using OCI Logging Analytics REST API Ingestion
  4. Creating Innovative User Experience for Fusion Cloud Applications Scheduled Processes


Reference Code Artifacts

BI Data Model Query

SELECT
    /* =========================
       Job FIELDS FOR API
    ========================= */ 
    Q.JOB_IDENTIFIER,
    Q.JOB_NAME,
    Q.APPLICATION,
    Q.PRODUCT,
    Q.REQUEST_TYPE,
    Q.LAST_RUN_TIMESTAMP,
    Q.LAST_RUN_END_TIME,
    Q.REQUEST_START_TIME,
    Q.REQUEST_EVENT_TIME,
    Q.REQUEST_COMPLETE_TIME,
    Q.REQUESTID,
    Q.PARENTREQUESTID,
    Q.PROCESSGROUP,
    Q.SUBMITTER,
    Q.SCHEDULED,
    --
    Q.LAST_SUCCESS_TIMESTAMP,
    Q.LAST_RUNTIME_MINUTES,
    Q.PREVIOUS_RUNTIME_MINUTES,
    Q.RUNTIME_REGRESSION_STATUS,
    --
    Q.AVG_RUNTIME_BASELINE_MIN,
    Q.P95_RUNTIME_BASELINE_MIN,
    --
    Q.FAILURES_LAST_24_HOURS,
    Q.FAILURES_LAST_7_DAYS,
    --
    Q.CURRENTLY_RUNNING_COUNT,
    Q.MAX_RUNNING_DURATION_HOURS,
    --
    Q.TOTAL_RUNS,
    Q.FAILED_RUNS,
    Q.SUCCEEDED_RUNS,
    Q.RUNNING_RUNS,
    Q.BLOCKED_RUNS,
    Q.CANCELLED_RUNS,
    Q.WARNING_RUNS,
    Q.queue_pressure,
   --
    Q.AVG_RUNTIME_MINUTES,
    Q.STDDEV_RUNTIME_MINUTES,
   --
    Q.PREDICTED_FAILURE_PROBABILITY,
    Q.SLA_BREACH_PROBABILITY,
    Q.RUNTIME_VOLATILITY_INDEX,
    Q.RUNTIME_TREND_SLOPE_PER_RUN,
    Q.STUCK_JOB_PROBABILITY,
  --
    Q.CONFIDENCE_LEVEL,
    Q.CONFIDENCE_SCORE_WEIGHTED,
    Q.JOB_RISK_SCORE,
   --
    Q.SCHEDULE_START_TIME,
    Q.SCHEDULE_END_TIME,
    Q.SCHEDULE_FREQUENCY,
    Q.SCHEDULE_INTERVAL,
    Q.SCHEDULE_ENDED_FLAG,
    Q.SCHEDULE_OVERDUE_FLAG,
    Q.HIGH_FREQUENCY_SCHEDULE_FLAG,
   --
    Q.LAST_REQUEST_ID,
    Q.LAST_SUCCESS_REQUEST_ID,
    Q.LAST_FAILED_REQUEST_ID,
    Q.LONGEST_RUNNING_REQUEST_ID,
   --
    Q.ADHOC_COUNT     ,
    Q.SCHEDULE_COUNT  ,   
    /* =========================
       META FIELDS FOR API
       ========================= */
    TO_CHAR(
      TO_TIMESTAMP_TZ(:P_FROM_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3'),
      'YYYY-MM-DD"T"HH24:MI:SS"Z"'
    ) AS META_FROM_TIMESTAMP,

    TO_CHAR(
      TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3'),
      'YYYY-MM-DD"T"HH24:MI:SS"Z"'
    ) AS META_TO_TIMESTAMP,

    TO_CHAR(
      SYSTIMESTAMP AT TIME ZONE 'UTC',
      'YYYY-MM-DD"T"HH24:MI:SS"Z"'
    ) AS META_GENERATED_AT,
    COUNT(1) OVER () AS META_RECORD_COUNT,   
    /* Alert decision */
    CASE
        /* HARD SIGNALS */
        WHEN Q.FAILURES_LAST_24_HOURS > 0 THEN 'Y'
        WHEN Q.STUCK_JOB_PROBABILITY >= NVL(:P_STUCK_JOB_PROBABILITY,0.8) THEN 'Y'
        WHEN Q.SCHEDULE_OVERDUE_FLAG = 'Y' THEN 'Y'
        /* SOFT / PREDICTIVE SIGNALS */
        WHEN Q.CONFIDENCE_LEVEL IN ('MEDIUM', 'HIGH')
            AND Q.RUNTIME_REGRESSION_STATUS = 'RUNTIME_REGRESSION'
            THEN 'Y'
        WHEN Q.CONFIDENCE_LEVEL IN ('MEDIUM', 'HIGH')
            AND Q.JOB_RISK_SCORE >= NVL(:P_JOB_RISK_SCORE,0.6)
            THEN 'Y'
        WHEN Q.CONFIDENCE_LEVEL IN ('MEDIUM', 'HIGH')
            AND Q.SLA_BREACH_PROBABILITY >= NVL(:P_SLA_BREACH_PROBABILITY,0.4)
            THEN 'Y'
        ELSE 'N'
    END AS ALERT_FLAG,
    -- Alert Reason
    CASE
        WHEN
            (
                Q.FAILURES_LAST_24_HOURS > 0
                OR Q.STUCK_JOB_PROBABILITY >= NVL(:P_STUCK_JOB_PROBABILITY,0.8)
                OR Q.SCHEDULE_OVERDUE_FLAG = 'Y'
                OR (
                    Q.CONFIDENCE_LEVEL IN ('MEDIUM','HIGH')
                    AND (
                        Q.RUNTIME_REGRESSION_STATUS = 'RUNTIME_REGRESSION'
                        OR Q.JOB_RISK_SCORE >= NVL(:P_JOB_RISK_SCORE,0.6)
                        OR Q.SLA_BREACH_PROBABILITY >= NVL(:P_SLA_BREACH_PROBABILITY,0.4)
                    )
                )
            )
        THEN
            CASE
                WHEN Q.FAILURES_LAST_24_HOURS > 0 THEN 'FAILURE'
                WHEN Q.STUCK_JOB_PROBABILITY >= NVL(:P_STUCK_JOB_PROBABILITY,0.8) THEN 'STUCK'
                WHEN Q.SCHEDULE_OVERDUE_FLAG = 'Y' THEN 'SCHEDULE_OVERDUE'
                WHEN Q.RUNTIME_REGRESSION_STATUS = 'RUNTIME_REGRESSION' THEN 'RUNTIME_REGRESSION'
                WHEN Q.JOB_RISK_SCORE >= NVL(:P_JOB_RISK_SCORE,0.6) THEN 'HIGH_RISK_SCORE'
                WHEN Q.SLA_BREACH_PROBABILITY >= NVL(:P_SLA_BREACH_PROBABILITY,0.4) THEN 'SLA_RISK'
            END
        ELSE NULL
    END AS ALERT_REASON,
    -- Alert Severity
    CASE
        WHEN
            (
                Q.FAILURES_LAST_24_HOURS > 0
                OR Q.STUCK_JOB_PROBABILITY >= NVL(:P_STUCK_JOB_PROBABILITY,0.8)
                OR Q.SCHEDULE_OVERDUE_FLAG = 'Y'
                OR (
                    Q.CONFIDENCE_LEVEL IN ('MEDIUM','HIGH')
                    AND (
                        Q.RUNTIME_REGRESSION_STATUS = 'RUNTIME_REGRESSION'
                        OR Q.JOB_RISK_SCORE >= NVL(:P_JOB_RISK_SCORE,0.6)
                        OR Q.SLA_BREACH_PROBABILITY >= NVL(:P_SLA_BREACH_PROBABILITY,0.4)
                    )
                )
            )
        THEN
            CASE
                WHEN Q.FAILURES_LAST_24_HOURS > 0 THEN 'CRITICAL'
                WHEN Q.STUCK_JOB_PROBABILITY >= NVL(:P_STUCK_JOB_PROBABILITY,0.8) THEN 'CRITICAL'
                WHEN Q.SCHEDULE_OVERDUE_FLAG = 'Y' THEN 'HIGH'
                WHEN Q.JOB_RISK_SCORE >= 0.75 THEN 'HIGH'
                ELSE 'MEDIUM'
            END
        ELSE 'LOW'
    END AS ALERT_SEVERITY,
    -- Additional attributes reserved for future use
    'JOB_DEDUPE_KEY'  ATTRIBUTE_NAME1,
    Q.JOB_DEDUPE_KEY  ATTRIBUTE_VAL1,
    NULL ATTRIBUTE_NAME2,
    NULL ATTRIBUTE_VAL2,
    NULL ATTRIBUTE_NAME3,
    NULL ATTRIBUTE_VAL3, 
    NULL ATTRIBUTE_NAME4,
    NULL ATTRIBUTE_VAL4, 
    NULL ATTRIBUTE_NAME5,
    NULL ATTRIBUTE_VAL5           
FROM
    (
        SELECT
            /* Unique job identifier derived from ESS job name or definition */
            J.JOB_KEY   AS JOB_IDENTIFIER,
            /* Display name of the ESS job */
            J.JOB_DISPLAY_NAME  AS JOB_NAME,
            J.APPLICATION AS APPLICATION,
            J.PRODUCT AS PRODUCT, 
            J.TYPE        AS REQUEST_TYPE,
            /* Most recent process start time of the job */
            J.LAST_RUN_TIME   AS LAST_RUN_TIMESTAMP,
            J.LAST_RUN_END_TIME AS LAST_RUN_END_TIME,
            J.REQUEST_START_TIME AS REQUEST_START_TIME,
            -- REQUEST_EVENT_TIME drives the Log Analytics incremental time-window run
            J.LAST_RUN_TIME AS REQUEST_EVENT_TIME,
            J.REQUEST_COMPLETE_TIME AS REQUEST_COMPLETE_TIME,
            J.REQUESTID       AS REQUESTID,
            J.PARENTREQUESTID AS PARENTREQUESTID,
            J.PROCESSGROUP AS PROCESSGROUP,
            J.SUBMITTER AS SUBMITTER,
            J.SCHEDULED AS SCHEDULED,
            J.JOB_DEDUPE_KEY AS JOB_DEDUPE_KEY,
            /* Most recent successful completion time */
            LS.LAST_SUCCESS_TIME   AS LAST_SUCCESS_TIMESTAMP,
            /* Runtime (minutes) of the most recent successful run */
            ROUND(LS.LAST_RUNTIME_MIN, 2)   AS LAST_RUNTIME_MINUTES,
            /* Runtime (minutes) of the previous successful run */
            ROUND(LS.PREV_RUNTIME_MIN, 2) AS PREVIOUS_RUNTIME_MINUTES,
            /* Indicates whether runtime has regressed compared to previous run */
            CASE
                WHEN LS.PREV_RUNTIME_MIN IS NULL THEN
                    'INSUFFICIENT_HISTORY'
                WHEN JS_HIST.SUCCEEDED_RUNS >= 2 AND LS.PREV_RUNTIME_MIN >= 1
                AND LS.LAST_RUNTIME_MIN > LS.PREV_RUNTIME_MIN * P.REGRESSION_FACTOR
                THEN
                    'RUNTIME_REGRESSION'
                ELSE
                    'NORMAL'
            END  AS RUNTIME_REGRESSION_STATUS,
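            /* Worked example (illustrative numbers, not from real data): with the
               default REGRESSION_FACTOR of 1.5, a job whose previous successful run
               took 10 minutes is flagged RUNTIME_REGRESSION when its latest
               successful run exceeds 10 * 1.5 = 15 minutes, provided at least
               2 succeeded runs exist and the previous runtime is >= 1 minute. */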
            /* Average runtime over the last N days */
            ROUND(RB.AVG_RUNTIME_BASELINE_MIN, 2)     AS AVG_RUNTIME_BASELINE_MIN,
            /* 95th percentile runtime baseline over the last N days */
            ROUND(RB.P95_RUNTIME_BASELINE_MIN, 2)     AS P95_RUNTIME_BASELINE_MIN,
            /* Number of failed executions in the last 24 hours */
            NVL(FC.FAILED_24H_COUNT, 0)  AS FAILURES_LAST_24_HOURS,
            /* Number of failed executions in the last 7 days */
            NVL(FC.FAILED_7D_COUNT, 0)  AS FAILURES_LAST_7_DAYS,
            /* Count of currently running / blocked / cancelling jobs */
            NVL(CR.RUNNING_COUNT, 0)   AS CURRENTLY_RUNNING_COUNT,
            /* Maximum number of hours a job has been running continuously */
            ROUND(NVL(CR.RUNNING_HOURS, 0), 2) AS MAX_RUNNING_DURATION_HOURS,
             JS_WIN.TOTAL_RUNS AS TOTAL_RUNS,
             JS_WIN.FAILED_RUNS AS FAILED_RUNS,
             JS_WIN.SUCCEEDED_RUNS AS SUCCEEDED_RUNS,
             JS_WIN.RUNNING_RUNS AS RUNNING_RUNS,
             JS_WIN.BLOCKED_RUNS AS BLOCKED_RUNS,
             JS_WIN.CANCELLED_RUNS AS CANCELLED_RUNS,
             JS_WIN.WARNING_RUNS AS WARNING_RUNS,
             JS_WIN.QUEUE_PRESSURE AS QUEUE_PRESSURE,
            ROUND(JS_HIST.AVG_RUNTIME, 2) AS AVG_RUNTIME_MINUTES, 
            ROUND(JS_HIST.STDDEV_RUNTIME,2) AS STDDEV_RUNTIME_MINUTES, 
            /* Estimated probability of failure on the next run (warnings count as half a failure) */
            ROUND((JS_HIST.FAILED_RUNS + 0.5 * JS_HIST.WARNING_RUNS) / NULLIF(JS_HIST.TOTAL_RUNS, 0), 2) AS PREDICTED_FAILURE_PROBABILITY,
            /* Probability that runtime exceeds historical SLA (p95 baseline) */
            ROUND(NVL(SP.SLA_BREACH_PROBABILITY, 0), 2) AS SLA_BREACH_PROBABILITY,
            /* Runtime volatility index (stddev / mean) */
            ROUND((JS_HIST.STDDEV_RUNTIME / NULLIF(JS_HIST.AVG_RUNTIME, 0)), 3)    AS RUNTIME_VOLATILITY_INDEX,
            /* Trend slope indicating increasing or decreasing runtime over time */
            ROUND(TC.RUNTIME_TREND_SLOPE, 3)  AS RUNTIME_TREND_SLOPE_PER_RUN,
            /* Probability that a currently running job is stuck */
            ROUND(NVL(ST.STUCK_PROBABILITY, 0), 3)  AS STUCK_JOB_PROBABILITY,
            /* Confidence level indicating reliability of job health signals */
            CASE
                WHEN JS_HIST.SUCCEEDED_RUNS < NVL(:P_MIN_SUCCESS_RUNS,3) THEN 'LOW'
                WHEN
                (
                LEAST(JS_HIST.TOTAL_RUNS / 10, 1) * 0.5
                + (1 - LEAST(JS_HIST.STDDEV_RUNTIME / NULLIF(JS_HIST.AVG_RUNTIME, 0), 1)) * 0.3
                +
                    CASE
                        WHEN TC.RUNTIME_TREND_SLOPE IS NOT NULL THEN
                            0.2
                        ELSE
                            0
                    END
                ) >= 0.75
                THEN
                    'HIGH'
                WHEN
                (
                LEAST(JS_HIST.TOTAL_RUNS / 10, 1) * 0.5
                + (1 - LEAST(JS_HIST.STDDEV_RUNTIME / NULLIF(JS_HIST.AVG_RUNTIME, 0), 1)) * 0.3
                +
                    CASE
                        WHEN TC.RUNTIME_TREND_SLOPE IS NOT NULL THEN
                            0.2
                        ELSE
                            0
                    END
                ) >= 0.40
                THEN
                    'MEDIUM'
                ELSE
                    'LOW'
            END    AS CONFIDENCE_LEVEL,
            ROUND(
            LEAST(JS_HIST.TOTAL_RUNS / 10, 1) * 0.5
            + (1 - LEAST(JS_HIST.STDDEV_RUNTIME / NULLIF(JS_HIST.AVG_RUNTIME, 0), 1)) * 0.3
            + CASE WHEN TC.RUNTIME_TREND_SLOPE IS NOT NULL THEN 0.2 ELSE 0 END
            , 3) AS CONFIDENCE_SCORE_WEIGHTED,
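            /* Worked example (illustrative numbers, not from real data): a job with
               8 total runs, stddev/avg runtime = 0.25, and a computed trend slope:
                 LEAST(8/10, 1) * 0.5  = 0.400
               + (1 - 0.25) * 0.3      = 0.225
               + 0.2 (slope present)   = 0.200
               --------------------------------
               CONFIDENCE_SCORE_WEIGHTED = 0.825  =>  CONFIDENCE_LEVEL = 'HIGH' */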
           /* Composite risk score combining failure, SLA, volatility and trend */
            ROUND(
                    0.35 * (
                        (JS_HIST.FAILED_RUNS + 0.5 * JS_HIST.WARNING_RUNS)
                        / NULLIF(JS_HIST.TOTAL_RUNS,0)
                    )
                    + 0.25 * NVL(SP.SLA_BREACH_PROBABILITY, 0)
                    + 0.15 * LEAST(JS_HIST.STDDEV_RUNTIME / NULLIF(JS_HIST.AVG_RUNTIME, 0), 1)
                    + 0.15 * LEAST(ABS(TC.RUNTIME_TREND_SLOPE), 1)
                    + 0.10 * NVL(ST.STUCK_PROBABILITY, 0),
                    3
                 ) AS JOB_RISK_SCORE,   
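            /* Worked example (illustrative numbers, not from real data): 2 failures
               + 1 warning over 20 runs, SLA breach probability 0.30, volatility 0.40,
               |trend slope| 0.05, stuck probability 0:
                 0.35 * ((2 + 0.5) / 20) = 0.04375
               + 0.25 * 0.30             = 0.07500
               + 0.15 * 0.40             = 0.06000
               + 0.15 * 0.05             = 0.00750
               + 0.10 * 0                = 0.00000
               ------------------------------------
               JOB_RISK_SCORE = ROUND(0.18625, 3) = 0.186 */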
             SCH.SCHEDULE_START_TIME,
             SCH.SCHEDULE_END_TIME,
             SCH.SCHEDULE_FREQUENCY,
             SCH.SCHEDULE_INTERVAL,
             SCH.SCHEDULE_ENDED_FLAG,
             SCH.SCHEDULE_OVERDUE_FLAG,
             SCH.HIGH_FREQUENCY_SCHEDULE_FLAG,
             LRQ.LAST_REQUEST_ID,
             LSR.LAST_SUCCESS_REQUEST_ID,
             LFR.LAST_FAILED_REQUEST_ID,
             LRR.LONGEST_RUNNING_REQUEST_ID,
             SCHCNT.ADHOC_COUNT    AS ADHOC_COUNT,
             SCHCNT.SCHEDULE_COUNT AS SCHEDULE_COUNT
        FROM
           /* Regression sensitivity parameter */
            (
                SELECT
                    NVL(:P_REGRESSION_FACTOR, 1.5) AS REGRESSION_FACTOR
                FROM
                    DUAL
            ) P, 
             /* Job master list within monitoring window */
            (
                SELECT /*+ PARALLEL(rh, 8) */
                    rh.DEFINITION AS job_key,
                    SUBSTR(rh.definition, INSTR(rh.definition,'/',-1)+1) AS job_display_name,
                    rh.APPLICATION,
                    rh.PRODUCT,
                    rh.TYPE,
                    rh.PROCESSSTART AS last_run_time,
                    rh.PROCESSEND  AS last_run_end_time,
                    rh.REQUESTEDSTART AS request_start_time,
                    rh.COMPLETEDTIME AS request_complete_time,
                    rh.REQUESTID,
                    rh.PARENTREQUESTID,
                    rh.PROCESSGROUP,
                    rh.SUBMITTER,
                    rh.SCHEDULED,
                    rh.DEFINITION || ':' || rh.REQUESTID AS JOB_DEDUPE_KEY
                FROM
                    ESS_REQUEST_HISTORY rh,
                    (
                        /* Anchor: latest PROCESSSTART per job */
                        SELECT
                            DEFINITION,
                            MAX(PROCESSSTART) AS max_processstart
                        FROM ESS_REQUEST_HISTORY
                        WHERE 1=1
                        AND PROCESSSTART >= TO_TIMESTAMP_TZ(:P_FROM_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                        AND PROCESSSTART < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                        GROUP BY DEFINITION
                    ) mx,
                    (
                        /* Tie-breaker: highest REQUESTID for that PROCESSSTART */
                        SELECT
                            DEFINITION,
                            PROCESSSTART,
                            MAX(REQUESTID) AS max_requestid
                        FROM
                            ESS_REQUEST_HISTORY
                        WHERE 1=1
                        AND PROCESSSTART >= TO_TIMESTAMP_TZ(:P_FROM_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                        AND PROCESSSTART < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                        GROUP BY DEFINITION, PROCESSSTART
                    ) mr
                WHERE 1=1
                AND rh.DEFINITION       = mx.DEFINITION
                AND rh.PROCESSSTART     = mx.max_processstart
                AND rh.DEFINITION       = mr.DEFINITION
                AND rh.PROCESSSTART     = mr.PROCESSSTART
                AND rh.REQUESTID        = mr.max_requestid
                AND mx.DEFINITION       = mr.DEFINITION
                AND mx.max_processstart = mr.PROCESSSTART
            ) J,               
            /* Last and previous successful runtimes */
            (
                SELECT
                    JOB_KEY,
                    MAX(
                        CASE
                            WHEN RN = 1 THEN
                                COMPLETEDTIME
                        END) AS LAST_SUCCESS_TIME,
                    MAX(
                        CASE
                            WHEN RN = 1 THEN
                                RUNTIME_MIN
                        END) AS LAST_RUNTIME_MIN,
                    MAX(
                        CASE
                            WHEN RN = 2 THEN
                                RUNTIME_MIN
                        END) AS PREV_RUNTIME_MIN
                FROM
                    (
                        SELECT
                            DEFINITION                                                               AS JOB_KEY,
                            COMPLETEDTIME,
                            (CAST(NVL(PROCESSEND, SYSTIMESTAMP) AS DATE) - CAST(PROCESSSTART AS DATE)) * 1440 AS RUNTIME_MIN,
                            ROW_NUMBER() OVER ( PARTITION BY DEFINITION ORDER BY COMPLETEDTIME DESC ) RN
                        FROM
                            ESS_REQUEST_HISTORY
                        WHERE
                        STATE IN (12, 17)  -- Successful runs only
                        -- HISTORICAL CONTEXT (prediction, baselines)
                        AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                    )
                GROUP BY
                    JOB_KEY
            ) LS,
             /* Runtime baseline statistics, HISTORICAL CONTEXT (prediction, baselines) */
            (
                SELECT
                    DEFINITION  AS JOB_KEY,
                    AVG((CAST(NVL(PROCESSEND, SYSTIMESTAMP) AS DATE) - CAST(PROCESSSTART AS DATE)) * 1440)   AS AVG_RUNTIME_BASELINE_MIN,
                    PERCENTILE_CONT(0.95) WITHIN GROUP ( ORDER BY (CAST(NVL(PROCESSEND, SYSTIMESTAMP) AS DATE) - CAST(PROCESSSTART AS DATE)) * 1440 ) AS P95_RUNTIME_BASELINE_MIN
                FROM
                    ESS_REQUEST_HISTORY
                WHERE
                    STATE IN (12, 17)  -- Successful runs only
                    -- HISTORICAL CONTEXT (prediction, baselines)
                    AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                GROUP BY
                    DEFINITION
            ) RB,
            /* Failure counts – rolling windows */
            (
                SELECT
                    DEFINITION AS JOB_KEY,
                    /* Failures in last 24 hours */
                    SUM(
                        CASE
                            WHEN PROCESSSTART  >= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - NUMTODSINTERVAL(1, 'DAY')
                            AND PROCESSSTART   < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                            THEN 1
                            ELSE 0
                        END
                    ) AS FAILED_24H_COUNT,

                    /* Failures in last 7 days */
                    SUM(
                        CASE
                            WHEN PROCESSSTART  >= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - NUMTODSINTERVAL(7, 'DAY')
                            AND PROCESSSTART   < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                            THEN 1
                            ELSE 0
                        END
                    ) AS FAILED_7D_COUNT

                FROM ESS_REQUEST_HISTORY
                WHERE STATE IN (10, 15, 18, 19)  -- Failed states (aligned with FAILED_RUNS)
                AND PROCESSSTART  >= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - NUMTODSINTERVAL(7,'DAY')
                 AND PROCESSSTART  < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                GROUP BY DEFINITION
            ) FC
            ,
           /* Currently running job metrics */
            (
                SELECT
                    DEFINITION AS JOB_KEY,
                    COUNT(1) AS RUNNING_COUNT,

                    /* Runtime in HOURS as of snapshot time */
                    MAX(
                        (
                            EXTRACT(DAY FROM
                                (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)
                            ) * 24
                        + EXTRACT(HOUR FROM
                                (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)
                            )
                        + EXTRACT(MINUTE FROM
                                (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)
                            ) / 60
                        )
                    ) AS RUNNING_HOURS
                FROM
                    ESS_REQUEST_HISTORY
                WHERE
                    STATE IN (3, 5, 7)  -- Running / Blocked / Cancelling
                    -- Snapshot: jobs started on or before the :P_TO_TIMESTAMP cutoff
                    AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                GROUP BY
                    DEFINITION
            ) CR,
            /* Job execution statistics, HISTORICAL CONTEXT (prediction, baselines) */
            (
            SELECT
                DEFINITION AS job_key,

                /* Total executions in time window */
                COUNT(1) AS total_runs,

                /* Failure counts */
                SUM(CASE WHEN state IN (10,15,18,19) THEN 1 ELSE 0 END) AS failed_runs,

                /* Success count (explicit, useful for APIs and dashboards) */
                SUM(CASE WHEN state IN (12,17) THEN 1 ELSE 0 END) AS succeeded_runs,

                /* Currently running */
                SUM(CASE WHEN state = 3 THEN 1 ELSE 0 END) AS running_runs,

                /* Blocked jobs */
                SUM(CASE WHEN state = 5 THEN 1 ELSE 0 END) AS blocked_runs,

                /* Cancelled */
                SUM(CASE WHEN state IN (7,9) THEN 1 ELSE 0 END) AS cancelled_runs,

                /* Warning */
                SUM(CASE WHEN state IN (11) THEN 1 ELSE 0 END) AS warning_runs,

                /* Queue Pressure */
                SUM(CASE WHEN state IN (1,2,6,13,14,20) THEN 1 ELSE 0 END) queue_pressure,                 

                /* Runtime metrics – ONLY from successful runs */
                AVG(
                    CASE
                        WHEN state IN (12,17) THEN
                            (CAST(NVL(processend, SYSTIMESTAMP) AS DATE)
                        - CAST(processstart AS DATE)) * 1440
                    END
                ) AS avg_runtime,

                STDDEV(
                    CASE
                        WHEN state IN (12,17) THEN
                            (CAST(NVL(processend, SYSTIMESTAMP) AS DATE)
                        - CAST(processstart AS DATE)) * 1440
                    END
                ) AS stddev_runtime
            FROM ess_request_history
            WHERE 1=1 
            AND PROCESSSTART < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            GROUP BY
                DEFINITION
            ) JS_HIST
            ,
            /* Job execution statistics, windowed execution stats */
            (
            SELECT
                DEFINITION AS job_key,

                /* Total executions in time window */
                COUNT(1) AS total_runs,

                /* Failure counts */
                SUM(CASE WHEN state IN (10,15,18,19) THEN 1 ELSE 0 END) AS failed_runs,

                /* Success count (explicit, useful for APIs and dashboards) */
                SUM(CASE WHEN state IN (12,17) THEN 1 ELSE 0 END) AS succeeded_runs,

                /* Currently running */
                SUM(CASE WHEN state = 3 THEN 1 ELSE 0 END) AS running_runs,

                /* Blocked jobs */
                SUM(CASE WHEN state = 5 THEN 1 ELSE 0 END) AS blocked_runs,

                /* Cancelled */
                SUM(CASE WHEN state IN (7,9) THEN 1 ELSE 0 END) AS cancelled_runs,

                /* Warning */
                SUM(CASE WHEN state IN (11) THEN 1 ELSE 0 END) AS warning_runs,   

                /* Queue Pressure */
                SUM(CASE WHEN state IN (1,2,6,13,14,20) THEN 1 ELSE 0 END) queue_pressure    

            FROM ess_request_history
            WHERE 1=1 
            -- Get Execution Count in Time Bracket.
            AND PROCESSSTART  >=  TO_TIMESTAMP_TZ(:P_FROM_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            AND PROCESSSTART  < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            GROUP BY
                DEFINITION
            ) JS_WIN            
            ,
           /* SLA breach probability using analytic percentile, HISTORICAL CONTEXT
              (prediction, baselines). Each run is compared against the job's own
              P95, so a stable runtime distribution scores near 0.05 by construction;
              values well above 0.05 indicate heavy-tailed or degrading runtimes. */
            (
                SELECT
                    JOB_KEY,    
                    SUM(
                        CASE
                            WHEN RUNTIME_MIN > P95_RUNTIME THEN
                                1
                            ELSE
                                0
                        END) / COUNT(1) AS SLA_BREACH_PROBABILITY
                FROM
                    (
                        SELECT
                            JOB_KEY,
                            RUNTIME_MIN,
                            PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY RUNTIME_MIN) OVER (PARTITION BY JOB_KEY) AS P95_RUNTIME
                        FROM
                            (
                                SELECT
                                    DEFINITION    AS JOB_KEY,
                                    (CAST(NVL(PROCESSEND, SYSTIMESTAMP) AS DATE) - CAST(PROCESSSTART AS DATE)) * 1440 AS RUNTIME_MIN
                                FROM
                                    ESS_REQUEST_HISTORY
                                WHERE
                                    STATE IN (11, 12, 17)  -- Successful and warning runs
                                    -- HISTORICAL CONTEXT (prediction, baselines)
                                    AND PROCESSSTART < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                            )
                    )
                GROUP BY
                    JOB_KEY
            ) SP,
           /* Runtime trend (regression slope), HISTORICAL CONTEXT (prediction, baselines) */
            (
                SELECT
                    JOB_KEY,
                    REGR_SLOPE(RUNTIME_MIN, RUN_SEQ) AS RUNTIME_TREND_SLOPE
                FROM
                    (
                        SELECT
                            DEFINITION   AS JOB_KEY,
                            (CAST(NVL(PROCESSEND, SYSTIMESTAMP) AS DATE) - CAST(PROCESSSTART AS DATE)) * 1440 AS RUNTIME_MIN,
                            ROW_NUMBER() OVER ( PARTITION BY DEFINITION ORDER BY PROCESSSTART ) RUN_SEQ
                        FROM
                            ESS_REQUEST_HISTORY
                        WHERE
                            STATE IN (12,17)
                            -- HISTORICAL CONTEXT (prediction, baselines)
                            AND PROCESSSTART < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                    )
                GROUP BY
                    JOB_KEY
            ) TC
            ,
            /* ---------------------------------------------
               Probability of a job being stuck:
               normalized runtime-overrun model
            --------------------------------------------- */
            (
            SELECT
                R.JOB_KEY,

                CASE
                    /* No historical baseline → cannot infer */
                    WHEN D.P95 IS NULL OR D.P50 IS NULL THEN 0

                    /* Runtime has not crossed P95 yet */
                    WHEN R.RUNTIME_MIN <= D.P95 THEN 0

                    /* Normalized overrun probability */
                    ELSE
                        ROUND(
                            LEAST(
                                1,
                                (R.RUNTIME_MIN - D.P95)
                                / NULLIF(
                                    GREATEST(D.P95 - D.P50, D.P50 * 0.25),
                                    0
                                )
                            ),
                            3
                        )
                END AS STUCK_PROBABILITY
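
                /* Worked example (illustrative numbers, not from real data):
                   P50 = 10 min, P95 = 16 min, current runtime 19 min:
                     (19 - 16) / GREATEST(16 - 10, 10 * 0.25)
                   = 3 / 6 = 0.5  =>  STUCK_PROBABILITY = 0.500 */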

            FROM
                /* Current running jobs */
                (
                    SELECT
                        DEFINITION AS JOB_KEY,
                        (
                            EXTRACT(DAY FROM (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)) * 1440
                        + EXTRACT(HOUR FROM (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)) * 60
                        + EXTRACT(MINUTE FROM (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART))
                        + EXTRACT(SECOND FROM (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)) / 60
                        ) AS RUNTIME_MIN
                    FROM ESS_REQUEST_HISTORY
                    WHERE STATE IN (3,5,7)   -- Running / Blocked / Cancelling
                    AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                ) R,

                /* Historical successful runtime distribution */
                (
                    SELECT
                        DEFINITION AS JOB_KEY,

                        PERCENTILE_CONT(0.50) WITHIN GROUP (
                            ORDER BY
                                (CAST(NVL(PROCESSEND, SYSTIMESTAMP) AS DATE)
                            - CAST(PROCESSSTART AS DATE)) * 1440
                        ) AS P50,

                        PERCENTILE_CONT(0.95) WITHIN GROUP (
                            ORDER BY
                                (CAST(NVL(PROCESSEND, SYSTIMESTAMP) AS DATE)
                            - CAST(PROCESSSTART AS DATE)) * 1440
                        ) AS P95

                    FROM ESS_REQUEST_HISTORY
                    WHERE STATE IN (12,17)  -- Successful runs only
                    AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                    GROUP BY DEFINITION
                ) D

            WHERE R.JOB_KEY = D.JOB_KEY
            ) ST,
            (
                SELECT
                    job_key,
                    schedule_start_time,
                    schedule_end_time,
                    schedule_frequency,
                    schedule_interval,
                    schedule_ended_flag,
                    schedule_overdue_flag,
                    high_frequency_schedule_flag
                FROM (
                    SELECT
                        /* Job key – consistent derivation */
                        rh.DEFINITION AS job_key,

                        /* Schedule start */
                        CASE
                        WHEN REGEXP_LIKE(
                            SUBSTR(
                            REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                            INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<start>') + 7,
                            30
                            ),
                            '^\d{4}-\d{2}-\d{2}'
                        )
                        THEN
                            TO_TIMESTAMP(
                            REPLACE(
                                SUBSTR(
                                REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                                INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<start>') + 7,
                                19
                                ),
                                'T', ' '
                            ),
                            'YYYY-MM-DD HH24:MI:SS'
                            )
                        END AS schedule_start_time,

                        /* Schedule End Time */

                        CASE
                        WHEN REGEXP_LIKE(
                            SUBSTR(
                            REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                            INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<end>') + 5,
                            30
                            ),
                            '^\d{4}-\d{2}-\d{2}'
                        )
                        THEN
                            TO_TIMESTAMP(
                            REPLACE(
                                SUBSTR(
                                REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                                INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<end>') + 5,
                                19
                                ),
                                'T', ' '
                            ),
                            'YYYY-MM-DD HH24:MI:SS'
                            )
                        END AS schedule_end_time,


                        /* Frequency */
                        SUBSTR(
                            REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                            INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<frequency>')
                                + LENGTH('<frequency>'),
                            INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '</frequency>')
                                - INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<frequency>')
                                - LENGTH('<frequency>')
                        ) AS schedule_frequency,

                        /* Interval */
                        TO_NUMBER(
                            SUBSTR(
                                REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                                INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<interval>')
                                    + LENGTH('<interval>'),
                                INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '</interval>')
                                    - INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<interval>')
                                    - LENGTH('<interval>')
                            )
                        ) AS schedule_interval,

                        /* Ended flag */
                        CASE
                            WHEN rh.state IN (16, 17) THEN 'Y'
                            ELSE 'N'
                        END AS schedule_ended_flag,

                        /* Overdue flag */
                        CASE
                        WHEN
                            REGEXP_LIKE(
                            SUBSTR(
                                REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                                INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<end>') + 5,
                                30
                            ),
                            '^\d{4}-\d{2}-\d{2}'
                            )
                            AND
                            TO_TIMESTAMP_TZ(
                            REPLACE(
                                SUBSTR(
                                REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                                INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<end>') + 5,
                                19
                                ),
                                'T', ' '
                            ) || ' +00:00',
                            'YYYY-MM-DD HH24:MI:SS TZH:TZM'
                            )
                            <
                            TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
                            AND rh.state NOT IN (16,17)
                        THEN 'Y'
                        ELSE 'N'
                        END AS schedule_overdue_flag,

                        /* High-frequency flag */
                        CASE
                            WHEN
                                SUBSTR(
                                    REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                                    INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<frequency>')
                                        + LENGTH('<frequency>'),
                                    INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '</frequency>')
                                        - INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<frequency>')
                                        - LENGTH('<frequency>')
                                ) LIKE '%MINUTELY%'
                            OR
                                (
                                    SUBSTR(
                                        REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                                        INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<frequency>')
                                            + LENGTH('<frequency>'),
                                        INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '</frequency>')
                                            - INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<frequency>')
                                            - LENGTH('<frequency>')
                                    ) LIKE '%HOURLY%'
                                    AND TO_NUMBER(
                                        SUBSTR(
                                            REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''),
                                            INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<interval>')
                                                + LENGTH('<interval>'),
                                            INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '</interval>')
                                                - INSTR(REPLACE(utl_raw.cast_to_varchar2(rh.adhocschedule), chr(0), ''), '<interval>')
                                                - LENGTH('<interval>')
                                        )
                                    ) <= 60
                                )
                            THEN 'Y'
                            ELSE 'N'
                        END AS high_frequency_schedule_flag,

                        /* Pick latest schedule per job */
                        ROW_NUMBER() OVER (
                            PARTITION BY rh.DEFINITION
                            ORDER BY rh.requestedstart DESC
                        ) rn

                    FROM fusion_ora_ess.request_history rh
                    WHERE rh.parentrequestid = -1
                    AND rh.requesttype     = 2
                    AND rh.adhocschedule   IS NOT NULL
                )
                WHERE rn = 1
            ) SCH
            ,
            (
            SELECT
                definition AS job_key,
                MAX(requestid) AS last_request_id
            FROM ess_request_history
            WHERE 1=1
            -- HISTORICAL CONTEXT (prediction, baselines)
            AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            GROUP BY definition
            ) lrq,
            (
            SELECT
                job_key,
                MAX(requestid) AS last_success_request_id
            FROM (
                SELECT
                definition AS job_key,
                requestid,
                completedtime
                FROM ess_request_history
                WHERE state IN (12,17)
                -- HISTORICAL CONTEXT (prediction, baselines)
                AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            )
            GROUP BY job_key
            ) lsr,
            (
            SELECT
                job_key,
                MAX(requestid) AS last_failed_request_id
            FROM (
                SELECT
                definition AS job_key,
                requestid,
                processstart
                FROM ess_request_history
                WHERE state IN (10,18,19)
                -- HISTORICAL CONTEXT (prediction, baselines)
                AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            )
            GROUP BY job_key
            ) lfr,
            (
            SELECT
                job_key,
                MAX(requestid) KEEP (DENSE_RANK FIRST ORDER BY runtime_minutes DESC) AS longest_running_request_id
            FROM (
                SELECT
                definition AS job_key,
                requestid,
                (
                EXTRACT(DAY    FROM (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)) * 1440
                + EXTRACT(HOUR   FROM (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)) * 60
                + EXTRACT(MINUTE FROM (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART))
                + EXTRACT(SECOND FROM (TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3') - PROCESSSTART)) / 60
                ) AS runtime_minutes
                FROM ess_request_history
                WHERE state IN (3,5,7)
                -- HISTORICAL CONTEXT (prediction, baselines)
                AND PROCESSSTART <= TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            )
            GROUP BY job_key
            ) lrr,
            (
            SELECT erh.definition AS job_key,
                COUNT(CASE WHEN rh.adhocschedule IS NULL THEN 1 END)     AS ADHOC_COUNT,
                COUNT(CASE WHEN rh.adhocschedule IS NOT NULL THEN 1 END) AS SCHEDULE_COUNT
            FROM fusion_ora_ess.request_history rh,
                 ess_request_history erh
            WHERE erh.requestid = rh.requestid
            -- Get counts for the provided range.
            AND erh.PROCESSSTART >= TO_TIMESTAMP_TZ(:P_FROM_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            AND erh.PROCESSSTART < TO_TIMESTAMP_TZ(:P_TO_TIMESTAMP,'YYYY-MM-DD"T"HH24:MI:SS.FF3')
            GROUP BY erh.definition
            ) SCHCNT
        WHERE
            1=1
            AND J.JOB_KEY = LS.JOB_KEY(+)
            AND J.JOB_KEY = RB.JOB_KEY(+)
            AND J.JOB_KEY = FC.JOB_KEY(+)
            AND J.JOB_KEY = CR.JOB_KEY(+)
            AND J.JOB_KEY = JS_HIST.JOB_KEY(+)
            AND J.JOB_KEY = JS_WIN.JOB_KEY(+)
            AND J.JOB_KEY = SP.JOB_KEY(+)
            AND J.JOB_KEY = TC.JOB_KEY(+)
            AND J.JOB_KEY = ST.JOB_KEY(+)
            AND j.job_key = SCH.job_key(+)
            AND j.job_key = lrq.job_key(+)
            AND j.job_key = lsr.job_key(+)
            AND j.job_key = lfr.job_key(+)
            AND j.job_key = lrr.job_key(+)
            AND j.job_key = SCHCNT.job_key(+)
    ) Q  
ORDER BY Q.REQUEST_EVENT_TIME DESC

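The schedule parsing above leans heavily on SUBSTR/INSTR, so the intent of the high-frequency flag can be hard to see. The following Python sketch is purely illustrative (it is not part of the report); it assumes the adhocschedule BLOB has already been decoded to a string containing <frequency> and <interval> tags, and restates the same rule: MINUTELY schedules, or HOURLY schedules with an interval of 60 or less, are flagged.

```python
import re

def high_frequency_flag(adhoc_schedule):
    """Mirror of the SQL check: 'Y' for MINUTELY schedules, or HOURLY
    schedules whose <interval> is 60 or less; 'N' otherwise."""
    def tag(name):
        # Extract the text between <name> and </name>, like the SUBSTR/INSTR pair.
        m = re.search(rf"<{name}>(.*?)</{name}>", adhoc_schedule, re.S)
        return m.group(1) if m else None

    freq = tag("frequency") or ""
    if "MINUTELY" in freq:
        return "Y"
    interval = tag("interval")
    if "HOURLY" in freq and interval is not None and float(interval) <= 60:
        return "Y"
    return "N"

print(high_frequency_flag("<frequency>HOURLY</frequency><interval>1</interval>"))  # Y
print(high_frequency_flag("<frequency>DAILY</frequency><interval>1</interval>"))   # N
```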
BI Report XSL Template

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:template match="/">
    <DATA_DS xmlns="http://xmlns.oracle.com/cloud/adapter/nxsd/surrogate/readstagefilesfromSOAP">
      <!-- Header (single occurrence) -->
      <HEADER>
        <META_FROM_TIMESTAMP>
          <xsl:value-of select="/DATA_DS/G_1[1]/META_FROM_TIMESTAMP"/>
        </META_FROM_TIMESTAMP>
        <META_TO_TIMESTAMP>
          <xsl:value-of select="/DATA_DS/G_1[1]/META_TO_TIMESTAMP"/>
        </META_TO_TIMESTAMP>
        <META_GENERATED_AT>
          <xsl:value-of select="/DATA_DS/G_1[1]/META_GENERATED_AT"/>
        </META_GENERATED_AT>
        <META_RECORD_COUNT>
          <xsl:value-of select="/DATA_DS/G_1[1]/META_RECORD_COUNT"/>
        </META_RECORD_COUNT>
      </HEADER>
      <!-- Rows -->
      <ROWS>
        <xsl:for-each select="/DATA_DS/G_1">
          <G1>
            <JOB_IDENTIFIER><xsl:value-of select="JOB_IDENTIFIER"/></JOB_IDENTIFIER>
            <JOB_NAME><xsl:value-of select="JOB_NAME"/></JOB_NAME>
            <APPLICATION><xsl:value-of select="APPLICATION"/></APPLICATION>
            <PRODUCT><xsl:value-of select="PRODUCT"/></PRODUCT>
            <REQUEST_TYPE><xsl:value-of select="REQUEST_TYPE"/></REQUEST_TYPE>
            <LAST_RUN_TIMESTAMP><xsl:value-of select="LAST_RUN_TIMESTAMP"/></LAST_RUN_TIMESTAMP>
            <LAST_RUN_END_TIME><xsl:value-of select="LAST_RUN_END_TIME"/></LAST_RUN_END_TIME>
            <REQUEST_START_TIME><xsl:value-of select="REQUEST_START_TIME"/></REQUEST_START_TIME>
            <REQUEST_EVENT_TIME><xsl:value-of select="REQUEST_EVENT_TIME"/></REQUEST_EVENT_TIME>
            <REQUEST_COMPLETE_TIME><xsl:value-of select="REQUEST_COMPLETE_TIME"/></REQUEST_COMPLETE_TIME>
            <REQUESTID><xsl:value-of select="REQUESTID"/></REQUESTID>
            <PARENTREQUESTID><xsl:value-of select="PARENTREQUESTID"/></PARENTREQUESTID>
            <PROCESSGROUP><xsl:value-of select="PROCESSGROUP"/></PROCESSGROUP>
            <SUBMITTER><xsl:value-of select="SUBMITTER"/></SUBMITTER>
            <SCHEDULED><xsl:value-of select="SCHEDULED"/></SCHEDULED>
            <LAST_SUCCESS_TIMESTAMP><xsl:value-of select="LAST_SUCCESS_TIMESTAMP"/></LAST_SUCCESS_TIMESTAMP>
            <LAST_RUNTIME_MINUTES><xsl:value-of select="LAST_RUNTIME_MINUTES"/></LAST_RUNTIME_MINUTES>
            <PREVIOUS_RUNTIME_MINUTES><xsl:value-of select="PREVIOUS_RUNTIME_MINUTES"/></PREVIOUS_RUNTIME_MINUTES>
            <RUNTIME_REGRESSION_STATUS><xsl:value-of select="RUNTIME_REGRESSION_STATUS"/></RUNTIME_REGRESSION_STATUS>
            <AVG_RUNTIME_BASELINE_MIN><xsl:value-of select="AVG_RUNTIME_BASELINE_MIN"/></AVG_RUNTIME_BASELINE_MIN>
            <P95_RUNTIME_BASELINE_MIN><xsl:value-of select="P95_RUNTIME_BASELINE_MIN"/></P95_RUNTIME_BASELINE_MIN>
            <FAILURES_LAST_24_HOURS><xsl:value-of select="FAILURES_LAST_24_HOURS"/></FAILURES_LAST_24_HOURS>
            <FAILURES_LAST_7_DAYS><xsl:value-of select="FAILURES_LAST_7_DAYS"/></FAILURES_LAST_7_DAYS>
            <CURRENTLY_RUNNING_COUNT><xsl:value-of select="CURRENTLY_RUNNING_COUNT"/></CURRENTLY_RUNNING_COUNT>
            <MAX_RUNNING_DURATION_HOURS><xsl:value-of select="MAX_RUNNING_DURATION_HOURS"/></MAX_RUNNING_DURATION_HOURS>
            <TOTAL_RUNS><xsl:value-of select="TOTAL_RUNS"/></TOTAL_RUNS>
            <FAILED_RUNS><xsl:value-of select="FAILED_RUNS"/></FAILED_RUNS>
            <SUCCEEDED_RUNS><xsl:value-of select="SUCCEEDED_RUNS"/></SUCCEEDED_RUNS>
            <RUNNING_RUNS><xsl:value-of select="RUNNING_RUNS"/></RUNNING_RUNS>
            <BLOCKED_RUNS><xsl:value-of select="BLOCKED_RUNS"/></BLOCKED_RUNS>
            <CANCELLED_RUNS><xsl:value-of select="CANCELLED_RUNS"/></CANCELLED_RUNS>
            <WARNING_RUNS><xsl:value-of select="WARNING_RUNS"/></WARNING_RUNS>
            <ADHOC_COUNT><xsl:value-of select="ADHOC_COUNT"/></ADHOC_COUNT>
            <SCHEDULE_COUNT><xsl:value-of select="SCHEDULE_COUNT"/></SCHEDULE_COUNT>
            <QUEUE_PRESSURE><xsl:value-of select="QUEUE_PRESSURE"/></QUEUE_PRESSURE>
            <AVG_RUNTIME_MINUTES><xsl:value-of select="AVG_RUNTIME_MINUTES"/></AVG_RUNTIME_MINUTES>
            <STDDEV_RUNTIME_MINUTES><xsl:value-of select="STDDEV_RUNTIME_MINUTES"/></STDDEV_RUNTIME_MINUTES>
            <PREDICTED_FAILURE_PROBABILITY><xsl:value-of select="PREDICTED_FAILURE_PROBABILITY"/></PREDICTED_FAILURE_PROBABILITY>
            <SLA_BREACH_PROBABILITY><xsl:value-of select="SLA_BREACH_PROBABILITY"/></SLA_BREACH_PROBABILITY>
            <RUNTIME_VOLATILITY_INDEX><xsl:value-of select="RUNTIME_VOLATILITY_INDEX"/></RUNTIME_VOLATILITY_INDEX>
            <RUNTIME_TREND_SLOPE_PER_RUN><xsl:value-of select="RUNTIME_TREND_SLOPE_PER_RUN"/></RUNTIME_TREND_SLOPE_PER_RUN>
            <STUCK_JOB_PROBABILITY><xsl:value-of select="STUCK_JOB_PROBABILITY"/></STUCK_JOB_PROBABILITY>
            <CONFIDENCE_LEVEL><xsl:value-of select="CONFIDENCE_LEVEL"/></CONFIDENCE_LEVEL>
            <CONFIDENCE_SCORE_WEIGHTED><xsl:value-of select="CONFIDENCE_SCORE_WEIGHTED"/></CONFIDENCE_SCORE_WEIGHTED>
            <JOB_RISK_SCORE><xsl:value-of select="JOB_RISK_SCORE"/></JOB_RISK_SCORE>
            <SCHEDULE_START_TIME><xsl:value-of select="SCHEDULE_START_TIME"/></SCHEDULE_START_TIME>
            <SCHEDULE_END_TIME><xsl:value-of select="SCHEDULE_END_TIME"/></SCHEDULE_END_TIME>
            <SCHEDULE_FREQUENCY><xsl:value-of select="SCHEDULE_FREQUENCY"/></SCHEDULE_FREQUENCY>
            <SCHEDULE_INTERVAL><xsl:value-of select="SCHEDULE_INTERVAL"/></SCHEDULE_INTERVAL>
            <SCHEDULE_ENDED_FLAG><xsl:value-of select="SCHEDULE_ENDED_FLAG"/></SCHEDULE_ENDED_FLAG>
            <SCHEDULE_OVERDUE_FLAG><xsl:value-of select="SCHEDULE_OVERDUE_FLAG"/></SCHEDULE_OVERDUE_FLAG>
            <HIGH_FREQUENCY_SCHEDULE_FLAG><xsl:value-of select="HIGH_FREQUENCY_SCHEDULE_FLAG"/></HIGH_FREQUENCY_SCHEDULE_FLAG>
            <LAST_REQUEST_ID><xsl:value-of select="LAST_REQUEST_ID"/></LAST_REQUEST_ID>
            <LAST_SUCCESS_REQUEST_ID><xsl:value-of select="LAST_SUCCESS_REQUEST_ID"/></LAST_SUCCESS_REQUEST_ID>
            <LAST_FAILED_REQUEST_ID><xsl:value-of select="LAST_FAILED_REQUEST_ID"/></LAST_FAILED_REQUEST_ID>
            <LONGEST_RUNNING_REQUEST_ID><xsl:value-of select="LONGEST_RUNNING_REQUEST_ID"/></LONGEST_RUNNING_REQUEST_ID>
            <ALERT_FLAG><xsl:value-of select="ALERT_FLAG"/></ALERT_FLAG>
            <ALERT_SEVERITY><xsl:value-of select="ALERT_SEVERITY"/></ALERT_SEVERITY>
            <ALERT_REASON><xsl:value-of select="ALERT_REASON"/></ALERT_REASON>
            <ATTRIBUTE_NAME1><xsl:value-of select="ATTRIBUTE_NAME1"/></ATTRIBUTE_NAME1>
            <ATTRIBUTE_VAL1><xsl:value-of select="ATTRIBUTE_VAL1"/></ATTRIBUTE_VAL1>
            <ATTRIBUTE_NAME2><xsl:value-of select="ATTRIBUTE_NAME2"/></ATTRIBUTE_NAME2>
            <ATTRIBUTE_VAL2><xsl:value-of select="ATTRIBUTE_VAL2"/></ATTRIBUTE_VAL2>
            <ATTRIBUTE_NAME3><xsl:value-of select="ATTRIBUTE_NAME3"/></ATTRIBUTE_NAME3>
            <ATTRIBUTE_VAL3><xsl:value-of select="ATTRIBUTE_VAL3"/></ATTRIBUTE_VAL3>
            <ATTRIBUTE_NAME4><xsl:value-of select="ATTRIBUTE_NAME4"/></ATTRIBUTE_NAME4>
            <ATTRIBUTE_VAL4><xsl:value-of select="ATTRIBUTE_VAL4"/></ATTRIBUTE_VAL4>
            <ATTRIBUTE_NAME5><xsl:value-of select="ATTRIBUTE_NAME5"/></ATTRIBUTE_NAME5>
            <ATTRIBUTE_VAL5><xsl:value-of select="ATTRIBUTE_VAL5"/></ATTRIBUTE_VAL5>                                                         
          </G1>
        </xsl:for-each>
      </ROWS>
    </DATA_DS>
  </xsl:template>
</xsl:stylesheet>
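Before wiring the template into the integration, it helps to exercise it locally. The sketch below uses lxml (an assumption — the Python standard library has no XSLT engine) with a trimmed two-field copy of the template (namespace attributes omitted for brevity) and a fabricated G_1 sample; "EssDemoJob" is an invented job name.

```python
from lxml import etree  # assumption: lxml is installed

# Trimmed copy of the template above: flat G_1 rows reshaped into HEADER/ROWS.
XSL = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <DATA_DS>
      <HEADER>
        <META_RECORD_COUNT><xsl:value-of select="/DATA_DS/G_1[1]/META_RECORD_COUNT"/></META_RECORD_COUNT>
      </HEADER>
      <ROWS>
        <xsl:for-each select="/DATA_DS/G_1">
          <G1><JOB_NAME><xsl:value-of select="JOB_NAME"/></JOB_NAME></G1>
        </xsl:for-each>
      </ROWS>
    </DATA_DS>
  </xsl:template>
</xsl:stylesheet>"""

# Fabricated BI data-model output: flat, repeated G_1 groups.
SRC = b"""<DATA_DS>
  <G_1><META_RECORD_COUNT>2</META_RECORD_COUNT><JOB_NAME>BICloudConnectorJobDefinition</JOB_NAME></G_1>
  <G_1><META_RECORD_COUNT>2</META_RECORD_COUNT><JOB_NAME>EssDemoJob</JOB_NAME></G_1>
</DATA_DS>"""

transform = etree.XSLT(etree.fromstring(XSL))
print(str(transform(etree.fromstring(SRC))))
```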

Request JSON Payload

{
  "P_FROM_TIMESTAMP": "2026-01-12T06:10:38.808",
  "P_TO_TIMESTAMP": "2026-01-12T06:11:49.771",
  "P_REGRESSION_FACTOR": 1.5,
  "P_MIN_SUCCESS_RUNS": 3,
  "P_STUCK_JOB_PROBABILITY": 0.8,
  "P_JOB_RISK_SCORE": 0.6,
  "P_SLA_BREACH_PROBABILITY": 0.4
}
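The two timestamp parameters must match the format mask used throughout the SQL (YYYY-MM-DD"T"HH24:MI:SS.FF3, i.e. millisecond precision). A small helper for building this payload over a rolling window might look like the sketch below; the function name and the 24-hour default are assumptions, not part of the report definition.

```python
from datetime import datetime, timedelta, timezone

def build_request_payload(window_hours=24):
    """Build the request payload for a rolling window, with timestamps
    formatted to match the SQL mask YYYY-MM-DD"T"HH24:MI:SS.FF3."""
    now = datetime.now(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%S.%f"
    return {
        # strftime emits microseconds; [:-3] trims to milliseconds (FF3).
        "P_FROM_TIMESTAMP": (now - timedelta(hours=window_hours)).strftime(fmt)[:-3],
        "P_TO_TIMESTAMP": now.strftime(fmt)[:-3],
        "P_REGRESSION_FACTOR": 1.5,
        "P_MIN_SUCCESS_RUNS": 3,
        "P_STUCK_JOB_PROBABILITY": 0.8,
        "P_JOB_RISK_SCORE": 0.6,
        "P_SLA_BREACH_PROBABILITY": 0.4,
    }

payload = build_request_payload()
print(payload["P_FROM_TIMESTAMP"], "->", payload["P_TO_TIMESTAMP"])
```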

Response JSON Payload

{
    "meta": {
        "fromTimestamp": "2025-12-03T05:51:20Z",
        "toTimestamp": "2026-01-02T06:07:20Z",
        "generatedAt": "2026-01-03T16:13:45Z",
        "recordCount": 30
    },
    "jobs": [
        {
            "jobIdentifier": "JobDefinition://oracle.apps.ess.biccc/BICloudConnectorJobDefinition",
            "jobName": "BICloudConnectorJobDefinition",
            "application": "oracle.biacm",
            "product": "",
            "requestType": "JAVA_TYPE",
            "lastRunTimestamp": "2026-01-02T03:01:56.597+00:00",
            "lastRunEndTime": "2026-01-02T03:02:01.186+00:00",
            "requestStartTime": "2026-01-02T03:01:53.803+00:00",
            "requestEventTime": "2026-01-02T03:01:53.803+00:00",
            "requestCompleteTime": "2026-01-02T03:02:05.636+00:00",
            "scheduled": "2026-01-02T03:01:53.803+00:00",
            "lastSuccessTimestamp": "2026-01-02T03:02:05.636+00:00",
            "runtime": {
                "lastRuntimeMinutes": 0.08,
                "previousRuntimeMinutes": 0.12,
                "runtimeRegressionStatus": "NORMAL",
                "avgRuntimeBaselineMinutes": 0.47,
                "p95RuntimeBaselineMinutes": 3.38,
                "runtimeVolatilityIndex": 3.987,
                "runtimeTrendSlopePerRun": -0.001
            },
            "executionStats": {
                "totalRuns": 1260,
                "succeededRuns": 1243,
                "failedRuns": 17,
                "runningRuns": 0,
                "blockedRuns": 0,
                "cancelledRuns": 0,
                "warningRuns": 0,
                "avgRuntimeMinutes": 0.58,
                "stddevRuntimeMinutes": 2.32,
                "adhocCount": 0,
                "scheduledCount": 0,
                "queuePressure": 0
            },
            "failureMetrics": {
                "failuresLast24Hours": 0,
                "failuresLast7Days": 0,
                "predictedFailureProbability": 0.01
            },
            "slaMetrics": {
                "slaBreachProbability": 0.05
            },
            "runningState": {
                "currentlyRunningCount": 0,
                "maxRunningDurationHours": 0,
                "stuckJobProbability": 0,
                "longestRunningRequestId": null
            },
            "confidence": {
                "confidenceLevel": "MEDIUM",
                "confidenceScoreWeighted": 0.7
            },
            "risk": {
                "jobRiskScore": 0.167
            },
            "schedule": {
                "scheduleStartTime": "",
                "scheduleEndTime": "",
                "scheduleFrequency": "",
                "scheduleInterval": null,
                "scheduleEndedFlag": "",
                "scheduleOverdueFlag": "",
                "highFrequencyScheduleFlag": ""
            },
            "requests": {
                "lastRequestId": 6017055,
                "lastSuccessRequestId": 6017055,
                "lastFailedRequestId": 5924078,
                "requestId": 6017055,
                "parentRequestId": 601789,
                "processGroup": "FADomain##bi_cluster",
                "submitter": "FAWService_APPID"
            },
            "alert": {
                "alertFlag": "N",
                "alertSeverity": "LOW",
                "alertReason": ""
            },
            "additionalInfo": {
                "attributeName1": "",
                "attributeVal1": "",
                "attributeName2": "",
                "attributeVal2": "",
                "attributeName3": "",
                "attributeVal3": "",
                "attributeName4": "",
                "attributeVal4": "",
                "attributeName5": "",
                "attributeVal5": ""
            }
        }
    ]
}
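On the consuming side (for example an OIC integration or a notification service), the probability and risk fields make it straightforward to re-apply the same thresholds that were sent in the request payload. The sketch below is illustrative only: the function name, the second sample job, and its "NightlyBillingJob" name are invented for demonstration.

```python
# Thresholds mirror the request payload parameters above.
THRESHOLDS = {
    "P_JOB_RISK_SCORE": 0.6,
    "P_SLA_BREACH_PROBABILITY": 0.4,
    "P_STUCK_JOB_PROBABILITY": 0.8,
}

def flag_risky_jobs(response):
    """Return jobs whose risk metrics cross any configured threshold,
    using the field paths from the response payload structure."""
    flagged = []
    for job in response.get("jobs", []):
        reasons = []
        if job["risk"]["jobRiskScore"] >= THRESHOLDS["P_JOB_RISK_SCORE"]:
            reasons.append("HIGH_RISK_SCORE")
        if job["slaMetrics"]["slaBreachProbability"] >= THRESHOLDS["P_SLA_BREACH_PROBABILITY"]:
            reasons.append("SLA_BREACH_RISK")
        if job["runningState"]["stuckJobProbability"] >= THRESHOLDS["P_STUCK_JOB_PROBABILITY"]:
            reasons.append("POSSIBLY_STUCK")
        if reasons:
            flagged.append({"jobName": job["jobName"], "reasons": reasons})
    return flagged

sample = {
    "jobs": [
        {"jobName": "BICloudConnectorJobDefinition",
         "risk": {"jobRiskScore": 0.167},
         "slaMetrics": {"slaBreachProbability": 0.05},
         "runningState": {"stuckJobProbability": 0}},
        {"jobName": "NightlyBillingJob",  # invented example job
         "risk": {"jobRiskScore": 0.72},
         "slaMetrics": {"slaBreachProbability": 0.55},
         "runningState": {"stuckJobProbability": 0.1}},
    ]
}
print(flag_risky_jobs(sample))
# → [{'jobName': 'NightlyBillingJob', 'reasons': ['HIGH_RISK_SCORE', 'SLA_BREACH_RISK']}]
```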

Custom XSD Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://xmlns.oracle.com/cloud/adapter/nxsd/surrogate/readstagefilesfromSOAP"
  xmlns:ds="http://xmlns.oracle.com/cloud/adapter/nxsd/surrogate/readstagefilesfromSOAP"
  elementFormDefault="qualified">
  <!-- Root -->
  <xs:element name="DATA_DS">
    <xs:complexType>
      <xs:sequence>
        <!-- Header (single) -->
        <xs:element name="HEADER">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="META_FROM_TIMESTAMP" type="xs:string"/>
              <xs:element name="META_TO_TIMESTAMP" type="xs:string"/>
              <xs:element name="META_GENERATED_AT" type="xs:string"/>
              <xs:element name="META_RECORD_COUNT" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- Rows -->
        <xs:element name="ROWS">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="G1" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <!-- Job Identity -->
                    <xs:element name="JOB_IDENTIFIER" type="xs:string"/>
                    <xs:element name="JOB_NAME" type="xs:string"/>
                    <xs:element name="APPLICATION" type="xs:string"/>
                    <xs:element name="PRODUCT" type="xs:string"/>
                    <xs:element name="REQUEST_TYPE" type="xs:string"/>
                    <!-- Job master request list -->
                    <xs:element name="LAST_RUN_TIMESTAMP" type="xs:string" minOccurs="0"/>
                    <xs:element name="LAST_RUN_END_TIME" type="xs:string" minOccurs="0"/>
                    <xs:element name="REQUEST_START_TIME" type="xs:string" minOccurs="0"/>
                    <xs:element name="REQUEST_EVENT_TIME" type="xs:string" minOccurs="0"/>
                    <xs:element name="REQUEST_COMPLETE_TIME" type="xs:string" minOccurs="0"/>
                    <xs:element name="REQUESTID" type="xs:string" minOccurs="0"/>
                    <xs:element name="PARENTREQUESTID" type="xs:string" minOccurs="0"/>
                    <xs:element name="PROCESSGROUP" type="xs:string" minOccurs="0"/>
                    <xs:element name="SUBMITTER" type="xs:string" minOccurs="0"/>
                    <xs:element name="SCHEDULED" type="xs:string" minOccurs="0"/>
                    <xs:element name="LAST_SUCCESS_TIMESTAMP" type="xs:string" minOccurs="0"/>
                    <!-- Runtime -->
                    <xs:element name="LAST_RUNTIME_MINUTES" type="xs:decimal" minOccurs="0"/>
                    <xs:element name="PREVIOUS_RUNTIME_MINUTES" type="xs:decimal" minOccurs="0"/>
                    <xs:element name="RUNTIME_REGRESSION_STATUS" type="xs:string"/>
                    <xs:element name="AVG_RUNTIME_BASELINE_MIN" type="xs:decimal" minOccurs="0"/>
                    <xs:element name="P95_RUNTIME_BASELINE_MIN" type="xs:decimal" minOccurs="0"/>
                    <!-- Failure Metrics -->
                    <xs:element name="FAILURES_LAST_24_HOURS" type="xs:integer"/>
                    <xs:element name="FAILURES_LAST_7_DAYS" type="xs:integer"/>
                    <!-- Running State -->
                    <xs:element name="CURRENTLY_RUNNING_COUNT" type="xs:integer"/>
                    <xs:element name="MAX_RUNNING_DURATION_HOURS" type="xs:decimal"/>
                    <!-- Execution Stats -->
                    <xs:element name="TOTAL_RUNS" type="xs:integer"/>
                    <xs:element name="FAILED_RUNS" type="xs:integer"/>
                    <xs:element name="SUCCEEDED_RUNS" type="xs:integer"/>
                    <xs:element name="RUNNING_RUNS" type="xs:integer"/>
                    <xs:element name="BLOCKED_RUNS" type="xs:integer"/>
                    <xs:element name="CANCELLED_RUNS" type="xs:integer"/>
                    <xs:element name="WARNING_RUNS" type="xs:integer"/>
                    <xs:element name="ADHOC_COUNT" type="xs:integer"/>
                    <xs:element name="SCHEDULE_COUNT" type="xs:integer"/>
                    <xs:element name="QUEUE_PRESSURE" type="xs:integer"/>
                    <xs:element name="AVG_RUNTIME_MINUTES" type="xs:decimal" minOccurs="0"/>
                    <xs:element name="STDDEV_RUNTIME_MINUTES" type="xs:decimal" minOccurs="0"/>
                    <!-- Probabilities -->
                    <xs:element name="PREDICTED_FAILURE_PROBABILITY" type="xs:decimal"/>
                    <xs:element name="SLA_BREACH_PROBABILITY" type="xs:decimal"/>
                    <!-- Trend & Risk -->
                    <xs:element name="RUNTIME_VOLATILITY_INDEX" type="xs:decimal"/>
                    <xs:element name="RUNTIME_TREND_SLOPE_PER_RUN" type="xs:decimal" minOccurs="0"/>
                    <xs:element name="STUCK_JOB_PROBABILITY" type="xs:decimal"/>
                    <!-- Confidence -->
                    <xs:element name="CONFIDENCE_LEVEL" type="xs:string"/>
                    <xs:element name="CONFIDENCE_SCORE_WEIGHTED" type="xs:decimal"/>
                    <!-- Risk -->
                    <xs:element name="JOB_RISK_SCORE" type="xs:decimal"/>
                    <!-- Schedule -->
                    <xs:element name="SCHEDULE_START_TIME" type="xs:string" minOccurs="0"/>
                    <xs:element name="SCHEDULE_END_TIME" type="xs:string" minOccurs="0"/>
                    <xs:element name="SCHEDULE_FREQUENCY" type="xs:string" minOccurs="0"/>
                    <xs:element name="SCHEDULE_INTERVAL" type="xs:integer" minOccurs="0"/>
                    <xs:element name="SCHEDULE_ENDED_FLAG" type="xs:string"/>
                    <xs:element name="SCHEDULE_OVERDUE_FLAG" type="xs:string"/>
                    <xs:element name="HIGH_FREQUENCY_SCHEDULE_FLAG" type="xs:string"/>
                    <!-- Requests -->
                    <xs:element name="LAST_REQUEST_ID" type="xs:long"/>
                    <xs:element name="LAST_SUCCESS_REQUEST_ID" type="xs:long"/>
                    <xs:element name="LAST_FAILED_REQUEST_ID" type="xs:long"/>
                    <xs:element name="LONGEST_RUNNING_REQUEST_ID" type="xs:long"/>
                    <!-- Alert -->
                    <xs:element name="ALERT_FLAG" type="xs:string"/>
                    <xs:element name="ALERT_SEVERITY" type="xs:string"/>
                    <xs:element name="ALERT_REASON" type="xs:string"/>
                    <!-- Additional Attribute -->
                    <xs:element name="ATTRIBUTE_NAME1" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_VAL1" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_NAME2" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_VAL2" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_NAME3" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_VAL3" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_NAME4" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_VAL4" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_NAME5" type="xs:string"/>
                    <xs:element name="ATTRIBUTE_VAL5" type="xs:string"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
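Full validation of report output against this XSD requires a schema-aware library such as lxml (etree.XMLSchema). As a lighter-weight illustration, a standard-library pre-check that the mandatory HEADER block is present and sane might look like the sketch below; the function name and the exact checks are assumptions, not part of the integration.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the target namespace declared in the XSD above.
NS = "{http://xmlns.oracle.com/cloud/adapter/nxsd/surrogate/readstagefilesfromSOAP}"

def header_ok(xml_text):
    """Cheap pre-check before full XSD validation: the HEADER block must
    carry all four META_* elements, with an integer record count."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{NS}HEADER")
    if header is None:
        return False
    names = ["META_FROM_TIMESTAMP", "META_TO_TIMESTAMP",
             "META_GENERATED_AT", "META_RECORD_COUNT"]
    if any(header.find(f"{NS}{n}") is None for n in names):
        return False
    return (header.find(f"{NS}META_RECORD_COUNT").text or "").strip().isdigit()

sample = """<DATA_DS xmlns="http://xmlns.oracle.com/cloud/adapter/nxsd/surrogate/readstagefilesfromSOAP">
  <HEADER>
    <META_FROM_TIMESTAMP>2025-12-03T05:51:20Z</META_FROM_TIMESTAMP>
    <META_TO_TIMESTAMP>2026-01-02T06:07:20Z</META_TO_TIMESTAMP>
    <META_GENERATED_AT>2026-01-03T16:13:45Z</META_GENERATED_AT>
    <META_RECORD_COUNT>30</META_RECORD_COUNT>
  </HEADER>
  <ROWS/>
</DATA_DS>"""
print(header_ok(sample))  # True
```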