As organizations embrace the cloud to drive agility and innovation, clear visibility into cloud spend becomes a powerful lever for optimization. Oracle Cloud Infrastructure (OCI) provides rich, detailed cost and usage reports in the FOCUS format (FinOps Open Cost and Usage Specification) that help teams understand where their cloud budget is going.
By combining these reports with the built-in machine learning capabilities of Oracle Autonomous Data Warehouse (ADW), customers can take cost analysis to the next level: proactively identifying unusual patterns or spending spikes without needing any data science background. This blog series shows how you can enhance your cost management strategy using ADW's in-database AI features to uncover valuable insights with just a few lines of SQL.
About This Blog Series
This is a 3-part blog series that walks through building a complete anomaly detection solution for OCI cost reports:
- Part 1 (this post) – Introduction, solution overview, and code to copy FOCUS reports into ADW.
- Part 2 – Set up ADW schema and build an anomaly detection model using in-database machine learning (OML4SQL).
- Part 3 – Automate monthly model retraining and detect anomalies, trigger alerts via email or integrations.
Why This Matters
OCI customers often manage multiple regions, compartments, and services. Manually scanning CSV reports for spikes, unusual charges, or misconfigurations is time-consuming and error-prone. By automating anomaly detection with AI, you:
- Save time analyzing data
- Identify problems earlier
- Avoid billing surprises
- Enable chargeback optimization across teams
And the best part? You don’t need to build or train complex models manually. ADW’s OML4SQL and built-in algorithms make anomaly detection accessible with just SQL.
High-Level Solution Architecture

- Object Storage – FOCUS_Cost_and_Usage_Reports – Store all daily cost and usage reports (CSV files) in an OCI Object Storage bucket. Refer to this blog post for more information.
- Daily Scheduler – Automatically trigger a scheduled job every day using OCI Resource Scheduler.
- Worker Node – Run a Python script (using the OCI SDK and Oracle DB drivers) on a compute instance to load the latest reports into Autonomous Data Warehouse (ADW).
- Monthly Scheduler – Trigger a monthly job that initiates the anomaly detection process.
- Machine Learning (OML4SQL) – Use Oracle Machine Learning with a one-class SVM model to detect anomalies in cost data directly from SQL.
- Functions – Execute an OCI Function, triggered through OCI Resource Scheduler, that queries and evaluates cost anomalies over the past 30 days.
- Notifications – Notify stakeholders automatically if any anomalies are found (a minimal sketch of this step follows the list).
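To make the notification step concrete, here is a minimal sketch of how a script or OCI Function could publish an alert through OCI Notifications. The topic OCID, message text, and the notify_anomalies helper are illustrative placeholders; the anomaly query that produces anomaly_count is built in Parts 2 and 3.

import oci

# Hypothetical topic OCID -- replace with your own Notifications topic.
TOPIC_OCID = "ocid1.onstopic.oc1..xxxxx"

signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
ons_client = oci.ons.NotificationDataPlaneClient(config={}, signer=signer)

def notify_anomalies(anomaly_count):
    # Publish only when the monthly evaluation actually found anomalies.
    if anomaly_count > 0:
        ons_client.publish_message(
            TOPIC_OCID,
            oci.ons.models.MessageDetails(
                title="OCI cost anomalies detected",
                body=f"{anomaly_count} anomalous cost records found in the last 30 days."
            )
        )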
Step-by-Step Guide to Load FOCUS Reports to ADW
Prerequisites
- ADW instance
- ADW user password stored as a secret in OCI Vault
- A bucket in OCI Object Storage with cost reports
- A working OCI config file or instance principal setup
- Python 3.9+
- Packages: oci, oracledb, pandas, numpy (install with: pip install oci oracledb pandas numpy)
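The loader script below inserts into a table named oci_cost_data whose columns must match the header of your FOCUS CSV files; the full schema setup is covered in Part 2. Purely as an illustration, a trimmed-down version of that table, using a handful of standard FOCUS columns, might be created like this (the column list and data types here are assumptions to adapt, not the full schema):

import oracledb

# Illustrative subset of FOCUS columns only -- align the column list and
# types with the actual header of your FOCUS reports before loading.
ddl = """
CREATE TABLE oci_cost_data (
    BillingPeriodStart  TIMESTAMP,
    BillingPeriodEnd    TIMESTAMP,
    ChargePeriodStart   TIMESTAMP,
    ChargePeriodEnd     TIMESTAMP,
    ServiceName         VARCHAR2(256),
    ResourceId          VARCHAR2(512),
    BilledCost          NUMBER,
    EffectiveCost       NUMBER
)
"""

# Fill in your own ADW credentials and connect string.
with oracledb.connect(user="admin", password="...", dsn="...") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)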
Python Script to Copy FOCUS CSVs to ADW
import oci
import oracledb
import pandas as pd
import os
import numpy as np
import time
import random
from datetime import datetime, timedelta
from concurrent.futures import ThreadPoolExecutor, as_completed
import base64
# === CONFIG ===
bucket_name = "cost_and_usage_reports" # change it based on your environment
namespace = "ociateam" # change it based on your environment
adw_user = "admin" # change it based on your environment
dsn = "(description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-xxx-1.oraclecloud.com))(connect_data=(service_name=xxxx_low.adb.oraclecloud.com))(security=(ssl_server_dn_match=yes)))"
local_folder = "temp_csvs"
os.makedirs(local_folder, exist_ok=True)
# === OCI SETUP ===
signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
object_storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
secrets_client = oci.secrets.SecretsClient(config={}, signer=signer)
vault_client = oci.vault.VaultsClient(config={}, signer=signer)
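# Assumption: this script runs on an OCI compute instance and uses instance
# principals. If you run it elsewhere with a local OCI config file instead,
# you could build the clients from that config, for example:
#   config = oci.config.from_file()  # defaults to ~/.oci/config
#   object_storage = oci.object_storage.ObjectStorageClient(config)
#   secrets_client = oci.secrets.SecretsClient(config)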
secret_ocid = "ocid1.vaultsecret.oc1.iad.xxxxx" # Replace with the OCID of your vault secret
base64_secret = secrets_client.get_secret_bundle(secret_ocid).data.secret_bundle_content.content
decoded_bytes = base64.b64decode(base64_secret)
adw_password = decoded_bytes.decode("utf-8")
# === PAGINATED OBJECT LISTING ===
def list_all_objects(namespace, bucket_name):
    all_objects = []
    next_start_with = None
    while True:
        response = object_storage.list_objects(
            namespace_name=namespace,
            bucket_name=bucket_name,
            start=next_start_with
        )
        all_objects.extend(response.data.objects)
        next_start_with = response.data.next_start_with
        if not next_start_with:
            break
    return all_objects
# Fetch all objects
objects = list_all_objects(namespace, bucket_name)
start_date = datetime.utcnow() - timedelta(days=1)
start_date = datetime(start_date.year, start_date.month, start_date.day)
csv_files = []
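# Assumption: object names embed the report date as path segments, e.g.
# "<prefix>/2025/01/31/<file>.csv" (hypothetical layout). Adjust the parts[]
# indices below if your bucket uses a different prefix structure.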
for obj in objects:
    name = obj.name
    if name.endswith(".csv"):
        try:
            parts = name.split("/")
            if len(parts) >= 4:
                y, m, d = int(parts[1]), int(parts[2]), int(parts[3])
                file_date = datetime(y, m, d)
                if file_date >= start_date:
                    csv_files.append(name)
        except Exception as e:
            print(f"Skipping '{name}' due to parsing error: {e}")
print(f"csv_files are {csv_files}")
# === GLOBAL CONNECTION POOL (created once) ===
pool = oracledb.create_pool(
    user=adw_user,
    password=adw_password,
    dsn=dsn,
    min=1,
    max=5,
    increment=1,
    timeout=60,
    getmode=oracledb.POOL_GETMODE_WAIT
)
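# Pool max matches max_workers below so each worker thread can hold its own connection.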
# === PROCESS ONE FILE ===
def process_file(file):
    print(f"[Thread] Processing: {file}")
    local_path = os.path.join(local_folder, os.path.basename(file))
    try:
        # Download the file from OCI Object Storage
        obj = object_storage.get_object(namespace, bucket_name, file)
        with open(local_path, "wb") as f:
            f.write(obj.data.content)

        # Load and clean CSV
        df = pd.read_csv(local_path, low_memory=False)
        drop_cols = ['Tags', 'oci_BackReferenceNumber']
        for col in drop_cols:
            if col in df.columns:
                df.drop(columns=[col], inplace=True)
        for col in ["BillingPeriodStart", "BillingPeriodEnd", "ChargePeriodStart", "ChargePeriodEnd"]:
            if col in df.columns:
                df[col] = pd.to_datetime(df[col], errors='coerce')
        numeric_columns = df.select_dtypes(include=["float", "int"]).columns.tolist()
        for col in numeric_columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')
        df = df.astype({col: "float64" for col in df.select_dtypes(include="int64").columns})
        # Replace NaN/inf with None so the values bind as SQL NULLs
        df.replace({np.nan: None, np.inf: None, -np.inf: None}, inplace=True)

        insert_columns = ",".join(df.columns)
        placeholders = ",".join([f":{i+1}" for i in range(len(df.columns))])
        insert_sql = f"INSERT INTO oci_cost_data ({insert_columns}) VALUES ({placeholders})"
        rows = df.values.tolist()

        # Retry connection acquisition with exponential backoff plus jitter
        conn = None
        for attempt in range(3):
            try:
                conn = pool.acquire()
                break
            except Exception as e:
                print(f"Failed to acquire DB connection (attempt {attempt+1}): {e}")
                time.sleep(2 ** attempt + random.uniform(0, 1))
        if not conn:
            raise Exception("Could not acquire DB connection after 3 retries.")

        try:
            # Per-row execute keeps error reporting precise; for very large files,
            # cursor.executemany(insert_sql, rows) would reduce round trips.
            with conn.cursor() as cursor:
                for i, row in enumerate(rows):
                    try:
                        cursor.execute(insert_sql, row)
                    except Exception as e:
                        print(f"\nRow {i} in {file} failed: {e}")
                        for col, val in zip(df.columns, row):
                            print(f"  {col}: {val} ({type(val).__name__})")
                        raise
            conn.commit()
        finally:
            conn.close()  # return the connection to the pool
        print(f"Loaded {len(rows)} rows from {file}")

        # Clean up local file after successful processing
        try:
            os.remove(local_path)
            print(f"Deleted local file: {local_path}")
        except Exception as cleanup_err:
            print(f"Could not delete file {local_path}: {cleanup_err}")
    except Exception as e:
        print(f"\nError processing file '{file}': {e}")
# === MAIN EXECUTION ===
max_workers = 5 # Adjust based on compute capacity
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    futures = {executor.submit(process_file, f): f for f in csv_files}
    for future in as_completed(futures):
        file = futures[future]
        try:
            future.result()
        except Exception as e:
            print(f"Uncaught error in thread for file '{file}': {e}")
You can schedule this script to run daily using a cron job or OCI Resource Scheduler so that your database stays up to date.
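For example, a minimal crontab entry (the script path, log path, and run time here are placeholders) could run the loader every morning at 06:00 UTC:

0 6 * * * /usr/bin/python3 /opt/focus/load_focus_reports.py >> /var/log/focus_load.log 2>&1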
What’s Next in Part 2
In the next post, we'll dive into using ADW's in-database machine learning (OML4SQL and the DBMS_DATA_MINING package) to train an anomaly detection model on your loaded cost data. You'll see how to:
- Prepare training data
- Use one-class SVM for unsupervised learning
- Query anomalies using SQL
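As a hedged preview of Part 2 (the model name cost_anomaly_model is hypothetical until we create it there), scoring the loaded data really does come down to a single SQL statement; one-class models in OML4SQL return a prediction of 0 for outliers. Reusing the connection pool from the loader script:

# Hypothetical model name; Part 2 builds it with DBMS_DATA_MINING.
anomaly_sql = """
    SELECT *
    FROM oci_cost_data
    WHERE PREDICTION(cost_anomaly_model USING *) = 0
"""
with pool.acquire() as conn:
    with conn.cursor() as cur:
        for row in cur.execute(anomaly_sql):
            print(row)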
Final Thoughts
With ADW and its built-in machine learning, anomaly detection becomes approachable even for teams without a data science background.
