Extending SaaS by AI/ML- Part 9: Using Vector Embeddings and Classification models within Oracle Machine Learning(OML) in Database

Introduction

Welcome to part 9 of our blog series on enhancing SaaS applications with AI and machine learning capabilities. In this post, we’ll go through how to utilize Machine Learning in Oracle Database (OML) to generate vector embeddings and train a classification model using SaaS data.

Machine Learning in Oracle Database

Oracle provides over 30 high-performance, in-database algorithms, enabling immediate use of models in your applications. By keeping the data inside the database, organizations can simplify their infrastructure while ensuring better data synchronization and security. Oracle Machine Learning Notebooks support various languages such as SQL, PL/SQL, Python, R, and Conda. In addition, OML supports bringing your own models in Open Neural Network Exchange (ONNX) format. You can import text transformer, classification, regression, and clustering models in ONNX format to use in OML.

Vector Data Type

The Vector data type is introduced with the release of Oracle Database 23ai. Using embedding models, you can transform unstructured data into vector embeddings. You can then provide Vector emeddings as input to Oracle Machine Learning in-database algorithms to complement other structured data or be used alone. The Vector data type is supported for clustering, classification, anomaly detection, and feature extraction.

In this article, we’ll focus on training a classification model using OML. The classification model will use Vector embeddings generated using an imported HuggingFace ONNX model. For details on the dataset used for classification, refer to Part 2 of this blog series. To learn more about loading data into an autonomous database, check out Part 6.

Steps

Step 1: Create a database user with OML privileges

To use OML for training the model, do the following:

On the Autonomous Database Details page, click Database Actions and select Database Users.
Alternatively, select Database Actions and click View All Database Actions to access the Database Actions Launchpad. From the launchpad, under Administration, click Database Users.
Click + Create User.
In the Username field, enter a username for the account (e.g., OML_USER).
In the Password field, enter a password for the user.
In the Confirm Password field, re-enter the password to confirm it.
Select OML to enable Oracle Machine Learning for this user. Optionally, enable Web Access to allow web-based access.

oml1

Step 2: Create a database table for storing the dataset

On the Autonomous Database Details page, click Database Actions.
Use the credentials of the OML_USER you just created to log in.
Create a table named ITEMS to store the dataset for item classification. This table has the following 3 columns:

ITEM_CLASS_NAME VARCHAR2(100) (the target label)
ITEM_DESCRIPTION VARCHAR2(4000) (the feature used for prediction)
ID NUMBER ( Primary Key Column)

The model we create will predict the⁣ ITEM_CLASS_NAME based on the ITEM_DESCRIPTION feature.

Step 3: Import a HuggingFace Model for Generating Embedding

Steps to Download and Load the Model into Oracle Database

Download the HuggingFace model from here, Hugging Face all-MiniLM-L12-v2 model and extract the zip file.
Upload the .onnx file to an Object Storage bucket.
Create a pre-authenticated request of the .onnx file in the bucket.

ONN

4. Select Permit object reads option, set an expiration date for the URL, generate and copy the Pre-Authenticated URL.

onn

5. Connect to the database as oml_user, execute the following script, replacing LOCATION_URI with the Pre-Authenticated URL from the previous step.

DECLARE

   ONNX_MOD_FILE VARCHAR2(100) := 'all_MiniLM_L12_v2.onnx';

   MODNAME VARCHAR2(500);

   -- Set the LOCATION_URI

    LOCATION_URI VARCHAR2(200) := 'https://objectstorage.us-ashburn-1.oraclecloud.com/p/nVX....n/idprle0k7dv3/b/bucket-20250110-1032/o/';


BEGIN

    DBMS_OUTPUT.PUT_LINE('ONNX model file name in Object Storage is: '||ONNX_MOD_FILE);

    -- Define a model name for the loaded model

     SELECT UPPER(REGEXP_SUBSTR(ONNX_MOD_FILE, '[^.]+')) 
     INTO MODNAME 
     FROM DUAL;

     DBMS_OUTPUT.PUT_LINE('Model will be loaded and saved with name: '||MODNAME);


     -- Read the ONNX model file from Object Storage into

     -- the Autonomous Database data pump directory


     BEGIN 
         DBMS_DATA_MINING.DROP_MODEL(model_name => MODNAME);

     EXCEPTION WHEN OTHERS THEN 
        NULL; 
     END;
     DBMS_CLOUD.GET_OBJECT(
       credential_name => NULL,
       directory_name  => 'DATA_PUMP_DIR',
       object_uri      => LOCATION_URI || ONNX_MOD_FILE
     );


    -- Load the ONNX model to the database

    DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory   => 'DATA_PUMP_DIR',
    file_name   => ONNX_MOD_FILE,
    model_name  => MODNAME
   );


    DBMS_OUTPUT.PUT_LINE('New model successfully loaded with name: '||MODNAME);

END;

6. The above step should give the results as below if successfully completed.

ONNX model file name in Object Storage is: all_MiniLM_L12_v2.onnx
Model will be loaded and saved with name: ALL_MINILM_L12_V2
New model successfully loaded with name: ALL_MINILM_L12_V2

Step 4: Train a Classification Model

To train the model, we’ll use OML Notebooks. OML Notebooks is an advanced web-based platform designed for data analysts and data scientists. It enables users to write code, create visualizations, and perform data analytics and machine learning.

To connect to the OML notebook, follow the steps below:

On the Autonomous Database Details page, click Database Actions.

oml2

2. Use the credentials of OML_USER you just created to log in.

3. In the left navigation pane of Oracle Machine Learning, expand Project and click on Notebooks. Click Create, then provide a name for the notebook.

notebook

To train a machine learning model, follow the steps below:

1. Open the notebook you just created.

2. Create Training and Test Data Sets from the ITEMS Table.

For a supervised learning model such as classification, it is essential to split the data into training and test sets before building the model. The training set is used to train the model, while the test set is used to evaluate its accuracy through prediction queries.

For our classification model, we will use the vector embeddings of the ITEM_DESCRIPTION column to predict the ITEM_CLASS_NAME.The vector embeddings are generated using the VECTOR_EMBEDDING function. This function requires a model name as a parameter, which corresponds to the imported Hugging Face embedding model that performs the embedding function. The embeddings are then stored in the⁣ ITEM_DESCRIPTION_V column of the training and test data set.

Create embeddings and store training data set in a table

%script

CREATE TABLE TRAIN_DATA_SET AS 
SELECT 
    ID, 
    ITEM_CLASS_NAME, 
    (SELECT VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING ITEM_DESCRIPTION AS data)) AS ITEM_DESCRIPTION_V 
FROM ITEMS SAMPLE (60) SEED (1);

%script

DESC TRAIN_DATA_SET;

Name               Null? Type          
------------------ ----- ------------- 
ID                      NUMBER 
ITEM_CLASS_NAME       VARCHAR2(100) 
ITEM_DESCRIPTION_V       VECTOR

Create embeddings and store test data set in a table

%script
CREATE TABLE TEST_DATA_SET AS
SELECT
    ID,
    (SELECT VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING ITEM_DESCRIPTION AS data)) ITEM_DESCRIPTION_V
FROM ITEMS
WHERE ITEMS.ID NOT IN (SELECT ID FROM TRAIN_DATA_SET);

Name               Null? Type   
------------------ ----- ------ 
ID                      NUMBER 
ITEM_DESCRIPTION_V       VECTOR

Build an SVM Classifier Model

To build the model,use the CREATE_MODEL2 procedure in OML. It has the following parameters,

DATA_QUERY parameter specifies a query that provides training data for building the model.
SET_LIST parameter defines model settings.
MODEL_NAME parameter specifies the model name.
Since we are using classification, the MINING_FUNCTION parameter is set to CLASSIFICATION.

A sample script to train the model is given below:

%script

BEGIN
    DBMS_DATA_MINING.DROP_MODEL('SVM_CLASSIFICATION_MODEL');
EXCEPTION 
    WHEN OTHERS THEN NULL;
END;

/

DECLARE

v_settings DBMS_DATA_MINING.SETTING_LIST;

BEGIN
-- Settings of the SVM model

v_settings('PREP_AUTO') := 'ON';

v_settings('ALGO_NAME') := dbms_data_mining.algo_support_vector_machines;

v_settings('SVMS_KERNEL_FUNCTION') := 'SVMS_LINEAR';

v_settings('SVMS_CONV_TOLERANCE') := '0.001';

v_settings('SVMS_BATCH_ROWS') := '2000';

v_settings('SVMS_REGULARIZER') := 'SVMS_REGULARIZER_L1';

v_settings('SVMS_SOLVER') := 'SVMS_SOLVER_SGD';

-- Train the model
DBMS_DATA_MINING.CREATE_MODEL2(
        MODEL_NAME => 'SVM_CLASSIFICATION_MODEL',
        MINING_FUNCTION => 'CLASSIFICATION',
        DATA_QUERY => 'SELECT * FROM TRAIN_DATA_SET',
        SET_LIST => v_settings,
        CASE_ID_COLUMN_NAME => 'ID',
        TARGET_COLUMN_NAME => 'ITEM_CLASS_NAME'
    );
END;

Make Predictions using the Model

The APPLY procedure in DBMS_DATA_MINING is a batch apply operation that writes the results of scoring directly to a table. Classification produces a prediction and a probability for each case.

%script

--Drop Existing Results Table (If Any) and Apply model prediction
BEGIN
    EXECUTE IMMEDIATE 'DROP TABLE SVM_RESULT PURGE';
EXCEPTION
    WHEN OTHERS THEN NULL;
END;

/
-- Apply the model to the test data set
BEGIN

DBMS_DATA_MINING.APPLY('SVM_CLASSIFICATION_MODEL','TEST_DATA_SET','ID','SVM_RESULT');

END;

/

-- Select from the SVM_RESULT to see the predictions

SELECT * FROM svm_result;

/

Step 5: Create an Oracle Machine Learning Services model endpoint for scoring

Once the model is trained, it will be available under the Models menu in Oracle Machine Learning.

Model

Deploy the model using the Deploy button on the Models page. After a model is successfully deployed, it is listed in the Deployments tab.

To get the model invocation URL, select a deployed model and click the link under URI to view the REST API URI of the model.

Model Deploy

Conclusion

By leveraging Oracle Machine Learning (OML), you gain access to a platform that accelerates the development and deployment of machine learning solutions for enterprise business challenges. You can run machine learning algorithms within the database kernel, bypassing the complexities and performance issues associated with transferring large enterprise datasets between external tools.

Extending SaaS by AI/ML- Part 9: Using Vector Embeddings and Classification models within Oracle Machine Learning(OML) in Database

Introduction

Steps

Step 1: Create a database user with OML privileges

Step 2: Create a database table for storing the dataset

Step 3: Import a HuggingFace Model for Generating Embedding

Step 4: Train a Classification Model

Step 5: Create an Oracle Machine Learning Services model endpoint for scoring

Conclusion

Rekha Mathew

Cloud Solution Architect | A-Team

Restrict Root Compartment Access with Oracle Cloud Infrastructure Policies

How to Work With JSON in OCI Resource Manager

Extending SaaS by AI/ML- Part 9: Using Vector Embeddings and Classification models within Oracle Machine Learning(OML) in Database

Introduction

Steps

Step 1: Create a database user with OML privileges

Step 2: Create a database table for storing the dataset

Step 3: Import a HuggingFace Model for Generating Embedding

Step 4: Train a Classification Model

Step 5: Create an Oracle Machine Learning Services model endpoint for scoring

Conclusion

Authors

Rekha Mathew

Cloud Solution Architect | A-Team

Restrict Root Compartment Access with Oracle Cloud Infrastructure Policies

How to Work With JSON in OCI Resource Manager