Machine Learning with Oracle Database Advanced Analytics

TL;DR:

  • Oracle DB + Oracle Data Mining + Oracle R Enterprise = Database Advanced Analytics (OAA)
  • The Database Advanced Analytics option allows running Machine Learning algorithms within the database itself.
  • The on-prem flavor of OAA is available as part of Database Enterprise Edition, 12c and later releases.
  • In Oracle Public Cloud (OPC), OAA is available with the DBCS High Performance Edition in OCI Classic, with the High Performance Edition of a DB System in OCI, and with Autonomous Data Warehouse Cloud (ADWC) in OCI, which is available in the 18c version only.
  • “Oracle Machine Learning” is a Zeppelin-based SQL notebook that is available with ADWC.

1. Introduction

If you are reading this blog, you already know what Machine Learning (ML) is. Still, it helps to have a formal definition as a frame of reference, such as the one provided by Tom Mitchell:

Machine Learning is the study of algorithms that learn from experience E with respect to some class of tasks T and performance measure P, such that the algorithms’ performance at tasks in T, as measured by P, improves with experience E.
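To make E, T and P concrete, here is a small toy sketch of my own (it is not taken from Mitchell and has nothing Oracle-specific in it). The task T is predicting y from x when the true relation is y = 2x, the experience E is the number of noisy training pairs, and the performance P is the mean absolute error on a few held-out points. The function names and the alternating "noise" are assumptions chosen only to keep the run deterministic.

```python
# Toy illustration of Mitchell's E / T / P (all names here are hypothetical).
# Task T: predict y from x, where the true relation is y = 2x.
# Experience E: the first n noisy training pairs.
# Performance P: mean absolute error on clean held-out points (lower is better).

def training_pairs(n):
    """First n examples; the 'noise' alternates -1, +1 so the run is deterministic."""
    return [(x, 2 * x + (1 if x % 2 == 0 else -1)) for x in range(1, n + 1)]

def fit_slope(pairs):
    """Least-squares slope for a line through the origin."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)

def performance(slope, test_xs=(1, 2, 3, 4, 5)):
    """P: mean absolute error against the true relation y = 2x."""
    return sum(abs(slope * x - 2 * x) for x in test_xs) / len(test_xs)

# Train on more and more experience and measure P each time.
errors = {n: performance(fit_slope(training_pairs(n))) for n in (2, 10, 100)}
for n, err in errors.items():
    print(f"E = {n:3d} examples -> P (mean abs error) = {err:.4f}")
```

The error shrinks as the number of training pairs grows, which is exactly the "improves with experience E" part of the definition.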

The most important part of the definition above, as any data scientist would agree, is the experience E or the data the algorithm (a.k.a. ML model) trains on. Almost always it is the data that differentiates a great ML model from a good one.

Now, if you’re an enterprise customer embarking on an ML project, chances are the data is being generated by one of the back-end systems and stored in an Oracle database. If your data science team is using open source libraries (say Python scikit-learn) for ML, then this data must typically be packaged and moved over to a different computing infrastructure for further analysis. Such data movement costs time and raises security concerns, among other issues.

But what if the data never had to be moved, and the Oracle database itself could do all the Machine Learning for you? This is where Oracle Database Advanced Analytics (OAA) comes in.
OAA provides parallel, in-database implementations of the commonly used Machine Learning algorithms, ensuring the data always stays within the database.

OAA is the best place to start the ML journey for any enterprise customer, because as any Data Scientist would tell you, most of the modern enterprise ML problems can be solved by the simplest of regression algorithms if the data is good.

In this blog I will provide an overview of OAA, describe its on-prem as well as Oracle Public Cloud (OPC) avatars, walk through the provisioning process for OPC, and finally point you to some awesome blogs and documentation pages to bookmark to keep yourself up to date.

2. Product Overview

Oracle Advanced Analytics (OAA) provides an in-database implementation of various Machine Learning algorithms, and it integrates with the open source R language.

The description above can be broken down into two components, which are essentially what OAA consists of:

2.1 Oracle Data Mining

Historically ODM used to be a separate product, but it has now been bundled as part of the Advanced Analytics offering.

ODM provides a set of pre-implemented Machine Learning models that are available to use as SQL functions. These functions are executed in-memory within the database itself, taking full advantage of all the parallelism built within the database.
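As a rough sketch of what "ML as SQL" looks like, the snippet below assembles the two statements typically involved: a PL/SQL call to DBMS_DATA_MINING.CREATE_MODEL to train a model, and a query using the PREDICTION SQL function to score rows. The Python here only builds the statement text. The model, table and column names are hypothetical, and actually running the statements would need a database connection and driver, which I have left out.

```python
# Sketch only: builds statement text, does not connect to a database.
# All identifiers passed in (model, table, column names) are hypothetical,
# and are assumed to be trusted values, not user input.

def create_model_plsql(model_name, data_table, case_id, target, settings_table):
    """Return an anonymous PL/SQL block that trains a classification model."""
    return (
        "BEGIN\n"
        "  DBMS_DATA_MINING.CREATE_MODEL(\n"
        f"    model_name          => '{model_name}',\n"
        "    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,\n"
        f"    data_table_name     => '{data_table}',\n"
        f"    case_id_column_name => '{case_id}',\n"
        f"    target_column_name  => '{target}',\n"
        f"    settings_table_name => '{settings_table}');\n"
        "END;"
    )

def scoring_sql(model_name, data_table, case_id):
    """Return a query that scores every row of data_table with the model."""
    return (
        f"SELECT {case_id}, PREDICTION({model_name} USING *) AS predicted\n"
        f"FROM {data_table}"
    )

print(create_model_plsql("CHURN_MODEL", "CUSTOMERS_TRAIN", "CUST_ID",
                         "CHURNED", "CHURN_SETTINGS"))
print(scoring_sql("CHURN_MODEL", "CUSTOMERS_NEW", "CUST_ID"))
```

The point to notice is that both training and scoring are plain database calls, so the training table never leaves the database.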

The list of algorithms includes most of the commonly used ones for various ML problem categories, such as:

  • Classification : Naive Bayes, SVM, Decision Tree
  • Regression : GLM, Logistic Regression
  • Anomaly Detection : One-Class SVM
  • Clustering : K-Means, Orthogonal Partition Clustering
  • Association : Apriori
  • Feature Extraction : Matrix Factorization, PCA, SVD

The list above is by no means exhaustive, and with every release of the product more algorithms are being added.
The most up-to-date list of algorithms and their usage can be found here and here.
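In practice, the algorithm is usually chosen through a settings table that is passed to the model build. The sketch below maps some of the names from the list above to the ALGO_NAME setting values as I recall them from the DBMS_DATA_MINING documentation, so please verify them against the documentation for your release. I am assuming that "Matrix Factorization" refers to Non-Negative Matrix Factorization, and the settings table name is hypothetical.

```python
# Sketch only: maps algorithm names from the list above to ALGO_NAME values.
# Values are quoted from memory of the DBMS_DATA_MINING docs; verify per release.
ALGO_SETTING = {
    "Naive Bayes": "ALGO_NAIVE_BAYES",
    "SVM": "ALGO_SUPPORT_VECTOR_MACHINES",
    "Decision Tree": "ALGO_DECISION_TREE",
    "GLM": "ALGO_GENERALIZED_LINEAR_MODEL",
    "K-Means": "ALGO_KMEANS",
    "Orthogonal Partition Clustering": "ALGO_O_CLUSTER",
    "Apriori": "ALGO_APRIORI_ASSOCIATION_RULES",
    # Assumption: the list's "Matrix Factorization" means the non-negative variant.
    "Matrix Factorization": "ALGO_NONNEGATIVE_MATRIX_FACTOR",
    "SVD": "ALGO_SINGULAR_VALUE_DECOMP",
}

def settings_insert(settings_table, algorithm):
    """Return the INSERT that selects `algorithm` in a model settings table."""
    value = ALGO_SETTING[algorithm]
    return (
        f"INSERT INTO {settings_table} (setting_name, setting_value) "
        f"VALUES ('ALGO_NAME', '{value}')"
    )

# Hypothetical settings table name, for illustration.
print(settings_insert("CHURN_SETTINGS", "Decision Tree"))
```

A settings table like this is what the model build reads to decide which algorithm to use. If none is given, the database falls back to a default algorithm for the mining function.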

Also, not only does ODM do all the heavy lifting of implementing the ML algorithms, it also provides a SQL Developer based GUI component, called Data Miner GUI, which enables building the ML model in a UI-driven workflow, right from SQL Developer itself.

2.2 Oracle R Enterprise

ORE essentially integrates the R programming language with Oracle Database. It is a set of R packages and Oracle Database features that let R users operate on database-resident data without writing any SQL, and execute R scripts directly in the database. In effect, it gives the data scientist an R interface to the database.

R users can develop and test R scripts interactively, use CRAN and other packages with the database, and use Oracle database tables as R objects. ORE has overloaded functions that translate R operations into SQL that executes in the database. Similarly the output from the database operation is converted back to R objects.
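The idea of overloaded functions that turn client-side operations into SQL can be illustrated with a small analogy. The sketch below is written in Python, not R, and it is not ORE's API. The class and method names are made up to show the mechanism: operators on a proxy object build SQL text instead of computing anything locally.

```python
# Conceptual analogy only: NOT Oracle R Enterprise and not its API.
# Hypothetical proxy classes whose overloaded operators emit SQL text.

class Column:
    """Proxy for a database column; comparisons build SQL instead of running."""
    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return f"{self.name} > {value!r}"

    def __lt__(self, value):
        return f"{self.name} < {value!r}"


class TableProxy:
    """Stands in for a database table; no rows are ever pulled to the client."""
    def __init__(self, table, where=None):
        self.table = table
        self.where = where

    def __getitem__(self, column_name):
        return Column(column_name)

    def filter(self, condition):
        # Return a new proxy that remembers the filter condition.
        return TableProxy(self.table, condition)

    def mean_sql(self, column_name):
        # Translate "mean of a column" into an aggregate query.
        sql = f"SELECT AVG({column_name}) FROM {self.table}"
        if self.where:
            sql += f" WHERE {self.where}"
        return sql


sales = TableProxy("SALES")                       # hypothetical table
high_value = sales.filter(sales["AMOUNT"] > 100)  # looks like local code
print(high_value.mean_sql("AMOUNT"))
# SELECT AVG(AMOUNT) FROM SALES WHERE AMOUNT > 100
```

The user writes what looks like ordinary client-side code, while the heavy work is pushed down to the database as SQL.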

3. OAA on-prem

Now that we know what the product is, how do we try it out?

Easy. Advanced Analytics is basically a database product, available as part of the ‘Enterprise Edition’ of Oracle Database on-premises. This can be downloaded here, along with the associated R and SQL Developer components.

4. OAA in the Oracle Public Cloud (OPC)

While the on-prem approach described above works, much easier options exist if you have an OPC subscription.

There are three ways to test-drive OAA in OPC, as described below.

4.1.1 Database Cloud Service

The Oracle Database Cloud Service is the de facto database available with OPC. It comes in various options depending on the desired functionality. As described in the pricing sheet here, the database is provisioned with Advanced Analytics features in the High Performance Package and above.

On the create database instance screen, simply select the right database version and the edition, and you’re good to go.

Please note that the option described here provisions the instance in Oracle Cloud Infrastructure Classic (OCI Classic).

4.1.2 DB System

The second approach is to launch a DB System in the newer Oracle Cloud Infrastructure (OCI) environment. In OCI, simply go to DB Systems -> Launch DB System, and select the right flavor.

4.1.3 Autonomous Data Warehouse Cloud (ADWC)

The third approach is to use the newly introduced, self-driving ADWC. Go to Autonomous Data Warehouse, click on Create, fill in the details, and the instance is provisioned for you. Please note that ADWC comes only with the 18c database version.

4.2 ADWC and ‘Oracle Machine Learning’

The Oracle Autonomous Data Warehouse Cloud is a fully managed database service that is easy to set up (as evident from the provisioning steps above), based on Exadata technology, and truly elastic, in that compute and storage can be scaled up or down without any downtime.
It integrates with a number of other Oracle Cloud services, including Analytics Cloud and Data Integration Cloud.

ADWC provides two interfaces to access the database:

   1. Using the traditional SQL Developer over a SQL*Net connection,
   2. Using the newly introduced “Oracle Machine Learning” (OML) notebook. OML is a Zeppelin-based SQL notebook interface, available with ADWC only. It lets you write SQL scripts alongside supporting documentation, assumptions, approaches, etc., which increases productivity.

5. Conclusion

I hope this blog provided a good overview of the Oracle Advanced Analytics landscape.

In my next blog I will go through a worked example that uses OAA to detect anomalies in a sample dataset.

In the meantime, below are a few resources to get started with OAA:
1. Oracle Data Mining Tutorial Series
2. Oracle R Tutorial Series
3. Oracle Machine Learning Tutorial

Also, OAA is constantly evolving with new features in every release, and I strongly recommend following Charlie Berger (who leads the Product Management of OAA) at https://blogs.oracle.com/author/charlie-berger to remain updated with the latest features.
