Best Practices from Oracle Development's A‑Team

Deploy a Machine Learning Environment on OCI IaaS Like a Pro

Rajesh Chawla
Principal Cloud Architect

I found myself continually creating and tearing down IaaS resources for my machine learning experiments. Each time, I would do things just a bit differently, such as naming resources inconsistently, or forget a step, such as turning on backups for the disks. So, to save myself time, I automated the procedure with Terraform.

My goal is to create a standalone environment to run experiments in. The environment should provide the compute power I need (CPU/GPU), have Jupyter notebooks configured, perform backups automatically, and be locked down from a networking and security point of view.

As I wished to leverage all the tools at my disposal, I chose to automate the scripts in Terraform and use the existing Data Science VM from the HPC team in the OCI Marketplace. The HPC team has done the work to ensure the GPU drivers are configured properly, and has pre-installed many of the packages I wanted to work with, including TensorFlow, PyTorch, and Keras. In the automation I wrote, I added a few more things that I use, such as connectivity to ATP/ADW. This required the Oracle client libraries as well as the cx_Oracle Python package; see the cx_Oracle overview for more information. I also found that a bit of tweaking resulted in better performance. As with all performance tuning, your mileage will vary, but for my data, I found the most value in tuning the cursor's arraysize and prefetchrows attributes.
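As a sketch of that tuning, the snippet below sets the two cx_Oracle cursor attributes before fetching. The helper function, the batch sizes, and the sample query are my own illustrative choices, not values from the author's environment; the connection itself would come from `cx_Oracle.connect()` with your wallet configured.

```python
# Hedged sketch of the cursor tuning described above. Assumes an existing
# cx_Oracle connection to ATP/ADW; the batch sizes are illustrative only.
def fetch_tuned(connection, sql, arraysize=1000, prefetchrows=1001):
    """Fetch all rows of a query with tuned batch sizes.

    arraysize    - rows fetched per round trip after the first
    prefetchrows - rows prefetched in the round trip that executes the query
    """
    cursor = connection.cursor()
    cursor.arraysize = arraysize
    cursor.prefetchrows = prefetchrows
    cursor.execute(sql)
    return cursor.fetchall()

# Typical use (requires cx_Oracle and a configured wallet):
#   import cx_Oracle
#   conn = cx_Oracle.connect(user, password, dsn)
#   rows = fetch_tuned(conn, "SELECT * FROM my_table")
```

Larger values cut the number of network round trips at the cost of client memory, which is why the sweet spot depends on your data.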

For the Python packages I added, I placed them in the 'mlenv' Python environment, which comes preconfigured in the Data Science VM. I also registered that environment as a kernel with Jupyter via the ipykernel Python package. This way, I could choose the 'mlenv' kernel from my Jupyter notebooks.
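The registration step can be sketched as below. This assumes ipykernel is installed in the active environment; the display name is my own illustrative choice, and the standard CLI form is `python -m ipykernel install --user --name mlenv`.

```python
# Hedged sketch: build the command that registers the current interpreter's
# environment as a Jupyter kernel named 'mlenv'.
import sys

def kernel_install_cmd(name, display_name):
    """Build the `python -m ipykernel install` command for this interpreter."""
    return [sys.executable, "-m", "ipykernel", "install", "--user",
            "--name", name, "--display-name", display_name]

# To perform the registration, run from the activated 'mlenv' environment:
#   import subprocess
#   subprocess.run(kernel_install_cmd("mlenv", "Python (mlenv)"), check=True)
```

After this, Jupyter's kernel picker lists the environment under the chosen display name.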

As another safety precaution, given my tendency to fat-finger things, I turned on backups for all the disks I allocated in the VM. By default, I use the silver backup policy, but you can easily change this to the gold or bronze level.

The network I created for this is straightforward. The environment consists of a VCN, two subnets, a route table, and ingress rules locked down for everything except port 22. You can see the environment in the diagram below.

I wanted the system secure, but I didn't want the hassle of managing firewall rules, and I needed to access this environment from multiple places on the Internet. Since the environment was for my individual use and a few of my colleagues, I opted for SSH tunneling as the way to get access to the Jupyter notebooks. I realize this is not a valid solution for larger teams; for that use case, I suggest that Oracle Data Science or Oracle Machine Learning may be better tools.
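The tunnel itself is a one-liner over the one open port. The following is a sketch with placeholders: substitute your instance's public IP address and key path, and note that `opc` is the default OS user on Oracle Linux images and 8888 is Jupyter's default port.

```shell
# Forward local port 8888 through SSH (port 22) to the notebook server on
# the VM. Placeholders: replace <public-ip> with the instance address.
ssh -i ~/.ssh/id_rsa -L 8888:localhost:8888 opc@<public-ip>

# Then browse to http://localhost:8888 on your local machine.
```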

To deploy this solution, grab the code from the GitHub quick start and follow the instructions in the README. If you are familiar with Terraform, this should be an easy configuration for you. If you're not, you can review introductory materials from HashiCorp or check out this course about Terraform on OCI. After you've downloaded the Terraform code, the next steps boil down to:

  • Get an OCI account
  • Configure public / private keys
  • Tweak the configuration files to your needs
  • Execute the Terraform

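The last step follows the standard Terraform workflow. A sketch, assuming you run it from the directory containing the downloaded configuration, with your OCI credentials already set in the files you tweaked:

```shell
# Standard Terraform workflow, run from the configuration directory.
terraform init    # download the provider plugins
terraform plan    # preview the resources to be created
terraform apply   # create the network and the Data Science VM
```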
This automation has saved me quite a bit of time and I’m curious to hear if this has value for you as well.

Comments (1)
  • Dr. Ram Srinivasan Tuesday, October 20, 2020
    Well written article that creates the spark in the reader to play with. Thanks for the line that refers to Oracle Data Science and Oracle Machine Learning.