Lifecycle Management of Instance Pools

November 10, 2020 | 6 minute read
Instance Pools, especially in combination with autoscaling rules, are a great OCI feature for automatically scaling compute resources to match fluctuating demand. Typical use cases are seasonal or weekly load peaks, or a compute pool held ready for a DR scenario. Instance Pools are quickly configured in the OCI Console, although you can of course also manage them with the OCI CLI, Terraform or other supported methods for Infrastructure as Code.

One aspect of instance pools that will come up eventually is the question of updating the individual worker nodes in such a pool to keep up with OS or application updates. The OCI Console lends itself as the tool of choice for building the initial Instance Configuration and then the Instance Pool and Autoscaling Configuration. However, the Console currently doesn't offer any way to update the instances managed by the pool.

There are several aspects to consider for such updates:

  • OS updates
    These could potentially be delegated to the OS Management Service. However, the subscription to a group of managed instances would need to be part of the boot process configured in the Instance Configuration. While this might work well for long-lived instances, the expectation for an Instance Pool is that instances are created and terminated as the pool size fluctuates. In addition, depending on the number of available updates, patching a new member of the pool could take a long time and would add CPU and IO load to an instance that was just started to ease the overall load on the pool. So for new instances joining the pool, the ideal is to spawn them from an image that has already been patched.
  • Application updates
    Similar to OS updates, there are multiple ways to update the application running in the pool. But since new instances will always start from the original image that was defined in the instance configuration, whatever update process is used will always need to be run right after the first boot of the instance, facing the same issues as the OS updates.

The obvious conclusion is that we will need to apply whatever updates there are to the source image used to instantiate new instances when they join the pool. The solution for existing pool members would be to trigger their replacement with new instances.

An existing Instance Configuration cannot be updated with new parameters or a new source image. However, you can create a new Instance Configuration and then update the existing Instance Pool to use it. From then on, any new instances will spawn from the new Instance Configuration.

Creating the Initial Instance Configuration

Of course, there are multiple ways to create the initial image and then the Instance Configuration. For example:

  • A very simple use case would work with a standard image and a cloud-init script to create the application. In such a case, you could create the initial Instance Configuration by using a minimal configuration file in JSON format, supplying the base64-encoded cloud-init script in the user_data metadata key and referring to a standard image provided by OCI. Creating the Instance Configuration is then as simple as
        oci compute-management instance-configuration create \
            --instance-details file:///instance.details.json \
            --compartment-id <your compartment-id> \
            --display-name "mgoldbuildconfig"
    
    An example for the file instance.details.json can be found in this blog.
  • Another approach is to deploy the application to a standalone instance and then create the instance configuration from that instance. This is most easily done using the console and is probably the most common way to do this. In this case it is advisable to keep the boot volume of this source instance after the Instance Configuration has been created so you can use it for updating later.
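
For the first approach, the instance details file could be generated along the following lines. This is only a minimal sketch: the function name write_instance_details, the argument order and the file names are made up for illustration, and a real configuration will usually carry more launch parameters than shown here.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal instance details file for
# "oci compute-management instance-configuration create".
# The function name and all arguments are illustrative assumptions.

write_instance_details() {
    local cloud_init_file="$1"   # path to the cloud-init script
    local compartment_id="$2"    # OCID of the target compartment
    local image_id="$3"          # OCID of the source image
    local shape="$4"             # compute shape, e.g. VM.Standard2.1
    local out_file="$5"          # JSON file to write

    # cloud-init is passed in metadata.user_data, base64-encoded on one line
    local user_data
    user_data=$(base64 < "$cloud_init_file" | tr -d '\n')

    cat > "$out_file" <<EOF
{
  "instanceType": "compute",
  "launchDetails": {
    "compartmentId": "${compartment_id}",
    "shape": "${shape}",
    "sourceDetails": {
      "sourceType": "image",
      "imageId": "${image_id}"
    },
    "metadata": {
      "user_data": "${user_data}"
    }
  }
}
EOF
}
```

A call such as write_instance_details cloud-init.sh "<compartment-ocid>" "<image-ocid>" VM.Standard2.1 instance.details.json would then produce the file that is passed to --instance-details.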

Whatever you do, the end result will always include an image which will be used as the source for the worker nodes in your Instance Pool. Once you're done with the pool configuration, it will look something like this in the Console:

[Figure: The initial pool]

In this example, it is configured to run two instances. Checking the pool shows that both of them use the initial Instance Configuration:

[Figure: Instance in the initial pool]

Updating the Source Image to Create a New Instance Configuration

How you update this image will depend on how you created it. For example:

  • You may have made changes to the cloud-init script used in the first example above.
    In this case, you would create a new Instance Configuration using the same command as above, just with a modified configuration in the JSON file and a new display name.
  • You may want to base your new instances off a new release of the OCI standard image which you used when building your Instance Configuration. 
    Again, you'd simply create a new Instance Configuration.
  • You might have created a standalone instance off the source image (or just reused the original instance) and applied whatever updates were available to that instance. 
    In this case, you can either use the Console to create a new Instance Configuration right from that instance, or you can create a custom image and create a new Instance Configuration with the above command, where the source image would point to the OCID of the new custom image you created.

Whatever you do, you will need to create a new Instance Configuration that will point to the updated image.
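
For the third case, the CLI route might look roughly like the sketch below. The two function names are hypothetical helpers, and the details file is assumed to already reference the OCID of the new custom image. Each function prints the OCID of the resource it created.

```shell
#!/usr/bin/env bash
# Sketch: capture a patched instance as a custom image, then create a new
# Instance Configuration. Function names and arguments are assumptions.

# Create a custom image from an instance and print the image OCID.
create_image_from_instance() {
    local compartment_id="$1" instance_id="$2" image_name="$3"
    oci compute image create \
        --compartment-id "$compartment_id" \
        --instance-id "$instance_id" \
        --display-name "$image_name" \
        --wait-for-state AVAILABLE \
        --query 'data.id' --raw-output
}

# Create an Instance Configuration from a details file and print its OCID.
# The details file must reference the new image in sourceDetails.imageId.
create_instance_configuration() {
    local compartment_id="$1" details_file="$2" config_name="$3"
    oci compute-management instance-configuration create \
        --compartment-id "$compartment_id" \
        --instance-details "file://$details_file" \
        --display-name "$config_name" \
        --query 'data.id' --raw-output
}
```

Keeping the returned OCIDs in shell variables makes it easy to feed them into the pool update that follows.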

Updating the Instance Pool

Once you have created a new Instance Configuration (based on the new or updated image), the next step is to update the existing Instance Pool, replacing the previous Instance Configuration with the new one. All we need for this are the OCIDs of the existing Instance Pool and the new Instance Configuration. We can then update the pool with the command

oci compute-management instance-pool update --instance-pool-id <instancepool-ocid> \
    --instance-configuration-id <newinstanceconfiguration-ocid>

The resulting pool configuration will look like this in the Console:

[Figure: The new pool configuration]
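
The same check can be done from the command line. The following is a small sketch with two hypothetical helper functions: one prints the Instance Configuration currently attached to a pool, the other compares it against an expected OCID.

```shell
#!/usr/bin/env bash
# Sketch: check which Instance Configuration a pool currently uses.
# Function names are illustrative assumptions.

# Print the OCID of the Instance Configuration attached to the pool.
pool_configuration_id() {
    oci compute-management instance-pool get \
        --instance-pool-id "$1" \
        --query 'data."instance-configuration-id"' --raw-output
}

# Succeed (exit status 0) only if the pool uses the expected configuration.
pool_uses_configuration() {
    [ "$(pool_configuration_id "$1")" = "$2" ]
}
```

Running pool_configuration_id before the update and noting the result also gives you the OCID of the previous configuration in case you need it later.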

This will have no immediate effect on the running instances in the pool, so any active production workloads will continue unaffected. However, any new instance that is spawned because of a scaling operation will now use the updated Instance Configuration and thus the updated image, while all the "old" instances will continue to run the older version:

[Figure: New instances using the new configuration]

The final step, therefore, is to replace the old instances with new ones. 

The simplest method to replace these old instances is to terminate them - either from the console or via the CLI. Make sure to terminate their boot volumes along with the instances. The Instance Pool will notice this and spawn replacements as needed:

[Figure: Pool with updated instances]
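
With the CLI, the replacement could be scripted roughly as follows. Treat this as a sketch under several assumptions: the function names and the polling interval are invented for illustration, the pool is assumed to report the lifecycle state RUNNING once it has finished spawning a replacement, and instances are terminated one at a time to limit the impact on the workload.

```shell
#!/usr/bin/env bash
# Sketch: replace pool members that still run the old Instance Configuration.
# Function names, arguments and the default pause are assumptions.

# Print the OCIDs (space-separated) of pool members that were NOT launched
# from the given Instance Configuration.
list_stale_instances() {
    local compartment_id="$1" pool_id="$2" new_config_id="$3"
    oci compute-management instance-pool list-instances \
        --compartment-id "$compartment_id" \
        --instance-pool-id "$pool_id" --all \
        --query "data[?\"instance-configuration-id\" != '${new_config_id}'].id | join(' ', @)" \
        --raw-output
}

# Terminate stale members one by one, including their boot volumes, and
# wait for the pool to settle before moving on to the next one.
replace_stale_instances() {
    local compartment_id="$1" pool_id="$2" new_config_id="$3"
    local pause="${4:-60}"   # seconds between checks (arbitrary default)
    local id
    for id in $(list_stale_instances "$compartment_id" "$pool_id" "$new_config_id"); do
        oci compute instance terminate --instance-id "$id" \
            --preserve-boot-volume false --force \
            --wait-for-state TERMINATED
        # Give the pool time to notice the missing member, then poll
        # until it reports RUNNING again.
        sleep "$pause"
        until [ "$(oci compute-management instance-pool get \
                    --instance-pool-id "$pool_id" \
                    --query 'data."lifecycle-state"' --raw-output)" = "RUNNING" ]; do
            sleep "$pause"
        done
    done
}
```

A call would look like replace_stale_instances "<compartment-ocid>" "<instancepool-ocid>" "<newinstanceconfiguration-ocid>". Depending on how quickly the pool reacts, a longer pause or an additional check on the number of running members may be needed.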

Note that terminating the instances might interrupt the workload running on them. However, this is also the case in a scale-down operation triggered by the pool, so it is usually safe to assume that the workload is transactional in nature and will recover from an unexpected instance termination.

Of course, if something is wrong with the new Instance Configuration, reverting to the old configuration is as simple as repeating the above update command, using the OCID of the previous, known good Instance Configuration.

Summary

As we've seen, it is not too difficult to update the instances in an Instance Pool, even while the pool is in production. To summarize, the steps are:

  1. Create the initial Instance Configuration along with the pool. (Keep the original image if needed for the update process.)
  2. Update the source image.
  3. Create a new Instance Configuration using the updated source image.
  4. Update the Pool configuration by replacing the original Instance Configuration with the updated one.
  5. Replace the old instances in the pool one by one.


Stefan Hinker

After around 20 years working on SPARC and Solaris, I am now a member of A-Team, focusing on infrastructure on Oracle Cloud.

