A Patch Train Solution for OCI OS Management

July 17, 2020 | 9 minute read

With the OS Management Service (OSMS), Oracle Cloud Infrastructure provides a service to fully automate the patching of your Oracle Linux or Windows instances. It allows you to organize your systems into groups and then schedule jobs to apply the latest updates to all these systems. There is a wide range of predefined software sources to choose from, providing the full wealth of the Oracle yum repositories to your Linux systems. In the simplest case, this will keep all your systems up to date with the latest patches all the time.

In many cases, this eliminates the burden of constantly chasing security updates and keeping your fleet patched. However, sometimes your patching goals are more complex than just having the "latest and greatest" all the time. In this article, I'll describe one such scenario and a solution.

Requirements: The Patch Train

Having the latest patches installed is usually desirable from a security point of view. However, imagine you have several hundred systems to manage, with some "coming and going" all the time. This isn't unusual, especially in a cloud environment where instance deployment and termination is easy and a normal part of overall system management. If all of these systems patch to "the latest," then within a few days, you'll have multiple systems, all with different versions of packages, depending on when they were last patched. This will quickly converge to something known as "dim sum patching," where N systems have N different combinations of packages and versions. This is often the opposite of what you want in a production environment, where everything is tested before it starts running.

In such a controlled environment, a well-defined set of packages (and package versions) is deployed to systems after that combination has gone through intensive testing. The complete fleet of systems is then brought to that patch level and will stay there until the next set of patches is tested and released. This is very much like a train that leaves the station at regular times. Anyone (i.e., any patch or package update) that makes it to the platform in time will come along for the ride. All those that show up later will queue up for the next train.

This is exactly what we want to achieve here: define a set of packages with defined versions that will be used to patch all the systems in a group. Any new systems joining the group will be patched to the same level. New updates arriving from upstream will not be available to those systems. Rather, those will be included in the next set of patches that is being prepared. The following diagram shows this workflow:

The Patch Train

The green bar at the top shows the continuous flow of new updates as they come in from upstream. On the very left, a snapshot of the then-current packages is taken (the train closes its doors) and bundled into something we will call "Great Linux 1". After some testing, it is released (the train leaves) and applied to all systems. The bar at the bottom shows the patch level of these systems. At first, they each have their individual package versions as they come in from deployment. Once patched to "GL1", they are all exactly the same with regard to package versions. Likewise, some time later, a new snapshot is taken and bundled into "GL1.1", and so forth. In this way, the systems being managed will all be at the same patch level for a certain time and will then all be patched to the next release of "Great Linux", again leaving them at exactly the same package versions. This workflow is a compromise between "always the latest updates" and "only run what's been tested". The shorter the cycle between two releases of "Great Linux", the closer you come to continuous patching. Of course, this increases the effort required for the test cycle.

So how do you take advantage of the OS Management Service to implement this? As mentioned above, OSMS currently only provides automation for "always the latest". With custom software sources, however, it also provides the missing piece you can use for the patch train approach. What you still need is an easy way to create these custom software sources.

Custom Software Sources

In OSMS, a software source is a list of packages and package versions. By attaching a software source to an instance, you make these exact packages available to that instance. OSMS manages access to the package repository as part of the service, so you don't have to worry about that. The default software sources provided by Oracle are updated regularly, so any new package version is made available by simply adding it to the software source. But you also have the option to create your own, independent software sources - called custom software sources. Custom software sources are not updated by Oracle, so you have full control over which packages and which package versions they contain. By detaching an instance from the default software source and instead attaching it to one you defined, you control exactly which packages or updates are available to this instance, even if the repositories provided by OSMS contain newer versions.

This is exactly the tool you need for the patch train: each "release" of "Great Linux" is a new custom software source that lists all the package versions which have passed the testing process. Once you have created the custom software source, all you need to do is replace any software source previously attached to the instances in a group with this new custom software source. You can then leave it up to a scheduled job to apply these updates to the instances. The next sections will cover how to create a custom software source, along with a few handy tools to operate the full workflow.

A Few Building Blocks

For the overall workflow, we'll use several scripts. They'll work with Managed Instance Groups to list the instances in a group, query installed or available packages, and so on. For that, there are a few common operations, shown below.
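All of the snippets assume an OS Management client in "osms" and a compartment OCID in "CompartmentID". A minimal sketch for setting that up, assuming the standard OCI SDK configuration file in ~/.oci/config, could look like this:

import sys

import oci

# Load the default OCI configuration (~/.oci/config, DEFAULT profile)
config = oci.config.from_file()
# Client for the OS Management Service
osms = oci.os_management.OsManagementClient(config)
# Compartment to work in - replace with the OCID of your compartment
CompartmentID = 'ocid1.compartment.oc1..example'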

Getting the details of a group by group name

Assuming you have the name of a group in "GroupName", you can get the details of this group like this:

try:
    # List all managed instance groups in the compartment
    AllGroups=osms.list_managed_instance_groups(CompartmentID)
except oci.exceptions.ServiceError as e:
    print(f'Error reading OSMS groups:\n\t{e.message}')
    sys.exit(1)
try:
    # Pick the group whose display name matches GroupName
    GroupSummary=[group for group in AllGroups.data if group.display_name==GroupName][0]
except IndexError:
    print(f'Group "{GroupName}" not found.')
    sys.exit(1)
# Fetch the full group details, including its member instances
Group=osms.get_managed_instance_group(GroupSummary.id)

Once you have the group, you can iterate over all the instances of that group and get the OSMS details for each instance like this:

for instance in Group.data.managed_instances:
    print(f'{instance.display_name} : ')
    managedinstance = osms.get_managed_instance(instance.id)

Once you have an instance OCID in "instanceID", getting the list of all installed packages is just one API call:

PackagesInstalled = oci.pagination.list_call_get_all_results(
          osms.list_packages_installed_on_managed_instance,instanceID)
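For example, to print what is installed on the instance (the field names follow the SDK's installed-package summary model, so treat this as a sketch):

for package in PackagesInstalled.data:
    # name, version and architecture are attributes of the package summary objects
    print(f'{package.name}-{package.version}.{package.architecture}')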

Building a Custom Software Source

The easiest way to create a new custom software source is to take an existing instance, patch it with all the latest updates, and then test whether this instance - and thus this combination of packages and package versions - meets all the requirements. (These could be security compliance, application compatibility, completeness for monitoring, etc.) Once the instance has passed these tests, it can be taken as a blueprint from which to create a custom software source. This is in many ways similar to how golden images are usually created, except that here you create a software source, not an image. For the purpose of this example, we will call this instance a "golden instance".

Using the above building blocks, all you need to do is query the instance for the list of currently installed packages. Then you need to prepare this data in the format expected by the API (skipping a few special packages), create a new software source and populate it with this list:

InstalledNames=[package.name for package in PackagesInstalled.data]
# Skip packages that are instance-specific and should not go into the software source
AllPackageNames = [name for name in InstalledNames
                   if not (name.startswith('gpg-pubkey')
                           or name.startswith('ksplice-uptrack-release')
                           or name.startswith('oracle-cloud-agent'))]

print (f'Creating new parent software source {SourceName}...',end='')
software_source_details = oci.os_management.models.CreateSoftwareSourceDetails(
                              compartment_id=CompartmentID,
                              display_name=SourceName,
                              description='Sensible description here',
                              arch_type='X86_64')
try:
    software_source = osms.create_software_source(software_source_details)
except oci.exceptions.ServiceError as e:
    print(f'Error creating software source:\n{e.message}')
    sys.exit(1)
print ('ok')

print ('Adding packages to software source...',end='')
try:
    packages = oci.os_management.models.AddPackagesToSoftwareSourceDetails(
                   package_names=AllPackageNames)
    osms.add_packages_to_software_source(software_source.data.id, packages)
except oci.exceptions.ServiceError as e:
    print(f'Error adding packages to software source:\n{e.message}')
    sys.exit(1)
print ('ok')

This creates a new custom software source in your compartment. Next, populate a group with a few systems and replace these systems' default software sources with your own. In this example, you'll just get a list of all instances in the compartment and add them to an already existing group. (Note that systems can be members of multiple groups. Also note that instances need to be running for this to work.)
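The snippet also uses a compute client in "compute" to list the instances. A minimal sketch for creating it, assuming the same "config" that was loaded for the OSMS client:

# Client for the Compute service, reusing the configuration loaded earlier
compute = oci.core.ComputeClient(config)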

Instances = oci.pagination.list_call_get_all_results(compute.list_instances,CompartmentID)
for instance in Instances.data:
    print (f'Adding {instance.display_name} to group {GroupName}...',end='',flush=True)
    if (instance.lifecycle_state=='RUNNING'):
        try:
            osms.attach_managed_instance_to_managed_instance_group(Group.data.id,instance.id)
            print ('ok')
        except oci.exceptions.ServiceError as e:
            print(f'Error attaching instance:\n\t{e.message}')
    else:
        print ('not running - skipped.')
print ('done')

Once all the systems are in the group, you detach them from their previous (parent) software source:

for instance in Group.data.managed_instances:
    print(f'{instance.display_name} : ',end='')
    managedinstance = osms.get_managed_instance(instance.id)
    psource=managedinstance.data.parent_software_source
    detached=False

    if psource is not None:
        detachdetails = oci.os_management.models.DetachParentSoftwareSourceFromManagedInstanceDetails(
                        software_source_id=psource.id)
        try:
            osms.detach_parent_software_source_from_managed_instance(instance.id,detachdetails)
            print(f'Detached {psource.name}')
            detached=True
        except oci.exceptions.ServiceError as e:
            print(f'Error unsubscribing from software source:\n{e.message}')
    if not detached:
        print ('Nothing detached')
print('done')

And of course, attach them to the new software source, given in "SourceName":

AllSources=osms.list_software_sources(CompartmentID)
try:    
    Source=[source for source in AllSources.data if source.display_name==SourceName][0]
except IndexError:
    print(f'Software Source "{SourceName}" not found.')
    sys.exit(1)

SourceDetails=oci.os_management.models.AttachParentSoftwareSourceToManagedInstanceDetails(software_source_id=Source.id)
print(f'Attaching software source to instances:')
for instance in Group.data.managed_instances:
    print(f'{instance.display_name} : ',end='')
    try:
        osms.attach_parent_software_source_to_managed_instance(instance.id,SourceDetails)
        print (f'Attached {Source.display_name}')
    except oci.exceptions.ServiceError as e:
        print(f'Error attaching software source:\n\t{e.message}')
print('done')

Once this is done, all the systems are attached to the correct software source. Assuming there is a scheduled job that runs at regular intervals, these systems will be patched to the level defined by this software source. If you add new systems to the group, simply rerun the script to attach them to the correct software source. (This does not happen automatically when they are added to the group; it has to be done for each system individually, as sketched below.)
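If you want to automate that last step, a small sketch along these lines (reusing only the calls shown above; "Source" and "SourceDetails" are the objects looked up earlier) could bring any newcomers onto the right source:

# Sketch: make sure every group member is attached to the desired custom software source
for instance in Group.data.managed_instances:
    managedinstance = osms.get_managed_instance(instance.id)
    psource = managedinstance.data.parent_software_source
    if psource is not None and psource.id == Source.id:
        continue  # already on the desired software source
    if psource is not None:
        # Detach the currently attached parent source first, as in the earlier snippet
        detachdetails = oci.os_management.models.DetachParentSoftwareSourceFromManagedInstanceDetails(
                        software_source_id=psource.id)
        osms.detach_parent_software_source_from_managed_instance(instance.id, detachdetails)
    osms.attach_parent_software_source_to_managed_instance(instance.id, SourceDetails)
    print(f'Attached {Source.display_name} to {instance.display_name}')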

 

Completing the Lifecycle

Essentially, with the above, you have all the tools you need to keep your fleet of systems at a patch level you control through custom software sources. To provide updates to these systems, all you need to do is patch the golden instance with a new set of updates, complete any required testing, create a new custom software source, and replace the old software source with this newer one for all the instances in the group. All of this can be done with the scripts discussed so far. Here are a few more hints about what else is possible with the OSMS API to make life a bit easier:
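For example, to kick off the next cycle, you can ask OSMS to install all available updates on the golden instance before your testing starts. A minimal sketch, assuming the client's install_all_package_updates_on_managed_instance call (check the SDK reference for the exact name) and a variable "GoldenInstanceID" holding the golden instance's OCID:

# Assumption: GoldenInstanceID is the OCID of the golden (blueprint) managed instance
try:
    # Creates a work request that installs all currently available package updates
    osms.install_all_package_updates_on_managed_instance(GoldenInstanceID)
    print('Update of the golden instance requested.')
except oci.exceptions.ServiceError as e:
    print(f'Error updating golden instance:\n\t{e.message}')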

Manually Triggering Updates

Instead of creating a one-off job to install a set of updates in a group, you can also use a little script to achieve the same. Here's an example of how to install all available security updates on all instances in a group. You could, for example, run this on your group right after completing the steps above - instead of waiting for the scheduled job to do this for you.

Group=osms.get_managed_instance_group(GroupSummary.id)
print(f'Applying security updates to instances:')
for instance in Group.data.managed_instances:
    print(f'{instance.display_name}')
    instanceID = instance.id
    updates = oci.pagination.list_call_get_all_results(
              osms.list_available_updates_for_managed_instance,instanceID)
    for update in updates.data:
        if update.update_type=='SECURITY':
            print (f' - {update.display_name} {update.installed_version} -> {update.available_version}')
            # Build the full package name (name-version.arch) expected by the install call
            updatename=update.display_name+'-'+update.available_version+'.'+update.architecture
            print (f'   ->  {updatename}')
            osms.install_package_update_on_managed_instance(instanceID,updatename)
    print(f'done with {instance.display_name}')
print('Done')

The above iterates over all instances in a group and, for each instance, gets the list of available updates. Then, for each update of type "SECURITY", it asks the API to create a work request to apply that update to the instance. After this script completes, you will see all of these work requests, for example, in the details page of each managed instance in the web console.
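If you want to follow up on these work requests from within the script, one option is to capture the work request OCID from the response and query its status. A sketch, assuming the OCID is returned in the usual opc-work-request-id response header:

# Sketch: capture and inspect the work request created for a single update
response = osms.install_package_update_on_managed_instance(instanceID, updatename)
work_request_id = response.headers.get('opc-work-request-id')  # header name assumed
if work_request_id:
    wr = osms.get_work_request(work_request_id)
    print(f'{wr.data.operation_type}: {wr.data.status}')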

Reporting

Of course, you will also want to report on the status of the systems you manage. Here's how to run through the instances in a group and count the number of outstanding patches:

import collections  # for the defaultdict holding per-instance counters

InstanceStatistics=collections.defaultdict(dict)
print ('Scanning instances for available updates...',end='',flush=True)
for instance in Group.data.managed_instances:
    PackageUpdates = oci.pagination.list_call_get_all_results(
                     osms.list_available_updates_for_managed_instance,instance.id)
    print ('.',end='',flush=True)
    InstanceStatistics[instance.id]['SECURITY']=0
    InstanceStatistics[instance.id]['BUG']=0
    InstanceStatistics[instance.id]['ENHANCEMENT']=0
    InstanceStatistics[instance.id]['OTHER']=0
    for update in PackageUpdates.data:
        InstanceStatistics[instance.id][update.update_type]+=1
print ('done')

You can imagine all sorts of statistical and graphical representations of this data. A very simple table showing the number of patches missing in each category, and whether or not any security patches are missing, could look like this:

import plotly
import plotly.figure_factory as ff  # used for the create_table helpers below

CompliantSystems=0
data_matrix=[['Instance Name','Security Patches','Bug Patches',
              'Enhancement Patches','Security Compliant']]
for i in Group.data.managed_instances:
    if InstanceStatistics[i.id]['SECURITY']==0:
        CompliantSystems+=1
    data_matrix.append([i.display_name,InstanceStatistics[i.id]['SECURITY'],
                      InstanceStatistics[i.id]['BUG'],
                      InstanceStatistics[i.id]['ENHANCEMENT'],
                      InstanceStatistics[i.id]['SECURITY']==0])

head = ff.create_table([['System Count for "'+GroupName+'"',
                         'Not Security Compliant','Security Compliant','Security Compliant %'],
                        [len(data_matrix)-1,len(data_matrix)-1-CompliantSystems,
                         CompliantSystems,CompliantSystems/(len(data_matrix)-1)*100]])
fig = ff.create_table(data_matrix)
headoutput=plotly.io.to_html(head,full_html=False)
figoutput=plotly.io.to_html(fig,full_html=False)

# "OutFile" holds the base name of the HTML report file
with open(OutFile+".html","w",newline='') as outfile:
    outfile.write(headoutput)
    outfile.write(figoutput)
print(f'Report available in {OutFile}.html')

Sample output would look like this:

Sample Compliance Report

With this, you've seen a full cycle of how to automate the patching of your systems while at the same time remaining in full control over which package versions are shipped to your fleet. You've also seen how to use the OSMS API to provide reports on your systems and to enhance the existing functionality. Happy patching!

Stefan Hinker

After around 20 years working on SPARC and Solaris, I am now a member of A-Team, focusing on infrastructure on Oracle Cloud.

