Sometimes all you want to do is create regular backups of your instances. Policy-Based Backups to the rescue. All you need to do is choose one of the existing policies or define your own, and then apply it to the instance or volume you want to protect. This works like a charm and opens the door to cross-region copies and very simple restore operations. However, what do you do if you have multiple data volumes attached to your instance, in addition to the boot volume (which, as a best practice, should only contain your OS)? The first thought will be Volume Groups, as they are designed to group multiple volumes into one administrative unit and allow for simultaneous backups of all the volumes in the group. However, you cannot control these backups with a policy (yet). Fortunately, it is rather straightforward to bridge this gap with some scripting, for example using the Python SDK for OCI.
In this blog post, I will describe one possible approach for a solution, using a simple use case as an example.
Let's assume we have a system that provides shared file access for Windows clients by running Samba to serve multiple filesystems. These filesystems are hosted on a set of block volumes attached to this server VM. We want to be able to do several things:
The general idea is simple:
To simplify the usage of such a script, the volume group used for backups should be linked to the instance it belongs to. We can do this by querying the instance for its boot volume and then looking up the volume group this boot volume belongs to. Since the API allows a volume to be a member of at most one volume group, this lookup is unambiguous.
Taking the instance OCID as the only input, the first step is to do some basic discovery:
    import sys
    import oci

    # ociconfig and instanceOCID are expected to be set earlier in the script
    compute = oci.core.ComputeClient(ociconfig)
    blockvolume = oci.core.BlockstorageClient(ociconfig)
    instance = compute.get_instance(instanceOCID)

    def fetch_metadata(instance):
        comp = instance.data.compartment_id
        ad = instance.data.availability_domain
        bva = compute.list_boot_volume_attachments(ad, comp, instance_id=instance.data.id)
        bootvolume = blockvolume.get_boot_volume(bva.data[0].boot_volume_id)
        setid = bootvolume.data.volume_group_id
        if setid is not None:
            try:
                volumegroup = blockvolume.get_volume_group(setid)
            except oci.exceptions.ServiceError as e:
                print(f'Error looking up Volume Group: {e.message}\nExiting')
                sys.exit(1)
            print(f'Found Volume Group {volumegroup.data.display_name}')
        else:
            # no backup set defined, we need to create the volume group
            # first, get the list of all attached block volumes and boot volumes
            volumeIDs = []
            blockvolumes = compute.list_volume_attachments(
                comp, availability_domain=ad, instance_id=instance.data.id).data
            for b in blockvolumes:
                volumeIDs.append(b.volume_id)
            # the boot volume attachment was already looked up above
            volumeIDs.append(bva.data[0].boot_volume_id)
            # then configure and create the volume group
            backupgroup = oci.core.models.CreateVolumeGroupDetails(
                availability_domain=ad,
                compartment_id=comp,
                display_name=instance.data.display_name + "_backup_group",
                source_details=oci.core.models.VolumeGroupSourceFromVolumesDetails(
                    volume_ids=volumeIDs))
            volumegroup = blockvolume.create_volume_group(backupgroup)
            print(f'Created Volume Group {volumegroup.data.display_name}')
            setid = volumegroup.data.id
        return setid
Here, if we find a valid volume group, we return its OCID to the caller. Otherwise, we collect a list of all volumes attached to the instance and create a new volume group, using the instance name as a prefix for the group name, and return the OCID of the new group. Either way, after returning from this function we know that a valid volume group exists for the instance. If the group was newly created, it contains the boot volume and all block volumes attached to the instance.
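One detail worth guarding against when collecting the volume list: as far as I know, the attachment list returned by the API can also contain attachments that are no longer active (for example in the `DETACHED` state). The following is a minimal sketch of a filter for that, not part of the original script. The helper name `attached_volume_ids` is made up for this example, and the sample objects are stand-ins for what `compute.list_volume_attachments(...).data` would return.

```python
from types import SimpleNamespace

def attached_volume_ids(attachments):
    """Return the unique volume OCIDs of all attachments in state ATTACHED,
    preserving the order in which they were listed."""
    seen = set()
    ids = []
    for a in attachments:
        if a.lifecycle_state != 'ATTACHED':
            continue  # skip attachments that are detached or in transition
        if a.volume_id not in seen:
            seen.add(a.volume_id)
            ids.append(a.volume_id)
    return ids

# Stand-in objects for illustration only; the OCIDs are placeholders.
sample_attachments = [
    SimpleNamespace(volume_id='vol-a', lifecycle_state='ATTACHED'),
    SimpleNamespace(volume_id='vol-b', lifecycle_state='DETACHED'),
    SimpleNamespace(volume_id='vol-a', lifecycle_state='ATTACHED'),
]
print(attached_volume_ids(sample_attachments))
```

In fetch_metadata, the loop that fills volumeIDs could be replaced by a call to such a helper.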
Next, let's create a first backup:
    from datetime import datetime

    def do_backup(setid):
        # use the current date and time as the display name of the backup
        now = datetime.now()
        datestring = now.strftime("%Y-%m-%d_%H:%M")
        backupjob = oci.core.models.CreateVolumeGroupBackupDetails(
            type='INCREMENTAL',
            display_name=datestring,
            volume_group_id=setid)
        try:
            backup = blockvolume.create_volume_group_backup(backupjob)
            print(f'Backup started: {datestring}')
        except oci.exceptions.ServiceError as e:
            print(f'Backup failed: {e.message}')

As you can see, there's not very much to do here. All we need is a name for the backup, and a sensibly formatted date string is a reasonable first approach. Then we just take the OCID of the volume group we got from fetch_metadata and ask the API to create a backup.
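Note that do_backup only starts the backup; the call returns before the backup is complete. If a script needs to wait for completion, a small polling loop is one option. The sketch below is an assumption-laden illustration rather than part of the original script: `wait_for_state` is a made-up helper, the state names are the ones I would expect for volume group backups, and the state lookup is passed in as a function so that the loop itself does not depend on the SDK. The SDK also ships a generic waiter, `oci.wait_until`, which may be the better choice in practice.

```python
import time

def wait_for_state(get_state, target='AVAILABLE', failed=('FAULTY', 'TERMINATED'),
                   interval=30, timeout=3600, sleep=time.sleep):
    """Poll get_state() until it returns `target` (True) or one of `failed` (False).
    Raises TimeoutError once roughly `timeout` seconds have been spent waiting."""
    waited = 0
    while True:
        state = get_state()
        if state == target:
            return True
        if state in failed:
            return False
        if waited >= timeout:
            raise TimeoutError(f'backup still in state {state} after {waited}s')
        sleep(interval)
        waited += interval

# Hypothetical usage with the SDK, where backup_id is the OCID of the new backup:
#   wait_for_state(lambda: blockvolume.get_volume_group_backup(backup_id)
#                  .data.lifecycle_state)
```

Passing `sleep` as a parameter keeps the loop easy to try out without actually waiting.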
Before we go into deleting old backups, let's see if we can get some information about existing backups:
    def show_backups(comp, setid):
        # note: `keep` is not a parameter here, so it must be defined at module level
        backups = blockvolume.list_volume_group_backups(
            comp, volume_group_id=setid, sort_by="TIMECREATED", sort_order="ASC")
        goodbackups = []
        for b in backups.data:
            if b.lifecycle_state != 'TERMINATED':
                goodbackups.append(b)
        print('Available Backups:')
        count = len(goodbackups)
        for b in goodbackups:
            # '-' marks backups beyond the newest `keep`, '*' marks the ones to keep
            if count > keep:
                print(f' - {b.display_name}')
            else:
                print(f' * {b.display_name}')
            count -= 1

We might also want a little function to ask for the latest available backup:
    def latest_backup(comp, setid):
        backups = blockvolume.list_volume_group_backups(
            comp, volume_group_id=setid, sort_by="TIMECREATED", sort_order="ASC")
        latest = None
        # the list is sorted oldest first, so the last non-terminated entry wins
        for b in backups.data:
            if b.lifecycle_state != 'TERMINATED':
                latest = b
        return latest
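The function above treats every backup that is not `TERMINATED` as a candidate, which includes backups that are still being created. If "latest" should mean "latest backup that can actually be restored", a stricter filter is one possible variation. The sketch below assumes that backup objects carry `lifecycle_state` and `time_created` attributes, as the SDK models do to my knowledge; `latest_available` is a made-up name, and the sample objects are stand-ins for API results.

```python
from datetime import datetime
from types import SimpleNamespace

def latest_available(backups):
    """Return the newest backup in state AVAILABLE, or None if there is none."""
    usable = [b for b in backups if b.lifecycle_state == 'AVAILABLE']
    return max(usable, key=lambda b: b.time_created, default=None)

# Stand-in objects for illustration only.
sample_backups = [
    SimpleNamespace(display_name='2020-07-06_11:46', lifecycle_state='AVAILABLE',
                    time_created=datetime(2020, 7, 6, 11, 46)),
    SimpleNamespace(display_name='2020-07-06_12:17', lifecycle_state='AVAILABLE',
                    time_created=datetime(2020, 7, 6, 12, 17)),
    SimpleNamespace(display_name='2020-07-06_12:19', lifecycle_state='CREATING',
                    time_created=datetime(2020, 7, 6, 12, 19)),
]
newest = latest_available(sample_backups)
print(newest.display_name if newest else 'No backups available')
```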
Running these two functions for the first time on our example fileserver:
    show_backups(instance.data.compartment_id, backupsetID)
    latest = latest_backup(instance.data.compartment_id, backupsetID)
    if latest is None:
        print('No backups available')
    else:
        print(f'Latest Backup: {latest.display_name}')

The output will be:
    $ ./instance.backup.py -s -i ocid1.instance.oc1.eu-frankfurt-1.xyz
    Created Volume Group smbserver_backup_group
    Available Backups:
    No backups available

Running the first backup:
    $ ./instance.backup.py -b -i ocid1.instance.oc1.eu-frankfurt-1.xyz
    Found Volume Group smbserver_backup_group
    Backup started: 2020-07-06_11:46

And checking the status again:
    $ ./instance.backup.py -s -i ocid1.instance.oc1.eu-frankfurt-1.xyz
    Found Volume Group smbserver_backup_group
    Available Backups:
     * 2020-07-06_11:46
    Latest Backup: 2020-07-06_11:46
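The command lines above use the flags -s, -b, -d, -k and -i. The argument handling itself is not shown in this post; the following is a minimal sketch of how it could look with argparse. The long option names, the default of 10 for the number of backups to keep, and the choice to make the three actions mutually exclusive are all assumptions made for this example.

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description='Volume group backups for a single instance')
    parser.add_argument('-i', '--instance', required=True,
                        help='OCID of the instance to work on')
    parser.add_argument('-k', '--keep', type=int, default=10,
                        help='number of backups to keep (assumed default: 10)')
    # exactly one action per run: show, backup or delete
    action = parser.add_mutually_exclusive_group(required=True)
    action.add_argument('-s', '--show', action='store_true',
                        help='list existing backups')
    action.add_argument('-b', '--backup', action='store_true',
                        help='start a new backup')
    action.add_argument('-d', '--delete', action='store_true',
                        help='delete all but the newest backups')
    return parser.parse_args(argv)

# Example: parse a delete command that keeps the newest 4 backups
args = parse_args(['-d', '-k', '4', '-i', 'ocid1.instance.oc1.eu-frankfurt-1.xyz'])
print(args.delete, args.keep, args.instance)
```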
Finally, after having created a few backups, we will want a way to delete old backups we no longer need. For automated backups using policies, you can choose from various options for your retention policy. Here, until this feature catches up to support Block Volume Groups, we'll have to implement these ourselves. For the purpose of this example, a simple "erase all but the newest n backups" should be fancy enough. You're welcome to enhance this on your own ;-)
First, a word of warning: Before deleting backups, make sure that the backups you keep are actually valid! Especially if you run a script like this at short intervals, say once per hour, you will quickly have many backups, leading you to believe that it's safe to delete all but the last 10, for example. However, if you have some systematic error on your server - say it won't boot anymore - that state will also be reflected in your backups. This means you may well end up with nothing but backups of a server that won't boot, because the error was introduced before the oldest backup you kept was made. So always check that a newer backup preserves a good state before deleting the older ones!
If we want to keep N backups, we could use a function like this:
    def do_deleteBackups(comp, setid, keep):
        backups = blockvolume.list_volume_group_backups(
            comp, volume_group_id=setid, sort_by="TIMECREATED", sort_order="ASC")
        goodbackups = []
        for b in backups.data:
            if b.lifecycle_state != 'TERMINATED':
                goodbackups.append(b)
        count = len(goodbackups)
        # the list is sorted oldest first, so the oldest backups are deleted first
        for b in goodbackups:
            if count > keep:
                print(f'Deleting backup: {b.display_name} ...', end='')
                result = blockvolume.delete_volume_group_backup(b.id)
                print('OK')
            else:
                print(f'Keeping backup: {b.display_name}')
            count -= 1
This will first collect all your "good" (non-terminated) backups, then go through that list and terminate all but the last "keep" ones.
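Since show_backups and do_deleteBackups apply the same "newest n" rule, the selection logic can be separated from the API calls. The sketch below shows one way to do that; `split_backups` is a made-up helper, and the sample objects stand in for the entries of `backups.data`, sorted oldest first as in the functions above.

```python
from types import SimpleNamespace

def split_backups(backups, keep):
    """Split non-terminated backups (sorted oldest first) into
    (to_delete, to_keep), keeping the newest `keep` entries."""
    good = [b for b in backups if b.lifecycle_state != 'TERMINATED']
    cut = max(len(good) - keep, 0)
    return good[:cut], good[cut:]

# Stand-in objects for illustration only.
names = ['2020-07-06_11:46', '2020-07-06_12:17', '2020-07-06_12:19',
         '2020-07-06_12:22', '2020-07-06_12:28']
sample = [SimpleNamespace(display_name=n, lifecycle_state='AVAILABLE') for n in names]

to_delete, to_keep = split_backups(sample, keep=4)
print([b.display_name for b in to_delete])
print([b.display_name for b in to_keep])
```

With a helper like this, the deletion function only needs to loop over the first list and call delete_volume_group_backup for each entry, and the decision can be checked without touching any real backups.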
Here's some sample output after we've made a few backups first:
    $ ./instance.backup.py -s -i ocid1.instance.oc1.eu-frankfurt-1.xyz
    Found Volume Group smbserver_backup_group
    Available Backups:
     - 2020-07-06_11:46
     * 2020-07-06_12:17
     * 2020-07-06_12:19
     * 2020-07-06_12:22
     * 2020-07-06_12:28
    Latest Backup: 2020-07-06_12:28

    $ ./instance.backup.py -d -k 4 -i ocid1.instance.oc1.eu-frankfurt-1.xyz
    Found Volume Group smbserver_backup_group
    Deleting all but the latest 4 backups...
    Deleting backup: 2020-07-06_11:46 ...OK
    Keeping backup: 2020-07-06_12:17
    Keeping backup: 2020-07-06_12:19
    Keeping backup: 2020-07-06_12:22
    Keeping backup: 2020-07-06_12:28
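As one possible enhancement of the "newest n" rule, retention could also be based on age: delete backups older than a given time span, but never touch the newest few. The following sketch is only an illustration of that idea and not part of the original script. `expired_backups` and its parameters are made-up names, and it assumes that each backup carries a timezone-aware `time_created` timestamp and that the list is sorted oldest first.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace

def expired_backups(backups, max_age, min_keep, now=None):
    """Return backups older than `max_age`, while always protecting
    the newest `min_keep` non-terminated backups from deletion."""
    now = now or datetime.now(timezone.utc)
    good = [b for b in backups if b.lifecycle_state != 'TERMINATED']
    candidates = good[:-min_keep] if min_keep > 0 else good
    return [b for b in candidates if now - b.time_created > max_age]

# Stand-in objects for illustration only, sorted oldest first.
utc = timezone.utc
sample_history = [
    SimpleNamespace(display_name=f'2020-07-{day:02d}', lifecycle_state='AVAILABLE',
                    time_created=datetime(2020, 7, day, tzinfo=utc))
    for day in (1, 5, 8, 9)
]
old = expired_backups(sample_history, max_age=timedelta(days=3), min_keep=2,
                      now=datetime(2020, 7, 10, tzinfo=utc))
print([b.display_name for b in old])
```

The warning above still applies: an age-based rule does not tell you whether the backups that remain are actually good.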
Restoring one of your backups is very simple. In the Console, just go to the Volume Group Backup which you want to use for your restore and select "Create Volume Group". A new Volume Group will be created for you, with a restored copy of all the volumes from the backup attached. This usually only takes a few seconds. To finalize the recovery, just spin up a new instance from the restored boot volume and attach the data volumes. If critical, you can even "reuse" the IP addresses of the original VM, although for that, you'll need to terminate it first.
After around 20 years working on SPARC and Solaris, I am now a member of A-Team, focusing on infrastructure on Oracle Cloud.