
Best Practices from Oracle Development's A‑Team

Making OCI Metrics Available in Oracle Management Cloud

Oracle Cloud Infrastructure (OCI) Monitoring Service provides a convenient way of accessing metric data for multiple OCI services (listed in the “Supported Services” section of the Monitoring Overview page). OCI customers can access metric data in several ways: in the OCI console, through the Grafana plug-in (for building dashboards that query and display monitoring data graphically), or programmatically through the SDKs and APIs. New features for OCI are being released at a very fast pace.

Oracle also provides another solution known as Oracle Management Cloud (OMC) which is a suite of integrated monitoring, management, and analytics cloud services. The offerings in the OMC suite provide real-time monitoring, alerting, rapid diagnostics, operational insight, and business analytics for customers’ enterprise and cloud IT landscape. 

OCI customers who also use OMC might want to access OCI metrics in OMC so that they have only one place to go for a unified view of metrics for their IT assets.

This blog presents a design outline for a solution to export metrics data out of OCI and upload/import into OMC. Please note that there are multiple integration approaches possible. The components that extract data out of OCI using OCI Monitoring Service APIs and upload data into OMC using OMC REST APIs are the crux of the solution and will need to be used by all the various approaches. These are the components that are the subject of this blog post. Depending on customers’ use-cases and preferences, there are various ways of packaging the export/import logic. For example, the following are some of the possible options:

  • A Serverless Function deployed on the OCI Functions platform and executed on demand/periodically.
  • Using OCI Streaming service and Publishers/Consumers for exporting/importing metrics data.
  • A program that runs as a daemon process and periodically extracts data from OCI and uploads it into OMC.

This blog uses Oracle Functions as the delivery mechanism. However, please keep in mind that this is not the only option. Because this is not a blog post about Oracle Functions, those details are out of scope. Readers are expected to have basic familiarity with OCI Monitoring, OCI Functions, OMC and Oracle Identity Cloud Service (IDCS – for OAuth authentication).

Let’s start by looking at the main components of the solution at a high level. We will describe each step in more detail in the remainder of the post:

  • We will use OMC REST APIs for uploading metrics data into OMC. OAuth is a better authentication choice than user name/password, so we first need to enable OAuth authentication for the OMC REST APIs, which can be done for the most part using the UI.
  • Define OMC entity types (if needed) and entities. Preexisting OMC entity types could also be used. Defining new entity types helps keep the OCI-related metrics data separate from other entity types. Note that the new entities will need to be licensed in OMC once they are created.
  • Develop an Oracle Serverless Function that uses the OCI Monitoring APIs to extract metric data from OCI and uses the OMC REST APIs to upload that data into OMC after processing and transforming it into the shape that OMC expects. As expected, most of the magic happens here and we will dedicate a good portion of this blog post to this step.
  • The function can be invoked using the “fn” CLI, the OCI CLI, or the OCI APIs. Schedule the execution of the function using your preferred scheduler to automate the entire solution.

The example that we are going to look at extracts the CPU utilization metrics of a host from the OCI Monitoring service. For CPU utilization and other host-based metrics, it is also possible to install an OMC agent on the host and load the metrics directly into OMC. However, there are use cases where an agent cannot be installed (for example OCI Load Balancers, DNS Zone Management, etc.) and the metrics are only available through the OCI Monitoring service, and in certain cases a customer might prefer not to install an agent on each of their hosts. CPU utilization is used here only as an illustration; the concepts for extracting other metrics remain the same.

Let’s start looking at these components one by one.

Enabling OAuth for OMC REST API Authentication:

OMC REST APIs support multiple ways of authentication along with OAuth as described here. OAuth is a better choice for scenarios like this where there is no interactive user session involved. OAuth authentication can be enabled via UI (for the most part) and the steps are described here. The documentation in its current form is missing a few details (We are working to get these included in the documentation.) which are as follows:

  • The Personal Access Token obtained in one of the earlier steps needs to be passed in the Authorization header when creating the OAuth client application. For example, the following header needs to be added to the cURL command line that creates the OAuth client app (Step 2):

-H 'Authorization: Bearer <Provide OAuth Access Token Here>'

  • The name of the OAuth client application needs to end in “_APPID”
  • If, even after completing all the steps carefully, OMC still returns 401 Unauthorized when presented with an access token obtained from IDCS, open the “OMCEXTERNAL_<OMC TENANT NAME>” application in IDCS (you may need to contact an IDCS Administrator for this step):
  • Open the WebTier Policy for this application. Go to the “Resources” section. Start editing the Resource for URI “/serviceapi/.*”.
  • Make sure that the Authentication Method for the Resource “/serviceapi/.*” is set to “multitoken” and not “http”, as shown in the screen capture below. If needed, make the change.
  • Wait for the cache sweep (about 10-15 minutes) before retrying the OMC REST access.

Here is a screen-capture of WebTier Policy Changes:

After completing these steps, you should be all set to be able to call OMC APIs by presenting an OAuth token generated by IDCS.
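For a non-interactive caller such as a Function, the token is typically requested from IDCS programmatically. Here is a minimal sketch of that request, assuming the OAuth client application is allowed to use the client-credentials grant and that IDCS issues tokens at its /oauth2/v1/token endpoint. The function names, the host name and the scope value are placeholders of mine and not taken from the OMC documentation; use the scope configured for your OAuth client.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(idcs_base_url, client_id, client_secret, scope):
    """Build (but do not send) a client-credentials token request for IDCS."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode("ascii")
    return urllib.request.Request(
        url=idcs_base_url.rstrip("/") + "/oauth2/v1/token",
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + basic,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_access_token(request, timeout=30):
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]


def omc_auth_header(access_token):
    """Header to attach to every OMC REST API call."""
    return {"Authorization": "Bearer " + access_token}
```

Keeping the request construction separate from the network call makes it easy to inspect the request before any secret leaves the Function.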

Let’s now move on to creating OMC entity types and entities.

For demo purposes, I created a new OMC entity type called “OCIHostType” to represent a host on OCI by following the OMC REST API documentation for Entity Type operations and an example here. Since this entity type represents a host on OCI, I created it with the metric definitions that are usually made available by OCI, for example “cpuUtilization”, “memoryUtilization”, “diskReadIO”, “diskWriteIO”, “networkTransmitBytes”, “networkReceiveBytes”, etc. I also defined some OCI attributes of a host as properties of the entity type, for example “hostOCID”, “compartmentId”, “faultDomain”, “imageId”, etc. These metric and property definitions should be self-explanatory. Please note that all OMC REST API calls need to be authenticated by providing an IDCS access token.

After defining the entity type, I created an entity for one of my hosts in OCI by following the OMC REST API documentation for Entity operations and an example here. Along with the entity, I also set the above defined properties for the host by getting the corresponding information from OCI. Please don’t forget to license your newly created entity in OMC.

Next up is creating the logic for extracting monitoring data from OCI using the OCI Monitoring Service APIs. To do this, you can either use the OCI SDKs provided by Oracle for some of the most popular languages or call the OCI REST APIs directly in a language of your choice. I am going to describe the steps using the Python SDK as an example, but the steps should be similar for other SDKs as well. The SDKs might include examples showing the code usage.

For Python SDK, the MonitoringClient API we need to call is:

summarize_metrics_data(compartment_id, summarize_metrics_data_details, **kwargs)

Here is a brief explanation of the arguments:

  • compartment_id – OCID of the compartment that contains the resource(s) for which we need to get metrics
  • summarize_metrics_data_details – Model object of type SummarizeMetricsDataDetails that carries details such as the metric namespace (for example, oci_computeagent for OCI Compute Instances), the start and end times of the metric data to be obtained, the metric query (for example, CpuUtilization[5m].max(), which gets the maximum CPU utilization for each 5 minute interval) and the resolution. Please refer to the OCI Monitoring Service Overview for more details on how to set these values.
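The two arguments above can be put together as follows. This is a sketch rather than the exact code from my Function: build_summarize_kwargs and fetch_metrics are helper names I made up for illustration, and the default query simply reuses the example from the list above. The oci import is done lazily so that the argument-building part can be tried without the SDK installed.

```python
from datetime import datetime, timedelta, timezone


def build_summarize_kwargs(start_time, end_time,
                           namespace="oci_computeagent",
                           query="CpuUtilization[5m].max()"):
    """Collect the fields that SummarizeMetricsDataDetails expects."""
    if start_time.tzinfo is None or end_time.tzinfo is None:
        raise ValueError("start_time and end_time must be timezone-aware")
    if start_time >= end_time:
        raise ValueError("start_time must be earlier than end_time")
    return {
        "namespace": namespace,
        "query": query,
        "start_time": start_time,
        "end_time": end_time,
    }


def fetch_metrics(monitoring_client, compartment_id, kwargs):
    """Call the Monitoring service and return the list of metric series."""
    import oci  # imported lazily; requires the OCI Python SDK

    details = oci.monitoring.models.SummarizeMetricsDataDetails(**kwargs)
    response = monitoring_client.summarize_metrics_data(compartment_id, details)
    return response.data


# Example: ask for the last hour of data.
end = datetime.now(timezone.utc)
example_kwargs = build_summarize_kwargs(end - timedelta(hours=1), end)
```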

Here is a sample of OCI compute host metric data:

[{
  "aggregated_datapoints": [
    {
      "timestamp": "2019-08-16T18:27:00+00:00",
      "value": 100.0
    },
    {
      "timestamp": "2019-08-16T19:27:00+00:00",
      "value": 100.0
    },
    {
      "timestamp": "2019-08-16T20:27:00+00:00",
      "value": 4.45445445445446
    },
    {
      "timestamp": "2019-08-16T21:27:00+00:00",
      "value": 4.404404404404405
    },
    …….<More entries like these>………….

  ],
  "compartment_id": "<Compartment OCID of the Compute Host>",
  "dimensions": {
    "availabilityDomain": "<Availability Domain of the Compute Host>",
    "faultDomain": "<Fault Domain of the Compute Host>",
    "imageId": "<Image OCID of the Compute Host>",
    "instancePoolId": "<Instance Pool of the Compute Host>",
    "region": "<Compute Host Region>",
    "resourceDisplayName": "<Compute Host Display Name>",
    "resourceId": "<Compute Host OCID>",
    "shape": "<Compute Host Shape>"
  },
  "metadata": {
    "displayName": "CPU Utilization",
    "maxRange": "100",
    "minRange": "0",
    "unit": "Percent"
  },
  "name": "CpuUtilization",
  "namespace": "oci_computeagent",
  "resolution": null
},
……………
{
……………….
}
]

All API calls to OCI must be authenticated and authorized. For example – OCI Functions can use OCI Resource Principals. In Python, one can obtain the Resource Principal Signer using the following code snippet:

ociResPrncplSigner = oci.auth.signers.get_resource_principals_signer()

and then obtain the Monitoring Service Client Object as follows:

ociMonitoringSvc = oci.monitoring.MonitoringClient(config={}, signer=ociResPrncplSigner)

The next step is to transform OCI metric data to OMC metric data by iterating through the OCI metric data set. Here is a sample mapping for the “OCIHostType” entityType that we defined earlier:

OMC Attribute Name | Value
entityType         | OCIHostType
entityName         | <Name of the defined entity of type “OCIHostType”>
collectionTime     | Timestamp from the aggregated data point, in UTC format (“YYYY-MM-DDTHH:mm:ss.SSSZ”)
metric             | <As defined in the entity type, for example cpuUtilization>
value              | Value from the aggregated data point
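The mapping can be sketched in a few lines of Python. The record keys follow the attribute names in the mapping above; the exact JSON envelope that the Upload Metrics API expects should be taken from the OMC documentation, so treat the shape below as an assumption. The input is a plain dictionary shaped like the OCI sample shown earlier (SDK model objects would need attribute access or a conversion to dictionaries first), and to_omc_timestamp and to_omc_records are illustrative names.

```python
from datetime import datetime, timezone


def to_omc_timestamp(oci_timestamp):
    """Format an OCI timestamp as UTC 'YYYY-MM-DDTHH:mm:ss.SSSZ'."""
    if isinstance(oci_timestamp, str):
        dt = datetime.fromisoformat(oci_timestamp)
    else:
        dt = oci_timestamp
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def to_omc_records(oci_series, entity_name,
                   entity_type="OCIHostType", metric="cpuUtilization"):
    """Turn one OCI metric series into one OMC record per data point."""
    records = []
    for point in oci_series["aggregated_datapoints"]:
        records.append({
            "entityType": entity_type,
            "entityName": entity_name,
            "collectionTime": to_omc_timestamp(point["timestamp"]),
            "metric": metric,
            "value": point["value"],
        })
    return records
```

Converting every timestamp to UTC before formatting keeps the output consistent even if a data point arrives with a non-zero offset.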

Once the OMC payload is prepared, we use the Upload Metrics OMC REST API to upload the metrics data to OMC. You can find a sample here. OMC should return a “202 Accepted” status on successful upload. The response should look similar to the sample below. It contains an “ecId” field and a “status” field; the status is typically “IN_PROGRESS”.

{
    "rejected": 0,
    "ecId": "08d0c1c4-788b-63a2-ac7a-af69b2b9f5e3",
    "selfLink": "/serviceapi/entityModel/uds/metrics/status/08d0c1c4-788b-63a2-ac7a-af69b2b9f5e3",
    "statusUri": "/serviceapi/entityModel/uds/metrics/status/08d0c1c4-788b-63a2-ac7a-af69b2b9f5e3",
    "loaded": 0,
    "errorMessage": "",
    "status": "IN_PROGRESS",
    "startTime": "2019-09-10T21:02:37.463Z",
    "count": 0,
    "message": "",
    "size": 85396
}

You can use the Get status for batch metric upload API to check the status of the upload. You will need to provide the ecId as part of the API URL.
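Because the upload is processed asynchronously, a caller usually polls the status until it leaves “IN_PROGRESS”. Below is a minimal polling sketch. The status path is the one shown in the “statusUri” field of the sample response; the only status value I rely on is “IN_PROGRESS”, since that is the one shown above, and any other value is simply handed back to the caller. fetch_status is a placeholder for whatever function performs the authenticated GET and returns the parsed JSON.

```python
import time


def status_url(omc_base_url, ec_id):
    """Build the status URL for a batch upload from its ecId."""
    return (omc_base_url.rstrip("/")
            + "/serviceapi/entityModel/uds/metrics/status/" + ec_id)


def wait_for_upload(fetch_status, ec_id, attempts=10,
                    delay_seconds=5.0, sleep=time.sleep):
    """Poll until the upload is no longer IN_PROGRESS, then return the reply."""
    for attempt in range(attempts):
        reply = fetch_status(ec_id)
        if reply.get("status") != "IN_PROGRESS":
            return reply
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(
        f"upload {ec_id} still IN_PROGRESS after {attempts} checks"
    )
```

Passing fetch_status and sleep in as arguments keeps the loop independent of the HTTP library and lets it be exercised without calling OMC.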

Here are two use-case examples of the OMC REST APIs that we used in the steps above. They describe the OMC API usage step by step in an end-to-end scenario.

As I mentioned above, the solution can be packaged in a variety of ways. My POC sample for this blog post used a Serverless Function developed using Python FDK deployed on OCI. If you are interested in building a Function for this use-case, you can get started by referring to the sample here.

Unless you are building a POC, the solution needs to be scheduled to run on an ongoing basis so that data is continuously uploaded from OCI into OMC. Customers can use their favorite scheduler for this. One issue with a scheduled solution such as this one is how to remember where the last run left off so that the next run can continue from that point (this information feeds the SummarizeMetricsDataDetails model object mentioned above). The end time of the previous run becomes the start time of the next run. Depending on the methodology, multiple options might be possible. For example, in the case of an OCI Function, we can use the update_function method of the FunctionsManagementClient class to store this state as a configuration parameter of the function, which can be updated from inside the Function code itself.
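Here is one way the state handling could look, as a sketch under a few assumptions: the state lives in the function configuration under a key I have arbitrarily called LAST_END_TIME, the first run looks back one hour, and the helper names are mine. Configuration values are strings, so the timestamp is stored in ISO 8601 form. Please verify whether update_function replaces or merges the configuration map in your SDK version; the sketch sends the complete map to be safe.

```python
from datetime import datetime, timedelta, timezone

STATE_KEY = "LAST_END_TIME"  # hypothetical configuration key


def next_window(config, now, default_lookback=timedelta(hours=1)):
    """Return (start, end) for this run based on the stored state."""
    last_end = config.get(STATE_KEY)
    if last_end:
        start = datetime.fromisoformat(last_end)
    else:
        start = now - default_lookback  # first run: no state stored yet
    return start, now


def updated_config(config, end_time):
    """Copy the configuration with the new end time recorded."""
    new_config = dict(config)
    new_config[STATE_KEY] = end_time.isoformat()
    return new_config


def persist_state(functions_client, function_id, new_config):
    """Store the configuration on the function itself."""
    import oci  # imported lazily; requires the OCI Python SDK

    details = oci.functions.models.UpdateFunctionDetails(config=new_config)
    functions_client.update_function(function_id, details)
```

Calling persist_state only after OMC has accepted the upload means a failed run is retried over the same window instead of leaving a gap.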

Once the data is in OMC, customers can use the data for various use cases similar to data obtained from other Cloud Agents, for example – to create alert rules and generate alerts. Here is a sample alert rule that I created:

After running some stress tests on a server, an alert similar to the following is generated and sent to the configured notification channels:

That completes our solution. We uploaded OCI metrics into OMC using a custom entity type (as mentioned above, customers can also use an entity type that is already available in OMC) and triggered an alert notification based on the uploaded data and the alert rule configured in OMC.
