Customers can use Oracle Cloud Infrastructure (OCI) Alarms to notify their IT staff about events in their environment that need attention. Some customers have a central service management system (also known as a ticketing system) where they track and manage all such events via alerts and tickets. I recently helped a customer with a solution for managing OCI alarms in their ServiceNow system, which is one of the popular service management systems. I think other customers may also find the use case valuable. Hence, in this blog post, I want to outline the design of the solution and provide configuration and implementation guidelines so that it is easy for others to build similar solutions. Here is a high-level architecture diagram of the solution.
The simple example use case that I am going to use for this post is this: we have an alarm set up for memory utilization going beyond a certain threshold on a server. This alarm notification should result in a ticket being created in ServiceNow. In the screen captures below you will find that the thresholds were set to a low value; that was done so that we could trigger the alarm and test the solution without having to run heavy workloads on the system. The threshold is just a number and can easily be set to an appropriate value in the alarm definition.
Let’s start with the major system components that come together and interact to make this solution possible.
With that brief introduction to the principal components of the solution, we are now ready to talk about the design. We used Oracle Functions for this integration between OCI alarms and ServiceNow. The design of the solution is quite simple. OCI triggers alarms either when thresholds defined in alarm definitions are violated or when metrics are absent, which indicates a resource that is down or unreachable. Here I am going to use alarms based on thresholds as an example. Alarm notifications are sent to a topic. A notification topic can have several subscriptions of different types. In this case, we used a Function subscription, so the Function is invoked by the OCI Notifications service. We need to build logic into the Function code to transform the OCI notification data into the form expected by ServiceNow and then invoke ServiceNow APIs to create and manage tickets there.
In the remaining part of the post I am going to describe the function implementation in detail. Rather than repeating the product documentation here, I will provide references to the relevant parts and focus mostly on the design elements of the solution.
An alarm message belongs to one of these four types: OK_TO_FIRING, FIRING_TO_OK, REPEAT and RESET. These message types are described here. Alarm message data depends on the message type. Message type data formats are described here. The documentation has an example of an alarm message. I have included some more examples below.
Following are some salient points about the alarm messages:
Please make a note of the “resourceDisplayName” (pointed to by a red arrow). We will see this value mapped to the “Node” field of the ServiceNow event. Also, the alarm body is mapped to the “Description” field of the event.
Please note that the “dedupeKey” is identical for both these notifications.
Having covered the basics of alarms and message types, we are now ready to talk about the Function that could be used for the integration.
Notification data is identical for all subscription types. For alarm notifications, the data is a serialized JSON object like what is shown above.
If you are new to Oracle Functions, here is a good quick-start tutorial to get you up to speed. The service documentation is here and more details on how they work are here.
Oracle Functions is based on the open source Fn Project. You can use multiple programming languages to develop your function code, and you have complete control over how you build the integration. You could also invoke OCI APIs from inside your function code for various use cases. For example, instead of storing sensitive information like passwords, tokens and other secrets in plaintext in either the Function configuration or environment variables, you could manage secrets in the OCI Vault service and read them using APIs, which is a much more secure approach. There is a nice blog post by my colleague Kiran Thakkar on the subject, complete with code examples.
I am going to use Python as the language of choice for this example. However, the concepts remain the same in other languages as well. In Python, the entry point of your Function is the handler method which has the following signature:
def handler(ctx, data: io.BytesIO = None):
As mentioned above, in the case of alarm notifications this input data is a serialized JSON object, which can be parsed into a Python dictionary using something like:
funDataStr = data.read().decode('utf-8')
funDataJSON = json.loads(funDataStr)
Once you have the JSON object you could use the usual JSON manipulation techniques to extract and analyze the contained information and take appropriate action.
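To make the extraction step a little more concrete, here is a minimal sketch of pulling a few fields out of the alarm message. The helper name parse_alarm and the sample payload are purely illustrative, and the field names ("type", "severity", "body", "dedupeKey", "alarmMetaData", "dimensions") are my assumption of the documented alarm message layout, so please verify them against the messages you actually receive.

```python
import io
import json


def parse_alarm(data: io.BytesIO) -> dict:
    """Decode the notification payload and pull out a few fields of interest."""
    message = json.loads(data.read().decode("utf-8"))
    # Collect the display name of every resource listed in the alarm metadata.
    resources = [
        dim.get("resourceDisplayName")
        for meta in message.get("alarmMetaData", [])
        for dim in meta.get("dimensions", [])
    ]
    return {
        "type": message.get("type"),
        "severity": message.get("severity"),
        "title": message.get("title"),
        "body": message.get("body"),
        "dedupe_key": message.get("dedupeKey"),
        "resources": resources,
    }


# Hypothetical sample payload, trimmed to the fields used above.
sample = {
    "dedupeKey": "example-key",
    "title": "High memory utilization",
    "body": "Memory utilization crossed the threshold",
    "type": "OK_TO_FIRING",
    "severity": "CRITICAL",
    "alarmMetaData": [
        {"dimensions": [{"resourceDisplayName": "server1"}]}
    ],
}
parsed = parse_alarm(io.BytesIO(json.dumps(sample).encode("utf-8")))
```

Inside the real handler you would call the same helper with the data argument that the Function receives.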
Functions also accept configuration parameters that are passed in the “ctx” parameter. For example, we can pass ServiceNow URL, User Id and Password as function’s configuration parameters so that we don’t need to hard-wire any of these inside function code. As mentioned above, we should store the User Id and Password in a vault and pass the OCIDs of these secrets into the function configuration. The function can then use OCI APIs to get the secrets from the vault. You will need to grant the function privileges for reading secrets from a particular vault using OCI IAM Policies. Here is a screen capture of the function’s configuration:
This data is available inside the function in a dictionary-like object:
ctxConfig = ctx.Config()
snowURL = ctxConfig['SNOW_URL']
snowUsrIDSec = ctxConfig['SNOW_USER_ID_SEC']
snowUsrPwdSec = ctxConfig['SNOW_USER_PWD_SEC']
Please note that “snowUsrIDSec” and “snowUsrPwdSec” are OCIDs of the corresponding secrets stored in an OCI vault. OCI provides APIs to read secrets from a vault, provided proper authorizations are in place. The Python SDK APIs are here. Kiran’s blog also has samples in Python.
For integration with ServiceNow, the information from the alarm notification data could be used to create a ServiceNow event. The attributes of the event could simply be derived by extracting and mapping (or transforming) the information contained in the alarm data. You have full freedom to create and enrich the ServiceNow event as per your use cases by using OCI APIs and the power that the programming environment provides. Here are a couple of examples of such mapping:
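As a rough illustration of the secret lookup, here is a sketch that uses the OCI Python SDK with resource principal authentication. The helper names read_secret and make_secrets_client are mine, and the sketch assumes the oci package is available in the function image and that the IAM policy mentioned earlier is in place. Kiran’s post remains the fuller reference.

```python
import base64


def read_secret(secrets_client, secret_ocid: str) -> str:
    """Return the decoded content of the current version of a secret."""
    bundle = secrets_client.get_secret_bundle(secret_ocid).data
    # The secret content comes back base64-encoded.
    encoded = bundle.secret_bundle_content.content
    return base64.b64decode(encoded).decode("utf-8")


def make_secrets_client():
    """Build a secrets client that authenticates as the function itself."""
    # Imported here so that read_secret can be exercised without the SDK.
    import oci

    signer = oci.auth.signers.get_resource_principals_signer()
    return oci.secrets.SecretsClient(config={}, signer=signer)
```

In the handler this could be used as read_secret(make_secrets_client(), snowUsrIDSec), and likewise for the password secret.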
OCI Severity Level | ServiceNow Severity Level |
CRITICAL | 1 |
WARNING | 2 |
ERROR | 3 |
INFO | 4 |
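The mapping could be expressed in code along these lines. This is only a sketch: the severity values mirror the table above, the “Node” and “Description” mappings follow what was described earlier, and the remaining event field names ("source", "message_key") as well as the fallback severity are assumptions on my part that you should adapt to your ServiceNow set-up.

```python
# OCI alarm severity -> ServiceNow severity, mirroring the table above.
SEVERITY_MAP = {"CRITICAL": "1", "WARNING": "2", "ERROR": "3", "INFO": "4"}


def build_event(alarm: dict) -> dict:
    """Map a parsed alarm message to the fields of a ServiceNow event."""
    # For simplicity, use only the first metadata entry and its first dimension.
    meta = (alarm.get("alarmMetaData") or [{}])[0]
    dims = (meta.get("dimensions") or [{}])[0]
    return {
        "source": "OCI Monitoring",  # assumed label for the event source
        "node": dims.get("resourceDisplayName", ""),
        # Unknown severities fall back to the lowest level (an assumption).
        "severity": SEVERITY_MAP.get(alarm.get("severity"), "4"),
        "description": alarm.get("body", ""),
        "message_key": alarm.get("dedupeKey", ""),
    }
```

Keeping the mapping in a plain dictionary makes it easy to adjust the severity levels without touching the rest of the function.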
Once you have constructed the event, it could be sent via an HTTP POST call to the “https://<Instance Name>.service-now.com/api/now/table/em_event” end-point exposed by ServiceNow. Please note that you need to provide appropriate authentication information along with the API call.
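Here is one way the POST call could look using only the Python standard library. It is a sketch that assumes HTTP Basic authentication is enabled for the ServiceNow user. The function names and the time-out value are illustrative, and error handling is left out.

```python
import base64
import json
import urllib.request


def build_event_request(snow_url: str, user_id: str, password: str,
                        event: dict) -> urllib.request.Request:
    """Prepare the POST request that creates an event in ServiceNow."""
    endpoint = snow_url.rstrip("/") + "/api/now/table/em_event"
    # Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send_event(request: urllib.request.Request, timeout: int = 10) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the mapping logic easy to test without calling a live ServiceNow instance.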
Please keep in mind that the purpose of this post is to demonstrate the rich integration capabilities that OCI Notifications and Functions provide for managing and servicing your alarms in a service management system like ServiceNow rather than prescribing any API on the ServiceNow side or any particular mapping/transformation for creating the ServiceNow event.
For advanced integration scenarios, you could also keep track of (in a cache) what individual resources the alarm has fired for so that when the alarm switches back to OK, appropriate CLEAR events could be created in ServiceNow and information correlated.
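A very small sketch of such tracking follows. The class name FiringTracker is hypothetical, the "dedupeKey", "type" and "resourceDisplayName" field names are assumed from the alarm message format, and the in-memory dictionary is only a stand-in: function invocations should not be relied upon to share memory, so a real implementation would need an external store.

```python
class FiringTracker:
    """Remember which resources each alarm has fired for."""

    def __init__(self):
        # Maps the alarm dedupeKey to the set of resource names seen firing.
        self._firing = {}

    @staticmethod
    def _resources(alarm: dict) -> set:
        return {
            dim["resourceDisplayName"]
            for meta in alarm.get("alarmMetaData", [])
            for dim in meta.get("dimensions", [])
            if "resourceDisplayName" in dim
        }

    def record(self, alarm: dict) -> list:
        """Update the cache; return resources needing a CLEAR event, if any."""
        key = alarm.get("dedupeKey")
        msg_type = alarm.get("type")
        if msg_type == "FIRING_TO_OK":
            # The alarm is back to OK: everything tracked for it can be cleared.
            return sorted(self._firing.pop(key, set()))
        if msg_type in ("OK_TO_FIRING", "REPEAT"):
            self._firing.setdefault(key, set()).update(self._resources(alarm))
        return []
```

When record returns a non-empty list, the function could create one CLEAR event per resource in ServiceNow.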
Let’s look at how the whole integration is wired. Since a picture is worth a thousand words, I will let the screen captures speak:
I have already provided a screen capture of the Function definition above.
When the alarm fires, a notification is published to the topic, which in turn triggers the Function. If the Function is set up properly, it will create an event in ServiceNow. Following are the screen captures from ServiceNow:
That completes our design and configuration/implementation of the use-case.
Hopefully this will enable you to manage your OCI alarms in a centralized way in your service management systems, if your use-cases ask for it.