Generative AI-based applications have quickly gained popularity, with many companies integrating large language models (LLMs) into their operations to enhance customer value. As AI becomes more central to business processes, protecting generative AI endpoints is essential not only for mitigating financial risk but also for safeguarding brand reputation and customer trust. Security threats against AI services, such as adversarial attacks, data poisoning, and unauthorized access, are on the rise. Refer to the OWASP Top 10 for Large Language Model Applications and MITRE ATLAS to learn about the different threat vectors that affect LLMs.
In this blog post, we discuss strategies to protect generative AI endpoints and secure your network. By using OCI Web Application Firewall (WAF), you can protect your applications against the OWASP Top 10 web attacks, Common Vulnerabilities and Exposures (CVEs), and malicious bot attacks.

Let’s quickly walk through the flow and implementation steps.
- Steps 1 and 2: The user request goes to a load balancer associated with OCI WAF. All traffic is inspected at this layer, and malicious or unauthorized traffic is blocked at this point. Refer to this blog post authored by my colleague on setting up WAF rules.
- Step 3: The allowed traffic is then sent to a private API gateway, where the user is authenticated and authorized using either a custom authorizer function or native OAuth validation.
- Step 4: A custom authorizer function lets you write the authorization logic in any of the supported languages, such as Python, Java, and Go, and implement OAuth, SAML, or other authentication mechanisms. Refer to this page for additional details on writing custom authorizer functions.
- Steps 5 and 6: Backend route using OCI Functions – this function acts as an additional layer to filter content, apply advanced prompt engineering techniques, and then query the LLM deployed on the OCI Generative AI service.
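To make Step 4 concrete, here is a minimal sketch of custom authorizer logic in Python. It is deliberately not a full OAuth or SAML implementation: it checks a bearer token against a hypothetical shared secret (`EXPECTED_TOKEN` and `validate_token` are illustrative placeholders for your real token validation, such as verifying a JWT against your identity provider) and returns the `active`/`principal`/`scope`/`expiresAt` response shape that an API Gateway authorizer function is expected to produce.

```python
import hmac
import json
from datetime import datetime, timedelta, timezone

# Hypothetical shared secret; in practice, validate a JWT or call your IdP.
EXPECTED_TOKEN = "s3cr3t-demo-token"

def validate_token(token: str) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(token, EXPECTED_TOKEN)

def authorize(request_body: str) -> dict:
    """Validate the caller's token and build the authorizer response."""
    data = json.loads(request_body)
    token = data.get("token", "").removeprefix("Bearer ").strip()
    if validate_token(token):
        expires = datetime.now(timezone.utc) + timedelta(minutes=60)
        return {
            "active": True,
            "principal": "demo-user",
            "scope": ["genai:chat"],
            "expiresAt": expires.isoformat(),
        }
    return {"active": False, "wwwAuthenticate": 'Bearer realm="genai"'}

print(authorize(json.dumps({"token": "Bearer s3cr3t-demo-token"}))["active"])  # True
```

In a deployed function, this logic would sit inside the function handler, with API Gateway passing the request token in the invocation payload.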
Below is example Python code that you can use to query the Cohere large model. Ensure that the API gateway has permission to invoke functions and that the functions have access to the Generative AI service.
Policy examples:
ALLOW dynamic-group fn-group-name TO use generative-ai-family IN TENANCY
ALLOW dynamic-group api-gw-group-name TO manage functions-family IN TENANCY
import oci
compartment_id = "ocid1.compartment.oc1..xxx"
CONFIG_PROFILE = "DEFAULT"
config = {}
signer = oci.auth.signers.get_resource_principals_signer()
# Avoid shadowing the built-in input() by using a descriptive variable name
user_input = "how do I use object Storage service with CLI"
# Service endpoint
endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    signer=signer,
    service_endpoint=endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240),
)
chat_detail = oci.generative_ai_inference.models.ChatDetails()
chat_request = oci.generative_ai_inference.models.CohereChatRequest()
chat_request.message = user_input
chat_request.max_tokens = 600
chat_request.temperature = 1
chat_request.frequency_penalty = 0
chat_request.top_p = 0.75
chat_request.top_k = 0
# Initialize chat history
chat_history = []
# Add first interaction to chat history
previous_chat_message_1 = oci.generative_ai_inference.models.CohereUserMessage(message="tell me about oci")
previous_chat_reply_1 = oci.generative_ai_inference.models.CohereChatBotMessage(message="OCI stands for Oracle Cloud Infrastructure. It's a suite of cloud services offered by Oracle Corporation for businesses to build, deploy, and manage resources in the cloud. Oracle Cloud Infrastructure provides the foundational services needed to run applications in the cloud, including computing, storage, and networking, along with various high-level services for AI, machine learning, databases, and more.\n\nHere's a breakdown of some key aspects of Oracle Cloud Infrastructure:\n1. Compute Service: This service allows users to provision and manage virtual machines (VMs) or \"instances\" that can run applications. These instances can be quickly scaled up or down based on demand.\n2. Storage Service: OCI offers block storage, which can be attached to instances, as well as object storage for large-scale, scalable data storage. It also provides archival storage for long-term data retention at a low cost.\n3. Virtual Networking: Users can create and manage virtual networks to enable communication between instances, load balancing, and network security.\n4. Autonomous Database: This is a managed database service that uses machine learning to automate many traditional database management tasks, such as provisioning, scaling, backups, and security.\n5. Container Engine: Oracle's container service allows developers to deploy, manage, and scale Docker containers in the cloud.\n6. Functions: This is a serverless computing service that lets developers write and deploy code without managing infrastructure.\n7. Load Balancing: OCI provides load balancing capabilities to distribute incoming application traffic across multiple instances, ensuring high availability and fault tolerance.\n8. Security: Oracle Cloud Infrastructure offers various security features, including identity and access management, encryption, and network security rules.\n9. 
Monitoring and Management: OCI provides tools for monitoring resources, managing performance, and visualizing cloud usage.\n10. Integration with Other Oracle Services: OCI works seamlessly with other Oracle services like Oracle Functions, Autonomous Database, and Oracle Application Container Cloud.\n\nOracle Cloud Infrastructure is designed to handle enterprise-level workloads and is known for its scalability, reliability, and security. Oracle targets OCI at organizations looking to move their existing workloads to the cloud or develop new cloud-native applications. They offer a range of pricing plans and support options to cater to different customer needs.")
chat_history.extend([previous_chat_message_1, previous_chat_reply_1])
# Add second interaction to chat history
previous_chat_message_2 = oci.generative_ai_inference.models.CohereUserMessage(message="tell me about its sdk")
previous_chat_reply_2 = oci.generative_ai_inference.models.CohereChatBotMessage(message="Oracle Cloud Infrastructure (OCI) provides Software Development Kits (SDKs) that allow developers to interact programmatically with various services offered by Oracle Cloud. These SDKs are designed to make it easier for developers to build, deploy, and manage resources in the Oracle Cloud using applications developed in various programming languages.\n\nOracle Cloud Infrastructure SDK is available in multiple programming languages, including:\n1. Python: The Python SDK allows Python developers to integrate their applications seamlessly with Oracle Cloud Services. It provides a range of client libraries for compute, storage, networking, and other services. Developers can use this SDK to create, configure, and manage resources in their Oracle Cloud environment.\n2. Java: For Java developers, the Java SDK offers a collection of libraries and tools to interact with OCI services. It enables developers to write applications that provision and manage resources, handle storage, and perform other cloud operations.\n3. Go: The Go SDK enables developers who prefer the Go language to work with Oracle Cloud services. It provides packages for the core compute, storage, and network services, making it easier to develop applications that leverage OCI.\n4. JavaScript/TypeScript: The JavaScript/TypeScript SDK allows developers to integrate Oracle Cloud services into Node.js applications. This SDK is useful for developing serverless applications, web services, or front-end applications that interact with OCI.\n5. Ruby: Ruby developers can use the Ruby SDK to interface with Oracle Cloud. This SDK gives them the ability to manage resources, provision instances, and interact with other OCI services.\n6. .NET: The .NET SDK enables C# and VB.NET developers to work with OCI services. 
It includes client libraries for core cloud services and helps developers build .NET applications that leverage the cloud infrastructure.\n\nThese SDKs are designed to abstract the complexities of interacting directly with the Oracle Cloud API endpoints, making it simpler for developers to integrate and build applications on top of Oracle Cloud Infrastructure. They provide convenient methods and data models that align with the respective programming languages, making the code more readable and easier to write.\n\nIn addition to these language-specific SDKs, Oracle also provides CLI (Command Line Interface) tools that allow users to manage their Oracle Cloud resources from the command line, which can be useful for scripting and automation tasks.\n\nDevelopers can download the appropriate SDK from the Oracle Cloud Infrastructure SDK repository on GitHub, where they can also find detailed documentation, code samples, and instructions for using the SDKs. These SDKs are regularly updated to keep pace with the latest changes and additions to the Oracle Cloud Infrastructure services.")
chat_history.extend([previous_chat_message_2, previous_chat_reply_2])
# Assign the accumulated chat history to the chat request
chat_request.chat_history = chat_history
# Set chat details and make the request
chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceyawk6mgunzodenakhkuwxanvt6wo3jcpf72ln52dymk4wq")
chat_detail.chat_request = chat_request
chat_detail.compartment_id = compartment_id
chat_response = generative_ai_inference_client.chat(chat_detail)
# Print result
print("**************************Chat Result**************************")
print(chat_response.data)
Note: In the example code, we send chat history to build context for the LLM; this needs to be handled in your client application.
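Because every character of chat history is re-sent on each inference, an unbounded history inflates both latency and cost. One simple client-side strategy is to keep only the most recent messages under a character budget before building the `chat_history` list. This is a minimal sketch, not part of the OCI SDK; the dict shape and the 6,000-character budget are illustrative:

```python
def trim_history(history: list[dict], max_chars: int = 6000) -> list[dict]:
    """Keep the most recent messages whose combined length stays under
    max_chars; the oldest turns are dropped first."""
    trimmed: list[dict] = []
    total = 0
    for msg in reversed(history):  # walk newest-first
        total += len(msg["message"])
        if total > max_chars:
            break
        trimmed.append(msg)
    return list(reversed(trimmed))  # restore chronological order

history = [
    {"role": "USER", "message": "tell me about oci"},
    {"role": "CHATBOT", "message": "OCI stands for Oracle Cloud Infrastructure..." * 50},
    {"role": "USER", "message": "tell me about its sdk"},
    {"role": "CHATBOT", "message": "short reply"},
]
print(len(trim_history(history, max_chars=100)))  # → 2: only the newest short turns survive
```

The surviving dicts would then be converted to `CohereUserMessage`/`CohereChatBotMessage` objects before being assigned to `chat_request.chat_history`.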
Did you know?
Each inference against the Generative AI service triggers costly computation. For example, with the OCI large Cohere model, a single inference of 6,000 characters (including chat history), repeated 1 million times, can cost around $12,000. If these AI endpoints aren't protected, you could face substantial bills.
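The back-of-the-envelope arithmetic behind that figure can be sanity-checked in a few lines. Note that the per-character rate below is an assumption reverse-engineered from the example itself ($12,000 for 1 million calls of 6,000 characters each); consult the current OCI Generative AI price list for real rates:

```python
# Assumed rate, inferred from the example: $12,000 / (1,000,000 calls * 6,000 chars)
price_per_char = 12_000 / (1_000_000 * 6_000)  # = $0.000002 per character

chars_per_call = 6_000   # prompt plus accumulated chat history, re-sent every call
calls = 1_000_000        # e.g., an unprotected endpoint hammered by a bot

cost = price_per_char * chars_per_call * calls
print(f"Estimated cost: ${cost:,.0f}")  # Estimated cost: $12,000
```

The takeaway is the multiplier: because chat history is re-sent with every request, both the per-call character count and an attacker's request volume scale your bill linearly.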
In this blog post, we explored how network perimeter protection strategies can be effectively applied to generative AI-based applications within OCI environments. The architecture described in this blog can be further extended to build and enhance your custom applications around Generative AI services on OCI.
For more insights into network perimeter protection and generative AI security, be sure to check out other posts in the OCI Security Blog Channel.
