Introduction

Oracle Cloud Infrastructure (OCI) Vision is a serverless, multi-tenant service, that performs deep-learning–based image analysis at scale. It is one of the services that come under the OCI AI Services. 

OCI Vision helps you to identify and locate objects, extract text, and identify tables, document types, and key-value pairs from business documents. There are prebuilt models available out of the box, and developers don’t need ML expertise to work with them.

This service is accessible, using the Console, REST APIs, OCI SDK, or OCI CLI.
 
OCI Vision’s features are split into Document AI for document-centric images, and Image Analysis for object and scene-based images. 
Document AI comes with the following pre-trained models.

  • Optical Character Recognition (OCR): Vision can detect and recognize text in a document.
  • Document classification: Vision can classify a document, for example, whether the document is a tax form, an invoice, or a receipt.
  • Language classification: Vision classifies the language of a document based on its visual features.
  • Table extraction: Vision extracts content in tabular format, maintaining the row and column relationships of cells.
  • Key-value extraction: Vision identifies values for common fields in receipts.
  • Optical Character Recognition (OCR) PDF: Vision generates a searchable PDF file in your Object Storage.

For this solution, we are using the Key-value extraction model in Document AI.  

OCI Vision can automatically identify from a document of type Receipt, the fields like  Merchant Name, Merchant Phone Number, Merchant Address, Transaction Date, Transaction Time, Total, Subtotal, Tax, Tip, Item Name, Item Price, Item Quantity, Item Total Price,  and return them as a key-value pair. 

Receipts can be in the following formats: JPEG, PDF, PNG, and TIFF.

Below image, shows a sample extraction of key-value pairs from a Receipt.

 

Vision Receipt

 

In this blog, I will show how the creation of expense reports can be automated using OCI Vision.
Fusion Expenses use case is chosen in this blog, since it is familiar to many people; however the code and patterns demonstrated here can be used for many other use cases. As the code sample is only a prototype to show the use of OCI Vision, it considers only the minimal data required for creating an expense report.

 

Architecture Diagram

The following diagram illustrates the logical flow for this reference architecture.

Architecture
 


Solution Components

The architecture uses OCIVision, OCI Functions,  OCI Events, OCI Object Storage, OCI Vault and Oracle Visual Builder.

The Flow

The following section describes the flow in this architecture.

  1. Use the camera/File Picker option in the mobile application to upload images of the receipts.
  2. When Submit button in the mobile application is clicked, OCI Object Storage REST APIs are invoked to store the receipt images in an Object Storage bucket.
  3. The creation of objects in the bucket triggers an OCI Event Rule that invokes an OCI Function. 
  4. OCI Function calls the analyze_document API of OCI Vision SDK for each image in the bucket. analyze_document API generates json response containing values of fields like Transaction Date, Total, Merchant Name from the receipt images. Function reads & parses this json response to get the key-value pairs. It then calls the Fusion SaaS Expenses REST APIs to create expense reports by passing those key-value pairs. To call the Expenses REST APIs, the user credentials of a user with Expense Manager role is used. These credentials are obtained from OCI Vault.

Detailed steps are given below.

Step 1. Capture Receipt images

Mobile application using Oracle Visual Builder

 

The mobile application to capture the receipt images is a Progressive web application (PWA) developed in Oracle Visual Builder. Users can click the photos of receipts using the camera or select images from the gallery and submit them for processing.

 

Mobile app

 

When Submit button is clicked, images get stored in a bucket in OCI Object Storage. The additional details required for expense creation like expense purpose, expense currency, PersonID of the user who submits the expenses get stored as a file, metadata.json in the same bucket.

 

Oracle Visual Builder uses a service connection to OCI Object Storage for storing files in bucket.


The service connection is to the PutObject endpoint of the OCI Object Storage API.

Object Storage Server

 

PutObject

 


The Oracle Visual Builder Action Chain for Submit button calls the Object Storage PutObject API to store the captured receipt images in the bucket. Once all images are stored in bucket, Action Chain calls the Object Storage’s PutObject endpoint to store metadata.json, in the same bucket. metadata.json file will have values like the one shown below. 

{“ReimbursementCurrencyCode”:”USD”,”Purpose”:”Customer Visit”,”PersonId”:”123552323″}            

Refer this link for details on how to use Visual Builder for uploading files to OCI Object Storage.

Step 2. Store captured images for further processing

OCI  Object Storage


OCI Vision supports documents from a local file or the ones stored in Object Storage. There is 1 bucket used in this sample – receipts-bucket and it stores the receipt images and metadata.json file.

This bucket has ‘Emit Object Events’ property enabled.

Step 3. Trigger OCI Functions 

OCI Events 

 

There are rules defined in Events to trigger a Function when objects get created in receipts-bucket.

 

Event Rules


Step 4. Call OCI Vision SDK to extract receipt content and create expense reports.

OCI Functions

One Function is used in this solution, and it performs the following tasks.

  • Analyze the receipt images in receipts-bucket using Vision SDKs.
  • Gets the key values like transaction date, total amount, merchant name, etc from document analysis result.
  • Gets the SaaS expense admin user credentials from OCI Vault. This admin user credentials is already stored in the Vault with secret name as expense_admin.
  • Create Expense header, lines, and attachments using the extracted values by calling Fusion SaaS REST APIs.

Application Configuration

OCI Function Application has the following configuration keys. 

Function Application Configuration

 

Application Config

 

Function Code

Initialization and the handler methods

 

import base64
import io
import json
import logging
import os
import re
from datetime import datetime
import oci.object_storage
import pandas as pd
import requests
from fdk import response
try:
   signer = oci.auth.signers.get_resource_principals_signer()
   object_storage_client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
   if os.getenv("COMPARTMENT_OCID") is not None:
      compartment_ocid = os.getenv('COMPARTMENT_OCID')
   else:
      raise ValueError("ERROR: Missing configuration key  COMPARTMENT_OCID ")
   if os.getenv("SAAS_API") is not None:
      saas_api = os.getenv('SAAS_API')
   else:
      raise ValueError("ERROR: Missing configuration key  SAAS_API ")
   if os.getenv("NAMESPACE_NAME") is not None:
      namespace = os.getenv('NAMESPACE_NAME')
   else:
      raise ValueError("ERROR: Missing configuration key  NAMESPACE_NAME ")
   if os.getenv("INPUT_STORAGE_BUCKET") is not None:
      input_storage_bucket = os.getenv('INPUT_STORAGE_BUCKET')
   else:
      raise ValueError("ERROR: Missing configuration key  INPUT_STORAGE_BUCKET ")
   if os.getenv("VAULT_OCID") is not None:
      vault_ocid = os.getenv('VAULT_OCID')
   else:
      raise ValueError("ERROR: Missing configuration key  VAULT_OCID ")
except Exception as e:
   logging.getLogger().error(e)
   raise
# Handler method
def handler(ctx, data: io.BytesIO = None):
   try:
      process_receipts()
      
   except Exception as handler_error:
      logging.getLogger().error(handler_error)
      return response.Response(
         ctx,
         status_code=500,
         response_data="Processing failed due to " + str(handler_error)
      )
   return response.Response(
      ctx,
      response_data="success"
   )


 

The code snippet below shows the Function code, that calls analyze_document  API of Vision Python SDK to invoke the KEY_VALUE_DETECTION model. This API returns a Response object with data of type AnalyzeDocumentResult. This result contains the extracted key-values along with other information like confidence percentage, bounding_polygons, etc.

def analyze_using_vision(object_name):
   ai_vision_client = oci.ai_vision.AIServiceVisionClient({}, signer=signer)
   analyze_response = ai_vision_client.analyze_document(
      analyze_document_details=oci.ai_vision.models.AnalyzeDocumentDetails(
         document=oci.ai_vision.models.ObjectStorageDocumentDetails(
            source="OBJECT_STORAGE",
            namespace_name=namespace,
            bucket_name=input_storage_bucket,
            object_name=object_name),
         features=[
            oci.ai_vision.models.DocumentKeyValueDetectionFeature(
               feature_type="KEY_VALUE_DETECTION")
         ],
         compartment_id=compartment_ocid,
         language="ENG",
         document_type="RECEIPT"
      ))
   return str(analyze_response.data.pages)

Here is a part of the json generated by analyze_document API.

 

[{
  "detected_document_types": null,
  "detected_languages": null,
  "dimensions": {
    "height": 747.0,
    "unit": "PIXEL",
    "width": 360.0
  },
  "document_fields": [
    {
      "field_label": {
        "confidence": 0.9999961,
        "name": "MerchantName"
      },
      "field_name": null,
      "field_type": "KEY_VALUE",
      "field_value": {
        "bounding_polygon": {
          "normalized_vertices": [
            {
              "x": 0.19444444444444445,
              "y": 0.0428380187416332
            },
            {
              "x": 0.7942141215006511,
              "y": 0.0428380187416332
            },
            {
              "x": 0.7942141215006511,
              "y": 0.0583598482880088
            },
            {
              "x": 0.19444444444444445,
              "y": 0.0583598482880088
            }
          ]
        },
        "confidence": null,
        "text": null,
        "value": "FISH & CHIPS RESTAURANT",
        "value_type": "STRING",
        "word_indexes": [
          0,
          1,
          2,
          3
        ]
      }
    },
    {
      "field_label": {
        "confidence": 0.9999987,
        "name": "MerchantAddress"
      },
      "field_name": null,
      "field_type": "KEY_VALUE",
      "field_value": {
        "bounding_polygon": {
          "normalized_vertices": [

 

Following code is to get the metatadata content from metadata.json , admin user credentials from OCI Vault and for processing  the receipts.

 

# Process images in the bucket
def process_receipts():
   # Get the list of objects in input bucket
   objecs_list = object_storage_client.list_objects(namespace, input_storage_bucket)

   admin_credential = get_secret_from_vault("expense_admin")

   try:
      object_names = [b.name for b in objecs_list.data.objects]
      expense_metadata = get_metadata_object_content()
      expense_report_id = create_expense_header(expense_metadata, admin_credential)

      for x in object_names:
         object_name = str(x)
         if object_name != "metadata.json":
            # Analyze all files other than metadata.json in object storage
            vision_json = analyze_using_vision(object_name)
            # Create lines and attachments corresponding to each imnage in the bucket

            create_expense_lines_and_attachments(vision_json, object_name, expense_report_id, expense_metadata,
                                        admin_credential)
   except Exception:
      raise e

# Get the contents in metadata.json
def get_metadata_object_content():
   metadata_object = object_storage_client.get_object(namespace, input_storage_bucket, "metadata.json")
   if metadata_object.status == 200:
      content = metadata_object.data.text
      return content

# This method is to get the secret content stored in vault using the secret name
def get_secret_from_vault(vault_secret_name):

   # get the secret client
   client = oci.secrets.SecretsClient({}, signer=signer)

   # Read the secret content
   secret_content = client.get_secret_bundle_by_name(secret_name=vault_secret_name,
                                         vault_id=vault_ocid).data.secret_bundle_content.content
   decrypted_secret_content = base64.b64decode(secret_content).decode("utf-8")
   return decrypted_secret_content
def create_expense_header(expense_metadata, admin_credential):
   expense_header = json.loads('''{

      "targetRestApi": "''' + saas_api + '''",
      "targetRestApiOperation": "POST",
      "targetRestApiPayload": ''' + expense_metadata + '''
       ,
      "targetRestApiHeaders": {

         "Content-Type": "application/vnd.oracle.adf.resourceitem+json",
         "Authorization": "''' + admin_credential + '''"
      }
   }''')
   response_content = execute_rest_api(expense_header)
   api_call_response = json.loads(response_content['response_text'])
   expense_report_id = api_call_response['ExpenseReportId']
   logging.getLogger().info("expense_report_id", expense_report_id)
   return expense_report_id


Following is a code snippet that parses the  analyze_document  response using pandas library. 
These extracted values are later used to create Expense Header and Lines. Function also gets the actual receipt images stored in the OCI Object Storage bucket and attaches them to respective Expense lines.

def create_expense_lines_and_attachments(vision_json, input_file_name, expense_report_id, expense_metadata,
                               admin_credential):
   try:
      # Parse the vision json
      data = json.loads(vision_json)
      df = pd.DataFrame.from_dict(data)
      temp_dict = {}
      counter = 0
      for entry in df['document_fields'][0]:
         temp_dict[counter] = entry
         counter += 1
      new_dataframe = pd.DataFrame.from_dict(temp_dict, orient='index')
      total = 0
      merchant_name = ""
      formatted_date = ""
      # Get the key-value section
      for ind in new_dataframe.index:
         if new_dataframe['field_type'][ind] == 'KEY_VALUE':
            fieldLabel = new_dataframe['field_label'][ind]
            filedValue = new_dataframe['field_value'][ind]
            if str(fieldLabel['name']) == "TransactionDate":
               transaction_date = str(filedValue['value'])
               try:
                  formatted_date = str(datetime.strptime(transaction_date, '%m/%d/%Y').date())
               except ValueError as ve:
                  formatted_date = str(datetime.strptime(transaction_date, '%m/%d/%y').date())
               except Exception :
                  pass
            if str(fieldLabel['name']) == "MerchantName":
               merchant_name = str(filedValue['value'])
            if str(fieldLabel['name']) == "Total":
               total = filedValue['value']
      total_numeric = re.findall('\d*\.?\d+', total)
      metadata_values = json.loads(expense_metadata)
      currency_code = metadata_values["ReimbursementCurrencyCode"]
      # Construct the API
      expense_lines = '''{
                            "targetRestApi": "''' + saas_api +  str(
         expense_report_id) + '''/child/Expense",
                            "targetRestApiOperation": "POST",
                            "targetRestApiPayload": {
            "ExpenseType": "Breakfast",
             "ReceiptAmount": ''' + str(total_numeric[0]) + ''',
            "ReceiptDate": "''' + formatted_date + '''",
            "ReceiptCurrencyCode": "''' + currency_code + '''",
            "ReimbursableAmount": ''' + str(total_numeric[0]) + ''',
             "MerchantName":"''' + merchant_name + '''",  
            
            "ReimbursementCurrencyCode": "''' + currency_code + '''"
        },
                            "targetRestApiHeaders": {
                                "Content-Type": "application/vnd.oracle.adf.resourceitem+json",
                               "Authorization": "''' + admin_credential + '''"
                            }
                        }'''

      api_response = execute_rest_api(json.loads(expense_lines))
      api_response_body = json.loads(api_response['response_text'])
      expense_report_line_id = api_response_body['ExpenseId']
      input_file = object_storage_client.get_object(namespace, input_storage_bucket, input_file_name)

      if input_file.status == 200:
         file_contents = input_file.data.content

         encoded_file_contents = base64.b64encode(file_contents)

         attachments = json.loads('''{
                            "targetRestApi": "''' + saas_api + str(
            expense_report_id) + '''/child/Expense/''' + str(expense_report_line_id) + '''/child/Attachments",
                            "targetRestApiOperation": "POST",
                            "targetRestApiPayload":
            {
        "DatatypeCode" : "FILE",
        "FileName": "''' + input_file_name + '''",
        "Title": "attachment",
        "Description": "attachment",
       "FileContents":"''' + str(encoded_file_contents, 'utf-8') + '''"

        },
                            "targetRestApiHeaders": {

                                "Content-Type": "application/vnd.oracle.adf.resourceitem+json",
                                "Authorization": "''' + admin_credential + '''"
                            }
                        }''')

         execute_rest_api(attachments)

   except Exception as lines_and_attachments_ex:
      logging.getLogger().error("Error occured in create_lines_and_attachments", str(lines_and_attachments_ex))

# Common method for executing any REST API
def execute_rest_api(content):
   target_rest_api = content["targetRestApi"]
   target_rest_api_operation = content["targetRestApiOperation"]
   target_rest_api_payload = content["targetRestApiPayload"]
   target_rest_api_headers = content["targetRestApiHeaders"]
   if target_rest_api_operation == 'POST':

      api_call_response = requests.post(target_rest_api, data=json.dumps(target_rest_api_payload),
                                headers=target_rest_api_headers)
   elif target_rest_api_operation == 'PUT':

      api_call_response = requests.put(target_rest_api, data=json.dumps(target_rest_api_payload),
                               headers=target_rest_api_headers)

   content['response_text'] = api_call_response.text
   content['status_code'] = api_call_response.status_code
   logging.getLogger().info("api response ", api_call_response.text)

   return content

 


Inserting expense record into Oracle Fusion SaaS

Expenses are created in Saved state, by using the Fusion SaaS REST APIs. 3 APIs are used,

Conclusion

  Business document processing can be a time consuming and labour intensive process.I hope this blog has given you some ideas on how you can use OCI Vision with other cloud native services and Oracle VIsual Builder to provide solutions in automating document processing.