Imagine a Customer Service department that receives dozens of parallel phone calls and needs to help its customers quickly. To do so, it needs accurate data about customer accounts and their transactions. The easiest approach would be to tap into a Data Warehouse where the data model has been shaped to efficiently retrieve a customer account and its related data. Unfortunately, most of the time the Data Warehouse is loaded with a daily or weekly batch, so we would miss any information created since the last load. We therefore also need to retrieve this missing data from transactional systems such as Fusion ERP.
Fusion ERP has different outbound data extraction mechanisms, as described by Angelo in this post. For this near real-time use case, where we only look at the delta since the latest Data Warehouse load, REST APIs seem perfect. We need to mash the data retrieved through REST API invocations with data already stored in a database, and do it in a timely fashion. We surely don't want to put customers on hold for a minute before we get an accurate view of their transactions.
Our solution needs to be able to:
- Invoke REST APIs to retrieve fresh data
- Store the response body
- Chain multiple invocations to retrieve the entire set if the data is paginated
- Parallel invocations
- Mash the response data with existing data stored in an Oracle database
- Keep it as fast as possible
Invoke REST APIs to retrieve fresh data
Because the existing data is already stored in an Oracle database, we figured the fastest way to mash it would be to land the REST response in the same place. And what better way to land it there than using the database's own capabilities? Luckily the excellent APEX_WEB_SERVICE PL/SQL package lets us invoke REST APIs and handle the response, and has been available since Oracle Database 11g. This package is part of Oracle Application Express (APEX). So if you are using an on-premises instance, Exadata Cloud Service or Database Cloud Service on Oracle Cloud Infrastructure, you first need to install APEX and enable network services through Access Control Lists (ACLs). ACLs can implement fine-grained access control to external network services.
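For instance, here is a minimal sketch of granting outbound HTTP access with DBMS_NETWORK_ACL_ADMIN; the host pattern mirrors the Fusion ERP URL used later in this post, and the MYUSER principal is an assumption to adapt to your environment:

BEGIN
  -- Allow the (hypothetical) MYUSER database user to make outbound
  -- HTTP calls to the Fusion ERP host.
  DBMS_NETWORK_ACL_ADMIN.append_host_ace(
    host => '*.fa.us6.oraclecloud.com',
    ace  => xs$ace_type(privilege_list => xs$name_list('http'),
                        principal_name => 'MYUSER',
                        principal_type => xs_acl.ptype_db));
END;
/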
Now if you are using our Autonomous Database (ADB), there is nothing to do: Application Express is already installed and your ADB instance is preconfigured with an ACL to permit outbound web service calls.
Let's have a look at the MAKE_REST_REQUEST function. It takes at least a URL and an HTTP method (GET, POST, PUT, DELETE or HEAD) as parameters, and it returns the response in a CLOB. The p_username, p_password, p_scheme and p_token_url parameters can be used for authentication to the REST endpoint if the API is not public. You can pass the payload of your request as a CLOB using p_body, as a BLOB using p_body_blob, or as name/value pairs using p_parm_name and p_parm_value.
Note that MAKE_REST_REQUEST_B is available if you prefer the response to be returned in a BLOB.
In the following example we retrieve the standardReceipts from a Fusion ERP instance.
DECLARE
  l_resp_body CLOB;
BEGIN
  l_resp_body := apex_web_service.make_rest_request(
    p_url         => 'https://<pod_id>.fa.us6.oraclecloud.com/fscmRestApi/resources/latest/standardReceipts',
    p_http_method => 'GET',
    p_username    => '<username>',
    p_password    => '<password>',
    p_parm_name   => apex_util.string_to_table('onlyData:totalResults:limit:offset'),
    p_parm_value  => apex_util.string_to_table('true:true:200:0')
  );
  DBMS_OUTPUT.PUT_LINE(apex_web_service.g_status_code);
  DBMS_OUTPUT.PUT_LINE(dbms_lob.substr(l_resp_body, 32000, 1));
END;
/
We passed some parameters to only get the data, retrieve the total number of results, and limit this call to the first 200 results.
The status code of the REST invocation is stored in apex_web_service.g_status_code. The headers are stored as name/value pairs in apex_web_service.g_headers.
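As a defensive measure (a minimal sketch, not part of the original block), we can check the status code before processing the response:

IF apex_web_service.g_status_code != 200 THEN
  -- Abort with a custom error if the call did not succeed
  RAISE_APPLICATION_ERROR(-20001,
    'REST call failed with HTTP status ' || apex_web_service.g_status_code);
END IF;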
Store the response body
Now we need to store the content of l_resp_body in the database. In our example the response body is JSON, so we can leverage the native JSON support of the Oracle database. The support for JSON in the Oracle database has improved over the years, so let's have a look at the different solutions.
CREATE TABLE "REST_RESPONSE"
( "INVK_ID" NUMBER GENERATED ALWAYS AS IDENTITY,
  "RUN_ID" NUMBER,
  "RUN_SEQ" NUMBER,
  "FETCHED_AT" TIMESTAMP (6) DEFAULT systimestamp,
  "REST_URL" VARCHAR2(1000 CHAR),
  "HTTP_METHOD" VARCHAR2(6 CHAR),
  "PARM_NAME" VARCHAR2(4000 CHAR),
  "PARM_VALUE" VARCHAR2(4000 CHAR),
  "STATUS_CODE" NUMBER,
  "REQUEST_BODY" CLOB,
  "RESPONSE_BODY" CLOB,
  "JOIN_ID" NUMBER AS (NVL("RUN_ID","INVK_ID")),
  CONSTRAINT "REST_REQ_ISJSON" CHECK (REQUEST_BODY is json) ENABLE,
  CONSTRAINT "REST_RESP_ISJSON" CHECK (RESPONSE_BODY is json) ENABLE
);
As you can see here, we have two CLOB attributes to store the body of the request and the body of the response. We also added a constraint on each of them to check that the content is valid JSON.
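As a quick sanity check (a hypothetical example, not from the original post), inserting a payload that is not valid JSON should fail:

-- This insert should raise ORA-02290 (check constraint violated)
-- because 'not json' does not satisfy the RESPONSE_BODY IS JSON condition.
INSERT INTO REST_RESPONSE (REST_URL, HTTP_METHOD, RESPONSE_BODY)
VALUES ('https://example.com/api', 'GET', 'not json');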
So now we can edit our PL/SQL block, insert the content of l_resp_body into the table and query it:
insert into REST_RESPONSE (REST_URL, HTTP_METHOD, PARM_NAME, PARM_VALUE, STATUS_CODE, REQUEST_BODY, RESPONSE_BODY)
values (l_url, l_http_method, l_parm_name, l_parm_value, apex_web_service.g_status_code, l_body, l_resp_body)
RETURNING INVK_ID INTO l_response_id;
commit;
open rc for SELECT r.RESPONSE_BODY.totalResults FROM REST_RESPONSE r where invk_id = l_response_id;
dbms_sql.return_result(rc);
What is interesting here is to find the total number of results, so we can invoke the same API again to get the next page of results.
We can query the result in two different ways:
- the simple dot-notation syntax is what I used in the example above. It requires setting an alias on the table (r in this case) and then accessing the CLOB column, followed by the JSON path you want to retrieve. For instance, r.RESPONSE_BODY.totalResults to get something at the root of the document, or r.RESPONSE_BODY.items[*].StandardReceiptId to list all the StandardReceiptIds returned in the response. A limitation of this notation is that a value longer than 4K bytes will be returned as NULL.
- the SQL/JSON functions and conditions provide more capabilities and flexibility.
- json_exists tests for the existence of a particular value within some JSON data. It can for instance be used in a CASE WHEN statement.
- json_value selects a scalar value from some JSON data, as a SQL value.
- json_query selects one or more values from some JSON data, as a SQL string representing the JSON values.
- json_table projects some JSON data as a virtual table, similarly to inline views.
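For instance, here is a minimal json_value sketch that retrieves the same totalResults scalar as the dot-notation query above (the INVK_ID literal 44 matches the identifier used in the later examples):

SELECT json_value(r.RESPONSE_BODY, '$.totalResults' RETURNING NUMBER) AS total_results
FROM REST_RESPONSE r
WHERE r.invk_id = 44;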
Here is an example of json_table to display the standardReceipts from our last call to the REST API:
SELECT jt_sr.ReceiptNumber, jt_sr.ReceiptDate, jt_sr.Amount, jt_sr.Currency
FROM REST_RESPONSE r,
     json_table(response_body, '$.items[*]'
       COLUMNS (ReceiptNumber VARCHAR2(200) PATH '$.ReceiptNumber',
                ReceiptDate VARCHAR2(200) PATH '$.ReceiptDate',
                Amount VARCHAR2(50) PATH '$.Amount',
                Currency VARCHAR2(200) PATH '$.Currency')) jt_sr
WHERE join_id = 44;
Chain multiple invocations to retrieve the entire set if the data is paginated
Because we have several thousand standardReceipts, we didn't retrieve all of them in a single call to the REST API. We used the limit parameter to only retrieve the first 200 results. We now need to create a loop to invoke the same API with a different offset until we have retrieved all the results. The totalResults field returned in the response of the original call helps us determine how many iterations we need to perform. We can then create a procedure to extract the remaining results.
nb_pages := CEIL(l_total_results / p_rows_per_page);
IF nb_pages > 1 THEN
  FOR pgs IN 1..nb_pages-1
  LOOP
    l_parm_value := 'true:true:'||p_rows_per_page||':'||pgs*p_rows_per_page;
    l_run_seq := l_run_seq + 1;
    l_resp_body := apex_web_service.make_rest_request(
      p_url         => l_url,
      p_http_method => l_http_method,
      p_username    => l_username,
      p_password    => l_password,
      p_parm_name   => apex_util.string_to_table(l_parm_name),
      p_parm_value  => apex_util.string_to_table(l_parm_value)
    );
    insert into REST_RESPONSE (RUN_ID, RUN_SEQ, REST_URL, HTTP_METHOD, PARM_NAME, PARM_VALUE, STATUS_CODE, REQUEST_BODY, RESPONSE_BODY)
    values (p_run_id, l_run_seq, l_url, l_http_method, l_parm_name, l_parm_value, apex_web_service.g_status_code, l_body, l_resp_body)
    RETURNING INVK_ID INTO l_response_id;
    commit;
  END LOOP;
END IF;
When inserting this time, we populate the RUN_ID attribute with the value of the INVK_ID of the original call. This helps us keep track of all the calls that together create our entire dataset. To get the entire dataset, we can filter on the virtual column JOIN_ID, which is equal to RUN_ID, or to INVK_ID when RUN_ID is null (the case of the original call).
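For example, assuming the original call got INVK_ID 44, a quick sketch to list all the invocations of that run:

SELECT invk_id, run_id, run_seq, status_code
FROM REST_RESPONSE
WHERE join_id = 44   -- the original call (RUN_ID null) plus all its pages
ORDER BY run_seq;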
Parallel invocations
If the dataset is large, it might make sense to send multiple requests at the same time to reduce the overall extraction time. Since 11g Release 2, the Oracle Database comes with a pre-installed package to handle parallelism: DBMS_PARALLEL_EXECUTE. It lets us create a task and create chunks to split the workload. We can then run the task and monitor it.
The chunking requires an associated table to define the chunks. This table should contain the RUN_ID, the RUN_SEQ and all the information required to send the request, such as REST_URL, HTTP_METHOD, PARM_NAME, PARM_VALUE and REQUEST_BODY. We could create a new table called REST_REQUEST for that. Alternatively, we can change the way our PL/SQL works to use our existing REST_RESPONSE table, which already contains all these fields. We would need to insert all the request data before doing the parallel invocation, and we can update each row after its request has been sent. Either way, once we have each invocation represented by a row in the table, we can create our chunks. There are different ways to create chunks, but in this case we want to use the CREATE_CHUNKS_BY_SQL procedure, so we need to write a SQL statement to retrieve each chunk. That statement should return a start ID and an end ID for each chunk to identify the rows in the table that should be processed. Because we want one chunk per row in our case, we return the same value for the start and end values: invk_id.
l_stmt := 'SELECT invk_id, invk_id FROM REST_RESPONSE WHERE run_id = '||p_init_id||' AND run_seq > 0';
DBMS_PARALLEL_EXECUTE.create_task(task_name => t_name);
DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => t_name,
                                           sql_stmt  => l_stmt,
                                           by_rowid  => FALSE);
DBMS_PARALLEL_EXECUTE.run_task(task_name      => t_name,
                               sql_stmt       => 'BEGIN INVOKE_REST('||p_init_id||', null, :start_id, :end_id); END;',
                               language_flag  => DBMS_SQL.NATIVE,
                               parallel_level => p_parallel_level);
The SQL statement we pass to the RUN_TASK procedure is a PL/SQL block that retrieves the rows from the table based on the start and end IDs, sends the requests and logs the results.
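The INVOKE_REST procedure is not shown in full in this post; below is a minimal sketch of what it could look like. Only the signature is implied by the RUN_TASK call above; the body is an assumption, and authentication parameters are omitted for brevity:

CREATE OR REPLACE PROCEDURE INVOKE_REST (
  p_run_id   IN NUMBER,  -- RUN_ID of the original call
  p_body     IN CLOB,    -- optional request body (null for GET)
  p_start_id IN NUMBER,  -- first INVK_ID of the chunk
  p_end_id   IN NUMBER   -- last INVK_ID of the chunk
) AS
  l_resp_body CLOB;
BEGIN
  -- Process every request row assigned to this chunk
  FOR req IN (SELECT * FROM REST_RESPONSE
              WHERE invk_id BETWEEN p_start_id AND p_end_id)
  LOOP
    l_resp_body := apex_web_service.make_rest_request(
      p_url         => req.rest_url,
      p_http_method => req.http_method,
      p_parm_name   => apex_util.string_to_table(req.parm_name),
      p_parm_value  => apex_util.string_to_table(req.parm_value)
    );
    -- Log the result back on the same row
    UPDATE REST_RESPONSE
       SET status_code   = apex_web_service.g_status_code,
           response_body = l_resp_body,
           fetched_at    = systimestamp
     WHERE invk_id = req.invk_id;
    COMMIT;
  END LOOP;
END INVOKE_REST;
/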
We can monitor the execution of a task by using the TASK_STATUS function and checking the returned value against the package constants.
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(t_name);
WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
  -- Retry the task up to twice if it did not finish cleanly
  l_try := l_try + 1;
  DBMS_PARALLEL_EXECUTE.RESUME_TASK(t_name);
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(t_name);
END LOOP;
We can also monitor or troubleshoot it by querying the [DBA|USER]_PARALLEL_EXECUTE_TASKS views. If there is an error, we can dig deeper and troubleshoot at the chunk level with the [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views.
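For instance (a minimal sketch; the task name literal is an assumption):

-- Overall task status
SELECT task_name, chunk_type, status
FROM USER_PARALLEL_EXECUTE_TASKS;

-- Per-chunk status and error details for a given task
SELECT chunk_id, status, start_id, end_id, error_code, error_message
FROM USER_PARALLEL_EXECUTE_CHUNKS
WHERE task_name = 'REST_TASK'
ORDER BY chunk_id;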
Once we are done with the task, we can drop it:
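BEGIN
  -- Remove the task and its chunk metadata
  DBMS_PARALLEL_EXECUTE.drop_task(task_name => t_name);
END;
/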
Mash the response data with existing data stored in an Oracle database
The magic of all this is that the data stored in the JSON documents can be joined directly to existing data in the database. In the query below, we retrieve the important information for each standardReceipt and join it with some external comments that were already stored in our Data Warehouse.
WITH rest_src AS (
  SELECT r.join_id, jt_sr.RECEIPTNUMBER, jt_sr.RECEIPTDATE, jt_sr.AMOUNT, jt_sr.CURRENCY
  FROM REST_RESPONSE r,
       json_table(response_body, '$.items[*]'
         COLUMNS (RECEIPTNUMBER VARCHAR2(200) PATH '$.ReceiptNumber',
                  RECEIPTDATE VARCHAR2(200) PATH '$.ReceiptDate',
                  AMOUNT VARCHAR2(50) PATH '$.Amount',
                  CURRENCY VARCHAR2(200) PATH '$.Currency')) jt_sr
),
existing_src AS (
  SELECT RECEIPTNUMBER, ADDITIONAL_COMMENT
  FROM EXISTING_DATA
)
SELECT rest_src.join_id, rest_src.RECEIPTNUMBER, rest_src.RECEIPTDATE, rest_src.AMOUNT, rest_src.CURRENCY, existing_src.ADDITIONAL_COMMENT
FROM rest_src LEFT OUTER JOIN existing_src ON (rest_src.RECEIPTNUMBER = existing_src.RECEIPTNUMBER)
WHERE join_id = 44;
The query ran in less than 0.1 seconds in an Autonomous Database, so the performance is sufficient in my case, but let's see how we could improve it further.
Keep it as fast as possible
For each JSON field that can be exposed using the json_value SQL/JSON function, we can improve performance by creating a function-based index on that expression. It can be a B-tree or a bitmap index.
CREATE INDEX REST_RESP_HREF_IDX
ON REST_RESPONSE (json_value(RESPONSE_BODY, '$.links.href'));
The simple dot notation can be used interchangeably to create or use such an index.
CREATE INDEX REST_RESP_HREF_IDX2
ON REST_RESPONSE r (r.RESPONSE_BODY.links.href);
Now this won't help us index the receiptNumber field in our JSON response, because it's nested inside an array. For this use case we need a JSON search index.
By default a search index is synchronized on commit, but we can specify that we want to synchronize it periodically or on demand.
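A minimal sketch (the index name is an assumption):

-- Create a JSON search index covering the whole response body,
-- including fields nested inside arrays such as items[*].ReceiptNumber.
CREATE SEARCH INDEX REST_RESP_SEARCH_IDX
ON REST_RESPONSE (RESPONSE_BODY) FOR JSON;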
More information about indexes for JSON data can be found in the documentation.
Other generic methods also apply to tables containing JSON data, such as populating the table in memory or building a materialized view on top of our query using JSON_TABLE.
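For instance, a sketch of a materialized view over the JSON_TABLE projection; the view name and refresh strategy are assumptions (depending on your Oracle version and the MV restrictions, REFRESH FAST ON STATEMENT may also be an option):

CREATE MATERIALIZED VIEW REST_RECEIPTS_MV
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT r.join_id, jt_sr.RECEIPTNUMBER, jt_sr.RECEIPTDATE, jt_sr.AMOUNT, jt_sr.CURRENCY
FROM REST_RESPONSE r,
     json_table(response_body, '$.items[*]'
       COLUMNS (RECEIPTNUMBER VARCHAR2(200) PATH '$.ReceiptNumber',
                RECEIPTDATE VARCHAR2(200) PATH '$.ReceiptDate',
                AMOUNT VARCHAR2(50) PATH '$.Amount',
                CURRENCY VARCHAR2(200) PATH '$.Currency')) jt_sr;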
Troubleshooting
REST request failing
The first thing to do when a REST request fails is to check the status of the response with apex_web_service.g_status_code. The status code indicates the type of error: 2xx is good news, 3xx indicates a redirection, 4xx indicates client errors and 5xx indicates server errors.
There might also be some information in the headers of the response. We can access them using apex_web_service.g_headers.
Here is an example:
FOR i IN 1 .. apex_web_service.g_headers.count
LOOP
  DBMS_OUTPUT.PUT_LINE(apex_web_service.g_headers(i).name || ' - ' || apex_web_service.g_headers(i).value);
END LOOP;
Parallel Execution failing
As mentioned above, the [DBA|USER]_PARALLEL_EXECUTE_TASKS and [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views help us understand what is going wrong.
Conclusion
REST APIs are an important source of data that can be used to enrich our existing data, or in this case give a fresh view of our Oracle Fusion ERP data. When performance is key, getting the data directly from the database and storing it as JSON documents allows us to quickly join it with our existing data.
If you want to learn more about Fusion SaaS data extraction, check out this video.
First published on 2021-11-10
