Uploading data from a flat file into a database table is a common use case found in many integration applications. Oracle Integration Cloud (OIC) includes the ATP adapter that can read rows of data in chunks from a CSV file and insert them into a table in the ATP database. However, with the support of new cloud libraries available in ATP database, this operation can now be completed much faster. The rest of this blog provides the details about this new performance enhancement that can be achieved with OIC and ATP Adapter to insert data from large flat files into ATP databases.
The performance enhancement is demonstrated here by picking a simple integration flow as baseline that picks up a CSV file from the embedded SFTP server, which is now part of OIC-FileServer Gen2. One can learn more about the OIC-File Server in the standard product documentation[1] and the blog[2] published by Kundan Vyas of OIC Product Management team. The file is then processed in chunks by the ATP adapter that performs a standard insert operation into an ATP database table. The integration flow is built using basic OIC components, whose details can be found in Oracle Integration Cloud product documentation[3]. The overall integration architecture is show in Fig. 1 below.
Fig. 1 Baseline Integration Flow - Loading data from CSV file into ATP database table using insert operation of ATP adapter
Next, to demonstrate the performance enhancement, the same file is read from the same SFTP server location and uploaded to an object storage location within OCI via a REST adapter invoke. The ATP adapter then invokes a PL/SQL procedure COPY_DATA from the standard package library DBMS_CLOUD, which reads the file from the object storage location and inserts the data into the target table within ATP database. More details about the new PLSQL package library can be found in the ATP documentation[4]. Fig. 2 describes the modified integration flow for the performance enhancement.
Fig. 2 Enhanced Integration Flow - Loading data from CSV file into ATP database table by calling PL/SQL library from ATP Adapter
The configuration process is described in 2 sections. First, we build the standard integration flow for baseline. Next we build the enhanced integration that uses the PLSQL libraries of the ATP database.
Steps to build the standard integration flow are described below.
Fig 3. Standard flow in OIC
Steps to build the enhanced integration flow is described below.
Fig. 4 Enhanced Flow in OIC
A CSV file containing text data was placed in the OIC fileserver area. OIC integration flows for the 2 options were submitted to pickup the CSV file via SFTP adapter and load the data into an empty database table. After completion, the Activity Stream in the monitoring section of OIC console was used to note down the times spent in each part of the flow and the total time for the end-to-end process. The integration flow was repeated twice for each file and the average time was then calculated for reporting purposes.
Different file sizes tested are tabulated below in Table 1.
No. of rows | Approx. File Size | |
---|---|---|
1 | 50K | ~22 MB |
2 | 100K | ~43 MB |
3 | 200K | ~86 MB |
4 | 400K | ~171 MB |
Table 1. CSV Test File Size Details
The results for the baseline integration flow with different file sizes are shown in Fig. 5 below.
Fig. 5 Results for Baseline Integration Flow
The results for the enhanced integration flow with different file sizes are shown in Fig. 6 below.
Fig. 6 Results for Enhanced Integration Flow
To summarize, all the times noted for different parts of the flow are tabulated and shown below in Table 2.
Record Count in CSV File (Type of Flow) |
Time for FTP Transfer (min:ss.d) |
Time for transfer to OCI Object Storage (min:ss.d) |
Time for DB insert (min:ss.d) |
Total Time for Completion (min:ss.d) |
---|---|---|---|---|
50K (Baseline) |
00:07.5 |
|
01:42.1 |
01:49.6 |
50K (Enhanced) |
00:07.3 |
00:01.7 |
00:03.7 |
00:12.5 |
100K (Baseline) |
00:12.4 |
|
03:05.2 |
03:17.6 |
100K (Enhanced) |
00:12.1 |
00:02.9 |
00:06.7 |
00:21.8 |
200K (Baseline) |
00:16.3 |
|
06:24.4 |
06:40.7 |
200K (Enhanced) |
00:17.1 |
00:04.0 |
00:05.9 |
00:27.0 |
400K (Baseline) |
00:40.3 |
|
12:50.1 |
13:30.4 |
400K (Enhanced) |
00:31.9 |
00:07.0 |
00:11.9 |
00:50.8 |
Table. 2 Summary of Test Results
A comparison of the results shown in Table 2 clearly demonstrates the significant performance gains achieved by using the PL/SQL cloud libraries in ATP database. To provide an alternate perspective of the improvement, the corresponding completion times of the integration flows are plotted side-by-side in a cluster bar chart and shown in Fig. 7 below.
Fig. 7 Comparison of End-to-End Completion Times for both Integration Flows with different record counts in CSV file
Although the integration flow had to implement an extra hop via the Object Storage Cloud Service to leverage the use of new ATP cloud libraries, the performance gains clearly justifies the overall approach. The simple comparison exercise was carried out in a lab environment primarily to highlight the capability of the new cloud library in ATP databases. The documentation for the library provides more information on advanced options that can be used to address complex use case requirements in real life situations.
At the end, I would like to acknowledge Kundan Vyas of OIC Product Management, who provided valuable assistance with OIC Fileserver and Cemil Alper of Database Product Management, who introduced me to the new PL/SQL cloud library in ATP database.