Using an External Database for your XML Schema, Part II: Optimizing the Use of the External Schema

September 20, 2013 | 8 minute read
Christophe Dupupet
Sr Director, A-Team - Cloud Solutions Architects
Text Size 100%:

Introduction

Whether you are processing XML files or XML messages on JMS, ODI provides a JDBC driver that will translate the XML structure into a relational schema. By default, this schema is created in an in-memory database. While this approach will work extremely well for small XML files or messages, it can become challenging for larger volumes. In this two parts article we are first reviewing how the driver works, then we are detailing the benefits of using an actual database instead of the default in-memory storage for the creation of the relational schema.

Using an External Database for your XML Schema, Part II:  Optimizing the Use of the External Schema

When dealing with a large XML payload (file or JMS message) the most efficient way to process the data will be to leverage the XML JDBC driver to load the data into the target database and then use ODI to process the data directly from that database, bypassing the XML JDBC driver altogether. In other words, once the data is loaded by the driver we completely ignore the fact that the data comes from XML: it is just another schema of the target database.

To implement this we will take a slightly different approach for plain XML Files vs. XML messages over JMS as the constraints are a little bit different: messages on JMS will arrive at a more or less continuous pace, while files tend to be more static. We also have to confirm to JMS that we have processed the data (so as to remove the message from the queue) whereas there is no such process with XML files.

As we go over the steps to use for an optimal leverage of the external schema, you may want to have the following references at hand:

-          The Oracle® Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator contains all the necessary details on parameters and options. This can be found in Section B.3.3 Using an External Database to Store the Data located here:http://docs.oracle.com/cd/E28280_01/integrate.1111/e12644/appendix_xml_driver.htm#CHDICHDB

-          Chapter 9 of the Oracle Data Integrator 11g Cookbook from Packt Publishing contains step by step instructions on how to setup the XML JDBC driver with an external database.

Using an external database to store the XML schema

What we mean by “external database” is a database that is not the agent’s in memory database. For ODI 11.1.1.x that database can be Oracle, Microsoft SQL server or MySQL (and of course HSQLDB, which is used for the in-memory database).

There are two ways to indicate to the driver that an external database will be used:

-         Specific properties can be added to the JDBC URL to specify where to connect. These properties are dp_driver, dp_url, dp_user, dp_password and dp_schema (plus dp_catalog if you are connecting to Microsoft SQL server).

-         One single property: db_props can be used to point to a properties file that will contain all additional parameters (driver name, URL, user name, password, schema and catalog for SQL Server). This file must be located in the CLASSPATH of your agent (typically with the JDBC drivers that you are using for ODI), NOT with the XML file.

Note that in both cases the password must first be encrypted. You can use the encode.bat or encode.sh scripts that are installed with the standalone agent to perform this operation.

In addition to the mandatory parameters described above, additional parameters can be specified to alter the behavior of the driver. The XML Driver documentation describes these in details here: http://docs.oracle.com/cd/E28280_01/integrate.1111/e12644/appendix_xml_driver.htm#CHDFIJEH

One benefit of using an external database is that we can use the target database to store the XML schema. But if we let the XML JDBC driver handle the data, ODI will still think that there are 2 connections to the data: XML and the target database. The construct is the same as if the data was still outside of the database.

XML Schema read

Figure 1: processing data from an XML connection into a database

If we use the data directly from a database connection (instead of using an XML connection) then all SQL operations can be done within the same database: from an ODI perspective, this means that there is no need for an LKM anymore: your IKM can do it all. This saves us the creation of at least one staging table – something that can improve performance quite a bit!

Read from database schema

Figure 2: loading XML using a database connection

This is why we restrict the use of the driver to only load the data in the database. After that we reverse engineer and process the tables and their data as a normal schema of the database. Figure 3 shows a view of Topology where such a schema is declared under an Oracle server connection.

XML schema in database

Figure 3: XML data is stored in a database schema

A simple trick for JMS XML connections is to use actual XML and XSD files to perform the first connection, rather than trying to connect to the queue and retrieve an XML message.

The sole purpose of that first connection is for the XML JDBC driver to create the tables in the target database schema. Once the tables are created, we can reverse engineer the tables from the database as shown on figure 4.

XML_DB_Model

Figure 4: Tables from an XML file reverse-engineered under an Oracle model.

You will note that in the database, the name of the tables is prefixed with the name of the XML schema defined in the connection parameters of the XML file (GEO in the example shown in figure 4).

All interfaces will be built using these tables as sources. The only element we need to add for the execution of these interfaces is an ODI procedure that loads the data from the XML structure into the target database.

Performing this operation will differ somewhat whether we are reading from JMS or reading an XML file, so we will now separate the two.

Plain XML File

We now have a JDBC URL that redirects the XML schema to our target database. As long as we do not set the parameter load_data_on_connect to false a simple Test of the connectivity to the file from Topology will create the schema and load the data immediately in the database. But this manual operation needs to be automated for packages and scenarios that will process the data from this schema. For this we create an ODI procedure with two simple steps. In both steps, make sure that the technology is set to XML and that the schema points to the XML schema we are working with. Figure 5 illustrates these settings.

Step1XML

Figure 5: XML step in procedure

The command on target for these steps are:

TRUNCATE SCHEMA

SYNCHRONIZE FROM FILE

Use this procedure in your packages before all interfaces that will read XML data, as shown on figure 6 below. Remember, since the model that represents the XML data model points to the target database schema, the source tables in these interfaces come directly from that target database.

XML File Procedure in Package

Figure 6: use of a procedure used to load XML data in a database schema.

JMS XML

For XML data extracted from JMS there are two things that we need to do. First, as we did for the XML file, we will need to load the data before we run interfaces.  Then once we have successfully processed all the data that we need we must “commit the read” in JMS so that the message gets removed from the queue.

As we did for the XML File, for each steps in these procedures we will have to set the technology to JMS-XML and to select the proper logical schema name.

All we have to do to load data from JMS is to look at the ODI LKMs that usually perform that operation for us. The LKM JMS XML to SQL has a step called Load JMS to XML. The code that we are interested in will be under the tab Command on Source. As you can see on figure 7, this step uses many of the options of the KM: you can either create the same options in your procedure and then copy the code from this step as-is, or you can decide on predefined values and replace all references to odiRef.getUserExit with the values that you have selected.

JMS KM Step

Figure 7: Extracting data from JMS and loading into the XML schema

Note that in the case of the procedure, you can copy the code on the Command on Target side. This procedure will have a single step, and will have to be called in your packages before any interface that needs access to the data.

We will need a second procedure that will be executed after the interfaces to commit the reads on JMS. Again the LKM JMS XML to SQL gives us the solution with the step Commit JMS Read. We have to look under the Command on source tab to find the code (quite simple this time: Commit). We can copy this under the Command on Target tab of the single step of our second procedure (and again, we can put this on the Command on Target in the procedure):

JMS KM Commit Step

Figure 8: commit JMS.

Remember to set the technology to JMS-XML and to select the proper logical schema name for each step in these procedures.

Now use both procedures in the package: the first one to read, the second to commit the read after all interfaces have been executed, as shown on figure 9 below.

JMS Pacakge

Figure 9: Read and commit JMS – then process data from database model

An additional benefit on the JMS XML side is that if you later on decide to process more of the data from the same message with additional interfaces, you will not have to worry about properly setting this commit option in the last one of the interfaces as you would normally have to do with the standard LKM JMS XML to SQL: now that you have externalized the command, you can process all the data you want between the load and commit procedures.

For more ODI best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit Oracle A-Team Chronicles for ODI.

Christophe Dupupet

Sr Director, A-Team - Cloud Solutions Architects


Previous Post

Using an External Database for your XML Schema, Part I: Understanding the ODI XML JDBC Driver

Christophe Dupupet | 6 min read

Next Post


Implementing Early Arriving Facts in ODI, Part II: Implementation Steps

Benjamin Perez-Goytia | 15 min read