This article discusses best practices for selecting and using Oracle Data Integrator (ODI) Loading Knowledge Modules (LKMs), both on-premises and in the cloud. LKMs are code templates in ODI that perform data upload operations from on-premises data servers to cloud services, between cloud services, or between on-premises data servers. ODI supports a variety of technologies such as SQL databases, Big Data, Files, and Java Messaging Systems (JMS), among many others. Most of these technologies are now available both on-premises and as cloud data services. For each of these technologies, a variety of LKMs are available. For instance, ODI offers LKMs for SQL databases such as Oracle, Teradata, MySQL, and MS SQL Server, among others. For Big Data, ODI offers LKMs for Spark, Hive, Sqoop, Kafka, and Pig, among others.
LKMs are one of seven categories of Knowledge Modules (KMs) found in ODI. Other categories of KMs include: Reverse-Engineering, Check, Integration, Extract, Journalizing, and Service. This article focuses on the selection and use of LKMs. If you would like to learn more about other categories of KMs, go to “Oracle Data Integrator (ODI) Knowledge Modules (KMs).”
When are Loading Knowledge Modules Required?
In ODI, LKMs are required in the following use-cases:
- Different Technologies – The source datastore and the target datastore are from different technologies. For instance, an LKM is required when loading data from a file into an Oracle table or when loading data from a Teradata table into an Oracle table. This use-case is illustrated in Figure 1, below:
- Different Data Servers – The source datastore and the target datastore are from the same technology, but they are not located on the same data server. For instance, an LKM is required when loading data between two Oracle tables that are located on different Oracle data servers. This use-case is illustrated in Figure 2, below:
- On-Premises to Cloud – In cloud computing, LKMs are required when uploading data from on-premises data servers into cloud data services. This use-case is illustrated in Figure 3, below:
- Different Instances of a Database Cloud Service – LKMs are also required when both the source and the target datastores are from the same database cloud service, but each datastore is located in a different instance of the service, or the datastores are hosted on separate services. This use-case is illustrated in Figure 4, below:
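The four use-cases above reduce to a single rule: an LKM is needed whenever the source and target datastores do not share both the same technology and the same physical data server (or service instance). The following is a minimal sketch of that decision rule; the `Datastore` class and function names are illustrative and are not part of the ODI SDK:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Datastore:
    technology: str   # e.g. "Oracle", "Teradata", "File"
    data_server: str  # physical data server or cloud service instance

def lkm_required(source: Datastore, target: Datastore) -> bool:
    """An LKM is required unless both datastores share the same
    technology AND the same physical data server / service instance."""
    same_technology = source.technology == target.technology
    same_server = source.data_server == target.data_server
    return not (same_technology and same_server)

# Different technologies (file -> Oracle): LKM required
print(lkm_required(Datastore("File", "nfs01"), Datastore("Oracle", "db01")))   # True
# Same technology, different data servers: LKM required
print(lkm_required(Datastore("Oracle", "db01"), Datastore("Oracle", "db02")))  # True
# Same technology, same data server: no LKM needed
print(lkm_required(Datastore("Oracle", "db01"), Datastore("Oracle", "db01")))  # False
```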
Styles of Knowledge Modules
In ODI, there are two KM styles: template-style and component-style. Template-style KMs are available in both ODI 11g and ODI 12c; component-style KMs are available in ODI 12c only. An LKM is either a template-style KM or a component-style KM. Template-style KMs must be imported from an ODI directory called /<Oracle Home>/odi/sdk into an ODI repository. Component-style KMs are automatically installed in ODI when an ODI repository is created. By default, ODI 12c uses component-style LKMs when an LKM is required, unless ODI users choose to import and use template-style LKMs.
In ODI 12c, when a mapping is created and an LKM is required, ODI automatically assigns a component-style LKM to the mapping. If a template-style LKM has already been imported into an ODI project, then the template-style LKM is used instead. Figure 5 below shows how to identify the LKM that ODI automatically assigns to an ODI mapping. In this example, the LKM Oracle to Oracle (DB Link).GLOBAL – a component-style LKM – has been assigned to this mapping.
A template-style KM can be imported either as a global object or as a local object. When a template-style KM is imported as a global object, it can be used by any ODI project of an ODI repository. When a template-style KM is imported as a local object, the KM can only be used in the ODI project where it has been imported. A component-style KM is a global object. Other ODI objects such as variables, sequences, user functions, and reusable mappings can also be imported as either global objects or local objects. Thus, LKMs are either global or local objects in an ODI repository. Figure 6, below, summarizes the KM styles found in ODI, and the KM object types:
Figure 7, below, shows an example of how LKMs can be configured either as global KMs or local KMs in an ODI repository:
Figure 7, above, shows a global LKM called LKM Oracle to Oracle (datapump) v1.0. This LKM has been imported as a global object into this ODI repository; thus, it can be used by any of the following three ODI projects: ODI Project 1, ODI Project 2, and ODI Project 3. For ODI Project 1, there are three additional LKMs: LKM Oracle to Oracle (datapump) v2.0, LKM File to Teradata (TTU), and LKM SQL to SQL. These three LKMs have been imported as local objects into this ODI project; thus, they can only be used in this ODI project. Note there are two versions of the LKM Oracle to Oracle (datapump), v1.0 and v2.0. Both versions will be visible in ODI Project 1, and they can both be used in this project. For ODI Project 2 and ODI Project 3, no local LKMs have been configured or imported into either project. For additional information on how to import objects and KMs in ODI, go to “Importing Objects in Oracle Data Integrator.”
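The visibility rules illustrated in Figure 7 can be pictured as a simple merge: a project sees every global LKM in the repository plus its own locally imported LKMs. The sketch below models this with plain sets; it is a hypothetical illustration of the rule, not the ODI SDK API:

```python
# Hypothetical model of LKM visibility in an ODI repository (not the ODI SDK).
GLOBAL_LKMS = {"LKM Oracle to Oracle (datapump) v1.0"}

LOCAL_LKMS = {
    "ODI Project 1": {
        "LKM Oracle to Oracle (datapump) v2.0",
        "LKM File to Teradata (TTU)",
        "LKM SQL to SQL",
    },
    "ODI Project 2": set(),
    "ODI Project 3": set(),
}

def visible_lkms(project: str) -> set:
    """A project sees every global LKM plus its own local imports."""
    return GLOBAL_LKMS | LOCAL_LKMS.get(project, set())

print(sorted(visible_lkms("ODI Project 1")))  # both datapump versions are visible
print(sorted(visible_lkms("ODI Project 2")))  # only the global LKM is visible
```

Note that both versions of LKM Oracle to Oracle (datapump) are visible in ODI Project 1, matching the behavior described above.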
Loading Knowledge Modules Best Practices
For a given technology, ODI supports various ways of performing data upload operations. For instance, for Big Data, ODI has various LKMs to perform data upload operations between HDFS, Hive, Spark, HBase, and Pig. Each of these tools offers performance benefits, but some can upload data faster than others. Thus, the selection of an LKM has a significant impact on the overall performance of your data upload operations. For any of the technologies supported by ODI, follow these rules of thumb when selecting and using LKMs:
- Select LKMs that support the fastest method for uploading data between data servers. For instance, Oracle offers Data Pump and DBLINK – among other tools – to upload data between Oracle databases. These two Oracle tools are supported by ODI with two LKMs: LKM Oracle to Oracle (datapump), and LKM Oracle to Oracle (DBLINK). Both Data Pump and DBLINK offer great performance benefits, but Data Pump can upload data faster than DBLINK because it uses multiple threads to read, export, and import data in parallel. Thus, when using ODI to load data between Oracle databases, use the LKM Oracle to Oracle (datapump) – the fastest way of loading data between Oracle databases. Use this approach with other technologies as well, and select LKMs that support the fastest method for uploading data between data servers. To learn more about using Oracle Data Pump with ODI, go to “Using Oracle Data Pump with Oracle Data Integrator (ODI).”
- Select LKMs that perform best in your environment. Some LKMs offer tuning options such as the number of parallel threads, direct path load options, and concurrent upload operations. Test these options and find the optimum configuration based on the available resources in your environment and the number of concurrent data upload operations your environment can support at a given time.
- Explore additional LKMs as well. Your technology may offer additional data upload tools and ODI may have additional LKMs to support these tools. If necessary, discuss with your technology experts – such as database administrators (DBAs) and data architects – which tools are recommended. Then, select the LKMs that support the recommended tools.
- If the out-of-the-box LKMs do not support the desired technology tools to perform data upload operations, build your own LKMs. The ODI framework allows ODI users to build their own KMs – this is one of the biggest benefits of using ODI. For a complete guide on how to develop knowledge modules in ODI, go to “Developing Knowledge Modules with Oracle Data Integrator.”
- Select LKMs that support the native tools of your technology. Typically, these LKMs have a broader number of options and can easily be customized for additional tasks. Also, when using LKMs that support native tools, the ODI agent can be located on any physical server, since the upload operation is done by the actual technology and not by the ODI agent. For instance, when loading data from text files into the Oracle database, use the LKM File to Oracle (EXTERNAL TABLE). This LKM uses the Oracle External Table technology – a native tool of the Oracle database – to upload data in parallel from text files into the Oracle database. When using this LKM, the upload operation is done by the Oracle database and not by the ODI agent; thus, the ODI agent can reside on any computer. This topic is discussed in greater detail in “Understanding Where to Install the ODI Standalone Agent.”
- LKMs can upload data from on-premises data servers to cloud data services. To see examples of how to upload data from on-premises Oracle databases to Oracle Database Cloud Service (DBCS), go to “Using ODI Loading Knowledge Modules on the Oracle Database Cloud Service (DBCS).” LKMs can also upload data into other non-Oracle cloud services such as Amazon Elastic Map Reduce (EMR) cloud service. To see examples of how to use LKMs with Amazon EMR, go to “Using Oracle Data Integrator (ODI) with Amazon Elastic MapReduce (EMR).”
- Use the ODI 12c Exchange option to download additional LKMs – these LKMs are free of cost. This option can be invoked from ODI Studio by selecting the Check for Updates option from the Help menu. The ODI 12c Exchange option allows the ODI user community to share KMs and other ODI objects through update centers. It offers both Oracle-supported KMs and unsupported, community-contributed KMs. For additional information on the ODI 12c Exchange option, go to “Introducing Oracle Data Integrator (ODI) Exchange.”
- When using LKMs in ODI, take advantage of all the parallel features available in your technology, and orchestrate the data upload operations in parallel. Most technologies have options to perform data uploads in parallel, and these parallel options may be available through the LKM options – get familiar with the KM options and configure them accordingly. Also, ODI 12c offers In-Session Parallelism – data upload operations can run in parallel if multiple execution units are defined in an ODI mapping. An example of this strategy can be found in the following blog: “Importing Data from SQL databases into Hadoop with Sqoop and Oracle Data Integrator (ODI).” Section “Using ODI 12c In-Session Parallelism with Sqoop” of this blog discusses how to use the ODI In-Session Parallelism option. The blog also discusses how to use ODI packages to load data in asynchronous (parallel) mode.
- When configuring the ODI Physical Architecture, create a single ODI data server for all the schemas that are physically located on the same physical data server – this eliminates the need to use LKMs in mappings. For instance, if two schemas are located in the same Oracle database, create a single ODI data server to host these two schemas – do not create two ODI data servers, one for each schema. LKMs are not required when the source schema and the target schema are both located on the same physical data server. However, if the two schemas are located on different Oracle databases, then two ODI data servers are required, one for each schema, and an LKM will be required when performing data upload operations between these two ODI data servers.
Figure 8, below, illustrates two ODI data servers – Staging Area and Warehouse – which have been incorrectly configured for two schemas, MY_STG_AREA and MY_WAREHOUSE, respectively. These two database schemas are located on the same Oracle database service: the JDBC URLs for the two ODI data servers are identical, referencing the same service.
The ODI data server configuration in Figure 8, above, is incorrect because it forces ODI to use an LKM to perform a data upload operation between two schemas that are located on the same data server – there is no need to upload data that is already on the same data server. The use of an LKM in this case also generates additional code that is not needed; this unnecessary code will be executed by the data server, resulting in suboptimal performance of the mapping execution. The unnecessary use of an LKM can be observed in the physical design of an ODI mapping – Figure 9, below, illustrates an example:
Figure 9, above, shows an ODI mapping, Dimensions.W_STATUS_D, with two datastores: STATUS (the source datastore) and W_STATUS_D (the target datastore). This ODI mapping uses the physical architecture defined in Figure 8, above. The STATUS table is located in schema MY_STG_AREA, and the W_STATUS_D table is located in schema MY_WAREHOUSE. In this example, ODI forces the use of an LKM because the schemas have been defined in separate ODI data servers. This can be observed by exploring the loading access point called STATUS_AP. This loading access point shows that an LKM – LKM Oracle to Oracle (datapump) – has been selected to upload data from the STATUS table. To remove the loading access point for this mapping, the two schemas must be reconfigured under a single ODI data server. Figure 10, below, shows the correct configuration:
Figure 11, below, shows the physical design of the same ODI mapping, Dimensions.W_STATUS_D. ODI has removed both the loading access point and the LKM from the physical design; thus, the mapping will perform the data integration task without having to unnecessarily upload or stage the data from the source datastore.
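A quick sanity check for the misconfiguration shown in Figures 8 through 10 is to group the ODI data servers by their JDBC URL: any URL backing more than one data server indicates schemas that should be consolidated under a single ODI data server. The audit sketch below is purely illustrative – the topology data is hard-coded here as an assumption, whereas in practice it would come from your ODI topology:

```python
from collections import defaultdict

# Illustrative topology extract: ODI data server name -> (JDBC URL, schema).
# The host names and service names below are hypothetical examples.
data_servers = {
    "Staging Area": ("jdbc:oracle:thin:@dbhost:1521/ORCLPDB", "MY_STG_AREA"),
    "Warehouse":    ("jdbc:oracle:thin:@dbhost:1521/ORCLPDB", "MY_WAREHOUSE"),
    "External Src": ("jdbc:oracle:thin:@otherhost:1521/SRCPDB", "SRC_SCHEMA"),
}

def find_mergeable(servers):
    """Group ODI data servers by JDBC URL. Any URL that backs more than
    one data server flags schemas that should live under a single ODI
    data server, so mappings between them do not force an unneeded LKM."""
    by_url = defaultdict(list)
    for name, (url, _schema) in servers.items():
        by_url[url].append(name)
    return {url: names for url, names in by_url.items() if len(names) > 1}

for url, names in find_mergeable(data_servers).items():
    print(f"Merge {names} into one ODI data server for {url}")
```

Running this against the Figure 8 configuration would flag Staging Area and Warehouse as candidates for consolidation, which is exactly the correction shown in Figure 10.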
ODI Knowledge Modules are code templates that perform data integration tasks in ODI mappings. Out of the box, ODI offers over 150 knowledge modules – users can select, modify, and create their own knowledge modules as well. When selecting an LKM, choose the one that supports the fastest method for uploading data between data servers. Follow the best practices discussed in this article to optimize the overall performance of your data upload operations in ODI.
For more Oracle Data Integrator best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit “Oracle A-team Chronicles for Oracle Data Integrator (ODI).”