Using Oracle Data Pump in Oracle Data Integrator (ODI)

January 10, 2014 | 17 minute read
Text Size 100%:

Validation

Content validated on 4/16/2021 with

  • ODI Version 12.2.1.4.200721.1540
  • Oracle Database System 19.7.0.0.0 (DBaaS)

Introduction

This article presents a modified version of the ODI load knowledge module called LKM Oracle to Oracle (datapump).   The ODI load knowledge module (LKM) presented in this article has been enhanced with additional options and steps to take advantage of the best features of Oracle Data Pump and Oracle External Tables.  Some of the enhancements include data compression, server to server file transport, threads control, and the use of Oracle Optimizer Hints.  This article shows how to configure and use this knowledge module.

If your source and target data stores are both Oracle, Data Pump is the fastest way to extract and load data between Oracle data stores.

If you would like to download a copy of this knowledge module, go to Oracle Data Integrator Code Samples, and download "LKM Oracle to Oracle Datapump Plus."

 

Using Oracle Data Pump in Oracle Data Integrator (ODI)

Oracle Data Pump is a component of the Oracle database technology that allows users to export and import data faster than other traditional Oracle utilities.  In addition to basic export and import functionality, Oracle Data Pump supports the use of Oracle External Tables, and keeps data in a binary format, further improving processing efficiency.

Oracle External Tables is another feature of the Oracle database technology that allows users to define external files as database tables.  This feature empowers users with the ability to query files using SQL syntax as if the files were tables of a database schema.

The combination of using Oracle Data Pump with External Tables is a great use-case for extracting, loading, and transforming (EL-T) data with ODI.

Oracle Data Pump works only between Oracle databases.

If you would like to learn more about using Oracle Data Pump with Oracle External Tables, please refer to the following documents for additional information:  “Using Oracle External Tables”, “Overview of Oracle Data Pump”, and “The Oracle Data Pump Access Driver.” The scope of this article focuses on how to use and configure an ODI knowledge module that takes advantage of Oracle Data Pump and External Tables to efficiently load data between Oracle data servers.

 

Overview: LKM Oracle to Oracle Datapump Plus

The ODI version of the LKM Oracle to Oracle (datapump) provides basic and standard functionality on how to export and import data between Oracle data stores using Data Pump and External Tables.

The load knowledge module presented in this article, LKM Oracle to Oracle Datapump Plus, has been customized with additional options and features such as data compression, number of Data Pump workers, degree of parallelism, file-transfer, optimizer hints, and suffix names for all temporary objects.

Figure 1 shows a list of features that have been added to the LKM Oracle to Oracle Datapump Plus.

 

Figure-01

Figure 1: LKM Oracle to Oracle Datapump Features

The following sections explain in detail how to use and take advantage of these new LKM features.

Load Knowledge Module (LKM) Options

Figure 2 shows a list of available options for the LKM Oracle to Oracle Datapump Plus.

Figure-02

Figure 2: LKM Oracle to Oracle Datapump Plus - Options

The default values for the above LKM options have been configured as follow:

  • The Data Pump active workers (parallel threads) for both export and import operations have been set to the same value: 3 workers.

  • The Data Pump directories for both export and import operations have been set to the same location: user home.

  • Data will be compressed before is written to the data pump set.

  • Data Pump files will not be transferred.

  • Data Pump files will be deleted when the upload operation is complete.

  • Log files will be deleted when the upload operation is complete.

  • The suffix name has no value; this feature is optional.

  • The optimizer hint option has no value; this feature is optional.

All these options can be configured, changed, and customized by the user.  The following sections describe how to configure and customize these LKM options.  

 

NUM_OF_ACTIVE_WORKERS_EXPORT

 

  • Oracle Data Pump can export data in parallel.  Each parallel process (the Data Pump worker) can create its own Data Pump file.   Therefore, multiple Data Pump files can be created in parallel.

  • This LKM has been enhanced to allow the user to specify the desired number of Data Pump workers to be used during the export operation.

  • A data pump file will be created for each Data Pump worker, so that each worker has an exclusive lock on each pump file, and data can be written in parallel.

  • Figure 3 shows the default value for this option:  3 Data Pump workers.

    Figure 3: Knowledge Module Option – Number of Data Pump workers on Export

Figure 3: Knowledge Module Option – Number of Data Pump workers on Export

Increasing the number of Data Pump workers does not necessary mean that a Data Pump process will export data faster.  It is important to test and find the optimum number of Data Pump workers based on database resources, and other Data Pump jobs running at the same time.

 

NUM_OF_ACTIVE_WORKERS_IMPORT

 

  • Oracle Data Pump can also upload data in parallel.  Oracle recommends having a number of threads (Data Pump workers) that is greater or equal to the number of Data Pump files, so that multiple threads can read from the file and can write to the target table, all at the same time.

  • On import, Data Pump workers can read multiple Data Pump files or even chunks of the same Data Pump file concurrently. Therefore, data can be uploaded into the target table in parallel even if there is only one Data Pump file.

  • Figure 4 shows the default value for this option:  3 Data Pump workers.

    Figure 4: Knowledge Module Option – Number of Data Pump workers on Import

Figure 4: Knowledge Module Option – Number of Data Pump workers on Import

Increasing the number of Data Pump workers does not necessary mean that a Data Pump process will upload data faster.  It is important to test and find the optimum number of Data Pump workers based on database resources, and other Data Pump jobs running at the same time.

 

DATA_PUMP_EXPORT_DIR

 

  • This option allows the user to specify the physical path or location where Data Pump files will be exported.  This can be a hard-coded value or the name of an ODI variable that contains the actual path or location.

  • The value specified in this option should be a physical path in the source data server, or a shared network drive where both the source and target data servers can access the datapump files.

  • Figure 5 shows an example on how to define an ODI variable with this option.  The variable contains the actual physical path or location to be used when exporting Data Pump files.

  • By default, this option is set to the home directory of the user running the ODI agent.

    Figure 5: Knowledge Module Option – Data Pump Export Directory

  • Figure 5: Knowledge Module Option – Data Pump Export Directory

 

DATA_PUMP_IMPORT_DIR

 

  • There are cases when the export and import directories cannot be the same physical location, and an additional directory must be defined in the target data store.

  • This option allows the user to specify the physical path or location to be used for Data Pump import operations.  This can be a hard-coded value or the name of an ODI variable that contains the actual path or location.

  • Figure 6 shows an example of using an ODI variable with this option.  The variable contains the actual physical path or location to be used when importing Data Pump files.

  • The value specified in this option should be a physical path in the target data server, or a shared network drive to access the source files from the target data server.

  • By default, this option is set to the home directory of the user running the ODI agent.

    Figure 6: Knowledge Module Option – Data Pump Import Directory

Figure 6: Knowledge Module Option – Data Pump Import Directory

 

COMPRESS_DATA

 

  • This option allows the user to compress data before it is written to the Data Pump file set.  On export, Data Pump compression is an inline operation, allowing smaller Data Pump files to be created on disk.  This translates to a significant savings in disk space.

  • Data Pump compression is fully inline on the import side as well.  There is no need to uncompress Data Pump files before importing them into the target data store.

  • This feature is only available in versions of Oracle 11g and higher.

  • Figure 7 shows the compress data option.

  • By default, this option is set to false.

    Figure 7: Knowledge Module Option – Data Compression

Figure 7: Knowledge Module Option – Data Compression

 

TRANSFER_FILES

 

  • This option allows the user to copy the Data Pump files from the source data server to the target data server.

  • If a shared network drive is used for both export and import operations, then this option is not required.

  • However, this option should be used in environments where the target data server has no access to the source Data Pump files, and the files must be transferred to a location in the target server for the import operation.

  • Also, this option is useful when it is too slow to import the Data Pump files from a remote location such as a shared network drive, and a local directory in the target data server provides better performance.

  • The option uses the Oracle database file transfer package called dbms_file_transfer.  For additional information on this Oracle package, please refer to “Oracle DBMS File Transfer Package.

  • A database link must be configured in the target data server in order to transfer files between data servers.  See section Configuring your Database Link for File-Transfer Operations for details on how to configure your database link.

  • Figure 8 shows the file transfer option.  By default, this option is set to false.

    Figure 8: Knowledge Module Option –Transfer Files

Figure 8: Knowledge Module Option –Transfer Files

 

DELETE_DATA_FILES

 

  • This option allows the user to delete or keep the Data Pump files created by the knowledge module during the export and import operations.

  • After the Data Pump files have been exported and imported successfully, the knowledge module deletes the Data Pump files created in the source data server.  If the Data Pump files have been transferred to a target data server, the knowledge module deletes the target Data Pump files as well.

  • Figure 9 shows this option.  By default, this option is set to true.

    Figure 9: Knowledge Module Option – Delete Data Files

Figure 9: Knowledge Module Option – Delete Data Files

 

DELETE_LOG_FILES

 

  • This option allows the user to delete or keep the log files created by the knowledge module during the export and import operations.

  • After the Data Pump files have been exported and imported successfully, the knowledge module deletes the log files created for both operations.

  • Figure 10 shows this option.  By default, this option is set to true.

    Figure 10: Knowledge Module Option – Delete Log Files

Figure 10: Knowledge Module Option – Delete Log Files

 

SUFFIX_NAME

 

  • This option allows the user to define a suffix name in all temporary objects created by the knowledge module at runtime.  This can be a hard-coded value or an ODI variable that contains the actual suffix name.

  • This option provides another degree of parallelism: multiple executions of the same ODI mapping can run in parallel when unique suffix names are used.  An ODI variable can be used to store a suffix name.  The variable can be refreshed with a unique value before launching another instance of the same mapping.   Each execution of the same ODI mapping will have its own set of temporary object names.

  • Figure 11 shows an example of how to specify an ODI variable with this option.

  • By default, this option has no value.  It is optional, but it can be very useful.

    Figure 11: Knowledge Module Option – Suffix Name

Figure 11: Knowledge Module Option – Suffix Name

 

OPTIMIZER_HINT

 

  • Oracle Optimizer Hints offer a mechanism to instruct the Oracle Optimizer to choose a certain query execution plan based on a criteria specified by the user.  This feature of the Oracle database is a great way to optimize the execution of your database queries.

  • This KM option allows the user to add an Oracle Optimizer Hint in the SQL statement that defines the external table in the source data server.

  • The value for this option can be a hard-coded Oracle Optimizer Hint or an ODI variable that contains the hint.

  • The syntax for an Oracle Optimizer Hit is as follow:

  /*+ hint [text] [hint[text]]... */

  • Some of the most common Oracle Optimizer Hints such as APPEND, STAR_TRANSFORMATION, and ALL_ROWS are reusable hints: the hints can be defined inside the knowledge module and use by many ODI mappings.

  • Other Oracle Optimizer Hints such as INDEX, FULL, and PARALLEL depend on the tables used by each ODI mapping, and the user may choose to define the Oracle Optimizer Hint in the mapping itself.

  • Figure 12 shows an example of how to define an Oracle Optimizer Hint in an ODI mapping:

  1. Open the ODI mapping, and select the Physical tab.

  2. In the Staging Area of the Target Group box, select the Loading KM property.

  3. Expand the Loading Knowledge Module tree, and ensure that the LKM Oracle to Oracle Datapump Plus knowledge module has been selected.

  4. Select the Optimizer Hint option and enter the Oracle Optimizer Hint.

  5. Save your mapping changes.

Figure 12: Customizing Knowledge Module Options in Mappings

Figure 12: Customizing Knowledge Module Options in Mappings

  • For additional information on how to use Oracle Optimizer Hints, please refer to “Oracle Database Performance Tuning Guide, Using Optimizer Hints.”

  • By default, this option has no value.  It is optional, but it can be really useful in cases where hints are needed to speed up the execution of the SQL statement that defines the source external table.

 

Configuring your Environment to Work with Oracle Data Pump

The following sections describe the steps required to enable the LKM Oracle to Oracle Datapump Plus in your environment.  There are 3 areas where configuration is required:   the source data server, the target data server, and the ODI studio.

Server Configuration

  • Create a physical directory in the source data server to manage your Data Pump export operations.  Example:

               /usr/oracle/odi/datapump/export

  • If you plan to copy or transfer your datapump files from your source data server into your target data server, create another physical directory in the target data server.  Example:

               /usr/oracle/odi/datapump/import

  • If you plan to use a shared network drive for both operations, create one single physical directory in the shared data server.  Example:

               /usr/oracle/odi/datapump/shared

Data Server Privileges

  • In the ODI Topology Navigator, identify which user is configured to connect to the source data server.  For instance, Figure 13 shows the ODI physical data server connection for a source database.  The user configured to connect to this source data server is ODI_USER.

    Figure 13: ODI Topology Manager – Physical Source Data Server Connection

Figure 13: ODI Topology Manager – Physical Source Data Server Connection

  • Using a database tool such as SQL Developer, grant the following system privileges to the user configured to connect to the source data server.

           grant create any directory to ODI_USER;

           grant drop any directory to ODI_USER;

  • Follow the same steps for your target data server, and grant the same system privileges to the user configured to connect to the target data server:

           grant create any directory to <target_db_user>;

           grant drop any directory to <target_db_user>;

 

Configuring Your Database Link for File-Transfer Operations

If your environment requires copying or transferring your datapump files from the source data server to the target data server, follow the instructions in this section. 

If you configured a shared network drive for both export and import operations, you don’t need to setup a database link.

  • The LKM Oracle to Oracle Datapump Plus uses the Oracle dbms_file_transfer package to perform the transfer of Data Pump files from the source data server to the target data server.  Grant the “execute” system privilege on this package to the user or schema of the target data server:

          grant execute on dbms_file_transfer to <target_db_user>;

  • A database link is required in order to copy the Data Pump files from the source data server to the target data server.  Grant the “create database link” system privilege to the user or schema of the target data server:

          grant create database link to <target_db_user>;

  • Login as the user or schema of the target data server, and create the database link:

Example:  The following database link will connect to a source data server with a user called ODI_USER.  The source data server is called “OLTP_ORDERS”.

          create database link "ORDERS_DB" connect to "ODI_USER"
          identified by odi using 'OLTP_ORDERS';

  • Ensure that your new database link is also added in the tnsnames.ora file of your target data server.  Example:

        OLTP_ORDERS =

           (DESCRIPTION =
              (ADDRESS = (PROTOCOL = TCP)(HOST = orders1-svr
              (PORT = 1521))
              (CONNECT_DATA =
                (SERVER = DEDICATED)
                (SERVICE_NAME = orders.us.example.com)
              )
            )

  • In the ODI Topology Navigator, open your physical source data server configuration, and add the name of the database link in the Instance / dblink (Data Sever) text box as shown in Figure 14:

    Figure 14: ODI Topology Manager- Database Link Configuration

Figure 14: ODI Topology Manager- Database Link Configuration

 

How does LKM Oracle to Oracle Datapump Plus work?

Figure 15 shows the main steps of the LKM Oracle to Oracle Datapump Plus.  There are 5 main steps:

  1. On the source data server, the knowledge module creates an Oracle directory with the physical location where the Data Pump files will be exported.  Using the knowledge module options, an external table is defined in the source data server.

  2. The export operation starts when the external table definition is executed in the source data server. The Oracle Data Pump utility starts writing the files on disk in this step.  If the compress option is enabled, the data will be compressed before it is written on disk.  One log file will be created on disk to log any errors or data that cannot be exported.  The number of datapump workers specified in the knowledge module option will be used to write each file in parallel.

Figure 15: LKM Oracle to Oracle Datapump Plus – Main Steps

Figure 15: LKM Oracle to Oracle Datapump Plus – Main Steps

  1. Once the Data Pump files have been written on disk, if the file-transfer option is enabled, the files will be copied from the source data server to the target data server.  When the file-transfer operation is complete, the user will have a duplicate copy of each file in both places: the source and the target data servers.  If the file-transfer option is disabled, the knowledge module will skip the file-transfer step.  Therefore, it is expected that the target data server has access to the source Data Pump files or a shared network drive has been configured by the user to perform both operations:  the export and import of Data Pump files.

  2. On the target data server, an Oracle directory will be created to specify the location of the Data Pump files to be imported.  Another external table will be created on the target data server.  One log file will be created on disk to log any errors or data that cannot be imported.

  3. The datapump files will be loaded into an integration table by an ODI integration knowledge module (IKM).  The import operation will start when a select-statement is issued against the external table in the target data server.  The integration table can be an ODI I$ table or any other table specified by an ODI integration knowledge module.  Finally, if the user has chosen to delete the temporary objects, all temporary objects created by the knowledge module will be deleted.

 

Understanding the Code Generated by the Knowledge Module

It is recommended to get familiar with the steps that the LKM Oracle to Oracle Datapump Plus executes at runtime.  The ODI Simulation feature is a great tool to review the LKM steps.

Figure 16 shows two steps or tasks of the LKM Oracle to Oracle Datapump Plus: “Create Oracle directory on SOURCE,” and “Create Oracle directory on TARGET.” In this example, an ODI variable will be used to create the Oracle directory names with a suffix.  Also, ODI variables will be used to specify the physical location of the export and the import directories. At runtime, ODI evaluates these variables and replace them with their actual values.

Figure 16: LKM Oracle to Oracle Datapump Plus – Oracle Directory Name Creation

Figure 16: LKM Oracle to Oracle Datapump Plus – Oracle Directory Name Creation

Figure 17 shows an example of how ODI generates the code for the external table on the source data server.  This example highlights where in the code the KM options are being used. For instance, every temporary object contains a suffix name: MAR2013.  The compress data option is enabled; data will be compressed before it is written to the Data Pump file set. A total of 3 Data Pump files will be created during the export operation.  Three Data Pump active workers will write to each Data Pump file in parallel.  An optimizer hint will be used when selecting data from the source table.

Figure 17: LKM Oracle to Oracle Datapump Plus – Code Sample for External Table on Source

Figure 17: LKM Oracle to Oracle Datapump Plus – Code Sample for External Table on Source

Figure 18 shows a sample of the code that ODI will generate to transfer one Data Pump file from the source data server to the target data server. This example highlights where in the code the database link name will be used.  This database link name comes from the “Instance / dblink (Data Server)” text box of the source data server that has been defined in the Physical Architecture of the ODI Topology Navigator. Also, it highlights the actual names of the Oracle source and target directories.

This block of code will be duplicated for each Data Pump file that needs to be transferred.

Figure 18: LKM Oracle to Oracle Datapump Plus – Code Sample for File Transfer

Figure 18: LKM Oracle to Oracle Datapump Plus – Code Sample for File Transfer

 

Conclusion

Oracle Data Pump is the fastest way to extract and load data between Oracle data servers.  If you would like to download a copy of this knowledge module, go to Oracle Data Integrator Code Samples, and download "LKM Oracle to Oracle Datapump Plus."

For more ODI best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit Oracle A-Team Chronicles for ODI.

 

Benjamin Perez-Goytia


Previous Post

BPM 11g Process Instances - Faults, Rollback and Recovery - Part 4

Mark Foster | 12 min read

Next Post


Understanding the ODI JKMs and how they work with Oracle GoldenGate

Christophe Dupupet | 22 min read