X

Best Practices from Oracle Development's A‑Team

Oracle GoldenGate: Speed-up Initial Loads With SQLPREDICATE

Introduction

The Oracle GoldenGate (OGG) Extract option SQLPREDICATE may be used to speed-up the initial load of large tables when the source database is Oracle, DB2, SQL Server, or Teradata. In this article we shall discuss the various OGG options for splitting data across multiple initial load Extract groups and demonstrate the effectiveness of each.

The scripts and information provided in this article are for educational purposes only. They are not supported by Oracle Development or Support, and come with no guarantee or warranty for functionality in any environment other than the test system used to prepare this article.

Main Article

When planning an initial load and the source database is either Oracle, DB2, SQL Server, or Teradata, the OGG Extract option SQLPREDICATE is a better choice for splitting large tables than WHERE or FILTER. As documented in the Oracle GoldenGate Reference Guide, SQLPREDICATE:

  • Forces the records returned by the Extract's SELECT statement to be ordered by the key values.
  • Keeps the Extract from fetching all of the records before filtering them.
  • For Oracle tables, reduces the amount of data that is stored in the undo segment.

It is important to note that the columns specified as part of the SQLPREDICATE WHERE clause must be part of a key or index to prevent full table scans which reduce efficiency and performance.

Source Data Size

In my source Oracle database I have one table that is 3 GB in size::

TABLE_NAME                         NUM_ROWS  AVG_ROW_LEN          TOTAL_BYTES ------------------------------ ------------ ------------ -------------------- MILLTBLEA                         2,000,205        1,188        3,019,898,880

We want split the load of MILLTBLEA across multiple OGG processes; so lets take a look at the performance for the various options; FILTER, WHERE, and SQLPREDICATE. To keep things simple, we will configure two initial load Extracts; where in a production scenario we would probably use more. So our test configurations will be:

Load Option First Extract Second Extract
FILTER (@RANGE()) extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge
table east.milltblea, filter (@range (1,2));
extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge
table east.milltblea, filter (@range (2,2));
FILTER ([col value]) extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge
table east.milltblea, filter (cola < 499294558289218750);
extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge
table east.milltblea, filter (cola >= 499294558289218750);
WHERE extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge
table east.milltblea, where (cola < 499294558289218750);
extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge
table east.milltblea, where (cola >= 499294558289218750);
SQLPREDICATE extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge
table east.milltblea, SQLPREDICATE "WHERE COLA < 499294558289218750";
extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge
table east.milltblea, SQLPREDICATE "WHERE COLA >= 499294558289218750";

 

FILTER(@RANGE()) may be used to divide the rows of any table across two or more OGG processes. A hash value is computed based upon the specified input column; which in this case will be the primary key.

To determine the value of the primary key column, COLA, for the filter and where clauses; we query the database for the median data value:

SQL> col COLA format 999,999,999,999,999,999 SQL> select median (cola) as COLA from milltblea;                     COLA ------------------------  499,294,558,289,218,750

In my test environment, the data read time for each test execution was:

Load Option TMYSQLC Records Read TMYSQLC Read Time (secs) TMYSQLD Records Read TMYSQLD Read Time (secs) Total Read Time (secs)
FILTER (@RANGE()) 1,000,571 465 999,634 345 810
FILTER ([col value]) 1,000,102  311 1,000,103 188 499
WHERE 1,000,102 138 1,000,103 171 309
SQLPREDICATE 1,000,102 124 1,000,103 141 265

As can be seen in the above table, WHERE and SQLPREDICATE provide better read rates than FILTER. However, SQLPREDICATE does have slightly better performance than WHERE because the Extract SELECT statement has better optimization.

When the source database is Oracle, the SQLPREDICATE clause "AS OF SCN [scn value]" may be specified to have Extract perform an Oracle Flashback Query. Using Flashback is faster that reading directly from the source table. To demonstrate this, we set-up the following test:

Direct Read Extract Flashback Query Extract
extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge
table east.milltblea;
extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge
table east.milltblea, sqlpredicate "AS OF SCN 27260769";
Records Read: 2,000,205 Records Read: 2,000,205
Read Time (secs): 388 Read Time (secs): 343

 

Summary

Oracle GoldenGate provides the user with multiple options for partitioning large tables across multiple initial load processes. In this articles we demonstrated that, where supported, the SQLPREDICATE option is the most effective choice.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha

Recent Content