X

Best Practices from Oracle Development's A‑Team

Oracle GoldenGate: Tables Without Keys

Introduction

For data replication products like Oracle GoldenGate (OGG), it is typically a best practice to have a primary key or unique index defined for tables being replicated. However, in practice this may not be possible when working with poorly designed databases or legacy applications. In this article we shall detail OGG's rules for determining uniqueness in replicated data and present examples and best practices for working with tables without primary keys or unique indexes.

Main Article

The ANSI standard for RDBMS systems states that tables must be defined with constraints; which are rules that define which data values are valid during INSERT, UPDATE, and DELETE operations. The standard defines operational rules for four types of constraint that may be defined for a table, CHECK, PRIMARY KEY, UNIQUE, and FOREIGN KEY; however, different database platforms may allow other types. Furthermore, many RDBMS systems; including Oracle, do not enforce the standard and allow Database Engineers to create tables with lax or no constraints; which violates the standard that every row within a table must reflect a level of uniqueness across the data structure.

This poses a problem when replicating update and delete operations as the source data has no uniqueness, which may result in data integrity issues when applied to the target database.

How Oracle GoldenGate Ensures Row Uniqueness

OGG requires a unique row identifier on the source and target tables for update and delete operations. The row identifier is selected based upon the following order of priority, depending on the number and type of constraints.

1) Primary key.
2) Unique key.
3) If none of the preceding key types exist (even though there might be other types of keys defined on the table) OGG constructs a pseudo key of all columns.

These are the generic rules and there are additional specific considerations depending upon the database platform. For more details review your specific Oracle GoldenGate Installation and Configuration Guide.

To demonstrate the operational differences, let's setup replication for three tables:

create table loren.t_pkey (
c_num    number (10,0),
c_ts     timestamp(6),
c_txt    varchar (50),
primary key (c_num)
);
create table loren.t_ui (
c_num    number (10,0),
c_ts        timestamp(6),
c_txt       varchar (50)
);
create unique index  tui_idx on loren.t_ui (c_num);
create table loren.t_nopk (
c_num    number (10,0),
c_ts     timestamp(6),
c_txt    varchar (50)
);

 

For this example, we'll use Integrated Extract and Classic Replicat configured as:

extract eorcl
userid c##ggs, password AACAAAAAAAAAAAHAAIFBOIYAMCGIMARE BLOWFISH, ENCRYPTKEY DEFAULT
exttrail ./dirdat/eo
logallsupcols
updaterecordformat compact
table pdborcl.loren.*;
replicat cl_amer
userid ggs@pdbamer, password AACAAAAAAAAAAAHAAIFBOIYAMCGIMARE BLOWFISH, ENCRYPTKEY DEFAULT
assumetargetdefs
map pdborcl.loren.*, target pdbamer.loren.*;

 

When Extract resolves the table structure, we see the following in the Extract Report file:

2016-05-23 13:49:00  INFO    OGG-06509  Using the following key columns for source table PDBORCL.LOREN.T_PKEY: C_NUM.
2016-05-23 13:49:00  INFO    OGG-06509  Using the following key columns for source table PDBORCL.LOREN.T_UI: C_NUM.
2016-05-23 13:49:00  WARNING OGG-06439  No unique key is defined for table T_NOPK. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key.
2016-05-23 13:49:00  INFO    OGG-06509  Using the following key columns for source table PDBORCL.LOREN.T_NOPK: C_NUM, C_TS, C_TXT.

Likewise, the Replicat report file shows:

2016-05-23 13:49:03  INFO    OGG-06510  Using the following key columns for target table PDBAMER.LOREN.T_PKEY: C_NUM.
2016-05-23 13:49:06  INFO    OGG-06510  Using the following key columns for target table PDBAMER.LOREN.T_UI: C_NUM.
2016-05-23 13:49:06  WARNING OGG-06439  No unique key is defined for table T_NOPK. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key.
2016-05-23 13:49:06  INFO    OGG-06510  Using the following key columns for target table PDBAMER.LOREN.T_NOPK: C_NUM, C_TS, C_TXT.

Both OGG Groups will use the defined Primary Key to maintain uniqueness for the table T_PKEY, the defined Unique Index for T_UI, and a pseudo key consisting of all table columns for T_NOPK.

Now let's send and update operation through the replication feed and see what SQL DML statements are built by Replicat. To see the DML statements, I need to modify my Classic Replicat configuration. My Integrated Extract and Classic Replicat configuration files look like this:

extract eorcl
userid c##ggs, password AACAAAAAAAAAAAHAAIFBOIYAMCGIMARE BLOWFISH, ENCRYPTKEY DEFAULT
exttrail ./dirdat/eo
logallsupcols
updaterecordformat compact
table pdborcl.loren.*;
replicat cl_amer
userid ggs@pdbamer, password AACAAAAAAAAAAAHAAIFBOIYAMCGIMARE BLOWFISH, ENCRYPTKEY DEFAULT
assumetargetdefs
showsyntax
map pdborcl.loren.*, target pdbamer.loren.*;

 

To see the DML statements, I must start the Replicat from a command prompt.

On the source database, I execute the following update statements:

update loren.t_pkey set c_txt='Update of row 4.' where c_num=4;
update loren.t_ui set c_txt='Update of row 4.' where c_num=4;
update loren.t_nopk set c_txt='Update of row 4.' where c_num=4;
commit;

Replicat displays the DML created from this source transaction:

UPDATE "LOREN"."T_PKEY" x SET x."C_TXT" = 'Update of row 4.' WHERE x."C_NUM"='4'
UPDATE "LOREN"."T_UI" x SET x."C_TS" = TO_TIMESTAMP('2016-05-23 13:48:58.163000000','YYYY-MM-DD HH24:MI:SS.FF'),x."C_TXT" = 'Update of row 4.' WHERE x."C_NUM"='4'
UPDATE "LOREN"."T_NOPK" x SET x."C_NUM" = '4',x."C_TS" = TO_TIMESTAMP('2016-05-23 13:48:58.178000000','YYYY-MM-DD HH24:MI:SS.FF'),x."C_TXT" = 'Update of row 4.' WHERE x."C_NUM"='4' AND x."C_TS"=TO_TIMESTAMP('2016-05-23 13:48:58.178000000',' YYYY-MM-DD HH24:MI:SS.FF') AND x."C_TXT"='This is row 4\00\00\00' AND ROWNUM = 1

As you can see above, for the tables with a Primary Key and Unique Index, Replicat used that column in the where clause. For the table with neither, all columns and their before image data were used to build the where clause. Because there is no Primary Key or Unique Index on the target table, we must assume duplicate data rows exist; therefore, the number of rows update will be limited to one via the "ROWNUM = 1" statement appended to the WHERE clause.

Let's do another update using the column c_txt in the where clause of our source statement:

update loren.t_pkey set c_txt='Update of row 2.' where c_txt='This is row 2';
update loren.t_ui set c_txt='Update of row 2.' where c_txt='This is row 2';
update loren.t_nopk set c_txt='Update of row 2.' where c_txt='This is row 2';
commit;

The statements built and executed by Replicat on the target are:

UPDATE "LOREN"."T_PKEY" x SET x."C_TXT" = 'Update of row 2.' WHERE x."C_NUM"='2'
UPDATE "LOREN"."T_UI" x SET x."C_TS" = TO_TIMESTAMP('2016-05-23 13:48:58.163000000','YYYY-MM-DD HH24:MI:SS.FF'),x."C_TXT" = 'Update of row 2.' WHERE x."C_NUM"='2'
UPDATE "LOREN"."T_NOPK" x SET x."C_NUM" = '2',x."C_TS" = TO_TIMESTAMP('2016-05-23 13:48:58.178000000','YYYY-MM-DD HH24:MI:SS.FF'),x."C_TXT" = 'Update of row 2.' WHERE x."C_NUM"='2' AND x."C_TS"=TO_TIMESTAMP('2016-05-23 13:48:58.178000000','YYYY-MM-DD HH24:MI:SS.FF') AND x."C_TXT"='This is row 2\00\00\00' AND ROWNUM = 1

Once again we see that where a Primary Key or Unique Index exists, Replicat will use it in the where clause; while all columns and before image data is used in the where clause for the table with neither.

Why is using all of the columns in the an update statement a bad thing? For one, this type of DML is very expensive to execute as the database performs a full table scan. For my very small test table, this is no big deal; but, what if my table had millions or billions of rows? Then we would see a degradation in performance of both my database and Replicat.

Define A Pseudo Key

Now that we have an understanding of how Extract and Replicat functions when tables have primary keys, unique indexes, and neither defined; let's take things a bit further and have OGG override the database settings.

To do this, we set the KEYCOLS option of the Extract TABLE and Replicat MAP statements. KEYCOLS is shorthand for "key columns"; which is a user-defined column, or set of columns, OGG will use instead of an existing Primary Key or Unique Index. For tables without Primary Keys or Unique Indexes, OGG treats this pseudo-key as a Primary Key.

When using KEYCOLS, it is best practice to define the same pseudo-key in both Extract and Replicat. The pseudo-key data must also create uniqueness for the row being applied to the target; I.E., the where statement created by Replicat must modify one, and only, one row. Furthermore, the database must log the records for these columns as part of any update or delete operations performed. For my Oracle Database, I set this by executing the GGSCI command "add schematrandata pdborcl.lpenton allcols".

In my test tables, I have a timestamp column defined. To use this as a pseudo-key, modify my Extract and Replicat as follows:

extract eorcl
userid c##ggs, password AACAAAAAAAAAAAHAAIFBOIYAMCGIMARE BLOWFISH, ENCRYPTKEY DEFAULT
exttrail ./dirdat/eo
logallsupcols
updaterecordformat compact
table pdborcl.loren.t_pkey, KEYCOLS (c_ts);
table pdborcl.loren.t_ui, KEYCOLS (c_ts);
table pdborcl.loren.t_nopk, KEYCOLS (c_ts);
replicat cl_amer
userid ggs@pdbamer, password AACAAAAAAAAAAAHAAIFBOIYAMCGIMARE BLOWFISH, ENCRYPTKEY DEFAULT
assumetargetdefs
showsyntax
map pdborcl.loren.t_pkey, target pdbamer.loren.t_pkey, KEYCOLS (c_ts);
map pdborcl.loren.t_ui, target pdbamer.loren.t_ui, KEYCOLS (c_ts);
map pdborcl.loren.t_nopk, target pdbamer.loren.t_nopk, KEYCOLS (c_ts)

Normally, if a table has a valid primary key or unique index defined we will not use PKEY to override the database settings. I am doing so here just to show OGG operation when this option is set.

Now let's run another update transaction on the source tables:

update loren.t_pkey set c_txt='Update of row 5.' where c_num=5;
update loren.t_ui set c_txt='Update of row 5.' where c_num=5;
update loren.t_nopk set c_txt='Update of row 5.' where c_num=5;
commit;

When Integrated Extract resolves the table structure, we see the following in the Extract Report file:

2016-05-23 15:37:55  INFO    OGG-06509  Using the following key columns for source table PDBORCL.LOREN.T_PKEY: C_TS.
2016-05-23 15:37:55  INFO    OGG-06509  Using the following key columns for source table PDBORCL.LOREN.T_UI: C_TS.
2016-05-23 15:37:55  INFO    OGG-06509  Using the following key columns for source table PDBORCL.LOREN.T_NOPK: C_TS.

Replicat reports:

2016-05-23 15:44:21  INFO    OGG-06510  Using the following key columns for target table PDBAMER.LOREN.T_PKEY: C_TS.
UPDATE "LOREN"."T_PKEY" x SET x."C_NUM" = '5',x."C_TXT" = 'Update of row 5.' WHERE x."C_TS"=TO_TIMESTAMP('2016-05-23 13:48:58.147000000','YYYY-MM-DD HH24:MI:SS.FF')
2016-05-23 15:53:25 INFO OGG-06510 Using the following key columns for target table PDBAMER.LOREN.T_UI: C_TS.
UPDATE "LOREN"."T_UI" x SET x."C_NUM" = '5',x."C_TXT" = 'Update of row 5.' WHERE x."C_TS"=TO_TIMESTAMP('2016-05-23 13:48:58.163000000','YYYY-MM-DD HH24:MI:SS.FF')
2016-05-23 15:53:35 INFO OGG-06510 Using the following key columns for target table PDBAMER.LOREN.T_NOPK: C_TS.
UPDATE "LOREN"."T_NOPK" x SET x."C_NUM" = '5',x."C_TXT" = 'Update of row 5.' WHERE x."C_TS"=TO_TIMESTAMP('2016-05-23 13:48:58.178000000','YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM = 1

Notice that Replicat reports that the column C_TS is now being used as a key column; which is specified in the where clause built from the source updates.

Tips For Tables Without A Primary Key Or Unique Index

The following tips have been gathered from Oracle field personnel. It is not all-inclusive for what to do when replicating tables without primary keys or unique indexes; but does provide a good summary of things to consider.

1) Classify the tables without PK/UIs in 3 groups: high, medium, or low update/delete activity.  Focus most of your tuning effort on the high activity tables.
a) If high activity tables have a sub-set of columns that will ensure uniqueness (even though there isn’t a PK or UI), use KEYCOLS in the extract & replicat to reduce the data that needs to be sent from source to target.
b) If time permits, analyze the medium activity tables too.
c) No need to worry about the low activity tables since the impact will be minimal overall.

2) Be aware that all updates will get processed like PKUpdates; meaning that the full before and after image will be sent across to the target.
a) The before values will be used in the WHERE clause and the after values will be used in the SET clause of the UPDATE statement created by the replicat. The SQL statement could be quite large for tables with lots of columns; this is another reason to determine if a sub-set of columns will ensure uniqueness and use KEYCOLS.

3) If you’re planning to use a coordinated replicat, it will be best to define a few columns that have low odds of being updated with THREAD or THREADRANGE for the coordinator replicat to perform the hash algorithm to determine the processing thread.

4) Tables that have no key cannot be used for conflict detection and resolution (CDR).

5) Keep in mind row size limit for OGG of 4Mb for before AND after image combined.

6) BATCHSQL has a limitation of 25Kb for a row, if there is no PK and there are a lot of columns BATCHSQL should be disabled.

Summary

In this article we described and presented examples of how Oracle GoldenGate functions for tables with Primary Keys, Unique keys, and neither. We also presented examples of how end-users can create pseudo-keys that Oracle GoldenGate will use for target table update and delete operations; and finished by providing tips to keep in mind when working with tables without primary keys or unique indexes.

Join the discussion

Comments ( 1 )
  • Mayank Monday, June 24, 2019
    I have issue my replicat is abending , if developer by mistake writing a query like "update loren.t_pkey set c_txt='Update of row 5.'" , without using Primary key column in SQL (although it has PK ) , please suggest how to overcome this issue ,as i cannot stop Developer can write any adhoc queries , Thanks
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha

Recent Content