Oracle GoldenGate: Replicating “Soft” Deletes To Data Warehouses

Introduction

We receive a lot of questions on how to setup Oracle GoldenGate to perform “soft” deletes in Data Warehouses. By default, Oracle GoldenGate replicates data operations exactly as they occur in the source database; however, in Data Warehouses their is typically a requirement to retain the original data record and set a flag that shows the record was deleted. This is what we mean by “soft” delete.

In this article we shall present the three most common use cases and Oracle GoldenGate Replicat configurations that address the requirements. I shall be using Teradata as my target database; however, the concepts presented are valid for any RDBMS database supported as an Oracle GoldenGate target.

Main Article

When configuring replication to Data Warehouses, the three most common use cases we encounter are:

(1) Retain a record of all source database operations for regulatory requirements.

(2) Update the target database to set a delete flag and then insert the delete record as a new row. If the same data is re-inserted into the source database, create a new row in the target for this record.

(3) Retain a single row in the target for the source data. If the source row is deleted, update a flag to record the delete. If the record is re-inserted into the source database, update the target row with the source record data and reset the delete flag.

For our tests, we shall use the following source and target tables:

Oracle Source Table
create table tpc.repldel (
myrowid   number(11) not null,
atextrow  varchar2(50),
insert_ts timestamp(6) not null,
updt_ctr  number(18),
update_ts timestamp(6)
);

Teradata Target Table
create multiset table FDSUSER.REPLDEL (
MYROWID   decimal(11),
ATEXTROW  varchar(50),
INSERT_TS timestamp(6),
UPDT_CTR  decimal(18),
UPDATE_TS timestamp(6),
DEL_FLAG  char(1)
)
primary index (MYROWID);

My source table definition does not contain a primary key or unique index, so we must ensure that all columns are logged for update and delete operations. Likewise, my Teradata target is a Mulitset table that does not have any unique indexes defined; this allows duplicate data rows to exist in the table.

Use Case #1

Use case: Retail A Record Of All Source Database Operations.

For this use case, we configure Replicat to insert every source record, no matter the source operation type, by using the INSERTALLRECORDS parameter. The Replicat configuration I’ll use for my test is:

replicat deltest
targetdb tdexpress, userid ggadmin, password Oracle1
maxtransops 500
batchsql batchtransops 500, bytesperqueue 1000000, opsperbatch 500
dboptions nocatalogconnect
insertallrecords
map PDBORCL.TPC.REPLDEL, target FDSUSER.REPLDEL,
COLMAP( USEDEFAULTS,
DEL_FLAG = @IF (@STRCMP (@GETENV (‘GGHEADER’, ‘OPTYPE’), ‘DELETE’) = 0, ‘Y’, ‘N’)
);

Generate test data:

insert into tpc.repldel values (1, ‘Insert row 1’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (2, ‘Insert row 2’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (3, ‘Insert row 3’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (4, ‘Insert row 4’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (5, ‘Insert row 5’, CURRENT_TIMESTAMP, 0, NULL);
commit;

update tpc.repldel set atextrow = ‘Update row 2’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 2;
commit;

update tpc.repldel set atextrow = ‘Update row 5’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 5;
commit;

update tpc.repldel set atextrow = ‘Update row 2’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 2;
update tpc.repldel set atextrow = ‘Update row 5’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 5;
commit;

delete from tpc.repldel where myrowid = 5;
commit;

Target table contents:

SELECT MYROWID, ATEXTROW, INSERT_TS, UPDT_CTR, UPDATE_TS, DEL_FLAG FROM FDSUSER.REPLDEL order by MYROWID, UPDT_CTR;

 

MYROWID ATEXTROW INSERT_TS UPDT_CTR UPDATE_TS DEL_FLAG
1 Insert row 1 1/25/2017 13:35:52.291655 0 ? N
2 Insert row 2 1/25/2017 13:35:52.297039 0 ? N
2 Update row 2 1/25/2017 13:35:52.297039 1 1/25/2017 13:37:28.070826 N
2 Update row 2 1/25/2017 13:35:52.297039 2 1/25/2017 13:37:28.080780 N
3 Insert row 3 1/25/2017 13:35:52.297880 0 ? N
4 Insert row 4 1/25/2017 13:35:52.298722 0 ? N
5 Insert row 5 1/25/2017 13:35:52.299372 0 ? N
5 Update row 5 1/25/2017 13:35:52.299372 1 1/25/2017 13:37:28.076635 N
5 Update row 5 1/25/2017 13:35:52.299372 2 1/25/2017 13:37:28.081412 N
5 Update row 5 1/25/2017 13:35:52.299372 2 1/25/2017 13:37:28.081412 Y

 

As we can see in the output, a row exists in the target that corresponds to each source operation; with any rows deleted identified by the DEL_FLAG column.

Use Case #2

Use case: Update the target database to set a delete flag and then insert the delete record as a new row. If the same data is re-inserted into the source database, create a new row in the target for this record.

Because there are no unique indexes on the target table, we need to define KEYCOLS on the Replicat to define uniqueness for update operations. We handle the delete requirement by setting the parameter INSERTDELETES. The Replicat configuration I’ll use for this test is:

replicat deltest
targetdb tdexpress, userid ggadmin, password Oracle1
maxtransops 500
batchsql batchtransops 500, bytesperqueue 1000000, opsperbatch 500
dboptions nocatalogconnect
insertdeletes
map PDBORCL.TPC.REPLDEL, target FDSUSER.REPLDEL,
KEYCOLS (MYROWID, INSERT_TS), COLMAP( USEDEFAULTS,
DEL_FLAG = @IF (@STRCMP (@GETENV (‘GGHEADER’, ‘OPTYPE’), ‘DELETE’) = 0, ‘Y’, ‘N’)
);

Generate test data:

insert into tpc.repldel values (1, ‘Insert row 1’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (2, ‘Insert row 2’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (3, ‘Insert row 3’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (4, ‘Insert row 4’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (5, ‘Insert row 5’, CURRENT_TIMESTAMP, 0, NULL);
commit;

update tpc.repldel set atextrow = ‘Update row 2’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 2;
commit;

update tpc.repldel set atextrow = ‘Update row 5’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 5;
commit;

update tpc.repldel set atextrow = ‘Update row 2’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 2;
update tpc.repldel set atextrow = ‘Update row 5’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 5;
commit;

delete from tpc.repldel where myrowid = 5;
delete from tpc.repldel where myrowid = 1;
commit;

insert into tpc.repldel values (1, ‘Insert row 1’, CURRENT_TIMESTAMP, 0, NULL);
commit;

Target table contents:

SELECT MYROWID, ATEXTROW, INSERT_TS, UPDT_CTR, UPDATE_TS, DEL_FLAG FROM FDSUSER.REPLDEL order by MYROWID, UPDT_CTR;

MYROWID ATEXTROW INSERT_TS UPDT_CTR UPDATE_TS DEL_FLAG
1 Insert row 1 1/25/2017 14:20:06.915726 0 ? N
1 Insert row 1 1/25/2017 14:19:06.064428 0 ? Y
1 Insert row 1 1/25/2017 14:19:06.064428 0 ? N
2 Update row 2 1/25/2017 14:19:06.065892 2 1/25/2017 14:19:42.388967 N
3 Insert row 3 1/25/2017 14:19:06.067126 0 ? N
4 Insert row 4 1/25/2017 14:19:06.068325 0 ? N
5 Update row 5 1/25/2017 14:19:06.069736 2 1/25/2017 14:19:42.389616 Y
5 Update row 5 1/25/2017 14:19:06.069736 2 1/25/2017 14:19:42.389616 N

 

As we can see in the output, we now have 3 rows for MYROWID 1; one for the original insert, one for the delete, and one for the re-insert of the row in the source database. Likewise, there are two rows for MYROWID 5, one for the update of the original row and one for the delete.

Use Case #3

Use case: Retain a single row in the target for the source data. If the source row is deleted, update a flag to record the delete. If the record is re-inserted into the source database, update the target row with the source record data and reset the delete flag.

This use case requires a change to the target table, as it requires only one row per source record. For Teradata, we either need to redefine the table as a Set table, or add an Unique Primary Index to the existing Multiset Table. To make this change, I delete all of my test data from the Teradata table and executed the command:

alter table FDSUSER.REPLDEL modify unique primary index (MYROWID);

In the Replicat, I’ll use the parameter UPDATEDELETES to set the DEL_FLAG and retain a copy of the source record. Because we set an Unique Primary Index on the column MYROWID, Teradata will return a duplicate row violation if an insert comes from the source for a row that had previously been deleted. We’ll set Replicat to catch the error returned by Teradata and then execute an exceptions map to update the existing row with the new data. The Replicat configuration I’ll use for this test is:

replicat deltest
targetdb tdexpress, userid ggadmin, password Oracle1
maxtransops 500
batchsql batchtransops 500, bytesperqueue 1000000, opsperbatch 500
dboptions nocatalogconnect
reperror (-2801, exception)
updatedeletes
map PDBORCL.TPC.REPLDEL, target FDSUSER.REPLDEL,
COLMAP( USEDEFAULTS,
DEL_FLAG = @IF (@STRCMP (@GETENV (‘GGHEADER’, ‘OPTYPE’), ‘DELETE’) = 0, ‘Y’, ‘N’)
);
— Exceptions handlers
allowduptargetmap
updateinserts
map PDBORCL.TPC.REPLDEL, target FDSUSER.REPLDEL, EXCEPTIONSONLY
COLMAP( USEDEFAULTS,
DEL_FLAG = ‘N’
);

Generate test data:

insert into tpc.repldel values (1, ‘Insert row 1’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (2, ‘Insert row 2’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (3, ‘Insert row 3’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (4, ‘Insert row 4’, CURRENT_TIMESTAMP, 0, NULL);
insert into tpc.repldel values (5, ‘Insert row 5’, CURRENT_TIMESTAMP, 0, NULL);
commit;

update tpc.repldel set atextrow = ‘Update row 2’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 2;
commit;

update tpc.repldel set atextrow = ‘Update row 5’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 5;
commit;

update tpc.repldel set atextrow = ‘Update row 2’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 2;
update tpc.repldel set atextrow = ‘Update row 5’, updt_ctr = updt_ctr+1, update_ts = CURRENT_TIMESTAMP
where myrowid = 5;
commit;

delete from tpc.repldel where myrowid = 5;
delete from tpc.repldel where myrowid = 1;
commit;

Target table contents:

SELECT MYROWID, ATEXTROW, INSERT_TS, UPDT_CTR, UPDATE_TS, DEL_FLAG FROM FDSUSER.REPLDEL order by MYROWID, UPDT_CTR;

MYROWID ATEXTROW INSERT_TS UPDT_CTR UPDATE_TS DEL_FLAG
1 Insert row 1 1/25/2017 14:50:48.629563 0 ? Y
2 Update row 2 1/25/2017 14:50:48.635908 2 1/25/2017 14:51:20.975013 N
3 Insert row 3 1/25/2017 14:50:48.637100 0 ? N
4 Insert row 4 1/25/2017 14:50:48.638282 0 ? N
5 Update row 5 1/25/2017 14:50:48.639429 2 1/25/2017 14:51:20.975918 Y

 

We now have one row in the target corresponding to each source operation. The DEL_FLAG column shows that records for MYROWID 1 and 5 were deleted from the source table.

Now re-insert data for MYROWID 5 in the source table.

insert into tpc.repldel values (5, ‘Insert row 5’, CURRENT_TIMESTAMP, 0, NULL);
commit;

The corresponding record in the target table is:

MYROWID ATEXTROW INSERT_TS UPDT_CTR UPDATE_TS DEL_FLAG
1 Insert row 1 1/25/2017 14:50:48.629563 0 ? Y
2 Update row 2 1/25/2017 14:50:48.635908 2 1/25/2017 14:51:20.975013 N
3 Insert row 3 1/25/2017 14:50:48.637100 0 ? N
4 Insert row 4 1/25/2017 14:50:48.638282 0 ? N
5 Insert row 5 1/25/2017 14:58:21.334681 0 ? N

 

The data for MYROWID was overlaid with the incoming source data and the DEL_FLAG was reset.

 

Summary

In this article we presented solutions for the three most common use cases requiring the replication of “soft” deletes to Data Warehouses.

Add Your Comment