Oracle GoldenGate: Understanding Referential and Transactional Data Integrity

Introduction

When working with new Oracle GoldenGate (OGG) customers we always get two questions:

  • How can I ensure the referential integrity of my data?
  • How can I ensure transactions are applied to my target database in the proper order?

The key is to understand the difference between referential and transactional data, and how they relate to OGG.

Main Article

What is Referential Integrity?

Definition: Referential integrity is a relational database concept which states that table relationships must remain consistent.

But what does this mean in practical applications? Simply put, if Table “B” contains a column that is the primary key of Table “A”, this column represents a foreign key relationship between the two tables and the following rules apply:

  • You may not add a record to Table “B” unless there is a corresponding record in Table “A”.
  • Updates or deletes in Table “B” must be reflected in Table “A”.
  • Updates or deletes in Table “A” must be reflected in Table “B”.

Consider a simple banking database, containing the table definitions:

ri_bank

In our banking database, a customer is defined by a row of data in the BANK_CUSTOMER table and uniquely identified by the column CUSTOMER_ID. Since every customer should also maintain an account with our bank, we establish a customer to account relationship via the column ACCOUNT_CUSTOMER_ID of the BANK_ACCOUNTS table. This column is special because we define the column to reference the CUSTOMER_ID column of the BANK_CUSTOMER table, creating a foreign key relationship between the two tables.

By applying the referential integrity standard, a BANK_CUSTOMER.CUSTOMER_ID record cannot be modified without changing the corresponding BANK_ACCOUNTS.ACCOUNT_CUSTOMER_ID record. For example, if Jane Doe’s customer ID is changed in the BANK_CUSTOMER table, this change must also be applied to the BANK_ACCOUNTS.ACCOUNT_CUSTOMER_ID record to maintain the link between her personal information and her bank accounts.

In some cases, Database Administrators have implied referential relationships where no physical definition exists. For example, the BANK_TRANSACTION_LOG table contains the column TRANSACTION_ACCOUNT_NUMBER, which could be construed as having a foreign key relationship to the BANK_ACCOUNT.ACCOUNT_NUMBER column. Querying the database for the BANK_TRANSACTION_LOG table definition would show that no foreign key relationship exists between the two tables; therefore, the tables have no referential relationship.

Referential Integrity and OGG

The rule for maintaining referential integrity when implementing OGG is:

All tables with foreign key relationships must be processed by the same Change Data Capture (Extract), Extract Data Pump, and Change Data Apply (Replicat) process.

Following this rule, the OGG configuration for our simple banking database would be similar to this:

ri_ogg

Because our Extract and Extract Data Pump are processing records for all three tables, referential integrity is maintained. On the Replicat side, REPA is applying data records to the target BANK_CUSTOMER and BANK_ACCOUNTS tables; thus maintaining referential integrity. Because there is no referential relationship to the BANK_TRANSACTION_LOG table, to improve overall OGG performance data for that table is being processed by a single Replicat, REPB.

What is Transactional Integrity?

Definition: The degree to which a transaction flowing through a network reaches its intended destination without impairment of its function, content, or meaning.

For data integration purposes this means that for any logical unit of work performed at the source database, the data will be captured and applied in order to the target database, and the two databases will be consistent upon completion of the unit of work.

Transactional Integrity and OGG

Oracle GoldenGate works with commit sequenced data. When a unit of work is started on the database, the Extract process reads the transaction log storing the retrieved data in a local or extended memory space, depending upon the size of the unit of work. This data remains in the process memory space until either a rollback is initiated or the unit of work completes via a database COMMIT.

When the COMMIT is recorded by the database, the Extract process flushes the queued transaction information to a designated disk storage location for further processing, transmission, and apply to the target. Therefore all data processed by OGG is in commit sequence order.

Using our simple banking database example from above, lets look at the transaction flow for the following scenario:

  1. TX1 – Jane opens a new account, depositing $25,000.
  2. TX2 – Jane purchases $185 at the market.
  3. TX3 – Jane purchases $500 in painting supplies.
  4. TX4 – Jane makes a $300 ATM withdrawal to celebrate her windfall.

Lets further complicate things by having the market’s POS system go down before Jane’s $185 purchase is committed. The data flow will look like this:

 

data_capture

As we can see, the database has three committed transactions which were captured and recorded by the Extract process. The remaining, uncommitted transaction remains in the Extract process memory buffer until either a database COMMIT or ROLLBACK occurs. This functionality maintains transactional data integrity and the end result is consistency between the source and target database.

Add Your Comment