Oracle GoldenGate: Integrated Heartbeat

December 18, 2017 | 10 minute read
Text Size 100%:

Introduction

This article is applicable to Oracle GoldenGate Classic installations, version 19c and prior.

Oracle GoldenGate (OGG) version 12.2 introduced an automatic heartbeat integrated into the application. In this article we shall demonstrate how to enable the OGG heartbeat in an Oracle to Oracle replication environment, demonstrate the additional information made available via heartbeat, and demonstrate how to manage and modify heartbeat across the replication environment.

The concepts, scripts and information presented in this article are for educational purposes only. They are not supported by Oracle Development or Support, and come with no guarantee or warrant for functionality in any environment other than the test system used to prepare this article. Before applying any changes presented in this article to your environment, you should thoroughly test to assess functionality and performance implications.

Main Article

The figure below depicts my very simple OGG replication environment used for this discussion.

Sample OGG Environment

Before we can enable OGG heartbeat we need to identify a database schema for the associated tables and views. In my Oracle target database I already have a database user, ggadmin, configured which is used for data apply. However, on my source Oracle database I only have a container user, c##ggadmin, which is used for data capture from a pluggable database. To keep things simple and consistent, I created the database user ggadmin in my source pluggable database, giving the user sufficient privileges to create tables, views, and scheduler jobs.

Create The Heartbeat Tables

Now that the source and target database schemas are defined, we can create the heartbeat tables, views, and scheduler jobs.

First, we must set an Oracle GoldenGate GLOBALS parameter that identifies the schema where the heartbeat components will be created. In the source and target OracleGoldenGate installation directory, edit the file GLOBALS and specify the parameter ggschema.

[oracle@centos0ra12 goldengate]$ vi GLOBALS

ggschema ggadmin
~

Save and close the file.

If you have not used Oracle GoldenGate GLOBALS before, this is a special parameter file that defines instance wide runtime options. The file name must be uppercase and its contents are activated in the runtime environment each time GGSCI is started.

The ggschema parameter is used to specify the name of the schema that contains the database objects owned by Oracle GoldenGate.

To create the heartbeat objects in the source database, start GGSCI, execute the dblogin command to establish a database connection to the source Pluggable Database, and then execute the add heartbeattable command.

[oracle@centos0ra12 goldengate $ ./ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 12.2.0.1.1 OGGCORE_12.2.0.1.0_PLATFORMS_151211.1401_FBO
Linux, x64, 64bit (optimized), Oracle 12c on Dec 12 2015 02:56:48
Operating system character set identified as UTF-8.
Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.
GGSCI (centos0ra12) 1> dblogin userid ggadmin@pdborcl, password Oracle1
Successfully logged into database PDBORCL.
GGSCI (centos0ra12 as ggadmin@orcl/PDBORCL) 2> add heartbeattable
2016-08-04 11:10:48  INFO    OGG-14001  Successfully created heartbeat seed table ["GG_HEARTBEAT_SEED"].
2016-08-04 11:10:48  INFO    OGG-14032  Successfully added supplemental logging for heartbeat seed table ["GG_HEARTBEAT_SEED"].
2016-08-04 11:10:48  INFO    OGG-14000  Successfully created heartbeat table ["GG_HEARTBEAT"].
2016-08-04 11:10:48  INFO    OGG-14033  Successfully added supplemental logging for heartbeat table ["GG_HEARTBEAT"].
2016-08-04 11:10:48  INFO    OGG-14016  Successfully created heartbeat history table ["GG_HEARTBEAT_HISTORY"].
2016-08-04 11:10:48  INFO    OGG-14023  Successfully created heartbeat lag view ["GG_LAG"].
2016-08-04 11:10:48  INFO    OGG-14024  Successfully created heartbeat lag history view ["GG_LAG_HISTORY"].
2016-08-04 11:10:48  INFO    OGG-14003  Successfully populated heartbeat seed table with [ORCL].
2016-08-04 11:10:49  INFO    OGG-14004  Successfully created procedure ["GG_UPDATE_HB_TAB"] to update the heartbeat tables.
2016-08-04 11:10:49  INFO    OGG-14017  Successfully created procedure ["GG_PURGE_HB_TAB"] to purge the heartbeat history table.
2016-08-04 11:10:49  INFO    OGG-14005  Successfully created scheduler job ["GG_UPDATE_HEARTBEATS"] to update the heartbeat tables.
2016-08-04 11:10:49  INFO    OGG-14018  Successfully created scheduler job ["GG_PURGE_HEARTBEATS"] to purge the heartbeat history table.
GGSCI (centos0ra12 as ggadmin@orcl/PDBORCL) 3>

In the above command execution we took the defaults of the add heartbeattable command. We could have overridden the defaults by adding command line options:

ADD HEARTBEATTABLE [, FREQUENCY number in seconds] [, RETENTION_TIME number in days] | [, PURGE_FREQUENCY number in days]

FREQUENCY
Specifies how often the heartbeat seed table and heartbeat table are updated. For example, how frequently heartbeat records are generated. The default is 60 seconds.

RETENTION_TIME
Specifies when heartbeat entries older than the retention time in the history table are purged. The default is 30 days.

PURGE_FREQUENCY
Specifies how often the purge scheduler is run to delete table entries that are older than the retention time from the heartbeat history . The default is 1 day.

To create the heartbeat objects in the target database, execute the same GGSCI commands:

GGSCI (centos0ra12) 6> dblogin userid ggadmin@pdbnoreast, password Oracle1
Successfully logged into database PDBNOREAST.
GGSCI (centos0ra12 as ggadmin@orcl/PDBNOREAST) 7> add heartbeattable
2016-08-04 12:37:53  INFO    OGG-14001  Successfully created heartbeat seed table ["GG_HEARTBEAT_SEED"].
2016-08-04 12:37:53  INFO    OGG-14032  Successfully added supplemental logging for heartbeat seed table ["GG_HEARTBEAT_SEED"].
2016-08-04 12:37:53  INFO    OGG-14000  Successfully created heartbeat table ["GG_HEARTBEAT"].
2016-08-04 12:37:53  INFO    OGG-14033  Successfully added supplemental logging for heartbeat table ["GG_HEARTBEAT"].
2016-08-04 12:37:53  INFO    OGG-14016  Successfully created heartbeat history table ["GG_HEARTBEAT_HISTORY"].
2016-08-04 12:37:54  INFO    OGG-14023  Successfully created heartbeat lag view ["GG_LAG"].
2016-08-04 12:37:54  INFO    OGG-14024  Successfully created heartbeat lag history view ["GG_LAG_HISTORY"].
2016-08-04 12:37:54  INFO    OGG-14003  Successfully populated heartbeat seed table with [ORCL].
2016-08-04 12:37:54  INFO    OGG-14004  Successfully created procedure ["GG_UPDATE_HB_TAB"] to update the heartbeat tables.
2016-08-04 12:37:54  INFO    OGG-14017  Successfully created procedure ["GG_PURGE_HB_TAB"] to purge the heartbeat history table.
2016-08-04 12:37:54  INFO    OGG-14005  Successfully created scheduler job ["GG_UPDATE_HEARTBEATS"] to update the heartbeat tables.
2016-08-04 12:37:54  INFO    OGG-14018  Successfully created scheduler job ["GG_PURGE_HEARTBEATS"] to purge the heartbeat history table.
GGSCI (centos0ra12 as ggadmin@orcl/PDBNOREAST) 8>

If any Oracle GoldenGate Groups were running when you executed the add heartbeattable command, they must be restarted. Heartbeat is now active and the heartbeat data is automatically captured and replicated; you do not need to make any modifications to the OGG parameter files.

Since we took the defaults, every 60 seconds an update is performed against the gg_heartbeat_seed table. This record is captured and replicated to the target tables gg_heartbeat_seed, gg_heartbeat, and gg_heartbeat_history. The target table gg_heartbeat contains the data from the last heartbeat record processed by Replicat while the table gg_heartbeat_history contains all hearbeat records.

The source side gg_heartbeat_seed table records the local (source) database name and the timestamp of the last heartbeat.

LOCAL_DATABASE HEARTBEAT_TIMESTAMP REMOTE_DATABASE INCOMING_EXTRACT
ORCL 04-AUG-16 04.58.52.111031000 PM
INCOMING_ROUTING_PATH INCOMING_REPLICAT INCOMING_HEARTBEAT_TS INCOMING_EXTRACT_TS
INCOMING_ROUTING_TS INCOMING_REPLICAT_TS OUTGOING_EXTRACT OUTGOING_ROUTING_PATH
OUTGOING_REPLICAT OUTGOING_HEARTBEAT_TS OUTGOING_EXTRACT_TS OUTGOING_ROUTING_TS
OUTGOING_REPLICAT_TS

 

The target side gg_heartbeat table contains the last heartbeat record, with additional information added by every Oracle GoldenGate Group in the replication stream.

LOCAL_DATABASE HEARTBEAT_TIMESTAMP REMOTE_DATABASE INCOMING_EXTRACT
ORCL 04-AUG-16 05.34.52.379802000 PM ORCL ETPC
       
INCOMING_ROUTING_PATH INCOMING_REPLICAT INCOMING_HEARTBEAT_TS INCOMING_EXTRACT_TS
PTPC RTPC 04-AUG-16 05.34.52.379802000 PM 04-AUG-16 05.34.53.561655000 PM
       
INCOMING_ROUTING_TS INCOMING_REPLICAT_TS OUTGOING_EXTRACT OUTGOING_ROUTING_PATH
04-AUG-16 05.34.54.961135000 PM 04-AUG-16 05.34.57.632114000 PM    
       
OUTGOING_REPLICAT OUTGOING_HEARTBEAT_TS OUTGOING_EXTRACT_TS OUTGOING_ROUTING_TS
       
       
OUTGOING_REPLICAT_TS      
       

 

The target side gg_heartbeat_history table contains all retained heartbeat records.

HEARTBEAT_RECEIVED_TS REMOTE_DATABASE INCOMING_EXTRACT INCOMING_REPLICAT
04-AUG-16 04.48.40.927614000 PM ORCL ETPC RTPC
04-AUG-16 04.48.40.931696000 PM ORCL ETPC RTPC
04-AUG-16 04.48.40.932235000 PM ORCL ETPC RTPC
04-AUG-16 04.48.40.933346000 PM ORCL ETPC RTPC
04-AUG-16 04.48.40.934062000 PM ORCL ETPC RTPC
04-AUG-16 04.49.09.943003000 PM ORCL ETPC RTPC
04-AUG-16 04.50.10.073445000 PM ORCL ETPC RTPC
04-AUG-16 04.51.10.209285000 PM ORCL ETPC RTPC
04-AUG-16 04.52.09.107407000 PM ORCL ETPC RTPC
       
INCOMING_HEARTBEAT_TS INCOMING_EXTRACT_TS INCOMING_REPLICAT_TS OUTGOING_EXTRACT
04-AUG-16 04.43.52.075803000 PM 04-AUG-16 04.43.56.971288000 PM 04-AUG-16 04.48.38.295487000 PM  
04-AUG-16 04.44.52.284489000 PM 04-AUG-16 04.44.54.094577000 PM 04-AUG-16 04.48.38.295487000 PM  
04-AUG-16 04.45.52.094065000 PM 04-AUG-16 04.45.53.226088000 PM 04-AUG-16 04.48.38.295487000 PM  
04-AUG-16 04.46.52.105854000 PM 04-AUG-16 04.46.53.427845000 PM 04-AUG-16 04.48.38.295487000 PM  
04-AUG-16 04.47.52.097313000 PM 04-AUG-16 04.47.53.552932000 PM 04-AUG-16 04.48.38.295487000 PM  
04-AUG-16 04.48.52.072194000 PM 04-AUG-16 04.48.53.682091000 PM 04-AUG-16 04.48.57.412599000 PM  
04-AUG-16 04.49.52.265018000 PM 04-AUG-16 04.49.53.807273000 PM 04-AUG-16 04.49.57.688499000 PM  
04-AUG-16 04.50.52.130920000 PM 04-AUG-16 04.50.53.953876000 PM 04-AUG-16 04.50.57.904505000 PM  
04-AUG-16 04.51.52.097373000 PM 04-AUG-16 04.51.53.050948000 PM 04-AUG-16 04.51.57.085295000 PM  
       
OUTGOING_ROUTING_PATH OUTGOING_REPLICAT OUTGOING_HEARTBEAT_TS OUTGOING_EXTRACT_TS
       
       
OUTGOING_ROUTING_TS OUTGOING_REPLICAT_TS    
       

 

On the target, the views gg_lag and gg_lag_history  retain information relevant to lag statistics. These views are queried whenever the GGSCI lag command is executed.

Contents of the gg_lag view will be similar to this:

LOCAL_DATABASE CURRENT_LOCAL_TS REMOTE_DATABASE INCOMING_HEARTBEAT_AGE
ORCL 04-AUG-16 05.54.02.747524000 PM ORCL 70.568669
       
INCOMING_PATH INCOMING_LAG OUTGOING_HEARTBEAT_AGE OUTGOING_PATH
 ETPC ==> PTPC ==> RTPC 4.896061    
       
OUTGOING_LAG      
       

Contents of the gg_lag_history table will be similar to this

LOCAL_DATABASE HEARTBEAT_RECEIVED_TS REMOTE_DATABASE INCOMING_HEARTBEAT_AGE
ORCL 04-AUG-16 04.48.40.927614000 PM ORCL 288.851811
ORCL 04-AUG-16 04.48.40.931696000 PM ORCL 228.647207
ORCL 04-AUG-16 04.48.40.932235000 PM ORCL 168.83817
ORCL 04-AUG-16 04.48.40.933346000 PM ORCL 108.827492
ORCL 04-AUG-16 04.48.40.934062000 PM ORCL 48.836749
ORCL 04-AUG-16 04.49.09.943003000 PM ORCL 17.870809
ORCL 04-AUG-16 04.50.10.073445000 PM ORCL 17.808427
ORCL 04-AUG-16 04.51.10.209285000 PM ORCL 18.078365
ORCL 04-AUG-16 04.52.09.107407000 PM ORCL 17.010034
INCOMING_PATH INCOMING_LAG OUTGOING_HEARTBEAT_AGE OUTGOING_PATH
 ETPC ==> PTPC ==> RTPC 286.219684
 ETPC ==> PTPC ==> RTPC 226.010998
 ETPC ==> PTPC ==> RTPC 166.201422
 ETPC ==> PTPC ==> RTPC 106.189633
 ETPC ==> PTPC ==> RTPC 46.198174
 ETPC ==> PTPC ==> RTPC 5.340405
 ETPC ==> PTPC ==> RTPC 5.423481
 ETPC ==> PTPC ==> RTPC 5.773585
 ETPC ==> PTPC ==> RTPC 4.987922
OUTGOING_LAG

Obtaining Lag Information

The GGSCI lag command is used to obtain lag statistics for running Oracle GoldenGate Groups. On the target server, executing this command for the Replicat returns the following information:

GGSCI (centos0ra12) 2> lag rtpc
2016-08-04 14:03:57  INFO    OGG-14054  Lag from heartbeat table requires DBLOGIN.
Sending GETLAG request to REPLICAT RTPC ...
Last record lag 4 seconds.
Low watermark lag: 73.
High watermark lag: 14.
Low watermark position: 3858356.
High watermark position: 3858411.
At EOF, no more records to process.

If I dblogin to the database, I am able to obtain additional lag information from the lag views:

GGSCI (centos0ra12) 5> dblogin userid ggadmin@pdbnoreast, password Oracle1
Successfully logged into database PDBNOREAST.
GGSCI (centos0ra12 as ggadmin@orcl/PDBNOREAST) 6> lag rtpc
Lag Information From Heartbeat Table
LAG                 AGE                 FROM       TO         PATH
5.77s               10m 22.87s          ORCL       ORCL       ETPC ==> PTPC ==> RTPC
Sending GETLAG request to REPLICAT RTPC ...
Last record lag 561 seconds.
Low watermark lag: 76.
High watermark lag: 16.
Low watermark position: 3859364.
High watermark position: 3859418.

Now we are able to see details about lag for the entire replication path. Overall lag as determined by the last record Replicat read from the trail (current process timestamp minus record commit timestamp) is 561 seconds. The lag information from the heartbeat table is showing that this Replicat has not processed a heartbeat record for over 10 minutes and the last heartbeat record processed incurred 5.77 seconds of lag across the replication stream. Details about the replication path, listing each Oracle GoldenGate Group that processed the heartbeat record, are also shown.

You may notice four additional lines of data in the output: Low watermark lag, High watermark lag, Low watermark position, and High watermark position. The watermark position data provides SCN information for data applied by the Replicat at the time the lag command was executed. In the above example, the low watermark position for the set of transactions processed by Replicat was SCN 3859364. Basically this means that all transactions for SCNs less than this number have already been applied to the database. While transactions greater than SCN 3859418, the high watermark position have yet to be applied. Consequently, the lag encountered at the transaction at the low watermark position, SCN 3859364, was 76 seconds (Low watermark lag) and the lag encountered at the high watermark position, SCN 385418, was 16 seconds. So, we can determine that lag is reducing as the Replicat processes its backlogged data.

Troubleshooting Lag

If you notice above, the last record lag reported by my Replicat was 561 seconds. That's great to know, but where in the replication stream did the lag come from? For that, we can query the gg_heartbeat_history table; however there are a couple of important things to note first:

1. The timestamps recorded are in UTC format.

2. Lag computations require that the source and target server clocks be setup correctly; i.e., regularly synced with a network time service. A negative lag means your target server clock is ahead of the source server; while, and inordinately high lag could mean the source is not synced with the target.

To compute lag at each point in the replication stream, we want to get information for these columns in the gg_heartbeat_history table:

(a) incoming_heartbeat_ts – source side heartbeat timestamp
(b) incoming_extract_ts – timestamp when extract processed the heartbeat record
(c) incoming_routing_ts – timestamp when data pump read the heartbeat record from the extract trail
(d) incoming_replicat_ts – timestamp when replicat read the heartbeat record from the remote trail
(e) heartbeat_received_ts   - timestamp when replicat applied the heartbeat record to the target

With this information, we can now compute the following:
(a) heartbeat_received_ts - incoming_heartbeat_ts = total end-to-end lag
(b) incoming_extract_ts -  incomning_heartbeat_ts = cdc extract lag
(b) incoming_routing_ts – incoming_extract_ts = data pump read lag
(c) incoming_replicat_ts – incoming_routing_ts = replicat read lag
(d) heartbeat_received_ts – incoming_replicat_ts = replicat apply lag

Then create a database query to return lag across the replication stream in seconds:

set hea on set wrap off set lin 300 column Extract format a9 column Data_Pump format a10 column Replicat format a9 select to_char(incoming_heartbeat_ts,'DD-MON-YY HH24:MI:SSxFF') Source_HB_Ts        , incoming_extract Extract        , extract (day from (incoming_extract_ts - incoming_heartbeat_ts))*24*60*60+          extract (hour from (incoming_extract_ts - incoming_heartbeat_ts))*60*60+          extract (minute from (incoming_extract_ts - incoming_heartbeat_ts))*60+          extract (second from (incoming_extract_ts - incoming_heartbeat_ts)) Extract_Lag        , incoming_routing_path Data_Pump        , extract (day from (incoming_routing_ts - incoming_extract_ts))*24*60*60+          extract (hour from (incoming_routing_ts - incoming_extract_ts))*60*60+          extract (minute from (incoming_routing_ts - incoming_extract_ts))*60+          extract (second from (incoming_routing_ts - incoming_extract_ts)) Data_Pump_Read_Lag        , incoming_replicat Replicat        , extract (day from (incoming_replicat_ts - incoming_routing_ts))*24*60*60+          extract (hour from (incoming_replicat_ts - incoming_routing_ts))*60*60+          extract (minute from (incoming_replicat_ts - incoming_routing_ts))*60+          extract (second from (incoming_replicat_ts - incoming_routing_ts)) Replicat_Read_Lag        , extract (day from (heartbeat_received_ts - incoming_replicat_ts))*24*60*60+          extract (hour from (heartbeat_received_ts - incoming_replicat_ts))*60*60+          extract (minute from (heartbeat_received_ts - incoming_replicat_ts))*60+          extract (second from (heartbeat_received_ts - incoming_replicat_ts)) Replicat_Apply_Lag        , extract (day from (heartbeat_received_ts - incoming_heartbeat_ts))*24*60*60+          extract (hour from (heartbeat_received_ts - incoming_heartbeat_ts))*60*60+          extract (minute from (heartbeat_received_ts - incoming_heartbeat_ts))*60+          extract (second from (heartbeat_received_ts - incoming_heartbeat_ts)) Total_Lag  from ggadmin.gg_heartbeat_history order by heartbeat_received_ts desc;

Which returned the following in my test environment:

lagout

This now allows me to see the overall health of my entire replication stream. For example, in the data I see a total lag of 250.241 seconds was recorded at 15:28 on 8 Dec 2017. If this lag exceeds the service level agreement requirements for this environment, I can then do some further analysis to determine why the Replicat had issues due to the 244 second read lag recorded for that process.

Summary

In this article we introduced you to the internal Oracle GoldenGate heartbeat, detailed how to enable heartbeat across the replication environment, and provided information about the tables and views that store heartbeat statistics and how to obtain that information via the GGSCI lag command and direct database queries.

For more information on what other articles are available for Oracle GoldenGate please view our index page.

Loren Penton


Previous Post

Part 4 of 4 - SSSD Authentication: Known Problems and Troubleshooting Tips

Tim Melander | 9 min read

Next Post


Using Business Events in Fusion-based SaaS with Oracle Integration Cloud Service - Part 1: Overview

Stefan Koser | 3 min read