Best Practices from Oracle Development's A‑Team

Oracle GoldenGate Best Practices: HP Nonstop Audited Data Replication

Introduction

Oracle GoldenGate for HP Nonstop provides capabilities for the manipulation of transactional data and the replication of selected TMF Audited Enscribe, SQL/MP, SQL/MX, and Non-Audited Enscribe data to a variety of applications and platforms. In this article we shall present a series of "best practices" for the configuration and operation of Oracle GoldenGate when replicating HP Nonstop TMF Audited data.

Main Article

Properly configuring your Oracle GoldenGate (OGG) instance on HP Nonstop begins with planning the implementation. Some basic things to consider include:

Disk Capacity

On the source server, OGG Extract reads TMF Audit Trails to capture transactional data. However, TMF has no knowledge of outside processes that depend upon its Audit Trails and purges the ones it no longer requires. There are several methods for dealing with this, which we shall detail later; for now, when determining disk capacity, compute the TMF requirements needed to ensure a certain number of audit files are always available, either in the default TMF Trail location or backed up to an alternative subvolume.

On the source and target servers, OGG should be installed on disks with sufficient capacity to hold the OGG programs, Trail files, parameter files, logs, and report files. You can always spread this data across multiple disks; however, this adds complexity to the management of your OGG instance. Within the OGG instance, the largest consumers of disk resources are the OGG Trail files. The size of these files is user defined and may be up to 2 GB. New Trail files are created as old ones fill, and the old ones are purged at regular intervals after they are fully consumed by the Extract Data Pump and Replicat processes.

The amount of data written to the OGG Trails depends upon how much data OGG is configured to capture from TMF. In a typical production implementation, OGG will capture approximately 30% of the data TMF writes to an Audit Trail. Using the default of 100 MB for TMF Audit Trails, this means the minimum size each OGG Trail should be is 30 MB. However, the most expensive operation that may be performed on a NonStop Server is file creation, so we don't want OGG to create a new Trail file every time TMF Audit Trails roll.

Since OGG Trails can be up to 2 GB in size, we want to put as much captured TMF data as possible into a single OGG Trail. This is dependent upon the number of TMF Audit Trail file rollovers per hour. For example, if a server is rolling fifteen 100 MB Audit Trails per hour, then OGG will capture and write 450 MB of data to its Trails per hour. Using this example, OGG Trails of 1 GB may be configured to hold 2 hours of captured TMF data, while OGG Trails of 2 GB will contain just over 4 hours of data.

For this discussion the optimal size is 2 GB OGG Trails, as that provides the best efficiency, rolling OGG Trails 6 times daily. However, we should note that 2 GB Trails actually allocate 1.9 GB disk files, and in some cases the trails may roll at 80-85% of capacity, depending upon the transaction size. We should also note that each OGG Trail contains a 1.2 KB header, that a 29 byte marker is written for each heartbeat record, and that a 54 byte header record is added to each transactional data record. Because of this overhead, we will increase the number of OGG Trail rolls per day to 7.

Now that we have determined the optimal sizing of the OGG Trails, we can compute how much disk capacity is needed to house them. This depends upon requirements for riding out unscheduled network or target server outages. On the source OGG instance server, plan for sufficient disk capacity to hold OGG Trails should the target server or network become inaccessible for a period of time. Likewise, on the target OGG instance we need to make the same computation in the event of a prolonged target database outage. We typically use 4 days for this computation; although we have seen some customers who require 7 days retention, and some that require no retention at all. Using the 4 day figure, the minimum disk capacity required for retaining OGG Trails on the source and target servers is 58 GB. This figure was determined by multiplying the number of anticipated OGG Trail rolls per day by the size of each OGG Trail, and then multiplying that by the number of days we need to retain OGG Trails; or, ((7 * 2048000000) * 4). The actual result is 57.344 GB, so we rounded up.

Network Capacity

When doing NonStop to NonStop replication, Expand is the preferred method for transferring OGG data to the target server. However, you must take into account any Expand link latency and resource utilization. If the latency of the Expand network is 5 seconds or more, and the requirement for OGG is 5 seconds or less end-to-end latency, then TCP/IP should be used for OGG connections. Likewise, if the Expand network has limited bandwidth available for OGG messaging traffic, use TCP/IP instead.

For TCP/IP communications, OGG defaults to the $ZTC0 process. Verify that at least 50% of the available bandwidth on this process is free; if not, assign OGG to another TCP/IP process, as sketched below.
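For example, one common way to point OGG at an alternate TCP/IP process is the TCPIPPROCESSNAME option of the RMTHOST parameter; a minimal sketch, assuming $ZTC1 is the less loaded process (verify the option against your Oracle GoldenGate for HP NonStop reference):

--
-- Route this Extract Data Pump's traffic through $ZTC1
-- instead of the default $ZTC0
--
rmthost [target server], mgrport [port], tcpipprocessname $ztc1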

Define Nonstop Process Options

When creating the OGG instance, assign each component a primary and backup CPU, and set the process priority. When assigning CPUs, spread the workload so as not to overload any single CPU with too many OGG processes.

For the OGG Manager process, a primary CPU and process priority is assigned when the process is started via GGSCI, like this:

GGSCI (\ZEUS) 58> start mgr, cpu 2, pri 160
Manager process $LPMGR started.

For OGG Manager, a backup CPU is assigned via the BACKUPCPU parameter, like this:

GGSCI (\ZEUS) 73> view param mgrparm
Parameter file contents...
port 18900
backupcpu 3

For Extract, Extract Data Pump, and Replicat processes, CPU and priority assignments are made when the group is added to the OGG instance:

GGSCI (\ZEUS) 148> add extract elpsmp, begin now, cpu 1, backupcpu 2, pri 170
EXTRACT ELPSMP added with following attributes:
Program:  \ZEUS.$DATA09.LPGGS.EXTRACT
Param File:  \ZEUS.$DATA09.LPPARM.ELPSMP
Report File:  \ZEUS.$DATA09.LPRPT.ELPSMP
Process Name:  $LPE01
CPU:   1, 2
Priority:  170

For the Extract reading TMF Audit Trails, assign CPUs for the audit reading processes. This is done via the AUDSERVCPUS parameter:

GGSCI (\ZEUS) 151> view param elpsmp
Parameter file contents...
extract elpsmp
audservcpus 2, 3, 0

In the above example, when the Extract starts, the reader process for the Master Audit Trail will be started in CPU 2, the reader for AUX01 in CPU 3, and the reader for AUX02 in CPU 0. The default is to start all audit readers in the same CPU as the Extract. Assigning readers to different CPUs parallelizes the workload and improves performance.

Process Priorities

Setting proper process priorities is key to ensuring the OGG instance performs properly.

The Extract reading TMF should run at a priority less than the production application, but also at a priority higher than user TACL processes. This ensures Extract has sufficient resources to read TMF without impacting the production application and without being impacted by user queries and commands. For servers where the application is running at priority 175 and above, the TMF Extract should run at priority 170.

Extract Data Pumps should run at a priority less than the TMF Extract and higher than user TACL processes. For instances where the TMF Extract runs at priority 170, Extract Data Pump is set with a priority of 160.

At a minimum, Replicats should run at the same priority as the TMF Extract. Because of the complex work performed by Replicat in recreating and executing source database operations, this process needs sufficient system resources to perform its work as quickly as possible. For instances where the TMF Extract runs at a priority of 170, the target NSK Replicat process should be set to run at priority 170, or above.

Add OGG Event Text for Viewpoint Monitoring

To add OGG event text to Viewpoint, run the macro EMSINST in the OGG installation location.

$DATA09 LPGGS 6> run emsinst
This macro installs EMS event message detail text into the
EVENTCX file.  This enables operators to view error cause and recovery
text through the Viewpoint event detail screen.

OGG Configuration Settings

We touched on OGG process settings for running in the Nonstop environment above; let's look at some settings for the OGG instance and each Group within the instance.

Manager

There are several parameters that should be set in the OGG Manager related to housekeeping, process spawning, and error reporting:

1) Assign the OGG Manager process a backup CPU.

As we mentioned earlier, define a backup CPU for the OGG Manager process via the BACKUPCPU [n] parameter.

--
-- Set the OGG Manager to run a backup process in CPU 3
--
backupcpu 3

2) Purge consumed OGG Trails that are no longer required for replication.

Manager performs housekeeping functions every 10 minutes. Include the PURGEOLDEXTRACTS parameter to automatically remove consumed OGG Trails from the server.

--
-- Cleanup fully consumed extract trails every 4 days
--
purgeoldextracts $data09.lpdat.*, usecheckpoints, minkeepdays 4

3) Report on latency (lag) that exceeds the published SLA for replication.

Set the LAGREPORT, LAGINFO, and LAGCRITICAL parameters to receive EMS messages should lag be incurred that is not within SLA requirements.

--
-- Check for lag every 15 minutes
--
lagreportminutes 15
--
-- Generate an informational message if lag is 15 minutes or more
--
laginfominutes 15
--
-- Generate a critical error message if lag is 30 minutes or more
--
lagcriticalminutes 30

4) Attempt to automatically restart Extract and Replicat processes in the event of a failure.

--
-- Attempt to autorestart any failing extracts and replicats.
-- Make 10 attempts delaying 5 minutes between starts.
--
autorestart er *, retries 10, waitminutes 5, resetminutes 120

Save-abend (ZZSA) files are created whenever an OGG process terminates abnormally. Having processes restart and abend repeatedly will result in numerous ZZSA files being created in the OGG installation location, which could lead to storage space issues. You should regularly check for and purge these files, as shown below.
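As a hedged example, the save-abend files can be listed and removed with FUP from the OGG installation subvolume (the prompt and subvolume are illustrative):

$DATA09 LPGGS 12> fup info zzsa*
$DATA09 LPGGS 13> fup purge zzsa*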

Change Data Capture Extract

The CDC Extract reads TMF Audit Trails, copies all committed transactional data records for tables and files defined within its parameter settings, and stores that captured data in an OGG Trail where it will be acted upon by other OGG processes; typically Extract Data Pumps, but in some instances Replicat.

Some things to consider when defining the CDC Extract:

1) The CDC Extract should only read from the TMF Audit Trails and write to a local OGG Trail.

Earlier we mentioned that TMF has no knowledge of processes that require access to its Trails. We never want to induce a situation where the CDC Extract is down due to an extended network outage, which could result in TMF Audit Trails required for replication being purged from the system. Because of this, the CDC Extract should never be configured to write to OGG Trails over TCP/IP via the RMTHOST and RMTTRAIL parameter options.

The only exception to this rule is for NonStop to NonStop environments using Expand networks. In this case, it is acceptable to use a network filename in the CDC Extract's EXTTRAIL setting. However, if the Expand network is unstable, or if replication data is also being transmitted via TCP/IP, the CDC Extract must write to files on the local system, which are then moved to the various targets via Extract Data Pumps.
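A minimal sketch of a CDC Extract parameter file observing this rule (group, trail, and file names are illustrative; the trail is also linked to the group via the GGSCI ADD EXTTRAIL command):

--
-- CDC Extract: capture from TMF and write to a local OGG Trail only;
-- no RMTHOST/RMTTRAIL here
--
extract elpsmp
exttrail $data09.lpdat.mq
table $data09.lpdb.*;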

2) Limit the number of CDC Extracts reading from TMF Audit Trails.

A single, well-tuned CDC Extract can process a lot of data quickly; so unless the server is large and very busy, there is rarely a need for more than one CDC Extract reading TMF Audit Trails.

If the CDC Extract shows latency capturing audit data, it is acceptable to have more than one running in the OGG instance. If you need to run more than one CDC Extract, be aware that each CDC Extract must read every record in the Audit Trail to determine whether it is something that must be acted upon. For instances with two CDC Extracts, every audit record will be read twice.

Each CDC Extract added to the instance puts additional overhead and stress upon the TMF Audit Trails and disks, which in turn could degrade overall server performance; so disk performance and file queue monitoring must be performed to determine the overall impact of adding additional CDC Extracts.

3) FUP RELOAD causes performance degradation.

The FUP RELOAD command physically reorganizes key-sequenced files and SQL objects while allowing shared read/write access to the file or object. A RELOAD operation improves the access time and disk space consumption for highly active files and objects; it also generates massive amounts of TMF Audit data, which could have an adverse effect upon CDC Extract performance.

Once again, CDC Extract must read every record in the Audit Trail to determine if it is something that must be acted upon. So, RELOAD activity must be considered to ensure the CDC Extract does not incur excessive latency. Some general guidelines include: (1) perform reloads only during periods of low OLTP activity, (2) split reload tasks into small chunks to be executed over a period of several days instead of all at once, and (3) do not perform reload tasks when bulk data loads contain operations replicated by OGG.

4) Ensure Audit Trails are available for the CDC Extract.

Ensure Audit Trails are always available for the CDC Extract. The easiest way to do this is to have sufficient disk space and Audit Trails sized so that they are not purged before the CDC Extract has a chance to process them during periods of peak activity. Alternatively, you can configure TMF to make disk dumps, or set up OGG to make copies of Audit Trails needed by the CDC Extract. For this latter option, sufficient disk space must exist for the OGG Manager to duplicate a specified number of trails.

The GGSCI command ADD ATCONFIG with the ALTLOC and DUPFILES options is used to make copies of Audit Trails which are managed by OGG. In the example below, up to six Master Audit Trail files are copied to the alternate disk location, with the oldest audit file being purged to make way for the newest.

GGSCI (\ZEUS) 162> ADD ATCONFIG MAT, DUPFILES 6, ALTLOC $DATA12.ALTTMF

CDC Extract parameters to set in the instance include:

1) Define CPUs and process prefix names for the OGG Audit Server processes.

In the example below, when the Extract starts, the reader process for the Master Audit Trail will be started in CPU 2, the reader for AUX01 in CPU 3, and the reader for AUX02 in CPU 0. By setting the process prefix, the reader process for the Master Audit Trail will be named $GGX00, the AUX01 reader $GGX01, and the AUX02 reader $GGX02.

--
-- Set the OGG TMF Audit Server process cpu and process prefix
--
audservcpus 2, 3, 0
audservprefix $ggx

2) Set the CDC Extract to perform 28 KB writes into its local Extract Trail.

--
-- output records in large blocks of up to 28K bytes
--
fastio

3) Write performance information into the process report file as operation counts per second and overall throughput.

This provides data needed for performance tuning, with details for determining peak processing times.

--
-- report operation counts per second and throughput
--
reportcount every 10 minutes, rate

4) Generate a detailed report of operations captured per file daily.

This provides data needed for performance tuning, with details for identifying the busiest files.

--
-- generate a detailed report daily
--
report at 00:01

5) Create a new report file daily and age the older report files. (Ten report files are maintained on disk.)

--
-- roll the report files daily
--
reportrollover at 00:01

6) For Expand connected networks, generate a heartbeat record to verify the replication stream is intact.

--
-- write a small record to the trail to make sure data is flowing downstream
--
heartbeat 120

7) Extract only the primary key fields or columns for delete operations.

--
-- extract only the primary key fields or columns for delete operations.
-- By default all columns are captured for deletes. This gives a performance boost.
--
compressdeletes

8) If distributed network transactions are not being replicated, suppress the creation of trail records used to track them.

--
-- suppress the creation of trail records that track
-- distributed network transactions
--
excludeggstransrecs

9) Write lag statistics to the report file.

In the example below the CDC Extract writes the data to the report file every 30 minutes, providing information about the percentage of records captured that were processed in less than 1 minute, less than 15 minutes, less than 30 minutes, and over 30 minutes. Additional information is also reported that is beneficial when troubleshooting CDC Extract performance and latency issues.

--
-- Report lag statistics every 30 minutes
-- Show the percentage of records applied within
-- 1, 15, and 30 minutes
--
lagstats intervalminutes 30, thresholdminutes 1, thresholdminutes 15, thresholdminutes 30, report

Extract Data Pump

Extract Data Pumps must be used whenever replicated data is transmitted over TCP/IP. Parameters to set for each Extract Data Pump include:

1) Set options for the RMTHOST parameter.

Set the COMPRESS option to the RMTHOST parameter to gain efficiencies when transmitting data over TCP/IP. Setting COMPRESS typically results in compression ratios of at least 4:1; which improves Extract Data Pump throughput.

If transmitting data outside of your data center, set the ENCRYPT BLOWFISH option to encrypt the data stream to the remote host.

--
-- Make a connection to the remote server's OGG manager
-- listener port. Compress all outgoing data blocks. Use
-- Blowfish encryption with the key OGGFWOUT from ENCKEYS.
--
rmthost [target server], mgrport [port], compress, encrypt blowfish, keyname OGGFWOUT

2) Set the Extract Data Pump to perform 28 KB reads.

This increases throughput and lowers process overhead when volumes are high.

--
-- change the number of bytes that Extract reads when
-- processing Oracle GoldenGate trails. Extract reads
-- up to 4096 bytes at a time by default. FASTREADS
-- enables larger reads of up to 28K bytes.
--
fastreads

3) Change the output to 28 KB writes to provide performance gains when volumes are high.

--
-- output records in large blocks of up to 28K bytes
--
fastio

4) Do not evaluate the data being passed to the target server, or look up the file and table definitions.

--
-- Do not lookup table definitions or do any evaluation
-- of the data
--
passthru

5) Write performance information into the process report file as operation counts per second and overall throughput.

This provides data needed for performance tuning, with details for determining peak processing times.

--
-- report operation counts per second and throughput
--
reportcount every 10 minutes, rate

6) Generate a detailed report of operations processed per file daily.

This provides data needed for performance tuning, with details for identifying the busiest files.

--
-- generate a detailed report daily
--
report at 00:01

7) Create a new report file daily and age the older report files. (Ten report files are maintained on disk.)

--
-- roll the report files daily
--
reportrollover at 00:01

8) Write lag statistics to the report file.

In the example below the Extract Data Pump writes the data to the report file every 30 minutes, providing information about the percentage of records captured that were processed in less than 1 minute, less than 15 minutes, less than 30 minutes, and over 30 minutes. Additional information is also reported that is beneficial when troubleshooting Extract Data Pump performance and latency issues.

--
-- Report lag statistics every 30 minutes
-- Show the percentage of records applied within
-- 1, 15, and 30 minutes
--
lagstats intervalminutes 30, thresholdminutes 1, thresholdminutes 15, thresholdminutes 30, report

Replicat

Replicat best practice settings depend upon the target Oracle GoldenGate instance. For Nonstop to Nonstop replication, Classic Replicat is used. For NonStop to Oracle, use either Integrated Replicat or Parallel Replicat. For other Oracle GoldenGate target instances, use Coordinated Replicat where supported. We're only going to discuss Replicat settings for Nonstop to Nonstop, as there are too many target database variations to cover here. Parameters to set in the Nonstop Replicat include:

1) If the source and target files or tables have the same structure (column names, data types, column widths), set Replicat to always use the target data structure as the reference for building DML.

--
-- Source and target tables have the same structure
--
assumetargetdefs

2) Perform 28 KB reads.

This increases throughput and lowers process overhead when volumes are high.

--
-- change the number of bytes that Replicat reads when
-- processing Oracle GoldenGate trails. Replicat reads
-- up to 4096 bytes at a time by default. FASTREADS
-- enables larger reads of up to 28K bytes.
--
fastreads

3) Write performance information into the process report file as operation counts per second and overall throughput.

This provides data needed for performance tuning, with details for determining peak processing times.

--
-- report operation counts per second and throughput
--
reportcount every 10 minutes, rate

4) Write lag statistics to the report file.

In the example below the Replicat writes the data to the report file every 30 minutes, providing information about the percentage of records captured that were processed in less than 1 minute, less than 15 minutes, less than 30 minutes, and over 30 minutes. Additional information is also reported that is beneficial when troubleshooting Replicat performance and latency issues.

--
-- Report lag statistics every 30 minutes
-- Show the percentage of records applied within
-- 1, 15, and 30 minutes
--
lagstats intervalminutes 30, thresholdminutes 1, thresholdminutes 15, thresholdminutes 30, report

5) Generate a detailed report of operations applied per file daily.

This provides data needed for performance tuning, with details for identifying the busiest files.

--
-- generate a detailed report daily
--
report at 00:01

6) Create a new report file daily and age the older report files. (Ten report files are maintained on disk.)

--
-- roll the report files daily
--
reportrollover at 00:01

7) Use white space in your parameter files when column mapping to make them readable by humans.

For example, the MAP statement below contains a syntax error. In this current layout it is very difficult to identify the error quickly.

MAP \thor.$data09.lpdb.customers, TARGET $data22.lpdb.customers,
colmap (usedefaults
HEADER_CHANGE_SEQ = @STRCAT(@DATE('YYYYMMDDHHMISSFFFFFF','YYYY-MM-DD:HH:MI:SS.FFFFFF',@GETENV('GGHEADER','COMMITTIMESTAMP')),@STRNUM(@GETENV('GGHEADER','LOGPOSITION'),RIGHTZERO,25),@STRNUM(@GETENV('GGHEADER','LOGRBA'),RIGHTZERO,10),@STRNUM(@GETENV('RECORD','FILERBA'),RIGHTZERO,10)
,HEADER_CHANGE_OPER = @CASE(@GETENV('GGHEADER','OPTYPE'),'INSERT','I','DELETE','D',@CASE(@GETENV('GGHEADER', 'BEFOREAFTERINDICATOR'),'BEFORE','B','U'))
,HEADER_COMMIT_TS = @GETENV('GGHEADER','COMMITTIMESTAMP')
,HEADER_UNIQUE_SEQ = @STRCAT(@STRNUM(@GETENV('GGHEADER','LOGPOSITION'),RIGHTZERO,25),@STRNUM(@GETENV('GGHEADER','LOGRBA'),RIGHTZERO,10),@STRNUM(@GETENV('RECORD','FILERBA'),RIGHTZERO,10)));

If the statement had been formatted to make it human readable, we could have quickly determined that the closing parenthesis was missing in the column map statement for HEADER_CHANGE_SEQ, as the reformatted version below demonstrates.
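For comparison, here is the same statement laid out with white space, with the missing closing parenthesis restored at the end of the HEADER_CHANGE_SEQ expression:

MAP \thor.$data09.lpdb.customers, TARGET $data22.lpdb.customers,
colmap (usedefaults
,HEADER_CHANGE_SEQ = @STRCAT(
    @DATE('YYYYMMDDHHMISSFFFFFF','YYYY-MM-DD:HH:MI:SS.FFFFFF',
          @GETENV('GGHEADER','COMMITTIMESTAMP')),
    @STRNUM(@GETENV('GGHEADER','LOGPOSITION'),RIGHTZERO,25),
    @STRNUM(@GETENV('GGHEADER','LOGRBA'),RIGHTZERO,10),
    @STRNUM(@GETENV('RECORD','FILERBA'),RIGHTZERO,10))
,HEADER_CHANGE_OPER = @CASE(@GETENV('GGHEADER','OPTYPE'),
    'INSERT','I','DELETE','D',
    @CASE(@GETENV('GGHEADER','BEFOREAFTERINDICATOR'),'BEFORE','B','U'))
,HEADER_COMMIT_TS = @GETENV('GGHEADER','COMMITTIMESTAMP')
,HEADER_UNIQUE_SEQ = @STRCAT(
    @STRNUM(@GETENV('GGHEADER','LOGPOSITION'),RIGHTZERO,25),
    @STRNUM(@GETENV('GGHEADER','LOGRBA'),RIGHTZERO,10),
    @STRNUM(@GETENV('RECORD','FILERBA'),RIGHTZERO,10)));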

Target Instantiation

How one performs target database instantiation depends upon several factors; chiefly, (1) are you taking an application outage on the source and (2) is the replication method heterogeneous or homogeneous.

If you are taking an application outage, the Nonstop Servers are part of an Expand Network, and the replication method is homogeneous (like to like), use FUP DUP or SQLCI DUP to create and instantiate the target database. However, in most cases replication is heterogeneous; either between dissimilar database vendors (HP to Oracle, or HP to HP with different table structures), or the Oracle GoldenGate implementation is being performed zero-downtime (no application outage). In these situations OGG provides the only viable option for instantiating the target database.

Several methods for instantiating target databases using OGG are described in the Administering Oracle GoldenGate for HP Nonstop documentation; however, the steps below always yield consistent results.

1) Get the number of records/rows and total size of each file or table being replicated.

This will allow us to begin making decisions about the number of OGG Extracts and Replicats needed to perform the instantiation quickly. As a general rule of thumb, multiple Extracts should be used to read large non-partitioned tables and files; while partitioned tables and files will have one Extract per partition.

When I am planning a target instantiation, this is my starting point for determining how many Extracts will be needed to process non-partitioned tables; I then fine tune the configuration after testing:

a) For small files/tables containing less than 1 million rows or records, group them together into a single Extract.

b) For tables/files containing 1 to 10 million rows or records, use a single Extract for every two files/tables.

c) For files/tables containing 10 to 20 million rows or records, use a single Extract per file/table.

d) For files/tables containing more than 20 million rows/records, split the file/table across multiple Extracts, but use no more than 4.

The above depends upon how the database is laid out and upon disk/file queue and read rates. If all of the database files/tables are on the same disk, then we are limited in the number of Extracts we can use to read from the files. Having too many Extracts simultaneously reading from the same disk volume will cause excessive queuing and impact overall performance. In this case, for a zero downtime implementation, we will need to stagger the initial load process and run no more than 4 initial load Extracts simultaneously. When testing your initial load configuration, monitor the disk subsystem performance and then adjust your configuration accordingly to minimize the impact of disk and file queuing.

2) Create the target database.

For replication between Nonstop Enscribe and Nonstop SQL/MP, or Nonstop to Oracle, SQL Server, DB2, or Sybase, Oracle GoldenGate provides a utility named DDLGEN. This utility may be used to create target table definitions for these databases based upon the source Nonstop configuration. The DDLGEN output should be treated as a starting template and modified for the desired target database layout.

3) Create CDC Extract, Extract Data Pump, and Replicat parameter files.

In the Replicats, set two special parameters: HANDLECOLLISIONS and END RUNTIME. These parameters are only to be set once, as part of the initial load; they are used to handle special case data issues that occur as part of the instantiation process and must never remain set in production after the instantiation process completes.
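As a hedged illustration, a change-data Replicat parameter file for the instantiation period might look like this (group and file names are illustrative):

--
-- Resolve the duplicate and missing-row errors caused by the
-- overlap of the initial load and change data capture
--
replicat rlpmp
handlecollisions
assumetargetdefs
map $data09.lpdb.*, target $data22.lpdb.*;
--
-- Stop once all change data queued before startup has been applied
--
end runtime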

4) Create the initial load Extract parameter files.

Use Extract Tasks to perform direct IO reads from the source tables or files. For Nonstop to Nonstop over Expand, have the Extract write to an EXTFILE on the target server. An example is shown below:

-- Initial load extract nsk to nsk
extract isqlmpa
extfile \zeus.$data09.ggiload.ia, megabytes 2000, maxfiles 100, purge
reportcount every 2 minutes, rate
table $data09.lpdb.catdesc;
table $data09.lpdb.CATEGORY;
table $data09.lpdb.CUSTINFO;

For instantiation over TCP/IP, have the Extract write to a RMTFILE on the target server.

-- Initial load extract nsk to ora
extract isqlmpa
rmthost myserver.oracle.com, mgrport 7809, compress
rmtfile ./dirdat/ggiload/ia, megabytes 2000, maxfiles 100, purge
reportcount every 2 minutes, rate
table $data09.lpdb.catdesc;
table $data09.lpdb.CATEGORY;
table $data09.lpdb.CUSTINFO;

For non-partitioned files and tables, use the FILTER @RANGE ([n], [n]) option of the FILE/TABLE statement to split large files/tables across multiple Extracts. In the example below, three Extracts will be used to read all of the data from the table $data09.lpdb.orders; each Extract's parameter file specifies its own range number, and the example shows the first of the three.

-- Initial load extract nsk to ora
-- Split Orders over 3 Extracts
extract isqlmpb
rmthost myserver.oracle.com, mgrport 7809, compress
rmtfile ./dirdat/ggiload/ib, megabytes 2000, maxfiles 100, purge
sourcedefs ./dirdef/nskdefs
reportcount every 2 minutes, rate
table $data09.lpdb.orders, filter @RANGE (1,3);

For partitioned files or tables, use the PARTITIONS option of the FILE/TABLE statement to read from each individual partition. In the example below the initial load Extract will only read from partition 3 of the $data09.lpdb.ordttl table.

-- Initial load extract nsk to nsk
extract isqlmpd
extfile \zeus.$data09.ggiload.id, megabytes 2000, maxfiles 100, purge
reportcount every 2 minutes, rate
table $data09.lpdb.ordttl, partitions (3);

5) Create the initial load Replicat parameter files.

Create parameter files for Replicats to process the OGG Trails created by the initial load Extracts. For Nonstop, these will be Classic Replicats; for Oracle Database, Integrated or Parallel Replicats; and for other databases, Coordinated Replicats where supported, or Classic Replicats.

If you are using Classic Replicat for instantiation of medium to large tables, use the FILTER @RANGE ([n], [n]) option of the MAP statement to split the data load process across multiple Classic Replicat processes, as sketched below. On HP Nonstop servers, be aware of disk queuing issues: do not have more than four Replicats reading from the same OGG Trail without first analyzing the disk subsystem performance.
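As a hedged sketch, one of three Classic Replicats splitting the initial load of a large table might look like this (group and file names are illustrative; the companion groups would use @RANGE (2, 3) and @RANGE (3, 3)):

-- Initial load replicat: range 1 of 3 for the orders table
replicat irlpmp1
assumetargetdefs
reportcount every 2 minutes, rate
map $data09.lpdb.orders, target $data22.lpdb.orders, filter (@RANGE (1, 3));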

If you are using Coordinated Replicat for instantiation of medium and large tables, use the THREADRANGE option of the MAP statement to split the apply workload across multiple Replicat threads.

Integrated and Parallel Replicat have built-in efficiencies, so one Replicat per OGG Trail should be sufficient.

6) Create the GGSCI obey file for adding source and target GoldenGate instance components.

GGSCI provides support for reading and executing "obey" files. These files contain GGSCI commands used to add Extracts, Replicats, and OGG Trails to each instance. A sample obey file for creating my source instance would look like this:

-- start the OGG manager
start mgr
--
-- Add the CDC Extract
add extract esqlmp, begin now, cpu 2, backupcpu 3, pri 170
-- Link the OGG Trail to the Extract
add exttrail $data09.ggsdat.mq, extract esqlmp, megabytes 2000
--
-- Add the Extract Data Pumps
--
-- NonStop to NonStop over Expand
add extract psqlmp, exttrailsource $data09.ggsdat.mq, cpu 3, backupcpu 1, pri 160
add exttrail \THOR.$prog00.ggsdat.ns, extract psqlmp, megabytes 2000
--
-- NonStop to Oracle
--
add extract pora, exttrailsource $data09.ggsdat.mq, cpu 0, backupcpu 1, pri 160
add rmttrail ./dirdat/or, extract pora, megabytes 2000
--
-- Add initial load extracts to nsk
add extract isqlmp, sourceistable, cpu 3, pri 160
add extract isqlmpa, sourceistable, cpu 2, pri 160
add extract isqlmpb, sourceistable, cpu 1, pri 160
add extract isqlmpc, sourceistable, cpu 0, pri 160
add extract isqlmpd, sourceistable, cpu 3, pri 160
add extract isqlmpe, sourceistable, cpu 2, pri 160
add extract isqlmpf, sourceistable, cpu 1, pri 160
--
--
-- Add initial load extracts to oracle
add extract imcat, sourceistable, cpu 0, pri 160
add extract imcust, sourceistable, cpu 1, pri 160
add extract imord1, sourceistable, cpu 2, pri 160
add extract imord2, sourceistable, cpu 3, pri 160
add extract imord3, sourceistable, cpu 0, pri 160
--
-- Check that everything is set up
info all
info extract *, tasks

To run the file, start GGSCI and execute the command:

GGSCI (\ZEUS) 236> obey $data09.lpparm.cr8inst

7) Run DEFGEN.

Replication to databases other than NonStop, or to NonStop databases with a structure different from the source, requires the use of definitions files for OGG to properly convert the source NonStop data to the appropriate target database format. The Oracle GoldenGate DEFGEN utility is used to create these source definitions.

After running DEFGEN, be sure to put a copy of the output file on each target OGG instance.
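A hedged sketch of a DEFGEN parameter file and run follows (file names are illustrative; Enscribe sources additionally require DDL dictionary options, so check the exact statement syntax in the Oracle GoldenGate for HP NonStop reference):

-- DEFGEN parameter file, $data09.lpparm.defgen
defsfile $data09.lpdef.nskdefs, purge
file $data09.lpdb.orders;
file $data09.lpdb.custinfo;

$DATA09 LPGGS 20> run defgen /in $data09.lpparm.defgen/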

8) Test the obey files and configuration.

Always test your work before running in production. Once testing is complete, we are ready for production implementation.

Production Implementation

Preparation and testing will make production implementation smoother. To instantiate and bring the OGG instance live in production:

1) Run DEFGEN and copy the output to each target OGG instance.

2) Run the GGSCI obey files to create the source and target OGG instances.

3) Start the CDC Extract and Extract Data Pumps.

This begins the capture and queuing of OLTP operations from the TMF Audit Trails.

4) Start the initial load Extract processes.

5) Check the status of the source OGG instance.

The CDC Extract, Extract Data Pump, and all initial load Extracts must be in the RUNNING state before proceeding. If not, review the component's report file, correct the error condition, and restart the Extract.

6) Start the initial load Replicats.

Do NOT start the Classic, Coordinated, Integrated, or Parallel Replicats.

7) Check the status of the initial load Replicats.

They must all be in the RUNNING state. If not, review the associated report file, correct the error condition, and restart the Replicat.

8) Wait for the initial load Extracts to go into the STOPPED state.

This means that all data has been read from the source file/table. Review the report file to verify the number of rows/records read from the table and that no error conditions occurred.

If errors are reported, or the Extract state shows ABENDED, the initial load for the associated file/table must be started over from the beginning: stop all initial load Replicats and initial load Extracts associated with the table, delete all data from the target file/table, delete the OGG Trails in the target OGG instance associated with the initial load Extracts, reposition the initial load Replicats to trail sequence 0, rba 0 via the GGSCI ALTER REPLICAT command (see the sketch below), restart the initial load Extracts, and then restart the initial load Replicats.
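A hedged GGSCI sketch of the repositioning step for one such Replicat (the group name is illustrative):

GGSCI (\THOR) 12> stop replicat irlpmp1
GGSCI (\THOR) 13> alter replicat irlpmp1, extseqno 0, extrba 0

Restart the group only after the associated initial load Extracts have been restarted.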

9) Wait for the initial load Replicats to process all queued data.

Once the initial load Extracts have stopped, we need to wait for the initial load Replicats to finish applying all of the queued data. The initial load Replicats will not stop on their own, so we need to check their read position in their associated OGG Trail.

Get the size of the last OGG Trail written by the initial load Extract from the file system. For NonStop servers, the FILEINFO command will return the EOF of the Trail, for example:

$DATA09 LPDIRDAT 14> fileinfo mp000003
$DATA09.LPDIRDAT
CODE              EOF   LAST MODIFIED  OWNER  RWEP   PExt   SExt
MP000003  O   1161            10733 29SEP2017  7:44 100,177 NNNN   2450   2450

On Unix or Linux, the ls -l command returns the same information:

/home/oracle/ogg_12.2_12c/dirdat/iload> ls -l ia000000
-rw-r-----. 1 oracle oinstall 10733 May 25 12:10 ia000000

Run the GGSCI command INFO [REPLICAT NAME] to obtain details about the Replicat's read position in the OGG Trail.

GGSCI (\THOR) 243> info rlpmp
Replicat RLPMP        Last Started 2017-10-02 11:44  Status RUNNING
Process  $LPR00       Checkpoint Lag: 0K, 00:00:00 (updated 00:00:27 ago)
Log Read Checkpoint   \THOR.$DATA09.LPDIRDAT.MP000003
2017-09-29 07:44:36, RBA 10733

Look for the text Log Read Checkpoint; the OGG Trail being processed by the initial load Replicat is located on this line. On the following line, look for the text RBA; this is the relative byte address of the last data the initial load Replicat read from the Trail. If the RBA position matches the file size information returned by the operating system, then the initial load Replicat has completed processing all queued data and can be shut down via the GGSCI command STOP [replicat name].

10) Start Change Data Apply (the Classic, Coordinated, Integrated, or Parallel Replicats).

11) Check the status of the Replicats.

They must all be in the RUNNING state. If not, review the associated report file, correct the error condition, and restart the Replicat.

12) Wait for the Replicats to go into the STOPPED state.

This means the Replicat has processed all data captured and queued by the Extract reading from NonStop TMF from when we started it in Step #3.

13) Remove the parameters HANDLECOLLISIONS and END RUNTIME from the Replicat parameter file.

14) Restart the Replicat.

Once all Replicats have been modified and restarted, target instantiation is complete and OGG will be running in production mode.

Dealing With Latency and Tuning OGG

When working with OGG, we must first understand that latency (lag) is not necessarily a bad thing requiring emergency correction. OGG lag can be the result of several things, such as (1) file/table reload activity, (2) batch loads, (3) database and server capacity issues, or (4) high OLTP transaction volumes. Before jumping into a major redesign of your OGG instance there are some things you must first consider and evaluate:

1) What is the SLA requirement for data replication for normal, daily OLTP processing rates, peak OLTP processing rates, batch load activities, and scheduled database maintenance activities?

If there are no published SLA requirements for replication, then you do not have an OGG performance issue. It is impossible to tune an environment if there are no clearly defined performance parameters.

Likewise, SLA requirements must be realistic. For instance, A-Team was called in to assess a performance issue in a high volume environment where there were clearly defined performance requirements. However, the performance target of 300 milliseconds for source to target replication was unrealistic given that it took 1 second for the CDC Extract to write the transaction to its local Trail due to SAN latency, and 1.5 seconds for the Extract Data Pump to retrieve the record from the SAN and transmit it over TCP/IP due to SAN and network latencies. So, OGG performance was hindered by the operational environment (not that 300 milliseconds capture to target apply is realistic in any environment).

2) What caused the lag?

Identify the root cause of the lag. Was it the result of a 1 billion row batch operation performed against the source database as a single transaction? Was there a network issue that slowed data transfer rates between the source and target OGG instances? Was the target database or server down for scheduled or unscheduled maintenance? What is the health of the disk subsystem? What about the server: are the CPUs 90% or more busy; are they out of memory?

Most lag issues are environmental in that they are the result of activities performed against the database, or resource constraints of the server and database environments. Environmental tuning issues, including batch load operations, must be addressed before modifying the OGG instance.

3) Does the lag clear?

A 6 hour lag being reported against an OGG component that clears in 30 minutes is not an issue, especially during periods of database maintenance or batch processing. Having realistic and clearly defined SLA requirements for replication will minimize the angst of dealing with lag.

4) Is the lag repeatable?

Over time, OLTP rates in the database are expected to grow as your business grows. If OGG is reporting lag during your daily normal, or peak, OLTP processing periods that is close to exceeding the SLA requirements, it's time to tune.

Tuning OGG

Before beginning the tuning process, we need to determine where the lag originates. All lag reporting in OGG is based upon the source transaction commit timestamp, and is cumulative. So lag in the Replicat is a combination of (1) the time it took to read from the OGG Trail, (2) network latency, (3) lag in the Extract Data Pump, and (4) lag in the CDC Extract. Likewise lag in the Extract Data Pump is a combination of (1) the time it took to read from the OGG Trail and (2) lag in the CDC Extract; while lag in the CDC Extract is a combination of (1) database latency, (2) the time it took to read from the TMF Audit Trail, and (3) the time it took to write the transaction into the OGG Trail.
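For example, if the Replicat reports 10 seconds of lag while the Extract Data Pump reports 7 seconds and the CDC Extract reports 2 seconds, the apply side itself contributes only 10 - 7 = 3 seconds; the Extract Data Pump accounts for 7 - 2 = 5 seconds and is the first place to look.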

So, if lag is reported in the Replicat, but there is no lag in the Extract Data Pump or CDC Extract, we can concentrate on tuning that component after verifying that network or disk latency is not the root cause. Once we are satisfied that the root cause of the lag is an overloaded OGG component, we can begin the tuning process.

OGG performance analysis should be an active part of your capacity planning process. As such, you should be using information gleaned from each Extract and Replicat report file to track and analyze throughput. In the section above where we discussed OGG parameter settings, we included the REPORTCOUNT, LAGSTATS, REPORT, and REPORTROLLOVER options to generate statistics that can be used for this analysis.

REPORTCOUNT will provide operations processed by the Extract or Replicat per second and the number of operations processed since the last entry. This will allow you to determine peak processing times.

LAGSTATS will provide the percentage of operations processed within the threshold settings. This will allow you to track lag before it becomes a problem.

REPORT and REPORTROLLOVER provide total operations, by operation type, per table on a daily basis. This will allow you to identify your busiest files/tables and track operational growth per file/table.

If you are dealing with lag in a single Extract or Replicat, we can use this information to identify the busiest tables or files in the component and then come up with a solution for rebalancing the workload: either by using the FILTER @RANGE ([n], [n]) option of the MAP and TABLE statements to split the workload across multiple Extracts or Replicats (a sketch follows), by moving the file/table to a different Extract or Replicat that has excess capacity, or by creating additional Extracts or Replicats to handle the increased workload.
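For instance, a hedged sketch of rebalancing a busy table between an existing Replicat and a new group via @RANGE (group and file names are illustrative):

--
-- Existing replicat: now applies only half of the orders workload
--
map $data09.lpdb.orders, target $data22.lpdb.orders, filter (@RANGE (1, 2));

--
-- New replicat group: applies the other half
--
map $data09.lpdb.orders, target $data22.lpdb.orders, filter (@RANGE (2, 2));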

For capacity planning, take the information available in the report files and add it to your capacity planning worksheets to track workload growth over time. By doing this you will be able to identify the top 20% of files or tables that use the most OGG resources and then come up with a plan to load balance the instance for optimum performance.

Database Maintenance Activities

Database maintenance activities affect the performance and stability of Oracle GoldenGate. As we've already mentioned, FUP RELOAD floods the TMF Audit Trails with data, which in turn affects the performance of the CDC Extract; however, normal maintenance tasks such as adding or altering columns/fields, renaming files and tables, and adding files or tables also have an impact upon the replication environment.

The general procedures for scheduled database maintenance activities for environments replicated by OGG are:

1) Stop application activity.

The application cannot be performing OLTP operations captured by OGG to the files/tables being modified.

2) Wait for OGG to capture all data from TMF on the source and apply the data to the target.

3) Shutdown the CDC Extract and Extract Data Pumps on the source.

4) Shutdown the Replicats associated with the files/tables being modified on the target.

5) Make the modifications to the source and target databases.

6) If applicable, run the Oracle GoldenGate DEFGEN utility and copy the output to the target OGG instance.

7) Restart the CDC Extract and Extract Data Pumps on the source.

8) Restart the Replicats on the target.

9) Check the status of the source and target OGG instances to ensure all processes are running.

10) Start application activity.

Summary

In this article we presented best practices for implementing and running Oracle GoldenGate on HP NonStop servers for replication of TMF Audited application data.

For more information on what other articles are available for Oracle GoldenGate, please view our index page.
