Oracle GoldenGate Best Practices: Using the GoldenGate Logdump Utility to Manually Load Balance across Multiple Processes

Introduction

This technical document describes how to combine the GoldenGate logdump utility, UNIX shell scripting and MS Excel to generate a detailed summary report to aid in manual load balancing of tables across multiple GoldenGate replication processes. This document is not intended to be a comprehensive overview of all replication performance techniques; it is limited to the usage of logdump to collect table level statistics, how to import those statistics into a spreadsheet and then make use of them.

Main Article

Replicating a high volume of change data in a single data stream (one extract, one pump, one replicat, for example) often does not provide enough throughput to maintain latency requirements for real-time data feeds between source and target systems. Understanding how to split the data into evenly distributed data streams is fundamental to increasing throughput.

After a reasonable sample of change data has been captured and written to a trail, the trail data will be used as the primary input to methodically spread the tables across multiple replication processes for increased throughput. This is done by extracting change data over a representative period of time that might either reflect a specific load test or typical production transaction mix.
Once the trail(s) exist a feature of the logdump utility is then used to log table details (bytes, inserts, updates, deletes, etc.) to a loosely structured text file. Before this data is used in a spreadsheet the log dump output will first be reformatted into horizontal records with one row per table and values separated by commas. Reformatting is done using standard UNIX commands strung together at the command line or in shell scripts.

Finally, the formatted data is imported into MS Excel (or any spreadsheet application) and percentages are calculated based on change data bytes. The resulting spreadsheet will then provide the insight needed to evenly distribute the data load across multiple processes. The spreadsheet may also double as a summary report to customers.

Undoubtedly some tables will account for a greater percentage of the change data than other tables. In instances when, for example, one table accounts for 90% or all the change data bytes then the load balancing method described here will highlight this fact but balancing the load across multiple process will typically require the use of additional methods such as the of the “range” function in table or map statements. Ranges are not covered in this document.

This article was written by Joe deBuzna, Product Manger for Oracle GoldenGate.

The complete document can be found on the Oracle Support site under the document ID:1301300.1

Add Your Comment