B2B Event Queue Management for Emergency

Executive Overview

Many customers face a crisis in their production systems when, for some reason, a backlog of B2B messages builds up in the system, and many of those messages may not be a high priority at that point in time. In such critical situations, it would greatly help to have an option to flush the backed-up messages from the system for later resolution and simply continue processing the current messages.
The A-Team has worked with customers worldwide to help them implement such a solution for emergency use. Without going into too much technical detail, a high-level approach for such a solution is discussed here. The methodology accomplishes two key tasks that are of primary importance during a crisis in a B2B production gateway:

  • Flush the event queue while the gateway is down, so that the gateway can be brought up quickly
  • Introspect the messages removed from the event queue for resubmission or rejection

The primary objective of this framework is to allow the B2B engine to come back up quickly after flushing the messages from the event queue. The recovery or resubmission of messages is usually reviewed manually by the operations and business teams off-line and takes longer to complete. However, this review should not add to the downtime of the production system once the messages have been removed from the event queue. The downtime is therefore driven only by the first task listed above.

Solution Approach

Overview

The solution consists of an immediate cleanup of messages from the system. The entries are stored in files. After the files are created, the gateway is ready for normal processing, unaffected by the messages that were previously present in the system.
After the gateway is opened for normal business, the file contents can be analyzed in parallel to decide which messages will be resubmitted and which will be discarded. This analysis can be done via scripts that extract the relevant pieces of business data for the removed messages. The scripts are decoupled by type of transient message data and built on basic query utilities. The basic building blocks for data introspection are typically custom scripts that are created based on specific business needs for analysis.
The analysis produces two lists of message IDs: one for resubmission and the other for rejection. Existing command-line utilities can be invoked to resubmit the messages in a scripted loop with configurable delays between resubmissions. For rejection, there is typically no processing required. However, the list of IDs is used to update the database to reflect a final state for the appropriate messages.

Tasks and Activities

The following sections describe the tasks in greater detail. Sections I and II cover the activities that need to be completed while the gateway is down. Sections III and IV cover the post-mortem phase, in which the messages removed from the system are analyzed.
The flowchart below can be used as a reference for the critical cleanup tasks covered in Sections I and II.

[Figure: flowchart of the critical cleanup tasks (eventq)]

I. Preparation of Environment

If the gateway is down, it is important to bring it up in maintenance mode, so that the cleanup of transient messages in the system can be completed. If the gateway is running, it has to be restarted to enable maintenance mode. This can be achieved with the following sequence:

  • If the SOA/B2B environment is not up and running, start the Admin Server. Otherwise, this step can be skipped.
  • Pause the consumption of messages coming into the B2B engine via external and internal listening channels.
  • Change the startup mode of the SOA Managed Server to ADMIN mode.
  • Change the startup mode of SOAJMSServer to pause at server startup.
  • For a running environment, stop the SOA Managed Servers and restart the Admin Server. Otherwise, this step can be skipped.
  • Start SOA Managed Servers.

II. Cleanup of Transient Messages

There are four areas that require attention when there is a gateway outage and the whole B2B cluster is down. The four areas are:

  • B2B Event Queue – WebLogic JMS Queue, B2B_EVENT_QUEUE
  • SOA Quartz Scheduler – SOA Repository Database Table, SOAQTZ_JOB_DETAILS
  • B2B Sequence Manager – SOA Repository Database Table, B2B_SEQUENCE_MANAGER
  • B2B Pending Batch Jobs – SOA Repository Database Table, B2B_PENDING_MESSAGE

These four areas require attention since they contain information about in-flight messages that have not been processed to their final states. Depending on the specific environment, the cleanup is at most a four-step process, in which only the first step is mandatory.

  • The B2B Event Queue contents will be exported to a file for later analysis, and the queue contents will be purged thereafter.
  • The key contents of the SOA Quartz Scheduler table will be exported to a file for later analysis and purged (optional – only applicable to message retries).
  • The key contents of the B2B Sequence Manager table will be exported to a file for later analysis and purged (optional – only applicable to scheduled Partner downtime).
  • The key contents of the B2B Pending Batch table will be exported to a file for later analysis and purged (optional – only applicable to batching use cases).

After the four steps above are completed, the B2B gateway can be started in normal processing mode. A key metric for the solution is how quickly these four steps can be completed, so that the gateway can be brought up for ongoing business. Only step 1 above requires the preparation described in Section I (Preparation of Environment).
Steps 2, 3, and 4 need only the database to be up, so they can be performed while both the Admin Server and the Managed Servers are down.
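
The export-then-purge pattern used for the three database tables can be sketched as follows. This is only an illustration under stated assumptions, not the delivered scripts: it uses Python's built-in sqlite3 module as a stand-in for the SOA repository database, and the function name, column names, and CSV output format are hypothetical. The table and column names are interpolated into the SQL text, so they must come from trusted constants, never from user input.

```python
import csv
import sqlite3


def export_and_purge(conn, table, columns, out_path):
    """Write the given columns of `table` to a CSV file, then delete the rows.

    The export file is fully written and closed before the DELETE runs, so a
    failure while writing leaves the table untouched. This assumes nothing
    else writes to the table in between, which holds while the gateway is down.
    """
    col_list = ", ".join(columns)
    rows = conn.execute(f"SELECT {col_list} FROM {table}").fetchall()

    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(columns)   # header row for later analysis
        writer.writerows(rows)

    conn.execute(f"DELETE FROM {table}")
    conn.commit()
    return len(rows)
```

A call such as export_and_purge(conn, "B2B_PENDING_MESSAGE", ["MESSAGE_ID"], "pending.csv") would return the number of rows exported, which is worth logging so that the file can be reconciled against the table later. The column name MESSAGE_ID is a placeholder here; the real key columns have to be taken from the actual schema.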

III. Message Data Analysis

After the gateway is up and running, all the backed-up entries can be analyzed for resubmission or rejection. The main objective of the analysis phase is to gather sufficient business data for each message ID to help operational analysis. How the backed-up messages are analyzed depends on their source.
The flowchart below can be used as a reference for the message data analysis tasks covered in Sections III and IV.

[Figure: flowchart of the message data analysis tasks (eventq2)]

A. B2B Event Queue, JMS Queue – Mandatory
  • Shell-script-based utilities can be used to read message IDs from the JMS export file generated in Section II.
  • Entries existing in the b2b_instancemessage view: Message IDs can be joined with the view to get the desired information about messages for business analysis (for the most part, new incoming or outgoing messages referenced by the B2B Event Queue would not be available in the b2b_instancemessage view).
  • Entries not existing in the b2b_instancemessage view: All such message IDs can be scanned to save the payload into a file that can be processed by a customized shell script to extract any field for further analysis.
  • Other system-level entries (optional): These can be put back in the event queue via the JMS Import utility in the WebLogic console.
B. SOA Quartz Scheduler, SOA Repository Table – Optional
  • Message IDs from the SOAQTZ_JOB_DETAILS table can be joined with the b2b_instancemessage view for data analysis via custom script utilities.
C. B2B Sequence Manager, SOA Repository Table – Optional
  • Message IDs from the B2B_SEQUENCE_MANAGER table can be joined with the b2b_instancemessage view as described in Section B above.
D. B2B Pending Batch Messages, SOA Repository Table – Optional
  • Message IDs from the B2B_PENDING_MESSAGE table can be joined with the b2b_instancemessage view as described in Section B above.
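
The lookup against the b2b_instancemessage view, common to all four sources above, can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module in place of the SOA repository database; the function name and the columns message_id, doc_type, and state are hypothetical placeholders for whatever business fields the analysis needs. It splits the exported IDs into those found in the view, with their business data, and those that are missing and would need their payload scanned instead.

```python
import sqlite3


def split_by_view(conn, message_ids):
    """Return (found, missing) for the given message IDs.

    found   -- dict mapping each ID present in b2b_instancemessage to its data
    missing -- list of IDs with no row in the view, in their original order
    """
    query = (
        "SELECT doc_type, state FROM b2b_instancemessage "
        "WHERE message_id = ?"
    )
    found = {}
    for msg_id in message_ids:
        # Bind the ID as a parameter rather than formatting it into the SQL.
        row = conn.execute(query, (msg_id,)).fetchone()
        if row is not None:
            found[msg_id] = {"doc_type": row[0], "state": row[1]}
    missing = [m for m in message_ids if m not in found]
    return found, missing
```

Looking up one ID per query keeps the sketch simple. For a large backlog, loading the IDs into a temporary table and doing a single join would likely be faster.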

IV. Message Resubmission/Rejection

At the end of the analysis phase, the lists of message IDs for resubmission and rejection will be available. The resubmission list can then be read by custom shell scripts that process individual messages via an existing command-line utility, driven by parameters that control the pause interval and the looping criterion.
In general, no further action should be required for rejected messages. In certain exceptional situations, a database script can be run to change the state of such messages to a final state.
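
The resubmission loop described in this section can be sketched as follows. This is a sketch under stated assumptions, not the delivered scripts: the function name and parameters are hypothetical, and the actual command line is deliberately left to a caller-supplied build_command function, since it depends on the utility available in the installation. The run and sleep parameters exist only so that the loop can be exercised without launching real processes.

```python
import subprocess
import time


def resubmit_all(message_ids, build_command, delay_seconds=5.0,
                 run=subprocess.run, sleep=time.sleep):
    """Run the resubmission command once per message ID, pausing in between.

    build_command(msg_id) must return the argument list for one resubmission.
    Returns the IDs whose command exited with a non-zero status, so they can
    be reviewed or retried separately.
    """
    failed = []
    for index, msg_id in enumerate(message_ids):
        if index > 0:
            sleep(delay_seconds)   # throttle so the gateway is not flooded
        result = run(build_command(msg_id), check=False)
        if result.returncode != 0:
            failed.append(msg_id)
    return failed
```

A caller would pass something like lambda msg_id: ["/path/to/resubmit-utility", msg_id], where the path is a placeholder for the real command-line utility and its arguments. Collecting failures instead of stopping at the first one lets the loop work through the full list unattended.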

Summary

The above approach is a well-proven technique that customers have successfully implemented and used in production systems for many years. The entire package has been delivered as a consulting solution, and the customer is responsible for all the scripts and artifacts developed. However, as newer versions of B2B are released, other options may become available as well. For further details, please contact the B2B Product Management team or the SOA/B2B group within A-Team.

Acknowledgements

B2B Product Management and Engineering teams have been actively involved in the development of this solution for many months. It would not have been possible to deliver such a solution to the customers without their valuable contribution.

Comments

  1. Venugopal Puli says:

    Any pointers from an Oracle Fusion Applications Rel8 stack context?
    How do I proactively monitor whether this is happening, on a daily basis?
    I wanted some pointers from a Fusion Applications stack context, and SQL that we can use to monitor daily before it worsens.

    thanks.

    • Shub Lahiri says:

      If B2B engine is the primary underlying SOA component that needs to be monitored, then the process should still be the same.

      For proactive monitoring, the WebLogic JMS queue, B2B_EVENT_QUEUE, can be monitored so that it does not grow beyond a certain threshold. There are many ways to handle such alerts from a WebLogic platform perspective.

      Hope this helps.

      Thanks ..
      -Shub
