Many customers face a crisis in a production system when, for some reason, a large number of B2B messages back up in the system, many of which may not be a high priority to process at that point in time. In such critical situations, it would greatly help to have an option to flush the backed-up messages from the system for later resolution and simply continue processing the current messages.
A-Team has worked with customers worldwide to help them implement such a solution for emergency use. Without going into too much technical detail, this article discusses a high-level approach to such a solution. The methodology accomplishes two key tasks that are of primary importance during an emergency within a B2B production gateway:
The primary objective of this framework is to allow the B2B engine to come back up quickly after the messages are flushed from the event queue. The recovery or resubmission of messages is usually reviewed manually, offline, by the operations and business teams and takes longer to complete. However, this should not add to the downtime of the production system after the fast removal of the messages from the event queue. The downtime is therefore driven only by the first task listed above.
The solution consists of an immediate cleanup of messages from the system. The removed entries are stored in files. Once the files are created, the gateway is ready for normal processing, unaffected by the messages that were previously present in the system.
After the gateway is opened for normal business, the file contents can be analyzed in parallel to decide which messages will be resubmitted and which will be discarded. This analysis can be done via scripts that extract the relevant pieces of business data for the removed messages. The scripts are decoupled by type of transient message data and built on basic query utilities. The basic building blocks for data introspection are typically custom scripts that are created to meet the specific business needs of the analysis.
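As an illustration of what such an analysis script might look like, here is a minimal Python sketch. It assumes, purely for the example, that the removed entries were saved as CSV with a header row; the column names (`message_id`, `doc_type`, `partner`, `payload_size`) and the sample values are hypothetical, and the real file layout depends on how the entries were exported.

```python
import csv
import io
from collections import Counter


def summarize_backup(lines, fields=("message_id", "doc_type", "partner")):
    """Return one dict per backed-up entry, keeping only the chosen fields.

    `lines` is any iterable of text lines (an open file works).
    The field names are hypothetical and stand in for whatever
    business data the real backup files contain.
    """
    reader = csv.DictReader(lines)
    return [{name: row.get(name, "") for name in fields} for row in reader]


def count_by(rows, field):
    """Count entries per distinct value of one field."""
    return Counter(row[field] for row in rows)


# Made-up sample data standing in for a backup file.
sample = io.StringIO(
    "message_id,doc_type,partner,payload_size\n"
    "MSG-001,850,Acme,2048\n"
    "MSG-002,810,Globex,512\n"
    "MSG-003,850,Acme,4096\n"
)

rows = summarize_backup(sample)
by_doc_type = count_by(rows, "doc_type")
print(by_doc_type)
```

A per-document-type or per-partner count of this kind is often a useful first view for the operations team before individual messages are reviewed.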
The analysis will create two lists of message IDs: one for resubmission and the other for rejection. Existing command-line utilities can be invoked to resubmit the messages in a scripted loop with configurable delays between resubmissions. For rejection, there is typically no processing required. However, the list of IDs will be used to update the database to reflect a final state for the appropriate messages.
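The split into two lists can be sketched as follows. This is only an illustration: the decision rule (`is_purchase_order`), the record fields, and the sample data are hypothetical, and in practice the decision usually comes from the operations and business teams rather than from a fixed rule.

```python
import io


def partition_ids(rows, should_resubmit):
    """Split message IDs into (resubmit, reject) lists, preserving order."""
    resubmit, reject = [], []
    for row in rows:
        target = resubmit if should_resubmit(row) else reject
        target.append(row["message_id"])
    return resubmit, reject


def write_id_list(handle, ids):
    """Write one message ID per line to an open text handle."""
    for message_id in ids:
        handle.write(f"{message_id}\n")


# Made-up records; real ones would come from the analysis of the backup files.
sample_rows = [
    {"message_id": "MSG-001", "doc_type": "850"},
    {"message_id": "MSG-002", "doc_type": "810"},
    {"message_id": "MSG-003", "doc_type": "850"},
]


def is_purchase_order(row):
    # Hypothetical business rule: only resubmit document type 850.
    return row["doc_type"] == "850"


resubmit_ids, reject_ids = partition_ids(sample_rows, is_purchase_order)

# An in-memory buffer stands in for a real file opened with open(path, "w").
buffer = io.StringIO()
write_id_list(buffer, resubmit_ids)
print(buffer.getvalue(), end="")
```

Keeping the two lists as plain one-ID-per-line files makes them easy to review by hand and easy to feed into later scripts.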
The following sections describe the tasks in greater detail. Sections I and II cover the activities that need to be completed while the gateway is down. Sections III and IV cover the post-mortem phase, in which the messages removed from the system are analyzed.
The flowchart below can be used as a reference for the critical cleanup tasks covered in Sections I and II.
If the gateway is down, it is important to bring it up in maintenance mode so that the cleanup of transient messages in the system can be completed. If the gateway is running, it has to be restarted to enable maintenance mode. This can be achieved with the following sequence:
There are four areas that require attention when there is a gateway outage and the whole B2B cluster is down. The four areas are:
These four areas require attention because they contain information about in-flight messages that have not been processed to their final states. Depending on the specific environment, the cleanup could be at most a four-step process, in which only the first step is mandatory.
After the four steps mentioned above are completed, the B2B gateway can be started in normal processing mode. One of the key metrics for the solution is how quickly these four steps can be completed so that the gateway can be brought up for ongoing business. Only step 1 above requires the preparation described in Section I (Preparation of Environment).
Steps 2, 3, and 4 can be performed with only the database up (i.e., with both the Admin and Managed Servers down).
After the gateway is up and running, all the backed-up entries can be analyzed for resubmission or rejection. The main objective of the analysis phase is to gather sufficient business data for each message ID to support operational analysis. The analysis of the backed-up messages is addressed based on their source.
The flowchart below can be used as a reference for the message data analysis tasks covered in Sections III and IV.
At the end of the analysis phase, the lists of message IDs for resubmission and rejection will be available. The resubmission list can then be read by custom shell scripts that process individual messages via an existing command-line utility, driven by parameters that control the pause interval and the looping criterion.
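The text describes shell scripts; the same loop is sketched here in Python to show the shape of the logic. The actual resubmission utility and its arguments are not specified in this article, so the sketch takes a `build_command` callable, and `placeholder_command` is a hypothetical stand-in that only prints the ID. The parameters `pause_seconds` and `max_messages` are assumed names for the pause interval and the looping criterion.

```python
import subprocess
import sys
import time


def resubmit_all(message_ids, build_command, pause_seconds=5.0, max_messages=None):
    """Run one command per message ID, pausing between invocations.

    build_command: callable mapping a message ID to an argument list.
    max_messages:  optional cap on how many IDs to process in this run.
    Returns a dict of message ID -> process exit code.
    """
    results = {}
    selected = message_ids[:max_messages]  # a None cap selects every ID
    for index, message_id in enumerate(selected):
        if index > 0:
            time.sleep(pause_seconds)  # throttle load on the gateway
        completed = subprocess.run(build_command(message_id), check=False)
        results[message_id] = completed.returncode
    return results


def placeholder_command(message_id):
    # Hypothetical stand-in: replace with the real resubmission utility
    # and its arguments for the environment in question.
    script = "import sys; print('resubmit', sys.argv[1])"
    return [sys.executable, "-c", script, message_id]


results = resubmit_all(
    ["MSG-001", "MSG-003"], placeholder_command, pause_seconds=0
)
failed = [message_id for message_id, code in results.items() if code != 0]
print("failed:", failed)
```

Recording the exit code per message ID, rather than stopping at the first failure, lets the operator rerun only the IDs that did not go through.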
In general, no further action should be required for rejected messages. In certain exceptional situations, a database script can be run to change the state of such messages to a final state.
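For the exceptional case, the state update might look roughly like the sketch below. It uses Python's built-in `sqlite3` module and an in-memory database purely as a stand-in so that the example runs anywhere. The table name `messages`, its columns, and the state values `PENDING` and `REJECTED` are hypothetical and do not reflect the actual B2B schema; any real update should be agreed with the database and product teams first.

```python
import sqlite3


def mark_final_state(connection, message_ids, final_state):
    """Set the state of each listed message; return the number of rows updated.

    Uses bound parameters so that message IDs are never spliced into the SQL.
    The table and column names are hypothetical.
    """
    with connection:  # commits on success, rolls back on error
        cursor = connection.executemany(
            "UPDATE messages SET state = ? WHERE message_id = ?",
            [(final_state, message_id) for message_id in message_ids],
        )
    return cursor.rowcount


# In-memory stand-in database with made-up rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE messages (message_id TEXT PRIMARY KEY, state TEXT)")
connection.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("MSG-001", "PENDING"), ("MSG-002", "PENDING"), ("MSG-003", "PENDING")],
)

updated = mark_final_state(connection, ["MSG-002"], "REJECTED")
states = dict(connection.execute("SELECT message_id, state FROM messages"))
print(updated, states)
```

Running the update inside a single transaction means a failure partway through leaves no messages half-updated.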
The above approach has been successfully implemented and used in production systems by customers for many years and is a well-proven technique. The entire package has been delivered as a consulting solution, and the customer is responsible for all the scripts and artifacts developed. However, as newer versions of B2B are released, alternative options may become available as well. For further details, please contact the B2B Product Management team or the SOA/B2B group within A-Team.
B2B Product Management and Engineering teams have been actively involved in the development of this solution for many months. It would not have been possible to deliver such a solution to the customers without their valuable contribution.