BPM 11g Process Instances – Faults, Rollback & Recovery – Part 2

Introduction

This is part 2 of a four-part blog series explaining how the BPM engine functions under the covers when “faults” occur, be they unhandled technical faults or failures at the engine level.

Part 1 can be found here.

Part 2: Understanding BPM Messages, Threads & Transactions

Given the ability of SOA Suite & BPM to control timeouts and handle faults, why do we need to understand the inner workings of the BPM engine any further?

Well, there are always going to be exceptional circumstances, such as runtime errors in the engine itself (e.g. NullPointerExceptions) caused by internal or external events (out of memory, stuck threads etc.). Alongside this, however thorough the testing, there are likely to be some unforeseen scenarios that have not been handled appropriately in the design. In both of these circumstances the affected instances will need to be recovered somehow, so we’ll now look at how the BPM engine handles threads and how failed instances can be recovered.

As an introduction to this topic I would recommend the excellent presentation by David Read (BPM PM) found here.

Local Optimization

It is important to understand the concept of “local optimization” in SOA Suite. It is enabled by default and means that any service call which remains within the same WebLogic cluster is routed to the same WebLogic managed server instance, i.e. there is no routing out via the load balancer and no HTTP/SOAP. Such calls are optimized into Java calls and therefore run on the same thread as the client.
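As a loose analogy only (this is plain Python, not SOA Suite code, and every name in it is hypothetical), the difference between an optimized local call and a call handed to another worker can be seen by checking which thread the “service” runs on:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def service():
    """Stand-in for a called service; reports the thread it runs on."""
    return threading.get_ident()


def call_locally():
    # Direct in-process call: the service runs on the caller's own thread,
    # which is the effect the article attributes to local optimization.
    return service()


def call_via_other_thread():
    # Stand-in for a non-optimized call that is picked up by a different worker.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(service).result()


if __name__ == "__main__":
    caller = threading.get_ident()
    print("local call on caller thread:", call_locally() == caller)
    print("pooled call on caller thread:", call_via_other_thread() == caller)
```

The point to carry forward is that work done on the caller's thread is also the kind of work that can share the caller's transaction, which matters for the patterns that follow.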

BPM Process Patterns

In order to better demonstrate messages & threading inside the BPM engine we will use the following three common process interaction patterns….

Pattern 1 – Async – Async

This is a very common pattern within BPM, an asynchronous process calls an asynchronous process via a send/receive activity….

BPMR_07

Pattern 2 – Async – Sync

Another common pattern, normally an asynchronous BPM process calling a synchronous SOA service such as a mediator composite in this case….

BPMR_08

Pattern 3 – Async with Acknowledgement – Sync

The final common pattern is again an asynchronous client process calling a synchronous SOA service, but this time the client process sends an acknowledgement back to its caller immediately after being invoked, to notify it that processing will continue asynchronously. Note that in order to send the acknowledgement promptly, the process is generally designed with a timer activity that forces dehydration and therefore a commit point; otherwise the acknowledgement is not sent back to the caller until a later dehydration occurs….

BPMR_09

BPM Process Patterns – Messages & Threads

Now we have seen the patterns we can see how the messages & threads are handled for these within the BPM engine. First let us understand the basics of BPM messages and threads….

BPM Messages

There are essentially two kinds of messages within the BPM engine, “invoke messages” and “callback messages”.

Invoke messages – are what drive a process. An invoke message instantiates an invoker thread which then handles the process instance until a dehydration point, i.e. a “Wait” activity, a “Timer” activity or a “Receive” activity. The invoker thread can be an engine thread or, in the case of a synchronous call from a client, the client thread itself.

Callback messages – are messages that arrive back in the BPM engine asynchronously and must be correlated to a running instance. Examples are callbacks from the workflow service when a human task has been acted upon, or callbacks from an asynchronous service which has completed.

DLV_MESSAGE, DLV_SUBSCRIPTION & WORK_ITEM Tables

Both kinds of message are stored in the SOAINFRA table DLV_MESSAGE, which acts as a message tracker for BPM processes and will be the starting point for any recovery scenario we cover later. In the case of “callback” messages a row is also written to the DLV_SUBSCRIPTION table and used to correlate the incoming callback message to the process instance. Note that the actual payload of the message is not stored here; it is stored in the XML_DOCUMENT table.
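To make the idea of correlation concrete, here is a minimal Python sketch. It is not the engine's implementation: the Subscription class, the instance_id field and the choice to match on a conversation id are all assumptions made for illustration. Only the UNRESOLVED and HANDLED subscription states come from this article.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    conv_id: str              # conversation the waiting instance subscribed to (assumed key)
    instance_id: int          # hypothetical process instance identifier
    state: str = "UNRESOLVED"


def correlate(subscriptions, callback_conv_id):
    """Find the instance waiting on this conversation and mark its subscription handled."""
    for sub in subscriptions:
        if sub.conv_id == callback_conv_id and sub.state == "UNRESOLVED":
            sub.state = "HANDLED"
            return sub.instance_id
    return None  # nothing is waiting on this conversation


if __name__ == "__main__":
    subs = [Subscription("conv-1", 10), Subscription("conv-2", 20)]
    print(correlate(subs, "conv-2"))  # instance waiting on conv-2
    print(correlate(subs, "conv-2"))  # already handled, so no match
```

A callback that finds no unresolved subscription has nowhere to go, which is one reason the subscription state is worth checking when an instance appears stuck.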

BPMR_10

The important columns here are “CONV_ID”, which is a unique identifier for the message, “CONV_TYPE”, which identifies whether this message is an invoke or callback message, and “STATE”, which identifies whether the message itself has been handled etc…

Also of importance is the WORK_ITEM table, which contains the information & state of certain BPMN activities; we are interested in this because of timer activities.
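The kind of check we will lean on later can be sketched against a toy table. This uses Python's built-in sqlite3 with a drastically simplified stand-in for DLV_MESSAGE: only the three columns named above are modelled, the sample rows are made up, and readable string labels are used for the type and state values, which is an assumption made for readability rather than a description of how SOAINFRA stores them.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for DLV_MESSAGE with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dlv_message (conv_id TEXT, conv_type TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO dlv_message VALUES (?, ?, ?)",
        [
            ("conv-1", "INVOKE", "STATE_HANDLED"),
            ("conv-2", "INVOKE", "STATE_UNRESOLVED"),
            ("conv-3", "CALLBACK", "STATE_HANDLED"),
        ],
    )
    conn.commit()
    return conn


def unhandled_messages(conn):
    """Return (conv_id, conv_type) for every message that has not been handled."""
    cursor = conn.execute(
        "SELECT conv_id, conv_type FROM dlv_message "
        "WHERE state <> ? ORDER BY conv_id",
        ("STATE_HANDLED",),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    print(unhandled_messages(build_demo_db()))
```

Against a real SOAINFRA schema the equivalent query would need the actual column values shown in the screenshots, so treat this purely as the shape of the question being asked.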

BPMR_11

The states we are interested in on these tables are as follows….

BPMR_12

BPMR_13

BPMR_14

Pattern 1 – Async – Async

Let’s look at how this knowledge of messages can be applied to the patterns we’ve seen starting with the standard async-async….

BPMR_15

TX1 – Transaction 1 from the client inserts a message into DLV_MESSAGE of type “INVOKE” and state “UNDELIVERED”

TX2 – Transaction 2 updates the message to state “DELIVERED” and continues until the dehydration point at the “receive” activity, inserting another message into DLV_MESSAGE of type “INVOKE” for the async service process on the way.

TX3 – Transaction 3 inserts a new row into WORK_ITEM of type “RECEIVE” and state “PENDING” and a row into DLV_SUBSCRIPTION with a state of “UNRESOLVED”

TX4 – Transaction 4 updates the DLV_MESSAGE to “DELIVERED” and inserts a WORK_ITEM for the timer of state “PENDING”

TX5 – Transaction 5 updates the WORK_ITEM message with state “CLOSED” and continues till the end of the called process.

TX6 – Transaction 6 updates the WORK_ITEM message for the main process to state “CLOSED”, sets the state of the DLV_SUBSCRIPTION table to “HANDLED” and continues till the end of the main process.
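The six transactions above can be replayed in a small Python sketch that keeps a snapshot of the three tables after each commit. The dictionaries and key names (client.start, service.gotoSleep and so on) are hypothetical stand-ins, and the initial state of the second INVOKE message is assumed to be UNDELIVERED since the description does not say.

```python
def run_async_async():
    """Replay TX1..TX6 against in-memory stand-ins for the three tables."""
    dlv_message = {}       # message -> state
    work_item = {}         # activity -> state
    dlv_subscription = {}  # subscription -> state
    snapshots = {}

    def snapshot(tx):
        # Copy the tables so later transactions do not alter earlier snapshots.
        snapshots[tx] = (dict(dlv_message), dict(work_item), dict(dlv_subscription))

    # TX1: the client's invoke message is stored but not yet delivered.
    dlv_message["client.start"] = "UNDELIVERED"
    snapshot("TX1")

    # TX2: the message is delivered and an invoke is queued for the called process.
    dlv_message["client.start"] = "DELIVERED"
    dlv_message["service.start"] = "UNDELIVERED"  # assumed initial state
    snapshot("TX2")

    # TX3: the client process dehydrates at its receive activity.
    work_item["client.receive"] = "PENDING"
    dlv_subscription["client.receive"] = "UNRESOLVED"
    snapshot("TX3")

    # TX4: the called process starts and dehydrates at its timer.
    dlv_message["service.start"] = "DELIVERED"
    work_item["service.gotoSleep"] = "PENDING"
    snapshot("TX4")

    # TX5: the timer fires and the called process runs to its end.
    work_item["service.gotoSleep"] = "CLOSED"
    snapshot("TX5")

    # TX6: the callback is correlated and the main process runs to its end.
    work_item["client.receive"] = "CLOSED"
    dlv_subscription["client.receive"] = "HANDLED"
    snapshot("TX6")
    return snapshots


if __name__ == "__main__":
    for tx, tables in run_async_async().items():
        print(tx, tables)
```

The snapshot after TX4 corresponds to the window in which the timer is still waiting: both messages delivered, two pending work items and one unresolved subscription.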

We can test this out to see what happens in the relevant tables by giving the “gotoSleep” activity a large value such as 4 minutes and running a test.

During Testing

This is what we see during the test while “gotoSleep” is active….

DLV_MESSAGE

BPMR_16

…i.e. an INVOKE message for each of the “start” activities in the two processes in state “STATE_HANDLED”

WORK_ITEM

BPMR_17

…i.e. a WORK_ITEM for the “Receive” activity in the client process of state “open_pending_complete” and a similar WORK_ITEM for the “gotoSleep” activity with a state “open_pending_complete”

DLV_SUBSCRIPTION

BPMR_18

…i.e. a subscription of state UNRESOLVED for the called service.

After Test Completion

This is what we see once the test has successfully completed….

DLV_MESSAGE

BPMR_19

…i.e. an extra entry for the “end” activity of type “DLV_MESSAGE” and state “STATE_HANDLED”

WORK_ITEM

BPMR_20

…i.e. both WORK_ITEM entries are now in state “CLOSED_FINALIZED”

DLV_SUBSCRIPTION

BPMR_21

…i.e. the subscription is now in state HANDLED

Pattern 2 – Async – Sync

BPMR_22

TX1 – Transaction 1 from the client inserts a message into DLV_MESSAGE of type “INVOKE” and state “UNDELIVERED”

TX2 – Transaction 2 updates the message to state “DELIVERED” and continues until the end of the process.

Note how “local optimization” affects the threading here… the same java transaction (TX2) is used for everything from the “Start” in the BPM process, through the mediator and DB adapter and back to the “End” activity in the BPM process.

We can test this out to see what happens in the relevant tables by setting the DBSleep stored procedure to sleep for an appropriate amount of time and running a test.

During Testing

This is what we see during the test while the DBSleep stored procedure is sleeping….

DLV_MESSAGE

BPMR_23

…i.e. an INVOKE message for the “start” activity in the process in state “STATE_UNRESOLVED”

Nothing in WORK_ITEM or DLV_SUBSCRIPTION as expected.

After Test Completion

This is what we see once the test has successfully completed….

DLV_MESSAGE

BPMR_24

…i.e. the INVOKE message now has state “STATE_HANDLED”
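One reading of these two observations is that the state change made by TX2 only becomes visible to other database sessions once TX2 commits at the end of the process. That general effect can be reproduced with Python's built-in sqlite3 and two connections. This is a sketch under stated assumptions: SQLite stands in for the SOAINFRA database, the table is reduced to two columns, and the sample row is made up.

```python
import os
import sqlite3
import tempfile


def observe_uncommitted_update():
    """Return the state a second session sees before and after the writer commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        engine = sqlite3.connect(path)    # stands in for the thread running TX2
        observer = sqlite3.connect(path)  # stands in for our SQL client
        try:
            engine.execute("CREATE TABLE dlv_message (conv_id TEXT, state TEXT)")
            engine.execute(
                "INSERT INTO dlv_message VALUES ('conv-1', 'STATE_UNRESOLVED')"
            )
            engine.commit()  # TX1 commits the invoke message

            # TX2 starts: the update is made but not yet committed.
            engine.execute("UPDATE dlv_message SET state = 'STATE_HANDLED'")
            during = observer.execute(
                "SELECT state FROM dlv_message"
            ).fetchall()[0][0]

            engine.commit()  # TX2 commits at the end of the process
            after = observer.execute(
                "SELECT state FROM dlv_message"
            ).fetchall()[0][0]
        finally:
            observer.close()
            engine.close()
    return during, after


if __name__ == "__main__":
    print(observe_uncommitted_update())
```

While the writer's transaction is open the observer still reads the old state, and it only sees the new state after the commit.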

Pattern 3 – Async with Acknowledgement – Sync

BPMR_25

TX1 – Transaction 1 from the client inserts a message into DLV_MESSAGE of type “INVOKE” and state “UNDELIVERED”

TX2 – Transaction 2 updates the message to state “DELIVERED” and continues until the dehydration point at the “timer” activity.

TX3 – Transaction 3 inserts a new message into WORK_ITEM with state “3 – OPEN_PENDING_COMPLETE”

TX4 – Transaction 4 updates the “WORK_ITEM” row with state “6 – CLOSED_FINALIZED” and continues till the end of the process.

Note how “local optimization” affects the threading here… the same java transaction (TX4) is used for everything from the “CatchEvent” in the BPM process, through the mediator and DB adapter and back to the “End” activity in the BPM process.
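When reading WORK_ITEM rows it helps to have the two state codes mentioned in TX3 and TX4 to hand. A minimal sketch follows: only codes 3 and 6 are taken from this article, other codes exist but are deliberately left out, and the helper names are hypothetical.

```python
from enum import IntEnum


class WorkItemState(IntEnum):
    # Only the two codes named in this article are modelled here.
    OPEN_PENDING_COMPLETE = 3  # work item created, instance waiting (e.g. on a timer)
    CLOSED_FINALIZED = 6       # activity completed


def describe(code):
    """Render a state code in the 'number - name' form used above."""
    state = WorkItemState(code)  # raises ValueError for codes not modelled here
    return f"{state.value} - {state.name}"


def is_waiting(code):
    """True while the work item is still pending completion."""
    return WorkItemState(code) is WorkItemState.OPEN_PENDING_COMPLETE


if __name__ == "__main__":
    print(describe(3))
    print(is_waiting(6))
```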

We can test this out to see what happens in the relevant tables by setting the DBSleep stored procedure to sleep for an appropriate amount of time and running a test.

During Testing

This is what we see during the test while the DBSleep stored procedure is sleeping….

DLV_MESSAGE

BPMR_26

…i.e. an INVOKE message for the “start” activity in state “STATE_HANDLED”

WORK_ITEM

BPMR_27

…i.e. a WORK_ITEM for the “CatchEvent” timer activity in the client process of state “open_pending_complete”

After Test Completion

This is what we see once the test has successfully completed….

DLV_MESSAGE

BPMR_28

…i.e. no difference.

WORK_ITEM

BPMR_29

…i.e. the WORK_ITEM entry is now in state “CLOSED_FINALIZED”

Summary

In this second part of the series we have looked at some typical process patterns, the important tables in SOAINFRA and what data is added to these tables as a process instance starts and moves to completion. In the next part we will look at what happens when an uncaught exception occurs and the instances roll back.
