BPM 11g: XML_DOCUMENT Table Growth

Introduction

I’ve heard from several customers lately who have asked about unexpected growth in XML_DOCUMENT table compared to other BPM tables. This blog looks into the reasons for this growth and some suggestions on how to mitigate it.

Test Project

In order to demonstrate XML_DOCUMENT table growth we’ll use the following process….

XD_01

… this has a simple embedded sub-process and a multi-instance embedded sub-process so we can monitor whether these have any impact. Both have timer activities inside to give us a window to run some queries against the SOAINFRA tables.

Audit Settings

For our first tests the audit level will be set to “Development”…

XD_02

Queries

We will be running the following queries against the SOAINFRA tables….

CUBE_INSTANCE

Will allow us to query the compressed and uncompressed payload size….

select * from cube_instance
order by creation_date desc;

CUBE_SCOPE

Will allow us to view whether the payload of the running instance is stored in CUBE_SCOPE….

select * from cube_scope
order by modify_date desc;

XML_DOCUMENT

Will allow us to see whether the payload of the running instance is stored in XML_DOCUMENT….

select * from xml_document
order by doc_partition_date desc;

Test 1: Audit Level Development

The process takes a payload such as the following….

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
    	<ns1:start xmlns:ns1="http://xmlns.oracle.com/bpmn/bpmnProcess/BpmTestXMLDoc" xmlns:ns2="http://www.oracle.com/repeat">
    	<ns2:list>
    		<ns2:ListRow>
    			<ns2:Field1>R1F1</ns2:Field1>
    			<ns2:Field2>R1F2</ns2:Field2>
    			<ns2:Field3>R1F3</ns2:Field3>
                </ns2:ListRow>
    		<ns2:ListRow>
    			<ns2:Field1>R2F1</ns2:Field1>
    			<ns2:Field2>R2F2</ns2:Field2>
    			<ns2:Field3>R2F3</ns2:Field3>
                </ns2:ListRow>
    		<ns2:ListRow>
    			<ns2:Field1>R3F1</ns2:Field1>
    			<ns2:Field2>R3F2</ns2:Field2>
    			<ns2:Field3>R3F3</ns2:Field3>
                </ns2:ListRow>
        </ns2:list>
        </ns1:start>
    </soap:Body>
</soap:Envelope>

…i.e. three repeating rows with three fields in each, populated with simple 4-byte strings, not a very large message at all.

Contents of Tables at First Wait

XD_03

CUBE_INSTANCE

XD_04

…even with such a small message we have an uncompressed scope size of almost 15k.

CUBE_SCOPE

XD_05

…the payload itself is stored in the “SCOPE_BIN” field as a BLOB.

XML_DOCUMENT

XD_06

…there are two rows in XML_DOCUMENT, not really significant for our purposes.

Contents of Tables at Second Wait

XD_19

CUBE_INSTANCE (iteration 1)

XD_07

…the uncompressed payload size has increased to 16.5k

CUBE_SCOPE (iteration 1)

XD_08

…the payload has been updated (modify_date changed).

XML_DOCUMENT (iteration 1)

XD_09

…we now have another two rows in XML_DOCUMENT.

CUBE_INSTANCE (iteration 2)

XD_10

…uncompressed payload size remains the same, previous iteration must have been overwritten.

CUBE_SCOPE (iteration 2)

XD_11

…the payload has been updated (modify_date changed).

 

XML_DOCUMENT (iteration 2)

XD_12

…unchanged.

CUBE_INSTANCE (iteration 3)

XD_13

…uncompressed payload size remains the same, previous iteration must have been overwritten.

CUBE_SCOPE (iteration 3)

XD_14

…the payload has been updated (modify_date changed).

XML_DOCUMENT (iteration 3

XD_15

…unchanged.

Contents of Tables at Process End

XD_20

CUBE_INSTANCE

XD_16

…uncompressed payload size has dropped considerably, previous scopes removed.

CUBE_SCOPE

XD_17

…the payload has been updated (modify_date changed).

 

XML_DOCUMENT

XD_18

…unchanged.

Conclusion

Although the payload size (compressed and uncompressed) in CUBE_INSTANCE seems large compared to the message we passed in to the process, XML_DOCUMENT itself seems little used. It is evident that the payload itself is store in CUBE_SCOPE and is overwritten as the instance progresses through activities to completion.

Test 2: Audit Level Production

This should be more pertinent a test given this is the recommended setting for production environments.

XD_21

Contents of Tables at First Wait

CUBE_INSTANCE

XD_22

…more or less the same as before.

CUBE_SCOPE

XD_23

…as before.

XML_DOCUMENT

XD_24

…only one row this time, clearly audit level of “production” reduces the number of rows in XML_DOCUMENT.

Contents of Tables at Second Wait

CUBE_INSTANCE (iteration 1)

XD_25

…as test 1.

CUBE_SCOPE (iteration 1)

XD_26

…as before.

XML_DOCUMENT (iteration 1)

XD_27

…still only the one row.

The progression continues as in Test 1, and at the process end we get the following….

XML_DOCUMENT

XD_28

…still only the one row !

Conclusion

So with standard settings we have been unable to reproduce any noticeable growth in the XML_DOCUMENT table, in fact we reduced it by setting the audit level to “Production”… so what is causing this growth ?

Large Document Threshold

This is the key to XML_DOCUMENT table growth, the parameter can be found in the BPMN properties in Enterprise Manager….

XD_29

…what exactly is this telling us ? If the payload size is larger than 100k it is stored in XML_DOCUMENT instead of CUBE_SCOPE. There is a similar property for BPEL also.

What difference does this make ? Let’s do some more tests.

Test 3: Large Document Threshold Lowered.

It will be simpler for us to reduce the threshold rather than create a larger message (and therefore payload)…. let’s go to the extreme and reduce this to 100 (from 100k)….

XD_30

Contents of Tables at First Wait

 CUBE_INSTANCE

XD_31

…more or less unchanged.

CUBE_SCOPE

XD_32

…again not much difference (although the contents of the BLOB may be).

XML_DOCUMENT

XD_33

…already 6 rows in this table. Unlike the CUBE_SCOPE table which simply overwrote the payload BLOB as the instance progressed, we now seem to be adding a new row to XML_DOCUMENT for at the very least the entry and exit of each activity…. i.e. we have history in this table.

Contents of Tables at Second Wait

We’ll just concentrate on XML_DOCUMENT from now on….

XML_DOCUMENT (iteration 1)

XD_34

…another 5 rows.

XML_DOCUMENT (iteration 2)

XD_35

…another 6 rows.

XML_DOCUMENT (iteration 3

XD_36

…another 4 rows.

Contents of Tables at Process End

XML_DOCUMENT

XD_37

…unchanged, no activities after the end of the second scope.

Conclusion

So, we have found the culprit, payloads greater than the large document threshold are stored in XML_DOCUMENT, and unlike CUBE_SCOPE the payloads are not overwritten for each activity, a new row is created.

Overall Conclusion

It seems simple then, just increase the large document threshold to a huge value and XML_DOCUMENT will not be used.Instead the payload will always be stored in CUBE_SCOPE.

Easy !

Or not !

This highlights several things….

  • follow the BPM best practice to keep messages and payloads small
  • try to avoid iterating over large resultsets with large payloads in multi-instance sub-processes
  • have a fully tested purge strategy

But if we find ourselves in this situation, what can we do ?

As a first step i would advise doing some investigation into average compressed and uncompressed payload sizes in CUBE_INSTANCE for running instances, to give an idea of how large the payloads are.

As a second step I would use a like-live pre-production environment to load test as I was making incremental increases to the “large document threshold” and pay close attention to possible increased memory usage as the number of rows written to XML_DOCUMENT decreases.

There is no one-size-fits-all magic number here for “large document threshold”…. only thorough load testing can give you an appropriate value for your environment.

Summary

In this blog entry we have seen how the “large document threshold” has a critical impact on the growth of the XML_DOCUMENT table and briefly looked at ways of mitigating this.

Add Your Comment