I've heard from several customers lately asking about unexpected growth in the XML_DOCUMENT table compared to other BPM tables. This blog looks into the reasons for this growth and offers some suggestions on how to mitigate it.
In order to demonstrate XML_DOCUMENT table growth we'll use the following process....
... this has a simple embedded sub-process and a multi-instance embedded sub-process so we can monitor whether these have any impact. Both have timer activities inside to give us a window to run some queries against the SOAINFRA tables.
For our first tests the audit level will be set to "Development"...
We will be running the following queries against the SOAINFRA tables....
Will allow us to query the compressed and uncompressed payload size....
select * from cube_instance order by creation_date desc;
Will allow us to view whether the payload of the running instance is stored in CUBE_SCOPE....
select * from cube_scope order by modify_date desc;
Will allow us to see whether the payload of the running instance is stored in XML_DOCUMENT....
select * from xml_document order by doc_partition_date desc;
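Since CUBE_SCOPE stores the payload as a compressed BLOB (in the SCOPE_BIN column), a convenient variant of the CUBE_SCOPE query reports the stored size in bytes rather than the raw BLOB. This is just a sketch using the standard DBMS_LOB package; verify the column names against your own SOAINFRA schema version.

```sql
-- Report the compressed size (in bytes) of each scope BLOB, newest first.
select modify_date,
       dbms_lob.getlength(scope_bin) as compressed_bytes
from   cube_scope
order  by modify_date desc;
```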
The process takes a payload such as the following....
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ns1:start xmlns:ns1="http://xmlns.oracle.com/bpmn/bpmnProcess/BpmTestXMLDoc" xmlns:ns2="http://www.oracle.com/repeat">
      <ns2:list>
        <ns2:ListRow>
          <ns2:Field1>R1F1</ns2:Field1>
          <ns2:Field2>R1F2</ns2:Field2>
          <ns2:Field3>R1F3</ns2:Field3>
        </ns2:ListRow>
        <ns2:ListRow>
          <ns2:Field1>R2F1</ns2:Field1>
          <ns2:Field2>R2F2</ns2:Field2>
          <ns2:Field3>R2F3</ns2:Field3>
        </ns2:ListRow>
        <ns2:ListRow>
          <ns2:Field1>R3F1</ns2:Field1>
          <ns2:Field2>R3F2</ns2:Field2>
          <ns2:Field3>R3F3</ns2:Field3>
        </ns2:ListRow>
      </ns2:list>
    </ns1:start>
  </soap:Body>
</soap:Envelope>
...i.e. three repeating rows with three fields in each, populated with simple 4-byte strings, not a very large message at all.
...even with such a small message we have an uncompressed scope size of almost 15k.
...the payload itself is stored in the "SCOPE_BIN" field as a BLOB.
...there are two rows in XML_DOCUMENT, not really significant for our purposes.
...the uncompressed payload size has increased to 16.5k
...the payload has been updated (modify_date changed).
...we now have another two rows in XML_DOCUMENT.
...uncompressed payload size remains the same, previous iteration must have been overwritten.
...the payload has been updated (modify_date changed).
...unchanged.
...uncompressed payload size remains the same, previous iteration must have been overwritten.
...the payload has been updated (modify_date changed).
...unchanged.
...uncompressed payload size has dropped considerably, previous scopes removed.
...the payload has been updated (modify_date changed).
...unchanged.
Although the payload size (compressed and uncompressed) in CUBE_INSTANCE seems large compared to the message we passed into the process, XML_DOCUMENT itself seems little used. It is evident that the payload itself is stored in CUBE_SCOPE and is overwritten as the instance progresses through activities to completion.
This should be a more pertinent test, given that this is the recommended setting for production environments.
...more or less the same as before.
...as before.
...only one row this time; clearly an audit level of "Production" reduces the number of rows in XML_DOCUMENT.
...as test 1.
...as before.
...still only the one row.
The progression continues as in Test 1, and at the process end we get the following....
...still only the one row!
So with standard settings we have been unable to reproduce any noticeable growth in the XML_DOCUMENT table; in fact we reduced it by setting the audit level to "Production"... so what is causing this growth?
This is the key to XML_DOCUMENT table growth, the parameter can be found in the BPMN properties in Enterprise Manager....
...what exactly is this telling us? If the payload size is larger than 100k it is stored in XML_DOCUMENT instead of CUBE_SCOPE. There is a similar property for BPEL too.
What difference does this make? Let's do some more tests.
It will be simpler for us to reduce the threshold rather than create a larger message (and therefore payload).... let's go to the extreme and reduce this to 100 (from 100k)....
...more or less unchanged.
...again not much difference (although the contents of the BLOB may be).
...already 6 rows in this table. Unlike the CUBE_SCOPE table, which simply overwrote the payload BLOB as the instance progressed, we now seem to be adding a new row to XML_DOCUMENT for, at the very least, the entry and exit of each activity... i.e. we have history in this table.
We'll just concentrate on XML_DOCUMENT from now on....
...another 5 rows.
...another 6 rows.
...another 4 rows.
...unchanged, no activities after the end of the second scope.
So, we have found the culprit: payloads greater than the large document threshold are stored in XML_DOCUMENT and, unlike in CUBE_SCOPE, the payloads are not overwritten for each activity; a new row is created each time.
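A quick way to see this history accumulating (and to spot when payloads started exceeding the threshold) is to count XML_DOCUMENT rows per day, using the same DOC_PARTITION_DATE column we ordered by earlier. A minimal sketch:

```sql
-- Daily row counts in XML_DOCUMENT; a sudden jump suggests payloads
-- have started exceeding the large document threshold.
select trunc(doc_partition_date) as doc_day,
       count(*)                  as row_count
from   xml_document
group  by trunc(doc_partition_date)
order  by doc_day;
```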
It seems simple then: just increase the large document threshold to a huge value and XML_DOCUMENT will not be used. Instead the payload will always be stored in CUBE_SCOPE.
Easy!
Or not!
This highlights several things....
But if we find ourselves in this situation, what can we do ?
As a first step I would advise investigating the average compressed and uncompressed payload sizes in CUBE_INSTANCE for running instances, to get an idea of how large the payloads are.
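As a sketch of that first step: the exact size columns exposed by CUBE_INSTANCE vary by release, but the compressed sizes can always be aggregated directly from the SCOPE_BIN BLOBs we saw earlier. Treat this as a starting point, not a definitive sizing query.

```sql
-- Average and maximum compressed scope size (bytes) across all
-- instances; compare these against the large document threshold.
select avg(dbms_lob.getlength(scope_bin)) as avg_compressed_bytes,
       max(dbms_lob.getlength(scope_bin)) as max_compressed_bytes
from   cube_scope;
```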
As a second step I would use a production-like pre-production environment to load test while making incremental increases to the "large document threshold", paying close attention to possible increased memory usage as the number of rows written to XML_DOCUMENT decreases.
There is no one-size-fits-all magic number here for "large document threshold".... only thorough load testing can give you an appropriate value for your environment.
In this blog entry we have seen how the "large document threshold" has a critical impact on the growth of the XML_DOCUMENT table and briefly looked at ways of mitigating this.
Director - Oracle A-Team