Tips for Improving Endeca CAS Reliability with ATG

The Content Acquisition System (CAS) component is provided as part of the Endeca Guided Search product. CAS provides the record stores into which ATG pushes product information in preparation for search indexing. It is a generational data store; in other words, it keeps track of product information changes made over time and applies them in the order in which they are received.

This article describes some operational controls that you can manage to optimize the efficiency of the data store.


The tips in this article apply to Endeca releases prior to 11.0. In later releases, CAS has been changed to automatically prune records from generation data as they are superseded by newer content, so each baseline operates as fast as the first.


Update your CAS Record Store Retention Times

CAS is a generational data store, meaning it keeps track of previous versions of records. This is useful for determining what has changed for a partial update, but it can also retain data beyond when it is needed, especially if you perform baseline updates daily or more frequently.

This tip shows you how to establish a retention period for the generations of data kept by CAS that matches your operational needs.

After a baseline update is performed, the generational data stores can be pruned. Establish a retention period roughly twice your baseline update frequency: if baseline updates are performed every 24 hours, a value of 48 hours works well.

First, export your CAS record store configuration. For example, if the record store is named “attributes”, use the CAS recordstore-cmd command as follows:

${ENDECA_HOME}/CAS/11.2.0/bin/recordstore-cmd.sh get-configuration -a attributes -f attributes.xml

Next, edit the attributes.xml file that you just created. Note that the generationRetentionTime value is not specified by default. Add:

<generationRetentionTime>48.0</generationRetentionTime>

Change the 48 value (which is in hours) to something that makes sense for your operational procedures. In Endeca 3.1.0 the default is 168 hours (one week); in 3.1.1 and later it is 48.

Finally, update the live configuration in CAS.

${ENDECA_HOME}/CAS/11.2.0/bin/recordstore-cmd.sh set-configuration -a attributes -f attributes.xml

Repeat this process for each of your record stores.
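The export-edit-apply cycle lends itself to scripting. The sketch below demonstrates only the edit step, using a stand-in for the exported file (the file contents and store name are illustrative, not from a real CAS export); in practice the get-configuration and set-configuration commands shown earlier wrap this edit.

```shell
#!/bin/sh
# Sketch of the edit step, assuming the exported configuration ends with a
# </recordStoreConfiguration> closing tag; verify against your actual export.
RETENTION_HOURS=48.0

# Stand-in for the file produced by "recordstore-cmd.sh get-configuration";
# a real export will contain your store's actual settings.
cat > attributes.xml <<'EOF'
<recordStoreConfiguration>
  <idPropertyName>product.repositoryId</idPropertyName>
</recordStoreConfiguration>
EOF

# Insert generationRetentionTime just before the closing tag,
# unless one is already present.
if ! grep -q '<generationRetentionTime>' attributes.xml; then
  sed -i "s|</recordStoreConfiguration>|  <generationRetentionTime>${RETENTION_HOURS}</generationRetentionTime>\n</recordStoreConfiguration>|" attributes.xml
fi

cat attributes.xml
# Then push the change back with:
#   ${ENDECA_HOME}/CAS/11.2.0/bin/recordstore-cmd.sh set-configuration -a attributes -f attributes.xml
```

Looping this over the store names reported by component-manager-cmd.sh list-components applies the same retention policy everywhere.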

Avoid choosing an overly large retention value: it increases the size of the database file(s) written to disk. As more data is added, and more frequently, the internal database that stores the generational data slows down, which can contribute to timeouts and errors when sending data from ATG to CAS.

To get a current list of your CAS record stores, use the command:

${ENDECA_HOME}/CAS/11.2.0/bin/component-manager-cmd.sh list-components

NOTE: This has been written up in more extensive detail in this document on the Oracle support site.

Change the default maxIdleTime for CAS

As the size of the CAS record stores grows, operations against them may take longer. Timeouts can occur on operations that exceed the configured Jetty timeout.

In the jetty.xml configuration file for CAS, located, for example, at:

${ENDECA_HOME}/CAS/11.2.0/workspace_template/conf/jetty.xml

there is a setting:

<Set name="maxIdleTime">600000</Set>

Consider increasing this value (specified in milliseconds).
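For context, the setting typically sits inside a connector definition in jetty.xml; the sketch below is illustrative only (the connector class, port, and new value are assumptions that vary by CAS version), so edit the existing element in your file rather than copying this verbatim.

```xml
<!-- Illustrative only: edit the existing maxIdleTime in your jetty.xml. -->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.nio.SelectChannelConnector">
      <Set name="port">8500</Set>
      <!-- raised from the 600000 ms (10 minute) default to 30 minutes -->
      <Set name="maxIdleTime">1800000</Set>
    </New>
  </Arg>
</Call>
```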

The jetty documentation states that:

“Jetty interprets this value as the maximum time between some progress being made on the connection. So if a single byte is read or written, then the timeout (if implemented by Jetty) is reset.”

If CAS experiences problems, such as record stores becoming overly large or fragmented due to the retention-time issue described above, it is possible for this timeout to be exceeded.

Safeguard the baseline_update process

A baseline_update operation triggers a complete replacement of the associated search index content. ATG sends all of your product information to CAS, then triggers indexing and uptake of the replacement content into your MDEX (search engine) instances.

If you encounter problems sending data from ATG to CAS, an incomplete set of records could be sent, followed by a baseline update being triggered. This could leave only a fraction of your product catalog in Endeca, and thus a website with many missing products.

A configuration property named minDocumentsForBulkCommitting was added to the IndexingOutputConfig class in ATG. This property only affects bulk/full repository exports and represents the minimum number of documents/records required before a bulk index will be committed. If fewer records than this are sent, the baseline update is not triggered.
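As a sketch, the property can be set in a properties-layer file for your indexing output config component. The component path and threshold below are illustrative assumptions, not values from this article:

```
# /atg/commerce/search/ProductCatalogOutputConfig.properties (example path)
# Refuse to commit a bulk index with fewer than this many records;
# choose a threshold safely below your real catalog size.
minDocumentsForBulkCommitting=50000
```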

The javadoc for this class may be found at:
docs.oracle.com/…/Platform.11-2/apidoc/atg/repository/search/indexing/IndexingOutputConfig.html

An article is available here on the A-Team Chronicles website showing you how you can introduce additional sanity checks into the baseline_update process.

Final Notes

Proper care and feeding of your CAS generational record store will go a long way towards ensuring satisfactory uptake of your ATG product data into Endeca.

If you ever get into a situation where you need to debug your CAS content, a tool is available here on the A-Team Chronicles website that will allow you to visually inspect the current content of your record stores.
