Best Practices from Oracle Development's A‑Team

Debugging CAS with the Endeca RecordStore Inspector

Introduction

When debugging Endeca Content Acquisition System (CAS) issues, Endeca developers have few tools at their disposal. Most know to review the CAS service logs, but occasionally an issue arises where a peek inside a record store can be very revealing. If you’re a savvy Endeca developer, you already know that you can export record store content using the "recordstore-cmd" command. Combined with a good text editor, this command can be very useful, but CLIs can be tedious to work with. So when I recently ran into such an issue, I decided to write my own visual tool for inspecting Endeca record stores, which I named the Endeca RecordStore Inspector. In this article, I introduce the Endeca RecordStore Inspector utility and show how it can be used to debug CAS issues.

Main Article

I was recently assisting with a CAS issue where the CAS Console was reporting failed records for an incremental crawl of a Record Store Merger where one of the data sources contained deleted records. The origin record store was reporting the deleted records correctly, but when the last-mile-crawl merger was run it would report the deleted records as failed records.

For each failed record, I could see the following messages in cas-service.log:

WARN [cas] [cas-B2BDemo-last-mile-crawl-worker-1] com.endeca.itl.executor.ErrorChannelImpl.[B2BDemo-last-mile-crawl]: Record failed in MDEX Output with message: missing record spec record.id

DEBUG [cas] [cas-B2BDemo-last-mile-crawl-worker-1] com.endeca.itl.executor.ErrorChannelImpl.[B2BDemo-last-mile-crawl]: Record failed in MDEX Output (MdexOutputSink-826115787) with Record Content: [Endeca.Action=DELETE]

The messages were not intuitive at first, but I could tell that a different record spec identifier was being used. To see what was going into the record stores, I decided to create a tool for visualizing the contents of a record store in a familiar tabular format. Using this tool, I could see that the records contained both an “Endeca.Id” property and the “record.id” property.

However, when one of the source files was removed and the acquisition re-run, the new generation contained the delete records with only the “Endeca.Id” property.

So when the last-mile record store merger was run, it didn’t know how to merge the delete records, because the record spec identifier (as well as the Record ID property on the data sources) had been changed to “record.id” while the delete records carried only “Endeca.Id”. This produced the above warning message ("missing record spec record.id") for the DELETE action entries.
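The check that the tabular view made obvious can also be sketched in a few lines of Java. This is only an illustration: records are modeled as plain maps rather than with the Endeca record API, the class and method names are hypothetical, and the sample values are invented. Only the property names "Endeca.Action", "Endeca.Id", and "record.id" come from the case described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordSpecCheck {

    // Returns the records that do not carry the given record spec property.
    public static List<Map<String, String>> missingSpec(
            List<Map<String, String>> records, String specProperty) {
        List<Map<String, String>> missing = new ArrayList<>();
        for (Map<String, String> record : records) {
            if (!record.containsKey(specProperty)) {
                missing.add(record);
            }
        }
        return missing;
    }

    // Hypothetical helper: builds a record from alternating key/value arguments.
    public static Map<String, String> record(String... keyValues) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            r.put(keyValues[i], keyValues[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        // Invented sample generation: one complete record and one delete
        // record that carries only Endeca.Id, as in the failing crawl.
        List<Map<String, String>> generation = new ArrayList<>();
        generation.add(record("Endeca.Action", "UPSERT",
                "Endeca.Id", "A-1", "record.id", "A-1"));
        generation.add(record("Endeca.Action", "DELETE", "Endeca.Id", "A-2"));

        for (Map<String, String> r : missingSpec(generation, "record.id")) {
            System.out.println("missing record.id: " + r);
        }
    }
}
```

Any record this check flags would be expected to fail in the MDEX output step with the "missing record spec" warning shown earlier.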

Of course, the same diagnosis could have been made using recordstore-cmd and a text editor, but some things are easier in a GUI, such as sorting records by a specific column. The RecordStore Inspector allows you to sort on any column, as well as filter which columns are visible using regular expression syntax. You can open two different generations of a record store and compare them side by side. You can even export the contents of a record store (with or without filters applied) to a comma-separated value (CSV) text file, a Microsoft Excel file, or an Endeca Record XML file. These operations are more difficult with just recordstore-cmd and a text editor, and my goal in creating this tool was to make the Endeca community more productive in diagnosing CAS-related issues on their own.

About the Endeca RecordStore Inspector

The Endeca RecordStore Inspector utility was written using JavaFX 8 and Endeca 11.1, and runs on Windows, Linux, or any environment that supports the Java 8 runtime. Below I've provided download links for two versions: a portable version optimized for Windows, and a self-contained Java jar file for all other environments. To run the Windows version, simply extract the contents of the archive and double-click the rs_inspector.exe file. This version includes a Java 8 runtime environment, so there is no need to install Java or Endeca to run it. To run the self-contained jar file, you will need a Java 8 Runtime Environment installed. If one is already present, download the jar file and run the command "java -jar rs_inspector-1.0-all.jar" to launch the application.

When the application has started you can press CTRL-R to select the record store and generation that you want to view. If your CAS Server runs on a host/port other than localhost:8500, then you can use ALT-S to change the default settings.

Once the record store loads, you can further refine the view using Java regular expression syntax in the column and value fields. This restricts the visible columns and rows to those matching the regular expressions specified. For example, to view only the columns for "product.id" and "product.short_desc", you can specify a Column Text filter of "product\.id|product\.short_desc" and click the Apply Filter button. To further refine the view to show only products with an id value of 3000000 or greater, you can use a Value Text filter of "[3-9][0-9]{6}[0-9]*|[1-9][0-9]{7}[0-9]*".
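To try these filter expressions outside the tool, the two patterns above can be tested with java.util.regex directly. The sketch below assumes the filters are applied to the whole column name or value (Matcher.matches()); the tool itself may match differently, and the class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

public class FilterDemo {

    // Column Text filter from the article; the dots are escaped so that
    // they match a literal '.' rather than any character.
    static final Pattern COLUMN_FILTER =
            Pattern.compile("product\\.id|product\\.short_desc");

    // Value Text filter from the article: seven or more digits starting
    // with 3-9, or eight or more digits starting with 1-9.
    static final Pattern VALUE_FILTER =
            Pattern.compile("[3-9][0-9]{6}[0-9]*|[1-9][0-9]{7}[0-9]*");

    // Assumption: the entire column name must match the filter.
    public static boolean columnVisible(String columnName) {
        return COLUMN_FILTER.matcher(columnName).matches();
    }

    // Assumption: the entire value must match the filter.
    public static boolean valueMatches(String value) {
        return VALUE_FILTER.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String column : new String[] {"product.id", "product.long_desc"}) {
            System.out.println(column + " visible: " + columnVisible(column));
        }
        for (String value : new String[] {"2999999", "3000000", "10000000"}) {
            System.out.println(value + " matches: " + valueMatches(value));
        }
    }
}
```

Note that the value pattern also accepts 3000000 itself, since its first alternative matches any seven-digit number beginning with 3 through 9.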

It is important to point out that this tool currently loads all rows of the selected record store into the table view, so if your JVM doesn't have enough memory to hold all the data, you will receive an OutOfMemoryError. If your environment has enough memory to hold the entire record store, you can increase the JVM maximum heap setting (-Xmx) to accommodate it. If your record store is larger than 2GB, you should run the RecordStore Inspector in a 64-bit JVM. If memory in your environment is limited, you may not be able to load your record store using this tool. Perhaps later versions will offer the ability to load data into the view incrementally. If this is important to you, please let me know.
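When launching the jar version with a larger heap, for example "java -Xmx4g -jar rs_inspector-1.0-all.jar" (the 4g value is only an illustration), it can help to confirm what the JVM actually received. The following stand-alone sketch is not part of the tool and the class name is hypothetical; it prints the maximum heap and the architecture reported by whichever JVM runs it.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in mebibytes.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        // os.arch is a standard system property; a value such as "amd64"
        // or "aarch64" indicates a 64-bit runtime.
        System.out.println("Architecture: " + System.getProperty("os.arch"));
    }
}
```

Running it with the same -Xmx value you intend to pass to the Inspector shows whether the setting took effect in your environment.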

Summary

Endeca content acquisition can be something of a black box. To provide some transparency into this process, I’ve created the Endeca RecordStore Inspector, a visual tool intended to aid in debugging issues pertaining to Endeca CAS data ingestion. In this article we’ve seen one example of how this tool was used to make sense of a seemingly enigmatic error message, but its applications are much broader in scope, not only as a debugging aid but also as a medium for understanding Endeca CAS in general.

Below are links to download the Endeca RecordStore Inspector. Please note that this tool is provided “as-is”, without guarantee or warranty of any kind. It is not part of the Endeca or Oracle Commerce product suite, and therefore not supported by Oracle. However, a link to the complete source code is provided below, and you are free to fix any issues or enhance the tool in any way you like.
