Best Practices from Oracle Development's A‑Team

Notes on Querying Endeca from within an ATG Application


On a few projects in 2014, the issue of Endeca's performance came up. Specifically, applications were seeing a large number of queries and were also generating large response sizes from Endeca. These queries were not being generated by the Assembler API, but were one-off queries created to bring back other information from Endeca.

This article will give some tips on how to optimize those queries.

Notes on Endeca Query Response objects

A response from Endeca can consist of a number of different pieces of data:

  • Records (aka, products)
  • Dimensions to navigate on
  • Breadcrumbs (also sometimes called Descriptors), which return information on which things have been navigated on
  • Supplemental objects: Used internally, these bring back meta-data about the rules being executed (such as landing pages, content slots, etc.)
  • Dimension search results: These are special queries that search only within dimensions, not products.
  • Key properties: Almost never used. Returns meta-data about the properties and dimensions in the index.
  • Other information about the index (such as the list of available sort keys, search fields, etc). You can't really turn any of that off.

In an Assembler based application, the overall flow for a standard page being rendered would be this:

First, a super-lightweight query is executed that returns only Supplemental objects representing information from Experience Manager. Depending on that meta-data, one or more subsequent queries are executed that will return:

  1. Dimensions
  2. Records
  3. Breadcrumbs
  4. NOT supplemental objects (this is true for patched 10.2, 11.0 and 11.1)

Thus a standard basic page would generate about two Endeca queries to be rendered. There are some cartridges that generate more:

  • Featured records cartridges generate one query per cartridge. So if you have three featured records cartridges, there would be three extra queries.

In addition, the Assembler API is usually very good about bringing back only the information it needs. This means that it will only "open up" the dimensions being requested, and will return only the product attributes specified in the ResultsList configuration.

When it comes to making standalone queries to Endeca, you need to understand the information above so as to NOT bring back any more data than necessary.

Scanning for Standalone Endeca queries in code

The easiest way to scan an application for one-off Endeca queries is to search for "ENEQuery" or "ENEQueryResults" in your .java classes, and for "InvokeAssembler" in your .jsp files. You can also search for "UrlENEQuery".

If you find many instances of those, each and every query involved should be assessed using the information laid out below.
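The search described above could also be scripted. Here is a minimal sketch in plain Java, assuming all sources live under one root directory. The class name StandaloneQueryScanner and its methods are made up for illustration. Since "ENEQuery" is a substring of both "ENEQueryResults" and "UrlENEQuery", one match string covers all three in .java files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists source files that mention the names discussed above.
final class StandaloneQueryScanner {

    // True if a .java file mentions ENEQuery or a .jsp file mentions InvokeAssembler.
    static boolean isSuspect(Path file) {
        String name = file.getFileName().toString();
        String marker;
        if (name.endsWith(".java")) {
            marker = "ENEQuery";          // also matches ENEQueryResults and UrlENEQuery
        } else if (name.endsWith(".jsp")) {
            marker = "InvokeAssembler";
        } else {
            return false;
        }
        try {
            // ISO-8859-1 decodes any byte sequence, so odd encodings cannot break the scan.
            String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            return text.contains(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Walks the tree under root and returns the matching files in sorted order.
    static List<Path> scan(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(StandaloneQueryScanner::isSuspect)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        scan(root).forEach(System.out::println);
    }
}
```

A hit only tells you where to look. Each file still needs to be read to see what the query actually asks for.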

Improving your queries

Limiting which properties to return on a record

In a standard commerce application, it's not uncommon for one record (representing a product or SKU) to have 80 or 100 or more properties. These can be things like product ID, UPC, short descriptions, size/color/widths, prices, image URLs, etc.

If you are not careful, it's very easy to return all 80 or 100 or whatever property values with a standalone query.

To look at what comes back by default, you can look at the orange JSP reference application (typically located at http://localhost:8006/endeca_jspref, with a host of localhost and port of 15000).

The list of properties to be returned can be controlled by using ENEQuery.setSelection(). This requires you to specify every single property (and dimension) to be returned. The names are case-sensitive.

Limit the number of records to return

By default, a query will return 10 records. To limit this, you can use ENEQuery.setNavNumERecs().

In the .reqlog, if you see &nbins=10, that means that someone didn't set this value specifically and is probably using the default.

At the same time, you shouldn't set this value to be too large. If you find yourself setting this to 50 or 100, you might be doing something wrong.

Omit Supplemental Objects

A supplemental object is the meta-data about a landing page or content slot. If you use the orange reference application, at the top you'll see one or more "Unknown Merch Style". Scrolling to the bottom of the page, you'll see a series of "Matching Supplemental Objects".

What's the big deal about these? Well, these can actually get somewhat large in size (for instance, if you have cartridges that allow merchandisers to copy/paste raw HTML). Also, the only real time they need to come back is when doing Assembler queries; not one-off queries.

There's no flag for turning supplemental objects on/off. However, you can add a merch rule filter that will have the effect of turning them off. (This is what a hotfix for 10.2 does, and what 11.x does by default. If you look in the .reqlog, you'll see &merchrulefilter=endeca.internal.nonexistent in some of the queries.)

This can be turned on by using ENEQuery.setNavMerchRuleFilter(). Basically any nonsense string in here will have the correct effect. This would also be a good place to put a message in for logging purposes. Something like ENEQuery.setNavMerchRuleFilter("topNavigationQuery").

In the .reqlog, you should then see &merchrulefilter= followed by the string you passed in.

Don't expose all dimensions

If you look at the orange reference app, you'll see that the dimensions on the left side are "closed" up. If you click one, the page will refresh and now that dimension will be "opened" up.

If you would like to open up all dimensions, you can use ENEQuery.setNavAllRefinements(true).

However, this can be very expensive. With no dimensions being returned, the MDEX Engine doesn't have to compute the refinement counts (aka, "How many records are there for Brand=XYZ?"). Also, this can inflate the response size greatly, especially for big flag dimensions.

Instead, you should specify which particular dimensions you want to return. Unfortunately, you need to specify the ID of the dimension, not the name.

If you know the ID of the dimensions you care about, you can use UrlENEQuery.setNe() and pass in a string like "123+234+532".
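As a small sketch of building that string, assuming the dimension IDs are already known and held as numbers: the class name RefinementIds and the method toNeValue are made up for illustration, and only the "+"-joined format comes from the example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns a list of dimension IDs into the "123+234+532" form.
final class RefinementIds {

    // Joins the IDs with '+', keeping the order they were given in.
    static String toNeValue(List<Long> dimensionIds) {
        return dimensionIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining("+"));
    }

    public static void main(String[] args) {
        // The IDs here are the ones from the example string above.
        System.out.println(toNeValue(Arrays.asList(123L, 234L, 532L)));
    }
}
```

The resulting string is what would be handed to UrlENEQuery.setNe().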

Looking through the .reqlog, if you see &allgroups=1, that means someone, somewhere, has called setNavAllRefinements(true).

Use record filters instead of keyword search

Let's say you're on a product details page. If you know the ID of the product, you have two choices: You can do a keyword search on the ID field passing in the string of the value. Or you can construct a record filter. A record filter is usually faster and cleaner. (There's no reason to fill your logs with searches that customers didn't type in).

ENEQuery.setNavRecordFilter() is the method. An example might be: query.setNavRecordFilter("AND(product.id:2342342)").
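If several places in the code build this kind of filter, a tiny helper keeps the string format in one spot. This is only a sketch: the class RecordFilters and the method byProperty are made up, and it does no quoting or escaping, so it assumes simple values such as numeric IDs.

```java
// Hypothetical helper: builds a one-property record filter string.
final class RecordFilters {

    // Returns a string of the form AND(property:value), e.g. AND(product.id:2342342).
    static String byProperty(String property, String value) {
        if (property == null || property.isEmpty() || value == null || value.isEmpty()) {
            throw new IllegalArgumentException("property and value are required");
        }
        // No escaping is attempted; this sketch assumes plain values such as IDs.
        return "AND(" + property + ":" + value + ")";
    }

    public static void main(String[] args) {
        System.out.println(byProperty("product.id", "2342342"));
    }
}
```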

Use setQueryInfo for logging custom things to Endeca's .reqlog files

A little-used feature is the ENEQuery.setQueryInfo() method. This lets you attach any number of key/value pairs that are sent to the MDEX Engine, ignored by it, and written out to the .reqlog file. This can be useful for adding things like session ID, debug information, etc.

For our case, it might be good to write out why the query is being executed: "pdpBreadcrumbs", "typeahead", etc.

This way, if there are slow or big queries found during performance testing, it will help track them down and help distinguish between real Assembler queries and your one-off queries.

These messages will show up in the .reqlog as &log=

Don't ever set setNavERecsPerAggrERec to 2

ENEQuery.setNavERecsPerAggrERec() allows you to specify how many records are returned per aggregate record. For example, say you are a clothing website. You probably index by SKU (which would represent a single Size/Color combination for a product). When querying Endeca, instead of returning info at a SKU level, you would aggregate things by a rollup key using ENEQuery.setNavRollupKey().

setNavERecsPerAggrERec() allows you to bring back 0, 1 or all SKUs within a product. You should do everything possible to NOT set it to the value of "2", which means all.

(As a point of reference, ENEQuery has 3 static values representing those numbers: ZERO_ERECS_PER_AGGR, ONE_EREC_PER_AGGR and ALL_ERECS_PER_AGGR.)

In the .reqlog, if you see &allbins=2, that means someone called setNavERecsPerAggrERec(ALL_ERECS_PER_AGGR).

Now, this might make things complicated for you. For instance, on one site on the search results page, they wanted to display the color swatches from each different SKU. By setting it to bring back all, they were able to iterate across all of the SKUs in the product to generate that list.

Instead, things were changed so that each SKU was tagged with information about all of the other SKUs. This allowed us to change the setting from 2 to 1. Response sizes went from 10 MB to 100 KB.
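To make the tagging idea concrete, here is a rough sketch of the kind of pre-index step that could do it. This is not the code from that project. The Sku class, its fields and the method siblingColors are all made up, and it assumes the data is tagged before indexing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-index step: give every SKU the colors of all SKUs in its product.
final class SwatchTagger {

    // Minimal stand-in for a SKU record; the field names are assumptions.
    static final class Sku {
        final String skuId;
        final String productId;
        final String color;

        Sku(String skuId, String productId, String color) {
            this.skuId = skuId;
            this.productId = productId;
            this.color = color;
        }
    }

    // Returns skuId -> distinct colors across the SKU's product, in first-seen order.
    static Map<String, List<String>> siblingColors(List<Sku> skus) {
        Map<String, Set<String>> colorsByProduct = new LinkedHashMap<>();
        for (Sku sku : skus) {
            colorsByProduct.computeIfAbsent(sku.productId, k -> new LinkedHashSet<>()).add(sku.color);
        }
        Map<String, List<String>> tagged = new LinkedHashMap<>();
        for (Sku sku : skus) {
            tagged.put(sku.skuId, new ArrayList<>(colorsByProduct.get(sku.productId)));
        }
        return tagged;
    }

    public static void main(String[] args) {
        List<Sku> skus = Arrays.asList(
                new Sku("s1", "p1", "red"),
                new Sku("s2", "p1", "blue"),
                new Sku("s3", "p2", "green"));
        System.out.println(siblingColors(skus));
    }
}
```

With every SKU carrying the full color list, the single SKU returned per product is enough to draw the swatches.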

Watch out for filtering based on timestamps

Some commerce sites set products to activate during the day ("Starting at 1pm EST, this product should show up, but before that, it shouldn't").

One way to do this would be to tag all products with a start date and end date, and then pass along a range filter for the dates with each query to Endeca.

The problem, however, can be that the MDEX Engine does some internal caching based on these values. If the date value you specify is too granular, then the MDEX won't work as fast as it could. So don't specify a timestamp down to the second or millisecond. Try to use timestamps rounded to the hour, or at least to chunks of minutes (like 20 or 30 minutes), to ensure that some cache hits occur.

Range filters can be set by using setNavRangeFilters().

In the .reqlog, you can look for &pred. A CRS example might look like: pred=product.endDate%7cGTEQ+1.4163552E12&pred=product.startDate%7cLTEQ+1.4163552E12
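One way to keep the filter value coarse is to round the current time down to a fixed bucket before building the range filter. Here is a minimal sketch, assuming the dates are stored as epoch milliseconds. The class TimestampBucket and the method floorToBucket are made up for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: rounds a timestamp down so many requests share one filter value.
final class TimestampBucket {

    // Rounds epochMillis down to the start of its bucket of the given size.
    static long floorToBucket(long epochMillis, long bucketMinutes) {
        if (bucketMinutes <= 0) {
            throw new IllegalArgumentException("bucketMinutes must be positive");
        }
        long bucketMillis = TimeUnit.MINUTES.toMillis(bucketMinutes);
        return Math.floorDiv(epochMillis, bucketMillis) * bucketMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Every request in the same 30-minute window gets the same value.
        System.out.println(floorToBucket(now, 30));
    }
}
```

The tradeoff is that a product can show up (or disappear) up to one bucket length later than its exact start or end time, so pick the bucket size with the business in mind.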

Don't return key properties

This is a little-used feature, so it's not something you'd come across very often. Key properties return meta-data about the definitions of properties and dimensions themselves. This can be turned on using ENEQuery.setNavKeyProperties(ENEQuery.KEY_PROPS_ALL).

This can greatly inflate the response size of a query from Endeca.

If you do need this for some reason, you should only need to execute the query once, and then cache the results from it.

This can be found in the .reqlog as &keyprops=all

Things that CRS does that aren't optimal

Careful readers might notice that CRS breaks some of the rules above. In particular:

  • CRS filters based on timestamps
  • CRS used to do setNavERecsPerAggrERec = 2

What would the worst query in the world look like?

As an interesting point of reference, the world's worst Endeca query would:

  • call setNavAllRefinements(true)
  • not use .setSelection()
  • not use .setNavMerchRuleFilter()
  • use setNavRollupKey()
  • do a wildcard keyword search
  • have a high number of search terms (in addition to the wildcard)
  • set setNavNumERecs() to a large value
  • call setNavKeyProperties(ENEQuery.KEY_PROPS_ALL)
  • sort on something not frequently sorted on
  • use pagination ( .setNavERecsOffset) to go to a high page number
  • use a geospatial filter
  • use a range filter ( .setNavRangeFilters())

What would the world's fastest query look like?

  • no keyword search
  • setNavAllRefinements(false)
  • setNavNumERecs(0)
  • setNavMerchRuleFilter("lksdkjfd")
  • doesn't touch setNavKeyProperties()
  • uses a setNavRecordFilter() for a record filter that had been previously used and basically filters everything out
