Sites Asset Modeling when integrating with Endeca

Introduction

The combination of Sites + Endeca is a good fit because the products complement each other nicely, with little overlap. However, there is currently no established best practice for which modeling changes, if any, should be considered when the two products are integrated. Since Endeca has exceptionally robust tools for precisely controlling denormalization, a candidate best practice is to let Endeca perform denormalization instead of using the built-in Sites flex parent mechanism.

Main Article

Denormalization is a key ingredient in delivering performant websites. Sites was one of the first CMSs to offer OOTB denormalization via its flex parent feature. While flex parents are generally considered a key feature of the product, they are (unfortunately) limited to “all or nothing” behavior: *all* attributes at the parent level are “inherited” by all children under that parent*. There is no provision for “some attributes inherit but others do not”. As such, loading up a flex parent with extraneous attributes is a universally recognized bad practice: every child inherits attributes it doesn’t need, resulting in significant database bloat (and subsequent performance issues). Furthermore, editing a flex parent forces all of its children to be updated (even if they don’t need to be). It has therefore become common practice (some would say a “best practice”) to not specify *any* editable attributes at the parent level! Doesn’t this somewhat defeat the purpose of flex parents?
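To make the “all or nothing” cost concrete, here is a conceptual sketch in Python. It is not the real Sites storage schema: the classes `FlexParent` and `FlexChild` and the functions `propagate` and `inherited_value_count` are hypothetical, and the model simply assumes that each child ends up carrying a copy of every parent attribute.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlexChild:
    name: str
    inherited: dict[str, str] = field(default_factory=dict)


@dataclass
class FlexParent:
    name: str
    attributes: dict[str, str]
    children: list[FlexChild] = field(default_factory=list)


def propagate(parent: FlexParent) -> int:
    """Copy *every* parent attribute onto every child (no per-attribute opt-out).

    Returns the number of children touched, i.e. how many children would need
    updating after any edit to the parent.
    """
    for child in parent.children:
        child.inherited = dict(parent.attributes)
    return len(parent.children)


def inherited_value_count(parent: FlexParent) -> int:
    """Total inherited attribute values stored across all children."""
    return sum(len(child.inherited) for child in parent.children)


# A parent with 5 attributes and 1,000 children.
news = FlexParent(
    "News",
    {f"attr{i}": "value" for i in range(5)},
    [FlexChild(f"article{i}") for i in range(1000)],
)
touched = propagate(news)
print(touched, inherited_value_count(news))  # 1000 5000
```

In this model, editing a single parent attribute and re-running `propagate` still touches all 1,000 children, which mirrors the republishing cost described above.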

The business case is both simple and obvious: customers envision their siteplan as akin to folders on their website, but they also see these “folders” as having individual attributes that represent the webpage itself. Additionally, some of these attributes (but not all) need to be inherited by all children of the folder for various reasons: searching, filtering, organizing, etc. Using flex parents to represent such “siteplan folders” simply doesn’t work in the real world for most Sites implementations: updating such assets is an implied requirement and, as discussed above, updating a flex parent can be significantly onerous, since potentially hundreds (even thousands) of child assets might need to be republished on any given parent update. Additionally, flex parents cannot effectively be used to create hierarchies, another implied requirement.

In contrast, one of the key strengths of Endeca is that it is very adept at precise, customizable denormalization. This is because Endeca makes no assumptions about the data: with its tools one builds an ingestion “pipeline” that imports data into the MDEX Engine, and within that pipeline one decides which attributes are denormalized and which are not.
As such, the combination of Endeca + Sites is ideal: Endeca excels at search, denormalization, and pagination (all areas where Sites is weak), while Sites excels as a caching framework and at drag-and-drop page layout and general content management (all areas where Endeca is weak).

To that end, I propose that when modeling a project where Sites + Endeca is known to be the solution, one should avoid using Sites for denormalization and use Endeca for that task instead. Pragmatically, this means one should not use flex parents for inherited attributes (unless absolutely necessary, and only if those attributes are rarely if ever updated). In fact, I will go a bit further: I believe the best way to package attributes to be “denormalized” down to the children is to design various definitions of the Page asset that form one or more taxonomies of siteplan nodes, each node storing attributes that Endeca can use for guided search according to the rules embodied in its pipeline.

NOTE: using the Page asset aligns very well with an important restriction in Endeca: hierarchies can only have a single value (i.e. they must not be multi-valued). Flex parents, of course, do not have this limitation (a flex asset can have several parents), and thus present a potential problem when integrating with Endeca. In contrast, each Page asset has exactly one parent in the siteplan, so Page hierarchies are always single-valued and pose no issues with Endeca’s restriction. In other words: Page assets are an ideal way of representing hierarchies in Endeca.
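A small Python sketch of why single-parent Page nodes map cleanly onto a single-valued hierarchy: a parent-pointer dictionary can hold only one parent per node, so each node has exactly one root-to-node path, whereas a multi-parent (flex-style) structure can yield several. The node ids and the helpers `hierarchy_path` and `all_paths` are hypothetical illustrations, not a Sites or Endeca API.

```python
from __future__ import annotations


def hierarchy_path(node: str, parent_of: dict[str, str | None]) -> str:
    """Return the single root-to-node path for a Page-style (one parent) tree."""
    names: list[str] = []
    seen: set[str] = set()
    current: str | None = node
    while current is not None:
        if current in seen:  # guard against a malformed, cyclic siteplan
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        names.append(current)
        current = parent_of[current]
    return "/" + "/".join(reversed(names))


def all_paths(node: str, parents_of: dict[str, list[str]]) -> list[str]:
    """Enumerate every root-to-node path when a node may have several parents."""
    parents = parents_of.get(node, [])
    if not parents:
        return ["/" + node]
    return [path + "/" + node for p in parents for path in all_paths(p, parents_of)]


# Page assets: exactly one parent each, so exactly one path.
siteplan = {"Home": None, "News": "Home", "PressReleases": "News"}
print(hierarchy_path("PressReleases", siteplan))  # /Home/News/PressReleases

# Flex-style: "Camera" has two parents, so it has two candidate paths.
flex = {"Camera": ["Electronics", "Gifts"], "Electronics": [], "Gifts": []}
print(all_paths("Camera", flex))  # ['/Electronics/Camera', '/Gifts/Camera']
```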

One of the key benefits of disentangling the asset model from the denormalization logic is that it aligns well with agile project methodologies: editors and developers can use the Page asset in an ad-hoc manner, loading each node with as many attributes as it needs. (See Endnote 1 for a missing feature that would further support agile methodologies.)

The proposed denormalization model would look something like the following, wherein we use the Sites siteplan assets (i.e. Page assets) to represent not only the navigation but also the collections of attributes to be inherited by the children under each node (very much like flex parents):

[Figure: Endeca + Sites integration modeling 1]

Compare the above with the much simpler modeling that is possible when leveraging Endeca to do the denormalization for you:

[Figure: Endeca + Sites integration modeling 2]

Individual non-taxonomic assets (e.g. News, Products, White Papers, FAQs, etc.) would have, as part of their definition, a required attribute (TaxonomyNode) that points back to the taxonomy/siteplan node to which the asset “belongs”. In other words, the editor specifies where in the navigation tree each non-taxonomic child node belongs. Example: a PressRelease would point to the News page node, an Event would point to the Events page node, and so on.
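As a sketch of the required back-pointer, here is a Python model of a non-taxonomic asset carrying a mandatory TaxonomyNode reference, plus a check that every reference resolves to an existing Page node. `ContentAsset`, its field names, and `validate_taxonomy_refs` are hypothetical; in Sites the requirement itself would live in the asset definition, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContentAsset:
    id: str
    asset_type: str     # e.g. "PressRelease", "Event"
    title: str
    taxonomy_node: str  # required: id of the Page node this asset "belongs" to


def validate_taxonomy_refs(assets: list[ContentAsset], page_ids: set[str]) -> list[str]:
    """Return one error message per asset whose TaxonomyNode is missing or unknown."""
    errors: list[str] = []
    for asset in assets:
        if not asset.taxonomy_node:
            errors.append(f"{asset.id}: TaxonomyNode is required")
        elif asset.taxonomy_node not in page_ids:
            errors.append(f"{asset.id}: unknown TaxonomyNode {asset.taxonomy_node!r}")
    return errors


pages = {"news", "events"}
assets = [
    ContentAsset("pr-1", "PressRelease", "Q3 results", "news"),
    ContentAsset("ev-1", "Event", "User conference", "events"),
    ContentAsset("ev-2", "Event", "Webinar", ""),
]
print(validate_taxonomy_refs(assets, pages))  # ['ev-2: TaxonomyNode is required']
```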

At content ingestion/indexing time, the Endeca pipeline loads all the child nodes on the first pass. The pipeline then “appends” additional attributes (as needed) derived from the siteplan/taxonomy(s) to which each child node belongs, using the TaxonomyNode id as the common lookup key.
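The ingestion step described above is essentially a join on the TaxonomyNode id. The following Python sketch shows the idea outside of Endeca: it takes the child records, indexes the taxonomy nodes by id, then appends only a chosen set of node attributes to each child. The record shapes, the `inherit` parameter, and the rule that a child’s own value wins over an inherited one are assumptions for illustration, not Endeca pipeline components.

```python
from __future__ import annotations


def denormalize(
    children: list[dict],
    taxonomy_nodes: list[dict],
    inherit: set[str],
) -> list[dict]:
    """Append selected taxonomy-node attributes to each child record.

    children:       records with a "TaxonomyNode" key holding a node id
    taxonomy_nodes: records with an "id" and an "attrs" dict
    inherit:        names of node attributes that children should receive
    """
    # Index the taxonomy nodes by id: the common key for the lookup.
    nodes_by_id = {node["id"]: node for node in taxonomy_nodes}

    enriched = []
    for child in children:
        record = dict(child)  # copy so the input records are left untouched
        node = nodes_by_id.get(child["TaxonomyNode"])
        if node is None:
            raise KeyError(f"{child['id']}: unknown TaxonomyNode {child['TaxonomyNode']!r}")
        for name in inherit:
            # Assumption: a value set on the child itself is not overwritten.
            if name in node["attrs"] and name not in record:
                record[name] = node["attrs"][name]
        enriched.append(record)
    return enriched


nodes = [{"id": "news", "attrs": {"audience": "press", "bannerImage": "news.png"}}]
children = [{"id": "pr-1", "TaxonomyNode": "news", "title": "Q3 results"}]
print(denormalize(children, nodes, inherit={"audience"}))
# [{'id': 'pr-1', 'TaxonomyNode': 'news', 'title': 'Q3 results', 'audience': 'press'}]
```

Note that `bannerImage` stays on the Page node: only the attributes named in `inherit` are denormalized, which is the selective behavior flex parents lack.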

In this way, editors are free to create and update as many taxonomies as needed, and to load these nodes with as many attributes as necessary. If new inheritance rules are needed, editors convey them to the Endeca pipeline developers, who then modify the pipeline to extract and denormalize those attributes as appropriate.
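One way to keep such inheritance rules easy to change is to hold them as data keyed by Page definition, so that a new rule is a one-entry edit rather than a remodel. The table contents, the definition names `Section` and `ProductCategory`, and `inherited_attributes` below are hypothetical Python examples, not part of any Endeca or Sites configuration format.

```python
from __future__ import annotations

# Hypothetical rule table: Page definition -> attributes its children inherit.
# Attributes not listed (e.g. hero images, page-only copy) stay on the Page node.
INHERITANCE_RULES: dict[str, frozenset[str]] = {
    "Section": frozenset({"audience"}),
    "ProductCategory": frozenset({"audience", "brand", "productLine"}),
}


def inherited_attributes(node: dict) -> dict[str, str]:
    """Return only the node attributes that the rules allow children to inherit."""
    allowed = INHERITANCE_RULES.get(node["definition"], frozenset())
    return {name: value for name, value in node["attrs"].items() if name in allowed}


cameras = {
    "definition": "ProductCategory",
    "attrs": {"brand": "Acme", "heroImage": "cameras.jpg", "audience": "consumer"},
}
print(inherited_attributes(cameras))  # {'brand': 'Acme', 'audience': 'consumer'}
```

In this sketch, letting Section children also inherit a (hypothetical) `region` attribute would mean editing a single entry in `INHERITANCE_RULES`.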

Assumptions:

  • Endeca will index all “searchable” content and provide the content for all guided search navigation pagelets on the Sites-rendered webpage. There is no requirement for Endeca Experience Manager in this solution; on the other hand, nothing in this solution would prevent its inclusion.
  • Since ATG is not involved in this discussion, the assumption is that Sites will do 100% of the rendering using its JSPs, with the Endeca Assembler deployed either in the Sites context or remotely (outputting JSON).

Endnote 1:

While it is all well and good that Page assets can enable an agile methodology, the trouble is that there is no OOTB way to convert a Page node of one definition into another definition. For example, say you have a Page asset that represents a Product category, and initially you give it the definition “Section”. Later on, you might discover that you really need a different definition for such pages, perhaps called ProductCategory. If you now have dozens or hundreds of such “old” Section assets that need to be converted to ProductCategory, updating them would be quite onerous. The cool thing about flex assets, though, is that from a database point of view, converting a Page asset of definition X into definition Y requires just a simple update to the flextemplateid field in the Page table for the given record, plus a cleanup of existing attribute values in the Page_Mungo table for attributes that don’t exist in the new definition. In other words, the data model makes this functionality easy to add, and I feel it is a much-needed feature enhancement.
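The two-step conversion can be sketched as SQL. The following Python example uses an in-memory SQLite database with a deliberately simplified, hypothetical schema: `Page(id, flextemplateid)`, `Page_Mungo(cs_ownerid, cs_attrid, value)`, and a `PageDefAttr(defid, attrid)` table standing in for whatever records which attributes a definition allows. The real Sites tables have more columns; this only illustrates the data change described above.

```python
import sqlite3


def convert_definition(conn: sqlite3.Connection, page_id: int, new_def_id: int) -> None:
    """Re-point a Page at a new definition and drop attribute values it no longer allows."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "UPDATE Page SET flextemplateid = ? WHERE id = ?",
            (new_def_id, page_id),
        )
        conn.execute(
            """
            DELETE FROM Page_Mungo
            WHERE cs_ownerid = ?
              AND cs_attrid NOT IN (SELECT attrid FROM PageDefAttr WHERE defid = ?)
            """,
            (page_id, new_def_id),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Page (id INTEGER PRIMARY KEY, flextemplateid INTEGER);
    CREATE TABLE Page_Mungo (cs_ownerid INTEGER, cs_attrid INTEGER, value TEXT);
    CREATE TABLE PageDefAttr (defid INTEGER, attrid INTEGER);
    -- definition 10 = "Section" (attrs 1, 2); definition 20 = "ProductCategory" (attrs 1, 3)
    INSERT INTO PageDefAttr VALUES (10, 1), (10, 2), (20, 1), (20, 3);
    INSERT INTO Page VALUES (100, 10);
    INSERT INTO Page_Mungo VALUES (100, 1, 'Cameras'), (100, 2, 'section-only');
    """
)
convert_definition(conn, page_id=100, new_def_id=20)
print(conn.execute("SELECT flextemplateid FROM Page WHERE id = 100").fetchone())  # (20,)
print(conn.execute("SELECT cs_attrid, value FROM Page_Mungo").fetchall())  # [(1, 'Cameras')]
```

Converting the “dozens or hundreds” of old Section pages would then be a loop calling `convert_definition` for each Page id.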

Endnote 2:

Since child assets point to Page assets (and not the other way around), there is no OOTB “view” in the GUI that shows the relationship of a Page to its children. As such, a missing ingredient for the above proposal would be a customization of the Siteplan tab that shows the children under each Page asset in the right-hand search pane.
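Because the relationship is stored only on the child side, such a view would have to invert it. A minimal Python sketch of that inversion, assuming the (child, Page) pairs have already been fetched; the function name `children_by_page` and the input shape are hypothetical, and how a Siteplan-tab customization would query and render the result is not shown.

```python
from __future__ import annotations

from collections import defaultdict


def children_by_page(child_refs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Invert (child_id, page_id) pairs into page_id -> sorted child ids."""
    index: dict[str, list[str]] = defaultdict(list)
    for child_id, page_id in child_refs:
        index[page_id].append(child_id)
    return {page_id: sorted(ids) for page_id, ids in index.items()}


refs = [("pr-2", "news"), ("ev-1", "events"), ("pr-1", "news")]
print(children_by_page(refs))  # {'news': ['pr-1', 'pr-2'], 'events': ['ev-1']}
```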

Endnote 3:

* Note that a property can globally turn flex inheritance off completely, but flex parents would still be inappropriate for hierarchies, something that Page assets excel at.

Comments

  1. If content assets have an attribute that points to a Page asset, and Page assets cannot be shared, how do you share content?

    • In general, sharing Page assets across sites is indeed problematic. The conceptual model presented here works well within a single “site”. One might need to introduce “alias” assets into the mix to cover all use cases where sharing of Page assets is required. Note that sharing translated assets has a similar problem: sharing translations across sites is also problematic. In short, I find that sharing assets introduces more problems than it solves. As such, when sharing becomes a key requirement, I would generally rather see the developers customize the GUI to use a single contribution site than deal with all the problems that sharing across sites entails.

      • My question was about sharing only content, not Pages, and that is where your model introduces a difficulty. Sharing content across sites is very useful: it is common practice to share content from the main site to a smaller, specialized one.

        On the other hand, managing content with categories allows multiple parents, and attribute inheritance could be necessary (even just one attribute on the immediate parent, to avoid performance issues…). It is hard to accept a model without flex parents and sharing…

        • The trouble is, changing an asset model post go-live for large content sites is almost impossible. I would suggest that a more flexible model is to simply have child nodes only (basically just content “objects”), and then create ad-hoc, changeable structures external to those objects, which delivers what clients really want. “Hardwiring” a model (i.e. the traditional WC Sites approach) is always problematic. When it comes to upgrading, clients often ask: “How easy is it to change your N-level parent structure?” The answer to this question is always messy. It is always better to disentangle the content model from the navigation model. As for denormalization, there are many ways to achieve it; I happen to feel that the downsides of the flex parent approach outweigh any upsides.
