Localization Patterns Using WebCenter Sites

Introduction

Translating and managing localized web content is a complex undertaking for most enterprises. The use-cases we have seen from WebCenter Sites (WCS) clients in the real world are enormously varied. One might think that, with regard to translations, “all clients have the same requirements”, but that is simply not the case. While WCS does provide basic translation “hooks”, it does not provide an all-encompassing, productized solution that will work OOTB for all clients. Given WCS’s flexibility in asset modeling and Template coding (as well as flexibility with regard to customizing the contributor interface), Sites Engineering decided to let partners handle any client-specific implementation details. Which leads us to attempt to answer the question: What are some useful patterns for leveraging the multi-locale mechanism when using WCS?

Note: this blog post does not attempt to answer everything regarding translations/localizations, but instead should be seen merely as an “introduction” to the topic – one designed to get you thinking. Further, the discussion is “generic” in nature and parts of it will likely not be appropriate for all clients without some refactoring/rethinking. If there is interest, I may elect to post more blogs on this subject in the future.

Main Article

Assumption: for this blog post, we are talking about WebCenter Sites 11.1.1.8.

For starters, there are two basic types of translation “flows”:

  1. editorial teams create and publish content in a primary language first, then after-the-fact, translate and publish translations/localizations as needed in an ad-hoc manner
  2. editorial teams create content and translate it together as a “package”, then when completed, they publish the package all at once.

Most clients need to be able to do both.

The next point is to recognize that there is a rather wide delta between what clients require and what WCS delivers OOTB. What clients generally want is to be able to create content in a master language, wire it up to other related content (which may or may not be in the master language), then either publish it and translate it later, or translate it and all its related bits and pieces, then publish the whole as a single package. Further, they want certain country-specific rules to apply to the rendered content. Basically, the hope is that somehow WCS can “clean up” any mess they might make! ;) Given that WCS allows for dynamic page composition as well as dynamically-rendered content, the ability to apply rules and still have the page perform well is a key deliverable.

One master dimensionset? Or multiple?

At its core, WCS has a built-in “translation mechanism” which uses an assettype called Dimensionset. The idea is that all translated versions of the same content belong to the same Dimensionset instance. A further feature allows one to design a “fallback” tree of locales: while the current locale may not have anything translated, there “might” be something similar in a related locale, and thus the fallback version would be shown instead.

While the fallback mechanism looks promising, in the real world, clients always demand exceptions. And here is where the gap between what WCS delivers and what clients require is greatest. One of the major drawbacks of a single dimensionset is that it can only contain each locale once. So using the same locales in a different sequence for different countries is impossible. For example, the following tree cannot be built, because de_CH and fr_CH (and their global parents) would each have to appear twice:

GLOBAL

en-GLOBAL

en_US

en_GB

fr-GLOBAL

fr_CH

de-GLOBAL

de_CH   (the Swiss German site)

de-GLOBAL

de_CH

fr-GLOBAL

fr_CH  (the Swiss French site)

 

One small step towards reducing this gap is to develop a different dimensionset for each country. In this way, each country can define its own fallback logic independent of other countries. A typical example is that a country site like France will often require that no English assets ever show on that site, whereas for a country site like The Netherlands it is likely to be acceptable to have English assets rendered in addition to Dutch. In other words, it is very common for different countries to have different fallback requirements. Consequently, for the rest of this blog post, we assume that each country will have its own fallback dimensionset. Thus, if our site is supposed to support twenty countries, there would be twenty dimensionsets (with perhaps 30 or 40 or more locales in total).

Which brings us to the first complication: how does the rendering Template code “know” what dimensionset to use? (recall that the translate tags require the developer to specify the name of the dimensionset to use for the current translation). If we make the following two assumptions, things get a little bit easier:

  • the name of the dimensionset is the value of the country code (e.g. uk for the United Kingdom, de for Deutschland, etc.)
  • all web-referenceable assets* have a URL “path” that explicitly includes the country code (e.g. www.mysite.com/uk/helloworld, and www.mysite.com/de/helloworld, or alternatively one can support different subdomains such as uk.mysite.com/helloworld, etc.)

Note: it is generally considered bad practice to render different content for the same URL for different users based on their accept-language value or IP geo-location. The reason is that SEO rankings are likely to drop, as Google et al. may consider such a strategy a form of bait-and-switch.

With the above two assumptions in place, all one need do is parse the URL request to extract the “country”, then pass the country value as an additional argument which can then be used by the Template to specify which dimensionset to use for the current asset*. This logic is most cleanly implemented as a simple mod_rewrite rule. But assuming each web-referenceable asset has a URL that explicitly has the country code as part of the URL, it could also be done in a Sites Groovy wrapper, since you are only parsing the country code in order to pass it as an EXTRA parameter down to the underlying pagelets. Or put another way, the assumption here is that we are not parsing the URL to “repackage” it into another URL: we assume here that the requested URL *IS* the requested asset and its value is stored in the URL lookup table maintained by WCS. If that is not the case for your client, then that is an advanced topic which we can discuss in another blog.

* it is common for some clients to choose a primary site that has no country code which we can assume is the “default”.
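
The URL-parsing step described above can be sketched as follows. This is only an illustration, written in Python for brevity; in a real project the logic would live in a mod_rewrite rule or a Sites Groovy wrapper. The KNOWN_COUNTRIES set, the "global" default name, and the function name are assumptions made for the example, not anything WCS provides.

```python
from urllib.parse import urlsplit

# Hypothetical: country codes that each have their own dimensionset.
KNOWN_COUNTRIES = {"uk", "de", "fr", "ch"}
# Hypothetical name for the primary site that carries no country code.
DEFAULT_COUNTRY = "global"


def country_from_url(url):
    """Return the country code found in the first path segment or subdomain."""
    # urlsplit only recognises the host when the URL has a "//" prefix.
    parts = urlsplit(url if "//" in url else "//" + url)

    # Style 1: www.mysite.com/uk/helloworld
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0].lower() in KNOWN_COUNTRIES:
        return segments[0].lower()

    # Style 2: uk.mysite.com/helloworld
    first_label = (parts.hostname or "").split(".")[0]
    if first_label in KNOWN_COUNTRIES:
        return first_label

    return DEFAULT_COUNTRY
```

The returned value would then be passed down to the pagelets as the extra country argument and used as the dimensionset name.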

However, we still need to know the specific LOCALE (a required argument for our rendering tags) to render content for the current visitor. Here is where we see clients not coming to agreement. One way of thinking about it is that the visitor requested an en_US translation because the URL requested is bound to an en_US asset! As such, we can always trust that this is what they want. However, another way to think about it is that the visitor may have indicated a “preferred locale” cookie be set on their browser and we should honor that instead. And lastly, there are clients who want “virtual” URLs to be supported wherein an asset might be published as …/us/helloworld, but the client also wants the SAME CONTENT rendered (without needing to translate it) to show on “other” English-speaking country sites: e.g. …/uk/helloworld, …/ca/helloworld, …/au/helloworld and so on. This last requirement is a bit tricky. While WCS allows one asset to have multiple 200 URLs, there is no OOTB way to specify in code which 200 URL you want to render on any given page. Further, if an en_US asset owns multiple country-specific URLs it “may” not belong to the country-specific dimensionset!

One must tread carefully here. Likewise, assuming that one might want to “deprecate” such a country-specific virtual URL so that it is overridden by an actual translation, there is no OOTB way to automatically “repair” such a conflict (i.e. both the old asset and the new translation having the same URL). Further, the translated asset “might” have a slightly different URL (due to localizations) and thus the old virtual URL would need to be converted from a 200 response into a 301 response (or even deleted, which is not wise given the negative effect it would have on SEO). All of the above needs custom logic to be implemented either in templates or in the asset model via flex filters, asset listeners, or pre/post update customizations.

One thing that seems to help clients address their specific requirements is the concept of a “global” (or master) version of an asset. So instead of having only ISO locales in our dimensionset hierarchy, we might want to consider having “made up” locales as needed.

Example CH (Switzerland) dimensionset:

global
  en_GLOBAL
    en_CH
  de_GLOBAL
    de_CH
  fr_GLOBAL
    fr_CH

The above hierarchical dimensionset might play out like the following:

  1. when editors require locale-specific differences they can make ad-hoc “localized translations” as needed (i.e. either en_CH, de_CH, or fr_CH above)
  2. where there are language-specific differences that need to be available to other dimensionsets one can make ad-hoc “global translations” instead (i.e. either en_GLOBAL, de_GLOBAL, or fr_GLOBAL above)

While the two examples above don’t cover all use-cases, they resolve the bulk of such requirements and provide enough flexibility for most clients. One thing a fixed hierarchy like the above does not provide is the ability to make exceptions to the fallback logic. Example: on the France site we might occasionally want to use Canadian French content; however, short of adding Canadian French into the hierarchy, this is not possible. The solution, of course, is to use fr_GLOBAL as much as possible if you think that a certain asset can be syndicated widely across regions.
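
To make the fallback idea concrete, here is a minimal sketch of how a per-country hierarchy might be walked. It is written in Python purely for illustration and is not how the WCS translate tags are implemented. The parent map assumes the CH hierarchy reads as global, then the xx_GLOBAL locales, then the xx_CH locales; the asset ids are made up.

```python
# Hypothetical parent map for the CH dimensionset: locale -> parent locale.
CH_DIMENSIONSET = {
    "en_CH": "en_GLOBAL",
    "de_CH": "de_GLOBAL",
    "fr_CH": "fr_GLOBAL",
    "en_GLOBAL": "global",
    "de_GLOBAL": "global",
    "fr_GLOBAL": "global",
    "global": None,
}


def fallback_chain(dimensionset, locale):
    """List the locale followed by each ancestor, most specific first."""
    chain = []
    current = locale
    while current is not None:
        chain.append(current)
        current = dimensionset[current]
    return chain


def resolve(dimensionset, locale, translations):
    """Pick the most specific available translation, or None if none exists.

    `translations` maps a locale to the id of the asset translated into it.
    """
    for candidate in fallback_chain(dimensionset, locale):
        if candidate in translations:
            return translations[candidate]
    return None
```

With this shape, a visitor on the Swiss French site gets the fr_CH asset if it exists, otherwise fr_GLOBAL, otherwise the global master. Canadian French never appears because it is not in the chain, which is exactly the limitation described above.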

Note that one can extend the above to also have a “regional fallback”. Two good examples are Latin American Spanish (es_LAD) and South Asian English (en_AS). This allows marketing to create content that is localized for a Latin American audience AND have it syndicated to many country sites WITHOUT it displaying on other international sites, as it would if es_GLOBAL were used.

Note also that it gets very messy the minute you add legalese into the mix – often one must avoid use of certain phrases and claims across different countries. Hence the need to have both “global” and “local” versions available as fallback, depending on the asset and its content.

Of course, the next complication is that clients want a single asset to be not only “global” but also only available to a subset of countries for which that language is appropriate. This requirement is a subset of “syndication” — something that is entirely missing in WCS. The typical example goes like this:

I want to create a new English press release announcing a new product but I only want it available in North America and Australia/New Zealand. The OOTB solution would be to create a master US English press release, then “translate” it into Australian, Canadian, and New Zealand English. But clients don’t want to do that (too many steps). Instead, they want to have a Global English asset that can specify which countries to be available in (with matching country-specific URLs as mentioned previously). And then *IF* a country editor then decides to translate the global asset later into a localized version, the system should “clean itself up” by removing the old URL from the master/global asset (or optionally, converting it to a 301 in the case where the new URL is different than the original global one).

Alternatively, one could set “locale attributes” on the asset, and upon save a custom element would create the locale-specific versions of the URLs for you. Lots of options here.

Searching for locale-specific content in code.

A somewhat related detail to the above discussion is how query-based searches can gather up and render lists of assets on a per-locale basis, taking into account all the various fallback logic.

For example, if I am on a French webpage, I probably only want assets that are EITHER fr_FR or fr_GLOBAL (i.e. for most clients we generally don’t want to show “duplicate” translations in the same dimensionset).

How best to do that?

For queries to return dynamic lists of assets based on searchable attributes, it goes without saying that the assets need to have attributes that can be used to constrain the search to a specific locale. To enable this, we must convert each asset’s selected locale (which is not an attribute per se) into a multi-valued attribute named SearchableLocale via a custom flex filter. In the case of an asset whose locale is fr_FR we only need to store a single value of SearchableLocale=fr_FR since (for most clients) fr_FR would be at the leaf node of the fallback hierarchy for France. However, an asset whose locale is fr_GLOBAL should store fr_FR as well as any other French-speaking locales across all French-speaking countries. In this way, this global French asset will show up on multiple French language sites via any constrained queries. As an example, for a European-only website, a fr_GLOBAL asset would upon save create both fr_FR and fr_CH SearchableLocale values via its flex filter and thus show up on the FR and CH sites.
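
The denormalization performed by such a flex filter could look roughly like the sketch below. It is plain Python rather than the WCS flex filter API, and the GLOBAL_TO_LEAF_LOCALES table (a European-only setup) and the function name are assumptions for the example.

```python
# Hypothetical: for each "global" locale, the leaf locales that fall back to it.
GLOBAL_TO_LEAF_LOCALES = {
    "fr_GLOBAL": ["fr_FR", "fr_CH"],
    "de_GLOBAL": ["de_DE", "de_CH"],
}


def searchable_locales(asset_locale):
    """Compute the multi-valued SearchableLocale attribute for an asset."""
    if asset_locale in GLOBAL_TO_LEAF_LOCALES:
        # A global asset must be findable from every leaf locale it serves.
        return list(GLOBAL_TO_LEAF_LOCALES[asset_locale])
    # A leaf-locale asset only needs to be findable under its own locale.
    return [asset_locale]
```

Keeping the mapping in one table at least confines a later change in fallback rules to a single place, although every affected asset still has to be re-saved to recompute its stored values.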

There is a downside to the above denormalization of fallback values into an asset attribute. Specifically while using denormalization allows you to perform fewer queries at runtime it may come back to bite you 6 months later when Marketing requires changing the fallback logic and all data values need to be recomputed! Proceed with caution when designing your solution! (one solution might be to leverage Endeca to perform the denormalization for you via custom pipeline and then let Endeca perform all your dynamic searches for your WCS pages).

For many clients there is also a requirement to be able to “hide” a global asset from certain countries. Thus a HideFromTheseCountries attribute should be added to the definition for each translatable asset, along with a custom attribute editor that restricts the list of countries shown in the editorial interface based on the locale selected. For example, if the current locale were fr_FR then the list of countries to hide from would be empty, whereas if the locale chosen were fr_GLOBAL then the list might include FR, CH, etc. (depending on the number of dimensionsets created to support all the country sites).

With the combination of a custom HideFromTheseCountries attribute/attribute editor and a custom flex filter that denormalizes the optimized locales into a multi-valued SearchableLocale attribute, any queries we require in our rendering Templates only need to add the current locale as a constraint. For example, on a webpage whose locale was fr_FR, you might want to show the “Latest News” in a right-rail pagelet. This pagelet would query the News assets where SearchableLocale=fr_FR and sort by date and voila, a proper list, with one caveat: there might be duplicates, since a fr_GLOBAL asset might also have been translated into fr_FR and thus both would have been returned in the list. The remaining step is to remove duplicates, and to do that we use the translate tag <dimensionset:filter>, which takes an IList (from our constrained query) and filters out the duplicates.
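
The query-then-filter flow just described can be sketched in a few lines. This is an in-memory Python stand-in, not the WCS search or <dimensionset:filter> implementation; the NewsAsset class, its field names, and latest_news are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class NewsAsset:
    asset_id: str
    group: str                 # shared by all translations of the same content
    locale: str
    searchable_locales: list   # the denormalized SearchableLocale values
    published: date
    hide_from: set = field(default_factory=set)  # HideFromTheseCountries


def latest_news(assets, locale, country, chain):
    """Return one asset per translation group, newest first.

    `chain` is the fallback chain for `locale`, most specific first.
    """
    # Step 1: the constrained query.
    hits = [a for a in assets
            if locale in a.searchable_locales and country not in a.hide_from]

    # Step 2: remove duplicates, keeping the most specific translation.
    best = {}
    for a in hits:
        rank = chain.index(a.locale) if a.locale in chain else len(chain)
        if a.group not in best or rank < best[a.group][0]:
            best[a.group] = (rank, a)

    return sorted((a for _, a in best.values()),
                  key=lambda a: a.published, reverse=True)
```

Called with locale "fr_FR", country "fr", and the chain fr_FR, fr_GLOBAL, global, a group that has both a fr_GLOBAL and a fr_FR version yields only the fr_FR one.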

One contribution site, or multiple?

Quick answer: for the least risk, the general recommendation is to have just one contribution site to serve all translations particularly when there is sharing of content across multiple visitor sites (see syndication discussion above). The primary reason is that when you create a translation of an asset in contribution site x (example: translate an en_US asset from the US contribution site into fr_FR), there is no automatic way to have that asset “just show up” in the FR contribution site. Additionally, one would logically want that new translation to be also automatically “unshared” from the US site (yet another customization). For obvious reasons, note that being able to translate into fr_FR would mean that the fr_FR locale would need to be defined as part of the US site, which would bloat its contributor interface with locales that have nothing to do with the US. And lastly, if the FR contribution site had its own editorial team independent of the US editorial team, there is no way to share such an asset between two sites and maintain workflow process.

The reason is that any workflow process requires that the workflow members be assigned to the workflow itself PRIOR to the workflow process being saved. And as the FR contributors would not exist on the US contribution site (and vice versa), there is no way to assign an asset created on the US site to FR contributors! (However, conceivably this could be done with custom code). Basically, the only time individual contribution sites make sense is when there is absolutely no “sharing” going on (ironically, even though sharing is a feature of the product). So, unless you want to do a lot of customizing (which increases project risk), we recommend you keep everything in one contribution site. Again, that is unless there is no sharing/syndication going on, in which case localizing/translating content would only ever need to be done on a per-site basis, and never globally.

Note: the downside of a single contribution site is that there is no OOTB access control model on the SitePlan. If a contributor has access then they can edit and potentially cause havoc.

Integration with Translation Systems

To be discussed in a future blog post. Stay tuned…

Caching/uncaching issues

If you publish all translations at once, then uncaching is easy: any change to any existing translation will cause those pagelets dependent on that asset to expire at publish time. All is good with the world. But if you publish a global version of a language, then later create localized versions of that asset, the system will not know to update things since no dependencies were recorded as the assets didn’t yet exist at the time of the last rendering of the pagelet. A simple example: let’s say you have a list of the latest press releases on a French page. And let’s also assume that all of the assets being rendered are fr_GLOBAL. We can even further assume that the links are explicitly defined as named associations, thus all the dependencies are known at the time of rendering/caching. However, when someone translates one of those assets and publishes it, the expected and desired behavior is that the link would now be to the localized translation, not to the global version. But because there is no way to log a dependency to an asset that didn’t exist when that pagelet was last rendered, there is no way for the system to “know” to update that one link. And thus nothing uncaches when you publish the new translation. The simple solution that some clients have implemented is to update the master asset whenever a new child is added to a dimensionset. To pull that off, one must implement a custom publish listener that performs these tasks in the background whenever a new asset is published that might affect an existing dimensionset (note that it makes no sense to update the master asset when a new translation is *created* since it might take days or weeks before it gets approved and published).

Also be aware that any automatic approval mechanism must also deal with the situation where the master asset may be checked out or not yet approved. Nothing is ever simple!
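
The decision such a publish listener has to make can be sketched as below. This is only the selection logic in plain Python, not the WCS publish listener interface; the function name, the argument shapes, and the asset ids in the usage are hypothetical.

```python
def masters_to_refresh(published_ids, live_ids, master_of, checked_out):
    """Decide which master assets to update after a publish session.

    published_ids: asset ids in the current publish session
    live_ids:      asset ids that already existed on the delivery system
    master_of:     maps a translation's id to its master's id
    checked_out:   master ids that cannot be updated right now
    Returns (touch, deferred): masters to update now, and masters to retry.
    """
    touch, deferred = [], []
    for asset_id in published_ids:
        if asset_id in live_ids:
            # An update to an existing asset already expires its pagelets.
            continue
        master = master_of.get(asset_id)
        if master is None or master == asset_id:
            continue  # the asset is itself a master, or has none
        target = deferred if master in checked_out else touch
        if master not in target:
            target.append(master)
    return touch, deferred
```

Only brand-new translations trigger an update of their master, and a checked-out master is set aside for a later attempt rather than being forced through.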

Webroots

Since there will be multiple country sites as well as global assets with multiple URLs that span multiple countries, we need a clean way to build links to the right URL. In Template code, links are built by the system using the <render:gettemplateurl> tag. It optionally takes a parameter of hostparam="xxx" where the value of the hostparam should match the hostname attribute of a WebRoot. As such, we will want to create webroots for each country so that our Template code can build links to the proper webroot for the same global asset. Note that you are free to design your webroots any way that makes sense for your client, including subdomains, subpaths, etc. But once designed, you will need to stick with the design.
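
Conceptually, picking a webroot per country amounts to a lookup like the one below. This Python sketch only illustrates the idea and does not reproduce what <render:gettemplateurl> does internally; the WEBROOTS table and build_link are made up for the example.

```python
# Hypothetical webroots, one per country; subpaths and subdomains can be mixed.
WEBROOTS = {
    "fr": "www.mysite.com/fr",
    "ch": "www.mysite.com/ch",
    "uk": "uk.mysite.com",
}


def build_link(country, vanity_path, scheme="https"):
    """Build the country-specific URL for an asset's vanity path."""
    root = WEBROOTS[country].rstrip("/")
    return f"{scheme}://{root}/{vanity_path.lstrip('/')}"
```

The same global asset therefore gets a different link on each country site, driven only by the country value passed down to the Template.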

URLs for global assets

So it should be clear that once you introduce the notion of global assets + fallback that a URL may be somewhat ambiguous as to which asset it is supposed to represent. Let’s take the following example:

  • www.mysite.com/fr/helloworld

Is that a fr_GLOBAL asset? Or a fr_FR asset? We cannot tell by examining the URL. We can make the following assumption: if a global asset is “syndicated” to many countries, then it will likely need many URLs (which is fully supported by WCS 11.1.1.8). As an example, a fr_GLOBAL asset should be available to all French-speaking countries. For a combined European and North American website, such an asset would have to support four country “sites”:

  • www.mysite.com/fr/helloworld (Clearly France, must be French)
  • www.mysite.com/be/helloworld (Clearly Belgium, but is it French or Dutch?)
  • www.mysite.com/ca/helloworld (Clearly Canada, but is it French or English?)
  • www.mysite.com/ch/helloworld (Clearly Switzerland, but is it French or German or Italian?)

But there is a problem: most of those countries support more than one language! (e.g. Canada has both English and French)

There are several choices available to your client. One is to add the language to the URL to make it distinct, something like the following (note that there are endless variations of this that you can implement, so you are not limited to the examples given):

  • www.mysite.com/fr_FR/Hello_World (unambiguously French French)
  • www.mysite.com/fr_BE/Hello_World (unambiguously Belgian French)
  • www.mysite.com/fr_CA/Hello_World (unambiguously Canadian French)
  • www.mysite.com/fr_CH/Hello_World (unambiguously Swiss French)

or localize the “content” of the URL itself. For example, rather than use “Hello_World” (which is clearly English) we could use “Bonjour_Tout_le_Monde” to represent the French translation which removes any ambiguity with English or Dutch or German or Italian variants.

  • www.mysite.com/fr/Bonjour_Tout_le_Monde (unambiguously French French)
  • www.mysite.com/be/Bonjour_Tout_le_Monde (unambiguously Belgian French)
  • www.mysite.com/ca/Bonjour_Tout_le_Monde (unambiguously Canadian French)
  • www.mysite.com/ch/Bonjour_Tout_le_Monde (unambiguously Swiss French)

Of course, translating all URLs into SEO-friendly language is much more work, so it may come as no surprise that many clients avoid doing it (in spite of the lower SEO rankings that result). As such, the choice of URL style will be highly client-specific and you must dig deep to discover what they really want and what they are willing to support. Further, clients also want 301 support for “guessed at” URLs (e.g. /fr/Hello_World should redirect to /fr/Bonjour_Tout_le_Monde). Fortunately, WCS allows editors to manage 200, 301, and 302 responses on a per-asset basis. From experience, most clients prefer “clean” URLs and as such tend not to want the ISO locale in the URL. Instead, most clients prefer only the country (or in some edge cases they may only want the language, which is typical when a site only serves a single country). Note that an additional benefit of language-translated URLs is higher search engine ranking, since the content of the URL has a better affinity to the content of the webpage body.

It should be obvious that we never present to visitors a URL that suggests a “global” language; rather, each URL is always made to look as if it were localized (even if it is in fact global). So while the above URLs all fetch the same global asset, a visitor wouldn’t know it. They would think they were on the French, Belgian, Swiss, or Canadian “site” (recall that you will be passing &country=xx down to your rendering Templates and as such you can also use this value to drive country-specific behaviors including CSS, Siteplan, or whatever). As such, the concept of a country-specific visitor site is pure smoke and mirrors when it comes to WCS: it can be anything you want. After all, there is only a single WCS serving everything anyway. So all such webroots and subdomains are pure artifice and can be leveraged and managed by the clever developer in any way that makes sense for the particular project.

N.B.: it is generally considered bad behavior to change a visitor’s “context” (i.e. being ripped away from the country site and ending up on the “global” or “US” site). The whole notion of syndication is to permit visitors to remain “in context”.

Yet another customization (or business methodology) needs to be enforced: when a global asset is translated into a localized version of French, for example, the new child may want to make use of the old URL (to maintain good SEO rankings). Or alternatively, it may want to convert the old URL into a 301 and use a slightly different URL for the new translation. Thus, creating a translation of an existing global asset that manages multiple URLs across multiple countries typically requires multiple steps. For example, if it is desired to reuse the old URL, then copy the appropriate URL from the global asset to the new localized translation, then remove the old URL from the global asset. Likewise, if it is desired to convert the old URL into a 301 and use a different URL for the new translation, then after saving the new translation, one must go to the global version of the asset, convert the old URL from a 200 response to a 301, and then save the global asset. In both cases, one must then approve/publish the two together. The good news is that when published together the dependency will be with the global asset and will thus cause pagelets to expire, thereby rendering proper links to the new asset.
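
The two hand-off options above can be expressed against a simple URL table. The sketch below is Python over an in-memory dictionary, not the WCS URL lookup table or its API; the table shape (path mapped to a status and a target), the function names, and the asset ids are assumptions for illustration.

```python
def reuse_url(table, path, new_asset):
    """Option 1: move an existing 200 URL from the global asset to the translation."""
    status, _ = table[path]
    if status != 200:
        raise ValueError(f"{path} is not a 200 URL")
    table[path] = (200, new_asset)


def redirect_url(table, old_path, new_path, new_asset):
    """Option 2: give the translation a new URL and 301 the old one to it."""
    if new_path in table:
        raise ValueError(f"{new_path} is already in use")
    table[new_path] = (200, new_asset)
    table[old_path] = (301, new_path)


# Usage: a global asset that owns one URL per country (ids are made up).
urls = {
    "/ch/Hello_World": (200, "global-1"),
    "/fr/Hello_World": (200, "global-1"),
}
reuse_url(urls, "/ch/Hello_World", "fr_CH-7")
redirect_url(urls, "/fr/Hello_World", "/fr/Bonjour_Tout_le_Monde", "fr_FR-8")
```

Either way, the global asset and the new translation both change, which is why the two have to be approved and published together.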

Another interesting discussion is around SEO and preventing syndicated content from being seen as duplicate content. Since all HTML generation is controlled by Templates, it is entirely possible for developers to add proper hreflang annotations.
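
As a rough illustration of what a Template might emit for hreflang, the Python sketch below turns a map of locales and URLs into link elements. The function name and input shape are hypothetical, and it assumes the caller passes only the country-specific locale each URL is presented as (never a made-up locale such as fr_GLOBAL).

```python
from html import escape


def hreflang_links(alternates):
    """Build <link rel="alternate"> tags from a {locale: absolute URL} map."""
    lines = []
    for locale, url in sorted(alternates.items()):
        lang, _, region = locale.partition("_")
        # hreflang uses a hyphen, e.g. fr_CH becomes fr-CH.
        code = f"{lang.lower()}-{region.upper()}" if region else lang.lower()
        href = escape(url, quote=True)
        lines.append(f'<link rel="alternate" hreflang="{code}" href="{href}" />')
    return lines
```

Each country URL of a syndicated global asset would list all of its siblings this way, so search engines can treat them as regional alternates rather than duplicates.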

Managing all your translations

While WCS gives you the ability to create localized translations of any content, the product doesn’t provide an overall “view” of all the translated assets, their masters, and their URLs. It is strongly suggested that WCS developers create a custom report and make it available in the editorial interface that shows a searchable/constrainable list of URLs and their relationship to localized translations, global translations, masters, sites, 200 vs 301 responses, etc. Additionally, it is suggested that the contribution interface will likely need additional customizations to help editors to “see” the forest for the trees, as it were. In particular, a much needed customization should likely be made to the left nav content tree, where all content (including translations) is shown. Assuming a large number of master assets plus an even larger number of translations of those assets, it is obvious that editors will be overwhelmed by the OOTB interface shown in the default left nav. Our recommendation here is to create a customized navigator that renders only “master” assets, hiding translations, but perhaps allowing an editor to inspect translations in the navigator by having an expand/collapse button on the master, which would allow an editor to “expand” a master asset to see what translations exist as children underneath. (This should be an enhancement in the product IMHO). Note that even without implementing expand/collapse in the navigator, the list of all translations of the master is rendered in the inspect screen of the master asset itself. Thus, “hiding” translations in the left nav for large sites is often recommended for many clients.

Summary

  • Use a single contributor website
  • Create “global” locales for syndication
  • Specify different fallback dimensionsets, one for each country, leveraging the global locales
  • Add a HideFromTheseCountries attribute + attribute editor + flex filter to your definitions to hide global assets from showing up in specified countries
  • Add a SearchableLocale attribute + flex filter to denormalize the searchable locales for the specified locale (e.g. set SearchableLocale=fr_FR when an asset is specified as fr_GLOBAL)
  • Add a TranslateThisIntoTheseLanguages attribute + asset listener to automatically create N translations of a master asset and put these translations into a TranslationWorkflow
  • Create a TranslationWorkflow step action that exports a subset of the attributes (i.e. only the translatable ones) to XML

Issues

The above discussion is by no means “perfect” (nor complete for that matter) and the solution will not be appropriate for all clients. The above pattern has issues that may or may not need to be resolved for your project. One glaring example is that assuming there is only a single contributor website, there would be no OOTB language-specific view of the content e.g. there is no “French View”. This is something to consider. That being said, there would likely be an “FR” navigational hierarchy (for many clients, each country’s navigation is typically slightly different from every other country and that is allowed in this design).

Additionally, an editor can create saved searches using the denormalized SearchableLocale attribute as a constraint. And finally, a contributor is always free to preview the language-specific site (e.g. /fr/*) via the contributor interface.

Other Topics

One additional area that hasn’t been mentioned is how to approach localizing text strings and values that are needed in templates. For example the text “Expand All” or the word “Search” on a button. This will be discussed in a later blog.

Thanks

I want to thank Mark Fincham for his valuable and generous comments and review of this post.

 
