Oracle Commerce customers often ask, “How should we warm our repository item cache for better page serving performance?” This article will address that question as well as some of the decisions that you’ll need to make along the way.
First, let’s talk about what repository item cache warming is.
Generally speaking, warming means preemptively fetching repository items from the database in anticipation of future use. In effect, we’re loading them into cache so that we needn’t fetch them while we’re servicing a consumer’s page request. We do this so that we can provide better page response time to the consumer, and so that we can mitigate database traffic for what are mostly read-only items. The prime example of this is warming of product data in the Product Catalog repository.
Warming is generally desirable at two times: when starting an instance, and after a deployment from the BCC, when caches need to be invalidated. In this article, we’ll deal with the former, which is often referred to as preloading.
When we warm a cache like this, we defer servicing consumer requests until after the cache has been warmed. Generally speaking, we would warm the cache when the page serving instance is being started, as a side effect of starting the associated repository component.
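A well-tuned and warmed repository cache is essential for achieving optimal page load times for your consumers. It contributes to both good page response time and server scalability. But the techniques for its use need to be applied wisely. The steps involved are: determine whether warming is really necessary; fix the root cause of latency first; limit warming to your active items; reduce the size of the items being warmed, if appropriate; avoid warming items that result in excessive startup delay; test your cache management strategy under sustained load; and consider other warming techniques.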
We’ll talk about each of these at length.
Determine whether warming is really necessary
Most of the highly referenced repositories, like the product catalog, warm very naturally. The latency penalty for the first retrieval of a category, product, or SKU is often not material, typically in the 8-12ms range. So before you preemptively attempt to warm the products into cache, first determine whether doing so is actually necessary.
You can determine this through load testing, validating average page load time during ramp up on a cold cache versus the times achieved after the load test has reached a steady state. Or you can simply perform a trial on a running production system by flushing the repository cache on a single page serving instance and monitoring the inflection in average page load time.
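As a rough sketch of that comparison, the snippet below averages page load times from the ramp-up window and from the steady-state window and reports the difference. The class name, method names, and sample values are made up for illustration; in practice the samples would come from your load testing tool or monitoring system.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Oracle Commerce): compares the average page
// load time seen during cold-cache ramp-up with the steady-state average.
class WarmingCheck {

    // Arithmetic mean of a set of page load times, in milliseconds.
    static double averageMillis(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // Average extra latency per page while the cache is still cold.
    static double coldPenaltyMillis(long[] rampUp, long[] steadyState) {
        return averageMillis(rampUp) - averageMillis(steadyState);
    }

    public static void main(String[] args) {
        long[] rampUp = {310, 280, 240, 205, 200};      // made-up sample data
        long[] steadyState = {198, 202, 200, 199, 201}; // made-up sample data
        System.out.printf("Cold-cache penalty: %.1f ms per page%n",
                coldPenaltyMillis(rampUp, steadyState));
    }
}
```

If the penalty is small and disappears within seconds, natural warming is probably sufficient.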
At one large retailer that I have worked with, they were delaying startup by 15 minutes while warming their product cache. When we tested the actual latency incurred by page requests when not warming the cache, we found that there was about 100ms latency on first reference only; the cache naturally warmed in 3-4 seconds for the entire portion of the active catalog. From this, we determined that the natural warming of the cache was sufficient to their needs. This gave them an additional 15 minutes of site availability each time that they restarted their production commerce cluster.
Fix the root cause of latency first
If you find that warming is desirable because the latency incurred by natural warming is excessive, first try to determine why that is the case. Is the root cause slow database performance? Network latency?
Make sure that you’re correcting the root cause of the latency. If your database isn’t performing well or is incurring latching/locking, then warming a large number of page serving instances concurrently, such as at cluster startup, can overwhelm it. Before implementing cache warming, analyze AWR reports and make sure that the database isn’t the root cause of the page serving latency. The same can be true for network induced latency as well, such as you might see by putting a PCI compliant (layer 4) firewall between the application servers and the database manager.
Don’t mitigate initial latency by warming your instance offline if you can fix the root cause instead.
Sometimes initial page loading is slow not because of repository item availability, but because of artifacts in the pages themselves, often page composition. In that case, improving item availability only masks the problem by removing one contributing element; it doesn’t resolve it. A simple way to mitigate this issue is to use the droplet cache to build pre-composed elements or fragments of the page. Then, whenever that element is needed in a page, it can be drawn directly out of droplet cache rather than re-rendering it. An example of this can be found in Appendix B of the Oracle Commerce Page Developer’s Guide, starting on page 231 in the version 11.1 documentation set.
When caching elements of a page containing product information, you can use the product id as the key to the cached item. By doing this, you can extend the deployment event listener to invalidate or recreate these fragments if a change to the associated product information is deployed.
Similar techniques may be used for other custom caches.
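To illustrate the idea of keying cached fragments by product id and invalidating them on deployment, here is a minimal sketch of a custom fragment cache. It is not the Oracle Commerce droplet cache API: the class, the renderer function, and the way changed product ids reach `invalidate` are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a fragment cache; only an illustration of keying
// rendered fragments by product id, not an Oracle Commerce class.
class ProductFragmentCache {

    private final Map<String, String> fragments = new ConcurrentHashMap<>();
    private final Function<String, String> renderer;

    ProductFragmentCache(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    // Return the cached fragment, rendering it only when it is not cached yet.
    String fragmentFor(String productId) {
        return fragments.computeIfAbsent(productId, renderer);
    }

    // Intended to be called from a (hypothetical) deployment event listener
    // with the ids of the products changed by the deployment.
    void invalidate(List<String> changedProductIds) {
        for (String productId : changedProductIds) {
            fragments.remove(productId);
        }
    }

    int size() {
        return fragments.size();
    }
}
```

Removing an entry means the fragment is simply re-rendered on its next request; a listener could instead call `fragmentFor` right after invalidating to recreate it eagerly.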
Limit warming to your active items
At this point, you’ve determined that you do need to warm your caches to attain adequate page response times, and you’ve determined that the latency incurred obtaining the data from the database is not an issue. The next thing to be determined, then, is how much of the data is needed when you start your page serving instance.
Let’s look at another example. At another retailer that I worked with, they had in excess of 1 million products in their catalog. But analysis determined that fewer than 500 of those products yielded over 90% of their site’s traffic and revenue. Restricting warming to only those active, high yield items improved availability while not incurring a significant delay in start-up.
This is not uncommon. It’s particularly true of “long tail” retailers that have large numbers of products that are available in low quantity and that sell infrequently.
So how do you go about this?
Most customers have an external system where they track product activity. It could be your order management system, where you track sales and returns. Or it could be the use of consumer tracking tags, which keep track of which products consumers are looking at. Or any number of analytics solutions.
However and wherever you track consumer interest in your products, you need to reduce that information into a list of your top performing products, meaning those products that draw the most traffic to your site. Now create a custom boolean property in the product item of your product catalog repository. Let’s call it ‘highInterest’. Periodically, run a feed into your content administration system (BCC) that sets this property to true for those products, false for all others.
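The reduction step might look something like the sketch below, which picks the most-viewed product ids from a map of view counts. The input format, class name, and method name are assumptions for illustration; your analytics or order management system will dictate the real shape of the data, and the resulting list would then drive the feed that sets the property.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: reduces per-product view counts to a top-N list.
class HighInterestSelector {

    // Returns the ids of the n most-viewed products, highest count first.
    // Ties are broken by product id so the result is deterministic.
    static List<String> topProducts(Map<String, Long> viewsByProductId, int n) {
        return viewsByProductId.entrySet().stream()
                .sorted((a, b) -> {
                    int byViews = Long.compare(b.getValue(), a.getValue());
                    return byViews != 0 ? byViews : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```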
Now when you start the product catalog repository, you can include an RQL query in the customCatalog.xml file to cause these products to be fetched into item cache when the repository instance is started.
<query-items item-descriptor="product" quiet="true">highInterest = true</query-items>
This will cause those product items to be warmed in the cache without your having to wait while all of the items of lesser interest are loaded.
For more information about loading queried items during repository startup, see the section titled ‘SQL Repository Caching’ in the Oracle Commerce Repository Guide, starting on page 130 for version 11.1.
It’s important that you separate consumer traffic from traffic generated by bots, such as Google or Bing search indexing. Don’t allow bot traffic to influence your analysis of consumer interest; exclude bot requests from your traffic tagging. And, by the way, directing bot traffic to a page serving instance dedicated to that purpose has the added benefit that it doesn’t consume resources that would be better put to use servicing your consumers. You needn’t bother warming that bot instance’s cache either; bots don’t care much about page response times.
Reduce the size of the items being warmed, if appropriate
Another way to decrease the amount of time spent warming repository cache is to decrease the size of the objects being loaded.
If some of the items being warmed have properties containing large values, or multi-valued properties that are infrequently referenced, use lazy loading of those properties. This will reduce the amount of data being returned from the database and unmarshalled into repository cache, making warming quicker and reducing the amount of heap memory consumed by the cache. Pay particular attention to BLOB and CLOB properties.
By using lazy loading, retrieval of the values of these properties will be deferred until a getPropertyValue() call is made against an instance of the associated item. You can even group these properties, so that a getPropertyValue() call for any one of them will retrieve all of the properties in the group at the same time.
For more information about lazy loading item properties, see the section titled ‘Enabling Lazy Loading’ in the Oracle Commerce Repository Guide, starting on page 131 for version 11.1.
Note that lazy loading of those properties will be used any time that the item instance is loaded from the database, not just during repository startup. So the positive effect extends beyond warming.
Avoid warming items that result in excessive startup delay
With Commerce repositories, you can create user-defined properties that are underpinned by Java code. This allows you to create custom code to populate and manage the values of those properties when the item is loaded and accessed from cache.
You need to be sensitive to the runtime characteristics of this code. It is possible to write custom code that makes web service calls to external systems. You want to avoid warming items that have user-defined properties that are underpinned by web service calls. These can have a cascading effect where the delay compounds with every item loaded.
Imagine the delay that you would incur if you were to have a property of product that is pulling data in real time from an external PIM or OMS. If each call took 100 milliseconds, a very reasonable latency for a call to an OMS, and you are warming 1,000 items, you would incur 100,000ms, or almost two minutes, of delay just loading those property values. Now imagine that you hadn’t restricted yourself to just warming the items of high interest and instead loaded all one million. That would be 100,000 seconds, or almost 28 hours, of delay in starting your instance!
If you do have properties of this nature and yet still need to warm those items into cache, make sure that you lazy load those properties so that they are only loaded upon property retrieval.
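The delay estimate above is simple multiplication, and it can be handy to have it as a small helper when sizing a warming query. The sketch below assumes one sequential external call per warmed item; the class and method names are illustrative only.

```java
import java.time.Duration;

// Hypothetical helper: estimates startup delay when every warmed item
// triggers one sequential call to an external system.
class WarmingDelay {

    static Duration sequentialDelay(long itemCount, Duration perCallLatency) {
        return perCallLatency.multipliedBy(itemCount);
    }

    public static void main(String[] args) {
        Duration delay = sequentialDelay(1_000, Duration.ofMillis(100));
        System.out.println(delay.getSeconds() + " seconds"); // prints "100 seconds"
    }
}
```

Plugging in your own item count and measured call latency gives a quick sanity check before you add a property like this to a warmed item.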
Test your cache management strategy under sustained load
At this point, you should have your cache management and warming strategy well in hand. You’ve done your homework, you’ve restricted what’s being loaded to that which is most relevant to your consumers, and you’ve taken care that warming is using heap effectively.
You’re done. Right?
Wrong! Now it’s time to test your caching under sustained load.
Start by appropriately sizing your repository item caches. For each item descriptor, you need to determine the approximate number of items that you’ll want to warm into cache. Now set the cache size, as specified by the item-cache-size value in the repository definition file, to a value somewhat larger than the expected number of items. I generally add an additional 5% for items that occur in large numbers, 10% for those that occur in smaller numbers, to allow for unexpected growth.
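The sizing rule of thumb can be written down as a small calculation. In the sketch below, the cut-off between "large" and "smaller" numbers is passed in as a parameter because the article doesn’t define one; that threshold, along with the class and method names, is an assumption for illustration.

```java
// Hypothetical helper: applies the 5% / 10% headroom rule of thumb to an
// expected item count to suggest an item-cache-size value.
class CacheSizing {

    // largeThreshold is an assumed cut-off: counts at or above it get 5%
    // headroom, counts below it get 10%.
    static long suggestedItemCacheSize(long expectedItems, long largeThreshold) {
        long headroomPercent = expectedItems >= largeThreshold ? 5 : 10;
        // Integer arithmetic, rounding the headroom up to a whole item.
        long headroom = (expectedItems * headroomPercent + 99) / 100;
        return expectedItems + headroom;
    }

    public static void main(String[] args) {
        System.out.println(suggestedItemCacheSize(100_000, 10_000)); // prints 105000
        System.out.println(suggestedItemCacheSize(500, 10_000));     // prints 550
    }
}
```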
Ratchet your load up gradually, stepwise, until you reach your peak load. Then sustain that for an extended period, so that you can ensure that your cache management strategy holds under extended periods of stress. This is a good time to test for memory leaks as well, particularly if you have introduced any custom caches.
You should be monitoring page views per second per page serving instance, scalability factors, and server resource utilization to ensure that you are achieving consistent page response time during all phases of the testing cycle. This should include during and immediately after a deployment from the BCC.
Monitor your heap usage using Java Flight Recorder and your cache residency and hit ratios in the Dynamo Administration Component Browser’s repository component display page. Tune the item cache size accordingly until you reach your cache residency objectives and a high hit ratio.
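If you record hit and miss counts yourself while testing, the hit ratio is just hits divided by total lookups. The following sketch shows that calculation; the class and method names are illustrative, and the counts are assumed to have been read off your own monitoring rather than from any particular Oracle Commerce API.

```java
// Hypothetical helper: computes a cache hit ratio from hit and miss counts.
class CacheStats {

    // Fraction of lookups served from cache; 0.0 when there were no lookups.
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

For example, 950 hits and 50 misses give a ratio of 0.95.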
Consider other warming techniques
You may want to consider other warming techniques, such as selective warming when a user logs in or a campaign is launched, rather than warming the world.
These warming techniques may be considered for many reasons: to improve page response time, to reduce load on back-office systems or the database, to increase server scalability, to provide a consistent as-of for information being drawn from external systems, etc.
Cache warming can be an effective technique for ensuring optimal page response times for your consumers and for improving server scalability. But like all very powerful tools, it should be used wisely. While you are warming your cache, you are not servicing your consumers. So temper your use of warming with your overall objective, that of servicing consumers and making money for your company.
And remember, as Donald Knuth once wrote, “premature optimization is the root of all evil.”
In a follow-on article, I’ll share with you some tips and sample code for monitoring cache usage effectiveness in your running system. I’ll give you techniques for saving snapshots of your cache usage and hit ratios in a file or database table. Then, in a later article, we’ll discuss warming of caches after a deployment from the BCC.