To download this paper click on: Double Buffered Page Caching with inCache
Page caching is very important in WebCenter Sites for achieving high performance. WebCenter Sites delivers dynamic pages. When a visitor requests a web page, WebCenter Sites executes a JSP/XML/Java template/element to generate the HTML and streams the HTML back to the browser. In addition to streaming the HTML back to the browser, WebCenter Sites also saves the HTML, along with all cache criteria parameters, in the ‘Page Cache’. The next time the same web page is requested with the same parameters, WebCenter Sites does not have to execute the template/element again; it just sends the HTML from its cache.
It is not necessary to cache the full web page. In fact, full-page caching is rarely used. In practice, each web page is broken into a number of pagelets, some of which may be cached and others not. Each pagelet is cached independently, along with all the parameters that were passed in the cache criteria. Thus, if you have a header that appears on every page, the header pagelet can be cached once and reused on all the pages.
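The idea of caching each pagelet independently, keyed by its cache criteria, can be illustrated with a minimal Python sketch. This is not the WebCenter Sites implementation: the class `PageletCache`, the `render` callback and the `site` parameter are hypothetical names chosen for illustration only.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: pagelet name plus its cache-criteria parameters (sorted).
CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class PageletCache:
    """Toy pagelet cache; an illustration, not the product's page cache."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, str] = {}
        self.renders = 0  # how many times a template was actually executed

    @staticmethod
    def make_key(pagelet: str, criteria: Dict[str, str]) -> CacheKey:
        # Sorting makes the key independent of parameter order.
        return (pagelet, tuple(sorted(criteria.items())))

    def get(self, pagelet: str, criteria: Dict[str, str],
            render: Callable[[Dict[str, str]], str]) -> str:
        key = self.make_key(pagelet, criteria)
        if key not in self._store:
            # Cache miss: execute the template once and remember the HTML.
            self.renders += 1
            self._store[key] = render(criteria)
        return self._store[key]
```

Requesting the same pagelet with the same criteria twice executes the template only once, while a different parameter value produces a separate cache entry.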
The strategies for designing a page for proper caching and the optimum way to break a page into pagelets are very important and discussed in WebCenter Sites documentation and A-Team Chronicles. I will not be discussing them here.
In this paper, I am going to concentrate on some of the less commonly used features of inCache to achieve double buffered caching. There are significant differences between double buffered caching with WebCenter Sites traditional caching and with inCache. If these differences are not properly understood, they can lead to unexpected results.
Using inCache for page caching is enabled by default when WebCenter Sites 11gR1 is installed or upgraded. inCache for page caching overrides the legacy method of page caching. Two configuration files, cs-cache.xml and ss-cache.xml, are provided with WebCenter Sites for configuring local caches and peer-to-peer communication.
inCache affects RealTime publishing by deactivating the donoteregenerate flag. To enable page regeneration, URLs must be specified in the FW_RegenCriteria table. Page propagation is also an option; it ensures that all nodes host the same pages without each node having to regenerate the same page. In addition, Remote Satellite Server can be configured to continue to ‘serve stale pages’ for a short duration while their replacements are being regenerated.
WebCenter Sites implements a double buffered caching strategy, which uses the WebCenter Sites and Remote Satellite Server caches in tandem on a live web site. This double buffered caching strategy ensures that pages are always kept in cache, either on WebCenter Sites or on Remote Satellite Server.
When Remote Satellite Server receives a request for a page/pagelet that is not in its cache, it sends the request to WebCenter Sites. If the page/pagelet is not in the WebCenter Sites cache either, WebCenter Sites generates the page/pagelet and saves it in its Page Cache. Then it sends the page/pagelet to the requesting Remote Satellite Server. Remote Satellite Server streams the page to the client browser and also saves it in its own cache. This mechanism of first caching in WebCenter Sites and then in Remote Satellite Server is called ‘Double Buffered Caching’.
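The request flow described above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: `SitesNode` and `SatelliteNode` are hypothetical stand-ins for WebCenter Sites and Remote Satellite Server, and the caches are plain dictionaries keyed by URL.

```python
from typing import Callable, Dict


class SitesNode:
    """Stand-in for WebCenter Sites: generates pages and keeps a page cache."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self.cache: Dict[str, str] = {}
        self.generated = 0  # number of times a page was actually generated

    def fetch(self, url: str) -> str:
        if url not in self.cache:
            # First buffer: generate once and keep the result.
            self.generated += 1
            self.cache[url] = self._generate(url)
        return self.cache[url]


class SatelliteNode:
    """Stand-in for a Remote Satellite Server sitting in front of SitesNode."""

    def __init__(self, origin: SitesNode) -> None:
        self._origin = origin
        self.cache: Dict[str, str] = {}

    def serve(self, url: str) -> str:
        if url not in self.cache:
            # Second buffer: on a miss, ask the origin and cache its answer.
            self.cache[url] = self._origin.fetch(url)
        return self.cache[url]
```

With two `SatelliteNode` instances sharing one `SitesNode`, the page is generated only once even though both satellites end up holding their own cached copy.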
The main advantages of Double Buffered Caching are:
1) The WebCenter Sites cache needs to be generated only once, irrespective of the number of Remote Satellite Servers. Typically, a site has a number of Remote Satellite Servers. The first time any one Remote Satellite Server requests a page, WebCenter Sites generates and caches the page, and also sends it to the requesting Remote Satellite Server. If another Remote Satellite Server then requests the same page, WebCenter Sites does not have to generate the page again. It sends the page directly from its cache to the second Remote Satellite Server.
‘Page Propagation’ can be used to ensure that a page generated on one WebCenter Sites node is propagated to all WebCenter Sites nodes.
2) Typically, Remote Satellite Servers are smaller systems, having less memory and disk space. The storage in Remote Satellite Server may not be enough to store all the cached pages. In this case, Remote Satellite Server can delete some of the cached pages, to make space for new pages. But, the cached page continues to live in the WebCenter Sites cache, and Remote Satellite Server can request the page from WebCenter Sites cache.
3) The real benefit of double buffered caching comes during publishing. During publishing, any page that is modified is first flushed from the WebCenter Sites cache; if ‘ServeStalePages’ is set, the page is not removed from Remote Satellite Server. The page continues to live in Remote Satellite Server until it is regenerated in WebCenter Sites.
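The publishing behaviour in point 3 can be sketched as follows. This is a toy model, not product code: the `Satellite` class, its `serve_stale` switch and the `on_publish` / `on_regenerated` hooks are hypothetical names used only to show the difference between dropping an invalidated page and keeping it as stale.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    html: str
    stale: bool = False


class Satellite:
    """Toy satellite cache that may keep invalidated pages as 'stale'."""

    def __init__(self, serve_stale: bool) -> None:
        self.serve_stale = serve_stale
        self.cache: Dict[str, Entry] = {}

    def on_publish(self, url: str) -> None:
        """A publish invalidated this page: mark it stale or drop it."""
        if url not in self.cache:
            return
        if self.serve_stale:
            self.cache[url].stale = True
        else:
            del self.cache[url]

    def on_regenerated(self, url: str, html: str) -> None:
        """The origin finished regenerating: replace the old copy."""
        self.cache[url] = Entry(html)

    def lookup(self, url: str) -> Optional[str]:
        entry = self.cache.get(url)
        return entry.html if entry else None
```

With `serve_stale=True` a visitor still receives the old HTML between `on_publish` and `on_regenerated`; with `serve_stale=False` the lookup returns nothing during that interval and the request has to wait for the origin.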
It is possible to configure the system such that a user request always finds a page in the cache. The cache is expired when the corresponding assets are published. The cache is regenerated either by crawling or by the first hit to the page. The Remote Satellite Server can be configured to continue to ‘serve stale pages’ while the cache is regenerated. Two main features of WebCenter Sites are used to control this.
WebCenter Sites inCache uses crawling to regenerate pages. This requires populating the FW_RegenCriteria table on the delivery system, with the URLs of pages to be crawled and regenerated.
Crawling is a computationally expensive option. If crawling is not used, pages will be regenerated only when they are requested. Typically, crawling requires specifying a set of URLs for the home page and other high-traffic pages. In addition, you can specify the depth to which those pages will be crawled.
A depth of 1 means that the specified pages and pages they link to will be crawled, while a depth of 0 means that only the specified pages will be crawled. Crawled pages are regenerated only if their component assets have been invalidated during the publishing session, or the pages are not cached. All pagelets on the specified pages are regenerated in the process. The list of URLs to crawl and the crawl depth must be specified in the FW_RegenCriteria table.
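The effect of the crawl depth, and the rule that only invalidated or uncached pages are regenerated, can be illustrated with a small breadth-first sketch. The functions `crawl` and `pages_to_regenerate` and the `links` map are hypothetical; the real crawler reads its URLs and depth from the FW_RegenCriteria table rather than from Python arguments.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def crawl(seeds: Iterable[str], links: Dict[str, List[str]], depth: int) -> List[str]:
    """Return URLs reachable from the seeds within `depth` link hops, in visit order."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, level = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        if level < depth:
            # Follow links only while we are still above the depth limit.
            for target in links.get(url, []):
                queue.append((target, level + 1))
    return order


def pages_to_regenerate(crawled: Iterable[str], invalidated: Set[str],
                        cached: Set[str]) -> List[str]:
    """Keep only crawled pages that were invalidated or are not cached at all."""
    return [u for u in crawled if u in invalidated or u not in cached]
```

In this model a depth of 0 visits only the seed URLs, and a depth of 1 also visits the pages they link to, matching the description above.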
Remote Satellite Server can be configured to serve old page/pagelets while they are being regenerated by a background process. To enable serving of invalidated pagelets, add serveStale=true to Remote Satellite Server’s satellite.properties file. If a pagelet is then invalidated, it will be regenerated in one of the following ways:
Although the WebCenter Sites documentation does not state it explicitly, the 30-minute limit is measured not from the time of publishing but from the last access to the page/pagelet. This throws up some interesting scenarios that every client using ‘Serve Stale Pages’ must be aware of:
1) Frequently Visited Pages
For frequently visited pages, the Remote Satellite Server receives requests for the page/pagelet very often. When Remote Satellite Server receives the first request for the page/pagelet after publishing has invalidated it, it sends a request to WebCenter Sites to generate the page/pagelet. Until the new page/pagelet is generated, it continues to serve the old one. If an error in the template/JSP prevents the new page/pagelet from being generated, and the pagelet is accessed constantly, it will continue to be served from the stale cache, potentially indefinitely. If the stale page still exists and has not been regenerated even 30 minutes after the last access, something else is wrong; for example, a coding error may be preventing regeneration. This feature tends to hide such errors, and in most cases errors are best surfaced as early as possible.
On the one hand, we want the popular pages/pagelets to show up without disrupting delivery (and hope someone notices the log messages and fixes the code), so letting these errors hide for a while is a good thing. On the other hand, we do not want these errors to hide behind the cache forever; when that happens, the site may stop working altogether after a restart.
2) Infrequently Visited Pages
When Remote Satellite Server receives the first request for the page/pagelet after publishing has invalidated it, it sends a request to WebCenter Sites to generate the page/pagelet. If this page has not been accessed in the last 30 minutes, Remote Satellite Server does not serve the ‘stale’ page from its cache (even if ‘ServeStalePages’ is set) and waits for WebCenter Sites to regenerate the page/pagelet. WebCenter Sites generates the page/pagelet, caches the newly generated page/pagelet and sends it to Remote Satellite Server. Remote Satellite Server renders the page/pagelet and also stores it in its own cache.
Serve Stale Pages allows us to serve an old page from Remote Satellite Server while WebCenter Sites generates the new page/pagelet. While continuing to render the old page for a few seconds or even a few minutes may be acceptable, a longer delay may not be acceptable for some clients.
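The two scenarios differ only in how long the invalidated pagelet has been idle. The decision can be sketched as below, assuming a simple model in which time is expressed in minutes and a 30-minute window is the default. `CachedPagelet`, `satellite_decision` and the returned labels are hypothetical names, and whether the boundary at exactly 30 minutes counts as inside the window is an assumption of this sketch.

```python
from dataclasses import dataclass

STALE_WINDOW_MINUTES = 30.0  # default window described in the text


@dataclass
class CachedPagelet:
    html: str
    invalidated: bool
    last_access_minute: float  # time of the last request, in minutes


def satellite_decision(entry: CachedPagelet, now_minute: float,
                       serve_stale: bool = True,
                       window: float = STALE_WINDOW_MINUTES) -> str:
    """Decide how a request for a cached pagelet would be handled in this model."""
    if not entry.invalidated:
        return "serve-cached"
    # The window is measured from the last access, not from the publish time.
    idle = now_minute - entry.last_access_minute
    if serve_stale and idle < window:
        return "serve-stale-and-regenerate"
    return "wait-for-regeneration"
```

Because every request refreshes the last access time, a frequently visited pagelet never leaves the window, which is why a failing template can stay hidden behind the stale copy.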
WebCenter Sites generates the new page in the background through crawling. One can use these two technologies together to make sure that, after a publish session, all the important pages are regenerated in the background.
This way one can take advantage of Page Regeneration and also make sure that frequently accessed pages/pagelets are always rendered from inCache.
As I mentioned earlier, the 30-minute time limit for ‘Serve Stale Pages’ is measured not from the last publish session but from the last access to the page/pagelet. For frequently hit pages and for pages that are regenerated in the background through crawling, this limit may not be of much use (unless an error prevents the page from being regenerated). But for infrequently hit pages that are cached in WebCenter Sites and Remote Satellite Server, it is possible to get a ‘stale’ page 30 minutes after publishing. This limit is configurable. You can set expireInactiveStalePagesAfter=<minutes> in satellite.properties to lower the limit. However, you should exercise caution in lowering it. All the important pages should be regenerated by crawling, so there should not be any need to lower the limit.
Double Buffered Caching is an important mechanism in WebCenter Sites and ensures that a visitor almost always finds the page/pagelet in the cache. There are important differences between WebCenter Sites classical cache mode and inCache in WebCenter Sites 11gR1. WebCenter Sites 11gR1 now uses page regeneration in the background through crawling, together with ‘serve stale pages’ from Remote Satellite Server, to achieve double buffered caching. One should examine these carefully to implement a proper page caching scheme for one's web site.