Busting your caches

December 12, 2019 | 5 minute read
Dolf Dijkstra
Cloud Solutions Architect

TL;DR

Oracle Content and Experience Cloud has an undocumented feature that enables client-side caching: it sets a 15-day client-side cache expiration on any response of the Delivery API.

Add cb=<somevalue> to your requests to the Delivery API to greatly enhance the user experience.

Cache Buster

By default, the Delivery API sends a Cache-Control header that tells the client and intermediate proxies, such as a CDN, not to reuse the response without checking it first.

Cache-Control: public, max-age=0

This means that the response is stale immediately and will not be served from cache without validation. If the client (browser) needs the same resource (image) again because the page is revisited, a new request is made to validate the previous response.

For instance

curl -v 'https://<host>.cec.ocp.oraclecloud.com/content/published/api/v1.1/assets/<id>/Small/circuit_01_on.jpg?format=jpg&type=responsiveimage&channelToken=<token>' > /dev/null

returned these HTTP headers

HTTP/1.1 200 OK
Date: Mon, 09 Dec 2019 10:41:59 GMT
Content-Type: image/jpeg
Transfer-Encoding: chunked
Connection: keep-alive
Cache-Control: public,max-age=0
Content-Disposition: inline; filename="circuit_01_on_Small.jpg";filename*=UTF-8''circuit_01_on_Small.jpg
X-ORACLE-DMS-ECID: <value>
X-ORACLE-DMS-RID: 0
X-Content-Type-Options: nosniff
ETag: 85636792ae5dac6b6cffbcdcf808c487

The Cache-Control response header set to max-age=0 forces the client to validate the response on subsequent requests, to check whether the resource has changed (https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4). This leads to a request/response round trip to the Oracle Content and Experience server, even if a CDN is deployed. This round trip takes several milliseconds and degrades the user experience.
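To make the effect of max-age concrete, here is a minimal sketch that reads the freshness lifetime from a Cache-Control header value. The function names are hypothetical, and treating a missing max-age as 0 is a simplification (real caches may apply heuristic freshness).

```javascript
// Parse a Cache-Control header value into a map of directive -> value.
// Quoted directive values are not handled; this is only a sketch.
const parseCacheControl = header => {
    const directives = {}
    for (const part of header.split(',')) {
        const [name, value] = part.trim().split('=')
        if (name) directives[name.toLowerCase()] = value === undefined ? true : value
    }
    return directives
}

// Seconds a cache may reuse the response without revalidating.
// Assumption: a missing max-age is treated as 0.
const freshnessLifetime = header => {
    const maxAge = parseCacheControl(header)['max-age']
    return maxAge === undefined || maxAge === true ? 0 : Number(maxAge)
}

// True when every reuse of the response requires a round trip to the server.
const needsRevalidation = header => freshnessLifetime(header) === 0

const defaultHeader = 'public,max-age=0'
const revalidateEveryTime = needsRevalidation(defaultHeader)
```

With the default header shown above, revalidateEveryTime is true, which is the round trip described in the text.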

With a very simple modification of the request, you can instruct the Oracle Content and Experience server to send a Cache-Control header with a max-age of 15 days. This means that for 15 days after the response, the browser can use this image from its cache without any validation by the server.

The change is to add a ‘cb’ parameter with a unique value to the request. The name ‘cb’ stands for cache buster, and such a parameter is normally meant to force the client to request a new resource. This is somewhat counterintuitive given what we want to achieve: without the ‘cb’ parameter, Oracle Content and Experience sends a ‘non-caching’ response, and with it a ‘caching’ response, yet the name suggests the cached response should be busted (not used).

The reasoning is as follows. If the server always sent a max-age of 15 days, even without the ‘cb’ parameter, the client would not know when a new piece of content was published and would continue to use the old content. Showing stale content to end users can be very problematic. Because browsers and CDNs cache content based on the unique URL of the request and the response headers, we can influence the cache behavior by changing the URL: change the ‘cb’ value whenever new content is published. Strategies for generating ‘cb’ values follow further below.

But first, the proof by looking at the responses:

curl -v 'https://<host>.cec.ocp.oraclecloud.com/content/published/api/v1.1/assets/<id>/Small/circuit_01_on.jpg?format=jpg&type=responsiveimage&channelToken=<token>&cb=k2xmyty6' > /dev/null

HTTP/1.1 200 OK
Date: Mon, 09 Dec 2019 10:54:48 GMT
Content-Type: image/jpeg
Transfer-Encoding: chunked
Connection: keep-alive
Cache-Control: public,max-age=1296000
Content-Disposition: inline; filename="circuit_01_on_Small.jpg";filename*=UTF-8''circuit_01_on_Small.jpg
X-ORACLE-DMS-ECID: <value>
X-ORACLE-DMS-RID: 0
X-Content-Type-Options: nosniff
ETag: 85636792ae5dac6b6cffbcdcf808c487

As you can see, the Cache-Control header has a max-age of 1296000 seconds, which is 15 days. Please note that the ETag for both responses is the same.
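The arithmetic can be checked with a short sketch. It converts the max-age to days and computes the moment until which a cache may reuse the second response without validation. The helper names are hypothetical; the date and max-age are taken from the response above.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60 // 86400

// Convert a max-age in seconds to days.
const maxAgeInDays = maxAgeSeconds => maxAgeSeconds / SECONDS_PER_DAY

// The response stays fresh until its Date header plus max-age.
const freshUntil = (dateHeader, maxAgeSeconds) =>
    new Date(Date.parse(dateHeader) + maxAgeSeconds * 1000)

const days = maxAgeInDays(1296000)
const expires = freshUntil('Mon, 09 Dec 2019 10:54:48 GMT', 1296000).toUTCString()
// days is 15, expires is 'Tue, 24 Dec 2019 10:54:48 GMT'
```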

With this very simple change, you can make the experience for the end-users much better. This also works with a CDN.

Now we have to come up with a value for ‘cb’. As long as the combination of API endpoint, asset id, rendition type, channelToken and cb value is unique, the response will be cached against that combination. For the browser, the cache key is the full URL.

A naïve strategy would be to use a ‘cb’ value that rotates over time, for instance every day: cb=20190912. The issue with this strategy is twofold: the resource is now cached for only 1 day instead of 15 days, and if the resource changes during that day, the client will see the old (stale) content.
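A small sketch of this daily rotation shows both problems. The function name dailyCb is hypothetical, and using the UTC date is an assumption.

```javascript
// Daily rotating cache buster in YYYYMMDD form, based on the UTC date.
const dailyCb = date => date.toISOString().slice(0, 10).replace(/-/g, '')

const morning = dailyCb(new Date('2019-09-12T08:00:00Z'))
const evening = dailyCb(new Date('2019-09-12T20:00:00Z'))
const nextDay = dailyCb(new Date('2019-09-13T08:00:00Z'))

// morning and evening are both '20190912': an asset republished at noon
// keeps the same URL, so caches keep serving the old version that day.
// nextDay differs, so every URL is requested again even if nothing changed.
```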

A better strategy is to use a value that is derived from the resource. The updatedDate field on the content item or Digital Asset holds the time the asset was published. We can use this as a proxy for the version of the asset. If you search for assets first and use the returned list, or fetch the content item first, you have easy access to the updatedDate of the asset. You could use the date of the returned asset directly or hash it to a smaller string. I prefer the latter, as it makes the string smaller and easier to read as an opaque version token rather than as a date with a specific meaning.

I have used the following JavaScript code to produce the hash.

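const cb = updatedDate => {
    // Milliseconds since the epoch for the asset's updatedDate
    const time = new Date(updatedDate.value).getTime()
    // getTime() already returns a number; base 36 keeps the string short
    return time.toString(36)
}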
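As a usage sketch, the function can be applied to two different publish times. The timestamps below are made up for illustration, and the ISO 8601 string format of updatedDate.value is an assumption. Strictly speaking the result is a base-36 encoding of the timestamp rather than a hash, so it can be decoded again.

```javascript
// Same idea as the function above: base-36 encoding of the publish time.
const cb = updatedDate => new Date(updatedDate.value).getTime().toString(36)

// Hypothetical updatedDate objects for two publishes of the same asset.
const firstPublish = { value: '2019-12-09T10:41:59.000Z' }
const secondPublish = { value: '2019-12-09T10:54:48.000Z' }

const firstCb = cb(firstPublish)
const secondCb = cb(secondPublish)

// A republish yields a different value, and therefore a different URL.
const changed = firstCb !== secondCb

// The encoding is reversible: parsing as base 36 gives the timestamp back.
const decoded = parseInt(firstCb, 36)
```

For present-day timestamps the result is eight characters long, the same length as the cb=k2xmyty6 value used in the curl example.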

The steps to build a link to an image are:

  1. Get the JSON response for the asset from /content/published/api/v1.1/items/{id} with the ‘expand’ parameter
  2. In the JSON, navigate to the field that references the DigitalAsset
  3. Get the updatedDate field from the DigitalAsset section (at the same level as “type”: “DigitalAsset”)
  4. Run that value through the above ‘cb’ function, and store the result in memory
  5. Navigate further down the JSON to the renditions section, and get the href for the rendition you want to display.
  6. Append ‘&cb=<value from cb function>’ to that href value
  7. Use the appended href value as the src attribute for the <img> (or similar tags) in the HTML.
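The steps above can be sketched as follows, starting from an item JSON that has already been fetched with expand. The JSON shape used here (fields.renditions[].formats[].links[].href), the field name and the sample values are assumptions for illustration; check them against the actual response of your instance.

```javascript
// Base-36 encoding of the publish time, as in the function above.
const cb = updatedDate => new Date(updatedDate.value).getTime().toString(36)

// Step 5: find the href of a named rendition on the expanded DigitalAsset.
// Assumed shape: asset.fields.renditions[].formats[].links[].href
const renditionHref = (asset, renditionName) => {
    const renditions = (asset.fields && asset.fields.renditions) || []
    const rendition = renditions.find(r => r.name === renditionName)
    return rendition?.formats?.[0]?.links?.[0]?.href
}

// Steps 2 to 6: build the cacheable src for an <img> tag.
const cachedImageSrc = (item, fieldName, renditionName) => {
    const asset = item.fields[fieldName] // step 2
    if (!asset || asset.type !== 'DigitalAsset') return undefined
    const href = renditionHref(asset, renditionName)
    if (!href) return undefined
    const separator = href.includes('?') ? '&' : '?'
    return `${href}${separator}cb=${cb(asset.updatedDate)}` // steps 3, 4, 6
}

// Hypothetical expanded item, trimmed to the parts used above.
const sampleItem = {
    fields: {
        image: {
            type: 'DigitalAsset',
            updatedDate: { value: '2019-12-09T10:41:59.000Z' },
            fields: {
                renditions: [{
                    name: 'Small',
                    formats: [{
                        links: [{ href: 'https://example.com/assets/ID1/Small/a.jpg?format=jpg&channelToken=abc' }]
                    }]
                }]
            }
        }
    }
}

const src = cachedImageSrc(sampleItem, 'image', 'Small')
```

The resulting src is the rendition href with &cb=<value> appended, ready for step 7.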

You can use the same approach for your item (/items/<id>) and search (/items?q=) requests. There are some caveats, though. When using expand on an items request, the expanded items might have changed while the root item, from which you derived the hash, has not. This might lead to stale content. The response to a search request can also be cached, but this too can lead to stale responses. Here you need to settle on another strategy for the ‘cb’ value, as you can’t use the updatedDate of a single asset. You might use the date of the last publish, but that requires access to the Management API or a WebHook integration. Another strategy is to poll the Delivery API at an interval for the most recently updated asset and use the updatedDate of that asset:

/content/published/api/v1.1/items?orderBy=updatedDate:des&limit=1&channelToken=<token>&fields=updatedDate

Set the polling interval according to how long the business can tolerate stale content and to your performance requirements.
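A minimal sketch of the polling strategy could look like this. It assumes the search response holds the results in an items array with an updatedDate.value on each item. The names cbFromLatest and createCbHolder are hypothetical, and fetchLatest stands for whatever function performs the request above and returns the parsed JSON.

```javascript
// Derive a cb value from the most recently updated item in a search response.
// Assumed shape: { items: [ { updatedDate: { value: '<ISO date>' } } ] }
const cbFromLatest = searchResponse => {
    const latest = (searchResponse.items || [])[0]
    if (!latest || !latest.updatedDate) return undefined
    return new Date(latest.updatedDate.value).getTime().toString(36)
}

// Holds the current cb value; refresh() is meant to be called from a timer.
const createCbHolder = fetchLatest => {
    let current
    return {
        get value() { return current },
        async refresh() {
            const next = cbFromLatest(await fetchLatest())
            if (next) current = next // keep the old value if nothing came back
            return current
        }
    }
}

// Hypothetical response, as it might be returned for the request above.
const sampleResponse = { items: [{ updatedDate: { value: '2019-12-09T10:54:48.000Z' } }] }
const latestCb = cbFromLatest(sampleResponse)
```

A caller could run holder.refresh() from setInterval at the chosen interval and append holder.value as the ‘cb’ parameter on item and search requests.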

To conclude:

  1. Adding a ‘cb’ value to the querystring of the REST URL greatly enhances the user experience
  2. The ‘cb’ value makes the request unique and cacheable
  3. The no-brainer use is for DigitalAssets, like images
  4. Use the updatedDate as a proxy for the version of the resource
  5. You have to plan and implement a good strategy to retrieve the updatedDate; different access patterns require different strategies
  6. Understand and document your performance vs staleness trade-offs

If you want to read up on HTTP caching, the Mozilla Developer Network has a nice overview.
