
Best Practices from Oracle Development's A‑Team

Coherence Federated Caching Overview

Introduction

Coherence clusters rely on the ability to enlist multiple servers connected by a reasonably reliable network. This allows strong cache consistency guarantees: in the absence of a catastrophic failure, one can reason about state in a Coherence cluster much as one reasons about state in a local memory space. However, a cluster is a poor abstraction for systems that must rely on unreliable networks, most commonly systems that span wide geographic areas (multi-site systems). Over the years, the means of using Coherence for multi-site operations has evolved: initially via custom software, later via the Coherence Incubator, and, as of Coherence 12.2.1, as part of the core product.

From a developer’s perspective, Coherence federated caching is supported by a new cache service. The <federated-scheme> configuration element is a sibling to the existing <distributed-scheme> element that defines a distributed/partitioned cache. The federated scheme is conceptually similar to a partitioned cache within a cluster, with added functionality to push changes to caches in other (remote) clusters. In fact, a federated cache deployed to a single cluster is generally indistinguishable from a partitioned cache.
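
As an illustration, a minimal federated cache definition in the cache configuration might look like the sketch below. The scheme, service, and topology names are placeholders, and the elements should be verified against the 12.2.1 documentation for your version (the referenced topology is defined in the operational configuration, discussed later):

    <caching-schemes>
      <federated-scheme>
        <scheme-name>federated</scheme-name>
        <service-name>FederatedCache</service-name>
        <backing-map-scheme>
          <local-scheme/>
        </backing-map-scheme>
        <autostart>true</autostart>
        <topologies>
          <topology>
            <!-- refers to a topology defined in the operational configuration -->
            <name>MyTopology</name>
          </topology>
        </topologies>
      </federated-scheme>
    </caching-schemes>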

Federated caches provide a means of ensuring that changes to one cache are reflected in a different cache (a federation “participant”). This is most commonly used for synchronizing cache content between multiple geographic locations. However, the event model allows considerably more flexibility, and can be used to federate data within a single cluster, e.g. copying the contents of a transaction-oriented cache to a separate reporting cache.

To avoid ambiguity, the act of pushing data update events between caches is referred to as “federation” (rather than “replication”, a term which was already in use elsewhere within Coherence). The federation model is based on “participants” notifying other participants of changes. The term “site” will sometimes be used to refer to a participant, as most installations have a single participant at each site. Because it is a push-based model, there is no out-of-the-box functionality to provide an aggregate view (union) of caches spread across multiple sites.

 

Federation Concepts

Coherence federated caches are responsible for intercepting and forwarding cache update events (insert, update, delete), but do not manage the actual state of the cache. Put another way, it is the responsibility of the developer/operator to ensure that the cache is in an appropriate state before federation begins. As one example, if federation to a remote site fails, Coherence has no way to determine whether the remote site was just temporarily disconnected (presumably containing stale state) or was restarted (presumably in a newly initialized state). Access to a disconnected site must also be managed externally (Coherence itself does not disallow access to a disconnected site). Also, Coherence intentionally maintains no global view of federation state: each site manages its own view of how it is forwarding updates to the other site(s).

Access to a site can be controlled via standard request fencing practices (e.g. a global load balancer), and federation itself can be managed manually via JMX. The behavior on initial connection is configurable.

Federation is brute-force. In an active-passive topology, the active site will send its entire journal content to the passive side when there is a reconnection; in active-active, both sides will send their entire journals. There is no checksum algorithm (or equivalent) used to minimize network traffic. All federation is via the journal; there is no separate concept of an initial snapshot publication. Instead, the initial snapshot is published as a side effect of entries being placed into the journal as the cache is populated. The operator may always perform a manual sync operation.

 

Federation Details

Similar to a partitioned cache, a federated cache consists of a set of partitions within each cluster. Generally, a federated cache running within a single cluster has functionality identical to a partitioned cache. Each partition has a logical outbound update queue. This queue is fault-tolerant and can survive rebalancing and failover/failback operations. There is never any coalescing of queue entries. So the sequence of inserting an object, then deleting it, then inserting it again from a different client thread will always result in exactly three journal entries (insert, delete, insert) and never in a coalesced entry (a single insert). This has some overhead for simple read-only caches (which only need the latest value) but is generally not a significant concern.

All of these outbound queues are stored using the Elastic Data feature of Coherence. This feature uses two per-JVM storage buffers — a RamJournal in on-heap RAM and a FlashJournal stored on disk (which may be local or shared, and is expected — but not required — to have performance similar to solid state disk). It is important to remember that Elastic Data is a single resource shared by all services running on the cluster member.

By default, the logical journal size (from the perspective of the federated service) is unlimited until it receives a write failure from the operating system; a size limit should be configured for production use. The outbound queue tracks which remote participants have already received each update and will purge an entry only after all sites have acknowledged receipt. The current implementation relies on a periodic garbage collection of the queue. The amount of memory used by federated caching is controlled by the RAM journal configuration (shared by federated caches and other services), which currently defaults to 25% of the maximum heap size. This can be configured with the system property -Dcoherence.ramjournal.size or via operational XML: <cluster-config>/<journaling-config>/<ramjournal-manager>/<maximum-size>. Note that there may be some short-term over-allocation (for up to a few minutes). This is most likely to be encountered during initial snapshot publication.
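
For example, a limit on the shared RAM journal can be set in the operational override file along the lines of the sketch below (the 2G value is purely illustrative), or equivalently with -Dcoherence.ramjournal.size on the command line:

    <cluster-config>
      <journaling-config>
        <ramjournal-manager>
          <!-- cap the RAM journal shared by all services on this member (illustrative value) -->
          <maximum-size>2G</maximum-size>
        </ramjournal-manager>
      </journaling-config>
    </cluster-config>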

All updates are partition-ordered. There is a partition-scoped sequence (basically an AtomicLong) that is updated by any partition-level transaction at commit time. Because this is a total ordering, non-conflicting updates within a single partition (whether against different keys or different caches) will have strictly ordered sequence numbers, even in the case of concurrent transactions. This sequence is preserved during failover and rebalancing. The journal records are a fault-tolerant structure and travel alongside the corresponding cache partition.

In previous versions, an update that failed to be applied at a remote site (e.g. a write-through CacheStore failure resulting in an exception) would shut down the federation service on the receiving node. This was generally not an issue, as most applications can be designed to never throw an exception barring a catastrophic failure. In 12.2.1.2, this behavior has been changed to log a warning instead.

 

Inter-Cluster Connectivity

There is no partition-affinity for inter-cluster connections (most updates will travel from the primary owner on one cluster to another node on the remote cluster, and then finally to the primary owner on the remote cluster). Connections operate in a similar manner to extend proxies (despite being a different code base within Coherence). Default connectivity is via the name service and uses connection-time load balancing. The developer may provide a custom load balancer.
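
As a sketch of how remote clusters are addressed, participants and their connection endpoints are declared in the operational configuration. The participant names, host names, and port below are placeholders; by default the connection is made to the name service of the remote cluster:

    <federation-config>
      <participants>
        <participant>
          <name>NewYorkCluster</name>
          <remote-addresses>
            <socket-address>
              <!-- name service address/port of the remote cluster (placeholders) -->
              <address>nyc.example.com</address>
              <port>7574</port>
            </socket-address>
          </remote-addresses>
        </participant>
        <participant>
          <name>LondonCluster</name>
          <remote-addresses>
            <socket-address>
              <address>ldn.example.com</address>
              <port>7574</port>
            </socket-address>
          </remote-addresses>
        </participant>
      </participants>
    </federation-config>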

Like other services, the federated service will either use the cluster transport for intra-cluster (non-federated) communication or can optionally be configured with its own service bus (using the <reliable-transport> element). The inter-cluster message bus connections are separate from those used for intra-cluster communication and can be configured separately: they can only be tmb or tmbs, and may use SSL regardless of whether intra-cluster communication uses SSL. SSL requires additional configuration (certificates). The socket-provider element in federated-scheme is used only for inter-cluster communication. Note that the recommended practice is generally not to use per-service reliable-transport schemes, but rather to rely on the global TransportService.
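
As a rough sketch of the inter-cluster SSL case, a socket-provider can be supplied within the federated scheme. The keystore URLs and passwords below are placeholders, the surrounding elements are abbreviated, and the exact SSL configuration (and element ordering) should be checked against the Coherence security documentation for your version:

    <federated-scheme>
      <scheme-name>federated-ssl</scheme-name>
      <service-name>FederatedCache</service-name>
      <!-- applies only to the inter-cluster (federation) connections -->
      <socket-provider>
        <ssl>
          <identity-manager>
            <key-store>
              <url>file:server.jks</url>
              <password>changeit</password>
            </key-store>
            <password>changeit</password>
          </identity-manager>
          <trust-manager>
            <key-store>
              <url>file:trust.jks</url>
              <password>changeit</password>
            </key-store>
          </trust-manager>
        </ssl>
      </socket-provider>
      <!-- backing-map-scheme, autostart and topologies as in the earlier sketch -->
    </federated-scheme>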

 

Federation Topologies

Topologies are static. If a participant is defined before it is actually available for federation, it should be disabled to prevent its outbound queue from filling.

Active-passive supports multiple active participants, and may be used to implement active-active and hub-and-spoke arrangements. Active-passive is particularly attractive for hub-and-spoke designs that require the ability to change the hub due to availability (site loss) concerns; the passive hub may be offline or simply fenced so that it is not actively receiving (and thus publishing) updates. Active-passive with only active participants is equivalent to active-active, and active-passive with a single active participant is equivalent to hub-and-spoke. While the other forms are all variations of active-passive, the central-federation topology is somewhat unique in that it provides bi-directional updates via one central cluster and zero or more remote clusters.
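
For reference, topologies are declared in the operational configuration alongside the participants. The sketch below shows an active-passive definition using the placeholder participant names from the earlier sketch; the topology name matches the one referenced by the federated scheme:

    <federation-config>
      <!-- participants declared as shown earlier -->
      <topology-definitions>
        <active-passive>
          <name>MyTopology</name>
          <active>NewYorkCluster</active>
          <passive>LondonCluster</passive>
        </active-passive>
      </topology-definitions>
    </federation-config>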

The custom topology is composed of senders, repeaters and receivers. There must be transitive closure over the groups (groups are connected by a common participant) to allow proper federation across all participants. This is not a hard requirement if global federation is not desired, and obviously there are failure modes that could produce a lack of closure.
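
A custom topology sketch is shown below. The participant names are placeholders and the grouping is purely illustrative (the intent is for changes to flow from BostonCluster through NewYorkCluster on to LondonCluster, with NewYorkCluster acting as the common participant connecting the groups); verify the sender/repeater/receiver semantics and element names against the documentation for your version:

    <topology-definitions>
      <custom>
        <name>CustomTopology</name>
        <group>
          <sender>BostonCluster</sender>
          <repeater>NewYorkCluster</repeater>
        </group>
        <group>
          <repeater>NewYorkCluster</repeater>
          <receiver>LondonCluster</receiver>
        </group>
      </custom>
    </topology-definitions>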

With an active-passive topology, the application may still make updates at the passive site, but those changes will not be pushed to other participants.
