Disaster Recovery with Oracle Kubernetes Engine

Overview of Scenarios

This document describes how applications that are built on Oracle Kubernetes Engine (OKE) can continue operating even if an entire geographic region of Oracle Cloud Infrastructure (OCI) is lost. A basic knowledge of OCI is assumed. Each scenario builds upon the previous ones, describing the incremental design differences that arise in more […]

Coherence Federated Caching Overview

Introduction

Coherence clusters rely on the ability to enlist multiple servers connected by a reasonably reliable network. This allows strong cache consistency guarantees — in the absence of a catastrophic failure, one can reason about state in a Coherence cluster in a similar manner to how one reasons about state in a local memory space. […]

Basic JVM Garbage Collection Tuning for Coherence

Because Coherence cache servers are built on top of Java, they must deal with the heap size limitations that arise from garbage collection (GC). This is often at odds with the intended use case for Coherence, which is to manage large amounts of data in memory. In order to fully utilize the available memory on […]

Using XA Transactions in Coherence-based Applications

While the costs of XA transactions are well known (e.g. increased data contention, higher latency, significant disk I/O for logging, availability challenges, etc.), in many cases they are the most attractive option for coordinating logical transactions across multiple resources.


There are a few common approaches to integrating Coherence into applications via an application server’s transaction manager:

  1. Use of Coherence as a read-only cache, applying transactions to the underlying database (or any system of record) instead of the cache.
  2. Use of the TransactionMap interface via the included resource adapter.
  3. Use of the new ACID transaction framework, introduced in Coherence 3.6.

Each of these may have significant drawbacks for certain workloads.
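
As a rough illustration of the first approach, the sketch below applies the update to the database inside a container-managed (JTA/XA) transaction and simply invalidates the corresponding cache entry, leaving the cache as a read-only, read-through view of the system of record. The cache name, table, key, and data source are illustrative, not taken from the article.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ReadOnlyCachePattern {

    private final DataSource dataSource;                                    // container-managed XA data source (assumed)
    private final NamedCache accounts = CacheFactory.getCache("accounts");  // read-only, read-through cache

    public ReadOnlyCachePattern(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Assumed to run inside a container-managed (JTA/XA) transaction,
    // e.g. an EJB or CDI bean method with a REQUIRED transaction attribute.
    public void credit(String accountId, long amountCents) throws SQLException {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            ps.setLong(1, amountCents);
            ps.setString(2, accountId);
            ps.executeUpdate();
        }

        // Invalidate rather than update: the cache remains a read-only view of
        // the system of record, and read-through repopulates it on demand.
        // Strictly speaking, the eviction should happen after commit (e.g. from
        // a JTA Synchronization) so a rolled-back change is never cached.
        accounts.remove(accountId);
    }
}
```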


Using WKA in Large Coherence Clusters (Disabling Multicast)

Disabling hardware multicast (by configuring well-known addresses, or WKA) will place significant stress on the network. For messages that must be sent to multiple servers, rather than having a server send a single packet to the switch and having the switch broadcast that packet to the rest of the cluster, the server must send a separate packet to each of the other servers.
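
For reference, the WKA list is normally supplied in the operational override file or via the well-known-address system property; the minimal sketch below sets the property programmatically before the JVM joins the cluster. The address shown is a placeholder, and the property name varies by release, so treat this as an assumption to verify against your Coherence version.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.Cluster;

public class WkaMemberLauncher {
    public static void main(String[] args) {
        // Point this member at a well-known address before joining the cluster.
        // Recent releases use "coherence.wka"; older releases use
        // "tangosol.coherence.wka". Multiple WKA entries are usually listed in
        // the operational override file instead. The address is a placeholder.
        System.setProperty("coherence.wka", "10.0.0.11");

        Cluster cluster = CacheFactory.ensureCluster();
        System.out.println("Joined cluster as member "
                + cluster.getLocalMember().getId());
    }
}
```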


Using the Coherence ConcurrentMap Interface (Locking API)

For many developers using Coherence, the first place they look for concurrency control is the com.tangosol.util.ConcurrentMap interface (which the NamedCache interface extends).


The ConcurrentMap interface includes methods for explicitly locking data. Despite the obvious appeal of a lock-based API, these methods should generally be avoided for a variety of reasons.
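
To make the trade-off concrete, here is a minimal sketch (cache name, key, and value type are made up) contrasting explicit ConcurrentMap locking with the usual alternative, an EntryProcessor, which performs the read-modify-write atomically on the storage member that owns the entry and needs no explicit lock.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

public class LockingVsEntryProcessor {

    // EntryProcessor alternative: executed atomically against the entry on the
    // member that owns it, so no explicit locking is required.
    public static class IncrementProcessor extends AbstractProcessor {
        private final int amount;

        public IncrementProcessor(int amount) {
            this.amount = amount;
        }

        @Override
        public Object process(InvocableMap.Entry entry) {
            Integer balance = (Integer) entry.getValue();
            entry.setValue(balance == null ? amount : balance + amount);
            return null;
        }
    }

    public static void main(String[] args) {
        NamedCache accounts = CacheFactory.getCache("accounts");
        String key = "ACCT-42";

        // Explicit locking via the ConcurrentMap API: every caller must
        // cooperate, and the read-modify-write still costs extra network trips.
        if (accounts.lock(key, -1)) {   // -1 blocks until the lock is obtained
            try {
                Integer balance = (Integer) accounts.get(key);
                accounts.put(key, (balance == null ? 0 : balance) + 100);
            } finally {
                accounts.unlock(key);
            }
        }

        // The same update as a single atomic, server-side operation.
        accounts.invoke(key, new IncrementProcessor(100));
    }
}
```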


Using Queries with Coherence Write-Behind Caches

Applications that use write-behind caching and wish to query the logical entity set have the option of querying the NamedCache itself or querying the database.
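
As a small illustration (cache name, extractor, and filter value are hypothetical), a filter query against the NamedCache reflects the current in-memory state, including updates that the write-behind queue has not yet flushed to the database; an equivalent SQL query run against the database at the same moment could miss those recent updates.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.extractor.ReflectionExtractor;
import com.tangosol.util.filter.EqualsFilter;

import java.util.Set;

public class WriteBehindQueryExample {
    public static void main(String[] args) {
        NamedCache orders = CacheFactory.getCache("orders");

        // With write-behind, the cache holds the newest state; some of it may
        // not have reached the database yet. Querying the cache therefore sees
        // all current data, provided the cache holds the complete entity set.
        Set openOrders = orders.entrySet(
                new EqualsFilter(new ReflectionExtractor("getStatus"), "OPEN"));

        System.out.println("Open orders (from cache): " + openOrders.size());
    }
}
```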


Using Queries with Coherence Read-Through Caches

Applications that rely on partial caches of databases, and use read-through to maintain those caches, have some trade-offs if queries are required. Coherence does not support push-down queries, so queries will apply only to data that currently exists in the cache.
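
A short sketch of this trade-off (cache name, key, and filter are hypothetical): key-based access goes through the CacheLoader and faults missing entries in from the database, while a filter-based query evaluates only the entries already resident in the cache and may therefore return partial results.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.filter.EqualsFilter;

import java.util.Set;

public class ReadThroughQueryExample {
    public static void main(String[] args) {
        NamedCache customers = CacheFactory.getCache("customers");

        // get()/getAll() trigger read-through: a miss is loaded from the
        // database via the configured CacheLoader.
        Object customer = customers.get("C-1001");

        // A filter query does NOT trigger read-through: it sees only the data
        // currently in the cache, so the result may be incomplete unless the
        // cache is a full copy of the underlying table.
        Set goldCustomers = customers.entrySet(new EqualsFilter("getTier", "GOLD"));
        System.out.println("Gold customers currently cached: " + goldCustomers.size());
    }
}
```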

Using Bulk Operations with Coherence Off-Heap Storage

Some NamedCache methods (including clear(), entrySet(Filter), aggregate(Filter, …), invoke(Filter, …)) may generate large intermediate results. These intermediate results may be large enough to cause out-of-memory exceptions on cache servers, and in some cases on cache clients. This may be particularly problematic if out-of-memory exceptions occur on more than one server (since these operations may be cluster-wide) or if these exceptions cause additional memory use on the surviving servers as they take over partitions from the failed servers.
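
One commonly documented way to bound these intermediate results is to run the operation one partition (or a small batch of partitions) at a time using PartitionedFilter, roughly as sketched below; the cache name and filter are illustrative.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.PartitionedService;
import com.tangosol.net.partition.PartitionSet;
import com.tangosol.net.partition.PartitionedFilter;
import com.tangosol.util.Filter;
import com.tangosol.util.filter.EqualsFilter;

import java.util.Set;

public class PartitionedQueryExample {

    // Evaluates the filter one partition at a time so that no single request
    // materializes the full cluster-wide result set at once.
    public static void processByPartition(NamedCache cache, Filter filter) {
        PartitionedService service = (PartitionedService) cache.getCacheService();
        int cPartitions = service.getPartitionCount();

        PartitionSet parts = new PartitionSet(cPartitions);
        for (int iPartition = 0; iPartition < cPartitions; iPartition++) {
            parts.add(iPartition);
            Set entries = cache.entrySet(new PartitionedFilter(filter, parts));
            // ... process this partition's entries, then release them ...
            parts.remove(iPartition);
        }
    }

    public static void main(String[] args) {
        NamedCache orders = CacheFactory.getCache("orders");
        processByPartition(orders, new EqualsFilter("getStatus", "OPEN"));
    }
}
```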

Coherence Data Guarantees for Data Reads – Basic Terminology

When integrating Coherence into applications, each application has its own set of requirements with respect to data integrity guarantees. Developers often describe these requirements using expressions like “avoiding dirty reads” or “making sure that updates are transactional”, but we often find that even in a small group of people, there may be a wide range of opinions as to what these terms mean. This may simply be due to a lack of familiarity, but given that Coherence sits at an intersection of several (mostly) unrelated fields, it may be a matter of conflicting vocabularies (e.g. “consistency” means related but distinct things in transaction processing and in multi-threaded programming).

Coherence Query Performance in Large Clusters

Large clusters (measured in terms of the number of storage-enabled members participating in the largest cache services) may introduce challenges when issuing queries. There is no particular cluster-size threshold for this; rather, there is a gradually increasing tendency for issues to arise as the cluster grows.
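
One general mitigation, not specific to large clusters but worth keeping in mind, is to index the attributes used by filters so that each storage member can evaluate a query without deserializing and inspecting every entry it owns. A minimal sketch (cache and attribute names are made up) follows.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.extractor.ReflectionExtractor;
import com.tangosol.util.filter.EqualsFilter;

import java.util.Set;

public class IndexedQueryExample {
    public static void main(String[] args) {
        NamedCache trades = CacheFactory.getCache("trades");

        // Index the attribute used by the filter so each storage member can
        // answer the query from the index instead of scanning its entries.
        ReflectionExtractor bySymbol = new ReflectionExtractor("getSymbol");
        trades.addIndex(bySymbol, /* fOrdered */ false, /* comparator */ null);

        Set matches = trades.entrySet(new EqualsFilter(bySymbol, "ORCL"));
        System.out.println("Matching trades: " + matches.size());
    }
}
```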


Implementing a Custom Coherence PartitionAssignmentStrategy

A recent A-Team engagement required the development of a custom PartitionAssignmentStrategy (PAS). By way of background, a PAS is an implementation of a Java interface that controls how a Coherence partitioned cache service assigns partitions (primary and backup copies) across the available set of storage-enabled members. While seemingly straightforward, this is actually a very difficult problem to solve.

Impact of Server Failure on Coherence Request Processing

Requests against a given cache server may be temporarily blocked for several seconds following the failure of other cluster members. This may cause issues for applications that cannot tolerate multi-second response times even during failover processing. In general, Coherence is designed around the principle that failures in one member should not affect the rest of the cluster if at all possible. However, it’s obvious that if that failed member was managing a piece of state that another member depends on, the second member will need to wait until a new member assumes responsibility for managing that state.

Dealing with Fine-Grained Cache Entries in Coherence

On occasion we have seen significant memory overhead when using very small cache entries. Consider the case where there is a small key (say a synthetic key stored in a long) and a small value (perhaps a number or short string). With most backing maps, each cache entry will require an instance of Map.Entry, and in the case of a LocalCache backing map (used for expiry and eviction), there is additional metadata stored (such as last access time). Given the size of this data (usually a few dozen bytes) and the granularity of Java memory allocation (often a minimum of 32 bytes per object, depending on the specific JVM implementation), it is easily possible to end up with the case where the cache entry appears to be a couple dozen bytes but ends up occupying several hundred bytes of actual heap, resulting in anywhere from a 5x to 10x increase in stated memory requirements. In most cases, this increase applies to only a few small NamedCaches, and is inconsequential — but in some cases it might apply to one or more very large NamedCaches, in which case it may dominate memory sizing calculations.

CPU Usage in Very Large Coherence Clusters

When sizing Coherence installations, one of the complicating factors is that these installations (by their very nature) tend to be application-specific, with some being large, memory-intensive caches, others acting as I/O-intensive transaction-processing platforms, and still others performing CPU-intensive calculations across the data grid.

Still, for most reasonably designed applications, a linear resource model will be accurate across most levels of scale. At extreme scale, however, sizing becomes a bit more complicated, as certain cluster management operations, while very infrequent, become increasingly critical.