Caching in OSB 12c without Out-Of-Process Coherence Servers

November 9, 2014 | 11 minute read
Greg Mally
Consulting Solutions Architect

Author: Ricardo Ferreira

Introduction

One of the most popular use cases for Oracle Service Bus (OSB) is the mediation of synchronous service interactions. In this scenario, a client invokes the service through a proxy instead of the actual service endpoint, guaranteeing that the consumer is decoupled from the producer. This type of architecture allows producers to be changed without impacting the consumers, allowing greater agility for projects with volatile requirements.

Synchronous services whose results do not change frequently are good candidates for having those results cached by OSB through a feature called Result Caching. This improves performance by reducing the network overhead of accessing the back-end service. Result caching can also improve scalability by reducing the load on the back-end servers that host the service. Figure 1 illustrates a client invoking a synchronous service with Result Caching enabled.


Figure 1: Client invoking a synchronous service with Result Caching enabled.

Although the Result Caching feature may seem like an obvious win, it is important to evaluate its side effects. When this feature is activated, all results are cached in the OSB JVM heap, which means the heap can fill up rapidly as service invocations accumulate. This can lead to serious garbage collection (GC) issues once the JVM starts to reclaim space after hitting the high water-mark of 80% of the heap size. Eventually, full GC pauses will start to occur and jeopardize OSB performance.

To avoid using too much heap space with Result Caching, out-of-process Coherence servers can be set up to run in their own JVMs to hold the cached results. They are termed “out-of-process” because they execute in a JVM different from the OSB JVM. The technique here is to allocate data outside the OSB JVM, letting the Coherence servers use their own heap space without affecting the heap space OSB uses to process messages. This technique is also called off-heap caching. Figure 2 shows an OSB domain using out-of-process Coherence servers to hold the cached results.


Figure 2: Implementing off-heap caching through out-of-process Coherence servers.

Besides avoiding GC issues in the OSB JVM, another great advantage of using out-of-process Coherence servers is increased data density. Instead of each OSB cluster member storing data in its own JVM heap, data is stored in a shared storage layer distributed across the cluster. Considering the total amount of data to be cached per service, this means a considerable reduction in the amount of memory required.

The downside of using out-of-process Coherence servers is a more complex, harder-to-maintain OSB domain. While this technique frees the OSB JVM from GC issues, it also introduces extra JVMs into the overall architecture, which must be maintained and consume more hardware resources.

Fortunately, there is another way to implement off-heap caching. Internally, the Result Caching feature uses Oracle Coherence as its implementation strategy. Oracle Coherence is a sophisticated in-memory data grid product licensed along with OSB. It provides many built-in cache implementations that can be used to hold data. One of these implementations is called Elastic Data, and it seamlessly stores data across dynamic random-access memory (DRAM) and disk-based devices. It is especially tuned to take advantage of fast disk-based devices such as solid state disks (SSD), enabling near memory speeds while storing and reading data from SSDs. Figure 3 shows how Elastic Data can simplify the OSB domain.


Figure 3: Implementing off-heap caching through Elastic Data.

This article will show how to activate the Elastic Data feature to implement off-heap caching in business services with Result Caching enabled. The article will also show examples of how to fine tune Elastic Data to allow multi-gigabyte caches to hold data with near-zero overhead in the OSB JVM.

Case Study: Customer Web Service

A large U.S. bank is implementing a master data management (MDM) solution, and part of the strategy is the creation of a 360° view of its customers. To allow access from different systems, a web service will be created with the capability to retrieve customer data based on customer identifiers. This web service will be exposed through OSB, the bank's corporate Enterprise Service Bus, to ensure seamless connectivity from systems using heterogeneous technologies and formats. Performance is paramount for the MDM strategy: there are approximately 100,000 customers, and the web service needs to retrieve and return results with short response times.

The customer information is assembled from data in different systems. A batch process running every 24 hours loads the data from those systems and aggregates it into a single master data set. The master data is stored in an Oracle database, and the web service needs to fetch data from there. The IT architecture team agrees that Result Caching should be used in this service, for the following reasons:

- Data does not change in 24 hours, so the results can be appropriately cached.

- Each web service call generates a result with a large payload of 640 KB.
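Taken together, these numbers give a rough sense of the cache volume involved: if the entire customer base eventually ends up cached, 100,000 results of 640 KB each amount to roughly 61 GB of data, far more than a single OSB JVM heap can reasonably hold.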

The web service implementation begins with the creation of the WSDL artifact. Figure 4 shows the WSDL artifact created in Oracle Fusion Middleware JDeveloper. The Service Bus application is then created to implement the integration scenario, which consists of retrieving the customer identifier from the input payload and performing a query against the Oracle database that holds the master data.


Figure 4: WSDL artifact of the customer service in Oracle Fusion Middleware JDeveloper.
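Because the WSDL itself appears only as a screenshot, the sketch below illustrates what such a contract might look like. All names, namespaces, types, and the endpoint address are hypothetical placeholders; the real artifact would follow the bank's canonical data model.

<?xml version="1.0" encoding="UTF-8"?>
<definitions name="CustomerService"
             targetNamespace="http://bank.example.com/customer"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="http://bank.example.com/customer">
   <types>
      <xsd:schema targetNamespace="http://bank.example.com/customer"
                  elementFormDefault="qualified">
         <!-- Request carries only the customer identifier -->
         <xsd:element name="retrieveCustomerRequest">
            <xsd:complexType>
               <xsd:sequence>
                  <xsd:element name="customerId" type="xsd:int"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
         <!-- Response carries the aggregated 360° customer record (~640 KB) -->
         <xsd:element name="retrieveCustomerResponse">
            <xsd:complexType>
               <xsd:sequence>
                  <xsd:element name="customer" type="xsd:anyType"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
      </xsd:schema>
   </types>
   <message name="retrieveCustomerRequestMessage">
      <part name="payload" element="tns:retrieveCustomerRequest"/>
   </message>
   <message name="retrieveCustomerResponseMessage">
      <part name="payload" element="tns:retrieveCustomerResponse"/>
   </message>
   <portType name="CustomerServicePortType">
      <operation name="retrieveCustomer">
         <input message="tns:retrieveCustomerRequestMessage"/>
         <output message="tns:retrieveCustomerResponseMessage"/>
      </operation>
   </portType>
   <binding name="CustomerServiceBinding" type="tns:CustomerServicePortType">
      <soap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http"/>
      <operation name="retrieveCustomer">
         <soap:operation soapAction="retrieveCustomer"/>
         <input><soap:body use="literal"/></input>
         <output><soap:body use="literal"/></output>
      </operation>
   </binding>
   <service name="CustomerService">
      <port name="CustomerServicePort" binding="tns:CustomerServiceBinding">
         <soap:address location="http://osb-host:7001/customer"/>
      </port>
   </service>
</definitions>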

The integration scenario implementation is fairly simple. A proxy service using HTTP as the transport receives SOAP messages and dispatches them to a pipeline. The request pipeline performs a message translation, converting the message from SOAP to the XML format expected by the downstream business service, and then routes the message to the business service, which uses the JCA transport to interact with the database. The business service executes an SQL query against the database to retrieve the customer data. The response pipeline converts the XML result back to SOAP format and returns it to the caller.

The development team activates the Result Caching feature in the business service, as shown in figure 5. This instructs OSB to cache all results that come from the database so that subsequent calls perform much faster. Results are cached based on the identifier of each customer. In the expiration time section, the duration field is set to 24 hours. If Coherence finds that a result has expired, it flushes that entry from the cache, and the business service queries the database again. The new result is then stored in the cache (provided it contains no error) so that it can be returned on subsequent requests.


Figure 5: The Result Caching feature being activated in the business service.
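In OSB, the cache key for each result is derived from the cache token expression configured in the Result Caching section. As a minimal sketch, assuming the hypothetical retrieveCustomerRequest element and a cus namespace prefix from the WSDL sketch above, the token expression could simply select the customer identifier from the request payload:

$body/cus:retrieveCustomerRequest/cus:customerId/text()

With a token like this, each distinct customerId maps to its own cache entry, matching the per-customer caching behavior described above.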

When the development team releases the first version of the web service, the functional tests reveal that the implementation meets all requirements as expected. The IT architecture team decides to perform a load test before releasing the web service for external usage to be sure that the expected volume will be supported.

However, unlike the functional tests, the load test reveals an unstable, unreliable implementation. With the JVM heap size set to 4 GB, the service stops working after five minutes due to out-of-memory errors, having processed only 6,292 distinct requests. Figure 6 shows the final OSB JVM state. The outcome is hardly surprising: 6,292 cached results of 640 KB each amount to roughly 3.8 GB, which alone nearly exhausts the 4 GB heap.


Figure 6: OSB JVM state after five minutes, running out-of-memory and completely saturated.

The IT architecture team realizes that the JVM overload is caused by the Result Caching feature, but they know that without caching they will not achieve the desired performance. Since the obvious fix is to increase the data storage capacity, they first try increasing the JVM heap size. However, with a larger heap, frequent full GC pauses occur, which also hurts performance. Using out-of-process Coherence servers is discussed but discarded, since the team needs an easy-to-manage OSB domain. They decide instead to implement Elastic Data for off-heap caching. With Elastic Data, the OSB JVMs remain dedicated to message processing while cached results are stored off-heap, directly in DRAM or on a disk-based device, without the overhead of additional JVMs to manage.

Getting Off-Heap through Elastic Data

Thanks to the strong integration between OSB and Coherence, enabling Elastic Data is not difficult and requires zero changes in the OSB projects since everything is done at the Coherence level. To enable Elastic Data in OSB, two steps need to be performed:

1) Set up a custom Coherence cluster: replace the Coherence cluster used in your OSB domain with one that uses an external cluster configuration file, where all the off-heap settings are defined.

2) Set up a custom cache configuration that overrides the OSB built-in cache configuration. This is necessary because Elastic Data is only activated when it is configured as the cache implementation.

These two steps will be explained in more detail in the next section.

Setting up a Custom Coherence Cluster

In order to set up a custom Coherence cluster, you need to create a cluster configuration file. Listing 1 shows an example of a cluster configuration file that includes the Elastic Data details. The interesting part of the cluster configuration file is the resource managers: one resource manager, called ramjournal-manager, configures the DRAM details, and another, called flashjournal-manager, configures the SSD details.

<?xml version='1.0'?>
<coherence>
   <cluster-config>
      <journaling-config>
         <ramjournal-manager>
            <maximum-value-size>1M</maximum-value-size>
            <maximum-size>6G</maximum-size>
            <off-heap>true</off-heap>
         </ramjournal-manager>
         <flashjournal-manager>
            <maximum-value-size>1M</maximum-value-size>
            <directory>/u01/ssd-mount-point</directory>
         </flashjournal-manager>
      </journaling-config>
   </cluster-config>
</coherence>

Listing 1: Cluster configuration file with adjustments to the resource managers.

In the ramjournal-manager section, the maximum value size was set to 1 MB, meaning that each entry can be up to 1 MB in size. For the MDM scenario, where each result is 640 KB, this is enough. The maximum size was set to 6 GB, which holds around 10% (roughly 10,000 entries, at 640 KB each) of the total volume of data expected in the MDM scenario. Finally, the off-heap flag specifies that all of this space be allocated off-heap, directly in DRAM.

When using the off-heap support, you must set the maximum total size of NIO used for direct buffer allocation. This can be accomplished through the JVM property -XX:MaxDirectMemorySize along with the other JVM properties. Here is an example: -Xms4g -Xmx4g -XX:MaxDirectMemorySize=6g.

In the flashjournal-manager section, the maximum value size was also set to 1 MB, since the entries stored there are the same ones that overflow from the ramjournal-manager. The full path of the disk-based device's mount point was also set. There is no need to set a maximum size: the flashjournal-manager sizes itself based on the free space available on the disk, up to a theoretical maximum of 2 TB, composed of 512 journal files of 4 GB each.

To complete the setup, create a Coherence cluster in the OSB domain and associate it with the OSB cluster, as shown in figure 7. This Coherence cluster should reference the external cluster configuration file.


Figure 7: Coherence cluster in the OSB domain using the external cluster configuration file.

Setting up a Custom Cache Configuration

Create a cache configuration file as shown in listing 2. The difference between the cache configuration in listing 2 and the one that comes out-of-the-box with OSB is the backing-map-scheme section: instead of the local-scheme, which is the OSB default, the ramjournal-scheme is used.

<?xml version="1.0"?>
<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
              xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config
                                  coherence-cache-config.xsd">
   <caching-scheme-mapping>
      <cache-mapping>
         <cache-name>/osb/service/ResultCache</cache-name>
         <scheme-name>elastic-data-aware-scheme</scheme-name>
      </cache-mapping>
   </caching-scheme-mapping>
   <caching-schemes>
      <distributed-scheme>
         <scheme-name>elastic-data-aware-scheme</scheme-name>
         <service-name>ORA-OSB-deployments</service-name>
         <backing-map-scheme>
            <ramjournal-scheme />
         </backing-map-scheme>
         <autostart>true</autostart>
      </distributed-scheme>
   </caching-schemes>
</cache-config>

Listing 2: Cache configuration file with the Elastic Data feature enabled.

To complete the setup, edit the Coherence cluster created in the previous section and add a new cache configuration, as shown in figure 8. Note that this cache configuration must follow specific naming rules: the Name field must be set to "/osb/service/ResultCache" and the JNDI Name field must be set to "servicebus/result-cache".


Figure 8: Coherence cluster with the new cache configuration.

How Does the Elastic Data Feature Work?

The heart of the Elastic Data feature is journaling. Journaling refers to the technique of recording state changes in a sequence of modifications called a journal. As changes occur, the journal records each value for a specific key, while a tree structure kept in memory tracks which journal entry contains the current value for each key. There are two journal implementations, the RAM Journal and the Flash Journal, and they work seamlessly with each other. If, for example, the RAM Journal runs out of memory, the Flash Journal automatically accepts the overflow, allowing caches to expand far beyond the size of DRAM.

The RAM Journal implementation allows data to be stored off-heap, which means that data is stored in memory but not necessarily in the heap space. To read and write data directly in DRAM, the RAM Journal implementation allocates NIO buffers and automatically manages the lifecycle of those buffers using its own garbage collector. It is important to note that only values are stored off-heap; keys are always allocated on-heap, which means that the heap size should be large enough to store all the keys.

Returning to the MDM scenario: even if the total expected volume of data were eventually loaded into the cache, OSB would run without memory pressure, according to the allocation scheme shown in table 1.

Storage Type    | Data Type                            | Allocation Amount | What It Means
On-Heap (DRAM)  | Customer Identifier (4-byte Integer) | ≈ 4.3 MB          | 100% of all customer keys
Off-Heap (DRAM) | Customer Data Payload (640 KB each)  | ≈ 6.1 GB          | 10% of the customer data
Off-Heap (SSD)  | Customer Data Payload (640 KB each)  | ≈ 55 GB           | 90% of the customer data

Table 1: Amount of data allocated per storage type.
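These figures follow directly from the earlier sizing: the 6 GB RAM Journal holds about 10,000 results (10,000 × 640 KB ≈ 6.1 GB), the remaining 90,000 results overflow to the Flash Journal on SSD (90,000 × 640 KB ≈ 55 GB), and the 100,000 customer keys, which always stay on-heap, account for only a few megabytes of heap even when per-entry overhead is included.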

Conclusion

With the significant increase in OSB deployments that handle large amounts of data, usage of the Result Caching feature is becoming increasingly common. However, for scenarios where the amount of cached data starts to jeopardize OSB performance, off-heap caching using out-of-process Coherence servers should be considered.

This article showed a viable alternative to the usage of out-of-process Coherence servers, which consists of leveraging the Elastic Data feature available in Oracle Coherence. Through a case study and examples, we’ve shown why and how to implement off-heap caching through Elastic Data.

It is important to note that off-heap caching through Elastic Data is not a replacement for out-of-process Coherence servers, nor a silver bullet for GC issues. It should be evaluated only if having an easy-to-manage OSB domain is a primary concern. Using out-of-process Coherence servers is the product's official solution for handling large amounts of data off-heap.

References

- Oracle Service Bus 12.1.3 Documentation, Creating and Configuring Business Services. http://docs.oracle.com/middleware/1213/osb/develop/osb-business-services.htm

- Oracle Service Bus 12.1.3 Documentation, Improving Performance by Caching Results. http://docs.oracle.com/middleware/1213/osb/develop/osb-business-services.htm#CHDDCGEE

- Oracle Coherence 12.1.3 Documentation, Implementing Storage and Backing Maps. http://docs.oracle.com/middleware/1213/coherence/develop-applications/cache_back.htm

- Oracle Coherence 12.1.3 Documentation, Using the Elastic Data Feature to Store Data. http://docs.oracle.com/middleware/1213/coherence/develop-applications/cache_back.htm#BJFEEBCB

- Oracle Coherence YouTube Channel, Introducing Elastic Data in Coherence 3.7. https://www.youtube.com/watch?v=p3aSPb4D1Hg

- Mark Smith, Oracle Service Bus Best Practices, OSB and Coherence Integration. https://blogs.oracle.com/MarkSmith/entry/osb_and_coherence_integration_1

- William Markito, Caching Strategies for Oracle Service Bus 11g. http://www.oracle.com/technetwork/articles/soa/bus-coherence-caching-421276.html
