How Many ODI Master Repositories Should We Have?

Introduction

A question that often comes up is how many Master Repositories should be created for a proper ODI architecture. The short answer is that this will depend on the network configuration. This article will elaborate on this very simplistic answer and will help define best practices for the architecture of ODI repositories in a corporate environment.

Master and Work Repositories

Before delving into the specifics of Master Repositories, it is important to have a good understanding of what Master Repositories are and how they differ from Work Repositories.

Master Repositories are used to store what can be considered sensitive information:

–           Information related to the connection to source and target systems: JDBC URLs, user names and passwords used to connect to the different systems, LDAP connectivity information, and schemas where data can be found. In fact, all the information that is managed with the Topology navigator is stored in the Master Repository;

–          Information related to ODI internal security, when security is handled by ODI rather than delegated to an external server for authentication and (in ODI 12c and later) roles management: ODI user names and passwords, user privileges and profiles, and profile definitions;

–          Versions: when a new version of an object is created in the ODI studio, it is saved in the Master Repository.

Work Repositories are used to store the objects that result from the work of developers:

–          Source and target metadata (models in the Designer navigator);

–          Projects and all their child objects: folders, interfaces (ODI 11g and earlier releases) or mappings (ODI 12c and later), packages, procedures, variables, sequences, Knowledge Modules, and user functions;

–          Scenarios, load plans and schedules;

–          Logs resulting from code execution.

There are also Work Repositories that are labeled “Execution” Work Repositories. These can be used in production environments to make sure that source code will not be modified hastily in a live environment. These Work Repositories only contain scenarios, load plans, schedules and the execution logs.

Repository relationships

All Work Repositories are attached to a Master Repository. A Master Repository can group multiple Work Repositories, but a Work Repository is attached to one and only one Master Repository.

When multiple Work Repositories are attached to the same Master Repository, they all share the same Topology and Security definitions. Versioned objects can be restored in any of the Work Repositories sharing the same Master Repository.
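The one-to-many relationship described above can be sketched in a few lines of Python. This is only an illustrative model, not the ODI SDK: the class names, the `execution_only` flag and the repository names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRepository:
    """Illustrative stand-in: holds topology, security and versions."""
    name: str
    # repr=False keeps printed output short (the two classes reference each other).
    work_repositories: list = field(default_factory=list, repr=False)


@dataclass
class WorkRepository:
    """Illustrative stand-in: attached to one and only one Master Repository."""
    name: str
    master: MasterRepository
    execution_only: bool = False  # "Execution" repositories hold no source code

    def __post_init__(self):
        # Register with the single Master Repository this one is attached to.
        self.master.work_repositories.append(self)


# Hypothetical repository names, matching the three environments of Figure 1.
master = MasterRepository("MASTER")
dev = WorkRepository("DEV", master)
test = WorkRepository("TEST", master, execution_only=True)
prod = WorkRepository("PROD", master, execution_only=True)

# Every Work Repository reaches the same topology and security through `master`.
print([w.name for w in master.work_repositories])
```

Because each `WorkRepository` carries a single `master` reference, the "one and only one" rule is enforced by the shape of the model rather than by a separate check.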

When several Work Repositories are attached to the same Master Repository, each repository typically matches an execution environment so that different versions of the objects can be used in parallel. The use of execution contexts will allow for the execution of the objects in the proper environment.
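To make the role of contexts concrete, here is a minimal sketch of how a context selects the physical schema behind a logical schema. The lookup table, the schema names and the `resolve` helper are all hypothetical; in ODI this mapping is maintained in the Topology navigator, not in code.

```python
# Hypothetical names throughout: the logical schema, the contexts and the
# physical schemas are illustrative and do not come from a real topology.
TOPOLOGY = {
    ("SALES_DWH", "DEVELOPMENT"): "dev-db.SALES_DEV",
    ("SALES_DWH", "TEST"): "test-db.SALES_TEST",
    ("SALES_DWH", "PRODUCTION"): "prod-db.SALES",
}


def resolve(logical_schema: str, context: str) -> str:
    """Return the physical schema that a logical schema maps to in a context."""
    try:
        return TOPOLOGY[(logical_schema, context)]
    except KeyError:
        raise LookupError(
            f"No physical schema for {logical_schema!r} in context {context!r}"
        ) from None


# The same object, written against SALES_DWH, runs against a different
# physical schema depending on the context chosen at execution time.
print(resolve("SALES_DWH", "TEST"))
```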

Figure 1 represents an environment where we would have 3 Work Repositories sharing the same Master Repository: development, test and production.

Figure 1: three Work Repositories sharing a single Master Repository

In this example, release 1.0 of the scenarios can be running in production while release 2.0 is developed and tested in separate Work Repositories. The versioned source code for release 1.0 of the scenarios is available in the Master Repository.

Challenges encountered in a corporate environment

In a corporate environment, it is to be expected that the production environment is isolated from the rest of the information systems, and in particular from the development and test environments. Often, firewalls will prevent data exchanges and communication between these environments.

This will force us into having a separate Master Repository for the production environment. We have to expect that the architecture in a corporate environment will look more like the one represented in Figure 2 below.

Figure 2: Three Work Repositories in a corporate environment with Firewall.

In architectures of this type we can still take advantage of the notion of contexts as we move objects from the Development repository to the Test/QA repository (and take advantage of one single Master Repository inside the firewall). From then on, the promotion of objects to the production environment is limited to the components that have been validated in the Test/QA environment.

The synchronization of the Topology objects is usually quite limited: logical schema names will have to match, and contexts will have to match. But the physical architecture will be specific to each environment.
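As a rough illustration of this limited synchronization, the sketch below compares only the names of logical schemas and contexts between two Master Repositories and ignores the physical architecture entirely. The dictionaries stand in for topology exports; their layout, the names in them and the `topology_gaps` helper are assumptions made for the example.

```python
def topology_gaps(source: dict, target: dict) -> dict:
    """List the logical schemas and contexts defined in source but not in target."""
    return {
        "logical_schemas": sorted(
            set(source["logical_schemas"]) - set(target["logical_schemas"])
        ),
        "contexts": sorted(set(source["contexts"]) - set(target["contexts"])),
    }


# Hypothetical topology summaries for the two Master Repositories of Figure 2.
inside_firewall = {
    "logical_schemas": ["SALES_DWH", "CRM_SRC"],
    "contexts": ["DEVELOPMENT", "TEST", "PRODUCTION"],
}
production = {
    "logical_schemas": ["SALES_DWH"],
    "contexts": ["DEVELOPMENT", "TEST", "PRODUCTION"],
}

# Anything reported here has to be created in the production Master
# Repository before the objects that depend on it are promoted.
print(topology_gaps(inside_firewall, production))
```

Physical servers and schemas are deliberately left out of the comparison, since they are expected to differ from one environment to the next.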

The bulk of the objects promoted to the production environment consists of Scenarios and Load Plans that have been successfully tested.

One challenge with the setup we have so far is that the process of promoting objects to the production environment is not tested.

A more robust approach is to validate the process of promoting scenarios and topology before it is run against production. To do so, we do not have to operate behind the firewall, but we do need to replicate the production environment. A pre-production environment (one that looks like production, but is not production) will allow us to perform this validation. This new approach is represented in Figure 3.

Figure 3: Corporate environment with Firewall and pre-production repositories

There will be customers who will want to have dedicated Master Repositories for each environment and each repository. This is absolutely a valid choice, but sharing Master Repositories will reduce the number of administrative tasks (such as ODI upgrades, topology updates, etc.) and allow for more flexibility in the evolution of the infrastructure.

Expanding the infrastructure

Now that we have a solid foundation for our infrastructure we can expand further.

Let’s look back at our original example: we had version 1.0 of the scenario in the production repository and version 2.0 in the development repository. We still need to be able to fix potential problems in the production environment, but for safety reasons we do not have the source code in that environment (this prevents over-zealous developers from introducing untested fixes directly in a production environment, with potentially disastrous effects). A common solution is to introduce a “Hotfix” repository, where the source code of the objects used in production can be restored and corrected as needed. Corrected objects can then be tested again before they are promoted to the production environment. This is another case for a repository that would be “like” production, without using the actual production repository. Here we can share the same Master Repository as the pre-production repository, as shown in Figure 4.

Figure 4: introducing a repository to fix issues identified in the production environment.

Repository infrastructure and object promotion

We can now superimpose object movements over our architecture. Figure 5 represents these movements as follows:

–          Orange arrows represent the required synchronizations from a topology perspective. However, since the physical definition of servers will be different from environment to environment, only the name of the Logical Schemas and Contexts must be synchronized.

–          Red arrows represent the movement of scenarios and load plans (execution components).

–          Yellow arrows represent the movement of source objects in and out of source control (internal or external to ODI).

Note: if you look carefully, you will notice a small discrepancy in this picture. We represent the ability to restore source code in a test repository that is marked as an “Execution” Work Repository, and obviously you cannot import source code into such a repository. But some customers will want to have the source code available in their validation environment to allow for more intelligent testing. If you have that preference, then use a Development repository for your Test environment. If not, just remove the arrow that connects the test environment to source control.

Figure 5: Complete environment with detailed object movements.

From this picture we can see the following movements:

–          Scenarios and load plans (future production objects) are loaded from the development repository to the test repository for validation. Once validated, they are loaded to the pre-production repository so that the promotion process can be validated. Upon success, the objects are promoted to production.

–          The source code for promoted objects is versioned as the scenarios and load plans are promoted to the test environment (obviously intermediate versions can be created independently of the promotions). When objects are promoted to production, the matching source code can be restored in a hotfix environment to make sure that it is available in case issues are identified in the production environment.

–          If fixes are performed in the hotfix environment, the corrected source code is versioned. At the same time Scenarios and Load Plans are promoted to the test environment. From there, they follow the same path as the objects promoted from the development environment: from Test to pre-production and ultimately to production.
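The movements listed above amount to a small promotion graph, which can be sketched as follows. The environment labels and the `path_to_production` helper are illustrative names chosen for this example; they are not ODI objects or commands.

```python
# Each environment promotes its scenarios and load plans to exactly one
# next environment; both development and hotfix feed into test.
PROMOTION_PATH = {
    "development": "test",
    "hotfix": "test",
    "test": "pre-production",
    "pre-production": "production",
}


def path_to_production(start: str) -> list:
    """Return the ordered list of environments an object passes through."""
    if start != "production" and start not in PROMOTION_PATH:
        raise ValueError(f"Unknown environment: {start!r}")
    path = [start]
    while path[-1] != "production":
        path.append(PROMOTION_PATH[path[-1]])
    return path


# A fix made in the hotfix repository follows the same route as new
# development once it reaches the test environment.
print(" -> ".join(path_to_production("hotfix")))
```

Keeping a single successor per environment reflects the point made in the article: nothing reaches production without first passing through test and pre-production.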

For more ODI best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit Oracle A-Team Chronicles for ODI.

Comments

  1. Loudymer Garcia says:

    Hello,

    I just want to inquire regarding the master repository IDs. We want to create ODI setup that will be the same as Figure2, should we create master repositories with different IDs for Dev and Prod? Will the codes be migrated properly across environments using “Synonym mode INSERT_UPDATE” without encountering reference ID issues?
    Thanks in advance. 🙂
