Understanding the Endeca CAS & EAC APIs

Introduction

I’ve always felt that the best way to understand something is to take it apart and try to put it back together. In this post we’ll do that by deconstructing the Endeca application scripts and reconstructing them in Java, revealing their inner workings and familiarizing developers with the Endeca CAS, RecordStore, and EAC APIs. Beyond exploring these APIs, the solutions presented here may be useful to Endeca application developers who need more flexibility and control than the default scripts provide, and to those who prefer Java over BeanShell and shell scripts.

Main Article

The Endeca CAS Server is a Jetty-based servlet container that manages record stores, dimensions, and crawling operations. The CAS Server API is an interface for interacting with the CAS Server. By default, the CAS Service runs on port 8500. Similarly, the Endeca EAC Central Server runs on Tomcat and coordinates the command, control, and monitoring of EAC applications. By default, it runs on port 8888. Each of these servers, along with its API, is explained in the following Endeca documents:

Content Acquisition System Developer’s Guide
Content Acquisition System API Guide
Platform Services Application Controller Guide

We will use these APIs to re-write the scripts generated by the deployment template for the Discover Electronics reference application, using Java instead of shell script.

To begin, we need to generate the scripts that we’ll be converting. Detailed instructions for this procedure are provided in the CAS Quick Start Guide, but the basic syntax for deploying the Endeca Discover Electronics CAS application is:

cd \Endeca\ToolsAndFrameworks\11.1.0\deployment_template\bin
deploy --app C:\Endeca\ToolsAndFrameworks\11.1.0\reference\discover-data-cas\deploy.xml

Make sure to answer N when prompted to install a base deployment.

Once the deploy command has finished, you should see the following files included in the C:\Endeca\Apps\Discover\control directory:

initialize_services.bat      
load_baseline_test_data.bat  
baseline_update.bat          
promote_content.bat          

These are the scripts that we will be re-writing in Java. After running our Java application, we should be able to navigate to the following URLs and see the same results as having executed the above scripts:

http://localhost:8006/discover
http://localhost:8006/discover-authoring

initialize_services

The first script we will analyze is initialize_services. Opening the file in a text editor, we see that the first thing it does is set some environment variables. Java applications customarily read their settings from property files rather than environment variables, so we’ll create a config.properties file to store our configuration and load it as follows:

try {
    configProperties.load(ResourceHelper.class.getClassLoader().getResourceAsStream("config.properties"));
} catch (IOException e) {
    log.error("Cannot load configuration properties.", e);
}
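
One gap in the snippet above: getResourceAsStream() returns null when config.properties is not on the classpath, in which case Properties.load() throws a NullPointerException rather than the IOException being caught. Here is a minimal sketch of a more defensive loader that uses only the JDK. The class name ConfigLoader, its method names, and the fallback behavior are illustrative assumptions and are not taken from the attached source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical helper (not part of the attached source): defensive property loading. */
final class ConfigLoader {

    /** Loads properties from a stream; a null stream yields an empty Properties object. */
    static Properties load(InputStream in) {
        Properties props = new Properties();
        if (in == null) {
            return props; // resource was not found, so nothing to load
        }
        try (InputStream stream = in) {
            props.load(stream);
        } catch (IOException e) {
            // Wrapped so that callers are not forced to handle a checked exception.
            throw new IllegalStateException("Cannot load configuration properties.", e);
        }
        return props;
    }

    /** Loads a classpath resource such as "config.properties". */
    static Properties loadFromClasspath(String resourceName) {
        return load(ConfigLoader.class.getClassLoader().getResourceAsStream(resourceName));
    }

    /** Reads an integer property, falling back to a default when absent or malformed. */
    static int getInt(Properties props, String key, int defaultValue) {
        String value = props.getProperty(key);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

With this in place, a call such as ConfigLoader.getInt(props, "cas.port", 8500) would fall back to the default CAS port mentioned earlier when the key is missing.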

Next, the script checks if the --force argument was specified. If it was, the script removes any existing crawl configuration, record stores, dimension value id managers, and lastly the application. The code below shows how to remove the crawl configuration, record stores, and dimval id managers:

public static CasCrawler getCasCrawler() throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    CasCrawlerLocator locator = CasCrawlerLocator.create(host, port);
    locator.setPortSsl(Boolean.parseBoolean(getConfigProperty("cas.ssl")));
    locator.ping();
    return locator.getService();
}

public static ComponentInstanceManager getComponentInstanceManager() throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    ComponentInstanceManagerLocator locator = ComponentInstanceManagerLocator.create(host, port);
    locator.setPortSsl(Boolean.parseBoolean(getConfigProperty("cas.ssl")));
    locator.ping();
    return locator.getService();
}

public static boolean deleteCrawl(String id) {
    try {
        getCasCrawler().deleteCrawl(new CrawlId(id));
        return true; // crawl removed
    } catch (ItlException|IOException e) {
        log.error("Unable to delete crawl '"+id+"'", e);
        return false; // removal failed
    }
}

public static void deleteComponentInstance(String name) {
    try {
        getComponentInstanceManager().deleteComponentInstance(new ComponentInstanceId(name));
    } catch (ComponentManagerException|IOException e) {
        log.error("Unable to delete component instance '"+name+"'", e);
    }
}

However, removing the application is a bit more involved and requires interacting with the EAC, whose configuration is stored in AppConfig.xml:

<app appName="Discover" eacHost="jprantza01" eacPort="8888" 
    dataPrefix="Discover" sslEnabled="false" lockManager="LockManager">
  <working-dir>${ENDECA_PROJECT_DIR}</working-dir>
  <log-dir>./logs</log-dir>
</app>

So we need to load AppConfig.xml, which is a Spring-based ApplicationContext configuration file:

String appConfig = getConfigProperty("app.config");
Resource appConfigResource = new FileSystemResource(appConfig); // try the file system first
if (!appConfigResource.exists()) {
    appConfigResource = new ClassPathResource(appConfig); // fall back to the classpath
}
if (!appConfigResource.exists()) {
    log.error("Cannot load application configuration: "+appConfig);
} else {
    XmlBeanDefinitionReader xmlReader = new XmlBeanDefinitionReader(appContext);
    xmlReader.loadBeanDefinitions(appConfigResource);
    PropertyPlaceholderConfigurer propertySubstituter = new PropertyPlaceholderConfigurer();
    propertySubstituter.setIgnoreResourceNotFound(true);
    propertySubstituter.setIgnoreUnresolvablePlaceholders(true);
    appContext.addBeanFactoryPostProcessor(propertySubstituter);
    appContext.refresh();
}

Note that the propertySubstituter (PropertyPlaceholderConfigurer) is necessary to allow for expansion of properties like ${ENDECA_PROJECT_DIR}. These properties must exist in your environment.
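
To make the substitution step concrete, here is a minimal sketch of ${name} expansion using only the JDK. It is not Spring’s implementation, which is considerably more capable; the class name PlaceholderExpander is hypothetical. Like the configurer above with setIgnoreUnresolvablePlaceholders(true), the sketch leaves unknown placeholders untouched rather than failing.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of ${name} placeholder expansion; not Spring's code. */
final class PlaceholderExpander {

    // Matches ${name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value from the map; unknown names are left as-is. */
    static String expand(String text, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group(0);
            // quoteReplacement keeps backslashes and dollar signs in Windows paths literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

For example, PlaceholderExpander.expand("${ENDECA_PROJECT_DIR}/logs", System.getenv()) would resolve the placeholder from the process environment if that variable is set.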

Once the appContext has been loaded, we can remove an app by retrieving all beans of type Component or CustomComponent and removing their definitions with:

public static void removeApp(String appName) {
    try {
        Collection<Component> components = getAppContext().getBeansOfType(Component.class).values();
        if (components.size() > 0) {
            Application app = toApplication(components.iterator().next());
            if (app.isDefined() && app.getAppName().equals(appName)) {
                Collection<CustomComponent> customComponents = getAppContext().getBeansOfType(CustomComponent.class).values();
                for (CustomComponent customComponent: customComponents) {
                    try {
                        customComponent.removeDefinition();
                    } catch (EacComponentControlException e) {
                        log.error("Unable to remove definition for "+customComponent.getElementId(), e);
                    }
                }
                app.removeDefinition();
            }
            else {
                log.warn("Application '"+appName+"' is not defined.");
            }
        }
    }
    catch (AppConfigurationException|EacCommunicationException|EacProvisioningException e) {
        log.error("Unable to remove application '"+appName+"'", e);
    }
}

Provided that the app state is clean, the script then goes on to create the record stores, create the dimension value id managers, and set the configuration on the data record store, which can be accomplished using the following code:

public static void createComponentInstance(String type, String name) {
    try {
        getComponentInstanceManager().createComponentInstance(new ComponentTypeId(type), new ComponentInstanceId(name));
    } catch (ComponentManagerException|IOException e) {
        log.error("Unable to create "+type+" instance '"+name+"'", e);
    }
}

public static void setConfiguration(RecordStore recordStore, File configFile) {
    try {
        recordStore.setConfiguration(RecordStoreConfiguration.load(configFile));
    } catch (RecordStoreConfigurationException e) {
        StringBuilder errorText = new StringBuilder();
        for (RecordStoreConfigurationError error: e.getFaultInfo().getErrors()) {
            errorText.append(error.getErrorMessage()).append("\n");
        }
        log.error("Invalid RecordStore configuration:\n"+errorText);
    } catch (RecordStoreException e) {
        log.error("Unable to set RecordStore configuration", e);
    }
}

It then calls out to the following BeanShell script, found in InitialSetup.xml:

<script id="InitialSetup">
  <bean-shell-script>
    <![CDATA[ 
  IFCR.provisionSite();
  CAS.importDimensionValueIdMappings("Discover-dimension-value-id-manager", 
      	InitialSetup.getWorkingDir() + "/test_data/initial_dval_id_mappings.csv");
    ]]>
  </bean-shell-script>
</script>

Now, if we wanted to convert these scripts to Java as well, we could do the following:

IFCRComponent ifcr = (IFCRComponent) getAppContext().getBean("IFCR", IFCRComponent.class);
ifcr.provisionSite();
...

But to keep this exercise simple, I chose not to convert the BeanShell scripts, and leave that as an exercise for the reader. All that the BeanShell scripts do is bind to Spring beans that are defined elsewhere in the configuration and call their Java methods. For example, the IFCR component is defined in WorkbenchConfig.xml.

Instead, to execute the BeanShell scripts, you can use the convenience method invokeBeanMethod():

try {
    invokeBeanMethod("InitialSetup", "run");
} catch (IllegalAccessException|InvocationTargetException e) {
    log.warn("Failed to configure EAC application. Services not initialized properly.", e);
    releaseManagedLocks();
}
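
The body of invokeBeanMethod() is not shown in this post. One plausible shape is a reflective lookup of a no-argument method on a named bean, which would explain the two exception types caught above. In the sketch below a plain Map stands in for the Spring application context, and the class name BeanInvoker is hypothetical; treat it as an illustration of the idea rather than the code in the attached source.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical sketch: invoke a public no-arg method on a bean looked up by name. */
final class BeanInvoker {

    /**
     * Looks up the bean, finds a public no-argument method with the given name,
     * and invokes it. The Map is a stand-in for an application context.
     */
    static Object invoke(Map<String, Object> beans, String beanName, String methodName)
            throws IllegalAccessException, InvocationTargetException {
        Object bean = beans.get(beanName);
        if (bean == null) {
            throw new IllegalArgumentException("No bean named '" + beanName + "'");
        }
        try {
            Method method = bean.getClass().getMethod(methodName);
            return method.invoke(bean);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                    "Bean '" + beanName + "' has no public method " + methodName + "()", e);
        }
    }
}
```

An exception thrown by the invoked method itself arrives wrapped in InvocationTargetException, so a caller that wants the underlying failure should log getCause() as well.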

After the initial setup is complete, we can create the crawl configuration using the following code:

public static void createCrawl(CrawlConfig config) {
    try {
        List<ConfigurationMessage> messages = getCasCrawler().createCrawl(config);
        StringBuilder messageText = new StringBuilder();
        for (ConfigurationMessage message: messages) {
            messageText.append(message.getMessage()).append("\n");
        }
        log.info(messageText.toString());
    }
    catch (CrawlAlreadyExistsException e) {
        log.error("Crawl unsuccessful. A crawl with id '"+config.getCrawlId()+"' already exists.");
    }
    catch (InvalidCrawlConfigException|IOException e) {
        log.error("Unable to create crawl "+config.getCrawlId(), e);
    }
}

Finally, to import the content we can use either invokeBeanMethod() to call methods on the IFCR component, or look up the IFCRComponent using getBean() and call the import methods on it directly.

load_baseline_test_data

The next script, load_baseline_test_data, is responsible for loading the test data into the record stores. The two record stores that need to be populated are Discover-data and Discover-dimvals. They are populated using the data from the following files:

Record Store        Data File
Discover-data       C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_data.xml.gz
Discover-dimvals    C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_dimvals.xml.gz

To do this, we’ll first need to create or locate the record stores:

public static RecordStore getRecordStore(final String instanceName) throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    RecordStoreLocator locator = RecordStoreLocator.create(host, port, instanceName);
    locator.ping();
    return locator.getService();
}

Then, the following code can be used to load the data:

public boolean loadData(final String recordStoreName, final String dataFileName, final boolean isBaseline) {
    File dataFile = new File(dataFileName);
    if (!dataFile.exists() || !dataFile.isFile()) { // verify file exists
        log.error("Invalid data file: " + dataFile);
        return false; // failure
    }
    TransactionId txId = null;
    RecordReader reader = null;
    RecordStoreWriter writer = null;
    RecordStore recordStore = null;
    int numRecordsWritten = 0;
    try {
        recordStore = getRecordStore(recordStoreName);
        txId = recordStore.startTransaction(TransactionType.READ_WRITE);
        reader = RecordIOFactory.createRecordReader(dataFile);
        writer = RecordStoreWriter.createWriter(recordStore, txId, 500);
        if (isBaseline) {
            writer.deleteAll();
        }
        for (; reader.hasNext(); numRecordsWritten++) {
            writer.write(reader.next());
        }
        close(writer); // must close before commit
        recordStore.commitTransaction(txId);
        log.info(numRecordsWritten + " records written.");
    }
    catch (IOException|RecordStoreException e) {
        log.error("Unable to update RecordStore '"+recordStoreName+"'", e);
        rollbackTransaction(recordStore, txId);
        return false; // failure
    }
    finally {
        close(reader);
        close(writer);
    }
    return true; // success
}

This code opens a read-write transaction on the record store, removes all existing records when isBaseline is true, and then writes every record from the data file to the record store. It then either commits the transaction or rolls it back on failure, and closes any open resources. The method is called once for each record store, and that is all the load_baseline_test_data script does.
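
The commit-or-rollback flow can be seen in isolation with a small in-memory stand-in. Everything in the sketch below is hypothetical (InMemoryStore is not an Endeca class and the real RecordStore API differs); it only illustrates why a failed load leaves the previously committed records intact, and how a baseline load differs from an incremental one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a record store; not the Endeca API. */
final class InMemoryStore {
    private List<String> committed = new ArrayList<String>();
    private List<String> pending; // working copy, non-null only inside a transaction

    void begin()     { pending = new ArrayList<String>(committed); }
    void deleteAll() { pending.clear(); }
    void commit()    { committed = pending; pending = null; }
    void rollback()  { pending = null; }

    void write(String record) {
        if (record == null) {
            throw new IllegalArgumentException("null record");
        }
        pending.add(record);
    }

    /** Returns a copy of the committed records. */
    List<String> records() { return new ArrayList<String>(committed); }

    /** Mirrors the flow of loadData(): true on commit, false on rollback. */
    static boolean load(InMemoryStore store, List<String> records, boolean isBaseline) {
        store.begin();
        try {
            if (isBaseline) {
                store.deleteAll(); // a baseline load replaces everything
            }
            for (String record : records) {
                store.write(record);
            }
            store.commit();
            return true;
        } catch (RuntimeException e) {
            store.rollback(); // previously committed records are untouched
            return false;
        }
    }
}
```

Because deleteAll() only touches the working copy, a baseline load that fails halfway does not wipe the store; the old records remain visible until a later load commits.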

baseline_update & promote_content

The last two scripts, baseline_update and promote_content, simply call out to the BeanShell scripts ‘BaselineUpdate’ and ‘PromoteAuthoringToLive’, which reside in DataIngest.xml and WorkbenchConfig.xml, respectively. BaselineUpdate runs the crawl, then updates and distributes the indexes. PromoteAuthoringToLive exports the configurations to the LiveDgraphCluster and updates the assemblers on LiveAppServerCluster. Both of these BeanShell scripts can be called by using either invokeBeanMethod() or getBean().

Source Code

Attached below is a set of Java files that reproduce the behavior of the application scripts, using the methods outlined above. The class names reflect the scripts they are modeled after:

Script                     Java Class
initialize_services        com.oracle.ateam.endeca.example.itl.Initializer
load_baseline_test_data    com.oracle.ateam.endeca.example.itl.Loader
baseline_update            com.oracle.ateam.endeca.example.itl.Updater
promote_content            com.oracle.ateam.endeca.example.itl.Promoter

 
You can run each Java class individually, or you can run everything at once by using com.oracle.ateam.endeca.example.itl.Driver. Included in the distribution are build scripts, run scripts, and sample configuration files. If you have Endeca installed in a directory other than the default, you may need to adjust the paths in some of these files.

Hopefully this exercise has helped eliminate some of the mystery behind what these scripts actually do. Feel free to modify the code as you need, but keep in mind that new product releases may modify the deployment templates, so keep an eye out for changes if you decide to incorporate this code into your solutions.

The attached source code requires Gradle, Maven, and the Java 7 SDK to build. Once it is extracted, edit “scripts/mvn_install.bat” to point to your Endeca installation directory. Then run the script to install the dependent libraries into a local Maven repository. Finally, run “gradlew build” to build “discover_data_cas_java-1.0.jar”, and “gradlew javadoc” to build the javadocs.

DiscoverDataCASJavaSource
