Conversions in WebCenter Content

Introduction

One of the guiding principles of WebCenter Content has been to make content as easy as possible to consume. Part of that means presenting content in a format that is optimal for the end user, regardless of the format it was created in. So WebCenter Content has a long history of converting files from one format to another. Often this involves converting a proprietary desktop publishing format to something more open that can be viewed directly in a browser, or taking a high-resolution image and creating a rendition that downloads quickly over a slow network.

Over the life of the product, the types and methods of conversion have grown to provide a broad range of options. It can be confusing to know which conversions are available and where exactly they are done (Content Server or Inbound Refinery), so I’ve put together a flowchart and a list describing all of the different types of conversion, how and where they are done, and the pros and cons of each. This list covers what’s available as of the current release – WebCenter Content 11g PS5.

[Figure: WCC Conversion Decision Tree]

Main Article

PDF Conversions

Where: Inbound Refinery
When: Upon check-in
How: Multiple ways
Platform: All (* depends on the conversion method)

PDF conversions are probably the most common type of conversion done with WCC. This involves converting a desktop publishing format (e.g. Microsoft Word) into Adobe PDF format. The benefits include being able to read the document directly in the browser (with a PDF reader plug-in) without needing the 3rd party product that reads the proprietary format. PDFs also make it possible to start viewing the document before the entire file has downloaded, can reduce file size through compression, and allow watermarks and additional security to be applied to the file. Optionally, the PDF/A format, which is recognized as an approved archival format, can be chosen.

Within PDF conversions, there are several different methods that can be used to create the PDF, depending on the needs and requirements.

PDFExportConverter – This method uses Oracle’s own OutsideIn filters to directly convert multiple format types into PDF. The benefits include multiple platform support (any platform that WCC supports), the fastest conversion, and no 3rd party software requirements. The main downside is that it has the lowest fidelity to the original document, meaning the PDF won’t always exactly match the look and feel of the original. These formats are supported by the OutsideIn filters for conversion to PDF.

WinNativeConverter – As the name implies, this type of conversion uses the native applications on Windows to do the conversion. By using the original application that was used to create the document, you get the best fidelity of PDF compared to the original. The downside is that the Inbound Refinery must run on Windows and cannot run on other platforms. It also requires a distiller engine to convert the PostScript output that is printed from the native applications into PDF. The recommended choice for that is AFPL Ghostscript.

OpenOfficeConversion – The Open Office conversion is a bit of a compromise between the two types of conversions mentioned above. It uses Apache OpenOffice to open and convert the native file. In most cases, it will give you better PDF fidelity than PDFExportConverter, but still not as good as WinNativeConverter. It also supports more than just Windows, so it has broader platform support than WinNativeConverter.
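To make the trade-offs between the three methods concrete, here is a minimal Python sketch of the decision. It is only an illustration: the function name, the Fidelity levels, and the rule of falling back to OpenOfficeConversion when the best fidelity is wanted but the refinery is not on Windows are all assumptions made for this example, not WCC settings or APIs.

```python
from enum import Enum


class Fidelity(Enum):
    """Hypothetical fidelity levels, used only for this illustration."""
    BASIC = 1   # speed and portability matter more than exact layout
    BETTER = 2  # closer match to the original layout
    BEST = 3    # closest possible match to the original layout


def choose_pdf_method(refinery_on_windows: bool, fidelity: Fidelity) -> str:
    """Pick a PDF conversion method from the trade-offs described above."""
    # WinNativeConverter gives the best fidelity but needs a Windows refinery.
    if fidelity is Fidelity.BEST and refinery_on_windows:
        return "WinNativeConverter"
    # OpenOfficeConversion is the middle ground and runs on more platforms.
    # Assumption: it is also the fallback when BEST is wanted off Windows.
    if fidelity in (Fidelity.BETTER, Fidelity.BEST):
        return "OpenOfficeConversion"
    # PDFExportConverter is fastest and needs no 3rd party software.
    return "PDFExportConverter"


if __name__ == "__main__":
    print(choose_pdf_method(True, Fidelity.BEST))
    print(choose_pdf_method(False, Fidelity.BEST))
    print(choose_pdf_method(False, Fidelity.BASIC))
```

In practice the choice also depends on which native applications are installed and licensed, so treat this as a starting point for the discussion rather than a rule.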

Tiff Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses a 3rd party (CVISION PdfCompressor) engine to perform OCR and PDF conversion
Platform: Windows Only

TIFF files can be converted into PDFs with either PDFExportConverter or Tiff Converter. The major difference is whether optical character recognition (OCR) needs to be performed on the file in order to extract the full text from the image. If OCR is required, then Tiff Converter is used for that type of conversion. In addition, a 3rd party tool, CVISION PdfCompressor, is required to do the actual OCR and conversion. Tiff Converter acts as the controller between the Inbound Refinery and PdfCompressor. Because PdfCompressor is a Windows-only application, the Inbound Refinery must also be on Windows.

XML Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses Oracle OutsideIn filters to convert native formats into XML
Platform: All

The XML Converter allows native documents to be converted into two flavors of XML: FlexionXML (based on the FlexionDoc schema) and SearchML (based on the SearchML schema). In addition, those formats can go through further transformation with a custom XSLT. Because the XML Converter utilizes the Oracle OutsideIn filter technology, it supports all platforms.

DAM Converter

Where: Inbound Refinery
When: Upon check-in and updates
How: Can use both Oracle OutsideIn filters as well as 3rd party applications to do image conversions.  Flip Factory is required for video conversions.
Platform: All (* depends on the converter used)

DAM Converter is used to create multiple renditions of either image or video files. The primary goal is to convert original formats, which are typically high resolution and large in size, into other formats that are geared towards web or print delivery. One thing that is unique to DAM Converter is that the metadata used to specify the rendition set can be updated after the item has been submitted, which sends the file back to the Inbound Refinery to be reprocessed.

For image conversion, the Inbound Refinery comes with the Oracle OutsideIn filters to create renditions, so nothing else is required and it can run on all platforms. The converter also supports other command-line driven image converters such as Adobe Photoshop, XnView NConvert, and ImageMagick. Some are commercial and some are freeware. Each has different capabilities for different use cases and is supported on different platforms. But for general-purpose resizing, resolution, and format changes, OutsideIn can handle it.

For video conversion, Telestream’s Flip Factory is required.  The DAM Converter acts as the controller between the Inbound Refinery and Flip Factory.  What makes this integration a bit unique is that it is handled purely at a file system level.  This means that Flip Factory, which is a Windows-only application, does not need to reside on the same server as the Inbound Refinery.  They simply need shared file system access between servers.  So the Inbound Refinery can be on Linux while Flip Factory is on Windows.

HTML Converter

Where: Inbound Refinery
When: Upon check-in
How: Uses Microsoft Office to convert Office documents into HTML
Platform: Windows Only

HTML Converter uses Microsoft Office to save the documents as HTML, collects the output (into a zip file if there are multiple files), and returns it to Content Server. Because the HTML comes directly from Office, you get very good fidelity compared to the original native format. This is especially true for Excel and Visio, which are less text-based. The downside is that you have no control over the HTML output to make changes or provide consistency between conversions. It’s simply formatted based on Office’s formatting. Also, it does not apply any templating around the content, so it cannot insert code before or after the content or present the document within the structure of a larger HTML page, such as in the case of Site Studio.

Dynamic Converter

Where: Content Server
When: Upon check-in or on-demand
How: Uses Oracle OutsideIn filters to convert native documents into HTML
Platform: All

Like HTML Converter, Dynamic Converter converts Office documents into HTML, but there are several key differences between the two. First, Dynamic Converter uses OutsideIn filters to convert to HTML, so it supports a wide range of native formats. Second, the processing happens on the Content Server rather than the Inbound Refinery. This allows the conversion to happen on demand the first time the HTML version is requested. Alternatively, DC can be configured to do the conversion upon check-in and cache the results so they are immediately available and don’t need to go through conversion on first request. DC also provides a wide range of controls over exactly how the HTML is formatted. The result can be very minimal, clean HTML with div or span tags to allow styling with CSS, which can lead to a more consistent look and feel between converted documents. It also allows code to be inserted before or after the content so the output can be embedded within a template, which is what is used within Site Studio.
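The difference between on-demand conversion and conversion upon check-in can be sketched in a few lines of Python. This is a conceptual illustration only: the HtmlRenditionCache class, its method names, and the convert_on_checkin flag are hypothetical and do not correspond to Dynamic Converter's actual implementation or configuration. It also assumes that an on-demand result is kept for later requests.

```python
from typing import Callable, Dict


class HtmlRenditionCache:
    """Hypothetical model of on-demand versus check-in HTML conversion."""

    def __init__(self, convert: Callable[[str], str],
                 convert_on_checkin: bool = False) -> None:
        self._convert = convert                  # stand-in for the real conversion
        self._convert_on_checkin = convert_on_checkin
        self._cache: Dict[str, str] = {}         # document id -> converted HTML

    def check_in(self, doc_id: str) -> None:
        # With check-in conversion enabled, pay the conversion cost up front.
        if self._convert_on_checkin:
            self._cache[doc_id] = self._convert(doc_id)

    def get_html(self, doc_id: str) -> str:
        # On-demand mode: convert on the first request, then reuse the result.
        if doc_id not in self._cache:
            self._cache[doc_id] = self._convert(doc_id)
        return self._cache[doc_id]


if __name__ == "__main__":
    def fake_convert(doc_id: str) -> str:
        print(f"converting {doc_id}")
        return f"<div>{doc_id}</div>"

    lazy = HtmlRenditionCache(fake_convert)
    lazy.check_in("doc1")          # nothing converted yet
    lazy.get_html("doc1")          # conversion happens here
    lazy.get_html("doc1")          # served from the cache

    eager = HtmlRenditionCache(fake_convert, convert_on_checkin=True)
    eager.check_in("doc2")         # conversion happens at check-in
    eager.get_html("doc2")         # served from the cache
```

The trade-off is the usual one: check-in conversion makes the first request fast at the cost of converting documents that may never be viewed as HTML.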

Thumbnail Creation

Where: Content Server or Inbound Refinery
When: Upon check-in
How: Uses Oracle OutsideIn filters to create a thumbnail representation of the document to be used on search results
Platform: All

As a new feature in PS5, thumbnails can now be generated directly in the Content Server without sending the document to the Inbound Refinery (as long as it doesn’t need other conversions). This allows the document to become available much more quickly. If the file is sent to the Inbound Refinery for other types of conversions, the thumbnail can be generated at that point instead.

For further information on conversions, see the documentation on Conversions as well as Dynamic Converter.
