Reading time: < 12 minutes
Choosing your data preparation approach is a critical task that should not be missed during data migration.
In continuation to my previous blog “Options for migrating clean data to Fusion Applications Cloud”, I would like to expand on the option of leveraging Oracle Enterprise Data Quality tool to analyze, standardize, cleanse, and de-duplicate your source data to cloud. Traditionally, most of the implementation projects will design the data conversion as a series of tasks to convert/migrate data from legacy sources into Oracle Fusion Applications Cloud. I have repeatedly seen data issues causing challenges in this model during the user acceptance testing. Also, this traditional “Lift & Shift” method ends up taking multiple iterations of code fix, more business resource effort during testing, and typically delay the project with increased implementation cost. Instead, “Transform & Shift” method recommends a well-planned conversion activity, data analysis to quantify potential data issues pro-actively during the functional discovery, and design phase that facilitates a successful data migration to cloud.
During the implementation of Oracle fusion applications cloud, when you migrate data from your legacy systems, data issues such as inaccurate, incomplete, inconsistent, and duplicate data may cause business functions to fail in Oracle fusion applications cloud. Hence, understanding of existing source system data, business processes, and the target system data requirements is a critical component for effective data migration.
When you are in the process of cloud transformation and implementing Oracle fusion applications cloud, data preparation is an essential component to enable your clean data migration.
“Ultimately, poor data quality is like dirt on the windshield. You may be able to drive for a long time with slowly degrading vision, but at some point you either have to stop and clear the windshield or risk everything.” — Ken Orr, The Cutter Consortium
Cloud transformation program is the great opportunity to identify and resolve your legacy / home-grown data issues. The goal of the data preparation exercise is to provide a reusable/repeatable process to combine, cleanse, and transform data to Oracle fusion applications cloud from one or more sources.
The data preparation effort can be classified as the following tasks:
This step is to assess source data to identify potential data issues, without doing any changes to source data.
Completeness: Identify missing / unusable data
Compliance: Validate type, format, pattern conformity / reference integrity
Consistency: Identify data consistency challenges & conflicts
Correctness: Verify out-of-date or incorrect data
Copies: Identify repeated values, matched/duplicate records
Reports to assist business teams to finalize the approach
Finalize your approach
Quantify data issues
Determine data quality rules that can be automated.
Identify data issues that need manual business user intervention to decide the rules/policies.
Map and Prepare files based on the Oracle fusion applications cloud data requirements
Thorough planning and creating a data preparation approach in the early stage of the project will be realized as the major critical success factor on any successful data migration.
Typical data migration requirements:
I am confident that you would be familiar with the data issues that are very typical during data migration. The key data issues that were identified are given below as a reference:
Customer Data Issues
Sequence errors (John Smith or Smith John)
Typo errors / transposed characters (Browne, Brown, Brawon) – Poor spelling of name and address
Concatenated names, Nick Names, Aliases, multiple names stored in a single field
Noise errors such as dashes, slashes, special characters, titles on names
Abbreviations (LTD, Limited, Ltd., Inc, Incorporated, Corp.)
Data truncations (Corpora, Enterpri,) mostly due to length of the field
Missing or Mixed or extra data (Missing Last Name, business and individual names mixed, non-standard name constructs)
Date format errors (Date of Birth in different formats, invalid dates)
Inconsistent formats in identifiers, phone numbers, email address, street address, state, country, dates, ranges, status, codes, categories, etc.
Product / Asset Data Issues
Several data combined in a single attribute such as product description containing dimensions/classification (Canon Camera Auto Zoom 50mm Lens)
Missing / Variations in data (dimension details)
Typo / Inconsistent data (Manufacturer name: HP, HP Inc., Compaq)
Inconsistent formatting and use of abbreviation on descriptions or Manufacturer data from different suppliers
Functional / Transactional Data Issues
Incorrect customer or product or employee assignments
Invalid dates with multiple formats from different sources / records
Format issues (number / amount fields)
Currency code / reference data issues
Third-party data files, format issues, missing, inconsistent data
While this approach is given as a reference for your Oracle Fusion applications cloud implementation projects, these may be applicable for all cloud transformation projects.
EDQ is a browser-based data quality profiling, analysis, parsing, standardization, matching and merging tool, designed to help you understand, improve, protect, and govern the quality of the information your business uses, all from a single integrated environment. Based on my experience, EDQ can handle customer, product, asset, financial, orders, invoices, any transactional, operational data. You can process (input/output) data in different file formats (CSV/TXT/XML) and also can directly read/write data from your database tables. EDQ provides the ability to enable the processes as web-services and create jobs and automate to a major extent.
Technically, imagine EDQ as the data engine that takes input, processes data (parse, transform, cleanse, match), and provides an output that can be used in target application as needed. EDQ is a business-friendly data management tool which allows users to start using the functionalities almost immediately within a few hours after installation. You can perform data analysis and verification using Oracle EDQ on millions of records within hours, and your business will benefit from the results very early in the project.
EDQ may be deployed at your on-premise or Oracle Infrastructure Cloud using the marketplace image.
The diagram below shows the ease of use of EDQ where a business user / developer can interactively configure data quality rules and view the results all on the same application.
The image given below provides a very high-level list of data management processors that are available in EDQ as out-of-the-box for profiling, standardization, audit, transformation, matching, etc.
Plan for iterative data analysis on sample data and full data set as needed.
Plan the data analysis tasks well in advance during the discovery/design stage of your project to quantify the potential data issues that could be used to help your source system fix the data issues and also to help in suggesting the rules for fixing data issues to your business.
Plan the availability of business resources throughout the project.
Prioritize your data issue resolution process and requirements for the phased approach.
Plan training on the EDQ for your business users and development team.
Collaboration with business is critical for data analysis, decision making on data conflict resolution, and defining the business rules for data quality.
Do not over-engineer on the data preparation solution. Focus on data quality rules required to solve most of the issues. Note that applying data quality rules is not a one-time need and it must be planned as an ongoing activity to support your data governance.
Avoid automation for the initial data load conversion. Instead, focus on the analysis and rules.
Do not miss out on the historical data alignment along with your data migration. For example, if you are consolidating or merging customers to form a single customer record, historical data should also be pointing to your new customer record. This can be a post-implementation task.
Plan to design and build the data preparation solution as a repeatable process that can be used in your phased implementation and also for your merger & acquisition tryouts.
Focus on solving the critical and high priority data issues, and plan your time wisely on the low priority of issues. Try to build rules that will fix the data issues quickly by collaborating with business users. The focus should be on time-to-results.
With adequate planning and execution of data preparation for your cloud transformation, you are mitigating the risk of project delays and increased implementation cost.
EDQ helps in developing repeatable processes for migrating clean data to Oracle fusion applications cloud. It will pro-actively make you aware of your source data issues that can be resolved by configuring data quality rules and automate as needed based on your business priorities prior to loading into Oracle fusion applications cloud.
While EDQ is an Oracle tool that can be leveraged, you could use other similar tools if you already have as well. But the most important take away from this blog is that you must plan data preparation in the early stage of your project.