Prepare your data using Enterprise Data Quality for cloud transformation

May 31, 2020 | 7 minute read
Bala Mahalingam
Consulting Solution Architect A-Team
Text Size 100%:

Reading time: < 12 minutes

Introduction

Choosing your data preparation approach is a critical task that should not be missed during data migration.

In continuation to my previous blog “Options for migrating clean data to Fusion Applications Cloud”, I would like to expand on the option of leveraging Oracle Enterprise Data Quality tool to analyze, standardize, cleanse, and de-duplicate your source data to cloud. Traditionally, most of the implementation projects will design the data conversion as a series of tasks to convert/migrate data from legacy sources into Oracle Fusion Applications Cloud. I have repeatedly seen data issues causing challenges in this model during the user acceptance testing. Also, this traditional “Lift & Shift” method ends up taking multiple iterations of code fix, more business resource effort during testing, and typically delay the project with increased implementation cost. Instead, “Transform & Shift” method recommends a well-planned conversion activity, data analysis to quantify potential data issues pro-actively during the functional discovery, and design phase that facilitates a successful data migration to cloud.

During the implementation of Oracle fusion applications cloud, when you migrate data from your legacy systems, data issues such as inaccurate, incomplete, inconsistent, and duplicate data may cause business functions to fail in Oracle fusion applications cloud. Hence, understanding of existing source system data, business processes, and the target system data requirements is a critical component for effective data migration.

Data Preparation Approach

When you are in the process of cloud transformation and implementing Oracle fusion applications cloud, data preparation is an essential component to enable your clean data migration.

    “Ultimately, poor data quality is like dirt on the windshield. You may be able to drive for a long time with slowly degrading vision, but at some point you either have to stop and clear the windshield or risk everything.”  — Ken Orr, The Cutter Consortium

Cloud transformation program is the great opportunity to identify and resolve your legacy / home-grown data issues. The goal of the data preparation exercise is to provide a reusable/repeatable process to combine, cleanse, and transform data to Oracle fusion applications cloud from one or more sources.

The data preparation effort can be classified as the following tasks:

  1. Data analysis

    • This step is to assess source data to identify potential data issues, without doing any changes to source data.

      • Completeness: Identify missing / unusable data

      • Compliance: Validate type, format, pattern conformity / reference integrity

      • Consistency: Identify data consistency challenges & conflicts

      • Correctness: Verify out-of-date or incorrect data

      • Copies: Identify repeated values, matched/duplicate records 

  2. Review results

    • Reports to assist business teams to finalize the approach

  3. Finalize your approach

    • Quantify data issues

    • Determine data quality rules that can be automated.

    • Identify data issues that need manual business user intervention to decide the rules/policies.         

  4. Map and Prepare files based on the Oracle fusion applications cloud data requirements

Data Preparation

Thorough planning and creating a data preparation approach in the early stage of the project will be realized as the major critical success factor on any successful data migration.

Typical data migration requirements:

  • Data from multiple source systems (legacy mainframe, on-premise commercial/homegrown applications, other cloud applications) including structured and unstructured (images, files, etc.)
  • Data standardization/cleansing (such as transformation, validation/audit, re-formatting, parsing) rules to be identified and applied to prepare the data to be migrated into Oracle fusion applications cloud.
  • 3rd party data (such as Supplier data files, address verification, product data, customer hierarchy, individual/organization profile data).
  • Need for splitting the records or combining data from different formats to the target.

I am confident that you would be familiar with the data issues that are very typical during data migration. The key data issues that were identified are given below as a reference:

  • Customer Data Issues

    • Sequence errors (John Smith or Smith John)

    • Typo errors / transposed characters (Browne, Brown, Brawon) – Poor spelling of name and address

    • Concatenated names, Nick Names, Aliases, multiple names stored in a single field

    • Noise errors such as dashes, slashes, special characters, titles on names

    • Abbreviations (LTD, Limited, Ltd., Inc, Incorporated, Corp.)

    • Data truncations (Corpora, Enterpri,) mostly due to length of the field

    • Missing or Mixed or extra data (Missing Last Name, business and individual names mixed, non-standard name constructs)

    • Date format errors (Date of Birth in different formats, invalid dates)

    • Inconsistent formats in identifiers, phone numbers, email address, street address, state, country, dates, ranges, status, codes, categories, etc.

    • Duplicate customers

  • Product / Asset Data Issues

    • Several data combined in a single attribute such as product description containing dimensions/classification (Canon Camera Auto Zoom 50mm Lens)

    • Missing / Variations in data (dimension details)

    • Typo / Inconsistent data (Manufacturer name: HP, HP Inc., Compaq)

    • Inconsistent formatting and use of abbreviation on descriptions or Manufacturer data from different suppliers

    • Item Duplicates

  • Functional / Transactional Data Issues

    • Missing information

    • Incorrect customer or product or employee assignments

    • Invalid dates with multiple formats from different sources / records

    • Format issues (number / amount fields)

    • Currency code / reference data issues

  • Third-party data files, format issues, missing, inconsistent data

While this approach is given as a reference for your Oracle Fusion applications cloud implementation projects, these may be applicable for all cloud transformation projects. 

Oracle Enterprise Data Quality (EDQ)

EDQ is a browser-based data quality profiling, analysis, parsing, standardization, matching and merging tool, designed to help you understand, improve, protect, and govern the quality of the information your business uses, all from a single integrated environment. Based on my experience, EDQ can handle customer, product, asset, financial, orders, invoices, any transactional, operational data. You can process (input/output) data in different file formats (CSV/TXT/XML) and also can directly read/write data from your database tables. EDQ provides the ability to enable the processes as web-services and create jobs and automate to a major extent.

Technically, imagine EDQ as the data engine that takes input, processes data (parse, transform, cleanse, match), and provides an output that can be used in target application as needed. EDQ is a business-friendly data management tool which allows users to start using the functionalities almost immediately within a few hours after installation. You can perform data analysis and verification using Oracle EDQ on millions of records within hours, and your business will benefit from the results very early in the project.

EDQ may be deployed at your on-premise or Oracle Infrastructure Cloud using the marketplace image.

The diagram below shows the ease of use of EDQ where a business user / developer can interactively configure data quality rules and view the results all on the same application.

The image given below provides a very high-level list of data management processors that are available in EDQ as out-of-the-box for profiling, standardization, audit, transformation, matching, etc.

Data Preparation Guidelines

  • Plan for iterative data analysis on sample data and full data set as needed.

  • Plan the data analysis tasks well in advance during the discovery/design stage of your project to quantify the potential data issues that could be used to help your source system fix the data issues and also to help in suggesting the rules for fixing data issues to your business.

  • Plan the availability of business resources throughout the project.

  • Prioritize your data issue resolution process and requirements for the phased approach.

  • Plan training on the EDQ for your business users and development team.

  • Collaboration with business is critical for data analysis, decision making on data conflict resolution, and defining the business rules for data quality.

  • Do not over-engineer on the data preparation solution. Focus on data quality rules required to solve most of the issues. Note that applying data quality rules is not a one-time need and it must be planned as an ongoing activity to support your data governance.

  • Avoid automation for the initial data load conversion. Instead, focus on the analysis and rules.

  • Do not miss out on the historical data alignment along with your data migration. For example, if you are consolidating or merging customers to form a single customer record, historical data should also be pointing to your new customer record. This can be a post-implementation task.

  • Plan to design and build the data preparation solution as a repeatable process that can be used in your phased implementation and also for your merger & acquisition tryouts.

  • Focus on solving the critical and high priority data issues, and plan your time wisely on the low priority of issues. Try to build rules that will fix the data issues quickly by collaborating with business users. The focus should be on time-to-results.

Conclusion

With adequate planning and execution of data preparation for your cloud transformation, you are mitigating the risk of project delays and increased implementation cost.

EDQ helps in developing repeatable processes for migrating clean data to Oracle fusion applications cloud. It will pro-actively make you aware of your source data issues that can be resolved by configuring data quality rules and automate as needed based on your business priorities prior to loading into Oracle fusion applications cloud.

While EDQ is an Oracle tool that can be leveraged, you could use other similar tools if you already have as well. But the most important take away from this blog is that you must plan data preparation in the early stage of your project.

References

Oracle Enterprise Data Quality

Get started with your EDQ in Oracle Infrastructure Cloud

Bala Mahalingam

Consulting Solution Architect A-Team

Bala has over 27 years of techno-functional and hands-on product development, implementation, solution architecture  and consulting delivery experience with most of his career spent in architecture, design, development and deployment of applications and technology both on-premises and cloud. His expertise on data management, data integration, data quality and data governance has enabled him to help several customers globally in Hi-Tech, healthcare, financial, life-sciences, automotive and manufacturing industries.


Previous Post

BI Cloud Connector – Further Automation Options

Ulrich Janke | 16 min read

Next Post


Fortinet FortiGate - Hub & Spoke architecture

Ionut Neubauer | 5 min read