A CIO's checklist for building a Modern Data Platform.

January 18, 2023 | 17 minute read
Nick Goddard
Senior Director | Cloud Solution Architecture for Data Management and Analytics.

 


I often get the opportunity to speak with CIOs when their organisations are embarking on a Modern Data Platform initiative, or are in the middle of one. The work that the A-Team does gives us the opportunity to help customers with their transformation projects, both traditional and innovative. During these engagements our scope and role can be diverse, as each customer has a unique ask, but most commonly we are there to plug a knowledge gap of some description.

When engaged, we provide architecture guidance, architecture reviews and best-practice leadership to help shape the design. We won’t do the system integrator’s job, but we do work with them, providing a guiding hand to produce the best solution for the customer based upon Oracle’s best practice.

In our experience, when issues do occur they are not always down to technical or configuration skills, though this is the second most common cause. In fact, the most common cause is a lack of planning and of awareness of the interdependencies involved in scoping and designing a multifaceted solution.

Because a Modern Data Platform brings together a multitude of technologies that must operate seamlessly together, with business processes wrapped around them, there are many potential failure points unless it is done correctly.

Reviewing the information available on adopting a Modern Data Platform, there is an abundance of technical material showcasing configuration and interoperability at the technology layer. What is not readily available is a very simple checklist of what to work on when adopting such a strategy. It was this discovery that was the stimulus for this blog.

I’ll endeavour to cover the important aspects to help those that are thinking about how all of this can be pulled together. This may not be an exhaustive list, but it is intended to highlight the key areas of designing a Modern Data Platform for any organisation from a business perspective.

Synopsis.

A Modern Data Platform is a comprehensive architecture for accessing, organising, managing, and analysing data. It typically includes a variety of technologies: storage, data processing, data governance, data integration, security, and data visualisation. The objective of this architecture is to provide a single, integrated environment for storing, processing, and analysing all classes of information, structured and unstructured, regardless of where it comes from, where it resides, or how it is used.

Adopting this approach enables organisations to obtain insights from their information faster, and so make better-informed decisions.

Building a Modern Data Platform involves several key steps:

  1. Defining the business objectives and use cases.
  2. Assessing the current state of the organisation's data architecture and infrastructure.
  3. Developing a strategic roadmap that aligns with the overall business strategy.
  4. Selecting the right vendor/technology stack.
  5. Implementing data governance and security.
  6. Implementing data integration and interoperability.
  7. Building a data science and machine learning capability to support advanced analytics and AI initiatives.
  8. Building a Data Culture.
  9. Monitoring and Analytics.

 

Defining the Business Objectives and Use-Cases.

Defining the business objectives and use cases for a Modern Data Platform is important for a number of reasons. It helps to ensure that the data platform is aligned with the organisation's overall business strategy and goals. By clearly identifying the business objectives and use-cases, organisations can make sure that the data platform is being built to support key business initiatives and deliver value.

  1. Identify the key business problems or opportunities that the data platform will address. These could include improving customer experiences, increasing efficiency, reducing costs, or driving innovation.
  2. Identify the key stakeholders within the organisation who will be using the data platform and understand their needs and requirements. This will include, but is not limited to, business users, data analysts, data scientists, and IT staff.
  3. Identify the specific business processes or activities that the data platform will support, including any data-driven decisions or actions that will be taken as a result of using the platform.
  4. Determine what types of data will be needed to support the identified use cases, as well as where that data will come from. This may include internal data sources, such as transactional or operational data, as well as external data sources, such as public data sets or data from partners or customers.
  5. Determine the types of data processing and analytics that will be needed to support the identified use cases. This may include batch processing, real-time streaming, machine learning, or other advanced analytics techniques.
  6. Determine the policies and procedures that will be needed to manage and protect the data on the platform, as well as any compliance requirements that must be met.

 

Assess the current state of the organisation's data architecture and infrastructure.

Assessing the current state of the organisation's data architecture and infrastructure before implementing a data platform can help identify any existing issues or constraints that may impact the platform's effectiveness, ensure the platform aligns with the organisation's goals and objectives, and provide a baseline for measuring the platform's performance and success.

  1. Conduct a data inventory identifying all data sources within the organisation, including structured, semi-structured and unstructured data, and understand how the data is being used and by whom.
  2. Assess the quality of the data, evaluating the completeness, accuracy, consistency and timeliness of the information held; identify and prioritise data quality issues that need to be addressed.
  3. Analyse all data flows within the organisation and identify any bottlenecks, duplication or data silos.
  4. Evaluate and analyse the organisation's current data architecture and infrastructure including databases, data warehousing, data lakes, and cloud-based services to understand their strengths and limitations.
  5. Assess the current data security and compliance policies and procedures and identify any gaps or vulnerabilities.
  6. Measure the performance of the current data architecture and infrastructure, including data processing and query times, to understand where improvements can be made.
  7. Gather feedback from key stakeholders including data analysts, data scientists and business users about their experience with the current data architecture and infrastructure.
  8. Benchmark against industry standards, comparing the current state of the organisation's data architecture and infrastructure against that of its industry peers.
  9. Use the findings from the above steps to identify gaps and opportunities in the organisation's data architecture and infrastructure.
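
The data quality step in the list above (completeness, consistency and timeliness) can be made concrete with a small profiling script. What follows is a minimal sketch in Python using only the standard library. The record layout, the field names, the `profile_quality` function and the 30-day freshness window are all assumptions made for illustration; they are not part of any Oracle tooling.

```python
from datetime import date

# Hypothetical sample records; field names are illustrative only.
RECORDS = [
    {"customer_id": "C1", "email": "a@example.com", "updated": date(2023, 1, 10)},
    {"customer_id": "C2", "email": None, "updated": date(2022, 6, 1)},
    {"customer_id": "C2", "email": "b@example.com", "updated": date(2023, 1, 15)},
    {"customer_id": "C3", "email": "c@example.com", "updated": date(2022, 12, 30)},
]


def profile_quality(records, key, required, as_of, max_age_days=30):
    """Return simple completeness, uniqueness and timeliness ratios (0.0 to 1.0)."""
    total = len(records)
    if total == 0:
        return {"completeness": 0.0, "uniqueness": 0.0, "timeliness": 0.0}
    # Completeness: share of records where every required field is populated.
    complete = sum(all(r.get(f) not in (None, "") for f in required) for r in records)
    # Uniqueness: share of distinct key values (duplicates lower the score).
    unique = len({r[key] for r in records})
    # Timeliness: share of records updated within the freshness window.
    fresh = sum((as_of - r["updated"]).days <= max_age_days for r in records)
    return {
        "completeness": complete / total,
        "uniqueness": unique / total,
        "timeliness": fresh / total,
    }


if __name__ == "__main__":
    report = profile_quality(
        RECORDS,
        key="customer_id",
        required=["customer_id", "email"],
        as_of=date(2023, 1, 18),
    )
    for metric, score in report.items():
        print(f"{metric}: {score:.0%}")
```

Scores like these give the baseline mentioned earlier: run the same profile before and after the platform is delivered and compare the ratios per data source.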

 

Develop a strategic roadmap for the Modern Data Platform that aligns with the overall business strategy.

Developing a strategic roadmap for a Modern Data Platform helps keep it aligned with the overall business strategy. This is important as it makes sure the platform achieves the specific goals and objectives set by the business and IT. The roadmap will help with prioritising initiatives, identifying dependencies and defining metrics. Having a robust roadmap in place also provides a way to continuously monitor and, if required, adapt the platform services to respond to changing business needs.

  1. Understand the overall business strategy and objectives of the organisation and ensure that the data platform strategy aligns with them.
  2. Identify and assess the key use cases for the data platform that will support the organisation's business objectives.
  3. Develop a clear vision for the data platform that outlines how it will support the organisation's business objectives.
  4. Define specific goals and objectives for the data platform that align with the overall business strategy.
  5. Prioritise initiatives that will be needed to achieve the goals and objectives.
  6. Identify dependencies and potential roadblocks that may impact the implementation of the initiatives.
  7. Establish a timeline for the initiatives, including milestones, deliverables, owners and the programme governance.
  8. Define metrics to measure the success of the project and track progress towards achieving the goals and objectives.
  9. Continuously monitor all services and adapt the strategy as needed to ensure alignment with the overall business strategy and to respond to changing business needs, as and when they occur.
  10. Communicate the strategy to key stakeholders across the organisation and ensure buy-in and alignment.
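
Prioritising initiatives and identifying their dependencies, as described in the list above, amounts to ordering a dependency graph. The sketch below shows one way to do that with Python's standard-library `graphlib` module (Python 3.9 or later). The initiative names and the `plan_order` helper are hypothetical examples, not a prescribed roadmap.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical initiatives, each mapped to the initiatives it depends on.
DEPENDENCIES = {
    "data_inventory": set(),
    "governance_policies": {"data_inventory"},
    "integration_pipelines": {"data_inventory", "governance_policies"},
    "analytics_dashboards": {"integration_pipelines"},
    "ml_capability": {"integration_pipelines", "governance_policies"},
}


def plan_order(dependencies):
    """Return initiatives in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    try:
        for step, initiative in enumerate(plan_order(DEPENDENCIES), start=1):
            print(f"{step}. {initiative}")
    except CycleError as err:
        # A cycle is itself a roadblock worth surfacing early.
        print(f"Circular dependency found: {err.args[1]}")
```

A circular dependency raises an error instead of producing a plan, which is a useful early signal that two initiatives are each waiting on the other.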

 

Selecting the right Vendor and Technology Stack.

Selecting the right vendor and technology stack is important as it will ensure the platform is able to handle the organisation's data needs now and in the future. The stack will need to scale, provide robust data management, data governance, security, deep learning and analysis, meet performance requirements and align with the overall business strategy.

Ideally you will need a vendor whose technology can handle all these aspects, including large volumes of structured and unstructured data and high concurrency, while at the same time being cost-effective and easy to manage.

Choosing the right stack for your needs, both now and in the future, can help to ensure that the data platform can keep pace with your growth and changing requirements.

Adopting the right approach can help reduce development complexity, lower the cost of ownership, and accelerate the time to market.

As organisations often have multiple data sources and systems, it is important that the data platform can communicate and work seamlessly with other systems. Choosing the right approach can help to ensure that the data platform can integrate with existing systems and support data flow.

  1. First and foremost, the technology stack should support the business objectives and use cases that have been defined for the data platform.
  2. It needs to be able to handle the types of data that will be used on the platform, as well as being able to integrate with the identified data sources seamlessly.
  3. The solution has to be able to support the required data processing and analytic capabilities, including batch processing, real-time streaming, machine learning, or other advanced analytics techniques.
  4. The capability to scale up and down as needed to support the changing needs of the organisation is important for managing expectations, performance and reliability, as well as for controlling operational costs.
  5. Being operationally cost-effective and providing a good return on investment is required. Don’t just look at the monthly or usage fees; also look at the cost to operate from an employee perspective. Do you have the right skills in the business?
  6. The technology stack should have strong support and resources available, including documentation, tutorials, and a community of users. If you need to provide extensions to the core capability from the vendor, do you want your stack to support a declarative or an imperative approach?
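
One common way to compare candidates against the criteria above is a weighted scoring matrix. The following is a minimal sketch under stated assumptions: the criteria names, the weights, the 1 to 5 scores and the candidate names `stack_a` and `stack_b` are all invented for illustration and should be replaced with values your own evaluation team agrees on.

```python
# Hypothetical criteria weights (must sum to 1.0), loosely following the checklist above.
WEIGHTS = {
    "use_case_fit": 0.30,
    "data_integration": 0.20,
    "processing_analytics": 0.20,
    "scalability": 0.10,
    "cost_to_operate": 0.10,
    "support_ecosystem": 0.10,
}

# Hypothetical 1-5 scores for two made-up candidate stacks.
SCORES = {
    "stack_a": {
        "use_case_fit": 4, "data_integration": 5, "processing_analytics": 3,
        "scalability": 4, "cost_to_operate": 2, "support_ecosystem": 4,
    },
    "stack_b": {
        "use_case_fit": 3, "data_integration": 3, "processing_analytics": 4,
        "scalability": 5, "cost_to_operate": 5, "support_ecosystem": 3,
    },
}


def weighted_score(scores, weights):
    """Return the weighted sum of one candidate's scores."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(scores[criterion] * weight for criterion, weight in weights.items())


def rank(candidates, weights):
    """Return (name, score) pairs, best first, with scores rounded to 2 places."""
    scored = [(name, round(weighted_score(s, weights), 2)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(SCORES, WEIGHTS):
        print(f"{name}: {score}")
```

With these made-up numbers `stack_a` scores 3.8 and `stack_b` 3.6. The value of the exercise is less the final number than the conversation needed to agree the weights with the business first.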

 

Data Governance and Security.

Data governance and security are important to organisations for compliance and protection. An obvious point, but one I want to lead with. Data governance helps to ensure the quality, accuracy, and integrity of data, and should integrate seamlessly with and support governance and compliance management tools, which is essential in demonstrating compliance. Aside from this, it is also important because inaccurate or inconsistent data can lead to poor decision making, wasted resources, and reputational damage. Data governance also helps organisations to manage and maintain their data assets, which can be critical for reporting and other important business functions.

Having a robust data security solution that integrates with and supports the data platform is essential for protecting sensitive or confidential information from unauthorised access, use, or disclosure. Organisations may be subject to regulations that require them to protect certain types of data, such as Personally Identifiable Information (PII) or financial data. Without proper data security measures, organisations can be at risk of data breaches, cyber-attacks, and other security incidents that can lead to significant financial losses, legal liabilities, and damage to their reputation. Even when data is contained within the data platform with perimeter security in place, using unstructured data to run Big Data queries needs to be carefully addressed, given the PII that could be held within it.

Data Governance and Data Security are essential for maintaining customer trust, as a data security breach can cost the customers' trust and lead to loss of business.

  1. Define and establish clear policies and procedures for managing and protecting data on the platform. This may include guidelines for data quality, security, privacy, and compliance.
  2. Identify the individuals or teams who will be responsible for managing and overseeing the data on the platform. This may include data stewards who are responsible for ensuring data quality and data owners who are responsible for the overall management of specific data sets.
  3. Choose and implement data governance tools that support the policies and procedures that have been defined. These may include tools for data discovery, classification, lineage and cataloguing, as well as tools for data management and metadata management.
  4. Provide training and education to stakeholders on the data governance policies, procedures, and tools. This will help ensure that everyone involved understands their roles and responsibilities and can effectively use the data governance tools. PII needs to be approached carefully, especially if it needs to be pulled into Big Data analysis; choosing the right technology approach, ETL versus ELT, is a case in point. See my previous blog on this topic.
  5. Regularly review and monitor the data governance process to ensure that it is effective and aligned with the needs of the organisation. This may involve conducting audits or assessments, as well as making updates to the policies and procedures as needed.

 

Data Integration Strategy.

A good data integration strategy can bring several benefits to an organisation. By integrating data from different sources and systems, organisations can classify, catalogue, and visualise all data sets, and use metadata management to help make sure that the data is accurate and consistent. Integrating data from multiple sources also allows organisations to automate processes and eliminate the manual tasks used to augment information, which saves time, reduces errors and so increases operational efficiency.

With access to a wider range of data from different sources, organisations can gain a more complete and accurate view of their operations, customers, and markets, leading to better-informed decisions. This normally involves the use of Data Lakes and Data Warehouses (combined, known as a Data Lakehouse) to enable data flow between systems.

A Data Lake and Data Warehouse from a technology stack perspective, from Oracle.

[Figure: Data Lake and Data Warehouse]

 

Putting it together. Although this is not a technical, product-specific blog, I wanted to include a schematic to help you visualise how Oracle can deliver a complete Modern Data Platform solution.

[Figure: Modern Data Platform (MDP) schematic]

 

From a customer management perspective (acquisition and retention), integrating data from various customer touchpoints, such as social media, events, sensors, website interactions, and call centre logs, can provide a more holistic view of the customer, which can help organisations improve their customer service and support. Integrating data also simplifies reporting and analysis, as data is readily available and easily accessible from a single source. It can also facilitate collaboration and data sharing across different departments and teams, which can help organisations make better use of their data assets and improve their performance.

  1. Identify all the data sources that will be integrated with the platform, including both internal and external repositories. This may include, but is not limited to, transactional data, operational data, public data sets, events, sensor data, and data from partners or customers.
  2. Define the specific requirements for integrating the identified data sources, including the types of data that will be needed, the frequency of data updates, and any necessary transformations or cleansing of the data.
  3. Choose the data integration approach that best fits the needs of the organisation, considering factors such as the volume and complexity of the data, the required data processing and analytics capabilities, and the level of integration that is needed. Options may include extract, transform, and load (ETL) or extract, load, and transform (ELT) tools, data lakes, data warehouses, and/or APIs.
  4. Design the overall data integration architecture, including the data flow between applications, systems and source endpoints and the infrastructure needed to support it. This may involve designing logical and physical data models and choosing the appropriate data storage and processing technologies.
  5. Implement the data integration solution and test it to ensure that it is working as expected and meeting the defined requirements.
  6. Regularly monitor and optimise the data integration process to ensure that it is efficient and effective. This may involve making updates or changes, or fine-tuning data flows on the fly without waiting for executions to complete. It all depends on your use case and your SLAs to the business.
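
The choice between ETL and ELT in step 3 matters most where PII is involved, because the two approaches differ in whether raw values ever land in the target. The sketch below illustrates that difference using Python's built-in `sqlite3` and `hashlib` modules, with an in-memory SQLite database standing in for the target platform. The table names, the sample rows and the `mask` function are assumptions for illustration only, and a truncated hash is not a complete PII protection strategy.

```python
import hashlib
import sqlite3

# Hypothetical source rows: (customer_id, email, order_total).
SOURCE_ROWS = [("C1", "a@example.com", 120.0), ("C2", "b@example.com", 80.5)]


def mask(value):
    """Pseudonymise a value with SHA-256 (illustrative; not a full PII strategy)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def run_etl(conn, rows):
    """ETL: transform (mask PII) in flight, so raw emails never reach the target."""
    conn.execute(
        "CREATE TABLE orders_etl (customer_id TEXT, email_hash TEXT, order_total REAL)"
    )
    masked = [(cid, mask(email), total) for cid, email, total in rows]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", masked)


def run_elt(conn, rows):
    """ELT: load raw data first, then transform inside the target using SQL."""
    conn.create_function("mask", 1, mask)  # expose the Python function to SQLite
    conn.execute(
        "CREATE TABLE orders_raw (customer_id TEXT, email TEXT, order_total REAL)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT customer_id, mask(email) AS email_hash, order_total FROM orders_raw"
    )


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    run_etl(target, SOURCE_ROWS)
    run_elt(target, SOURCE_ROWS)
    print(target.execute("SELECT * FROM orders_etl").fetchall())
    print(target.execute("SELECT * FROM orders_elt").fetchall())
    # The ELT path leaves raw emails in orders_raw, which then needs its own controls.
    print(target.execute("SELECT email FROM orders_raw").fetchall())
```

Both paths produce the same curated table, but only the ELT path leaves a raw copy in the target, so its staging area must be covered by the governance and security policies discussed earlier.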

 

Build a data science and machine learning capability to support advanced analytics and AI initiatives.

Data science plays a crucial role in analytics by providing the tools and techniques necessary to analyse large sets of data, identify patterns, and make predictions about future events or outcomes. The field of data science includes a wide range of techniques and methodologies such as statistics, machine learning, and data visualisation, which can be used to analyse data from various sources, such as text, images, and sensor data. By using data science techniques, organisations can gain valuable insights and make data-driven decisions that can help improve their operations, increase revenue, and gain a competitive edge. Additionally, analytics can also be used to identify potential risks and opportunities, allowing organisations to proactively address them before they occur. This makes data science a powerful tool for any organisation looking to improve their bottom line.

  1. Develop a clear strategy for building a data science and machine learning capability that aligns with the organisation's overall business objectives and use cases.
  2. Assess the current data science and machine learning capabilities within the organisation, including skills, tools, and infrastructure.
  3. Identify the specific skills and expertise needed to support the organisation's advanced analytics and AI initiatives.
  4. Look to bring in data scientists, machine learning engineers, and other relevant professionals with the necessary skills and expertise. Provide ongoing training and development opportunities to ensure that the team stays current with the latest technologies and trends.
  5. Evaluate the infrastructure to verify that your current deployment supports data science and machine learning initiatives, including data storage, computing resources, and development environments.
  6. Establish a governance framework to ensure that data science and machine learning initiatives are aligned with the organisation's data governance policies and comply with relevant regulations.
  7. Foster collaboration between the data science function, the business and IT/DevOps, communicating the strategy and capabilities to key stakeholders within the organisation to ensure buy-in and alignment.
  8. Implement best practices for data science and machine learning, including project management, version control, code reviews, and testing.

 

Define and build a Data Driven Culture.

Building a data-driven culture is important for a number of reasons, chiefly because it allows organisations to make decisions based on evidence rather than intuition or opinion. This can lead to more accurate and effective decision-making, improve outcomes and drive business success. Additionally, a data-driven culture can help organisations to better understand and serve their customers, identify new opportunities for growth and optimise their operations. Furthermore, a data-driven culture can also help to identify and mitigate risks more effectively.

It can also lead to more efficient processes and lower costs, and lets employees work with more trust and transparency. A data-driven culture is becoming essential as data and analytics play increasingly important roles in the decision making of companies and organisations.

  1. Make sure that everyone within the organisation understands the value of data and how it can be used to drive business decisions and actions.
  2. Define clear roles and responsibilities for data management and analysis within the organisation, including data stewards, data analysts, and data scientists.
  3. Offer training and education programs to help employees develop the skills and knowledge needed to work with data effectively. This may include training on data literacy, data analysis tools and techniques, and data governance policies and procedures.
  4. Encourage a culture of using data to inform decision making at all levels of the organisation. This may involve setting up data-driven decision-making processes and incentivising employees for using data to drive business results.
  5. Encourage collaboration and cross-functional working within the organisation to ensure that data is shared and used effectively.
  6. Track and measure the success of the data-driven culture by setting key performance indicators (KPIs) and regularly reviewing progress against them. This will help identify areas for improvement and ensure that the data-driven culture is having a positive impact on the organisation.

 

Define and implement Monitoring and Analytics.

A monitoring system helps ensure the availability and performance of the data platform by providing real-time visibility into its performance and helping to identify and diagnose issues that may impact the service. With the right tooling it can also provide early warning of potential issues, allowing organisations to address them proactively before they become major problems.

Gaining insights from your data with a robust analytics platform allows organisations to integrate structured and unstructured data easily and seamlessly from various sources, such as databases, data lakes, and cloud services. This enables a unified view of data, which can be used to deliver intelligence and (hopefully) improve decision-making.

Monitoring.

  1. Implement monitoring tools that can track the performance and availability of the data platform across all storage locations. These tools may include metrics such as data transfer times, query response times, and data availability.
  2. Implement log management to collect, analyse, and visualise log data from all storage locations. This can help identify issues and trends and facilitate root cause analysis.
  3. Conduct regular audits and assessments to ensure that the data estate is functioning correctly and meeting the needs of the organisation. This may involve correlating performance metrics, testing data integrity, and evaluating data governance practices.
  4. Establish a process for tracking and resolving issues that arise on the data platform. This may involve creating a ticketing system or using a project management tool to track and prioritise issues.
  5. Regularly review and optimise the data platform to ensure that it is meeting the needs of the organisation and delivering value. This may involve making updates to the platform, such as adding new data sources or implementing new data processing or analytics capabilities.
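
The metrics named in step 1 (query response times and data availability) only become useful once they are compared against agreed thresholds. The following is a minimal sketch of such a check in Python. The threshold values, the `evaluate` function and the nearest-rank percentile helper are assumptions for illustration; a real deployment would read these metrics from whichever monitoring tool you adopt.

```python
import math

# Hypothetical service-level thresholds; tune these to your own SLAs.
THRESHOLDS = {"query_p95_ms": 500.0, "availability_pct": 99.5}


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def evaluate(query_times_ms, checks_ok, checks_total, thresholds=THRESHOLDS):
    """Return a list of alert messages, one per breached threshold."""
    alerts = []
    p95 = percentile(query_times_ms, 95)
    if p95 > thresholds["query_p95_ms"]:
        alerts.append(
            f"query p95 {p95:.0f} ms exceeds {thresholds['query_p95_ms']:.0f} ms"
        )
    availability = 100.0 * checks_ok / checks_total
    if availability < thresholds["availability_pct"]:
        alerts.append(
            f"availability {availability:.2f}% below {thresholds['availability_pct']:.2f}%"
        )
    return alerts


if __name__ == "__main__":
    sample_times = [120, 150, 180, 200, 220, 250, 300, 350, 400, 900]
    for alert in evaluate(sample_times, checks_ok=995, checks_total=1000):
        print("ALERT:", alert)
```

Alerts produced this way can feed the issue-tracking process in step 4, so that each breach becomes a ticket with an owner.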

 

Analytics.

  1. Understand the types and sources of data that will be used for analysis. This includes structured and unstructured data, internal and external data, and real-time and batch data.
  2. Consider the technologies and infrastructure required for data processing, such as batch processing, streaming, and real-time analytics, as well as data storage, such as a data warehouse or data lake.
  3. Understand the needs and requirements for data visualisation and reporting, including the types of charts, graphs, and dashboards that will be used, as well as the level of interactivity and self-service required.
  4. Understand the governance and security requirements for your analytics platform, including data quality, data lineage, data access control and data encryption.
  5. Understand the specific use cases and user requirements for your analytics platform, including the types of analyses that will be performed, the types of users who will use the platform, and the level of collaboration required.
  6. Consider the scalability and flexibility of the analytics platform, including its ability to handle large volumes of data and adapt to changing business requirements over time.
  7. Understand the need for integration with other systems and tools, such as business intelligence and data science platforms, and have a plan for how to achieve this.
  8. Understand the cost and resources required to implement, maintain and upgrade the analytics platform, including hardware and software, personnel, and ongoing maintenance and support.
  9. Understand the pros and cons of cloud-based and on-premises analytics platforms, and choose the option that matches your company's needs.
  10. Understand the technical skills required to build, maintain and operate the analytics platform, and plan for how you will acquire and retain those skills.

 

Conclusion.

Using a Modern Data Platform to run your business operations can provide a number of benefits. It allows for faster and more efficient data processing, enabling real-time decision making. The ability to handle a large volume and variety of data allows for more comprehensive analysis, and can uncover hidden patterns and insights that would otherwise be unavailable. A Modern Data Platform can also provide better scalability and flexibility, allowing you to easily add or remove data sources as needed. Additionally, it allows for better collaboration and data sharing across teams and departments, which can improve overall business operations. It also enables automation and self-service capabilities, allowing your business to become more agile and responsive to changing market conditions and customer needs. Overall, using a Modern Data Platform can help you gain a more comprehensive understanding of your business operations and make better use of your data, leading to improved performance and competitiveness.

Nick is a Senior Director in the Product Development A-Team at Oracle.

