
Getting ahead with Data Platform in the Cloud
Parashuram Patil Principal Architect, Digital & Analytics | August 13, 2020

Today, almost everyone wants to take advantage of the myriad benefits of Infrastructure-as-Code to overcome traditional challenges around provisioning speed, on-demand scaling in either direction, and other operational hurdles. The cloud brings a significant amount of agility with pre-loaded services, platforms and ease of use. A data platform in the cloud makes a huge difference in bringing analytics to data consumers quickly and easily.

Typical challenges:

Below are some of the challenges that should be tackled when adopting a cloud solution for a data platform or data migration:

  • Regional data privacy compliance: The European Union, for example, stipulates that its data cannot be transferred out of the region, and vice-versa.
  • Lift and Shift vs. Transformation: Is IaaS with lift-and-shift the better proposition, moving existing applications to the cloud as-is, or should the landscape be transformed with cloud-native tools?
  • One-time large-volume data transfer: How do we plan and execute the transfer of large volumes of data to cloud services without interrupting or affecting the user experience?
  • Data producers vs. data consumers: What is the cloud strategy for minimizing data migration or transfer traffic between data producers and data consumers?
  • Cloud lock-in: How does one build a cloud strategy that avoids getting locked into a single cloud vendor?
  • Cost: How are costs controlled and governed in cloud services?
  • Organization readiness: On-prem lock-ins, security concerns, and CapEx vs. OpEx debates.

Strategy:

  • Below is the typical conceptual architecture for a data platform:
    [Figure: Data platform conceptual architecture]

    The cloud provides the ability to decouple all components of the data platform; use this capability to scale your applications and to avoid locking into any vendor's tools or technologies.

  • Decouple storage from compute and adopt separate solutions for each, so that they can be scaled independently based on demand.
    • Storage solutions include AWS S3, Azure Blob Storage, etc.
    • Compute solutions include AWS EMR, Azure HDInsight, Databricks, etc. (use storage and compute from the same cloud provider)
    • Vendor-specific solutions include Snowflake, Actian Avalanche, etc.
  • Have a database solution in place for aggregated and summarized data to meet dashboard and reporting performance needs; ensure that this data is reproducible. Typical databases include AWS Redshift, Azure SQL Data Warehouse or RDS (depending on the volume of data involved).
  • Avoid lift-and-shift of on-prem data management solutions to the cloud; take advantage of cloud-native capabilities instead.
  • Adopt an ELT approach in the cloud rather than traditional ETL: land raw data in cloud storage first, then transform it using scalable cloud compute.
  • Create a staggered plan to migrate data platform functionality to the cloud; identify interdependencies and plan properly to prevent interruptions for end users.
    • Use transient solutions via data virtualization tools as required.
    • Run the old and new platforms in parallel during the transition, both to avoid interruptions and to support reconciliation.
    • Use ML-based solutions to identify interdependencies and support staggered planning.
    • Use automated reconciliation solutions.
  • Use streaming solutions to enable faster analytics.
  • Have different strategies for operational and analytical use cases.
  • If CDC solutions are enabled for replication, implement them close to the source application, so that large data transfers are avoided.
  • Create patterns, templatize them and avoid creating individual components.
  • Decouple the solution components wherever possible.
  • Enable a proper Governance mechanism in the ecosystem.
  • Rationalize, Standardize and Prioritize the assets before moving to cloud.
  • Choose the tooling for the data platform architecture carefully, evaluating candidates against parameters such as:
    • Interoperability
    • Portability
    • Scalability and performance
    • Inbuilt adapters and connectivity
    • Richness in functionality and ease of usage
    • Future roadmap and supportability
    • Performance benchmarks and references etc.
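The storage/compute decoupling recommended above can be sketched as a simple abstraction. The following is a minimal, hypothetical Python example — the `ObjectStore` interface and the in-memory backend are illustrative stand-ins for services such as S3 or Azure Blob Storage, not real SDK calls:

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Storage tier: lives independently of any compute cluster."""
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class InMemoryStore(ObjectStore):
    """Local stand-in; in practice this would wrap S3 or Azure Blob."""
    def __init__(self):
        self._objects = {}
    def get(self, key):
        return self._objects[key]
    def put(self, key, data):
        self._objects[key] = data

def word_count(store: ObjectStore, key: str) -> int:
    """Compute tier: stateless, so clusters can be added, resized or
    torn down on demand without touching the data in the storage tier."""
    return len(store.get(key).decode("utf-8").split())

store = InMemoryStore()
store.put("raw/notes.txt", b"decouple storage from compute")
print(word_count(store, "raw/notes.txt"))  # -> 4
```

Because the compute function only depends on the storage interface, either tier can be scaled — or swapped for another vendor's service — independently of the other.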
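The ELT recommendation — load raw data first, transform inside the warehouse engine afterwards — can be illustrated with a small sketch. SQLite here is a local stand-in for a cloud warehouse such as Redshift or Snowflake; the table names and sample rows are assumptions for illustration:

```python
import sqlite3

# "Load" step of ELT: land the raw records as-is, with no upfront transformation.
raw_rows = [("2020-08-01", "EU", 120.0),
            ("2020-08-01", "US", 80.0),
            ("2020-08-02", "EU", 60.0)]

conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse
conn.execute("CREATE TABLE raw_sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# "Transform" step: aggregation runs inside the warehouse, where compute
# scales on demand, instead of on an upstream ETL server.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total
    FROM raw_sales
    GROUP BY region
""")
print(dict(conn.execute("SELECT region, total FROM sales_by_region")))
# -> {'EU': 180.0, 'US': 80.0}
```

Keeping the raw table around also satisfies the reproducibility point above: the aggregate can always be rebuilt from the landed data.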

Learnings:

  • Cloud adoption is a journey, and during this journey there are some time-consuming activities that are usually taken lightly:
    • Creating the cloud platform solution and completing the internal security review and sign-off before any cloud services can be used.
    • Determining the firewall/tunnel capacity required between the organization's data centers and the cloud data centers, and provisioning bandwidth where needed.
    • Getting all the required firewall ports opened between security zones.
    • The time these activities take varies with the maturity of the customer's internal services.
  • Cloud tool selection and governance: Identify usage patterns and select tools by evaluating those patterns, so that overall costs stay within budget.
  • Migrate only the data that is needed, not all of it, and plan for the migration, parallel runs, reconciliation and retirement.
  • Plan adequate time to migrate legacy users to the cloud.
  • Organizational change management plays a very important role in the migration.
  • Expect performance issues to begin with; it takes some time to tune the environment.
  • Use cloud accounts carefully, so that usage ownership is governed and accountability is clearly established.
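One lightweight way to keep usage ownership governed and costs attributable, as the last two points suggest, is to enforce mandatory ownership tags before resources are provisioned. The sketch below is hypothetical — the tag names and the plain-dict resource shape are assumptions, not any cloud provider's API:

```python
# Assumed governance policy: every resource must carry these tags.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}

def missing_tags(resource_tags: dict) -> set:
    """Return the governance tags a resource is missing."""
    return REQUIRED_TAGS - set(resource_tags)

def validate(resources: dict) -> list:
    """Flag resources that cannot be attributed to an owner or budget."""
    return [name for name, tags in resources.items() if missing_tags(tags)]

resources = {
    "etl-cluster": {"owner": "data-eng", "cost-center": "1234",
                    "environment": "prod"},
    "scratch-bucket": {"owner": "analyst-x"},  # no cost-center/environment
}
print(validate(resources))  # -> ['scratch-bucket']
```

Run as a pre-provisioning check (or as a periodic audit), this kind of rule makes every line item in the cloud bill traceable to a team and a budget.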

At HCL, we are invested in the Data First philosophy and have designed and developed accelerators that can help organizations migrate to the cloud effectively and efficiently:

  • Advantage-Sketch: Helps build data acquisition, ingestion and transformation pipelines through configuration rather than hand construction in Spark, ADF, Informatica, Talend and other tools.
  • Advantage DQ: Helps add configuration-driven data validation, data quality and data health-check components to the integration framework.
  • iSee: A framework that brings all system, application, usage and DQ monitoring into a single pane of glass.
  • Advantage Gatekeeper: Helps with automated testing and reconciliation between applications.

Summary:

Like any other transformation project, cloud migration is a journey, and it will have its own challenges to overcome. Once the basic platform is set up, onboarding new tools and services becomes easy.

The cloud provides a lot of flexibility and scaling options, and its low storage costs help preserve large amounts of historical data for analysis during and after the migration. A data platform in the cloud enables scenarios that bring 'the art of the possible' to consumers.