
A Practical Guide for Data Management Processes and Tools/Technologies to Use
Kannan Mahendramani Senior Solution Architect, Digital & Analytics | October 13, 2020

According to CrowdFlower's Data Science Report, 80% of the time spent on data analytics goes to data preparation and processing. Preparing data for consumption through business intelligence reporting, data analytics, and data visualization involves several processes, and the way data is prepared often determines whether it is consistent and accurate for the people receiving reports or viewing dashboards. This guide covers the data management process end to end: identifying data sources, understanding data quality, wrangling the data, integrating the data, loading the data, testing the data, applying data governance rules, verifying the results, and making the data ready for consumption via data analytics, data visualization, and business intelligence reporting.


For each process, we explore the leading tools and technologies that can be leveraged and the HCL solution accelerators that can improve productivity and shorten data lifecycle timelines.

This guide can help business leaders identify the tools and skills needed for their data management requirements, reducing reliance on IT. It can also streamline data processes, because people across the business will have the knowledge needed to interpret data and understand what decisions need to be made. After reading this guide, you should have a clear understanding of the data management process, which can help you stay ahead of the competition.

According to a 2017 Gartner report, 60% of data analytics projects fail because they are not aligned with business strategy or lack the right talent. A data-literate staff saves time, energy, and money, which in turn helps advance the current state of your operations. 85% of organizations see data as one of their most valuable business assets.


Poorly managed data can have a significant impact, so it is important to choose the right tool for each data requirement (data process):

  • Failing to report on and comply with strict financial and data-retention regulations can lead to penalties.
  • The quality of the analytics used to deliver business value drops, which degrades the quality of business decisions.
  • A lot of time can be lost to manual data processing and preparation, which means missed opportunities to add business value.
  • Inconsistencies across subject areas and applications can creep into reports, with potentially large effects on the business.

 

| # | Task | Data Management Process (Terminology) | Leading Tools / Technologies | HCL Partnership | HCL Accelerators |
|---|---|---|---|---|---|
| 1 | Understanding data relationships (what, where, and how), with easy search | Data Discovery | Mentis (iDiscover), Denodo, Azure SQL DB | Alteryx, Mentis, Talend, Denodo, Microsoft Azure | iSee, MetaWisdom |
| 2 | Identifying DQ issues in the current system; detecting and correcting inaccurate data | Data Profiling; implements data quality check; develops balance control and validation | Informatica DQ, IBM InfoSphere QualityStage / Information Analyzer, Talend Data Quality, Microsoft Data Quality Services | Informatica, IBM, Talend, Microsoft | Advantage DQ |
| 3 | Data architectures address data in storage, data in use, and data in motion; descriptions of data stores, data groups, and data items; creating a data model (Conceptual, Logical, Physical) | Data Architecture (Modeling) | CA Erwin Data Modeler, Idera ER/Studio, Oracle SQL Developer Data Modeler, IBM InfoSphere Data Architect | CA, IBM | |
| 4 | Connecting to diverse data sources (Abstraction layer) | Data Integration (ETL), Data Preparation, Data Virtualization | Informatica, IBM, Talend & other ETL tools, Denodo, Azure Databricks, Alteryx, Collibra | Informatica, IBM, Talend, Denodo, Microsoft, Alteryx, Collibra | Sketch |
| 5 | Data processing, complex (multiple joins, complex filters, derivations, PL/SQL stored procedures in transformation, data generation, capturing historical changes in data, batch/stream processing) | Data Integration (ETL) | Informatica, IBM DataStage, SAP BODS, Talend, Oracle Data Integrator, Azure Event Hub, Azure Data Factory | Informatica, IBM, Talend, Microsoft | Sketch, nervIO |
| 6 | Data processing, simple to medium complexity, with filters to eliminate bad records, data aggregation, column derivations/arithmetic expressions, transformations (standardization, data type conversions, business-to-technical logic conversion) | Data Preparation; Data Virtualization | Alteryx, Trifacta Wrangler, Denodo, Microsoft Power View / Power Query, Talend | Alteryx, Denodo, Microsoft, Talend | Sketch, nervIO |
| 7 | Data consolidation, data migration onto the target platform (database, files, or web service) (On premises) | Data Integration (ETL); Data Virtualization | Informatica, Talend, IBM, Denodo | Informatica, IBM, Talend, Denodo | Sketch |
| 8 | Data consolidation, data migration onto the target platform (On cloud) | Data Loading; Data Migration | Informatica, Talend, Microsoft Azure Data Factory, AWS Glue ETL | Informatica, IBM, Microsoft | Sketch |
| 9 | Integrating multiple sources of data working in silos as a unified layer; access data in real time; short time to development with less budget; data distribution via JDBC/ODBC/web services (REST/SOAP) | Data Virtualization | Denodo, Tibco DV | Denodo, Tibco | |
| 10 | Storing data in structured and unstructured formats (On-premises / Cloud platform) | Data Storage | Traditional: RDBMS like Oracle, SQL Server, Teradata; Big Data: Hortonworks, Cloudera, MapR; File/BLOB Storage in Cloud: Azure Blobs/Azure Data Lake Storage, Amazon S3, Snowflake; Structured Storage in Cloud: Azure SQL DB, Azure SQL Data Warehouse, AWS RDS, EDW on Redshift, Amazon DynamoDB, Snowflake DB/Warehouse; NoSQL DB: MongoDB, Cosmos DB; Event Streaming Platform: Confluent; REST APIs: JSON, XML | Microsoft, Snowflake, Google, MongoDB, Aerospike, memSQL, neo4j, influxdb, Confluent | |
| 11 | Principles and rules governing various types of data and maintaining data catalog, business glossary, data access controls, data privacy, data lineage | Data Governance | Collibra, Alation, Informatica, IBM, Talend Data Catalog, Denodo Data Catalog, Azure Data Catalog, AWS Glue Data Catalog | Collibra, Alation, Informatica, IBM, Talend, Denodo, Microsoft | Data Marketplace (DMP) |
| 12 | Maintaining a "single version of truth" for customer, product, material, etc. | Master Data Management, Reference Data Management | Informatica MDM, Informatica Customer 360, Informatica Product 360, Tibco EBX, stibosystems | Informatica, Tibco, stibosystems | |
| 13 | Managing the life cycle of data from curation to retirement and certifying the trusted data; managing data quality | Data Stewardship | Collibra DGC, Informatica MDM, IBM Stewardship, Talend Data Stewardship | Collibra, Informatica, IBM, Talend | |
| 14 | Data retention and data archival | Data Archival/Deletion | Informatica Data Archive (ILM), IBM InfoSphere Optim | Informatica, IBM | |
| 15 | Protecting digital data by authentication and authorization | Data Security | Authentication: Windows LDAP, Active Directory, Azure Active Directory, Kerberos, SAML, OAuth 2.0; Authorization: Database, ETL, Data Preparation, Data Virtualization internal product feature - Enable SSL, Data Access Controls, Imperva | Imperva | |
| 16 | Data encryption; data anonymization | Data Masking | IBM InfoSphere Optim, Imperva Data Masking, Mentis (iMask), Informatica Data Privacy | IBM, Imperva, Mentis, Informatica | |
| 17 | Interactive and modern visual representation of data in charts/dashboards | Data Visualization | Microsoft Power BI, Tableau, Qlik, ThoughtSpot, Tibco Spotfire, Information Builders WebFocus, Yellowfin | Microsoft, Tableau, Qlik, ThoughtSpot, Yellowfin, Information Builders | |
| 18 | Data representation (Descriptive) | Data Reporting | Tableau, Microsoft, SAP, MicroStrategy, Tibco Spotfire, Qlik, r3 | Tableau, Microsoft, Tibco, r3 | Visual Library |
| 19 | Data analytics (Diagnostic, Predictive, Prescriptive) | Data Science and Machine Learning Platforms | SAS, Alteryx, Dataiku, Azure Data Lake Analytics (ADLA), HDInsight, Azure ML, Python, RStudio, Spark | Alteryx, Dataiku, DataRobot, Databricks, H2Oai, ARRIA | PerisKop, Embedded Analytics |
| 20 | Understanding, managing, and maintaining information about data (Data Dictionary) | Metadata Management | Informatica, IBM, Talend, Denodo | Informatica, IBM, Talend, Denodo | MetaWisdom |
| 21 | Test data (pre- and post-loads); compare source vs target data sets | Data Testing, Data Reconciliation | Custom ETL Testing | | GateKeeper |
| 22 | Monitoring/auditing data | Data Auditing/Monitoring | DI / DV tools in-built feature | | iSee |
| 23 | Test data generation | Test Data Generation | Informatica, IBM, Talend | Informatica, IBM, Talend | |
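
The table lists "Custom ETL Testing" for comparing source and target data sets (task 21). As a rough illustration of what such a custom check might look like, here is a minimal Python sketch that reconciles two in-memory row sets by a key column. The function name `reconcile`, the `id` key, and the sample rows are all hypothetical and chosen only for this example; a real check would read from the actual source and target systems and would likely also need to handle duplicate keys and tolerances on numeric columns.

```python
def reconcile(source_rows, target_rows, key):
    """Compare two lists of dict rows on a key column (hypothetical helper).

    Assumes each key value is unique within its data set; if a key repeats,
    only the last row carrying it is compared.
    """
    source_by_key = {row[key]: row for row in source_rows}
    target_by_key = {row[key]: row for row in target_rows}

    # Keys loaded from the source that never arrived in the target.
    missing_in_target = sorted(source_by_key.keys() - target_by_key.keys())
    # Keys present in the target that have no counterpart in the source.
    unexpected_in_target = sorted(target_by_key.keys() - source_by_key.keys())
    # Keys present on both sides whose rows differ in at least one column.
    mismatched = sorted(
        k
        for k in source_by_key.keys() & target_by_key.keys()
        if source_by_key[k] != target_by_key[k]
    )

    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": missing_in_target,
        "unexpected_in_target": unexpected_in_target,
        "mismatched": mismatched,
    }


# Illustrative sample data, not taken from any real system.
source = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": 250.5},
    {"id": 3, "amount": 75.0},
]
target = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": 205.5},  # value changed during the load
    {"id": 4, "amount": 10.0},   # row with no source counterpart
]

report = reconcile(source, target, key="id")
print(report)
```

Matching row counts alone can hide offsetting errors (one row dropped, one row added), which is why this sketch reports missing, unexpected, and mismatched keys separately.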