Effective decision making and operational efficiency are the major driving factors for the manufacturing segment. In this blog, I share my thoughts on why a centralised approach to data quality matters, and on the steps we at HCLTech ERS have taken towards it.
Let us look at three reasons why a standard approach to data quality is necessary, especially for the manufacturing segment:
- Manufacturing segment leads the data race
- Data quality management is the differentiating factor
- Chance to accelerate maturity for managing data quality
Let us look at the first reason: the manufacturing sector leads the data race. This is driven by the rise of new digital industrial technology, known as Industry 4.0. As per BCG, Industry 4.0 “is a transformation that makes it possible to gather and analyse data across machines, enabling faster, more flexible, and more efficient processes to produce higher-quality goods at reduced costs.”
This transformation rests on a set of technologies, the pillars of Industry 4.0, which include:
- System Integration
- Cloud Computing
- Big Data & Analytics
- Augmented Reality
Let’s take the example of remote sensing driven by IoT devices, which is the most prevalent and sought-after capability. Many factors can influence the quality of sensor data, including the difficult conditions under which the sensors operate.
A basic expectation of a monitoring system is that it is reliable enough for decision making. Sensors are susceptible to providing unreliable information, which is critical because that data can be propagated downstream and misinterpreted. In real-world scenarios, no system can guarantee data quality. Hence, we focussed on providing a monitoring perspective on data quality by defining, evaluating, and communicating data quality metrics.
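To make the idea of defining and evaluating metrics on sensor data concrete, here is a minimal sketch in Python. The `Reading` type, the `sensor_quality_metrics` function, the three metric names, and the thresholds are all illustrative assumptions, not the metrics used in any particular product.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Reading:
    timestamp: float          # seconds since an arbitrary start
    value: Optional[float]    # None models a dropped sample


def sensor_quality_metrics(
    readings: Sequence[Reading], low: float, high: float, max_gap_s: float
) -> Dict[str, float]:
    """Return simple data quality ratios (0.0 to 1.0) for one sensor."""
    total = len(readings)
    if total == 0:
        return {"completeness": 0.0, "validity": 0.0, "timeliness": 0.0}

    # Completeness: share of samples that carry a value at all
    present = [r for r in readings if r.value is not None]
    # Validity: share of samples whose value lies in the plausible range
    in_range = [r for r in present if low <= r.value <= high]

    # Timeliness: share of consecutive intervals no longer than max_gap_s
    gaps = [b.timestamp - a.timestamp for a, b in zip(readings, readings[1:])]
    on_time = sum(1 for g in gaps if g <= max_gap_s)
    timeliness = on_time / len(gaps) if gaps else 1.0

    return {
        "completeness": len(present) / total,
        "validity": len(in_range) / total,
        "timeliness": timeliness,
    }


# Hypothetical temperature sensor expected to report every 10 seconds
readings = [
    Reading(0, 20.5),
    Reading(10, None),    # dropped sample
    Reading(20, 500.0),   # implausible spike
    Reading(60, 21.0),    # arrived after a long gap
]
metrics = sensor_quality_metrics(readings, low=-40.0, high=125.0, max_gap_s=15.0)
print(metrics)
```

Reporting ratios rather than raw counts keeps the metrics comparable across sensors that sample at different rates.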
When it comes to generating data, no sector of the economy can match manufacturing. The manufacturing segment is by far the biggest contributor to new data creation:
- As per McKinsey Global Institute analysis, the manufacturing sector created 2,000 PB of new data in 2010.
- As per an IDC report, the sheer amount of data generated doubles every two years.
- Extrapolating on the same basis, new data creation by the manufacturing sector in 2018 is estimated to be about 52 EB and about 875 EB by 2025. This doesn’t include data used for diagnosis.
It’s logical to say that, with the exponential increase in data volume, we can expect a corresponding increase in related data issues as well. The next data wave will be triggered by IoT devices, and the manufacturing sector is the primary adopter of the technology. IoT allows manufacturers to use real-time data from sensors to track parts, monitor machinery, and guide actual operations. Major organisations have already taken a decisive step towards using Big Data to handle such high volumes efficiently, and a centralised approach to data management is a necessity to cater to these workloads.
Next, the second reason: data quality management is the differentiating factor. A number of specialised systems work together to provide the data and analysis elements needed to enable operational intelligence.
These include, but are not limited to:
- DCS instruments
To create a common data layer, this data is extracted and ingested into a data repository, such as a data warehouse, data lake, or data mart. This data repository serves as a single source of truth for downstream analysis and processing. Enterprise Manufacturing Intelligence components like visualisations, alerts, KPIs, SPC, and security elements are some of the direct consumers of the common data layer.
Consider the case where the data being ingested into the common data layer does not meet the data quality standards or specifications, or where vital elements are incomplete for some reason. The downstream analytics will be directly affected, which could lead to dire consequences.
Moving to a big data platform will not be sufficient to enable effective decision making unless you can trust the data you are working with. With an exponential increase in data, data quality problems in the manufacturing sector also increase exponentially if not handled properly.
As per a CDO survey conducted by Experian, the top consequences of inaccurate data include difficulty using data to drive strategic decision making, regulatory risk, a less than optimal customer experience, potential brand damage, loss of revenue opportunities, and distrust in the decisions being made.
As per the same survey, the factors contributing to data inaccuracies include human error, lack of internal communication, inadequate data strategy, and lack of relevant technology and skills. These lead to incomplete data, data delays, duplicates, invalid values, inconsistent data, and inaccurate data.
If the data being analysed is not trusted, then it doesn’t matter how good the models are; the output will never deliver the expected results. After all, “garbage in, garbage out”.
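Several of the issue types listed above can be counted mechanically. The sketch below, in Python, profiles a batch of records for incomplete rows, duplicates, and invalid values. The function name `profile_records`, the field names, and the temperature range are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List


def profile_records(
    records: List[Dict[str, Any]],
    key: str,
    required: List[str],
    validators: Dict[str, Callable[[Any], bool]],
) -> Dict[str, int]:
    """Count incomplete, duplicate, and invalid records in a batch."""
    issues = {"incomplete": 0, "duplicate": 0, "invalid": 0}
    seen = set()
    for rec in records:
        # Incomplete: a required field is missing or empty
        if any(rec.get(field) in (None, "") for field in required):
            issues["incomplete"] += 1

        # Duplicate: the same business key has already been seen
        k = rec.get(key)
        if k in seen:
            issues["duplicate"] += 1
        seen.add(k)

        # Invalid: a value is present but fails its validator
        for field, is_valid in validators.items():
            value = rec.get(field)
            if value not in (None, "") and not is_valid(value):
                issues["invalid"] += 1
                break  # count each record at most once as invalid
    return issues


# Hypothetical batch records from a production line
records = [
    {"batch_id": "B1", "line": "L1", "temp_c": 72.5},
    {"batch_id": "B2", "line": "", "temp_c": 71.0},      # missing line
    {"batch_id": "B1", "line": "L1", "temp_c": 72.5},    # repeated batch_id
    {"batch_id": "B3", "line": "L2", "temp_c": -999.0},  # sentinel value
]
issues = profile_records(
    records,
    key="batch_id",
    required=["batch_id", "line", "temp_c"],
    validators={"temp_c": lambda v: 0 <= v <= 200},
)
print(issues)
```

Data delays and inconsistencies across systems need timestamps and a second source to compare against, so they are left out of this sketch.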
The third and final reason is the chance to accelerate maturity for managing data quality.
As per the Experian study, which surveyed CDOs from major global organisations:
- US companies believe that, on average, 25% of their data is inaccurate
- 91% of organisations are affected by common data errors
- 66% of organisations lack a coherent, centralised approach to data quality
- 53% of organisations use manual methods for data management
It takes a considerable amount of time, planning, and effort to move from inactive, to reactive, to proactive, to optimised levels of maturity. HCLTech’s offering could be the first step or a means to accelerate this maturity for managing data quality.
For this, we tap into and integrate at the points where large data movements happen, using a framework-driven approach. Here, the data owners define the data and related rules; the framework integrates with the data source, evaluates the quality metrics, and provides the capability to visualise them for further analysis.
The benefits of following such an approach include:
- Extensibility, as the solution can be deployed on top of an existing data lake or data mart framework and exposes standard interfaces/connectors to the existing systems.
- Productivity, as the solution enables ease of use and non-scripting functionality. Our solution radically increases the productivity of data engineers, helping them create and set up DQ assessment pipelines 70% faster and produce valuable insights.
- Standardisation, as the solution provides a standard implementation that is easy to roll out, execute, and support.
To summarise my final reason: there is definitely scope to catch up and march ahead, and this is the chance to accelerate maturity for managing data quality.
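The flow described earlier, in which data owners declare rules and a framework evaluates them against a source and reports the metrics, can be illustrated with a minimal Python sketch. This is not HCLTech's implementation; `Rule`, `evaluate`, `render`, the column names, and the thresholds are assumptions made up for the example, and the text bar chart merely stands in for a real visualisation layer.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Rule:
    name: str                     # label shown in the report
    column: str                   # column the rule applies to
    check: Callable[[Any], bool]  # returns True when a value passes


def evaluate(rows: Iterable[Dict[str, Any]], rules: List[Rule]) -> Dict[str, float]:
    """Return the pass rate (0.0 to 1.0) of each rule over the rows."""
    rows = list(rows)
    scores = {}
    for rule in rules:
        passed = sum(1 for row in rows if rule.check(row.get(rule.column)))
        scores[rule.name] = passed / len(rows) if rows else 0.0
    return scores


def render(scores: Dict[str, float], width: int = 20) -> str:
    """Format the scores as a plain-text bar chart."""
    lines = []
    for name, score in scores.items():
        bar = "#" * round(score * width)
        lines.append(f"{name:<22} {bar:<{width}} {score:.0%}")
    return "\n".join(lines)


# Rules as a data owner might declare them (hypothetical)
rules = [
    Rule("pressure_not_null", "pressure_bar", lambda v: v is not None),
    Rule("pressure_in_range", "pressure_bar",
         lambda v: v is not None and 0 <= v <= 10),
    Rule("unit_is_known", "unit", lambda v: v in {"U1", "U2"}),
]

# Rows as they might arrive from a data source (hypothetical)
rows = [
    {"unit": "U1", "pressure_bar": 4.2},
    {"unit": "U2", "pressure_bar": None},
    {"unit": "U9", "pressure_bar": 12.0},
    {"unit": "U1", "pressure_bar": 5.1},
]

scores = evaluate(rows, rules)
print(render(scores))
```

Keeping the rules as plain data, separate from the evaluation loop, is what lets new checks be added without touching the pipeline code.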
- BCG: https://www.bcg.com/en-in/capabilities/operations/embracing-industry-4.0-rediscovering-growth.aspx
- McKinsey Global Institute analysis: https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Big%20data%20The%20next%20frontier%20for%20innovation/MGI_big_data_exec_summary.ashx
- IDC Report: https://www.emc.com/about/news/press/2011/20110628-01.htm
- State of Data Quality: https://www.experian.com/assets/decision-analytics/white-papers/the%20state%20of%20data%20quality.pdf