Introduction
Worldwide, water usage is growing at more than twice the rate of population growth. This makes supplying clean water at minimum cost a challenge for water management companies. The challenge is difficult, but it can be addressed by integrating big data and analytics into water management systems.
Use data analytics insights and intelligence for effective water management solutions.
Data collection and decision-making for water management
Today, most industries worldwide are data-driven. With the advent of sensor-based devices and smart metering systems that capture water flow, equipment status, and other analytical data, utilities that manage water supply across cities and countries can reap similar benefits from data and analytics.
Below are some of the remote sources from which these devices collect water-related data:
- Pumping stations
- Storage facilities
- Industrial customer locations
- Retail customer locations
Below are some of the commercial off-the-shelf (COTS) products available in the market that receive data from different sources and help water managers make better-informed decisions:
- Supervisory Control and Data Acquisition (SCADA)
- Laboratory Information Management Systems (LIMS)
- Computerized Maintenance Management Systems (CMMS)
These systems can store water data and act on it immediately. Some of the capabilities they provide are:
- Processing control at local or remote locations
- Monitoring, gathering, and processing real-time data
- Event recording with log files
Here, I will use one such data-driven product, SCADA, as an example to explain how data collection, supported by data ingestion and data visualization, leads to better resource monitoring and improved decision-making. A basic SCADA system has programmable logic controllers (PLCs) or remote terminal units (RTUs). These are microcomputers that communicate with objects such as factory machines, human-machine interfaces (HMIs), sensors, and other devices, and then route the information to computers running SCADA software. The simple diagram below explains how data is captured and flow is controlled in SCADA:

Modern SCADA systems use relational databases, which integrate with ERP systems and store historical data for deeper analysis.
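To make the idea of storing historical readings in a relational database more concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The table layout, the station and tag names, and the hourly grouping are all assumptions made for illustration; an actual SCADA historian defines its own schema.

```python
import sqlite3


def create_historian(conn):
    # Hypothetical table layout: one row per sensor reading.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " station TEXT NOT NULL,"
        " tag TEXT NOT NULL,"
        " ts TEXT NOT NULL,"
        " value REAL NOT NULL)"
    )


def store_readings(conn, rows):
    # rows: iterable of (station, tag, ISO 8601 timestamp, value)
    conn.executemany(
        "INSERT INTO readings (station, tag, ts, value) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def hourly_average(conn, station, tag):
    # The first 13 characters of an ISO timestamp identify the hour,
    # e.g. "2024-01-01T08".
    cur = conn.execute(
        "SELECT substr(ts, 1, 13) AS hour_bucket, AVG(value) "
        "FROM readings WHERE station = ? AND tag = ? "
        "GROUP BY hour_bucket ORDER BY hour_bucket",
        (station, tag),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
create_historian(conn)
store_readings(conn, [
    ("PS-01", "tank_level_m", "2024-01-01T08:00:00", 3.0),
    ("PS-01", "tank_level_m", "2024-01-01T08:30:00", 2.0),
    ("PS-01", "tank_level_m", "2024-01-01T09:00:00", 1.5),
    ("PS-02", "tank_level_m", "2024-01-01T08:00:00", 4.0),
])
averages = hourly_average(conn, "PS-01", "tank_level_m")
print(averages)
```

Once readings sit in a table like this, the same data can be joined with maintenance or billing records, which is where the deeper analysis starts.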
Need for advanced data and analytics beyond COTS product capabilities
Even though these systems have real-time capabilities, such as showing the current water level and raising a warning, the ability to predict potential problems using smart analytical platforms can be a game changer for any water management company.
With data arriving from different sources in real time, a significant amount of it is neither analyzed nor used to understand the system’s performance more deeply or to make predictions. As populations grow and more data sources tell us about customers’ changing needs, it’s imperative to leverage big data and cloud technologies. SCADA software can also help capture and combine various types of data to identify the water management techniques that best satisfy customers’ needs.
Data captured from systems like SCADA and other sources can be used for:
- Water quality management
- Wastewater treatment
- Leakage management and enhancing network operations
- Improving operational value by proactive maintenance
- Processing automation capability
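As one illustration of leakage management, the sketch below compares the volume pumped into a zone with the volume metered at customer locations. The function name, the idea of a single metered zone, and the 15% loss threshold are assumptions made for this example, not values from any particular product.

```python
def water_balance(inflow_m3, customer_usage_m3, loss_threshold=0.15):
    """Compare zone inflow with metered customer usage.

    loss_threshold is a hypothetical cut-off: a zone losing a larger
    share of its inflow is flagged for inspection.
    """
    if inflow_m3 <= 0:
        raise ValueError("inflow_m3 must be positive")
    metered = sum(customer_usage_m3)
    unaccounted = inflow_m3 - metered
    loss_ratio = unaccounted / inflow_m3
    return {
        "unaccounted_m3": unaccounted,
        "loss_ratio": loss_ratio,
        "suspect_leak": loss_ratio > loss_threshold,
    }


# 1000 m3 pumped in, 750 m3 metered at customers.
zone = water_balance(1000.0, [300.0, 250.0, 200.0])
print(zone)
```

A gap between inflow and metered usage can also come from meter error or unbilled consumption, so a flag like this is a prompt for investigation rather than proof of a leak.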
Azure Cloud-based approach for building analytical applications for water management
The high-level architecture diagram below shows how data from internal and external sources can be ingested to create analytics in the Azure Cloud platform for a utility company that manages water resources.

As depicted in the above diagram, most of the capabilities can be fulfilled using cloud-native PaaS services. Power BI can be used to develop reports and dashboards, and Azure ML Studio can be used to build and manage the AI/ML models.
Data Ingestion: The Azure services mentioned below can be used for ingesting data from various water data sources:
- Azure Data Factory for ingesting structured data
- Azure Event Hubs for ingesting unstructured and semi-structured data
- Azure IoT Hub for ingesting event data from IoT connectors
Data Processing: For both structured and unstructured data, Azure Data Factory and Azure Databricks can be used to implement the business logic required for transforming, cleansing, standardizing, and enriching data.
Data Quality: Because the source data comes from multiple systems, all ingested data has to be assessed for quality before it is considered for data transformation. Whether a dedicated data quality tool is needed depends on how much bad data is expected.
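A quality assessment of this kind can be sketched in plain Python before committing to a dedicated tool. The record fields (meter_id, ts, flow_lps), the flow range, and the rule set below are hypothetical; the point is only to show records being split into clean and rejected sets, with the share of bad data measured along the way.

```python
from datetime import datetime

# Hypothetical field names for a flow-meter record.
REQUIRED_FIELDS = ("meter_id", "ts", "flow_lps")


def check_record(record, max_flow_lps=500.0):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    if problems:
        return problems
    try:
        datetime.fromisoformat(record["ts"])
    except (TypeError, ValueError):
        problems.append("bad timestamp")
    flow = record["flow_lps"]
    if isinstance(flow, bool) or not isinstance(flow, (int, float)):
        problems.append("flow not numeric")
    elif not 0 <= flow <= max_flow_lps:
        problems.append("flow out of range")
    return problems


def assess(records):
    """Split records into clean and rejected, and report the bad share."""
    clean, rejected = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    bad_ratio = len(rejected) / len(records) if records else 0.0
    return clean, rejected, bad_ratio


records = [
    {"meter_id": "M1", "ts": "2024-01-01T08:00:00", "flow_lps": 12.5},
    {"meter_id": "M2", "ts": "not-a-date", "flow_lps": 8.0},
    {"meter_id": "M3", "ts": "2024-01-01T08:00:00", "flow_lps": -4.0},
    {"meter_id": "", "ts": "2024-01-01T08:00:00", "flow_lps": 3.0},
]
clean, rejected, bad_ratio = assess(records)
print(len(clean), len(rejected), bad_ratio)
```

Tracking the bad ratio per source system over time gives a rough basis for deciding where a tool-based data quality solution would pay off.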
Data Storage: Azure can also be used for data storage and data transformation. Raw, transformed, and curated data that is unstructured or semi-structured can be stored in Azure Data Lake Storage Gen2, while transformed and curated structured data can be stored in either Azure Synapse or Azure SQL Database.
Visualization and Self-Service BI: The solution will enable business users to use pre-defined data visualizations and create their own reports and dashboards by leveraging models developed in Power BI. This way, dependency on technology teams can be eliminated. Users will be able to perform their own data analysis, build visualizations from the extracted data, and export them to various formats.
Data Science: Machine learning can be used for prediction, for example, to predict soil moisture, rain, and wind, thereby helping farmers determine the ideal time to plant, irrigate, and harvest.
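As a deliberately simple stand-in for such a model, the sketch below fits a straight line to recent soil moisture readings and estimates how many days remain before moisture drops below an irrigation threshold. The readings, the 25% threshold, and the choice of a linear trend are assumptions for illustration only; a production model built in Azure ML would likely use richer features such as rain and wind forecasts.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_threshold(days, moisture_pct, threshold_pct):
    """Days after the last reading until the trend crosses the threshold.

    Returns None when the soil is not drying (slope >= 0).
    """
    slope, intercept = fit_line(days, moisture_pct)
    if slope >= 0:
        return None
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily soil moisture readings, in percent.
days = [0, 1, 2, 3]
moisture = [40.0, 37.0, 34.0, 31.0]
remaining = days_until_threshold(days, moisture, threshold_pct=25.0)
print(remaining)
```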
Alert Notification: Azure Notification Hubs can be used to send real-time alerts, for example, to alert users on various devices about water leakage.
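The rule that decides when to raise such an alert can be sketched independently of the delivery service. In the example below, the pressure floor, the requirement of three consecutive low readings, and the send callback are all hypothetical; in a real deployment the callback would hand the message to a notification service, and that call is not shown here.

```python
def make_leak_alerter(send, pressure_floor_kpa=200.0, consecutive=3):
    """Build a handler for one sensor that alerts after sustained low pressure.

    send is any callable that accepts the alert text. Requiring several
    consecutive low readings avoids alerting on a single noisy sample.
    """
    low_count = 0
    alerted = False

    def on_reading(sensor_id, pressure_kpa):
        nonlocal low_count, alerted
        if pressure_kpa < pressure_floor_kpa:
            low_count += 1
        else:
            # Pressure recovered: reset so a later drop can alert again.
            low_count = 0
            alerted = False
        if low_count >= consecutive and not alerted:
            send(f"Possible leak near {sensor_id}: pressure {pressure_kpa:.0f} kPa")
            alerted = True

    return on_reading


sent = []
on_reading = make_leak_alerter(sent.append)
for pressure in [310.0, 190.0, 185.0, 180.0, 175.0, 320.0]:
    on_reading("PS-01", pressure)
print(sent)
```

Only one alert is sent for the whole low-pressure episode, which keeps users from being flooded with repeated notifications for the same event.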