A data-driven approach utilizing machine learning (ML) and geographical information system (GIS)-based time series analysis with data augmentation for water quality assessment in Mahanadi River Basin, Odisha, India

Abstract Concerns about water contaminants affect most developing countries with rivers passing through them. Keeping water quality within the allowed limits for drinking, industrial, and agricultural purposes is challenging. To tackle this issue, we developed a new tool that harnesses o...

Bibliographic Details
Main Author: Abhijeet Das
Format: Article
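Language: English
Published: Springer 2025-06-01
Series: Discover Sustainability
Subjects: Mahanadi; Water quality; Synthetic Pollution Index; Long Short-Term Memory; Sparrow Search Algorithm; Leaching
Online Access: https://doi.org/10.1007/s43621-025-01464-7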
_version_ 1849420438268018688
author Abhijeet Das
author_facet Abhijeet Das
author_sort Abhijeet Das
collection DOAJ
description Abstract Concerns about water contaminants affect most developing countries with rivers passing through them. Keeping water quality within the allowed limits for drinking, industrial, and agricultural purposes is challenging. To tackle this issue, we developed a new tool that harnesses optimization models, enhancing the reliability and accuracy of water quality assessments. Our research in the Mahanadi River Basin, Odisha, presents an enhanced data-driven methodology for water quality (WQ) assessment based on the Synthetic Pollution Index (SPI) and machine learning models such as Long Short-Term Memory (LSTM) and the Sparrow Search Algorithm (SSA). The methodology supports the analysis and interpretation of extensive, intricate water quality data sets and the apportionment of pollution sources or contributing elements, in order to improve knowledge of water quality and the planning of monitoring networks for efficient water resource management. A spatial distribution map is drawn using a Geographical Information System (GIS) to provide a full picture of the actual contamination of the river. These approaches were applied to water quality datasets generated over a two-year period, 2022–2024, at nineteen sites for 21 parameters. The results show that two parameters, TKN and coliform, exceed WHO guidelines, while DO remained at an optimal level throughout the study. According to the SPI model, 10 samples (52.63%) fall in the very pure zone, 5 samples (26.32%) are slightly polluted, and 4 samples (21.05%) are moderately polluted. The results indicate that water quality is degraded at 4 places because of human activities, land use, and industrialization.
The LSTM–WQI outcomes show that 42.11% of the tested locations are categorized as excellent, 15.79% as good, and the remaining 10.53% and 31.58% as fair and marginal, respectively. This model points to organic contaminants, leaching from the soil, and commercial waste disposal sites as contributing factors. The GIS maps obtained from SSA–WQI show that the estimated values ranged between 16 and 92, spanning good to poor water quality. Notably, six places had the best water quality because there had been no changes in land use. The primary causes of the decline in river quality at poor locations were the nutrients group (agricultural runoff) and alkalinity, hardness, EC, and solids (soil leaching and runoff processes). In conclusion, this work showed that the water was potentially hazardous to consumers' health at some investigated locations and highlighted the need to treat industrial and municipal wastewater. This research therefore recommends integrating these modelling approaches because of their consistent and reliable results, which can reduce both the time and cost of analysis and can be applied locally and globally using similar methodologies.
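The class shares quoted in the description can be checked against the nineteen sampling sites. The following is a minimal sketch, not part of the record or the article: the SPI counts and all percentages come from the abstract, while the LSTM-WQI site counts are an assumption inferred from the reported percentages, and the names `N_SITES`, `shares`, `spi_counts`, and `lstm_counts` are hypothetical.

```python
N_SITES = 19  # number of sampling sites stated in the abstract

# SPI class counts as stated in the abstract.
spi_counts = {"very pure": 10, "slightly polluted": 5, "moderately polluted": 4}

# LSTM-WQI percentages as stated in the abstract (no counts are given there).
lstm_reported = {"excellent": 42.11, "good": 15.79, "fair": 10.53, "marginal": 31.58}


def shares(counts, total):
    """Return each class's share of `total` as a percentage, rounded to 2 decimals."""
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


spi_shares = shares(spi_counts, N_SITES)

# ASSUMPTION: infer whole-site counts from the reported percentages.
lstm_counts = {
    label: round(pct * N_SITES / 100) for label, pct in lstm_reported.items()
}
lstm_shares = shares(lstm_counts, N_SITES)

if __name__ == "__main__":
    print("SPI shares (%):", spi_shares)
    print("Inferred LSTM-WQI counts:", lstm_counts)
    print("LSTM-WQI shares (%):", lstm_shares)
```

Under these assumptions, the inferred LSTM-WQI counts sum to the nineteen sites and reproduce the reported percentages after rounding, which is why the four shares add up to 100.01% rather than exactly 100%.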
format Article
id doaj-art-5bc99826dce24bc98201c8d58b6adf25
institution Kabale University
issn 2662-9984
language English
publishDate 2025-06-01
publisher Springer
record_format Article
series Discover Sustainability
spelling doaj-art-5bc99826dce24bc98201c8d58b6adf25 2025-08-20T03:31:45Z eng Springer Discover Sustainability 2662-9984 2025-06-01 61136 10.1007/s43621-025-01464-7 A data-driven approach utilizing machine learning (ML) and geographical information system (GIS)-based time series analysis with data augmentation for water quality assessment in Mahanadi River Basin, Odisha, India Abhijeet Das, Department of Civil Engineering, C. V. Raman Global University (CGU) https://doi.org/10.1007/s43621-025-01464-7 Mahanadi; Water quality; Synthetic Pollution Index; Long Short-Term Memory; Sparrow Search Algorithm; Leaching
title A data-driven approach utilizing machine learning (ML) and geographical information system (GIS)-based time series analysis with data augmentation for water quality assessment in Mahanadi River Basin, Odisha, India
topic Mahanadi
Water quality
Synthetic Pollution Index
Long Short-Term Memory
Sparrow Search Algorithm
Leaching
url https://doi.org/10.1007/s43621-025-01464-7