A data-driven approach utilizing machine learning (ML) and geographical information system (GIS)-based time series analysis with data augmentation for water quality assessment in Mahanadi River Basin, Odisha, India
Abstract: Water contamination is a concern in most developing countries whose territory is crossed by rivers, and keeping water quality within the permitted limits for drinking, industrial, and agricultural use is challenging. To tackle this issue, we developed a new tool that harnesses o...
| Main Author: | Abhijeet Das |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-06-01 |
| Series: | Discover Sustainability |
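| Subjects: | Mahanadi; Water quality; Synthetic Pollution Index; Long Short-Term Memory; Sparrow Search Algorithm; Leaching |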
| Online Access: | https://doi.org/10.1007/s43621-025-01464-7 |
| _version_ | 1849420438268018688 |
|---|---|
| author | Abhijeet Das |
| collection | DOAJ |
| description | Abstract: Water contamination is a concern in most developing countries whose territory is crossed by rivers, and keeping water quality within the permitted limits for drinking, industrial, and agricultural use is challenging. To tackle this issue, we developed a new tool that harnesses optimization models to enhance the reliability and accuracy of water quality assessments. Our research in the Mahanadi River Basin, Odisha, presents an enhanced data-driven methodology for water quality (WQ) assessment based on the Synthetic Pollution Index (SPI) and on machine learning models such as Long Short-Term Memory (LSTM) and the Sparrow Search Algorithm (SSA). The methodology supports the analysis and interpretation of extensive, intricate water quality data sets and the apportionment of pollution sources or contributing factors, with the aim of improving knowledge of water quality and the planning of monitoring networks for efficient water resource management. A spatial distribution map was drawn using a Geographical Information System (GIS) to give a full picture of the actual contamination of the river. These approaches were applied to water quality data sets collected over a two-year period (2022–2024) at nineteen sites for 21 parameters. The results show that two parameters, TKN and coliform, exceed WHO guidelines, while DO remained at an optimal level throughout the study. According to the SPI model, 10 samples (52.63%) fall in the very pure zone, 5 samples (26.32%) are slightly polluted, and 4 samples (21.05%) are moderately polluted. The results indicate that water quality is degraded at 4 sites because of human activities, land use, and industrialization. The LSTM–WQI outcomes classify 42.11% of the tested locations as excellent, 15.79% as good, 10.53% as fair, and 31.58% as marginal. This model points to organic contaminants, leaching from the soil, and commercial waste disposal sites. The GIS maps of the SSA–WQI results show estimated values ranging from 16 to 92, spanning good to poor water quality. Six sites had the best water quality because land use there had not changed. The primary causes of the decline in river quality at the poor sites were the nutrient group (agricultural runoff) together with alkalinity, hardness, EC, and solids (soil leaching and runoff processes). In conclusion, this work shows that the water was potentially hazardous to consumers' health at some of the investigated locations and highlights the need to treat industrial and municipal wastewater. The study therefore recommends integrating these modelling approaches because of their consistent and reliable results, which can reduce both the time and cost of analysis and can be applied locally and globally using similar methodologies. |
| format | Article |
| id | doaj-art-5bc99826dce24bc98201c8d58b6adf25 |
| institution | Kabale University |
| issn | 2662-9984 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Sustainability |
| spelling | doaj-art-5bc99826dce24bc98201c8d58b6adf25; 2025-08-20T03:31:45Z; eng; Springer; Discover Sustainability; 2662-9984; 2025-06-01; 61136; 10.1007/s43621-025-01464-7; A data-driven approach utilizing machine learning (ML) and geographical information system (GIS)-based time series analysis with data augmentation for water quality assessment in Mahanadi River Basin, Odisha, India; Abhijeet Das, Department of Civil Engineering, C. V. Raman Global University (CGU); https://doi.org/10.1007/s43621-025-01464-7; Mahanadi; Water quality; Synthetic Pollution Index; Long Short-Term Memory; Sparrow Search Algorithm; Leaching |
| title | A data-driven approach utilizing machine learning (ML) and geographical information system (GIS)-based time series analysis with data augmentation for water quality assessment in Mahanadi River Basin, Odisha, India |
| topic | Mahanadi; Water quality; Synthetic Pollution Index; Long Short-Term Memory; Sparrow Search Algorithm; Leaching |
| url | https://doi.org/10.1007/s43621-025-01464-7 |
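
The category percentages quoted in the abstract can be checked against the nineteen monitoring sites with a few lines of arithmetic. The sketch below is illustrative only and is not the author's code: the function and variable names are hypothetical, the SPI counts (10, 5, 4) are taken from the abstract, and the LSTM–WQI site counts are not stated in the record, so they are inferred here by rounding each reported share of 19 sites.

```python
N_SITES = 19  # number of monitoring sites stated in the abstract


def category_shares(counts, total=N_SITES):
    """Convert per-category site counts to percentages rounded to 2 decimals."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the number of sites")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


def counts_from_shares(shares, total=N_SITES):
    """Infer whole-site counts from reported percentages (an assumption, not source data)."""
    return {name: round(pct * total / 100) for name, pct in shares.items()}


# SPI classification counts as reported in the abstract.
spi_counts = {"very pure": 10, "slightly polluted": 5, "moderately polluted": 4}
spi_shares = category_shares(spi_counts)

# LSTM-WQI shares as reported; the site counts below are inferred, not reported.
lstm_shares = {"excellent": 42.11, "good": 15.79, "fair": 10.53, "marginal": 31.58}
lstm_counts = counts_from_shares(lstm_shares)

if __name__ == "__main__":
    print(spi_shares)
    print(lstm_counts, sum(lstm_counts.values()))
```

Under these assumptions the SPI shares reproduce the reported 52.63%, 26.32%, and 21.05%, and the inferred LSTM–WQI counts add up to the nineteen sites, which suggests the reported percentages are internally consistent.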