Kalman filtering assimilated machine learning methods significantly improve the prediction performance of water quality parameters

Accurate water quality prediction is essential for effective water pollution prevention and emergency responses. However, existing research on machine learning (ML)-based data assimilation methods remains limited, particularly in terms of addressing the combined impacts of climate change and anthrop...

Bibliographic Details
Main Authors: Zhenyu Gao, Guoqiang Wang, Jinyue Chen, Lei Fang, Shilong Ren, A. Yinglan, Shuping Ji, Ruobing Liu, Qiao Wang
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: Ecological Informatics
Subjects: Kalman filter; Time series prediction; Machine learning; Data assimilation
Online Access: http://www.sciencedirect.com/science/article/pii/S1574954125003462
collection DOAJ
description Accurate water quality prediction is essential for effective water pollution prevention and emergency responses. However, existing research on machine learning (ML)-based data assimilation methods remains limited, particularly in terms of addressing the combined impacts of climate change and anthropogenic activities. To address this gap, we proposed a novel ‘ML–Kalman filter (KF)’ data assimilation framework and evaluated its performance in the Dahei River Basin, a representative semi-arid watershed. Our results demonstrated significant improvements in predicting key water quality parameters, including total nitrogen (TN), total phosphorus (TP), and the permanganate index (CODMn), through the integration of KF with four ML models (LSTM, RF, XGBoost, and SVR). The accuracy enhancement ranged from 4.3 % to 17.6 %, with TP showing the most substantial improvement (9.2 %–17.6 %), followed by TN (6.4 %–11.1 %) and CODMn (4.3 %–12.1 %). After assimilation, the models exhibited the following performance ranking for TN based on the coefficient of determination (R2): LSTM–KF (R2 = 0.909) > RF–KF (R2 = 0.886) > SVR–KF (R2 = 0.840) > XGBoost–KF (R2 = 0.797), with similar trends observed for TP and CODMn. The proposed framework demonstrates strong portability and applicability across different monitoring sections and temporal resolutions, offering a robust solution for regions with limited monitoring capabilities and challenging climatic conditions. These findings provide valuable data and technical support for advancing water pollution prediction and early warning systems, particularly for ecological and environmental departments operating in data-deficient regions.
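The abstract does not spell out how the Kalman filter is coupled to the ML models, so the following is only a minimal sketch of one common arrangement, not the authors' implementation. It assumes a scalar filter that tracks the slowly drifting bias of an ML forecast as a random walk and corrects each forecast before the matching observation arrives. The function names `kf_bias_correct` and `r2_score`, the noise variances `q` and `r`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def kf_bias_correct(ml_pred, obs, q=1e-3, r=1e-2, b0=0.0, p0=1.0):
    """Correct ML forecasts with a scalar Kalman filter on the forecast bias.

    Assumed model (illustrative, not taken from the paper):
      state        b_t = b_{t-1} + w_t,            w_t ~ N(0, q)
      measurement  obs_t - ml_pred_t = b_t + v_t,  v_t ~ N(0, r)
    Missing observations (NaN) skip the update step.
    """
    ml_pred = np.asarray(ml_pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    b, p = b0, p0
    corrected = np.empty_like(ml_pred)
    for t in range(len(ml_pred)):
        # Predict: the bias follows a random walk, so only its variance grows.
        p = p + q
        # Issue the corrected forecast before obs[t] is used.
        corrected[t] = ml_pred[t] + b
        # Update: assimilate the observed residual when an observation exists.
        if not np.isnan(obs[t]):
            k = p / (p + r)                        # Kalman gain
            b = b + k * ((obs[t] - ml_pred[t]) - b)
            p = (1.0 - k) * p
    return corrected


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 6.0 * np.pi, 200)
    truth = 2.0 + np.sin(t)                        # synthetic "concentration"
    ml_pred = truth + 0.5                          # ML model with a constant bias
    obs = truth + rng.normal(0.0, 0.05, t.size)    # noisy monitoring data
    corrected = kf_bias_correct(ml_pred, obs)
    print("R2 before assimilation:", r2_score(truth, ml_pred))
    print("R2 after assimilation: ", r2_score(truth, corrected))
```

Tracking the bias rather than the concentration itself keeps the ML model as the forecaster and lets the filter act purely as a correction layer, which is one way to reuse the same filter across LSTM, RF, XGBoost, or SVR outputs. The values of `q` and `r` would need tuning against real monitoring data.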
id doaj-art-91db876eee6c42369ae8065122974d50
institution Kabale University
issn 1574-9541
doi 10.1016/j.ecoinf.2025.103337
volume 90
article_number 103337
last_indexed 2025-08-20T05:05:51Z
affiliations Zhenyu Gao: Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266237, China
Guoqiang Wang (corresponding author): Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266237, China; Innovation Research Center of Satellite Application, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Jinyue Chen: Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266237, China; Shenzhen Research Institute of Shandong University, Shenzhen 518057, China
Lei Fang: Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266237, China
Shilong Ren: Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266237, China
A. Yinglan: Innovation Research Center of Satellite Application, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Shuping Ji: Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266237, China
Ruobing Liu: Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266237, China
Qiao Wang: Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266237, China
topic Kalman filter
Time series prediction
Machine learning
Data assimilation