Kalman filtering assimilated machine learning methods significantly improve the prediction performance of water quality parameters

Bibliographic Details
Main Authors: Zhenyu Gao, Guoqiang Wang, Jinyue Chen, Lei Fang, Shilong Ren, A. Yinglan, Shuping Ji, Ruobing Liu, Qiao Wang
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: Ecological Informatics
Online Access: http://www.sciencedirect.com/science/article/pii/S1574954125003462
Description
Summary: Accurate water quality prediction is essential for effective water pollution prevention and emergency responses. However, existing research on machine learning (ML)-based data assimilation methods remains limited, particularly in terms of addressing the combined impacts of climate change and anthropogenic activities. To address this gap, we proposed a novel ‘ML–Kalman filter (KF)’ data assimilation framework and evaluated its performance in the Dahei River Basin, a representative semi-arid watershed. Our results demonstrated significant improvements in predicting key water quality parameters, including total nitrogen (TN), total phosphorus (TP), and the permanganate index (CODMn), through the integration of KF with four ML models (LSTM, RF, XGBoost, and SVR). The accuracy enhancement ranged from 4.3% to 17.6%, with TP showing the most substantial improvement (9.2%–17.6%), followed by TN (6.4%–11.1%) and CODMn (4.3%–12.1%). After assimilation, the models exhibited the following performance ranking for TN based on the coefficient of determination (R²): LSTM–KF (R² = 0.909) > RF–KF (R² = 0.886) > SVR–KF (R² = 0.840) > XGBoost–KF (R² = 0.797), with similar trends observed for TP and CODMn. The proposed framework demonstrates strong portability and applicability across different monitoring sections and temporal resolutions, offering a robust solution for regions with limited monitoring capabilities and challenging climatic conditions. These findings provide valuable data and technical support for advancing water pollution prediction and early warning systems, particularly for ecological and environmental departments operating in data-deficient regions.
ISSN: 1574-9541
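
Illustrative note: the summary does not give the state-space formulation used in the article, so the following is only a minimal sketch of how a Kalman filter might assimilate monitoring observations into ML forecasts of a single parameter such as TN. It assumes a scalar state that is advanced by the change in the ML forecast between time steps and then corrected whenever an observation is available. The function names `kalman_assimilate` and `r_squared`, and the variances `q` and `r`, are hypothetical and are not taken from the article.

```python
import numpy as np


def kalman_assimilate(ml_pred, obs, q=0.05, r=0.02):
    """Scalar Kalman filter sketch (assumed formulation, not the article's).

    ml_pred : ML forecasts of one water quality parameter
    obs     : observations aligned with ml_pred; np.nan where missing
    q, r    : assumed process-noise and observation-noise variances
    """
    ml_pred = np.asarray(ml_pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    analysis = np.empty_like(ml_pred)

    x, p = ml_pred[0], 1.0  # initial state estimate and its variance
    for t in range(len(ml_pred)):
        if t > 0:
            # Forecast step: move the state by the ML-predicted change
            # and inflate the error variance by the process noise.
            x = x + (ml_pred[t] - ml_pred[t - 1])
            p = p + q
        if not np.isnan(obs[t]):
            # Update step: blend forecast and observation via the Kalman gain.
            k = p / (p + r)
            x = x + k * (obs[t] - x)
            p = (1.0 - k) * p
        analysis[t] = x
    return analysis


def r_squared(y_true, y_pred):
    """Coefficient of determination, R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

In this sketch, time steps without an observation simply carry the ML forecast forward, while a small `r` relative to `q` pulls the estimate strongly toward each observation. The values of `q` and `r` would need to be tuned or estimated for real monitoring data.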