Deep representation learning enables cross-basin water quality prediction under data-scarce conditions

Abstract Artificial intelligence has been extensively used to predict surface water quality to assess the health of aquatic ecosystems proactively. However, water quality prediction in data-scarce conditions is a challenge, especially with heterogeneous data from monitoring sites that lack similarit...

Bibliographic Details
Main Authors: Yue Zheng, Xiaoran Zhang, Yongchao Zhou, Yiping Zhang, Tuqiao Zhang, Raziyeh Farmani
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: npj Clean Water
Online Access: https://doi.org/10.1038/s41545-025-00466-2
collection DOAJ
description Abstract Artificial intelligence has been extensively used to predict surface water quality to assess the health of aquatic ecosystems proactively. However, water quality prediction in data-scarce conditions is a challenge, especially with heterogeneous data from monitoring sites that lack similarity in water quality, hindering the information transfer. A deep learning model is proposed that utilizes representation learning to capture knowledge from source river basins during the pre-training stage, and incorporates meteorological data to accurately predict water quality. This model is successfully implemented and validated using data from 149 monitoring sites across inland China. The results show that the model has outstanding prediction accuracy across all sites, with a mean Nash-Sutcliffe efficiency of 0.80, and has a significant advantage in multi-indicator prediction. The model maintains its excellent performance even when trained with only half of the data. This can be attributed to the representation learning used in the pre-training stage, which enables extensive and accurate prediction under data-scarce conditions. The developed model holds significant potential for cross-basin water quality prediction, which could substantially advance the development of water environment system management.
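The abstract reports accuracy as a mean Nash-Sutcliffe efficiency (NSE) across monitoring sites. As a reading aid, the following is a minimal sketch of how that metric is conventionally computed, assuming the standard definition NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). The function names and the sample values are hypothetical and are not taken from the article.

```python
from statistics import fmean


def nash_sutcliffe_efficiency(observed, simulated):
    """Return the NSE of one simulated series against its observations.

    1.0 is a perfect match; 0.0 means the prediction is no better than
    always predicting the observed mean; negative values are worse than that.
    """
    observed = [float(v) for v in observed]
    simulated = [float(v) for v in simulated]
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")

    obs_mean = fmean(observed)
    # Squared error of the model against the observations.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    variance = sum((o - obs_mean) ** 2 for o in observed)
    if variance == 0.0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - residual / variance


def mean_nse(per_site_pairs):
    """Average the NSE over sites; each item is (observed, simulated)."""
    return fmean(nash_sutcliffe_efficiency(o, s) for o, s in per_site_pairs)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    obs = [1.0, 2.0, 3.0, 4.0]
    print(nash_sutcliffe_efficiency(obs, obs))        # perfect prediction
    print(nash_sutcliffe_efficiency(obs, [2.5] * 4))  # mean-only prediction
```

Under this definition, a mean NSE of 0.80 indicates that, averaged over sites, the squared prediction error is about one fifth of the variance of the observations.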
id doaj-art-08667cb304fa49f5a35533cf9b752ad1
institution OA Journals
issn 2059-7037
Affiliations:
Yue Zheng, Xiaoran Zhang, Yongchao Zhou, Yiping Zhang, Tuqiao Zhang: The Institute of Municipal Engineering, Zhejiang University
Raziyeh Farmani: Centre for Water Systems, Faculty of Environment, Science and Economy, University of Exeter