A Novel Strategy for Automatic Selection of Cross‐Basin Data to Improve Local Machine Learning‐Based Runoff Models

Bibliographic Details
Main Authors: Congyi Nai, Xingcai Liu, Qiuhong Tang, Liu Liu, Siao Sun, Paul P. J. Gaffney
Format: Article
Language: English
Published: Wiley 2024-05-01
Series: Water Resources Research
Online Access: https://doi.org/10.1029/2023WR035051
Description
Summary: Previous studies have shown that regional deep learning (DL) models can improve runoff prediction by leveraging large hydrological datasets. However, training a DL regional model using all data without screening may degrade local performance. This study focuses on constructing enhanced local models through the utilization of cross‐basin data. To this end, we propose an approach that employs a novel training strategy to optimize DL model training for specific basins. The approach measures the impact of any one basin's gradient on the loss of the basin of interest, providing insights into the relationships between different basins. The approach was validated using 531 basins from the CAMELS dataset. Results suggest that local performance degradation is a common occurrence in regional models, and imbalanced data are likely to result in a specific pattern dominating the entire regional model. In comparison to a regional model simply trained with all basins, the median Nash‐Sutcliffe efficiency (NSE) for our models is 0.031 higher. In particular, the increase in NSE can exceed 0.2 for some dry basins. Our findings indicate that this novel DL strategy can significantly improve model performance in specific basins using large hydrological datasets, while mitigating local performance loss.
ISSN: 0043-1397, 1944-7973
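
Illustrative note: the summary refers to two quantities that a short sketch may make more concrete, the Nash‐Sutcliffe efficiency (NSE) and the effect of one basin's gradient on another basin's loss. The code below is not the authors' implementation. It assumes a toy linear model with a mean squared error loss, and the names `nse`, `mse_gradient`, and `first_order_influence` are hypothetical. The influence score is a standard first-order approximation: after one gradient step on a donor basin, the loss of the basin of interest changes by roughly minus the learning rate times the dot product of the two gradients.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def mse_gradient(weights, features, targets):
    """Gradient of the mean squared error of a linear model (toy stand-in for a DL model)."""
    residual = features @ weights - targets
    return 2.0 * features.T @ residual / len(targets)


def first_order_influence(weights, donor, target, learning_rate=0.01):
    """Approximate change in the target basin's loss after one gradient step
    on the donor basin. A negative value suggests the donor data help."""
    g_donor = mse_gradient(weights, *donor)
    g_target = mse_gradient(weights, *target)
    return -learning_rate * float(g_donor @ g_target)


if __name__ == "__main__":
    features = np.array([[1.0, 0.0], [0.0, 1.0]])
    runoff = np.array([1.0, 2.0])
    weights = np.zeros(2)

    basin_of_interest = (features, runoff)
    similar_basin = (features, runoff)       # same behaviour as the target
    opposing_basin = (features, -runoff)     # opposite behaviour (assumed)

    print(first_order_influence(weights, similar_basin, basin_of_interest))
    print(first_order_influence(weights, opposing_basin, basin_of_interest))
    print(nse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Under these assumptions, a donor basin whose gradient points the same way as the target's gets a negative score (its data would lower the target loss), while a donor with an opposing gradient gets a positive one. How the article itself turns such a measure into a data selection rule is not stated in this record.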