Prediction of Monthly Temperature Over China Based on a Machine Learning Method

Bibliographic Details
Main Authors: Ping Mei, Zixin Yin, Haoyu Wang, Changzheng Liu, Yaoming Liao, Qiang Zhang, Liping Yin
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Advances in Meteorology
Online Access: http://dx.doi.org/10.1155/adme/6917682
Description
Summary: Machine learning has achieved significant success in many statistical applications but has yet to be fully successful in monthly and seasonal prediction. We identified three statistical challenges in climate prediction: instability of statistical models, complexity of feature factors, and nonlinearity of the relationship between predictors and predictands. These characteristics limit both traditional empirical forecasting and machine learning methods. This paper proposes a novel method, called dynamically modeled machine learning, to predict monthly temperature anomalies over China. The core idea of dynamic modeling is that the machine learning model is trained on a sliding time window, so that the relationship between predictors and predictands is optimized for a specific, recent period rather than for the entire time span. One hundred thirty indices related to atmospheric and oceanic circulation and other climatic events from the Beijing Climate Center are used as the feature set. After feature engineering, including feature selection and dimensionality reduction, the resulting predictors are input into a regressor. Five machine learning algorithms are employed as regressors one at a time: linear regression (LR), ridge regression (RR), random forest (RF), support vector machine (SVM), and gradient boosting decision trees (GBDTs). The method produces reforecasts for 2012–2021, and the results are compared with the output of operational climate models from ECMWF, NCEP, and the Beijing Climate Center. The prediction performance of each machine learning regressor, the ensemble model, and the three operational dynamical models is assessed with three quantitative metrics: the predictive score (PS), the anomaly correlation coefficient (ACC), and the anomaly sign agreement rate.
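The sliding-window ("dynamic") modeling idea can be illustrated with a minimal sketch. It assumes samples are ordered in time, that the hypothetical parameter `window` is the number of most recent samples used for training (the record gives neither the window length nor how samples are grouped by calendar month), and that feature selection is a hypothetical ranking by absolute Pearson correlation. Dimensionality reduction is omitted, and a one-predictor ridge regression stands in for the five regressors named above; all function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(a)
    am, bm = sum(a) / n, sum(b) / n
    cov = sum((x - am) * (y - bm) for x, y in zip(a, b))
    var = sum((x - am) ** 2 for x in a) * sum((y - bm) ** 2 for y in b)
    return cov / math.sqrt(var) if var > 0 else 0.0


def select_top_k(X: List[List[float]], y: List[float], k: int) -> List[int]:
    """Hypothetical feature selection: keep the k columns with the largest |r| vs. y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def ridge_fit_predict_1d(x: List[float], y: List[float], x_new: float, lam: float) -> float:
    """Ridge regression with one predictor and an unpenalized intercept.

    Centering x and y keeps the intercept out of the penalty, so the slope is
    sum(xc * yc) / (sum(xc ** 2) + lam).
    """
    xm, ym = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return ym + slope * (x_new - xm)


def sliding_window_forecast(
    X: List[List[float]], y: List[float], window: int, lam: float = 1.0
) -> List[float]:
    """For each target step t, re-select the predictor and refit the model using
    only the `window` samples immediately before t, then predict step t."""
    preds = []
    for t in range(window, len(y)):
        X_tr, y_tr = X[t - window:t], y[t - window:t]
        j = select_top_k(X_tr, y_tr, k=1)[0]  # best predictor for this window only
        x_tr = [row[j] for row in X_tr]
        preds.append(ridge_fit_predict_1d(x_tr, y_tr, X[t][j], lam))
    return preds
```

Because selection and fitting both happen inside the loop, the chosen predictor and its coefficient can change from one target month to the next, which is the property the dynamic-modeling idea relies on. For example, `sliding_window_forecast([[1.0, 0.0], [0.0, 1.0], [1.0, 2.0], [0.0, 3.0], [1.0, 4.0]], [0.0, 2.0, 4.0, 6.0, 8.0], window=3, lam=0.0)` picks the second column in each window and predicts the last two targets.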
The results demonstrate that the method using GBDTs as the regressor achieves the best predictive performance among all the machine learning methods and operational models compared, with a monthly average PS of 84, an ACC of 0.27, and an anomaly sign agreement rate of 74%.
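Two of the three evaluation metrics can be sketched for a single month's station anomaly field. This assumes a centered (Pearson-type) spatial ACC and counts a zero anomaly on either side as a sign miss; the record defines neither metric, and the PS is not sketched because its scoring rules are not given here. The function names are hypothetical.

```python
import math
from typing import Sequence


def anomaly_correlation(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Centered ACC between predicted and observed anomalies over all stations."""
    n = len(pred)
    pm, om = sum(pred) / n, sum(obs) / n
    num = sum((p - pm) * (o - om) for p, o in zip(pred, obs))
    den = math.sqrt(sum((p - pm) ** 2 for p in pred) * sum((o - om) ** 2 for o in obs))
    return num / den


def sign_agreement_rate(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Fraction of stations whose predicted and observed anomalies share a sign.

    A zero anomaly on either side is counted as a miss (an assumption).
    """
    hits = sum(1 for p, o in zip(pred, obs) if p * o > 0)
    return hits / len(pred)
```

Under these definitions an ACC of 0.27 means the predicted and observed anomaly patterns are only weakly correlated across stations, while a 74% sign agreement rate means roughly three stations in four receive the correct warm or cold sign.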
ISSN: 1687-9317