Novel hybrid data-driven modeling based on feature space reconstruction and multihead self-attention gated recurrent unit: applied to PM2.5 concentrations prediction

Abstract In response to the problem of neglecting the periodic and global characteristics of sequence data when predicting PM2.5 concentrations via machine learning models, a PM2.5 concentrations prediction model based on feature space reconstruction and multihead self-attention gated recurrent unit...

Bibliographic Details
Main Authors: Xiaoxin Yue, Yulong Bai, Qinghe Yu, Lin Ding, Wei Song, Wenhui Liu, Huhu Ren, Qi Song
Format: Article
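Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Subjects: Machine learning; Feature space reconstruction; Multihead Self-attention; Gated recurrent unit; PM2.5 concentration prediction
Online Access: https://doi.org/10.1038/s41598-025-00911-9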
author Xiaoxin Yue
Yulong Bai
Qinghe Yu
Lin Ding
Wei Song
Wenhui Liu
Huhu Ren
Qi Song
collection DOAJ
description Abstract Machine learning models for predicting PM2.5 concentrations often neglect the periodic and global characteristics of sequence data. To address this problem, this study proposes a PM2.5 concentration prediction model based on feature space reconstruction and a multihead self-attention gated recurrent unit (FSR-MSAGRU). First, frequency spectrum analysis is applied to the raw sequence data to determine the period of the PM2.5 sequence. Next, the seasonal-trend decomposition procedure based on loess (STL) is used to capture the periodicity and trend information in the sequence. The feature space is then reconstructed from the raw PM2.5 sequence together with the decomposed seasonal, trend, and residual components. Finally, the reconstructed features are fed into a multihead self-attention gated recurrent unit (MSAGRU), which can capture global feature information, to predict PM2.5 concentrations. Across 6 distinct experimental datasets, the proposed FSR-MSAGRU model attained a PCC exceeding 0.98 and reduced the SMAPE error metric by at least 68% relative to the GRU model. Comparative experiments with 13 reference models indicate that the proposed model offers better prediction performance and stronger generalization ability.
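The first three steps of the abstract (period detection, decomposition, feature space reconstruction) and the two reported metrics can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. All function names are hypothetical, the decomposition is a classical moving-average additive decomposition swapped in for STL so that the example needs only NumPy, the SMAPE formula shown is one common definition that may differ from the paper's, and the MSAGRU network itself is not reproduced.

```python
import numpy as np


def dominant_period(x):
    """Estimate the dominant period (in samples) from the FFT amplitude spectrum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # remove the mean so the DC bin does not dominate
    amp = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)  # cycles per sample
    k = 1 + int(np.argmax(amp[1:]))         # skip the zero-frequency bin
    return int(round(1.0 / freqs[k]))


def decompose(x, period):
    """Classical additive decomposition (a stand-in for STL, not STL itself)."""
    x = np.asarray(x, dtype=float)
    # Trend: moving average over one period, edges padded by reflection.
    pad = period // 2
    padded = np.pad(x, pad, mode="reflect")
    kernel = np.ones(period) / period
    trend = np.convolve(padded, kernel, mode="valid")[: len(x)]
    # Seasonal: mean detrended value at each phase of the cycle, centered on zero.
    detrended = x - trend
    phase = np.arange(len(x)) % period
    profile = np.array([detrended[phase == p].mean() for p in range(period)])
    profile -= profile.mean()
    seasonal = profile[phase]
    # Residual: whatever the trend and seasonal parts do not explain.
    residual = x - trend - seasonal
    return trend, seasonal, residual


def reconstruct_features(x, period):
    """Stack raw, seasonal, trend, and residual series into an (n, 4) feature matrix."""
    x = np.asarray(x, dtype=float)
    trend, seasonal, residual = decompose(x, period)
    return np.column_stack([x, seasonal, trend, residual])


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error in percent (assumes no 0/0 pairs)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    return float(100.0 * np.mean(2.0 * np.abs(y_pred - y_true) / denom))


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic hourly-style series with a 24-sample cycle (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(480)
series = 50.0 + 0.01 * t + 10.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 1.0, t.size)

period = dominant_period(series)
features = reconstruct_features(series, period)
print(period, features.shape)
```

In a full pipeline, sliding windows over the rows of the feature matrix would serve as the input sequences for the recurrent model, and `smape` and `pcc` would be computed on held-out predictions.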
format Article
id doaj-art-0647c6eeea3944ea99755ff0671a111c
institution Kabale University
issn 2045-2322
language English
publishDate 2025-05-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-0647c6eeea3944ea99755ff0671a111c; 2025-08-20T03:53:57Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; vol. 15, no. 1, pp. 1-21; 10.1038/s41598-025-00911-9; Xiaoxin Yue, Yulong Bai, Qinghe Yu, Lin Ding, Wei Song, Wenhui Liu, Huhu Ren, Qi Song (all authors: College of Physics and Electrical Engineering, Northwest Normal University); https://doi.org/10.1038/s41598-025-00911-9
title Novel hybrid data-driven modeling based on feature space reconstruction and multihead self-attention gated recurrent unit: applied to PM2.5 concentrations prediction
topic Machine learning
Feature space reconstruction
Multihead Self-attention
Gated recurrent unit
PM2.5 concentration prediction
url https://doi.org/10.1038/s41598-025-00911-9