Performance Evaluation of PM<sub>2.5</sub> Forecasting Using SARIMAX and LSTM in the Korean Peninsula

Bibliographic Details
Main Authors: Chae-Yeon Lee, Ju-Yong Lee, Seung-Hee Han, Jin-Goo Kang, Jeong-Beom Lee, Dae-Ryun Choi
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Atmosphere
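Subjects: PM<sub>2.5</sub> forecasting; LSTM; SARIMAX; air quality prediction; deep learning; statistical modeling
Online Access: https://www.mdpi.com/2073-4433/16/5/524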
author Chae-Yeon Lee
Ju-Yong Lee
Seung-Hee Han
Jin-Goo Kang
Jeong-Beom Lee
Dae-Ryun Choi
collection DOAJ
description Air pollution, particularly fine particulate matter (PM<sub>2.5</sub>), poses significant environmental and public health challenges in South Korea. The National Institute of Environmental Research (NIER) currently relies on numerical models such as the Community Multiscale Air Quality (CMAQ) model for PM<sub>2.5</sub> forecasting. However, these models carry inherent uncertainties arising from limitations in emission inventories, meteorological inputs, and model frameworks. To address these challenges, this study evaluates and compares the forecasting performance of two alternative models: Long Short-Term Memory (LSTM), a deep learning model, and Seasonal Autoregressive Integrated Moving Average with Exogenous Variables (SARIMAX), a statistical model. The evaluation focused on Seoul, South Korea, and covered forecast lead times from D00 to D02. For short-term forecasts (D00), SARIMAX outperformed LSTM on all statistical metrics, particularly in detecting high PM<sub>2.5</sub> concentrations, with a 19.43% higher Probability of Detection (POD). However, SARIMAX performance declined sharply in extended forecasts (D01–D02). In contrast, LSTM maintained relatively stable accuracy over longer lead times and captured complex PM<sub>2.5</sub> concentration patterns effectively, particularly during high-concentration episodes. These findings highlight the respective strengths and limitations of statistical and deep learning models: SARIMAX excels in short-term forecasting with limited training data, whereas LSTM is better suited to longer lead times because it can learn complex temporal patterns from historical data. The results suggest that an integrated air quality forecasting system combining numerical, statistical, and machine learning approaches could improve PM<sub>2.5</sub> forecasting accuracy.
format Article
id doaj-art-3e0ca9664ffe47bfad09d1b42a68e331
institution OA Journals
issn 2073-4433
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Atmosphere
spelling doaj-art-3e0ca9664ffe47bfad09d1b42a68e331
2025-08-20T01:56:20Z
eng
MDPI AG
Atmosphere
ISSN 2073-4433
2025-04-01
Volume 16, Issue 5, Article 524
DOI 10.3390/atmos16050524
Performance Evaluation of PM<sub>2.5</sub> Forecasting Using SARIMAX and LSTM in the Korean Peninsula
Chae-Yeon Lee: Division of Ocean & Atmosphere Sciences, Korea Polar Research Institute, Incheon 21990, Republic of Korea
Ju-Yong Lee: Department of Environmental and Engineering, Graduate School, Anyang University, Anyang 14028, Republic of Korea
Seung-Hee Han: Department of Environmental and Engineering, Graduate School, Anyang University, Anyang 14028, Republic of Korea
Jin-Goo Kang: Department of Environmental and Energy Engineering, Anyang University, Anyang 14028, Republic of Korea
Jeong-Beom Lee: Department of Environmental and Engineering, Graduate School, Anyang University, Anyang 14028, Republic of Korea
Dae-Ryun Choi: Department of Environmental and Energy Engineering, Anyang University, Anyang 14028, Republic of Korea
title Performance Evaluation of PM<sub>2.5</sub> Forecasting Using SARIMAX and LSTM in the Korean Peninsula
topic PM<sub>2.5</sub> forecasting
LSTM
SARIMAX
air quality prediction
deep learning
statistical modeling
url https://www.mdpi.com/2073-4433/16/5/524
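
The description in this record reports that SARIMAX achieved a higher Probability of Detection (POD) for high PM<sub>2.5</sub> concentrations. As a rough illustration of how that metric is commonly computed, the sketch below uses the standard contingency-table definition, POD = hits / (hits + misses). The function name, the sample values, and the 35 µg/m³ threshold are assumptions made for this example; the record does not state which threshold or data the study used.

```python
def probability_of_detection(observed, forecast, threshold):
    """Return hits / (hits + misses) for events where observed > threshold.

    A "hit" is an observed exceedance that the forecast also exceeded;
    a "miss" is an observed exceedance that the forecast did not exceed.
    """
    hits = 0
    misses = 0
    for obs, fc in zip(observed, forecast):
        if obs > threshold:
            if fc > threshold:
                hits += 1
            else:
                misses += 1
    events = hits + misses
    if events == 0:
        # POD is undefined when no high-concentration event was observed.
        return float("nan")
    return hits / events


# Hypothetical daily mean PM2.5 values in µg/m³ (not data from the study).
observed = [12.0, 40.0, 55.0, 20.0, 38.0]
forecast = [15.0, 42.0, 30.0, 37.0, 36.0]

# Assumed threshold for a "high" day; three observed events, two detected.
pod = probability_of_detection(observed, forecast, threshold=35.0)
print(f"POD = {pod:.3f}")
```

Note that the day with a forecast of 37.0 against an observation of 20.0 is a false alarm; it does not affect POD, which is why POD is usually reported alongside a false alarm measure.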