Forecasting Ultrafine Dust Concentrations in Seoul: A Machine Learning Approach

This study applied various machine learning techniques, including shrinkage methods, XGBoost, CSR, and random forest, to forecast ultrafine particulate matter (PM2.5) concentrations in Seoul, South Korea. The analysis incorporated key variables known to significantly influence PM2.5 levels, includin...

Bibliographic Details
Main Authors: Sophia Park, Myeong Jun Kim
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Atmosphere
Subjects: ultrafine dust concentration; PM2.5; air quality; machine learning; forecasting; random forest
Online Access: https://www.mdpi.com/2073-4433/16/3/239
_version_ 1849342562218803200
author Sophia Park
Myeong Jun Kim
author_facet Sophia Park
Myeong Jun Kim
author_sort Sophia Park
collection DOAJ
description This study applied several machine learning techniques, including shrinkage methods, XGBoost, CSR, and random forest, to forecast ultrafine particulate matter (PM2.5) concentrations in Seoul, South Korea. The analysis incorporated key variables known to significantly influence PM2.5 levels, including meteorological data, coal-fired power generation, and PM2.5 concentrations in Dalian, China. Using daily data from 1 January 2018 to 30 June 2023, this study employed the Boruta algorithm, a variable selection technique based on the random forest model, to identify the most influential predictors of PM2.5 concentrations. Out-of-sample multi-period forecasts were evaluated for each model using the RMSE, MAE, and Giacomini–White test to determine the most effective forecasting approach. The random forest model with the Boruta algorithm outperformed all other models, achieving improvements of 4% to 17% in the RMSE and 4% to 16.5% in the MAE across all forecast horizons. The results indicate that the random forest model and its variant incorporating the Boruta algorithm provided superior short-term forecasting performance. In particular, the Boruta algorithm highlighted the lagged variables of temperature, PM2.5 concentration, mean humidity, and Dalian PM2.5 concentration as critical factors for the accurate prediction of PM2.5 levels in Seoul. These findings underscore the utility of data-driven approaches to improve air quality forecasting and management.
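The description compares models by RMSE and MAE and reports percentage improvements. The record does not give the formulas or say which benchmark the percentages are measured against, so the following is only a minimal sketch using the standard definitions of both metrics. The function names, the `improvement_pct` helper, and all numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error of two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def improvement_pct(benchmark_error, model_error):
    """Percentage reduction in error relative to a benchmark (assumed definition)."""
    return 100.0 * (benchmark_error - model_error) / benchmark_error


# Hypothetical daily PM2.5 values and forecasts, for illustration only.
actual = [20.0, 30.0, 25.0, 40.0]
benchmark_forecast = [24.0, 26.0, 29.0, 36.0]
candidate_forecast = [22.0, 28.0, 27.0, 38.0]

rmse_gain = improvement_pct(rmse(actual, benchmark_forecast),
                            rmse(actual, candidate_forecast))
mae_gain = improvement_pct(mae(actual, benchmark_forecast),
                           mae(actual, candidate_forecast))
print(f"RMSE improvement: {rmse_gain:.1f}%  MAE improvement: {mae_gain:.1f}%")
```

In a multi-horizon setting, the same comparison would be repeated once per forecast horizon, each time on out-of-sample forecasts only.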
format Article
id doaj-art-e85b570c75a244ccb857365c5ebfebf2
institution Kabale University
issn 2073-4433
language English
publishDate 2025-02-01
publisher MDPI AG
record_format Article
series Atmosphere
spelling doaj-art-e85b570c75a244ccb857365c5ebfebf2
2025-08-20T03:43:21Z
eng
MDPI AG
Atmosphere
2073-4433
2025-02-01
Volume 16, Issue 3, Article 239
10.3390/atmos16030239
Forecasting Ultrafine Dust Concentrations in Seoul: A Machine Learning Approach
Sophia Park (Seoul International School, Seongnam-si 13113, Republic of Korea)
Myeong Jun Kim (Division of International Studies, Kongju National University, Gongjudaehak-ro 56, Gongju-si 32588, Republic of Korea)
https://www.mdpi.com/2073-4433/16/3/239
ultrafine dust concentration
PM2.5
air quality
machine learning
forecasting
random forest
title Forecasting Ultrafine Dust Concentrations in Seoul: A Machine Learning Approach
title_sort forecasting ultrafine dust concentrations in seoul a machine learning approach
topic ultrafine dust concentration
PM2.5
air quality
machine learning
forecasting
random forest
url https://www.mdpi.com/2073-4433/16/3/239