Forecasting Ultrafine Dust Concentrations in Seoul: A Machine Learning Approach

Bibliographic Details
Main Authors:	Sophia Park, Myeong Jun Kim
Format:	Article
Language:	English
Published:	MDPI AG 2025-02-01
Series:	Atmosphere
Subjects:	ultrafine dust concentration; PM2.5; air quality; machine learning; forecasting; random forest
Online Access:	https://www.mdpi.com/2073-4433/16/3/239

collection	DOAJ
description	This study applied various machine learning techniques, including shrinkage methods, XGBoost, CSR, and random forest, to forecast ultrafine particulate matter (PM2.5) concentrations in Seoul, South Korea. The analysis incorporated key variables known to significantly influence PM2.5 levels, including meteorological data, coal-fired power generation, and PM2.5 concentrations in Dalian, China. Using daily data from 1 January 2018 to 30 June 2023, this study employed the Boruta algorithm, a variable selection technique based on the random forest model, to identify the most influential predictors of PM2.5 concentrations. Out-of-sample multi-period forecasts were evaluated for each model using the RMSE, MAE, and Giacomini–White test to determine the most effective forecasting approach. The random forest model with the Boruta algorithm outperformed all other models, achieving improvements of 4% to 17% in the RMSE and 4% to 16.5% in the MAE across all forecast horizons. The results indicate that the random forest model and its variant incorporating the Boruta algorithm provided superior short-term forecasting performance. In particular, the Boruta algorithm highlighted the lagged variables of temperature, PM2.5 concentration, mean humidity, and Dalian PM2.5 concentration as critical factors for the accurate prediction of PM2.5 levels in Seoul. These findings underscore the utility of data-driven approaches to improve air quality forecasting and management.
id	doaj-art-e85b570c75a244ccb857365c5ebfebf2
institution	Kabale University
issn	2073-4433
affiliations	Sophia Park: Seoul International School, Seongnam-si 13113, Republic of Korea; Myeong Jun Kim: Division of International Studies, Kongju National University, Gongjudaehak-ro 56, Gongju-si 32588, Republic of Korea
volume	16
issue	3
article	239
doi	10.3390/atmos16030239
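
The evaluation step in the description (RMSE, MAE and the Giacomini–White test on out-of-sample forecasts) can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the reported "improvement" is assumed to mean the percentage reduction in a loss relative to a benchmark model, and only the unconditional form of the Giacomini–White test is shown (a constant test function with squared-error loss, which reduces to a Diebold–Mariano-type statistic), using a Bartlett-weighted long-run variance with horizon - 1 lags for multi-step forecasts. The paper may instead use the conditional version of the test.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helper names; a sketch of the evaluation described in the
# abstract, not the authors' implementation.


def forecast_errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """Return actual - forecast for two aligned, equal-length series."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return [a - f for a, f in zip(actual, forecast)]


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    e = forecast_errors(actual, forecast)
    return math.sqrt(sum(x * x for x in e) / len(e))


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    e = forecast_errors(actual, forecast)
    return sum(abs(x) for x in e) / len(e)


def pct_improvement(benchmark_loss: float, model_loss: float) -> float:
    """Percentage reduction of model_loss relative to benchmark_loss (assumed definition)."""
    return 100.0 * (benchmark_loss - model_loss) / benchmark_loss


def gw_unconditional(
    actual: Sequence[float],
    forecast_1: Sequence[float],
    forecast_2: Sequence[float],
    horizon: int = 1,
) -> Tuple[float, float]:
    """Unconditional Giacomini-White test of equal squared-error loss.

    d_t = e1_t^2 - e2_t^2. The statistic n * mean(d)^2 / lrv is compared with
    a chi-square(1) distribution, where lrv is a Newey-West (Bartlett) estimate
    of the long-run variance of d_t using horizon - 1 lags. A positive mean(d)
    means forecast_2 has the smaller average loss.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    e1 = forecast_errors(actual, forecast_1)
    e2 = forecast_errors(actual, forecast_2)
    d = [a * a - b * b for a, b in zip(e1, e2)]
    n = len(d)
    d_bar = sum(d) / n
    centred = [x - d_bar for x in d]
    lrv = sum(x * x for x in centred) / n
    for lag in range(1, horizon):
        weight = 1.0 - lag / horizon  # Bartlett weight with horizon - 1 lags
        gamma = sum(centred[t] * centred[t - lag] for t in range(lag, n)) / n
        lrv += 2.0 * weight * gamma
    if lrv <= 0.0:
        raise ValueError("long-run variance is not positive; test undefined")
    stat = n * d_bar * d_bar / lrv
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    observed = [10.0, 12.0, 9.0, 15.0]
    benchmark = [11.0, 10.0, 10.0, 12.0]
    candidate = [10.5, 11.5, 9.5, 14.5]
    print(rmse(observed, benchmark), rmse(observed, candidate))
    print(pct_improvement(rmse(observed, benchmark), rmse(observed, candidate)))
    print(gw_unconditional(observed, benchmark, candidate, horizon=1))
```

With a constant test function the statistic is asymptotically chi-square with one degree of freedom, so its p-value can be taken from `math.erfc` without a statistics library; the conditional test would replace the constant with lagged instruments and use a chi-square distribution with as many degrees of freedom as instruments.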

Similar Items