Enhancing Pollen Prediction in Beijing, a Chinese Megacity: Leveraging Ensemble Learning Models for Greater Accuracy
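| Main Authors: | Wenxi Ruan, Ziming Li, Zhaobin Sun, Xingqin An, Yuxin Zhao, Shuwen Zhang, Yinglin Liang, Yaqin Bu, Jingyi Xin, Xiaoyi Hang |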
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2024-09-01 |
| Series: | Aerosol and Air Quality Research |
| Subjects: | Machine learning; Forecasting; Pollen concentrations; Lead time; Time series analysis |
| Online Access: | https://doi.org/10.4209/aaqr.240123 |
| author | Wenxi Ruan, Ziming Li, Zhaobin Sun, Xingqin An, Yuxin Zhao, Shuwen Zhang, Yinglin Liang, Yaqin Bu, Jingyi Xin, Xiaoyi Hang |
|---|---|
| collection | DOAJ |
| description | In North China, pollen is a leading allergen responsible for allergic rhinitis, and climate change is exacerbating sensitization to allergenic pollen, posing significant health risks to residents. Despite its importance, pollen forecasting technology remains insufficiently optimized. This study uses multi-year daily pollen concentration observations and real-time forecast data from the ECMWF (European Centre for Medium-Range Weather Forecasts), applying twelve machine learning models to learn perturbations separated from characteristic quantities. It forecasts pollen concentrations in Beijing and uses R² and RMSE as evaluation metrics. The CatBoost, Extra Trees, and XGBoost algorithms perform well for pollen predictions over three consecutive days; at a one-day lead time their R² values are 0.72, 0.73, and 0.73, respectively. In contrast, Neural Network, LightGBM, and K-nearest Neighbor perform less well, although all models except Neural NetTorch achieve R² values above 0.50. Notably, the prediction accuracy of Neural NetTorch improves markedly with lead time, its R² rising from 0.34 at one day to 0.67 at three days. The Weighted Ensemble model, which combines the other models through weighted optimization to mitigate excessive peaks, consistently yields stable results, with R² above 0.67. The study also assesses the importance of feature groups within the model: pollen emission intensity and phenological characteristics are crucial in both the training and testing phases, whereas meteorological factors mainly influence pollen dispersion. Given that meteorological conditions strongly and nonlinearly regulate pollen, a type of bioaerosol, machine learning shows substantial potential for simulating and predicting its concentrations. |
| format | Article |
| id | doaj-art-85ef74e1f23648cdabd272fd23b566ea |
| institution | Kabale University |
| issn | 1680-8584, 2071-1409 |
| language | English |
| publishDate | 2024-09-01 |
| publisher | Springer |
| record_format | Article |
| series | Aerosol and Air Quality Research |
| title | Enhancing Pollen Prediction in Beijing, a Chinese Megacity: Leveraging Ensemble Learning Models for Greater Accuracy |
| topic | Machine learning; Forecasting; Pollen concentrations; Lead time; Time series analysis |
| url | https://doi.org/10.4209/aaqr.240123 |
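The description evaluates models with R² and RMSE and reports that a Weighted Ensemble built from the other models gives stable scores. The Python sketch below illustrates both ideas under stated assumptions. The model names in the abstract (CatBoost, Extra Trees, XGBoost, LightGBM, Neural NetTorch, Weighted Ensemble) resemble those used by the AutoGluon-Tabular toolkit, but this record does not say which toolkit or weighting procedure the authors used. Here the weights come from greedy forward selection with replacement (Caruana-style ensemble selection), chosen only as one plausible form of "weighted optimization". The function names (`greedy_ensemble_weights`, `blend`), the model labels, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def greedy_ensemble_weights(
    preds: Dict[str, List[float]], y_true: Sequence[float], rounds: int = 25
) -> Dict[str, float]:
    """Hypothetical helper: Caruana-style ensemble selection (greedy, with replacement).

    Each round adds the base model whose inclusion gives the lowest RMSE for the
    running average; the best-scoring round's selection counts become the weights.
    """
    names = list(preds)
    counts = {name: 0 for name in names}
    running_sum = [0.0] * len(y_true)  # sum of predictions of the selected models
    best_weights: Dict[str, float] = {}
    best_overall = math.inf
    for k in range(1, rounds + 1):
        # Try adding each model (repeats allowed) and keep the best addition.
        best_name, best_score = names[0], math.inf
        for name in names:
            candidate = [(s + p) / k for s, p in zip(running_sum, preds[name])]
            score = rmse(y_true, candidate)
            if score < best_score:
                best_name, best_score = name, score
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, preds[best_name])]
        # Remember the best ensemble seen so far; later rounds can score worse.
        if best_score < best_overall:
            best_overall = best_score
            best_weights = {name: c / k for name, c in counts.items()}
    return best_weights


def blend(preds: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Hypothetical helper: weighted average of base-model predictions per time step."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n)]


# Hypothetical validation data (not from the paper): one model overshoots the
# pollen peak, the other undershoots it.
observed = [10.0, 40.0, 120.0, 60.0, 20.0]
val_preds = {
    "model_a": [12.0, 55.0, 160.0, 70.0, 18.0],
    "model_b": [8.0, 30.0, 95.0, 50.0, 22.0],
}
weights = greedy_ensemble_weights(val_preds, observed)
ensemble = blend(val_preds, weights)
print(weights, round(rmse(observed, ensemble), 2), round(r2(observed, ensemble), 3))
```

Averaging a model that overshoots peaks with one that undershoots them tends to damp extreme predictions, which is one way an ensemble can "mitigate excessive peaks". In a real workflow the weights would be fitted on a validation split and scored on separate test days, so that the ensemble's R² is not inflated by reusing the same data.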