Machine Learning Algorithm for Estimating Surface PM2.5 in Thailand

Bibliographic Details
Main Authors: Pawan Gupta, Shanshan Zhan, Vikalp Mishra, Aekkapol Aekakkararungroj, Amanda Markert, Sarawut Paibong, Farrukh Chishtie
Format: Article
Language: English
Published: Springer 2021-09-01
Series: Aerosol and Air Quality Research
Online Access: https://doi.org/10.4209/aaqr.210105
Description
Summary: We have used aerosol and meteorology reanalysis data from NASA’s Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) as input to a machine learning algorithm (MLA) to estimate surface PM2.5 concentration in Thailand. One year of hourly data from 51 ground monitoring stations in Thailand was spatiotemporally collocated with MERRA2 fields. The integrated data were then used to train and validate a supervised MLA, a random forest, to estimate hourly and daily PM2.5 concentrations. The MLA is cross-validated using a 10-fold random sampling approach. The trained MLA can estimate PM2.5 with close to zero mean bias across the country. A correlation coefficient of 0.95, with slope and intercept values of 0.95 and 0.88, is achieved between observed and estimated PM2.5. At the hourly scale, the MLA also shows underestimation under very clean conditions (PM2.5 < 10 µg m−3) and overestimation during high loading (PM2.5 > 80 µg m−3). The hourly estimates also demonstrate high skill in following the diurnal cycle during different seasons of the year. The daily mean (24-hour) PM2.5 values follow day-to-day variability very well (correlation coefficient of 0.98, RMSE = 3.14 µg m−3), with high values during the winter months (November–February) and lower values during other seasons. The trained MLA has the potential to reprocess the MERRA2 time series for the region, and the bias-corrected data can be used in other applications such as long-term trend analysis and health exposure studies. The MLA can also be applied to GEOS forecasted fields to generate bias-corrected air quality forecasts for the region.
ISSN: 1680-8584, 2071-1409
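
The summary reports its validation in terms of mean bias, RMSE, correlation coefficient, and regression slope and intercept, computed under a 10-fold random sampling scheme. The sketch below shows one way those statistics and a random 10-fold partition could be computed using only the Python standard library. It is an illustration under stated assumptions, not the authors' code: the function names `ten_fold_indices` and `evaluation_stats` are hypothetical, the sample values are made up, and the random forest itself is not implemented here.

```python
import math
import random


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Randomly partition sample indices into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list yields disjoint folds covering every index.
    return [indices[i::n_folds] for i in range(n_folds)]


def evaluation_stats(observed, estimated):
    """Return mean bias, RMSE, Pearson r, and the least-squares slope and
    intercept of estimated values regressed on observed values."""
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    syy = sum((e - mean_est) ** 2 for e in estimated)
    sxy = sum((o - mean_obs) * (e - mean_est)
              for o, e in zip(observed, estimated))
    if sxx == 0 or syy == 0:
        raise ValueError("observed and estimated values must both vary")
    slope = sxy / sxx
    squared_errors = sum((e - o) ** 2 for o, e in zip(observed, estimated))
    return {
        "mean_bias": mean_est - mean_obs,
        "rmse": math.sqrt(squared_errors / n),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_est - slope * mean_obs,
    }


# Made-up values in µg m-3 for illustration only; not data from the study.
observed = [8.0, 15.0, 22.0, 35.0, 50.0, 90.0]
estimated = [7.0, 15.0, 22.0, 36.0, 49.0, 93.0]
stats = evaluation_stats(observed, estimated)

# Each fold would serve once as the held-out validation set.
folds = ten_fold_indices(n_samples=100)
```

In a full workflow, a model would be trained on nine folds and evaluated on the held-out fold, repeated ten times, with `evaluation_stats` applied to the pooled held-out estimates.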