Multiple PM Low-Cost Sensors, Multiple Seasons’ Data, and Multiple Calibration Models
Abstract: In this study, we combined state-of-the-art data modelling techniques (machine learning [ML] methods) and data from state-of-the-art low-cost particulate matter (PM) sensors (LCSs) to improve the accuracy of LCS-measured PM2.5 (PM with aerodynamic diameter less than 2.5 microns) mass concen...
Main Authors: | S Srishti, Pratyush Agrawal, Padmavati Kulkarni, Hrishikesh Chandra Gautam, Meenakshi Kushwaha, V. Sreekanth |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2023-02-01 |
Series: | Aerosol and Air Quality Research |
Subjects: | Plantower; Beta attenuation monitor; Support vector regression |
Online Access: | https://doi.org/10.4209/aaqr.220428 |
_version_ | 1823862811389853696 |
---|---|
author | S Srishti; Pratyush Agrawal; Padmavati Kulkarni; Hrishikesh Chandra Gautam; Meenakshi Kushwaha; V. Sreekanth |
author_facet | S Srishti; Pratyush Agrawal; Padmavati Kulkarni; Hrishikesh Chandra Gautam; Meenakshi Kushwaha; V. Sreekanth |
author_sort | S Srishti |
collection | DOAJ |
description | Abstract: In this study, we combined state-of-the-art data modelling techniques (machine learning [ML] methods) and data from state-of-the-art low-cost particulate matter (PM) sensors (LCSs) to improve the accuracy of LCS-measured PM2.5 (PM with aerodynamic diameter less than 2.5 microns) mass concentrations. We collocated nine LCSs and a reference PM2.5 instrument for 9 months, covering all local seasons, in Bengaluru, India. Using the collocation data, we evaluated the performance of the LCSs and trained around 170 ML models to reduce the observed bias in the LCS-measured PM2.5. The ML models included (i) Decision Tree, (ii) Random Forest (RF), (iii) eXtreme Gradient Boosting, and (iv) Support Vector Regression (SVR). A hold-out validation was performed to assess the model performance. Model performance metrics included (i) coefficient of determination (R²), (ii) root mean square error (RMSE), (iii) normalised RMSE, and (iv) mean absolute error. We found that the bias in the LCS PM2.5 measurements varied across different LCS types (RMSE = 8–29 µg m−3) and that SVR models performed best in correcting the LCS PM2.5 measurements. Hyperparameter tuning improved the performance of the ML models (except for RF). The performance of ML models trained with significant predictors (fewer in number than the number of all predictors, chosen using a recursive feature elimination algorithm) was comparable to that of the ‘all predictors’ trained models (except for RF). The performance of most ML models was better than that of the linear models. Finally, as a research objective, we introduced the collocated black carbon mass concentration measurements into the ML models but found no significant improvement in the model performance. |
format | Article |
id | doaj-art-632c88894b0f40d88a6c636401a0627c |
institution | Kabale University |
issn | 1680-8584 2071-1409 |
language | English |
publishDate | 2023-02-01 |
publisher | Springer |
record_format | Article |
series | Aerosol and Air Quality Research |
spelling | doaj-art-632c88894b0f40d88a6c636401a0627c; 2025-02-09T12:22:13Z; eng; Springer; Aerosol and Air Quality Research; 1680-8584; 2071-1409; 2023-02-01; 23; 3; 1; 15; 10.4209/aaqr.220428; Multiple PM Low-Cost Sensors, Multiple Seasons’ Data, and Multiple Calibration Models; S Srishti (Center for Study of Science, Technology & Policy); Pratyush Agrawal (Center for Study of Science, Technology & Policy); Padmavati Kulkarni (Center for Study of Science, Technology & Policy); Hrishikesh Chandra Gautam (Center for Study of Science, Technology & Policy); Meenakshi Kushwaha (ILK Labs); V. Sreekanth (Center for Study of Science, Technology & Policy); https://doi.org/10.4209/aaqr.220428; Plantower; Beta attenuation monitor; Support vector regression |
title | Multiple PM Low-Cost Sensors, Multiple Seasons’ Data, and Multiple Calibration Models |
topic | Plantower; Beta attenuation monitor; Support vector regression |
url | https://doi.org/10.4209/aaqr.220428 |
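
The abstract in this record names four performance metrics (R², RMSE, normalised RMSE, and mean absolute error) and a hold-out validation. The following is a minimal sketch of how such an evaluation could look; it is not the authors' code. It uses only the Python standard library and a simple linear correction, which the abstract mentions only as a comparison baseline, not the Decision Tree, Random Forest, eXtreme Gradient Boosting, or SVR models trained in the study. The data are synthetic and purely illustrative, all function names are hypothetical, and normalising RMSE by the mean reference concentration is an assumption, since the record does not state which normalisation was used.

```python
import math
import random


def rmse(ref, est):
    """Root mean square error between reference and estimated values."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref))


def mae(ref, est):
    """Mean absolute error."""
    return sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)


def nrmse(ref, est):
    """RMSE divided by the mean reference value (assumed normalisation)."""
    return rmse(ref, est) / (sum(ref) / len(ref))


def r2(ref, est):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - e) ** 2 for r, e in zip(ref, est))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - ss_res / ss_tot


def fit_linear(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(ref, est):
    """Collect the four metrics named in the abstract."""
    return {"r2": r2(ref, est), "rmse": rmse(ref, est),
            "nrmse": nrmse(ref, est), "mae": mae(ref, est)}


def run_demo(n=500, test_fraction=0.25, seed=0):
    """Hold-out evaluation of a linear correction on synthetic data."""
    rng = random.Random(seed)
    # Synthetic "reference" PM2.5 and a biased, noisy "sensor" reading.
    # These numbers are invented for illustration, not taken from the study.
    reference = [rng.uniform(10, 80) for _ in range(n)]
    sensor = [1.6 * r + 5 + rng.gauss(0, 3) for r in reference]

    # Hold-out split: shuffle indices once, keep a fraction aside for testing.
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(n * test_fraction)
    test, train = idx[:n_test], idx[n_test:]

    # Fit the correction (sensor -> reference) on the training part only.
    slope, intercept = fit_linear([sensor[i] for i in train],
                                  [reference[i] for i in train])

    ref_test = [reference[i] for i in test]
    raw_test = [sensor[i] for i in test]
    corrected = [slope * s + intercept for s in raw_test]
    return {"raw": evaluate(ref_test, raw_test),
            "corrected": evaluate(ref_test, corrected)}


if __name__ == "__main__":
    for label, metrics in run_demo().items():
        print(label, {k: round(v, 3) for k, v in metrics.items()})
```

Comparing the `raw` and `corrected` entries on the held-out portion mirrors the before-and-after comparison the abstract describes. Swapping `fit_linear` for a non-linear regressor would leave the split and the metric functions unchanged.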