Multiple PM Low-Cost Sensors, Multiple Seasons’ Data, and Multiple Calibration Models

Bibliographic Details
Main Authors: S Srishti, Pratyush Agrawal, Padmavati Kulkarni, Hrishikesh Chandra Gautam, Meenakshi Kushwaha, V. Sreekanth
Format: Article
Language: English
Published: Springer 2023-02-01
Series: Aerosol and Air Quality Research
Subjects: Plantower; Beta attenuation monitor; Support vector regression
Online Access: https://doi.org/10.4209/aaqr.220428
collection DOAJ
description Abstract In this study, we combined state-of-the-art data modelling techniques (machine learning [ML] methods) and data from state-of-the-art low-cost particulate matter (PM) sensors (LCSs) to improve the accuracy of LCS-measured PM2.5 (PM with aerodynamic diameter less than 2.5 microns) mass concentrations. We collocated nine LCSs and a reference PM2.5 instrument for 9 months, covering all local seasons, in Bengaluru, India. Using the collocation data, we evaluated the performance of the LCSs and trained around 170 ML models to reduce the observed bias in the LCS-measured PM2.5. The ML models included (i) Decision Tree, (ii) Random Forest (RF), (iii) eXtreme Gradient Boosting, and (iv) Support Vector Regression (SVR). A hold-out validation was performed to assess the model performance. Model performance metrics included (i) coefficient of determination (R²), (ii) root mean square error (RMSE), (iii) normalised RMSE, and (iv) mean absolute error. We found that the bias in the LCS PM2.5 measurements varied across different LCS types (RMSE = 8–29 µg m⁻³) and that SVR models performed best in correcting the LCS PM2.5 measurements. Hyperparameter tuning improved the performance of the ML models (except for RF). The performance of ML models trained with significant predictors (fewer in number than the full set of predictors, chosen using a recursive feature elimination algorithm) was comparable to that of the ‘all predictors’ trained models (except for RF). The performance of most ML models was better than that of the linear models. Finally, as a research objective, we introduced the collocated black carbon mass concentration measurements into the ML models but found no significant improvement in the model performance.
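The hold-out validation and the four performance metrics named in the abstract can be sketched roughly as below. This is an illustrative sketch only, not the authors' code: the data are synthetic, a simple linear correction stands in for the ML models, and the function names, the 80/20 split, the use of 1 − SS_res/SS_tot for R², and the normalisation of RMSE by the mean reference concentration are all assumptions that the record does not confirm.

```python
import math
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, holdout)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def metrics(reference, predicted):
    """Return R2, RMSE, normalised RMSE, and MAE for paired values."""
    n = len(reference)
    mean_ref = sum(reference) / n
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "nrmse": rmse / mean_ref,  # assumption: normalised by the reference mean
        "mae": sum(abs(r - p) for r, p in zip(reference, predicted)) / n,
    }


# Synthetic collocation data (hypothetical): the sensor over-reads the reference.
rng = random.Random(42)
rows = []
for _ in range(500):
    ref = rng.uniform(5.0, 120.0)                # reference PM2.5, ug/m3
    lcs = 1.6 * ref + 4.0 + rng.gauss(0.0, 3.0)  # biased, noisy sensor reading
    rows.append((lcs, ref))

train, holdout = holdout_split(rows)

# Fit the correction on the training portion only.
slope, intercept = fit_linear([l for l, _ in train], [r for _, r in train])

# Score raw and corrected sensor values on the untouched hold-out portion.
holdout_lcs = [l for l, _ in holdout]
holdout_ref = [r for _, r in holdout]
raw_scores = metrics(holdout_ref, holdout_lcs)
corrected_scores = metrics(holdout_ref, [slope * l + intercept for l in holdout_lcs])

print("raw:      ", raw_scores)
print("corrected:", corrected_scores)
```

Scoring only on the hold-out rows keeps the reported error independent of the data used to fit the correction; a tree-based or support vector regressor could be substituted for `fit_linear` without changing the evaluation code.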
id doaj-art-632c88894b0f40d88a6c636401a0627c
institution Kabale University
issn 1680-8584
2071-1409
spelling doaj-art-632c88894b0f40d88a6c636401a0627c
2025-02-09T12:22:13Z
233115
S Srishti (Center for Study of Science, Technology & Policy)
Pratyush Agrawal (Center for Study of Science, Technology & Policy)
Padmavati Kulkarni (Center for Study of Science, Technology & Policy)
Hrishikesh Chandra Gautam (Center for Study of Science, Technology & Policy)
Meenakshi Kushwaha (ILK Labs)
V. Sreekanth (Center for Study of Science, Technology & Policy)
topic Plantower
Beta attenuation monitor
Support vector regression