Improving the quantification of peak concentrations for air quality sensors via data weighting

Bibliographic Details
Main Authors: C. Frischmon, J. Silberstein, A. Guth, E. Mattson, J. Porter, M. Hannigan
Format: Article
Language: English
Published: Copernicus Publications 2025-07-01
Series: Atmospheric Measurement Techniques
Online Access: https://amt.copernicus.org/articles/18/3147/2025/amt-18-3147-2025.pdf
collection DOAJ
description Traditional calibration models for low-cost air quality sensors have demonstrated a tendency to underpredict peak concentrations. We assessed the utility of adding data weights to low-cost sensor colocation data to improve the quantification of peak concentrations when the majority of colocation data is at a baseline concentration and varies due to intermittent, transient events. Specifically, we explore the effects of data weighting on three different pollutant colocation datasets: total volatile organic compounds (VOCs), carbon monoxide (CO), and methane (CH₄). Leveraging two different weighting functions, a sigmoidal and a piecewise weighting regime, we explored the impacts of the base model choice (multilinear regression, MLR, vs. random forest, RF, models), the sensitivity of weighting functions, and the ability of data weighting to improve high-concentration pollution measurements. When compared to unweighted colocation data, we demonstrate significant reductions in both error (root mean square error, RMSE) and bias (mean bias error, MBE) for pollutant peaks across all three datasets when data weighting is employed. For the top percentile of data, we observe an average of 23 % reduction in RMSE and a 35 % reduction in MBE when optimal weights are employed. More significant reductions occurred in the 95th–99th percentile of data, where MBE was reduced by an average of 70 %. RMSE in the 95th–99th percentile was reduced by an average of 26 %. However, data weighting can also generate larger errors at baseline pollutant concentrations. Data weighting regimes were sensitive to input parameters, and input weighting functions may be tuned to better predict peak concentration data without significant reductions in the fidelity of baseline pollutant predictions.
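The weighting idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weight range (1 to 10), the midpoint and threshold values, and the synthetic colocation data are all assumptions made for illustration, and a NumPy weighted least-squares fit stands in for the MLR base model (the RF base model is not shown). The sketch defines a sigmoidal and a piecewise weighting function, fits a weighted linear calibration, and reports RMSE and MBE on the top percentile of reference concentrations.

```python
import numpy as np


def sigmoid_weights(y, midpoint, steepness, w_min=1.0, w_max=10.0):
    """Weights rising smoothly from w_min (baseline) to w_max (peaks)."""
    return w_min + (w_max - w_min) / (1.0 + np.exp(-steepness * (y - midpoint)))


def piecewise_weights(y, threshold, w_low=1.0, w_high=10.0):
    """Step weights: w_high at or above the threshold, w_low below it."""
    return np.where(y >= threshold, w_high, w_low)


def fit_weighted_mlr(X, y, w):
    """Weighted least squares: minimise sum(w * (y - A @ beta) ** 2)."""
    A = np.column_stack([np.ones(len(y)), X])  # intercept + predictors
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mbe(y_true, y_pred):
    """Mean bias error (negative values indicate underprediction)."""
    return float(np.mean(y_pred - y_true))


if __name__ == "__main__":
    # Hypothetical colocation data: a noisy baseline with rare transient peaks.
    rng = np.random.default_rng(0)
    n = 2000
    y = 1.0 + 0.1 * rng.standard_normal(n)
    peaks = rng.random(n) < 0.05
    y[peaks] += rng.uniform(2.0, 20.0, peaks.sum())
    # Assumed sensor response: mildly saturating at high concentrations.
    signal = y / (1.0 + 0.03 * y) + 0.05 * rng.standard_normal(n)
    X = signal[:, None]

    top = y >= np.percentile(y, 99)  # top percentile of reference data
    schemes = {
        "unweighted": np.ones(n),
        "sigmoidal": sigmoid_weights(y, midpoint=5.0, steepness=1.0),
        "piecewise": piecewise_weights(y, threshold=5.0),
    }
    for name, w in schemes.items():
        pred = predict(fit_weighted_mlr(X, y, w), X)
        print(name, "top-1% RMSE:", rmse(y[top], pred[top]),
              "MBE:", mbe(y[top], pred[top]),
              "baseline RMSE:", rmse(y[~peaks], pred[~peaks]))
```

Printing the baseline RMSE alongside the peak metrics reflects the trade-off the description notes: weights that favor peaks can increase error at baseline concentrations, so the weighting parameters would need tuning on real colocation data.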
id doaj-art-8bede2915bbb497c9be44d7e4fc15003
institution Kabale University
issn 1867-1381
1867-8548
spelling Record doaj-art-8bede2915bbb497c9be44d7e4fc15003, timestamp 2025-08-20T03:25:21Z
Volume 18, pages 3147–3159
DOI: 10.5194/amt-18-3147-2025
Author affiliations:
C. Frischmon, J. Silberstein, A. Guth, M. Hannigan: Department of Mechanical Engineering, University of Colorado Boulder, 1111 Engineering Drive, Boulder, CO 80309, USA
E. Mattson: Colorado Department of Public Health and Environment, 4300 Cherry Creek Drive South, Glendale, CO 80246, USA
J. Porter: South Coast Air Quality Monitoring District, 21865 Copley Drive Diamond Bar, CA 91765, USA