A framework for scalable ambient air pollution concentration estimation
Ambient air pollution remains a global challenge, with adverse impacts on health and the environment. Addressing air pollution requires reliable data on pollutant concentrations, which form the foundation for interventions aimed at improving air quality. However, in many regions, including the Unite...
| Main Authors: | Liam J. Berrisford, Lucy S. Neal, Helen J. Buttery, Benjamin R. Evans, Ronaldo Menezes |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Cambridge University Press, 2025-01-01 |
| Series: | Environmental Data Science |
| Subjects: | air quality; data science; machine learning; sustainable development; urban resilience and justice |
| Online Access: | https://www.cambridge.org/core/product/identifier/S2634460225000093/type/journal_article |
| _version_ | 1850023899759116288 |
|---|---|
| author | Liam J. Berrisford; Lucy S. Neal; Helen J. Buttery; Benjamin R. Evans; Ronaldo Menezes |
| author_facet | Liam J. Berrisford; Lucy S. Neal; Helen J. Buttery; Benjamin R. Evans; Ronaldo Menezes |
| author_sort | Liam J. Berrisford |
| collection | DOAJ |
| description | Ambient air pollution remains a global challenge, with adverse impacts on health and the environment. Addressing air pollution requires reliable data on pollutant concentrations, which form the foundation for interventions aimed at improving air quality. However, in many regions, including the United Kingdom, air pollution monitoring networks are characterized by spatial sparsity, heterogeneous placement, and frequent temporal data gaps, often due to issues such as power outages. We introduce a scalable data-driven supervised machine learning model framework designed to address temporal and spatial data gaps by filling missing measurements within the United Kingdom. The machine learning framework used is LightGBM, a gradient boosting algorithm based on decision trees, for efficient and scalable modeling. This approach provides a comprehensive dataset for England throughout 2018 at a 1 km² hourly resolution. Leveraging machine learning techniques and real-world data from the sparsely distributed monitoring stations, we generate 355,827 synthetic monitoring stations across the study area. Validation was conducted to assess the model’s performance in forecasting, estimating missing locations, and capturing peak concentrations. The resulting dataset is of particular interest to a diverse range of stakeholders engaged in downstream assessments supported by outdoor air pollution concentration data for nitrogen dioxide (NO2), ozone (O3), particulate matter with a diameter of 10 μm or less (PM10), particulate matter with a diameter of 2.5 μm or less (PM2.5), and sulphur dioxide (SO2), at a higher resolution than was previously possible. |
| format | Article |
| id | doaj-art-6ed19bf4ab754b929be353beb2250f47 |
| institution | DOAJ |
| issn | 2634-4602 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Cambridge University Press |
| record_format | Article |
| series | Environmental Data Science |
| spelling | doaj-art-6ed19bf4ab754b929be353beb2250f47; 2025-08-20T03:01:15Z; eng; Cambridge University Press; Environmental Data Science; 2634-4602; 2025-01-01; 4; 10.1017/eds.2025.9; A framework for scalable ambient air pollution concentration estimation; Liam J. Berrisford [0] https://orcid.org/0000-0001-6578-3497; Lucy S. Neal [1]; Helen J. Buttery [2] https://orcid.org/0009-0009-9726-5315; Benjamin R. Evans [3] https://orcid.org/0000-0003-4696-596X; Ronaldo Menezes [4] https://orcid.org/0000-0002-6479-6429; [0] BioComplex Laboratory, Department of Computer Science, University of Exeter, Exeter, UK; Department of Mathematics, University of Exeter, Exeter, UK; UKRI Centre for Doctoral Training in Environmental Intelligence, University of Exeter, Exeter, UK; [1] Met Office, Exeter, UK; [2] Met Office, Exeter, UK; [3] Met Office, Exeter, UK; [4] BioComplex Laboratory, Department of Computer Science, University of Exeter, Exeter, UK; Department of Computer Science, Federal University of Ceará, Fortaleza, Brazil; https://www.cambridge.org/core/product/identifier/S2634460225000093/type/journal_article; air quality; data science; machine learning; sustainable development; urban resilience and justice |
| title | A framework for scalable ambient air pollution concentration estimation |
| title_sort | framework for scalable ambient air pollution concentration estimation |
| topic | air quality; data science; machine learning; sustainable development; urban resilience and justice |
| url | https://www.cambridge.org/core/product/identifier/S2634460225000093/type/journal_article |
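
The description in this record says the framework fills temporal gaps in monitoring data with LightGBM, a gradient boosting algorithm based on decision trees. The idea can be illustrated with a minimal sketch, which is not the authors' pipeline and does not use LightGBM itself: it hand-rolls gradient boosting over one-split regression trees (stumps) using only the Python standard library, trains on the observed hours of a synthetic series, and predicts the missing hours. The station data, the `true_no2` profile, the single `hour` feature, and all function names are assumptions made for illustration.

```python
"""Toy gap filling with gradient-boosted regression stumps (standard library only)."""


def fit_stump(xs, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for j in range(len(xs[0])):
        values = sorted({x[j] for x in xs})
        # Candidate thresholds sit halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for x, r in zip(xs, residuals) if x[j] <= thr]
            right = [r for x, r in zip(xs, residuals) if x[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict_stump(stump, x):
    j, thr, lmean, rmean = stump
    return lmean if x[j] <= thr else rmean


def fit_boosted(xs, ys, n_rounds=300, learning_rate=0.3):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def true_no2(hour):
    """Hypothetical daily profile (arbitrary units) with two rush-hour peaks."""
    return 20.0 + 15.0 * (7 <= hour <= 9) + 10.0 * (16 <= hour <= 18)


# Three synthetic days of hourly readings from one hypothetical station.
hours = list(range(24)) * 3
observed = [true_no2(h) for h in hours]
for gap in (8, 30, 41, 65):  # simulated outages
    observed[gap] = None

# Train on the observed hours only, then fill the gaps with model predictions.
train_x = [[h] for h, y in zip(hours, observed) if y is not None]
train_y = [y for y in observed if y is not None]
model = fit_boosted(train_x, train_y)
filled = [y if y is not None else predict(model, [h])
          for h, y in zip(hours, observed)]

for gap in (8, 30, 41, 65):
    print(f"hour index {gap}: filled value {filled[gap]:.2f}")
```

A production setup would more plausibly use LightGBM's own regressor with many more features (for example meteorology or land use) and a held-out validation scheme; the sketch only shows the train-on-observed, predict-the-gaps pattern.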