Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach

Bibliographic Details
Main Authors: Ali Suliman AlSalehy, Mike Bailey
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Smart Cities
Online Access:https://www.mdpi.com/2624-6511/8/3/90
Collection: DOAJ
Description: Effectively managing carbon monoxide (CO) pollution in complex industrial cities such as Jubail remains challenging due to the diversity of emission sources and local environmental dynamics. This study analyzes spatiotemporal CO patterns and builds accurate predictive models using five years (2018–2022) of data from ten monitoring stations, combined with meteorological variables. Exploratory analysis revealed distinct diurnal and moderate weekly CO cycles, with prevailing northwesterly winds shaping dispersion. Spatial correlation of CO across stations was low (average 0.14), suggesting strong local sources, whereas temperature (0.92) and wind (0.5–0.6) showed higher spatial coherence. Seasonal-Trend decomposition using LOESS (STL) confirmed stronger seasonality in meteorological factors than in CO levels. Low wind speeds were associated with elevated CO concentrations. Key predictive features, such as the 3-hour rolling mean and median of CO, dominated feature importance. Spatiotemporal analysis highlighted persistent hotspots in industrial areas and unexpectedly high levels in some residential zones. Of the range of models tested, the ensemble methods Extreme Gradient Boosting (XGBoost) and Categorical Boosting (CatBoost) performed best (R² > 0.95), with XGBoost producing the lowest Root Mean Squared Error (RMSE) of 0.0371 ppm. This work enhances understanding of CO dynamics in complex urban–industrial areas, provides accurate predictive models, and highlights the importance of local sources and temporal patterns for improving air quality forecasts.
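The description above reports three quantities that are easier to interpret with a concrete definition: the average inter-station correlation of CO (0.14), the 3-hour rolling mean and median features, and the RMSE and R² used to rank models. The following is a minimal, dependency-free Python sketch under stated assumptions: hourly, gap-free series; a trailing window that includes the current hour; Pearson correlation averaged over all station pairs; and the standard RMSE and R² definitions. The function names are hypothetical, this is not the authors' code, and this record does not describe how their XGBoost and CatBoost pipeline computed these values.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, median


def pearson(x, y):
    """Pearson correlation of two equal-length series (undefined for constant series)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_correlation(stations):
    """Assumption: 'average spatial correlation' = mean Pearson r over all station pairs.

    `stations` maps a (hypothetical) station name to its hourly series.
    """
    pairs = combinations(stations.values(), 2)
    return fmean(pearson(x, y) for x, y in pairs)


def rolling_features(series, window=3):
    """Trailing rolling mean and median over the last `window` hourly values.

    Each window ends at the current hour, so only past and present data are used.
    Returns (None, None) until a full window is available.
    """
    feats = []
    for i in range(len(series)):
        if i + 1 < window:
            feats.append((None, None))
        else:
            w = series[i + 1 - window : i + 1]
            feats.append((fmean(w), median(w)))
    return feats


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. ppm)."""
    return sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, with illustrative values that are not data from the study, rolling_features([0.30, 0.42, 0.36, 0.50, 0.44]) returns no features for the first two hours and a mean and median of about 0.36 for the third. Keeping the window trailing restricts each feature to observations available at prediction time.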
ISSN: 2624-6511
Journal: Smart Cities, Volume 8, Issue 3, Article 90
DOI: 10.3390/smartcities8030090
Affiliation (both authors): Department of Electrical Engineering and Computer Science, College of Engineering, Oregon State University, Corvallis, OR 97331, USA
Keywords: air quality monitoring; forecasting models; gas and weather data; machine learning for environmental data; multi-location data; predictive analytics