Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach
Effectively managing carbon monoxide (CO) pollution in complex industrial cities like Jubail remains challenging due to the diversity of emission sources and local environmental dynamics. This study analyzes spatiotemporal CO patterns and builds accurate predictive models using five years (2018–2022...
| Main Authors: | Ali Suliman AlSalehy, Mike Bailey |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Smart Cities |
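| Subjects: | air quality monitoring; forecasting models; gas and weather data; machine learning for environmental data; multi-location data; predictive analytics |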
| Online Access: | https://www.mdpi.com/2624-6511/8/3/90 |
| _version_ | 1850164787951960064 |
|---|---|
| author | Ali Suliman AlSalehy Mike Bailey |
| author_facet | Ali Suliman AlSalehy Mike Bailey |
| author_sort | Ali Suliman AlSalehy |
| collection | DOAJ |
| description | Effectively managing carbon monoxide (CO) pollution in complex industrial cities like Jubail remains challenging due to the diversity of emission sources and local environmental dynamics. This study analyzes spatiotemporal CO patterns and builds accurate predictive models using five years (2018–2022) of data from ten monitoring stations, combined with meteorological variables. Exploratory analysis revealed distinct diurnal and moderate weekly CO cycles, with prevailing northwesterly winds shaping dispersion. Spatial correlation of CO was low (average 0.14), suggesting strong local sources, unlike temperature (0.92) and wind (0.5–0.6), which showed higher spatial coherence. Seasonal Trend decomposition (STL) confirmed stronger seasonality in meteorological factors than in CO levels. Low wind speeds were associated with elevated CO concentrations. Key predictive features, such as 3-h rolling mean and median values of CO, dominated feature importance. Spatiotemporal analysis highlighted persistent hotspots in industrial areas and unexpectedly high levels in some residential zones. A range of models was tested, with ensemble methods (Extreme Gradient Boosting (XGBoost) and Categorical Boosting (CatBoost)) achieving the best performance (R² > 0.95) and XGBoost producing the lowest Root Mean Squared Error (RMSE) of 0.0371 ppm. This work enhances understanding of CO dynamics in complex urban–industrial areas, providing accurate predictive models (R² > 0.95) and highlighting the importance of local sources and temporal patterns for improving air quality forecasts. |
| format | Article |
| id | doaj-art-7248cbd4f63f4c6fb95a13b8e2f5812a |
| institution | OA Journals |
| issn | 2624-6511 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Smart Cities |
| spelling | doaj-art-7248cbd4f63f4c6fb95a13b8e2f5812a; 2025-08-20T02:21:53Z; eng; MDPI AG; Smart Cities; 2624-6511; 2025-05-01; 8; 3; 90; 10.3390/smartcities8030090; Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach; Ali Suliman AlSalehy (0); Mike Bailey (1); (0) Department of Electrical Engineering and Computer Science, College of Engineering, Oregon State University, Corvallis, OR 97331, USA; (1) Department of Electrical Engineering and Computer Science, College of Engineering, Oregon State University, Corvallis, OR 97331, USA; Effectively managing carbon monoxide (CO) pollution in complex industrial cities like Jubail remains challenging due to the diversity of emission sources and local environmental dynamics. This study analyzes spatiotemporal CO patterns and builds accurate predictive models using five years (2018–2022) of data from ten monitoring stations, combined with meteorological variables. Exploratory analysis revealed distinct diurnal and moderate weekly CO cycles, with prevailing northwesterly winds shaping dispersion. Spatial correlation of CO was low (average 0.14), suggesting strong local sources, unlike temperature (0.92) and wind (0.5–0.6), which showed higher spatial coherence. Seasonal Trend decomposition (STL) confirmed stronger seasonality in meteorological factors than in CO levels. Low wind speeds were associated with elevated CO concentrations. Key predictive features, such as 3-h rolling mean and median values of CO, dominated feature importance. Spatiotemporal analysis highlighted persistent hotspots in industrial areas and unexpectedly high levels in some residential zones. A range of models was tested, with ensemble methods (Extreme Gradient Boosting (XGBoost) and Categorical Boosting (CatBoost)) achieving the best performance (R² > 0.95) and XGBoost producing the lowest Root Mean Squared Error (RMSE) of 0.0371 ppm. This work enhances understanding of CO dynamics in complex urban–industrial areas, providing accurate predictive models (R² > 0.95) and highlighting the importance of local sources and temporal patterns for improving air quality forecasts.; https://www.mdpi.com/2624-6511/8/3/90; air quality monitoring; forecasting models; gas and weather data; machine learning for environmental data; multi-location data; predictive analytics |
| spellingShingle | Ali Suliman AlSalehy Mike Bailey Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach Smart Cities air quality monitoring forecasting models gas and weather data machine learning for environmental data multi-location data predictive analytics |
| title | Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach |
| title_full | Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach |
| title_fullStr | Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach |
| title_full_unstemmed | Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach |
| title_short | Environmental Data Analytics for Smart Cities: A Machine Learning and Statistical Approach |
| title_sort | environmental data analytics for smart cities a machine learning and statistical approach |
| topic | air quality monitoring forecasting models gas and weather data machine learning for environmental data multi-location data predictive analytics |
| url | https://www.mdpi.com/2624-6511/8/3/90 |
| work_keys_str_mv | AT alisulimanalsalehy environmentaldataanalyticsforsmartcitiesamachinelearningandstatisticalapproach AT mikebailey environmentaldataanalyticsforsmartcitiesamachinelearningandstatisticalapproach |
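
The description in this record names 3-h rolling mean and median values of CO as the dominant predictive features and reports model quality as RMSE and R². The following is a minimal sketch of how such quantities are commonly computed, not the authors' code. The function names (`rolling_features`, `rmse`, `r_squared`) and the toy readings are hypothetical, and the choice of a trailing window that includes the current hour is an assumption, since the record does not say how the window was aligned.

```python
import math
from statistics import mean, median


def rolling_features(values, window=3):
    """Trailing rolling mean and median over `window` consecutive readings.

    Assumption: the window ends at (and includes) the current reading.
    Positions with fewer than `window` readings available yield None.
    """
    means, medians = [], []
    for i in range(len(values)):
        if i + 1 < window:
            means.append(None)
            medians.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            means.append(mean(chunk))
            medians.append(median(chunk))
    return means, medians


def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the same units as the target (e.g. ppm)."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical hourly CO readings in ppm, for illustration only.
    co_ppm = [0.2, 0.4, 0.6, 0.5, 0.3]
    roll_mean, roll_median = rolling_features(co_ppm, window=3)
    print(roll_mean)
    print(roll_median)
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

When such features feed a forecast of the next hour, the window should end before the hour being predicted so that the target value does not leak into its own features.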