Scalable AI-driven air quality forecasting and classification for public health applications

Bibliographic Details
Main Authors: Mohammad Wasil Jalali, Bahir Saidi, Habibullah Farahmand, Mohammad Aref Rezvan Panah, Eda Nur Saruhan
Format: Article
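Language: English
Published: Springer 2025-08-01
Series: Discover Atmosphere
Subjects: Air quality prediction; Machine learning; Deep learning; Explainable AI; Time-series forecasting; Environmental monitoring
Online Access: https://doi.org/10.1007/s44292-025-00052-8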
author Mohammad Wasil Jalali
Bahir Saidi
Habibullah Farahmand
Mohammad Aref Rezvan Panah
Eda Nur Saruhan
author_sort Mohammad Wasil Jalali
collection DOAJ
description Abstract
Background: Air pollution remains one of the most pressing public health and environmental issues, particularly in developing countries like Afghanistan, where reliable air quality monitoring infrastructure is lacking. Traditional systems often rely on static data and limited sensors, which restricts their ability to generate timely, localized, and actionable insights. There is a growing need for dynamic, data-driven solutions that help policymakers and communities respond more effectively to pollution events.
Objective: This study aims to build a real-time, scalable air quality prediction and classification system that improves forecasting accuracy and supports public health interventions and environmental governance. The research seeks to bridge the gap between AI advancements and their application in under-resourced regions by developing interpretable, deployable tools for real-world use.
Methods: We designed a hybrid AI framework that combines ensemble machine learning models such as Random Forest and XGBoost with deep learning architectures including Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNNs), and the Transformer-based Time Series Mixer (TSMixer). The models were trained on historical air pollution data from Afghanistan’s National Environmental Protection Agency (NEPA) alongside real-time meteorological data from the OpenWeather API. To improve prediction accuracy across regions, we used geospatial clustering techniques to group cities with similar pollution patterns. Additionally, SHAP and LIME were employed to make model predictions transparent and explainable. A Django-based API and user-friendly dashboards were developed for real-time deployment and visualization.
Results: The TSMixer model stood out in regression tasks, achieving an R² score of 0.9861 and a mean squared error (MSE) of 0.0278. In classification tasks, the Random Forest model performed best with an accuracy of 99.96%, slightly outperforming XGBoost at 99.48%. We also assessed the computational efficiency of the models: ensemble ML models like Random Forest had much lower inference times (around 0.0289 s), making them well suited to real-time use, while DL models like CNN and LSTM required more computational resources (up to 1.25 s per inference). SHAP and LIME identified NO2 and PM10 as the key pollutant indicators driving predictions, consistent with known environmental patterns.
Conclusion: This study presents a scalable, real-time AI-based framework for air quality prediction tailored to low-resource settings like Afghanistan. By combining models such as TSMixer with geospatial clustering and deploying them through accessible tools, the system offers a practical solution for environmental monitoring. In addition to its technical contributions, the framework supports policy-making and public engagement by promoting explainable, localized forecasts.
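The two regression metrics quoted in the abstract, R² and MSE, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the sample values below are hypothetical and were chosen only to show how the two quantities are defined.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the squared residuals between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical scaled pollutant readings, invented for illustration only.
observed = [0.0, 1.0, 2.0, 3.0]
predicted = [0.0, 1.0, 2.0, 2.0]

mse = mean_squared_error(observed, predicted)
r2 = r2_score(observed, predicted)
print(f"MSE={mse:.4f}  R2={r2:.4f}")
```

An R² close to 1 means the predictions explain nearly all of the variance in the observations. MSE depends on the scale of the target, so a value such as 0.0278 is only meaningful relative to how the pollutant values were scaled, which the abstract does not state.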
format Article
id doaj-art-07f82333bd4a47edb16af5af7c91bbec
institution DOAJ
issn 2948-1554
language English
publishDate 2025-08-01
publisher Springer
record_format Article
series Discover Atmosphere
spelling doaj-art-07f82333bd4a47edb16af5af7c91bbec
2025-08-20T03:04:26Z
eng
Springer
Discover Atmosphere
2948-1554
2025-08-01
31130
10.1007/s44292-025-00052-8
Scalable AI-driven air quality forecasting and classification for public health applications
Mohammad Wasil Jalali 0
Bahir Saidi 1
Habibullah Farahmand 2
Mohammad Aref Rezvan Panah 3
Eda Nur Saruhan 4
Computer Science, Kabul Polytechnic University
Computer Science, Kardan University
Computer Science, Government College University
Computer Science, Parwan University
Computer Science and Engineering, Koç University
https://doi.org/10.1007/s44292-025-00052-8
Air quality prediction
Machine learning
Deep learning
Explainable AI
Time-series forecasting
Environmental monitoring
title Scalable AI-driven air quality forecasting and classification for public health applications
topic Air quality prediction
Machine learning
Deep learning
Explainable AI
Time-series forecasting
Environmental monitoring
url https://doi.org/10.1007/s44292-025-00052-8