Scalable AI-driven air quality forecasting and classification for public health applications

Bibliographic Details
Main Authors: Mohammad Wasil Jalali, Bahir Saidi, Habibullah Farahmand, Mohammad Aref Rezvan Panah, Eda Nur Saruhan
Format: Article
Language: English
Published: Springer 2025-08-01
Series: Discover Atmosphere
Online Access: https://doi.org/10.1007/s44292-025-00052-8
Description
Summary:
Background: Air pollution remains one of the most pressing public health and environmental issues, particularly in developing countries like Afghanistan, where reliable air quality monitoring infrastructure is lacking. Traditional systems often rely on static data and limited sensors, which restrict their ability to generate timely, localized, and actionable insights. There is a growing need for dynamic, data-driven solutions to help policymakers and communities respond more effectively to pollution events.

Objective: This study aims to build a real-time, scalable air quality prediction and classification system that improves forecasting accuracy and supports public health interventions and environmental governance. The research seeks to bridge the gap between AI advancements and their application in under-resourced regions by developing interpretable, deployable tools for real-world use.

Methods: We designed a hybrid AI framework that combines ensemble machine learning models such as Random Forest and XGBoost with deep learning architectures including Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNNs), and the Transformer-based Time Series Mixer (TSMixer). The models were trained on historical air pollution data from Afghanistan's National Environmental Protection Agency (NEPA) alongside real-time meteorological data from the OpenWeather API. To improve prediction accuracy across regions, we used geospatial clustering techniques to group cities with similar pollution patterns. SHAP and LIME were employed to make model predictions transparent and explainable. A Django-based API and user-friendly dashboards were developed for real-time deployment and visualization.

Results: The TSMixer model performed best in regression tasks, achieving an R² score of 0.9861 and a mean squared error (MSE) of 0.0278. In classification tasks, the Random Forest model performed best with an accuracy of 99.96%, slightly outperforming XGBoost at 99.48%. We also assessed the computational efficiency of the models: ensemble ML models like Random Forest had much lower inference times (around 0.0289 s), making them well suited to real-time use, while DL models like CNN and LSTM required more computational resources (up to 1.25 s per inference). SHAP and LIME identified NO2 and PM10 as the key pollutant indicators driving predictions, consistent with known environmental patterns.

Conclusion: This study presents a scalable, real-time AI-based framework for air quality prediction tailored to low-resource settings like Afghanistan. By combining models such as TSMixer with geospatial clustering and deploying them through accessible tools, the system offers a practical solution for environmental monitoring. In addition to its technical contributions, the framework supports policy-making and public engagement by promoting explainable, localized forecasts.
ISSN:2948-1554
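
Note on the regression metrics: the Summary reports model quality as an R² score and a mean squared error (MSE). As a reading aid, the minimal sketch below shows how these two metrics are conventionally computed from observed and predicted values. It is not the authors' evaluation code; the function names and the sample values (`observed`, `predicted`) are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    return 1 - ss_res / ss_tot


# Hypothetical values, for illustration only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

mse = mean_squared_error(observed, predicted)
r2 = r2_score(observed, predicted)
print(f"MSE = {mse:.4f}")  # MSE = 0.2500
print(f"R2  = {r2:.4f}")   # R2  = 0.8000
```

An R² close to 1 means the predictions explain most of the variance in the observations. MSE depends on the scale of the target variable, so the reported value of 0.0278 is only interpretable relative to how the pollutant values were scaled, which this record does not state.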