Scalable AI-driven air quality forecasting and classification for public health applications
Abstract Background Air pollution remains one of the most pressing public health and environmental issues, particularly in developing countries like Afghanistan, where reliable air quality monitoring infrastructure is lacking. Traditional systems often rely on static data and limited sensors, which...
| Main Authors: | Mohammad Wasil Jalali, Bahir Saidi, Habibullah Farahmand, Mohammad Aref Rezvan Panah, Eda Nur Saruhan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-08-01 |
| Series: | Discover Atmosphere |
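| Subjects: | Air quality prediction; Machine learning; Deep learning; Explainable AI; Time-series forecasting; Environmental monitoring |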
| Online Access: | https://doi.org/10.1007/s44292-025-00052-8 |
| _version_ | 1849766863744008192 |
|---|---|
| author | Mohammad Wasil Jalali; Bahir Saidi; Habibullah Farahmand; Mohammad Aref Rezvan Panah; Eda Nur Saruhan |
| author_facet | Mohammad Wasil Jalali; Bahir Saidi; Habibullah Farahmand; Mohammad Aref Rezvan Panah; Eda Nur Saruhan |
| author_sort | Mohammad Wasil Jalali |
| collection | DOAJ |
| description | Abstract. Background: Air pollution remains one of the most pressing public health and environmental issues, particularly in developing countries like Afghanistan, where reliable air quality monitoring infrastructure is lacking. Traditional systems often rely on static data and limited sensors, which restrict their ability to generate timely, localized, and actionable insights. There is a growing need for dynamic, data-driven solutions to help policymakers and communities respond more effectively to pollution events. Objective: This study aims to build a real-time, scalable air quality prediction and classification system that not only enhances forecasting accuracy but also empowers public health interventions and environmental governance. The research seeks to bridge the gap between AI advancements and their application in under-resourced regions by developing interpretable, deployable tools for real-world use. Methods: We designed a hybrid AI framework that combines ensemble machine learning models such as Random Forest and XGBoost with deep learning architectures including Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNNs), and the Transformer-based Time Series Mixer (TSMixer). The models were trained on historical air pollution data from Afghanistan’s National Environmental Protection Agency (NEPA) alongside real-time meteorological data from the OpenWeather API. To improve prediction accuracy across regions, we used geospatial clustering techniques to group cities with similar pollution patterns. Additionally, SHAP and LIME were employed to ensure transparency and explainability of model predictions. A Django-based API and user-friendly dashboards were developed for real-time deployment and visualization. Results: The TSMixer model stood out in regression tasks, achieving a high R² score of 0.9861 and a low mean squared error (MSE) of 0.0278. In classification tasks, the Random Forest model performed best with an accuracy of 99.96%, slightly outperforming XGBoost at 99.48%. We also assessed the computational efficiency of the models: ensemble ML models like Random Forest had much lower inference times (around 0.0289 s), making them ideal for real-time use, while DL models like CNN and LSTM required higher computational resources (up to 1.25 s per inference). Key pollutant indicators driving predictions included NO2 and PM10, aligning with known environmental patterns identified by SHAP and LIME. Conclusion: This study presents a scalable and real-time AI-based framework for air quality prediction tailored to low-resource settings like Afghanistan. By combining models such as TSMixer with geospatial clustering and deploying them through accessible tools, the system offers a practical solution for environmental monitoring. In addition to technical contributions, the framework supports policy-making and public engagement by promoting explainable, localized forecasts. |
| format | Article |
| id | doaj-art-07f82333bd4a47edb16af5af7c91bbec |
| institution | DOAJ |
| issn | 2948-1554 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Atmosphere |
| spelling | doaj-art-07f82333bd4a47edb16af5af7c91bbec; 2025-08-20T03:04:26Z; eng; Springer; Discover Atmosphere; 2948-1554; 2025-08-01; 31130; 10.1007/s44292-025-00052-8; Scalable AI-driven air quality forecasting and classification for public health applications; Mohammad Wasil Jalali (Computer Science, Kabul Polytechnic University); Bahir Saidi (Computer Science, Kardan University); Habibullah Farahmand (Computer Science, Government College University); Mohammad Aref Rezvan Panah (Computer Science, Parwan University); Eda Nur Saruhan (Computer Science and Engineering, Koç University); https://doi.org/10.1007/s44292-025-00052-8; Air quality prediction; Machine learning; Deep learning; Explainable AI; Time-series forecasting; Environmental monitoring |
| title | Scalable AI-driven air quality forecasting and classification for public health applications |
| topic | Air quality prediction; Machine learning; Deep learning; Explainable AI; Time-series forecasting; Environmental monitoring |
| url | https://doi.org/10.1007/s44292-025-00052-8 |
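
The abstract reports regression quality as an R² score and a mean squared error (MSE). As a reading aid, the sketch below shows how these two metrics are conventionally computed from observed and predicted values. It is a minimal illustration, not the authors' evaluation code: the function names `mean_squared_error` and `r2_score` and the toy values are assumptions introduced here, and the numbers are not data from the study.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy values, chosen only to make the arithmetic easy to follow.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print("MSE:", mean_squared_error(observed, predicted))
    print("R^2:", r2_score(observed, predicted))
```

An R² close to 1 means the predictions account for nearly all of the variance in the observed values. MSE is expressed in squared units of the target, so its magnitude depends on how the target was scaled.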