Assessing the diagnostic accuracy of machine learning algorithms for identification of asthma in United States adults based on NHANES dataset

Bibliographic Details
Main Authors: Omid Kohandel Gargari, Mobina Fathi, Shahryar Rajai Firouzabadi, Ida Mohammadi, Mohammad Hossein Mahmoudi, Mehran Sarmadi, Arman Shafiee
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-88345-1
Description
Summary: Asthma diagnosis is challenging because of underreporting of symptoms, misdiagnoses, and limitations in existing diagnostic tests. Machine learning (ML) offers a promising way to address these challenges by leveraging demographic and clinical data. In this study, we aim to compare different ML diagnostic models and identify the most valuable features for asthma diagnosis using data from the National Health and Nutrition Examination Survey (NHANES). A total of 8,888 participants with available asthma diagnosis data from the 2017–2018 NHANES survey were included. After careful selection of variables related to asthma, several ML algorithms were evaluated: Support Vector Machine (SVM), Random Forest (RF), AdaBoost (ADA), XGBoost (XGB), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Multi-Layer Perceptron (MLP). SVM and ADA were the top performers, with the highest area under the curve (AUC) scores of 0.72 and 0.71, respectively. RF exhibited high accuracy but low precision. Feature interpretation using SHapley Additive exPlanations (SHAP) values identified close relative asthma history, dietary fat intake, and chronic bronchitis as significant predictors. Feature reduction experiments showed that the number of features could be reduced without significant loss in predictive performance. Our findings demonstrate the diagnostic potential of ML algorithms, particularly SVM and ADA, for asthma when diverse clinical and demographic factors are incorporated. In addition, close relative asthma history, dietary fat intake, and chronic bronchitis could be suggested as valuable features for asthma diagnosis. These outcomes may support earlier diagnosis of asthma.
ISSN: 2045-2322
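
Illustrative note: the summary compares models by AUC and reports that a model can show high accuracy but low precision. The sketch below is not the authors' code; it only illustrates how these three metrics are defined, using standard-library Python. The function names (roc_auc, accuracy_and_precision), the 0.5 decision threshold, and the ten-participant toy data are all hypothetical and are not taken from NHANES or from the article.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_precision(
    labels: Sequence[int], predictions: Sequence[int]
) -> Tuple[float, float]:
    """Accuracy = correct / total; precision = TP / (TP + FP)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    accuracy = correct / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical toy data: 2 positive cases among 10 participants.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]

# Assumed decision threshold of 0.5 (an illustration, not from the article).
predictions = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)
accuracy, precision = accuracy_and_precision(labels, predictions)
print(f"AUC={auc:.4f} accuracy={accuracy:.2f} precision={precision:.2f}")
```

On this toy data, accuracy is higher than precision because negatives dominate the sample: most predictions are correct "no asthma" calls, while half of the positive calls are wrong. AUC is computed from the scores alone and does not depend on the chosen threshold.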