Innovative Machine Learning Approach for Distinguishing Rheumatoid Arthritis and Osteoarthritis: Integrating Shapely Additive Explanations and Dendrograms

Background: Arthritis is a major healthcare issue and accurate diagnosis is important to treatment. Objective: The study aimed to identify and intuitively visualize feature importance of factors associated with osteoarthritis versus rheumatoid arthritis in a representative population of United State...

Bibliographic Details
Main Authors: Alexander A. Huang, Samuel Y. Huang
Format: Article
Language: English
Published: Levy Library Press 2024-11-01
Series: Journal of Scientific Innovation in Medicine
Subjects: shapely additive explanations; machine learning; xgboost; nhanes; dendrogram; heatmap
Online Access: https://account.journalofscientificinnovationinmedicine.org/index.php/ll-j-jsim/article/view/181
author Alexander A. Huang
Samuel Y. Huang
collection DOAJ
description Background: Arthritis is a major healthcare issue, and accurate diagnosis is important for treatment.
Objective: The study aimed to identify and intuitively visualize the feature importance of factors associated with osteoarthritis versus rheumatoid arthritis in a representative population of United States adults.
Methods: A retrospective analysis was conducted using a nationally representative cohort, the National Health and Nutrition Examination Survey (NHANES 2017–2020). All adults older than 18 years with either osteoarthritis or rheumatoid arthritis (1,483 individuals in total) were included. Univariable regression was used to identify significant nutritional covariates for inclusion in a machine learning model, and feature importance was reported. A dendrogram and heatmap were created by clustering the model statistics. The National Center for Health Statistics Ethics Review Board authorized the data acquisition and analysis.
Results: 1,483 patients met the inclusion criteria of being older than 18 years with demographic questionnaire information completed. Of a total of 681 features, 56 were significant on univariate analysis (P < 0.01). The XGBoost model had an area under the receiver operating characteristic curve (AUROC) of 0.710. The four highest-ranked features by gain, a measure of the percentage contribution of the covariate to the overall model prediction, were Income to Poverty Ratio (8.7%), Hip Circumference (6.5%), Dietary Folate Equivalent Intake (Folate DFE) (6.1%), and Globulin (5.1%). Cluster 1 of the heatmap and dendrogram included Income to Poverty Ratio, Direct HDL Cholesterol (mmol/L), Hip Circumference (BMXHIP), Folate DFE, and Globulin, indicating that these features were most similar in having high aggregate gain, cover, and frequency metrics.
Conclusion: Machine learning models that incorporate dendrograms and heat maps can offer additional summaries of model statistics that help differentiate the factors associated with osteoarthritis from those associated with rheumatoid arthritis. Such clinical models can assist physicians in diagnosing common conditions. Teaser Text: Dendrogram Statistics.
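The clustering of model statistics described in the abstract might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the distance metric or linkage rule, so Euclidean distance on z-scored columns and average linkage are assumptions. The gain values for the four named features are the percentages reported above; every cover and frequency value, and the two placeholder features, are invented for illustration only.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# (gain, cover, frequency) per feature. Gains in the first four rows come
# from the abstract; all cover/frequency values and the last two rows are
# hypothetical placeholders, not results from the study.
stats = {
    "Income to Poverty Ratio": (8.7, 7.9, 8.1),
    "Hip Circumference": (6.5, 6.8, 6.0),
    "Folate DFE": (6.1, 5.9, 6.3),
    "Globulin": (5.1, 5.5, 5.2),
    "placeholder_feature_a": (0.9, 1.1, 0.8),
    "placeholder_feature_b": (0.7, 0.6, 1.0),
}


def standardize(table):
    """Z-score each column so gain, cover, and frequency weigh equally."""
    cols = list(zip(*table.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return {
        name: tuple((v - m) / s for v, m, s in zip(row, mus, sds))
        for name, row in table.items()
    }


def euclid(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering.

    Returns the final clusters and the merge log (the information a
    dendrogram draws: which groups joined, and at what distance).
    """
    clusters = [[name] for name in points]
    merges = []

    def dist(pair):
        a, b = pair
        return mean(euclid(points[p], points[q]) for p in a for q in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=dist)
        merges.append((tuple(a), tuple(b), dist((a, b))))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters, merges


clusters, merges = agglomerate(standardize(stats), n_clusters=2)
for left, right, height in merges:
    print(f"merge at {height:.2f}: {left} + {right}")
for members in clusters:
    print(sorted(members))
```

With these placeholder numbers the four named features end up in one cluster and the two low-importance placeholders in the other, which mirrors the idea of a "Cluster 1" grouping features with jointly high gain, cover, and frequency. A plotting library would normally draw the dendrogram and heatmap from the same merge information.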
format Article
id doaj-art-b71efdbc1a4e4cc8bb46c225ff6cf72f
institution Kabale University
issn 2579-0153
language English
publishDate 2024-11-01
publisher Levy Library Press
record_format Article
series Journal of Scientific Innovation in Medicine
spelling doaj-art-b71efdbc1a4e4cc8bb46c225ff6cf72f
2024-12-20T07:28:45Z
eng
Levy Library Press
Journal of Scientific Innovation in Medicine
2579-0153
2024-11-01
714410.29024/jsim.181180
Alexander A. Huang, https://orcid.org/0000-0003-4970-4968, Northwestern University Feinberg School of Medicine
Samuel Y. Huang, https://orcid.org/0000-0003-3663-004X, Icahn School of Medicine at Mount Sinai South Nassau, NY
title Innovative Machine Learning Approach for Distinguishing Rheumatoid Arthritis and Osteoarthritis: Integrating Shapely Additive Explanations and Dendrograms
topic shapely additive explanations
machine learning
xgboost
nhanes
dendrogram
heatmap
url https://account.journalofscientificinnovationinmedicine.org/index.php/ll-j-jsim/article/view/181