Comparative study of imputation strategies to improve the sarcopenia prediction task

Bibliographic Details
Main Authors: Shakhzod Karimov, Dilmurod Turimov, Wooseong Kim, Jiyoun Kim
Format: Article
Language: English
Published: SAGE Publishing 2025-01-01
Series: Digital Health
Online Access: https://doi.org/10.1177/20552076241301960
collection DOAJ
description Objective: Sarcopenia, a condition characterized by the progressive loss of skeletal muscle mass and strength, poses significant challenges in research due to missing data. Incomplete datasets undermine the accuracy and reliability of studies, necessitating effective imputation techniques. This study conducts a comparative analysis of three advanced methods for addressing data completeness issues in sarcopenia research: multiple imputation by chained equations (MICE), support vector regression, and K-nearest neighbors (KNN).
Methods: Following imputation, we utilized machine learning models, including logistic regression, gradient boosting, support vector machine, and random forest, to classify sarcopenia. The methodology encompassed rigorous data preprocessing, normalization, and the synthetic minority oversampling technique to address class imbalance and ensure unbiased model performance.
Results: The results revealed substantial variations in model accuracy based on the imputation method employed. The gradient boosting model consistently exhibited superior performance across all imputation strategies, demonstrating its robustness with imputed datasets. Additionally, KNN and MICE emerged as effective imputation techniques, preserving the original data distribution and enabling more accurate classification outcomes.
Conclusion: This study underscores the pivotal role of imputation methods in maintaining data integrity and enhancing predictive accuracy in sarcopenia research. The gradient boosting model's reliability across all strategies highlights its potential as a robust classifier, while the suitability of KNN and MICE for preserving data distribution supports their application in similar research contexts. These findings contribute to more reliable and valid insights in sarcopenia studies, ultimately supporting improved clinical outcomes.
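The abstract names K-nearest neighbors (KNN) as one of the imputation methods compared but gives no implementation details. As a rough illustration of the general idea only, and not the authors' code, the Python sketch below fills each missing value with the mean of that feature over the k rows that are closest on the features both rows have observed. The function name knn_impute, the toy rows, and the choice k=2 are all hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled by a simple KNN rule.

    For a missing value in row i, feature j:
      1. consider every other row that has feature j observed;
      2. measure distance as the root mean squared difference over the
         features observed in both rows;
      3. average feature j over the k closest rows.
    Entries with no usable neighbour are left as None.
    """
    imputed = [list(r) for r in rows]  # copy so the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy data: two numeric features per row, one value missing.
rows = [
    [1.0, 2.0],
    [1.0, None],   # second feature is missing
    [1.5, 3.0],
    [5.0, 10.0],
]
filled = knn_impute(rows, k=2)
print(filled[1])  # the two nearest rows hold 2.0 and 3.0, so this is [1.0, 2.5]
```

Distances here are computed from the original observed values only, so one imputed value never feeds into another. In practice the features would usually be normalized first, as the abstract mentions, so that no single feature dominates the distance.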
id doaj-art-27b2b8933a7b4755b7b529bdd14ff288
institution Kabale University
issn 2055-2076
spelling doaj-art-27b2b8933a7b4755b7b529bdd14ff288; 2025-01-17T17:03:54Z; eng; SAGE Publishing; Digital Health; 2055-2076; 2025-01-01; 11; 10.1177/20552076241301960
Shakhzod Karimov (0), Dilmurod Turimov (1), Wooseong Kim (2), Jiyoun Kim (3)
0: Department of Computer Engineering, , Seongnam-si, Republic of Korea
1: Department of Computer Engineering, , Seongnam-si, Republic of Korea
2: Department of Computer Engineering, , Seongnam-si, Republic of Korea
3: Department of Exercise Rehabilitation & Welfare, , Incheon, Republic of Korea