Utilizing SMOTE-TomekLink and machine learning to construct a predictive model for elderly medical and daily care services demand

Bibliographic Details
Main Authors: Guangmei Yang, Guangdong Wang, Leping Wan, Xinle Wang, Yan He
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
Subjects:
Online Access: https://doi.org/10.1038/s41598-025-92722-1
author Guangmei Yang
Guangdong Wang
Leping Wan
Xinle Wang
Yan He
author_sort Guangmei Yang
collection DOAJ
description Abstract This study aims to construct prediction models for the elderly's demand for medical and daily care services and to explore the factors that affect that demand. A questionnaire survey on the demand for medical and daily care services was conducted among 1,291 elderly people selected by multi-stage stratified whole-cluster random sampling. SPSS 21.0 was used for descriptive statistics of the respondents' basic data, and univariate analysis was used to screen variables for binary logistic regression analysis and model construction. Because the acquired dataset was class-imbalanced, the Synthetic Minority Over-sampling Technique combined with Tomek links (SMOTE-TomekLink) was used to resample it and balance the classes. To improve computational efficiency, prediction models were developed with three algorithms: Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Light Gradient Boosting Machine (LightGBM). Model performance was evaluated with accuracy (ACC), recall (R), precision (P), F1-score, and area under the receiver operating characteristic curve (AUC). The prediction models for medical and daily care services demand were developed and validated using 12 and 13 key features, respectively. LightGBM emerged as the best-performing model for both outcomes: it achieved an AUC of 0.910 and an F1-score of 0.841 for medical service demand, and an AUC of 0.906 and an F1-score of 0.819 for daily care services demand. Feature importance analysis of the LightGBM model indicates that the number of chronic diseases, education level, and financial sources are the most important predictors of demand for both medical and daily care services. By combining questionnaire data with feature selection, imbalanced-data processing, and machine learning, this study constructed machine learning models for predicting the elderly's demand for medical and daily care services and analyzed the factors influencing that demand, providing a reference for the construction and validation of future prediction models.
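The resampling step named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names (`smote`, `remove_tomek_links`, `smote_tomek`), the choice of Euclidean distance, the value of `k`, the decision to drop both members of each Tomek link, and the toy data are all assumptions made for illustration. The sketch first oversamples the minority class by interpolating between minority neighbours, then removes opposite-class pairs that are each other's nearest neighbour.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbours.
    Assumes the minority class has at least two samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link, i.e. every pair of
    opposite-class points that are each other's nearest neighbour."""
    nearest = [
        min((j for j in range(len(X)) if j != i),
            key=lambda j: euclidean(X[i], X[j]))
        for i in range(len(X))
    ]
    drop = set()
    for i, j in enumerate(nearest):
        if nearest[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority count, then clean
    the class boundary by removing Tomek links."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, rng)
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_bal, y_bal)


# Toy, made-up data: six majority (0) and three minority (1) samples.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2),
     (1.0, 1.0), (1.1, 0.9), (0.35, 0.35)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = smote_tomek(X, y)
print("class counts after resampling:", y_bal.count(0), y_bal.count(1))
```

Because each Tomek link contains exactly one sample from each class and both are dropped, the two classes remain equal in size after cleaning in this sketch. In a modelling workflow, resampling of this kind is normally applied to the training split only, so that evaluation metrics such as AUC and F1-score are computed on unmodified data.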
format Article
id doaj-art-c9509a2a547848c2b0f1e34e37b3ad40
institution DOAJ
issn 2045-2322
language English
publishDate 2025-03-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-c9509a2a547848c2b0f1e34e37b3ad40 2025-08-20T03:01:23Z eng Nature Portfolio Scientific Reports 2045-2322 2025-03-01 volume 15, issue 1, pages 1-12 10.1038/s41598-025-92722-1
Guangmei Yang, The Affiliated Encephalopathy Hospital of Zhengzhou University
Guangdong Wang, Northwest Agriculture and Forestry University College of Natural Resources and Environment
Leping Wan, Zhengzhou University
Xinle Wang, Zhengzhou University
Yan He, Hainan Medical University
title Utilizing SMOTE-TomekLink and machine learning to construct a predictive model for elderly medical and daily care services demand
topic SMOTE-TomekLink
Machine learning
The elderly
Predictive model
Medical and daily care services demand
url https://doi.org/10.1038/s41598-025-92722-1