The Impact of the SMOTE Method on Machine Learning and Ensemble Learning Performance Results in Addressing Class Imbalance in Data Used for Predicting Total Testosterone Deficiency in Type 2 Diabetes Patients

Background and Objective: Diabetes Mellitus is a long-term, multifaceted metabolic condition that necessitates ongoing medical management. Hypogonadism is a syndrome that is a clinical and/or biochemical indicator of testosterone deficiency. Cross-sectional studies have reported that 20–80.4% of all...

Bibliographic Details
Main Authors: Mehmet Kivrak, Ugur Avci, Hakki Uzun, Cuneyt Ardic
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Diagnostics
Subjects: SMOTE; imbalance problem; total testosterone; machine learning; ensemble learning
Online Access: https://www.mdpi.com/2075-4418/14/23/2634
_version_ 1850061266567036928
author Mehmet Kivrak
Ugur Avci
Hakki Uzun
Cuneyt Ardic
collection DOAJ
description Background and Objective: Diabetes mellitus is a chronic, multifaceted metabolic condition that requires ongoing medical management. Hypogonadism is a syndrome defined by clinical and/or biochemical evidence of testosterone deficiency. Cross-sectional studies have reported that 20–80.4% of all men with Type 2 diabetes have hypogonadism, and that Type 2 diabetes is associated with low testosterone. This study analyzes the use of machine learning (ML) and ensemble learning (EL) classifiers for predicting testosterone deficiency. We compared optimized traditional ML classifiers and three EL classifiers using grid search and stratified k-fold cross-validation, and used the synthetic minority over-sampling technique (SMOTE) to address class imbalance. Methods: The database contains 3397 patients assessed for testosterone deficiency, of whom 1886 patients with Type 2 diabetes were included in the study. During data preprocessing, outlier (extreme observation) analysis was performed with the local outlier factor (LOF), and missing values were handled with random forest. SMOTE generates synthetic samples of the minority class. Four base classifiers, namely MLP, RF, ELM and LR, were used as first-level classifiers. Tree ensemble classifiers, namely ADA, XGBoost and SGB, were used as second-level classifiers. Results: After SMOTE, diagnostic accuracy decreased in all base classifiers except ELM, while sensitivity increased in all classifiers. Similarly, specificity decreased in all classifiers, while the F1 score increased. The RF classifier gave more successful results than the other base classifiers on the training dataset. The most successful ensemble classifier on the training dataset was ADA, in both the original data and the SMOTE data. On the testing data, XGBoost was the most suitable model when evaluating model performance. XGBoost, which shows balanced performance especially when SMOTE is used, can be preferred when correcting for class imbalance. Conclusions: SMOTE is used to correct the class imbalance in the original data. However, as seen in this study, when SMOTE was applied, diagnostic accuracy decreased in some models while sensitivity increased significantly. This shows the positive effect of SMOTE on predicting the minority class.
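The description states that SMOTE generates synthetic samples of the minority class. A minimal sketch of that idea follows, assuming the usual formulation in which each synthetic point is a random interpolation between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the sample values are hypothetical and chosen for illustration; this is not the authors' implementation, and the record does not say which software they used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (illustrative values, not study data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling adds minority cases without duplicating rows exactly. In practice the oversampling is usually applied to the training folds only, so that test performance is measured on real observations.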
format Article
id doaj-art-9b78116c541a403fa2c08bd5eb479f20
institution DOAJ
issn 2075-4418
language English
publishDate 2024-11-01
publisher MDPI AG
record_format Article
series Diagnostics
spelling doaj-art-9b78116c541a403fa2c08bd5eb479f20
2025-08-20T02:50:18Z
eng
MDPI AG
Diagnostics
2075-4418
2024-11-01
Volume 14, Issue 23, Article 2634
DOI 10.3390/diagnostics14232634
Mehmet Kivrak: Faculty of Medicine, Biostatistics and Medical Informatics, Recep Tayyip Erdogan University, Rize 53100, Türkiye
Ugur Avci: Faculty of Medicine, Endocrinology and Metabolism, Recep Tayyip Erdogan University, Rize 53100, Türkiye
Hakki Uzun: Faculty of Medicine, Urology, Recep Tayyip Erdogan University, Rize 53100, Türkiye
Cuneyt Ardic: Faculty of Medicine, Primary Care Physician, Recep Tayyip Erdogan University, Rize 53100, Türkiye
title The Impact of the SMOTE Method on Machine Learning and Ensemble Learning Performance Results in Addressing Class Imbalance in Data Used for Predicting Total Testosterone Deficiency in Type 2 Diabetes Patients
topic SMOTE
imbalance problem
total testosterone
machine learning
ensemble learning
url https://www.mdpi.com/2075-4418/14/23/2634