Optimizing the light gradient-boosting machine algorithm for an efficient early detection of coronary heart disease
Background: Coronary heart disease (CHD) remains a prominent cause of mortality globally, necessitating early and accurate detection methods. Traditional diagnostic approaches can be invasive, costly, and time-consuming, creating a need for more efficient alternatives. This study aimed to optimize...
| Main Authors: | Temidayo Oluwatosin Omotehinwa, David Opeoluwa Oyewola, Ervin Gubin Moung |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications Co., Ltd., 2024-09-01 |
| Series: | Informatics and Health |
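| Subjects: | Clinical decision making; Coronary heart disease; Light gradient-boosting machine; Machine learning; MICE; Tree-structured Parzen estimator |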
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2949953424000122 |
| _version_ | 1849690371977641984 |
|---|---|
| author | Temidayo Oluwatosin Omotehinwa, David Opeoluwa Oyewola, Ervin Gubin Moung |
| author_facet | Temidayo Oluwatosin Omotehinwa, David Opeoluwa Oyewola, Ervin Gubin Moung |
| author_sort | Temidayo Oluwatosin Omotehinwa |
| collection | DOAJ |
| description | Background: Coronary heart disease (CHD) remains a prominent cause of mortality globally, necessitating early and accurate detection methods. Traditional diagnostic approaches can be invasive, costly, and time-consuming, creating a need for more efficient alternatives. This study aimed to optimize the Light Gradient-Boosting Machine (LightGBM) algorithm to enhance its performance and accuracy in the early detection of CHD, providing a reliable, cost-effective, and non-invasive diagnostic tool. Methods: The Framingham Heart Study (FHS) dataset publicly available on Kaggle was used in this study. Multiple Imputation by Chained Equations (MICE) was applied separately to the training and testing sets to handle missing data. Borderline-SMOTE (Synthetic Minority Over-sampling Technique) was used on the training set to balance the dataset. The LightGBM algorithm was selected for its efficiency in classification tasks, and Bayesian Optimization with Tree-structured Parzen Estimator (TPE) was employed to fine-tune its hyperparameters. The optimized LightGBM model was trained and evaluated using metrics such as accuracy, precision, and AUC-ROC on the test set, with cross-validation to ensure robustness and generalizability. Findings: The optimized LightGBM model showed significant improvement in early CHD detection. The baseline LightGBM model with dropped missing values had an accuracy of 0.8333, sensitivity of 0.1081, precision of 0.3429, F1 score of 0.1644, and AUC of 0.6875. With MICE imputation, performance improved to an accuracy of 0.9399, sensitivity of 0.6693, precision of 0.9043, F1 score of 0.7692, and AUC of 0.9457. The combined approach of Borderline-SMOTE, MICE imputation, and TPE for LightGBM achieved an accuracy of 0.9882, sensitivity of 0.9370, precision of 0.9835, F1 score of 0.9597, and AUC of 0.9963, indicating a highly effective and robust model. Interpretation: The optimized model demonstrated outstanding performance in early CHD detection. The study's strengths include its comprehensive approach to addressing missing data and class imbalance and the fine-tuning of hyperparameters through Bayesian Optimization. However, there is a need to test with other datasets for its generalizability to be well-established. This study provides a strong framework for early CHD detection, improving clinical practice by allowing for more precise and dependable diagnostics and effective interventions. |
| format | Article |
| id | doaj-art-d09efeb07fff465dbaccd1b04c01c354 |
| institution | DOAJ |
| issn | 2949-9534 |
| language | English |
| publishDate | 2024-09-01 |
| publisher | KeAi Communications Co., Ltd. |
| record_format | Article |
| series | Informatics and Health |
| spelling | doaj-art-d09efeb07fff465dbaccd1b04c01c354; 2025-08-20T03:21:19Z; eng; KeAi Communications Co., Ltd.; Informatics and Health; 2949-9534; 2024-09-01; 127081; 10.1016/j.infoh.2024.06.001; Optimizing the light gradient-boosting machine algorithm for an efficient early detection of coronary heart disease; Temidayo Oluwatosin Omotehinwa (0), David Opeoluwa Oyewola (1), Ervin Gubin Moung (2); (0) Department of Mathematics and Computer Science, Federal University of Health Sciences, P.M.B. 145, Otukpo, Nigeria (correspondence to: Department of Mathematics and Computer Science, Faculty of Science, Federal University of Health Sciences, P.M.B. 145, Otukpo, Nigeria); (1) Department of Mathematics and Statistics, Federal University Kashere, P.M.B. 0182, Gombe, Nigeria; (2) Data Technologies and Applications (DaTA) Research Group, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia; http://www.sciencedirect.com/science/article/pii/S2949953424000122; Clinical decision making; Coronary heart disease; Light gradient-boosting machine; Machine learning; MICE; Tree-structured Parzen estimator |
| title | Optimizing the light gradient-boosting machine algorithm for an efficient early detection of coronary heart disease |
| topic | Clinical decision making; Coronary heart disease; Light gradient-boosting machine; Machine learning; MICE; Tree-structured Parzen estimator |
| url | http://www.sciencedirect.com/science/article/pii/S2949953424000122 |
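
The F1 scores quoted in the description can be cross-checked against the quoted precision and sensitivity, because F1 is the harmonic mean of the two. The following is a minimal sketch of that arithmetic only; it is not the authors' code, and the names `f1_from`, `REPORTED`, `TOLERANCE`, and `check_reported` are hypothetical helpers introduced for illustration. The tolerance is an assumption that accounts for the inputs being rounded to four decimals.

```python
# Illustrative check, not the authors' code: F1 is the harmonic mean of
# precision and sensitivity (recall), so each reported F1 score can be
# recomputed from the other two figures quoted in the abstract.


def f1_from(precision: float, sensitivity: float) -> float:
    """Return the harmonic mean of precision and sensitivity."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity, reported F1) exactly as quoted in the abstract.
REPORTED = {
    "baseline (missing rows dropped)": (0.3429, 0.1081, 0.1644),
    "MICE imputation": (0.9043, 0.6693, 0.7692),
    "Borderline-SMOTE + MICE + TPE": (0.9835, 0.9370, 0.9597),
}

# Assumed tolerance: the quoted inputs are rounded to four decimals,
# so the recomputed value may differ slightly in the last digit.
TOLERANCE = 5e-4


def check_reported(rows=REPORTED, tol=TOLERANCE):
    """Map each configuration to (recomputed F1, within-tolerance flag)."""
    results = {}
    for name, (precision, sensitivity, reported_f1) in rows.items():
        recomputed = f1_from(precision, sensitivity)
        results[name] = (recomputed, abs(recomputed - reported_f1) <= tol)
    return results


if __name__ == "__main__":
    for name, (value, ok) in check_reported().items():
        print(f"{name}: recomputed F1 = {value:.4f}, consistent = {ok}")
```

Under these assumptions, all three configurations should come out consistent with the reported F1 scores. The check says nothing about accuracy or AUC, which cannot be recomputed from the figures in this record.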