Optimizing the light gradient-boosting machine algorithm for an efficient early detection of coronary heart disease

Background: Coronary heart disease (CHD) remains a leading cause of mortality globally, making early and accurate detection essential. Traditional diagnostic approaches can be invasive, costly, and time-consuming, which creates a need for more efficient alternatives. This study aimed to optimize...

Bibliographic Details
Main Authors: Temidayo Oluwatosin Omotehinwa, David Opeoluwa Oyewola, Ervin Gubin Moung
Format: Article
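Language: English
Published: KeAi Communications Co., Ltd. 2024-09-01
Series: Informatics and Health
Subjects: Clinical decision making; Coronary heart disease; Light gradient-boosting machine; Machine learning; MICE; Tree-structured Parzen estimator
Online Access: http://www.sciencedirect.com/science/article/pii/S2949953424000122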
author Temidayo Oluwatosin Omotehinwa
David Opeoluwa Oyewola
Ervin Gubin Moung
collection DOAJ
description Background: Coronary heart disease (CHD) remains a leading cause of mortality globally, making early and accurate detection essential. Traditional diagnostic approaches can be invasive, costly, and time-consuming, which creates a need for more efficient alternatives. This study aimed to optimize the Light Gradient-Boosting Machine (LightGBM) algorithm to improve its accuracy in the early detection of CHD, providing a reliable, cost-effective, and non-invasive diagnostic tool.
Methods: The Framingham Heart Study (FHS) dataset, publicly available on Kaggle, was used in this study. Multiple Imputation by Chained Equations (MICE) was applied separately to the training and testing sets to handle missing data. Borderline-SMOTE (Synthetic Minority Over-sampling Technique) was applied to the training set only, to balance the classes. The LightGBM algorithm was selected for its efficiency in classification tasks, and Bayesian Optimization with the Tree-structured Parzen Estimator (TPE) was employed to tune its hyperparameters. The optimized LightGBM model was trained and then evaluated on the test set using metrics such as accuracy, precision, and AUC-ROC, with cross-validation used to assess robustness and generalizability.
Findings: The optimized LightGBM model showed a substantial improvement over the baseline in early CHD detection. The baseline LightGBM model, trained with rows containing missing values dropped, had an accuracy of 0.8333, sensitivity of 0.1081, precision of 0.3429, F1 score of 0.1644, and AUC of 0.6875. With MICE imputation, performance improved to an accuracy of 0.9399, sensitivity of 0.6693, precision of 0.9043, F1 score of 0.7692, and AUC of 0.9457. The combined approach of Borderline-SMOTE, MICE imputation, and TPE for LightGBM achieved an accuracy of 0.9882, sensitivity of 0.9370, precision of 0.9835, F1 score of 0.9597, and AUC of 0.9963.
Interpretation: The optimized model performed strongly in early CHD detection on this dataset. The study's strengths include its combined treatment of missing data and class imbalance and the tuning of hyperparameters through Bayesian Optimization. However, the model must still be tested on other datasets before its generalizability can be considered established. The study provides a framework for early CHD detection that could support clinical practice through more precise and dependable diagnostics and more effective interventions.
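The F1 scores quoted in the Findings can be cross-checked against the quoted precision and sensitivity, since F1 is the harmonic mean of the two. The following is a minimal sketch of that check, not code from the study: the function names, the dictionary layout, and the 5e-4 tolerance (chosen to absorb four-decimal rounding in the abstract) are assumptions made for illustration.

```python
# Sanity check (illustrative, not the authors' code): recompute F1 from the
# precision and sensitivity values quoted in the abstract.

# (precision, sensitivity, reported F1) for each configuration in the Findings.
REPORTED = {
    "baseline (missing rows dropped)": (0.3429, 0.1081, 0.1644),
    "MICE imputation": (0.9043, 0.6693, 0.7692),
    "Borderline-SMOTE + MICE + TPE": (0.9835, 0.9370, 0.9597),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (recall is the sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_reported(tolerance: float = 5e-4) -> dict:
    """Map each configuration to (recomputed F1, matches reported value?).

    The tolerance is an assumed allowance for rounding in the quoted figures.
    """
    results = {}
    for name, (precision, sensitivity, reported_f1) in REPORTED.items():
        f1 = f1_score(precision, sensitivity)
        results[name] = (f1, abs(f1 - reported_f1) <= tolerance)
    return results


if __name__ == "__main__":
    for name, (f1, consistent) in check_reported().items():
        print(f"{name}: recomputed F1 = {f1:.4f}, consistent = {consistent}")
```

The same check also illustrates why accuracy alone is a weak summary here: the baseline pairs an accuracy of 0.8333 with a sensitivity of 0.1081, a pattern consistent with the class imbalance the abstract describes.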
format Article
id doaj-art-d09efeb07fff465dbaccd1b04c01c354
institution DOAJ
issn 2949-9534
language English
publishDate 2024-09-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Informatics and Health
spelling doaj-art-d09efeb07fff465dbaccd1b04c01c354; 2025-08-20T03:21:19Z; eng; KeAi Communications Co., Ltd.; Informatics and Health; ISSN 2949-9534; 2024-09-01; volume 1, issue 2, pages 70-81; doi 10.1016/j.infoh.2024.06.001
Temidayo Oluwatosin Omotehinwa (0), David Opeoluwa Oyewola (1), Ervin Gubin Moung (2)
(0) Department of Mathematics and Computer Science, Federal University of Health Sciences, P.M.B. 145, Otukpo, Nigeria; Correspondence to: Department of Mathematics and Computer Science, Faculty of Science, Federal University of Health Sciences, P.M.B. 145, Otukpo, Nigeria
(1) Department of Mathematics and Statistics, Federal University Kashere, P.M.B. 0182, Gombe, Nigeria
(2) Data Technologies and Applications (DaTA) Research Group, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
title Optimizing the light gradient-boosting machine algorithm for an efficient early detection of coronary heart disease
topic Clinical decision making
Coronary heart disease
Light gradient-boosting machine
Machine learning
MICE
Tree-structured Parzen estimator
url http://www.sciencedirect.com/science/article/pii/S2949953424000122