Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms

Abstract Formulas based on red blood cell indices have been used to differentiate between iron deficiency anemia (IDA) and thalassemia (Thal). However, they exhibit varying efficiencies. In this study, we aimed to develop a tool for discriminating between IDA and Thal by using the random forest (RF)...

Bibliographic Details
Main Authors: Wanicha Tepakhan, Wisarut Srisintorn, Tipparat Penglong, Pirun Saelue
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Subjects: Iron deficiency anemia; Thalassemia; Machine learning; Random forest; Gradient boosting
Online Access: https://doi.org/10.1038/s41598-025-01458-5
_version_ 1849326606674296832
author Wanicha Tepakhan
Wisarut Srisintorn
Tipparat Penglong
Pirun Saelue
author_facet Wanicha Tepakhan
Wisarut Srisintorn
Tipparat Penglong
Pirun Saelue
author_sort Wanicha Tepakhan
collection DOAJ
description Abstract: Formulas based on red blood cell indices have been used to differentiate between iron deficiency anemia (IDA) and thalassemia (Thal). However, they exhibit varying efficiencies. In this study, we aimed to develop a tool for discriminating between IDA and Thal by using the random forest (RF) and gradient boosting (GB) algorithms. Complete blood count data from 1143 patients with anemia and low mean corpuscular volume were collected (382 patients with IDA, 635 with Thal, and 126 with both IDA and Thal). The data were randomly divided into training and testing datasets in a ratio of 80:20. The RF and GB models showed good diagnostic performance for predicting IDA and Thal in both the training and testing datasets. In the testing dataset for predicting binary outcomes, GB and RF both had an accuracy of 90.7% and an area under the receiver operating characteristic curve (AUC-ROC) of 0.953. Diagnostic performance was lower when patients with both IDA and Thal were included: GB and RF showed accuracies of 80.4% and 82.2%, respectively, and AUC-ROC values of 0.910 and 0.899, respectively. In conclusion, we developed a machine learning approach using the GB algorithm. This tool is potentially useful in Thal- and IDA-endemic regions.
format Article
id doaj-art-961e618ea02b4b1eaa0513cb3ebff1b7
institution Kabale University
issn 2045-2322
language English
publishDate 2025-05-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-961e618ea02b4b1eaa0513cb3ebff1b7 | 2025-08-20T03:48:06Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2025-05-01 | 15118 | 10.1038/s41598-025-01458-5 | Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms | Wanicha Tepakhan (0); Wisarut Srisintorn (1); Tipparat Penglong (2); Pirun Saelue (3) | (0) Department of Pathology, Faculty of Medicine, Prince of Songkla University; (1) Department of Family Medicine and Preventive Medicine, Faculty of Medicine, Prince of Songkla University; (2) Department of Pathology, Faculty of Medicine, Prince of Songkla University; (3) Hematology Unit, Division of Internal Medicine, Faculty of Medicine, Prince of Songkla University | Abstract Formulas based on red blood cell indices have been used to differentiate between iron deficiency anemia (IDA) and thalassemia (Thal). However, they exhibit varying efficiencies. In this study, we aimed to develop a tool for discriminating between IDA and Thal by using the random forest (RF) and gradient boosting (GB) algorithms. Complete blood count data from 1143 patients with anemia and low mean corpuscular volume were collected (382 patients with IDA, 635 with Thal, and 126 with IDA and Thal). The data were randomly divided into the training and testing datasets in a ratio of 80:20. The RF and GB models had good diagnostic performances for predicting IDA and Thal in the training and testing datasets. In the testing dataset for predicting binary outcomes, GB and RF both had an accuracy of 90.7%, and an area under the receiver operating characteristic curve (AUC-ROC) of 0.953. A lower diagnostic performance was observed when patients with IDA and Thal were included. GB and RF showed accuracies of 80.4% and 82.2%, respectively, and AUC-ROC values of 0.910 and 0.899, respectively. In conclusion, we developed a machine learning approach using GB algorithm. This tool is potentially useful in Thal- and IDA-endemic regions. | https://doi.org/10.1038/s41598-025-01458-5 | Iron deficiency anemia; Thalassemia; Machine learning; Random forest; Gradient boosting
spellingShingle Wanicha Tepakhan
Wisarut Srisintorn
Tipparat Penglong
Pirun Saelue
Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms
Scientific Reports
Iron deficiency anemia
Thalassemia
Machine learning
Random forest
Gradient boosting
title Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms
title_full Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms
title_fullStr Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms
title_full_unstemmed Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms
title_short Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms
title_sort machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms
topic Iron deficiency anemia
Thalassemia
Machine learning
Random forest
Gradient boosting
url https://doi.org/10.1038/s41598-025-01458-5
work_keys_str_mv AT wanichatepakhan machinelearningapproachfordifferentiatingirondeficiencyanemiaandthalassemiausingrandomforestandgradientboostingalgorithms
AT wisarutsrisintorn machinelearningapproachfordifferentiatingirondeficiencyanemiaandthalassemiausingrandomforestandgradientboostingalgorithms
AT tipparatpenglong machinelearningapproachfordifferentiatingirondeficiencyanemiaandthalassemiausingrandomforestandgradientboostingalgorithms
AT pirunsaelue machinelearningapproachfordifferentiatingirondeficiencyanemiaandthalassemiausingrandomforestandgradientboostingalgorithms
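
Illustrative note: the abstract in this record reports a random 80:20 train/test split and evaluates the models by accuracy and AUC-ROC. The sketch below shows one way those three pieces could be computed with the Python standard library only. It is not the authors' code: the function names (`split_80_20`, `accuracy`, `auc_roc`), the seed, and the toy labels and scores are all hypothetical, and the RF and GB models themselves are not reproduced here.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and return (train, test) in an 80:20 ratio.

    The seed is an arbitrary assumption, used only to make the split repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 1143 is the cohort size stated in the abstract; the IDs are placeholders.
    train_ids, test_ids = split_80_20(range(1143))
    print(len(train_ids), len(test_ids))

    # Hypothetical toy labels (1 = one class, 0 = the other) and model scores.
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(y_true, y_pred), auc_roc(y_true, scores))
```

The pairwise AUC-ROC computation is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based formulation would give the same value for larger datasets.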