Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms
Abstract: Formulas based on red blood cell indices have been used to differentiate between iron deficiency anemia (IDA) and thalassemia (Thal), but their efficiency varies. In this study, we aimed to develop a tool for discriminating between IDA and Thal by using the random forest (RF)...
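The full abstract (see the description field of this record) reports a random 80:20 split of 1143 patients and evaluates the models by accuracy and AUC-ROC. The following is a minimal sketch of that evaluation arithmetic only, using the Python standard library. It does not train RF or GB models, and the function names, the fixed seed, and the rounding of the test-set size are assumptions made for illustration, not details taken from the study.

```python
import random

# Cohort sizes as reported in the abstract.
COHORT = {"IDA": 382, "Thal": 635, "IDA+Thal": 126}


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test) index lists.

    The seed and the use of round() for the test size are illustrative
    assumptions; the abstract only states a random 80:20 division.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true, scores):
    """AUC-ROC for binary labels (1 = positive, 0 = negative).

    Computed as the probability that a randomly chosen positive receives
    a higher score than a randomly chosen negative; ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# One label per patient, in the proportions reported in the abstract.
labels = [name for name, count in COHORT.items() for _ in range(count)]
train_idx, test_idx = random_split(len(labels))
print(len(labels), len(train_idx), len(test_idx))  # 1143 914 229
```

Under these assumptions the split yields 914 training and 229 testing samples. In practice, `accuracy` and `auc_roc` would be applied to a fitted model's predictions on the held-out indices.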
| Main Authors: | Wanicha Tepakhan, Wisarut Srisintorn, Tipparat Penglong, Pirun Saelue |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Subjects: | Iron deficiency anemia; Thalassemia; Machine learning; Random forest; Gradient boosting |
| Online Access: | https://doi.org/10.1038/s41598-025-01458-5 |
| _version_ | 1849326606674296832 |
|---|---|
| author | Wanicha Tepakhan; Wisarut Srisintorn; Tipparat Penglong; Pirun Saelue |
| author_sort | Wanicha Tepakhan |
| collection | DOAJ |
| description | Abstract: Formulas based on red blood cell indices have been used to differentiate between iron deficiency anemia (IDA) and thalassemia (Thal), but their efficiency varies. In this study, we aimed to develop a tool for discriminating between IDA and Thal using the random forest (RF) and gradient boosting (GB) algorithms. Complete blood count data were collected from 1143 patients with anemia and low mean corpuscular volume (382 with IDA, 635 with Thal, and 126 with both IDA and Thal). The data were randomly divided into training and testing datasets in an 80:20 ratio. The RF and GB models showed good diagnostic performance for predicting IDA and Thal in both the training and testing datasets. In the testing dataset for predicting binary outcomes, GB and RF both had an accuracy of 90.7% and an area under the receiver operating characteristic curve (AUC-ROC) of 0.953. Diagnostic performance was lower when patients with both IDA and Thal were included: GB and RF showed accuracies of 80.4% and 82.2%, respectively, and AUC-ROC values of 0.910 and 0.899, respectively. In conclusion, we developed a machine learning approach using the GB algorithm. This tool is potentially useful in Thal- and IDA-endemic regions. |
| format | Article |
| id | doaj-art-961e618ea02b4b1eaa0513cb3ebff1b7 |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-961e618ea02b4b1eaa0513cb3ebff1b7; 2025-08-20T03:48:06Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; 15118; 10.1038/s41598-025-01458-5; Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms; Wanicha Tepakhan (Department of Pathology, Faculty of Medicine, Prince of Songkla University); Wisarut Srisintorn (Department of Family Medicine and Preventive Medicine, Faculty of Medicine, Prince of Songkla University); Tipparat Penglong (Department of Pathology, Faculty of Medicine, Prince of Songkla University); Pirun Saelue (Hematology Unit, Division of Internal Medicine, Faculty of Medicine, Prince of Songkla University); https://doi.org/10.1038/s41598-025-01458-5; Iron deficiency anemia; Thalassemia; Machine learning; Random forest; Gradient boosting |
| title | Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms |
| topic | Iron deficiency anemia; Thalassemia; Machine learning; Random forest; Gradient boosting |
| url | https://doi.org/10.1038/s41598-025-01458-5 |