Comparison between Logistic Regression and K-Nearest Neighbour Techniques with Application on Thalassemia Patients in Mosul

Bibliographic Details
Main Authors: Mohammed Al jbory, Hutheyfa Taha
Format: Article
Language: Arabic
Published: College of Computer Science and Mathematics, University of Mosul 2025-06-01
Series: المجلة العراقية للعلوم الاحصائية (Iraqi Journal of Statistical Sciences)
Online Access: https://stats.uomosul.edu.iq/article_187789_5503b5254e1a3b33a420e24aee06d343.pdf
Description
Summary: Thalassemia is a genetic disease that is transmitted from parents to children when both parents are carriers of the genetic mutation. The mutation leads to a decrease in the number, quality, and condition of red blood cells and an increase in the rate at which they are destroyed, which leads to iron accumulation in the body and a decrease in hemoglobin in the blood. This study aims to develop a model to predict thalassemia using the K-nearest neighbor technique and the logistic regression model, compared on the following evaluation criteria: accuracy, recall, precision, F1-score, and AUC. The data were obtained from Al-Hadbaa Specialized Hospital in Mosul. The data set included 280 observations, of which 149 (53.21%) were thalassemia intermedia and 131 (46.79%) were thalassemia major. The data were split into 70% for training and 30% for testing. The experimental results showed that the logistic regression model performed better than the K-nearest neighbor algorithm, with a precision of 96%, recall of 98%, and F1-score of 97% in the thalassemia intermedia category and a precision of 97%, recall of 95%, and F1-score of 96% in the thalassemia major category, indicating that logistic regression distinguished well between the two categories. Logistic regression was therefore more effective than the K-nearest neighbor algorithm in classifying thalassemia patients, especially those with thalassemia major. The study also showed that the distance metric used in the K-nearest neighbor algorithm ("Manhattan" or "Chebyshev") and the value of K both have a significant impact on prediction accuracy: the highest accuracy, 95%, was reached at K = 4, which was determined to be the optimal value.
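To make the role of the distance metric and of K concrete, the following is a minimal pure-Python sketch of K-nearest neighbor classification with the two distances named in the summary. It is an illustration only, not the authors' implementation: the function names, the two-feature toy points, and the tie-breaking rule are assumptions, and none of the values come from the hospital data.

```python
from collections import Counter

def manhattan(a, b):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # L-infinity distance: largest absolute coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=4, distance=manhattan):
    # Rank training points by distance to the query and keep the k closest
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # Majority vote; on a tie, the label met first among the neighbours wins
    return votes.most_common(1)[0][0]

# Hypothetical toy data (two made-up features per patient), NOT the study data
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.5), (7.0, 6.0), (6.5, 7.5)]
train_y = ["intermedia", "intermedia", "intermedia", "major", "major", "major"]
query = (2.0, 2.0)

for name, dist in (("Manhattan", manhattan), ("Chebyshev", chebyshev)):
    print(name, "->", knn_predict(train_X, train_y, query, k=4, distance=dist))
```

With an even K such as 4, a two-class vote can tie, so any real implementation has to fix a tie-breaking rule; the one used here is only one possible choice.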
The researcher suggests increasing the size of the data set, since more data may improve the accuracy of the models. In addition, the researcher recommends trying other artificial intelligence techniques, especially neural networks, to check for further improvements.
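The per-class figures reported in the summary can be cross-checked with the standard definition of the F1-score as the harmonic mean of precision and recall. The short sketch below does this for the logistic regression results; the function name `f1_score` and the dictionary layout are illustrative choices, and only the percentages are taken from the summary.

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for logistic regression, as stated in the summary
reported = {
    "thalassemia intermedia": (0.96, 0.98, 0.97),
    "thalassemia major": (0.97, 0.95, 0.96),
}

for label, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{label}: computed F1 = {f1:.4f}, reported F1 = {f1_reported:.2f}")
```

Rounded to two decimals, the computed values agree with the reported F1-scores of 97% and 96%.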
ISSN: 1680-855X, 2664-2956