Effective Techniques for Handling Missing Values in Thyroid Disease Diagnosis: A Comparative Analysis

Bibliographic Details
Main Authors: Dhekre Saber Saleh, Mohd Shahizan Othman, Wshyar Omar Khudhur, Eman Attallah Aljabarti
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Applied Computational Intelligence and Soft Computing
Online Access: http://dx.doi.org/10.1155/acis/2766701
Description
Summary: Handling missing values presents a critical challenge in thyroid disease prediction, significantly impacting diagnostic accuracy. This study evaluates the effectiveness of cold-deck, mean, and K-nearest neighbor (KNN) imputation techniques for predicting thyroid disease using a dataset of 9172 observations with 31 clinical features (5.2% missing values). Feature importance analysis identified thyroid-stimulating hormone (TSH), total thyroxine (TT4), and free thyroxine index (FTI) as consistently significant biomarkers across all imputation methods. Five classifiers were assessed on the imputed datasets: Naïve Bayes, linear regression, support vector machines (SVM), LightGBM, and recurrent neural networks (RNN). Performance was evaluated using accuracy, F1 score, and recall. KNN imputation raised LightGBM's accuracy by 0.07 percentage points over mean imputation (99.06% vs. 98.99%) and by 0.47 percentage points over cold-deck imputation (99.06% vs. 98.59%), demonstrating its superiority in preserving feature relationships and enhancing predictive power. LightGBM achieved the highest performance with KNN imputation (accuracy: 99.06%, F1: 97.57%, recall: 97.83%), outperforming the other classifiers by 2.5%–4.0% in accuracy. These results underscore the need for robust imputation techniques in reliable thyroid disease prediction. The study provides a reproducible framework for managing missing data in healthcare analytics, emphasizing the interplay between imputation, feature importance, and classifier selection in optimizing diagnostic accuracy.
ISSN: 1687-9732
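The summary names three imputation strategies without describing them. Below is a minimal, standard-library Python sketch of how each one typically works, assuming a small in-memory table in which None marks a missing value. It is not the authors' implementation, and the helper names (distance, mean_impute, knn_impute, cold_deck_impute) and the toy data are hypothetical. Mean imputation fills a gap with the column mean, KNN imputation averages the value over the k most similar records in the same dataset, and cold-deck imputation copies the value from the most similar record in a separate, external reference set. The distance is a simplified Euclidean distance over the features observed in both records, with no feature scaling.

```python
from math import sqrt
from statistics import mean
from typing import List, Optional

# A record is a list of feature values; None marks a missing entry.
Record = List[Optional[float]]


def distance(a: Record, b: Record) -> float:
    """Euclidean distance over the features observed in both records.

    Simplification (assumption): features missing in either record are
    skipped and no scaling is applied, so distances computed over different
    numbers of shared features are compared as-is.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared))


def mean_impute(rows: List[Record]) -> List[List[float]]:
    """Mean imputation: fill each gap with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows: List[Record], k: int = 3) -> List[List[float]]:
    """KNN imputation: fill each gap with the mean of that feature over the
    k nearest other records in the same dataset that observe it."""
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
                continue
            # Candidate donors: other records in which feature j is observed.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            new_row.append(mean(r[j] for r in donors[:k]))
        filled.append(new_row)
    return filled


def cold_deck_impute(rows: List[Record], reference: List[Record]) -> List[List[float]]:
    """Cold-deck imputation: copy each missing value from the closest record
    in a separate, external reference set (assumed to be complete)."""
    filled = []
    for row in rows:
        if all(v is not None for v in row):
            filled.append(list(row))
            continue
        donor = min(reference, key=lambda ref: distance(row, ref))
        filled.append([donor[j] if v is None else v for j, v in enumerate(row)])
    return filled


# Hypothetical toy data with columns (TSH, TT4, FTI); values are invented
# for illustration and are not taken from the study's dataset.
toy_rows: List[Record] = [
    [1.2, 110.0, 105.0],
    [None, 95.0, 90.0],
    [2.0, None, 100.0],
    [0.8, 120.0, None],
    [1.5, 100.0, 98.0],
]
# Hypothetical external "cold deck" of complete records from another source.
toy_reference: List[Record] = [
    [1.0, 100.0, 95.0],
    [2.5, 80.0, 85.0],
]

print(mean_impute(toy_rows))
print(knn_impute(toy_rows, k=2))
print(cold_deck_impute(toy_rows, toy_reference))
```

In practice, features such as TSH and TT4 sit on very different scales, so a real pipeline would normally standardize them before computing distances. The value of k is likewise an assumption here, not a setting reported in the summary.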
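The summary compares classifiers by accuracy, recall, and F1 score, and reports accuracy differences between imputation methods. The following standard-library Python sketch shows how these metrics follow from confusion-matrix counts and separates absolute (percentage-point) gaps from relative gains. It assumes a binary positive/negative setting because the summary does not state how F1 and recall were averaged across classes. The function names and confusion counts are hypothetical; only the three LightGBM accuracies come from the summary.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the classifier detects."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


def point_gap(a_pct: float, b_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return a_pct - b_pct


def relative_gain(a_pct: float, b_pct: float) -> float:
    """Improvement of a over b, relative to b, in percent."""
    return (a_pct - b_pct) / b_pct * 100


# Hypothetical binary confusion counts, invented for illustration only.
tp, tn, fp, fn = 45, 50, 3, 2
print(f"accuracy={accuracy(tp, tn, fp, fn):.4f} "
      f"recall={recall(tp, fn):.4f} f1={f1(tp, fp, fn):.4f}")

# LightGBM accuracies (%) as reported in the summary for KNN, mean,
# and cold-deck imputation.
knn_acc, mean_acc, cold_acc = 99.06, 98.99, 98.59
print(f"KNN vs mean: {point_gap(knn_acc, mean_acc):.2f} points, "
      f"{relative_gain(knn_acc, mean_acc):.3f}% relative")
print(f"KNN vs cold deck: {point_gap(knn_acc, cold_acc):.2f} points, "
      f"{relative_gain(knn_acc, cold_acc):.3f}% relative")
```

Applied to the reported LightGBM accuracies, the gaps are 0.07 percentage points for KNN versus mean imputation and 0.47 percentage points for KNN versus cold-deck imputation. The corresponding relative gains are roughly 0.07% and 0.48%.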