Effective Techniques for Handling Missing Values in Thyroid Disease Diagnosis: A Comparative Analysis

Bibliographic Details
Main Authors: Dhekre Saber Saleh, Mohd Shahizan Othman, Wshyar Omar Khudhur, Eman Attallah Aljabarti
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Applied Computational Intelligence and Soft Computing
Online Access: http://dx.doi.org/10.1155/acis/2766701
collection DOAJ
description Missing values are a critical challenge in thyroid disease prediction because they can significantly affect diagnostic accuracy. This study evaluates cold-deck, mean, and K-nearest neighbor (KNN) imputation for predicting thyroid disease on a dataset of 9172 observations with 31 clinical features (5.2% missing values). Feature importance analysis identified thyroid-stimulating hormone (TSH), total thyroxine (TT4), and free thyroxine index (FTI) as consistently significant biomarkers across all imputation methods. Five classifiers were assessed on the imputed datasets: Naïve Bayes, linear regression, support vector machines (SVM), LightGBM, and recurrent neural networks (RNN). Performance was evaluated through accuracy, F1 score, and recall. With KNN imputation, LightGBM’s accuracy was 0.07 percentage points higher than with mean imputation (99.06% vs. 98.99%) and 0.47 percentage points higher than with cold-deck imputation (99.06% vs. 98.59%), indicating that KNN better preserves feature relationships and improves predictive power. LightGBM achieved the highest performance with KNN imputation (accuracy: 99.06%, F1: 97.57%, and recall: 97.83%), outperforming the other classifiers by 2.5%–4.0% in accuracy. These results underscore the need for robust imputation techniques in reliable thyroid disease prediction. The study provides a reproducible framework for managing missing data in healthcare analytics, emphasizing the interplay between imputation, feature importance, and classifier selection to optimize diagnostic accuracy.
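The three imputation strategies named in the description can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the toy rows, the column order (TSH, TT4, FTI), the donor record, and every function name are assumptions made for illustration, and the distance rescaling is just one reasonable way to compare rows that share different numbers of observed columns.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # A column with no observed values cannot be filled; keep None.
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def cold_deck_impute(rows, donor):
    """Replace each None with the value from an external donor record.

    Cold-deck imputation takes fill-in values from a source outside the
    current dataset; `donor` here is a hypothetical reference record.
    """
    return [[donor[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def _distance(a, b):
    """Euclidean distance over columns observed in both rows.

    The squared sum is rescaled by (total columns / shared columns) so that
    rows sharing fewer columns are not favored. Returns inf if none are shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest rows."""
    out = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is not None:
                continue
            # Candidate neighbors: other rows where column j is observed.
            candidates = sorted(
                (_distance(r, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [val for d, val in candidates[:k] if d != math.inf]
            if nearest:
                out[i][j] = sum(nearest) / len(nearest)
    return out

# Toy data (made-up values); columns are assumed to be TSH, TT4, FTI.
rows = [
    [1.0, 100.0, 110.0],
    [1.2, None, 112.0],
    [8.0, 60.0, 55.0],
    [7.5, 62.0, None],
]
donor = [2.0, 105.0, 108.0]  # hypothetical external reference record

mean_filled = mean_impute(rows)
cold_filled = cold_deck_impute(rows, donor)
knn_filled = knn_impute(rows, k=1)
```

On this toy table, mean imputation fills the missing TT4 value with the column average regardless of the row's other measurements, while KNN with k=1 copies it from the most similar row, which is the sense in which KNN imputation can preserve relationships between features.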
id doaj-art-ea4b3e9bc4594b5faf82354a66ab6f75
institution OA Journals
issn 1687-9732
affiliation Dhekre Saber Saleh: Faculty of Computing
Mohd Shahizan Othman: Faculty of Computing
Wshyar Omar Khudhur: Department of Information Technology
Eman Attallah Aljabarti: Faculty of Computer Science & Information Technology