Effective Techniques for Handling Missing Values in Thyroid Disease Diagnosis: A Comparative Analysis
Handling missing values presents a critical challenge in thyroid disease prediction, significantly impacting diagnostic accuracy. This study evaluates the effectiveness of cold-deck, mean, and K-nearest neighbor (KNN) imputation techniques for predicting thyroid disease using a dataset of 9172 obser...
| Main Authors: | Dhekre Saber Saleh, Mohd Shahizan Othman, Wshyar Omar Khudhur, Eman Attallah Aljabarti |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2025-01-01 |
| Series: | Applied Computational Intelligence and Soft Computing |
| Online Access: | http://dx.doi.org/10.1155/acis/2766701 |
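The abstract above names three imputation strategies: cold-deck, mean, and KNN. This record does not describe how the authors implemented them, so the following is only a minimal pure-Python sketch of the general techniques. The function names, the toy rows (columns loosely standing for TSH, TT4, and FTI), the donor record, the distance rescaling, and `k=2` are all illustrative assumptions, not values from the study.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def cold_deck_impute(rows, donor_row):
    """Replace each None with the value from a fixed, external donor record."""
    return [[donor_row[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def _distance(a, b):
    """Euclidean distance over features observed in both rows.

    The squared sum is rescaled by (total features / shared features) so that
    rows sharing few features are not favoured; returns inf if none are shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest rows.

    Neighbours are always searched in the original rows, never in values that
    were themselves imputed. A cell stays None if no usable neighbour exists.
    """
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                out[i][j] = sum(v for _, v in nearest) / len(nearest)
    return out

# Hypothetical toy data, invented for illustration only.
rows = [
    [1.0, 100.0, 110.0],
    [1.2, 104.0, None],
    [8.0, 60.0, 55.0],
    [7.5, None, 50.0],
    [1.1, 98.0, 108.0],
    [7.8, 62.0, 52.0],
]
donor = [2.0, 95.0, 100.0]  # hypothetical external reference record

print(mean_impute(rows)[1][2], mean_impute(rows)[3][1])
print(cold_deck_impute(rows, donor)[1][2], cold_deck_impute(rows, donor)[3][1])
print(knn_impute(rows, k=2)[1][2], knn_impute(rows, k=2)[3][1])
```

On this toy data the KNN fill follows the two rows that resemble the incomplete record, while the mean fill uses one column-wide value for every gap. That is the sense in which a neighbour-based method can preserve relationships between features. In practice the features would normally be scaled before computing distances.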
| _version_ | 1850275738870087680 |
|---|---|
| author | Dhekre Saber Saleh, Mohd Shahizan Othman, Wshyar Omar Khudhur, Eman Attallah Aljabarti |
| author_sort | Dhekre Saber Saleh |
| collection | DOAJ |
| description | Handling missing values presents a critical challenge in thyroid disease prediction, significantly impacting diagnostic accuracy. This study evaluates the effectiveness of cold-deck, mean, and K-nearest neighbor (KNN) imputation techniques for predicting thyroid disease using a dataset of 9172 observations with 31 clinical features (5.2% missing values). Feature importance analysis identified thyroid-stimulating hormone (TSH), thyroxine (TT4), and free thyroxine index (FTI) as consistently significant biomarkers across all imputation methods. Five classifiers—Naïve Bayes, linear regression, support vector machines (SVM), LightGBM, and recurrent neural networks (RNN)—were assessed on imputed datasets, with performance evaluated through accuracy, F1 score, and recall. The KNN imputation method enhanced LightGBM’s accuracy by 0.47% over mean imputation (99.06% vs. 98.99%) and by 1.47% over cold deck (99.06% vs. 98.59%), demonstrating its superiority in preserving feature relationships and enhancing predictive power. LightGBM achieved the highest performance with KNN imputation (accuracy: 99.06%, F1: 97.57%, and recall: 97.83%), outperforming other classifiers by 2.5%–4.0% in accuracy. These results underscore the necessity of robust imputation techniques for reliable thyroid disease prediction. The study provides a reproducible framework for managing missing data in healthcare analytics, emphasizing the interplay between imputation, feature importance, and classifier selection to optimize diagnostic accuracy. |
| format | Article |
| id | doaj-art-ea4b3e9bc4594b5faf82354a66ab6f75 |
| institution | OA Journals |
| issn | 1687-9732 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Wiley |
| record_format | Article |
| series | Applied Computational Intelligence and Soft Computing |
| spelling | doaj-art-ea4b3e9bc4594b5faf82354a66ab6f75; 2025-08-20T01:50:37Z; eng; Wiley; Applied Computational Intelligence and Soft Computing; 1687-9732; 2025-01-01; 2025; 10.1155/acis/2766701; Effective Techniques for Handling Missing Values in Thyroid Disease Diagnosis: A Comparative Analysis; authors: Dhekre Saber Saleh [0], Mohd Shahizan Othman [1], Wshyar Omar Khudhur [2], Eman Attallah Aljabarti [3]; affiliations: [0] Faculty of Computing, [1] Faculty of Computing, [2] Department of Information Technology, [3] Faculty of Computer Science & Information Technology; http://dx.doi.org/10.1155/acis/2766701 |
| title | Effective Techniques for Handling Missing Values in Thyroid Disease Diagnosis: A Comparative Analysis |
| url | http://dx.doi.org/10.1155/acis/2766701 |
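The accuracy gains quoted in the description field can be recomputed from the accuracies stated alongside them. The sketch below is a plain arithmetic check using only those quoted figures; the helper names are illustrative. It yields 0.07 and 0.47 percentage points, whereas the description quotes 0.47% and 1.47%, and this record alone does not show which figures are intended.

```python
# LightGBM accuracies (%) as quoted in the description field of this record.
accuracy = {"knn": 99.06, "mean": 98.99, "cold_deck": 98.59}

def gain_points(a, b):
    """Absolute difference in percentage points, rounded to two decimals."""
    return round(a - b, 2)

def gain_relative(a, b):
    """Relative improvement of a over b, in percent, rounded to two decimals."""
    return round((a - b) / b * 100, 2)

for other in ("mean", "cold_deck"):
    points = gain_points(accuracy["knn"], accuracy[other])
    relative = gain_relative(accuracy["knn"], accuracy[other])
    print(f"KNN vs {other}: {points} points, {relative}% relative")
```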