A Novel Aggregated Multiple Imputation Approach for Enhanced Survival Prediction and Classification on Breast Cancer and Lung Cancer Data
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10798426/ |
| Summary: | Survival analysis estimates the time until an event occurs. It is used as a prognostic tool in healthcare, especially in cancer diagnosis. Healthcare data commonly suffer from missing values, and survival data are no exception. Data imputation is one way of handling missing data. In this paper we propose an Aggregated Multiple Imputation (AMI) technique, which imputes the missing data with three base imputation techniques: mean imputation, K-nearest neighbour (kNN) imputation, and iterative imputation. Applying the base techniques generates multiple imputed datasets, which are then combined using a weighted average. In this way AMI draws on the advantages of each method to produce imputed values that are more accurate and dependable, reducing bias and improving the precision of the imputed values. Breast cancer and lung cancer data from the Surveillance, Epidemiology, and End Results (SEER) program are used to validate the proposed technique. The results show that data imputed using the AMI approach improve the performance of various classifiers and survival prediction models in predicting the overall survival of cancer patients, compared with data imputed using a single imputation method. For breast cancer data, the highest accuracy achieved is 91% and the lowest is 76%. For lung cancer data, the highest accuracy achieved is 87% and the lowest is 72%. |
| ISSN: | 2169-3536 |
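
The weighted-average aggregation described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mean_impute`, `knn_impute`, `aggregate`), the toy data, and the weights are all assumptions made for illustration, and the record does not say how the paper chooses its weights. For brevity only two of the three base imputers are written out; an iterative imputer would simply contribute a third imputed dataset and a third weight.

```python
import math

def mean_impute(rows):
    """Fill each missing value (None) with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_impute(rows, k=2):
    """Fill each missing value with the column mean over the k nearest donor rows.

    Distance is computed only over columns observed in both rows.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                # Candidate donors: other rows that have column j observed.
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                new_row[j] = sum(val for _, val in donors) / len(donors)
        filled.append(new_row)
    return filled

def aggregate(rows, imputed_versions, weights):
    """Combine several imputed datasets: observed cells are kept as they are,
    missing cells become the weighted average of the candidate imputations."""
    total = sum(weights)
    combined = []
    for i, row in enumerate(rows):
        new_row = []
        for j, v in enumerate(row):
            if v is None:
                blended = sum(w * imp[i][j] for w, imp in zip(weights, imputed_versions))
                new_row.append(blended / total)
            else:
                new_row.append(v)
        combined.append(new_row)
    return combined

# Toy data (hypothetical): the second row is missing its second feature.
data = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [4.0, 8.0]]
versions = [mean_impute(data), knn_impute(data, k=2)]
weights = [0.25, 0.75]  # assumed weights, chosen only for illustration
result = aggregate(data, versions, weights)
print(result[1][1])  # roughly 4.33: a blend of the mean (about 5.33) and kNN (4.0) estimates
```

Because only the missing cells are blended, observed values pass through unchanged, and setting one weight to 1 and the others to 0 recovers the corresponding single imputation method.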