Addressing Label Noise in Colorectal Cancer Classification Using Cross-Entropy Loss and pLOF Methods With Stacking-Ensemble Technique
Main Authors: | Ishrat Zahan Tani, Kah Ong Michael Goh, Md Nazmul Islam, Md Tarek Aziz, S. M. Hasan Mahmud, Dip Nandi |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2025-01-01 |
Series: | Applied Computational Intelligence and Soft Computing |
Online Access: | http://dx.doi.org/10.1155/acis/6552580 |
author | Ishrat Zahan Tani, Kah Ong Michael Goh, Md Nazmul Islam, Md Tarek Aziz, S. M. Hasan Mahmud, Dip Nandi |
---|---|
collection | DOAJ |
description | Colorectal cancer is a significant global health issue, ranking as the third most common cancer and the second leading cause of cancer-related deaths worldwide. Early diagnosis of this disease is crucial for increasing survival rates and enhancing the healthcare system. Many machine learning (ML) and deep learning (DL) methods have been proposed to automate early diagnosis of this cancer. However, label noise in medical images and reliance on a single model can lead to suboptimal model performance, which could hinder the development of a sophisticated automated solution. In this paper, we address label noise in the training data and propose a stacking-ensemble model for classifying colorectal cancer, together with a trustworthy computer-aided diagnosis (CAD) system. First, a variety of filtering methods were analyzed extensively to determine the most suitable image representation, followed by data augmentation. Second, a modified and fine-tuned VGG-16 model was proposed and used as a feature extractor to obtain meaningful features from the training samples. Third, prediction uncertainty and the probabilistic local outlier factor (pLOF) were applied to the extracted features to address label noise in the training data. Fourth, we adopted random forest–based recursive feature elimination (RF-RFE), evaluating various feature combinations, to recursively select the features most influential for accurate prediction. Fifth, four base ML classifiers and a metamodel were selected to build the final stacking-ensemble model, which combines the prediction probabilities of the base models into a meta-feature set to produce trustworthy predictions. Finally, we integrated these components and deployed them in a web application to demonstrate a CAD system.
The system not only predicts the disease but also outputs the probability of each class, which improves both clarity and diagnostic insight. The proposed model was compared with various state-of-the-art ML classifiers on a publicly available dataset and achieved the highest accuracy, 92.43%. |
format | Article |
id | doaj-art-811ea39438d74094a468e19ace8108fd |
institution | Kabale University |
issn | 1687-9732 |
language | English |
publishDate | 2025-01-01 |
publisher | Wiley |
record_format | Article |
series | Applied Computational Intelligence and Soft Computing |
affiliations | Department of Computer Science & Engineering; Faculty of Information Science & Technology (FIST); Department of Computer Science & Engineering; Centre for Advanced Machine Learning and Applications (CAMLAs); Centre for Advanced Machine Learning and Applications (CAMLAs); Department of Computer Science |
title | Addressing Label Noise in Colorectal Cancer Classification Using Cross-Entropy Loss and pLOF Methods With Stacking-Ensemble Technique |
url | http://dx.doi.org/10.1155/acis/6552580 |
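
The abstract describes the label-noise step only at a high level: prediction uncertainty plus the probabilistic local outlier factor (pLOF) applied to extracted features. The following is a minimal, stdlib-only sketch of one way these two signals could be combined, and it is not the authors' implementation. It assumes the per-sample cross-entropy loss -log p(y|x) serves as the uncertainty signal, which is suggested by the title. It computes PLOF and its normalised form LoOP (Local Outlier Probabilities, Kriegel et al., 2009) within each labelled class. The function names `cross_entropy`, `loop_scores` and `flag_noisy` are hypothetical, as are the per-class scoring, the "both signals must agree" rule and every threshold (`k=3`, `lam=3.0`, `loop_thr=0.5`, `ce_thr=1.0`).

```python
import math

# Sketch only: per-class scoring, the AND rule and all thresholds are
# assumptions for illustration, not the paper's settings.


def cross_entropy(probs, label, eps=1e-12):
    """Per-sample cross-entropy loss -log p(label | x).

    A high loss means the classifier assigns little probability to the
    given label, which is one possible signal that the label is wrong."""
    return -math.log(max(probs[label], eps))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: euclidean(points[i], points[j]))
    return others[:k]


def loop_scores(points, k=3, lam=3.0):
    """PLOF and LoOP scores following Kriegel et al. (2009).

    Returns (plof, loop): plof[i] is the probabilistic local outlier factor
    of points[i]; loop[i] rescales it to an outlier probability in [0, 1]."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    neigh = [knn(points, i, k) for i in range(n)]
    # Probabilistic set distance: lam times the quadratic mean neighbour distance.
    pdist = [
        lam * math.sqrt(sum(euclidean(points[i], points[j]) ** 2 for j in nb) / k)
        for i, nb in enumerate(neigh)
    ]
    plof = []
    for i, nb in enumerate(neigh):
        # PLOF compares a point's pdist with the mean pdist of its neighbours.
        expected = sum(pdist[j] for j in nb) / k
        plof.append(pdist[i] / expected - 1.0 if expected > 0 else 0.0)
    # Normalisation factor nPLOF, computed over the whole point set.
    nplof = lam * math.sqrt(sum(p * p for p in plof) / n)
    if nplof == 0:
        return plof, [0.0] * n
    loop = [max(0.0, math.erf(p / (nplof * math.sqrt(2)))) for p in plof]
    return plof, loop


def flag_noisy(features, labels, probs, k=3, lam=3.0, loop_thr=0.5, ce_thr=1.0):
    """Hypothetical rule: flag sample i as possibly mislabelled when it is a
    local outlier within its labelled class (LoOP > loop_thr) AND the
    classifier's cross-entropy loss for that label exceeds ce_thr."""
    flagged = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        if len(idx) <= k:
            continue  # too few samples in this class to form k-NN neighbourhoods
        _, loop = loop_scores([features[i] for i in idx], k, lam)
        for pos, i in enumerate(idx):
            if loop[pos] > loop_thr and cross_entropy(probs[i], labels[i]) > ce_thr:
                flagged.append(i)
    return sorted(flagged)


if __name__ == "__main__":
    # Toy data: two tight clusters; sample 5 is labelled 0 but sits in cluster 1.
    feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
             (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.0, 4.9), (4.9, 5.0), (5.05, 5.05)]
    labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    probs = [[0.9, 0.1]] * 5 + [[0.05, 0.95]] + [[0.1, 0.9]] * 5
    print(flag_noisy(feats, labels, probs))  # → [5]
```

Scoring within each labelled class compares a sample only against others that share its label, so a sample lying inside another class's cluster stands out. The normalisation also caps the score: with n points per class, LoOP cannot exceed erf(√n / (λ√2)), which is about 0.59 for n = 6 and λ = 3. That cap is why the toy threshold is set as low as 0.5.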