Towards precision oncology: a multi-level cancer classification system integrating liquid biopsy and machine learning

Bibliographic Details
Main Authors: Amr Eledkawy, Taher Hamza, Sara El-Metwally
Format: Article
Language: English
Published: BMC 2025-04-01
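Series: BioData Mining
Subjects: Multi-cancer classification; Majority vote feature selection; Ensemble learning; Liquid biopsy; cfDNA/ctDNA; Protein biomarkers
Online Access: https://doi.org/10.1186/s13040-025-00439-8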
author Amr Eledkawy
Taher Hamza
Sara El-Metwally
collection DOAJ
description Abstract
Background: Millions of people die from cancer every year. Early cancer detection is crucial for ensuring higher survival rates, as it provides an opportunity for timely medical interventions. This paper proposes a multi-level cancer classification system that uses plasma cfDNA/ctDNA mutations and protein biomarkers to identify seven distinct cancer types: colorectal, breast, upper gastrointestinal, lung, pancreas, ovarian, and liver.
Results: The proposed system employs a multi-stage binary classification framework where each stage is customized for a specific cancer type. A majority vote feature selection process is employed by combining six feature selectors: Information Value, Chi-Square, Random Forest Feature Importance, Extra Tree Feature Importance, Recursive Feature Elimination, and L1 Regularization. Following the feature selection process, classifiers (including eXtreme Gradient Boosting, Random Forest, Extra Tree, and Quadratic Discriminant Analysis) are customized for each cancer type individually or in an ensemble soft voting setup to optimize predictive accuracy. The proposed system outperformed previously published results, achieving an AUC of 98.2% and an accuracy of 96.21%. To ensure reproducibility of the results, the trained models and the dataset used in this study are made publicly available via the GitHub repository (https://github.com/SaraEl-Metwally/Towards-Precision-Oncology).
Conclusion: The identified biomarkers enhance the interpretability of the diagnosis, facilitating more informed decision-making. The system's performance underscores its effectiveness in tissue localization, contributing to improved patient outcomes through timely medical interventions.
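The two voting steps named in the abstract, majority-vote feature selection and soft-voting ensembling, can be illustrated with a minimal sketch. This is not the authors' code: the feature names (`feat_a` and so on), the strict-majority threshold of four out of six selectors, the 0.5 decision cutoff, and the function names `majority_vote_features` and `soft_vote` are all assumptions made for illustration. The record does not state which threshold or cutoff the paper uses.

```python
from collections import Counter


def majority_vote_features(selections, min_votes=None):
    """Keep features chosen by at least `min_votes` selectors.

    `selections` maps a selector name to the features it picked.
    By default a strict majority is required (an assumption; the
    paper's actual threshold is not given in this record).
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # one vote per selector per feature
    return sorted(f for f, v in votes.items() if v >= min_votes)


def soft_vote(probabilities, cutoff=0.5):
    """Average the positive-class probabilities of several classifiers.

    Returns the mean probability and the resulting binary label.
    The 0.5 cutoff is an illustrative assumption.
    """
    mean = sum(probabilities) / len(probabilities)
    return mean, int(mean >= cutoff)


# Hypothetical picks from the six selectors listed in the abstract.
selections = {
    "information_value": ["feat_a", "feat_b", "feat_c"],
    "chi_square": ["feat_a", "feat_b"],
    "random_forest_importance": ["feat_a", "feat_c", "feat_d"],
    "extra_tree_importance": ["feat_a", "feat_b", "feat_d"],
    "recursive_feature_elimination": ["feat_a", "feat_b"],
    "l1_regularization": ["feat_b", "feat_c"],
}

selected = majority_vote_features(selections)

# Hypothetical positive-class probabilities from four classifiers
# for one sample at one binary stage.
mean_prob, label = soft_vote([0.9, 0.6, 0.4, 0.7])
```

In a multi-stage setup such as the one the abstract describes, a sketch like this would be run once per cancer type, so each binary stage gets its own selected features and its own classifier or ensemble.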
format Article
id doaj-art-4734e790466a4667937736fc24ebccc7
institution DOAJ
issn 1756-0381
language English
publishDate 2025-04-01
publisher BMC
record_format Article
series BioData Mining
spelling doaj-art-4734e790466a4667937736fc24ebccc7; 2025-08-20T03:10:17Z; eng; BMC; BioData Mining; ISSN 1756-0381; 2025-04-01; volume 18, issue 1, pages 1-44; DOI 10.1186/s13040-025-00439-8
Amr Eledkawy, Taher Hamza, Sara El-Metwally: Department of Computer Science, Faculty of Computers and Information, Mansoura University
title Towards precision oncology: a multi-level cancer classification system integrating liquid biopsy and machine learning
topic Multi-cancer classification
Majority vote feature selection
Ensemble learning
Liquid biopsy
cfDNA/ctDNA
Protein biomarkers
url https://doi.org/10.1186/s13040-025-00439-8