Towards precision oncology: a multi-level cancer classification system integrating liquid biopsy and machine learning
Abstract Background Millions of people die from cancer every year. Early cancer detection is crucial for ensuring higher survival rates, as it provides an opportunity for timely medical interventions. This paper proposes a multi-level cancer classification system that uses plasma cfDNA/ctDNA mutatio...
| Main Authors: | Amr Eledkawy, Taher Hamza, Sara El-Metwally |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC 2025-04-01 |
| Series: | BioData Mining |
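| Subjects: | Multi-cancer classification; Majority vote feature selection; Ensemble learning; Liquid biopsy; cfDNA/ctDNA; Protein biomarkers |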
| Online Access: | https://doi.org/10.1186/s13040-025-00439-8 |
| _version_ | 1849726160664002560 |
|---|---|
| author | Amr Eledkawy, Taher Hamza, Sara El-Metwally |
| collection | DOAJ |
| description | Background: Millions of people die from cancer every year. Early cancer detection is crucial for ensuring higher survival rates, as it provides an opportunity for timely medical interventions. This paper proposes a multi-level cancer classification system that uses plasma cfDNA/ctDNA mutations and protein biomarkers to identify seven distinct cancer types: colorectal, breast, upper gastrointestinal, lung, pancreas, ovarian, and liver. Results: The proposed system employs a multi-stage binary classification framework where each stage is customized for a specific cancer type. A majority vote feature selection process combines six feature selectors: Information Value, Chi-Square, Random Forest Feature Importance, Extra Tree Feature Importance, Recursive Feature Elimination, and L1 Regularization. Following the feature selection process, classifiers (including eXtreme Gradient Boosting, Random Forest, Extra Tree, and Quadratic Discriminant Analysis) are customized for each cancer type individually or in an ensemble soft voting setup to optimize predictive accuracy. The proposed system outperformed previously published results, achieving an AUC of 98.2% and an accuracy of 96.21%. To ensure reproducibility of the results, the trained models and the dataset used in this study are made publicly available via the GitHub repository (https://github.com/SaraEl-Metwally/Towards-Precision-Oncology). Conclusion: The identified biomarkers enhance the interpretability of the diagnosis, facilitating more informed decision-making. The system's performance underscores its effectiveness in tissue localization, contributing to improved patient outcomes through timely medical interventions. |
| format | Article |
| id | doaj-art-4734e790466a4667937736fc24ebccc7 |
| institution | DOAJ |
| issn | 1756-0381 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | BMC |
| record_format | Article |
| series | BioData Mining |
| title | Towards precision oncology: a multi-level cancer classification system integrating liquid biopsy and machine learning |
| topic | Multi-cancer classification; Majority vote feature selection; Ensemble learning; Liquid biopsy; cfDNA/ctDNA; Protein biomarkers |
| url | https://doi.org/10.1186/s13040-025-00439-8 |
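
The abstract in this record mentions two voting steps: a majority vote across six feature selectors, and an ensemble soft voting setup across classifiers. The sketch below illustrates the general idea of both in plain Python. It is not the authors' implementation (their models and data are in the GitHub repository linked in the description). The function names, the strict-majority threshold, the feature names, and the probability values are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote_features(selections, min_votes=None):
    """Keep the features picked by at least `min_votes` selectors.

    `selections` maps a selector name to the set of features it chose.
    Assumption: when `min_votes` is not given, a strict majority
    (more than half of the selectors) is required.
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # Each selector contributes at most one vote per feature.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def soft_vote(probabilities):
    """Average per-class probabilities over classifiers.

    `probabilities` is a list with one dict per classifier, mapping a
    class label to that classifier's predicted probability.
    Returns the label with the highest averaged probability, plus the
    averaged probabilities themselves.
    """
    n = len(probabilities)
    classes = sorted({c for p in probabilities for c in p})
    averaged = {c: sum(p.get(c, 0.0) for p in probabilities) / n for c in classes}
    best = max(classes, key=lambda c: averaged[c])
    return best, averaged


# Hypothetical selector outputs; the feature names are placeholders.
selections = {
    "information_value": {"protein_a", "protein_b", "mutation_x"},
    "chi_square": {"protein_a", "mutation_x"},
    "rf_importance": {"protein_a", "protein_b", "protein_c"},
    "extra_tree_importance": {"protein_a", "protein_c"},
    "rfe": {"protein_a", "protein_b", "mutation_x"},
    "l1": {"protein_b", "mutation_x", "protein_d"},
}
selected = majority_vote_features(selections)

# Hypothetical outputs of three classifiers for one sample in one binary stage.
label, averaged = soft_vote([
    {"cancer": 0.9, "normal": 0.1},
    {"cancer": 0.4, "normal": 0.6},
    {"cancer": 0.7, "normal": 0.3},
])

print(selected)
print(label, averaged)
```

With six selectors, the default threshold is four votes, so a feature chosen by only two or three selectors is dropped. In the soft voting example, the second classifier alone would predict "normal", but the averaged probabilities favor "cancer".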