OncoTrace‐TOO: Interpretable Machine Learning Framework for Cancer Tissue‐of‐Origin Identification Using Transcriptomic Signatures

Bibliographic Details
Main Authors: Yang Hao, Haochun Huang, Daiyun Huang, Jianwen Ruan, Xin Liu, Jianquan Zhang
Format: Article
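Language: English
Published: Wiley 2025-08-01
Series: Cancer Reports
Subjects: cancer of unknown primary; machine learning; metastasis; tissue‐of‐origin identification; transcriptomics
Online Access: https://doi.org/10.1002/cnr2.70311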
author Yang Hao
Haochun Huang
Daiyun Huang
Jianwen Ruan
Xin Liu
Jianquan Zhang
collection DOAJ
description ABSTRACT Background: Cancer of unknown primary remains a formidable diagnostic challenge because the primary tumor site cannot be pinpointed, which restricts the use of targeted therapeutics. Although machine‐learning methods that integrate transcriptomic approaches have provided valuable insights into tumor origins, they often struggle to distinguish biologically similar tumors and typically lack biological interpretability. Aims: This study aims to develop a transparent and biologically interpretable machine learning framework that accurately classifies tissue‐of‐origin across diverse cancer types, thereby facilitating clinical diagnosis. Methods: We designed OncoTrace‐TOO, a novel tissue‐of‐origin classification model based on gene expression profiles. The model uses pan‐cancer discriminative molecular features identified through one‐vs‐rest differential expression analysis and applies logistic regression as the classification algorithm. Results: OncoTrace‐TOO achieved an overall accuracy of 0.967, with perfect classification for seven cancer types (e.g., CHOL, DLBC, and LAML). The model demonstrated high predictive accuracy in both primary and metastatic cancers across TCGA and GEO validation datasets, with enhanced capability in resolving histologically related malignancies as well as classifying rare cancer subtypes. When applied to independent clinical tumor samples, the model achieved a tissue‐of‐origin (TOO) prediction accuracy of 0.857, further validating its robustness. Importantly, the framework offers biologically interpretable predictions by revealing tumor‐specific molecular signatures, thus enhancing its clinical applicability. Conclusions: OncoTrace‐TOO not only offers high predictive accuracy for tissue‐of‐origin classification, but also delivers biologically meaningful insights that support clinical decision‐making. This framework holds promise for improving diagnostic precision and guiding personalized treatment in challenging cancer cases.
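The Methods summary (one‐vs‐rest feature selection followed by logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not describe. It assumes synthetic toy data, uses a plain difference of class means as a stand-in for a real differential expression test, and trains a softmax logistic regression by batch gradient descent. All function names are hypothetical, and the cancer codes are borrowed from the abstract only as labels.

```python
import math
import random

def make_toy_data(seed=0, per_class=20, n_genes=6):
    """Synthetic expression matrix: gene i is elevated in class i (assumption)."""
    rng = random.Random(seed)
    labels = ["CHOL", "DLBC", "LAML"]  # label names only; the data are simulated
    X, y = [], []
    for idx, label in enumerate(labels):
        for _ in range(per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[idx] += 4.0  # class-specific marker gene
            X.append(row)
            y.append(label)
    return X, y

def select_markers(X, y, top_k=1):
    """One-vs-rest selection: per class, keep the top_k genes whose mean in that
    class most exceeds the mean in all other samples (stand-in for a DE test)."""
    classes = sorted(set(y))
    selected = []
    for c in classes:
        inside = [r for r, lab in zip(X, y) if lab == c]
        outside = [r for r, lab in zip(X, y) if lab != c]
        diffs = []
        for g in range(len(X[0])):
            mean_in = sum(r[g] for r in inside) / len(inside)
            mean_out = sum(r[g] for r in outside) / len(outside)
            diffs.append((mean_in - mean_out, g))
        for _, g in sorted(diffs, reverse=True)[:top_k]:
            if g not in selected:
                selected.append(g)
    return classes, selected

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_logistic(X, y, classes, epochs=300, lr=0.1):
    """Multinomial logistic regression fitted by batch gradient descent."""
    n_feat = len(X[0])
    W = [[0.0] * (n_feat + 1) for _ in classes]  # last weight is the bias
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in classes]
        for row, label in zip(X, y):
            feats = row + [1.0]
            probs = softmax([sum(w * f for w, f in zip(Wk, feats)) for Wk in W])
            for k, c in enumerate(classes):
                err = probs[k] - (1.0 if c == label else 0.0)
                for j, f in enumerate(feats):
                    grad[k][j] += err * f
        for k in range(len(classes)):
            for j in range(n_feat + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def predict(W, classes, row):
    feats = row + [1.0]
    scores = [sum(w * f for w, f in zip(Wk, feats)) for Wk in W]
    return classes[scores.index(max(scores))]

X, y = make_toy_data()
classes, markers = select_markers(X, y, top_k=1)
X_sel = [[row[g] for g in markers] for row in X]
W = fit_logistic(X_sel, y, classes)
preds = [predict(W, classes, row) for row in X_sel]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)

# Interpretability: the selected gene carrying the largest weight for each class.
top_gene = {c: markers[W[k][:-1].index(max(W[k][:-1]))] for k, c in enumerate(classes)}
print(markers, round(accuracy, 3), top_gene)
```

Because a linear model assigns one weight per selected gene and class, the largest coefficients point directly at the genes driving each prediction, which is the kind of interpretability the abstract emphasizes. Accuracy here is measured on the training data of a toy problem and says nothing about the reported 0.967 or 0.857 figures.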
format Article
id doaj-art-3d724b7c865347bb87083e443fe3cde1
institution Kabale University
issn 2573-8348
language English
publishDate 2025-08-01
publisher Wiley
record_format Article
series Cancer Reports
spelling doaj-art-3d724b7c865347bb87083e443fe3cde1 2025-08-26T06:00:41Z eng Wiley Cancer Reports 2573-8348 2025-08-01 8 8 n/a n/a 10.1002/cnr2.70311
Yang Hao (0), Haochun Huang (1), Daiyun Huang (2), Jianwen Ruan (3), Xin Liu (4), Jianquan Zhang (5)
0 Hepatobiliary and Pancreatic Surgery, Central South University Xiangya School of Medicine Affiliated Haikou Hospital, Haikou, China
1 Department of Biosciences and Bioinformatics, Xi'an Jiaotong‐Liverpool University, Suzhou, China
2 Wisdom Lake Academy of Pharmacy, Xi'an Jiaotong‐Liverpool University, Suzhou, China
3 Hepatology, Central South University Xiangya School of Medicine Affiliated Haikou Hospital, Haikou, China
4 Wisdom Lake Academy of Pharmacy, Xi'an Jiaotong‐Liverpool University, Suzhou, China
5 Hepatobiliary and Pancreatic Surgery, Central South University Xiangya School of Medicine Affiliated Haikou Hospital, Haikou, China
title OncoTrace‐TOO: Interpretable Machine Learning Framework for Cancer Tissue‐of‐Origin Identification Using Transcriptomic Signatures
topic cancer of unknown primary
machine learning
metastasis
tissue‐of‐origin identification
transcriptomics
url https://doi.org/10.1002/cnr2.70311