Transformer‐based representation learning and multiple‐instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite‐treated plasma cell‐free DNA

Early cancer diagnosis from bisulfite‐treated cell‐free DNA (cfDNA) fragments requires tedious data analytical procedures. Here, we present a deep‐learning‐based approach for early cancer interception and diagnosis (DECIDIA) that can achieve accurate cancer diagnosis exclusively from bisulfite‐treat...

Bibliographic Details
Main Authors: Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang, Qiang Zhang, Kexin Chen, Xiangchun Li
Format: Article
Language: English
Published: Wiley 2024-11-01
Series: Molecular Oncology
Subjects: cell‐free DNA, early cancer diagnosis, weakly supervised learning
Online Access: https://doi.org/10.1002/1878-0261.13745
author Jilei Liu
Hongru Shen
Yichen Yang
Meng Yang
Qiang Zhang
Kexin Chen
Xiangchun Li
collection DOAJ
description Early cancer diagnosis from bisulfite‐treated cell‐free DNA (cfDNA) fragments requires tedious data analytical procedures. Here, we present a deep‐learning‐based approach for early cancer interception and diagnosis (DECIDIA) that achieves accurate cancer diagnosis exclusively from bisulfite‐treated cfDNA sequencing fragments. DECIDIA relies on transformer‐based representation learning of DNA fragments and weakly supervised multiple‐instance learning for classification. We systematically evaluated DECIDIA for cancer diagnosis and cancer‐type prediction on a curated dataset of 5389 samples comprising colorectal cancer (CRC; n = 1574), hepatocellular carcinoma (HCC; n = 1181), lung cancer (n = 654), and non‐cancer controls (n = 1980). In 10‐fold cross‐validation on the CRC dataset, DECIDIA achieved an area under the receiver operating characteristic curve (AUROC) of 0.980 (95% CI, 0.976–0.984) in differentiating cancer patients from cancer‐free controls, outperforming benchmarked methods based on methylation intensities. Notably, DECIDIA achieved an AUROC of 0.910 (95% CI, 0.896–0.924) on the external, independent HCC test set in distinguishing HCC patients from cancer‐free controls, even though no HCC data were used in model development. For cancer‐type classification, DECIDIA achieved a micro‐average AUROC of 0.963 (95% CI, 0.960–0.966) and an overall accuracy of 82.8% (95% CI, 81.8–83.9%). In addition, we distilled four sequence signatures from the raw sequencing reads that exhibited differential patterns between cancer and control samples and among cancer types. Our approach represents a new paradigm towards eliminating the tedious data analytical procedures for liquid biopsy based on the bisulfite‐treated cfDNA methylome.
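The description names weakly supervised multiple‐instance learning (MIL) over fragment representations but does not say how fragment‐level information is aggregated into a sample‐level call. As an illustration only, the sketch below uses attention‐based pooling, one common MIL aggregation; it is not confirmed to be the method used by DECIDIA. The function name, the parameters `V`, `w`, `u`, `b`, and the random toy embeddings are all hypothetical. In practice the embeddings would come from a trained transformer encoder.

```python
import math
import random


def attention_mil_predict(fragments, V, w, u, b):
    """Score one sample (a "bag") from its fragment embeddings ("instances").

    fragments: list of d-dimensional embeddings, one per cfDNA fragment
    V: k rows of length d, w: length k  -> attention parameters (hypothetical)
    u: length d, b: scalar              -> linear classifier (hypothetical)
    Only the sample-level label would be needed for training, hence
    "weakly supervised".
    """
    # Unnormalised attention score for each fragment: w . tanh(V h)
    scores = []
    for h in fragments:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ti for wi, ti in zip(w, hidden)))

    # Softmax over fragments (shifted by the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Bag embedding: attention-weighted average of the fragment embeddings
    d = len(u)
    bag = [sum(a * h[j] for a, h in zip(attention, fragments)) for j in range(d)]

    # Sample-level probability from a logistic classifier on the bag embedding
    logit = sum(uj * zj for uj, zj in zip(u, bag)) + b
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, attention


# Toy usage with random numbers standing in for real embeddings and weights.
random.seed(0)
d, k, n_fragments = 8, 4, 50
fragments = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_fragments)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]
w = [random.gauss(0, 1) for _ in range(k)]
u = [random.gauss(0, 1) for _ in range(d)]
prob, attn = attention_mil_predict(fragments, V, w, u, 0.0)
```

The attention weights sum to one, and the prediction does not depend on the order of the fragments, which suits a sample represented as an unordered set of sequencing reads.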
format Article
id doaj-art-cc67456d0b304a818b85b00963dc4443
institution Kabale University
issn 1574-7891
1878-0261
spelling doaj-art-cc67456d0b304a818b85b00963dc4443
2024-11-08T18:26:20Z
Language: eng
Publisher: Wiley
Journal: Molecular Oncology
ISSN: 1574-7891, 1878-0261
Published: 2024-11-01
Volume 18, Issue 11, Pages 2755–2769
DOI: 10.1002/1878-0261.13745
Title: Transformer‐based representation learning and multiple‐instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite‐treated plasma cell‐free DNA
Authors and affiliations:
Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang, Xiangchun Li: Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, China
Qiang Zhang: Department of Maxillofacial and Otorhinolaryngology Oncology, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, China
Kexin Chen: Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Prevention and Control of Major Diseases in the Population, Ministry of Education, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, China
URL: https://doi.org/10.1002/1878-0261.13745
Keywords: cell‐free DNA, early cancer diagnosis, weakly supervised learning
title Transformer‐based representation learning and multiple‐instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite‐treated plasma cell‐free DNA
topic cell‐free DNA
early cancer diagnosis
weakly supervised learning
url https://doi.org/10.1002/1878-0261.13745