Transformer‐based representation learning and multiple‐instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite‐treated plasma cell‐free DNA
| Main Authors: | Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang, Qiang Zhang, Kexin Chen, Xiangchun Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2024-11-01 |
| Series: | Molecular Oncology |
| Subjects: | cell‐free DNA; early cancer diagnosis; weakly supervised learning |
| Online Access: | https://doi.org/10.1002/1878-0261.13745 |

| Field | Value |
|---|---|
| author | Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang, Qiang Zhang, Kexin Chen, Xiangchun Li |
| collection | DOAJ |
| description | Early cancer diagnosis from bisulfite‐treated cell‐free DNA (cfDNA) fragments requires tedious data‐analysis procedures. Here, we present a deep‐learning‐based approach for early cancer interception and diagnosis (DECIDIA) that achieves accurate cancer diagnosis exclusively from bisulfite‐treated cfDNA sequencing fragments. DECIDIA relies on transformer‐based representation learning of DNA fragments and weakly supervised multiple‐instance learning for classification. We systematically evaluated the performance of DECIDIA for cancer diagnosis and cancer‐type prediction on a curated dataset of 5389 samples consisting of colorectal cancer (CRC; n = 1574), hepatocellular carcinoma (HCC; n = 1181), lung cancer (n = 654), and non‐cancer controls (n = 1980). In 10‐fold cross‐validation on the CRC dataset, DECIDIA distinguished cancer patients from cancer‐free controls with an area under the receiver operating characteristic curve (AUROC) of 0.980 (95% CI, 0.976–0.984), outperforming benchmarked methods based on methylation intensities. Notably, DECIDIA achieved an AUROC of 0.910 (95% CI, 0.896–0.924) in distinguishing HCC patients from cancer‐free controls on an independent external HCC test set, even though no HCC data were used in model development. For cancer‐type classification, DECIDIA achieved a micro‐average AUROC of 0.963 (95% CI, 0.960–0.966) and an overall accuracy of 82.8% (95% CI, 81.8–83.9%). In addition, we distilled four sequence signatures from the raw sequencing reads that exhibited differential patterns between cancer and control samples and among cancer types. Our approach represents a new paradigm towards eliminating tedious data‐analysis procedures in liquid biopsy based on the bisulfite‐treated cfDNA methylome. |
| format | Article |
| id | doaj-art-cc67456d0b304a818b85b00963dc4443 |
| institution | Kabale University |
| issn | 1574-7891; 1878-0261 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Wiley |
| series | Molecular Oncology |
| spelling | doaj-art-cc67456d0b304a818b85b00963dc4443; 2024-11-08T18:26:20Z; eng; Wiley; Molecular Oncology; ISSN 1574-7891, 1878-0261; 2024-11-01; vol. 18, no. 11, pp. 2755–2769; doi:10.1002/1878-0261.13745. Affiliations: Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang and Xiangchun Li: Tianjin Cancer Institute, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, China. Qiang Zhang: Department of Maxillofacial and Otorhinolaryngology Oncology, Tianjin's Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, China. Kexin Chen: Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Prevention and Control of Major Diseases in the Population, Ministry of Education, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, China. Keywords: cell‐free DNA; early cancer diagnosis; weakly supervised learning. |
| title | Transformer‐based representation learning and multiple‐instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite‐treated plasma cell‐free DNA |
| topic | cell‐free DNA; early cancer diagnosis; weakly supervised learning |
| url | https://doi.org/10.1002/1878-0261.13745 |
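The description says DECIDIA embeds individual cfDNA fragments with a transformer and then classifies each sample by weakly supervised multiple‐instance learning (MIL): a sample is a "bag" of reads that carries one label, while the individual reads are unlabeled. The record does not specify the aggregation step, so the sketch below assumes attention‐based MIL pooling in the style of Ilse et al. (2018), with random vectors standing in for the transformer's per‐fragment embeddings. All names (`attention_mil_pool`, `bag_probability`, `V`, `w`, `beta`) are hypothetical; this is an illustration of the MIL idea, not the authors' implementation.

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    """Attention-based MIL pooling in the style of Ilse et al. (2018).

    instances: (n_fragments, d) per-fragment embeddings (one bag = one sample)
    V: (d, h) projection matrix; w: (h,) attention vector
    Returns the (d,) bag embedding and the (n_fragments,) attention weights.
    """
    scores = np.tanh(instances @ V) @ w   # one unnormalised score per fragment
    scores = scores - scores.max()        # shift for a numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum()              # softmax: weights sum to 1 over fragments
    return weights @ instances, weights   # attention-weighted mean of fragment embeddings

def bag_probability(bag_embedding, beta, b):
    """Logistic head on the pooled embedding -> P(sample is cancer)."""
    return 1.0 / (1.0 + np.exp(-(bag_embedding @ beta + b)))

# Toy usage: random vectors stand in for transformer embeddings of 500 reads (hypothetical).
rng = np.random.default_rng(0)
d, h = 16, 8
fragments = rng.normal(size=(500, d))
V = rng.normal(scale=0.1, size=(d, h))
w = rng.normal(size=h)
beta = rng.normal(size=d)

z, attn = attention_mil_pool(fragments, V, w)
p_cancer = bag_probability(z, beta, 0.0)

# Pooling is permutation-invariant: shuffling the reads leaves the bag embedding unchanged.
z_shuffled, _ = attention_mil_pool(fragments[rng.permutation(500)], V, w)
```

Only the sample‐level label is needed to train such a model; in practice `V`, `w`, `beta` and the fragment encoder would typically be fit jointly by gradient descent, and the attention weights can be inspected to see which reads contribute most to a prediction.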
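The description reports AUROCs with 95% confidence intervals, including a micro‐average AUROC for cancer‐type classification, but the record does not say how the intervals were obtained. The sketch below assumes a percentile bootstrap over samples, which is one common choice and not necessarily the authors' procedure; the function names (`auroc`, `micro_average_auroc`, `bootstrap_ci`) and the simulated scores are hypothetical.

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]   # all positive-negative pairs (O(n_pos * n_neg) memory)
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def micro_average_auroc(y_true, proba):
    """Micro-average: pool every (sample, class) one-vs-rest decision into one binary AUROC."""
    proba = np.asarray(proba, dtype=float)
    onehot = np.eye(proba.shape[1], dtype=bool)[np.asarray(y_true)]
    return auroc(onehot.ravel(), proba.ravel())

def bootstrap_ci(metric, y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) CI, resampling samples with replacement."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score, dtype=float)
    rng = np.random.default_rng(seed)
    n, stats = len(y_true), []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        if np.unique(y_true[idx]).size < 2:
            continue  # skip single-class resamples, where a binary AUROC is undefined
        stats.append(metric(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_score), (lo, hi)

# Toy usage on simulated scores: 200 controls vs 200 cases (hypothetical data).
rng = np.random.default_rng(1)
y = np.r_[np.zeros(200), np.ones(200)]
s = np.r_[rng.normal(0.0, 1.0, 200), rng.normal(2.0, 1.0, 200)]
point, (lo, hi) = bootstrap_ci(auroc, y, s, n_boot=500)
print(f"AUROC {point:.3f} (95% CI, {lo:.3f}-{hi:.3f})")
```

The pairwise formulation keeps the definition transparent but scales quadratically; for real cohorts, scikit‐learn's `roc_auc_score` (which also accepts `average="micro"` for one‐hot, multilabel‐indicator targets) is the more practical choice.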