A versatile CRISPR/Cas9 system off-target prediction tool using language model

Main Authors: Weian Du, Liang Zhao, Kaichuan Diao, Yangyang Zheng, Qianyong Yang, Zhenzhen Zhu, Xiangxing Zhu, Dongsheng Tang
Format: Article
Language: English
Published: Nature Portfolio 2025-06-01
Series: Communications Biology
Online Access: https://doi.org/10.1038/s42003-025-08275-6
collection DOAJ
description Abstract: Genome editing with the CRISPR/Cas9 system has revolutionized life and medical sciences, particularly in treating monogenic genetic diseases by enabling long-term therapeutic effects from a single intervention. However, the CRISPR/Cas9 system can tolerate mismatches and DNA/RNA bulges at target sites, leading to unintended off-target effects that pose challenges for gene-editing therapy development. Existing high-throughput detection and in silico prediction methods are often limited to specifically designed single guide RNAs (sgRNAs) and perform poorly on unseen sequences. To address these limitations, we introduce CCLMoff, a deep learning framework for off-target prediction that incorporates a pretrained RNA language model from RNAcentral. CCLMoff captures mutual sequence information between sgRNAs and target sites and is trained on a comprehensive, updated dataset. This approach enables accurate off-target identification and strong generalization across diverse NGS-based detection datasets. Model interpretation reveals the biological importance of the seed region, underscoring CCLMoff’s analytical capabilities. The development of CCLMoff lays the foundation for a comprehensive, end-to-end sgRNA design platform, enhancing both the precision and efficiency of CRISPR/Cas9-based therapeutics. CCLMoff is a versatile tool and is publicly available at github.com/duwa2/CCLMoff.
id doaj-art-94394bd412844c9ea4035b7bb00af13f
institution Kabale University
issn 2399-3642
spelling doaj-art-94394bd412844c9ea4035b7bb00af13f; 2025-08-20T03:26:47Z; eng; Nature Portfolio; Communications Biology; 2399-3642; 2025-06-01; volume 8, issue 1, pages 1-10; 10.1038/s42003-025-08275-6
Weian Du: Gene Editing Technology Center of Guangdong Province, School of Medicine, Foshan University
Liang Zhao: Shenzhen Health Development Research and Data Management Center
Kaichuan Diao: Shenzhen Center for Chronic Disease Control
Yangyang Zheng: Guangdong Homy Genetics Ltd
Qianyong Yang: Jiujiang Key Laboratory of Rare Disease Research, Jiujiang University
Zhenzhen Zhu: Shenzhen Health Development Research and Data Management Center
Xiangxing Zhu: Gene Editing Technology Center of Guangdong Province, School of Medicine, Foshan University
Dongsheng Tang: Gene Editing Technology Center of Guangdong Province, School of Medicine, Foshan University