TRFill: synergistic use of HiFi and Hi-C sequencing enables accurate assembly of tandem repeats for population-level analysis
Abstract The highly repetitive content of eukaryotic genomes, including long tandem repeats, segmental duplications, and centromeres, makes haplotype-resolved genome assembly hard. Repeat sequences introduce gaps or mis-joins in the assemblies. We introduce TRFill, a novel algorithm that can close t...
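| Main Authors: | Huaming Wen, Jinbao Yang, Xianjia Zhao, Xingbin Wang, Jiawei Lei, Yanchun Li, Wenjie Du, Dongxi Li, Yun Xu, Stefano Lonardi, Weihua Pan |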
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-07-01 |
| Series: | Genome Biology |
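| Subjects: | Genome assembly; Gap filling; Reference-guided genome assembly; Tandem repeats; Segmental duplications |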
| Online Access: | https://doi.org/10.1186/s13059-025-03685-5 |
| _version_ | 1849343856588357632 |
|---|---|
| author | Huaming Wen, Jinbao Yang, Xianjia Zhao, Xingbin Wang, Jiawei Lei, Yanchun Li, Wenjie Du, Dongxi Li, Yun Xu, Stefano Lonardi, Weihua Pan |
| author_sort | Huaming Wen |
| collection | DOAJ |
| description | Abstract The highly repetitive content of eukaryotic genomes, including long tandem repeats, segmental duplications, and centromeres, makes haplotype-resolved genome assembly hard. Repeat sequences introduce gaps or mis-joins in the assemblies. We introduce TRFill, a novel algorithm that can close the gaps in a draft chromosome-level assembly using exclusively PacBio HiFi and Hi-C data. Experimental results on human centromeres and tomato subtelomeres show that TRFill can improve the completeness and correctness of about two-thirds of the tandem repeats. We also show that the improved completeness of subtelomeric tandem repeats in the tomato pangenome enables a population-level analysis of these complex repeats. |
| format | Article |
| id | doaj-art-e8f29a89d7774cf4930cf82c97a202b6 |
| institution | Kabale University |
| issn | 1474-760X |
| language | English |
| publishDate | 2025-07-01 |
| publisher | BMC |
| record_format | Article |
| series | Genome Biology |
| spelling | doaj-art-e8f29a89d7774cf4930cf82c97a202b62025-08-20T03:42:49ZengBMCGenome Biology1474-760X2025-07-0126112710.1186/s13059-025-03685-5TRFill: synergistic use of HiFi and Hi-C sequencing enables accurate assembly of tandem repeats for population-level analysisHuaming Wen0Jinbao Yang1Xianjia Zhao2Xingbin Wang3Jiawei Lei4Yanchun Li5Wenjie Du6Dongxi Li7Yun Xu8Stefano Lonardi9Weihua Pan10School of Computer Science and Technology, University of Science and Technology of ChinaState Key Laboratory of Genome and Multi-Omics Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural SciencesState Key Laboratory of Genome and Multi-Omics Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural SciencesState Key Laboratory of Genome and Multi-Omics Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural SciencesState Key Laboratory of Genome and Multi-Omics Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural SciencesState Key Laboratory of Genome and Multi-Omics Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural SciencesState Key Laboratory of Genome and Multi-Omics 
Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural SciencesCollege of Computer Science and Technology, Taiyuan University of TechnologySchool of Computer Science and Technology, University of Science and Technology of ChinaDepartment of Computer Science and Engineering, University of CaliforniaState Key Laboratory of Genome and Multi-Omics Technologies, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural SciencesAbstract The highly repetitive content of eukaryotic genomes, including long tandem repeats, segmental duplications, and centromeres, makes haplotype-resolved genome assembly hard. Repeat sequences introduce gaps or mis-joins in the assemblies. We introduce TRFill, a novel algorithm that can close the gaps in a draft chromosome-level assembly using exclusively PacBio HiFi and Hi-C data. Experimental results on human centromeres and tomato subtelomeres show that TRFill can improve the completeness and correctness of about two-thirds of the tandem repeats. We also show that the improved completeness of subtelomeric tandem repeats in the tomato pangenome enables a population-level analysis of these complex repeats.https://doi.org/10.1186/s13059-025-03685-5Genome assemblyGap fillingReference-guided genome assemblyTandem repeatsSegmental duplications |
| title | TRFill: synergistic use of HiFi and Hi-C sequencing enables accurate assembly of tandem repeats for population-level analysis |
| topic | Genome assembly; Gap filling; Reference-guided genome assembly; Tandem repeats; Segmental duplications |
| url | https://doi.org/10.1186/s13059-025-03685-5 |