TRFill: synergistic use of HiFi and Hi-C sequencing enables accurate assembly of tandem repeats for population-level analysis

Bibliographic Details
Main Authors: Huaming Wen, Jinbao Yang, Xianjia Zhao, Xingbin Wang, Jiawei Lei, Yanchun Li, Wenjie Du, Dongxi Li, Yun Xu, Stefano Lonardi, Weihua Pan
Format: Article
Language: English
Published: BMC 2025-07-01
Series: Genome Biology
ISSN: 1474-760X
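Subjects: Genome assembly; Gap filling; Reference-guided genome assembly; Tandem repeats; Segmental duplications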
Online Access:https://doi.org/10.1186/s13059-025-03685-5

Abstract
The highly repetitive content of eukaryotic genomes, including long tandem repeats, segmental duplications, and centromeres, makes haplotype-resolved genome assembly hard. Repeat sequences introduce gaps or mis-joins in the assemblies. We introduce TRFill, a novel algorithm that can close the gaps in a draft chromosome-level assembly using exclusively PacBio HiFi and Hi-C data. Experimental results on human centromeres and tomato subtelomeres show that TRFill can improve the completeness and correctness of about two-thirds of the tandem repeats. We also show that the improved completeness of subtelomeric tandem repeats in the tomato pangenome enables a population-level analysis of these complex repeats.