REPrise: de novo interspersed repeat detection using inexact seeding

Abstract Background Interspersed repeats occupy a large part of many eukaryotic genomes, and thus their accurate annotation is essential for various genome analyses. Database-free de novo repeat detection approaches are powerful for annotating genomes that lack well-curated repeat databases. However...

Full description

Saved in:
Bibliographic Details
Main Authors: Atsushi Takeda, Daisuke Nonaka, Yuta Imazu, Tsukasa Fukunaga, Michiaki Hamada
Format: Article
Language:English
Published: BMC 2025-04-01
Series:Mobile DNA
Subjects:
Online Access:https://doi.org/10.1186/s13100-025-00353-0
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Background Interspersed repeats occupy a large part of many eukaryotic genomes, and thus their accurate annotation is essential for various genome analyses. Database-free de novo repeat detection approaches are powerful for annotating genomes that lack well-curated repeat databases. However, existing tools do not yet have sufficient repeat detection performance. Results In this study, we developed REPrise, a de novo interspersed repeat detection software program based on a seed-and-extension method. Although the algorithm of REPrise is similar to that of RepeatScout, which is currently the de facto standard tool, we incorporated three unique techniques into REPrise: inexact seeding, affine gap scoring and loose masking. Analyses of rice and simulation genome datasets showed that REPrise outperformed RepeatScout in terms of sensitivity, especially when the repeat sequences contained many mutations. Furthermore, when applied to the complete human genome dataset T2T-CHM13, REPrise demonstrated the potential to detect novel repeat sequence families. Conclusion REPrise can detect interspersed repeats with high sensitivity even in long genomes. Our software enhances repeat annotation in diverse genomic studies, contributing to a deeper understanding of genomic structures.
ISSN:1759-8753