Multi-proteins similarity-based sampling to select representative genomes from large databases

Abstract

Background: Genome sequence databases are growing exponentially, but with high redundancy and uneven data quality. For these reasons, selecting representative subsets of genomes is an essential step for almost all studies. However, most current sampling approaches are biased and unable to pr...

Bibliographic Details
Main Authors: Rémi-Vinh Coudert, Jean-Philippe Charrier, Frédéric Jauffrit, Jean-Pierre Flandrois, Céline Brochier-Armanet
Format: Article
Language: English
Published: BMC 2025-05-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-025-06095-3