Revisiting pangenome openness with k-mers
Pangenomics is the study of related genomes collectively, usually from the same species or closely related taxa. Originally, pangenomes were defined for bacterial species. After the concept was extended to eukaryotic genomes, two definitions of pangenome evolved in parallel: the gene-based approach,...
Main Authors: | Parmigiani, Luca; Wittler, Roland; Stoye, Jens |
---|---|
Format: | Article |
Language: | English |
Published: | Peer Community In 2024-04-01 |
Series: | Peer Community Journal |
Online Access: | https://peercommunityjournal.org/articles/10.24072/pcjournal.415/ |
_version_ | 1825206400085655552 |
---|---|
author | Parmigiani, Luca Wittler, Roland Stoye, Jens |
author_facet | Parmigiani, Luca Wittler, Roland Stoye, Jens |
author_sort | Parmigiani, Luca |
collection | DOAJ |
description | Pangenomics is the study of related genomes collectively, usually from the same species or closely related taxa. Originally, pangenomes were defined for bacterial species. After the concept was extended to eukaryotic genomes, two definitions of pangenome evolved in parallel: the gene-based approach, which defines the pangenome as the union of all genes, and the sequence-based approach, which defines the pangenome as the set of all nonredundant genomic sequences. Estimating the total size of the pangenome for a given species has been a subject of study since the very first mention of pangenomes. Traditionally, this is performed by predicting the ratio at which new genes are discovered, referred to as the openness of the species. Here, we abstract each genome as a set of items, which is entirely agnostic of the two approaches (gene-based, sequence-based). Genes are a viable option for items, but other possibilities are also feasible, e.g., genome sequence substrings of fixed length k (k-mers). In the present study, we investigate the use of k-mers to estimate the openness as an alternative to genes, and compare the results. An efficient implementation is also provided. |
format | Article |
id | doaj-art-68df90d3a0f645e78c7da3d68f0f288a |
institution | Kabale University |
issn | 2804-3871 |
language | English |
publishDate | 2024-04-01 |
publisher | Peer Community In |
record_format | Article |
series | Peer Community Journal |
spelling | doaj-art-68df90d3a0f645e78c7da3d68f0f288a // 2025-02-07T10:17:18Z // eng // Peer Community In // Peer Community Journal // 2804-3871 // 2024-04-01 // 4 // 10.24072/pcjournal.415 // Revisiting pangenome openness with k-mers // Parmigiani, Luca [0] https://orcid.org/0000-0002-2139-3259 // Wittler, Roland [1] https://orcid.org/0000-0002-2249-9880 // Stoye, Jens [2] https://orcid.org/0000-0002-4656-7155 // [0] Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University – Bielefeld, Germany; Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University – Bielefeld, Germany; Graduate School “Digital Infrastructure for the Life Sciences” (DILS), Bielefeld University – Bielefeld, Germany // [1] Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University – Bielefeld, Germany; Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University – Bielefeld, Germany // [2] Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University – Bielefeld, Germany; Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University – Bielefeld, Germany // https://peercommunityjournal.org/articles/10.24072/pcjournal.415/ |
spellingShingle | Parmigiani, Luca Wittler, Roland Stoye, Jens Revisiting pangenome openness with k-mers Peer Community Journal |
title | Revisiting pangenome openness with k-mers |
title_full | Revisiting pangenome openness with k-mers |
title_fullStr | Revisiting pangenome openness with k-mers |
title_full_unstemmed | Revisiting pangenome openness with k-mers |
title_short | Revisiting pangenome openness with k-mers |
title_sort | revisiting pangenome openness with k mers |
url | https://peercommunityjournal.org/articles/10.24072/pcjournal.415/ |
work_keys_str_mv | AT parmigianiluca revisitingpangenomeopennesswithkmers AT wittlerroland revisitingpangenomeopennesswithkmers AT stoyejens revisitingpangenomeopennesswithkmers |
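
The description in this record treats each genome as a set of items (genes or k-mers) and frames openness in terms of how quickly new items appear as genomes are added. The following is a minimal sketch of that idea, not the authors' implementation. The function names `kmers`, `growth_curve`, and `fit_alpha` are hypothetical, the toy sequences are invented, and the power-law (Heaps'-law style) fit in log-log space is a commonly used approach that is assumed here rather than confirmed by this record. The sketch also ignores reverse complements, which a real tool would likely handle.

```python
import math
import random


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (no canonicalisation)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def growth_curve(item_sets, n_orders=100, seed=0):
    """Average pangenome size after adding 1..n genomes, over random orders."""
    rng = random.Random(seed)
    n = len(item_sets)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(n_orders):
        rng.shuffle(order)
        seen = set()
        for pos, idx in enumerate(order):
            seen |= item_sets[idx]      # union with the next genome's items
            totals[pos] += len(seen)    # pangenome size after pos + 1 genomes
    return [t / n_orders for t in totals]


def fit_alpha(curve):
    """Fit new(m) ~ K * m**(-alpha) by least squares in log-log space.

    new(m) is the number of items gained when going from m - 1 to m genomes.
    Assumed convention: alpha <= 1 suggests an open pangenome, alpha > 1 a
    closed one.
    """
    pts = []
    for m in range(2, len(curve) + 1):
        gained = curve[m - 1] - curve[m - 2]
        if gained > 0:                  # log is undefined for zero gain
            pts.append((math.log(m), math.log(gained)))
    if len(pts) < 2:
        raise ValueError("need at least two positive increments to fit")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    return -sxy / sxx                   # slope of the fit is -alpha


if __name__ == "__main__":
    # Invented toy "genomes"; real input would be assembled sequences.
    genomes = ["ACGTACGTGA", "ACGTTCGTGA", "ACGAACGTGC", "TCGTACGTGA"]
    sets = [kmers(g, 4) for g in genomes]
    curve = growth_curve(sets)
    print("growth curve:", curve)
    print("alpha:", fit_alpha(curve))
```

Averaging over random genome orders removes the dependence of the curve on the order in which genomes happen to be added. Swapping `kmers` for a function that returns gene identifiers would give the gene-based variant of the same calculation.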