MIMt: a curated 16S rRNA reference database with less redundancy and higher accuracy at species-level identification
| Main Authors: | M. Pilar Cabezas, Nuno A. Fonseca, Antonio Muñoz-Mérida |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2024-11-01 |
| Series: | Environmental Microbiome |
| Online Access: | https://doi.org/10.1186/s40793-024-00634-w |
| author | M. Pilar Cabezas, Nuno A. Fonseca, Antonio Muñoz-Mérida |
|---|---|
| collection | DOAJ |
| description | Motivation: Accurate determination and quantification of the taxonomic composition of microbial communities, especially at the species level, is one of the major challenges in metagenomics. This is primarily due to the limitations of commonly used 16S rRNA reference databases, which contain either a high degree of redundancy or a high percentage of sequences with missing taxonomic information. This can lead to erroneous identifications and, thus, to inaccurate conclusions about the ecological role and importance of those microorganisms in the ecosystem. Results: The current study presents MIMt, a new 16S rRNA database for identifying archaea and bacteria, encompassing 47 001 sequences, all precisely identified at the species level. In addition, a MIMt2.0 version was created containing only curated sequences from RefSeq Targeted Loci, with 32 086 sequences. MIMt aims to be updated twice a year to include all newly sequenced species. We evaluated MIMt against Greengenes, RDP, GTDB and SILVA in terms of sequence distribution and taxonomic assignment accuracy. Our results showed that MIMt contains less redundancy and, despite being 20 to 500 times smaller than existing databases, outperforms them in completeness and taxonomic accuracy, enabling more precise assignments at lower taxonomic ranks and thus significantly improving species-level identification. |
| format | Article |
| id | doaj-art-074e1e0fa324419bac7ca348fbf5e036 |
| institution | DOAJ |
| issn | 2524-6372 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | BMC |
| series | Environmental Microbiome |
| affiliations | M. Pilar Cabezas: Centre of Molecular and Environmental Biology (CBMA), Department of Biology, University of Minho; Nuno A. Fonseca: CIBIO-InBIO, Research Center in Biodiversity and Genetic Resources; Antonio Muñoz-Mérida: CIBIO-InBIO, Research Center in Biodiversity and Genetic Resources |
| title | MIMt: a curated 16S rRNA reference database with less redundancy and higher accuracy at species-level identification |
| url | https://doi.org/10.1186/s40793-024-00634-w |
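The abstract turns on two measurable properties of a reference database: how many entries lack a species-level label, and how often taxonomic assignments are correct at each rank. The following is a minimal Python sketch of both measures. It is not the MIMt authors' evaluation pipeline. It assumes seven-rank, semicolon-delimited lineage strings with Greengenes/GTDB-style prefixes (for example `s__`), treats empty or placeholder labels ("unclassified", "uncultured", "unknown") as missing, and uses hypothetical function names.

```python
from typing import Dict, List

# Assumed rank order for a seven-rank lineage string (domain -> species).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage: str) -> List[str]:
    """Split a ';'-delimited lineage and strip prefixes such as 'g__' or 's__'."""
    labels = []
    for field in lineage.split(";"):
        field = field.strip()
        # Drop a one-letter rank prefix followed by '__' if present (assumed format).
        if len(field) >= 3 and field[1:3] == "__":
            field = field[3:]
        labels.append(field)
    # Pad short lineages so every rank has an entry (empty string = unassigned).
    labels += [""] * (len(RANKS) - len(labels))
    return labels[: len(RANKS)]


def is_missing(label: str) -> bool:
    """Treat empty or placeholder labels as missing taxonomy (assumed conventions)."""
    return label == "" or label.lower() in {"unclassified", "uncultured", "unknown"}


def missing_species_fraction(reference: Dict[str, str]) -> float:
    """Fraction of entries (sequence id -> lineage) with no species-level label."""
    if not reference:
        return 0.0
    missing = sum(is_missing(parse_lineage(lin)[-1]) for lin in reference.values())
    return missing / len(reference)


def rank_accuracy(truth: Dict[str, str], predicted: Dict[str, str]) -> Dict[str, float]:
    """Per-rank share of queries whose predicted label matches the true label.

    A query is correct at a rank only if the prediction there is non-missing and
    equal to the true label; queries absent from `predicted` count as wrong.
    """
    correct = {rank: 0 for rank in RANKS}
    for query_id, true_lineage in truth.items():
        true_labels = parse_lineage(true_lineage)
        pred_labels = parse_lineage(predicted.get(query_id, ""))
        for rank, t, p in zip(RANKS, true_labels, pred_labels):
            if not is_missing(p) and p == t:
                correct[rank] += 1
    n = len(truth) or 1  # avoid division by zero for an empty query set
    return {rank: correct[rank] / n for rank in RANKS}
```

For example, a hit that assigns the correct genus but leaves the species field empty (`s__`) counts toward genus-level accuracy but not species-level accuracy. This is the kind of gap the abstract attributes to reference databases with incomplete taxonomy.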