MIMt: a curated 16S rRNA reference database with less redundancy and higher accuracy at species-level identification

Bibliographic Details
Main Authors: M. Pilar Cabezas, Nuno A. Fonseca, Antonio Muñoz-Mérida
Format: Article
Language: English
Published: BMC 2024-11-01
Series: Environmental Microbiome
Online Access: https://doi.org/10.1186/s40793-024-00634-w
collection DOAJ
description Abstract. Motivation: Accurate determination and quantification of the taxonomic composition of microbial communities, especially at the species level, is one of the major challenges in metagenomics. This is primarily due to the limitations of commonly used 16S rRNA reference databases, which contain either substantial redundancy or a high percentage of sequences with missing taxonomic information. These limitations may lead to erroneous identifications and, thus, to inaccurate conclusions regarding the ecological role and importance of those microorganisms in the ecosystem. Results: The current study presents MIMt, a new 16S rRNA database for the identification of archaea and bacteria, encompassing 47 001 sequences, all precisely identified at species level. In addition, a MIMt2.0 version containing 32 086 sequences was created using only curated sequences from RefSeq Targeted Loci. MIMt is intended to be updated twice a year to include all newly sequenced species. We evaluated MIMt against Greengenes, RDP, GTDB and SILVA in terms of sequence distribution and taxonomic assignment accuracy. Our results showed that MIMt contains less redundancy and, despite being 20 to 500 times smaller than existing databases, outperforms them in completeness and taxonomic accuracy, enabling more precise assignments at lower taxonomic ranks and thus significantly improving species-level identification.
format Article
id doaj-art-074e1e0fa324419bac7ca348fbf5e036
institution DOAJ
issn 2524-6372
language English
publishDate 2024-11-01
publisher BMC
record_format Article
series Environmental Microbiome
spelling M. Pilar Cabezas (Centre of Molecular and Environmental Biology (CBMA), Department of Biology, University of Minho); Nuno A. Fonseca (CIBIO-InBIO, Research Center in Biodiversity and Genetic Resources); Antonio Muñoz-Mérida (CIBIO-InBIO, Research Center in Biodiversity and Genetic Resources). Environmental Microbiome (BMC, ISSN 2524-6372), 2024-11-01, vol. 19, no. 1, pp. 1-13. https://doi.org/10.1186/s40793-024-00634-w