Investigating fungal diversity through metabarcoding for environmental samples: assessment of ITS1 and ITS2 Illumina sequencing using multiple defined mock communities with different classification methods and reference databases

Bibliographic Details
Main Authors: Raf Winand, Elizabet D’hooge, Alexander Van Uffelen, Bert Bogaerts, Julien Van Braekel, Stefan Hoffman, Nancy H. C. J. Roosens, Pierre Becker, Sigrid C. J. De Keersmaecker, Kevin Vanneste
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Genomics
Online Access: https://doi.org/10.1186/s12864-025-11917-y
Description
Summary: Abstract. An important challenge in taxonomic classification of environmental samples is capturing the real diversity by identifying all species present in a sample. Metabarcoding approaches are often employed to identify species in complex samples. The internal transcribed spacer (ITS) region is the official, widely adopted barcode for identifying fungal species. Metabarcoding can be done in many different ways, with multiple choices at different steps of the workflow. We present a comparative evaluation of how the sequenced region (ITS1 and/or ITS2), the reference database (UNITE versus BCCM/IHEM), the bioinformatics software package (BLAST versus mothur), and the considered taxonomic level (species versus genus level) affect how accurately diversity is captured, using 37 fungal defined mock communities (DMCs). The DMCs cover a broad range of fungal diversity, including 42 Ascomycota species (26 genera), 4 Basidiomycota species (4 genera), and 5 Mucoromycota species (5 genera), all commonly found in indoor environments in Western Europe. Classification performance was first evaluated using ITS1 and ITS2 sequences of all species in the DMCs, generated by Sanger sequencing, to evaluate the discriminatory power of ITS and set a baseline for subsequent comparison with Illumina sequencing. Classification performance varied with all considered variables (sequencing technology, taxonomic level, ITS region, software, database), with 56–100% of species correctly assigned. Sanger sequencing showed that neither ITS1 nor ITS2 resulted in optimal performance, because both regions have low discriminatory power within certain genera. Compared to Sanger sequencing, Illumina sequencing generally resulted in lower precision but comparable recall. Classification performance was generally good at genus level but not at species level, although intermediate taxonomic levels could present adequate alternatives.
ITS2 typically yielded slightly better precision than ITS1, with comparable recall. The employed reference database had a marked effect, with BCCM/IHEM performing better than UNITE due to the difference in the number of sequences in each database. BLAST resulted in better performance but required expert curation, whereas mothur performed better when using an automated workflow. Estimating species abundances from Illumina sequencing read counts generally performed poorly, although read abundance filtering could increase precision for ITS1 but not for ITS2. Each approach comes with its own advantages and inconveniences and should be carefully selected based on the objectives of the analysis. Our results highlight the power of metabarcoding using Illumina sequencing for investigating fungal diversity in complex samples and can guide scientists in selecting the most appropriate setup for their own purposes.
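The precision and recall figures discussed in the summary can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy mock community, the read counts, and the 1% abundance threshold are all hypothetical and chosen only to show how classification against a defined mock community, read abundance filtering, and a switch from species to genus level interact.

```python
def filter_by_abundance(read_counts, min_fraction):
    """Keep only taxa whose share of all reads is at least min_fraction."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {taxon: n for taxon, n in read_counts.items() if n / total >= min_fraction}


def precision_recall(expected, detected):
    """Precision = TP / detected taxa, recall = TP / expected taxa."""
    expected, detected = set(expected), set(detected)
    true_positives = len(expected & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


def to_genus(species_names):
    """Collapse binomial species names to genus (the first word)."""
    return {name.split()[0] for name in species_names}


# Hypothetical mock community composition (illustrative, not from the study).
mock_community = {
    "Aspergillus fumigatus",
    "Aspergillus niger",
    "Penicillium chrysogenum",
    "Cladosporium herbarum",
}

# Hypothetical classified read counts for one sequenced sample.
classified_reads = {
    "Aspergillus fumigatus": 500,
    "Aspergillus niger": 300,
    "Penicillium rubens": 150,      # right genus, wrong species
    "Cladosporium herbarum": 45,
    "Alternaria alternata": 5,      # low-abundance false positive
}

# Species level, no filtering.
raw = precision_recall(mock_community, classified_reads)

# Species level after dropping taxa below an assumed 1% of reads.
filtered_reads = filter_by_abundance(classified_reads, min_fraction=0.01)
filtered = precision_recall(mock_community, filtered_reads)

# Genus level on the filtered result.
genus = precision_recall(to_genus(mock_community), to_genus(filtered_reads))

for label, (p, r) in [("species, raw", raw), ("species, filtered", filtered), ("genus, filtered", genus)]:
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

In this toy case, filtering removes a low-abundance false positive and raises precision without changing recall, and moving to genus level resolves the within-genus misassignment. Real samples will behave differently, which is the kind of variation the study quantifies.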
ISSN: 1471-2164