MeStanG—Resource for High-Throughput Sequencing Standard Data Sets Generation for Bioinformatic Methods Evaluation and Validation

Metagenomics analysis has enabled the measurement of the microbiome diversity in environmental samples without prior targeted enrichment. Functional and phylogenetic studies based on microbial diversity retrieved using HTS platforms have advanced from detecting known organisms and discovering unknow...

Bibliographic Details
Main Authors: Daniel Ramos Lopez, Francisco J. Flores, Andres S. Espindola
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Biology
Subjects: bioinformatics, metagenomics, high-throughput sequencing
Online Access: https://www.mdpi.com/2079-7737/14/1/69
author Daniel Ramos Lopez
Francisco J. Flores
Andres S. Espindola
collection DOAJ
description Metagenomics analysis has enabled the measurement of microbiome diversity in environmental samples without prior targeted enrichment. Functional and phylogenetic studies based on microbial diversity retrieved using HTS platforms have advanced from detecting known organisms and discovering unknown species to applications in disease diagnostics. Robust validation processes are essential for test reliability, requiring standard samples and databases derived from real samples and in silico generated artificial controls. We propose MeStanG as a resource for generating HTS Nanopore data sets to evaluate present and emerging bioinformatics pipelines. MeStanG allows samples to be designed with user-defined organism abundances expressed as number of reads, reference sequences, and predetermined or custom sequencing error profiles. The simulator pipeline was evaluated by analyzing its output mock metagenomic samples, which contain known read abundances, using read mapping, genome assembly, and taxonomic classification in three scenarios: a bacterial community composed of nine different organisms, samples resembling pathogen-infected wheat plants, and a viral pathogen serial dilution sampling. The evaluation consistently reported the same organisms and read abundances as specified in the mock metagenomic sample design. Based on this performance and its novel capacity to generate an exact number of reads, MeStanG can be used by scientists to develop mock metagenomic samples (artificial HTS data sets) to assess the diagnostic performance metrics of bioinformatic pipelines, allowing the user to choose predetermined or customized models for research and training.
format Article
id doaj-art-1d996b17285c44fab37d2ed9087c5689
institution Kabale University
issn 2079-7737
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Biology
spelling doaj-art-1d996b17285c44fab37d2ed9087c5689
2025-01-24T13:23:30Z
eng
MDPI AG
Biology
2079-7737
2025-01-01
14(1), 69
10.3390/biology14010069
MeStanG—Resource for High-Throughput Sequencing Standard Data Sets Generation for Bioinformatic Methods Evaluation and Validation
Daniel Ramos Lopez (Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK 74078, USA)
Francisco J. Flores (Departamento de Ciencias de la Vida y la Agricultura, Universidad de las Fuerzas Armadas-ESPE, Sangolquí 171103, Ecuador)
Andres S. Espindola (Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK 74078, USA)
https://www.mdpi.com/2079-7737/14/1/69
bioinformatics
metagenomics
high-throughput sequencing
title MeStanG—Resource for High-Throughput Sequencing Standard Data Sets Generation for Bioinformatic Methods Evaluation and Validation
topic bioinformatics
metagenomics
high-throughput sequencing
url https://www.mdpi.com/2079-7737/14/1/69