MeStanG—Resource for High-Throughput Sequencing Standard Data Sets Generation for Bioinformatic Methods Evaluation and Validation
Metagenomics analysis has enabled the measurement of the microbiome diversity in environmental samples without prior targeted enrichment. Functional and phylogenetic studies based on microbial diversity retrieved using HTS platforms have advanced from detecting known organisms and discovering unknow...
Main Authors: | Daniel Ramos Lopez, Francisco J. Flores, Andres S. Espindola |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Biology |
Subjects: | bioinformatics, metagenomics, high-throughput sequencing |
Online Access: | https://www.mdpi.com/2079-7737/14/1/69 |
_version_ | 1832589028455612416 |
---|---|
author | Daniel Ramos Lopez Francisco J. Flores Andres S. Espindola |
collection | DOAJ |
description | Metagenomic analysis has enabled the measurement of microbiome diversity in environmental samples without prior targeted enrichment. Functional and phylogenetic studies based on microbial diversity retrieved using high-throughput sequencing (HTS) platforms have advanced from detecting known organisms and discovering unknown species to applications in disease diagnostics. Robust validation processes are essential for test reliability and require standard samples and databases derived from real samples and from in silico generated artificial controls. We propose MeStanG as a resource for generating HTS Nanopore data sets to evaluate present and emerging bioinformatics pipelines. MeStanG allows samples to be designed with user-defined organism abundances expressed as numbers of reads, reference sequences, and predetermined or custom sequencing error profiles. The simulator pipeline was evaluated by analyzing its output, mock metagenomic samples containing known read abundances, using read mapping, genome assembly, and taxonomic classification in three scenarios: a bacterial community composed of nine different organisms, samples resembling pathogen-infected wheat plants, and a serial dilution of a viral pathogen. The evaluation consistently reported the same organisms and read abundances as specified in the mock metagenomic sample design. Based on this performance and its novel capacity to generate an exact number of reads, MeStanG can be used by scientists to develop mock metagenomic samples (artificial HTS data sets) to assess the diagnostic performance metrics of bioinformatic pipelines, allowing the user to choose predetermined or customized models for research and training. |
format | Article |
id | doaj-art-1d996b17285c44fab37d2ed9087c5689 |
institution | Kabale University |
issn | 2079-7737 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Biology |
spelling | doaj-art-1d996b17285c44fab37d2ed9087c5689; 2025-01-24T13:23:30Z; eng; MDPI AG; Biology; ISSN 2079-7737; 2025-01-01; volume 14, issue 1, article 69; DOI 10.3390/biology14010069; Daniel Ramos Lopez (Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK 74078, USA); Francisco J. Flores (Departamento de Ciencias de la Vida y la Agricultura, Universidad de las Fuerzas Armadas-ESPE, Sangolquí 171103, Ecuador); Andres S. Espindola (Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK 74078, USA) |
title | MeStanG—Resource for High-Throughput Sequencing Standard Data Sets Generation for Bioinformatic Methods Evaluation and Validation |
topic | bioinformatics metagenomics high-throughput sequencing |
url | https://www.mdpi.com/2079-7737/14/1/69 |
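The description above says the simulator was evaluated by checking that downstream tools reported the same organisms and read abundances as the mock sample design. A minimal sketch of that kind of comparison is given here, assuming the design is available as a mapping from organism name to exact read count and the pipeline output as one organism label per classified read. The function names, the organism names, and the data layout are hypothetical illustrations; they are not part of MeStanG or of any specific classifier.

```python
from collections import Counter


def compare_to_design(design, assigned_labels):
    """Compare per-organism read counts from a pipeline against a mock design.

    design: dict mapping organism name -> exact number of simulated reads
            (hypothetical layout, not MeStanG's own file format).
    assigned_labels: iterable with one organism label per classified read.
    """
    observed = Counter(assigned_labels)
    report = {}
    # Include organisms that appear only in the design or only in the output.
    for organism in sorted(set(design) | set(observed)):
        expected = design.get(organism, 0)
        found = observed.get(organism, 0)
        report[organism] = {
            "expected": expected,
            "observed": found,
            "difference": found - expected,
        }
    return report


def detection_metrics(design, assigned_labels):
    """Organism-level sensitivity and precision (presence/absence only)."""
    expected_present = {name for name, count in design.items() if count > 0}
    observed_present = set(assigned_labels)
    true_pos = len(expected_present & observed_present)
    false_pos = len(observed_present - expected_present)
    false_neg = len(expected_present - observed_present)
    sensitivity = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "sensitivity": sensitivity,
        "precision": precision,
    }


if __name__ == "__main__":
    # Made-up design and pipeline output, for illustration only.
    design = {"organism_A": 500, "organism_B": 50, "organism_C": 5}
    labels = ["organism_A"] * 500 + ["organism_B"] * 48 + ["organism_D"] * 2
    for name, row in compare_to_design(design, labels).items():
        print(name, row)
    print(detection_metrics(design, labels))
```

Because the design states exact read counts, the per-organism difference is a direct measure of misassigned or lost reads, and the presence/absence metrics follow from the same two inputs.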