Fine-tuning GBS data with comparison of reference and mock genome approaches for advancing genomic selection in less studied farmed species

Bibliographic Details
Main Authors: Daniel Fischer, Miika Tapio, Oliver Bitz, Terhi Iso-Touru, Antti Kause, Ilma Tapio
Format: Article
Language: English
Published: BMC 2025-02-01
Series: BMC Genomics
Subjects: Genotyping by sequencing; Snakemake; Variant calling; Cattle; Aquaculture; Repeatability
Online Access: https://doi.org/10.1186/s12864-025-11296-4
_version_ 1823863321800998912
author Daniel Fischer
Miika Tapio
Oliver Bitz
Terhi Iso-Touru
Antti Kause
Ilma Tapio
collection DOAJ
description Background: Diversifying animal cultivation requires efficient genotyping to enable genomic selection, but non-model species lack efficient genotyping solutions. The aim of this study was to optimize a genotyping-by-sequencing (GBS) double-digest RAD-sequencing (ddRAD) pipeline. Bovine data were used to automate the bioinformatic analysis, and the optimized pipeline was then demonstrated on data from a non-model species, European whitefish. Results: ddRAD data generation was designed for reliable estimation of relatedness and scales to up to 384 samples. GBS sequencing yielded approximately one million reads for each of the roughly 100 assessed samples. A comparison of strategies for creating a de novo reference genome for variant calling (mock reference) showed that building it from three samples outperformed building it from a single sample or from a very large number of samples. Adjustments to most pipeline tuning parameters had limited impact on high-quality data; the exception was the identity criterion for merging mock reference genome clusters. For each species, over 15,000 GBS variants were obtained with the mock reference, and results were comparable to those for variants called against an existing reference genome. Repeatability analysis showed high concordance between replicates, particularly in the bovine data, whereas repeatability in the European whitefish data did not exceed earlier observations. Conclusions: The proposed cost-effective ddRAD strategy, coupled with an efficient bioinformatics workflow, enables broad adoption of ddRAD GBS across diverse farmed species. A reference genome is beneficial but not obligatory. The integration of Snakemake streamlines use of the pipeline on computer clusters and supports customization. This user-friendly solution facilitates genotyping for both model and non-model species.
format Article
id doaj-art-f5a88a88d00441c58604ab9f7fd7a3c1
institution Kabale University
issn 1471-2164
language English
publishDate 2025-02-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj-art-f5a88a88d00441c58604ab9f7fd7a3c1; 2025-02-09T12:13:48Z; eng; BMC; BMC Genomics; 1471-2164; 2025-02-01; 26; 1; 1; 17; 10.1186/s12864-025-11296-4
Fine-tuning GBS data with comparison of reference and mock genome approaches for advancing genomic selection in less studied farmed species
0 Daniel Fischer: Applied Statistical Methods, Natural Resources, Natural Resources Institute Finland (Luke)
1 Miika Tapio: Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke)
2 Oliver Bitz: Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke)
3 Terhi Iso-Touru: Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke)
4 Antti Kause: Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke)
5 Ilma Tapio: Genomics and Breeding, Production Systems, Natural Resources Institute Finland (Luke)
https://doi.org/10.1186/s12864-025-11296-4
Genotyping by sequencing; Snakemake; Variant calling; Cattle; Aquaculture; Repeatability
title Fine-tuning GBS data with comparison of reference and mock genome approaches for advancing genomic selection in less studied farmed species
topic Genotyping by sequencing
Snakemake
Variant calling
Cattle
Aquaculture
Repeatability
url https://doi.org/10.1186/s12864-025-11296-4