Enhancing genomic prediction in Arabidopsis thaliana with optimized SNP subset by leveraging gene ontology priors and bin-based combinatorial optimization

Bibliographic Details
Main Authors: Qingfang Ba, Heng Zhou, Zheming Yuan, Zhijun Dai
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-06-01
Series: Frontiers in Bioinformatics
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fbinf.2025.1607119/full
collection DOAJ
description With the rapid development of high-density molecular marker chips and high-throughput sequencing technologies, genomic selection/prediction (GS/GP) has been widely applied in plant breeding. Arabidopsis thaliana, as a common model organism, provides important resources for dissecting genetic variation and evolutionary mechanisms of complex traits. Quantitative traits are typically influenced by multiple minor-effect genes, which are often functionally related and can be enriched within gene ontology (GO) pathways. However, optimizing marker subsets associated with these pathways to enhance GP performance remains challenging. In this study, we propose an improved GS framework called binGO-GS by integrating GO-based biological priors with a novel bin-based combinatorial SNP subset selection strategy. We evaluated the performance of binGO-GS on nine quantitative traits from two A. thaliana datasets, comprising nearly 1,000 samples and over 1.8 million SNPs. Compared with using either the full marker set or randomly selected markers with Genomic BLUP (GBLUP), binGO-GS achieved statistically significant improvements in prediction accuracy across all traits. Similar improvements were observed across six additional regression models when applying binGO-GS instead of the full marker set. Furthermore, the selected markers for identical or similar morphological traits exhibited consistent patterns in quantity and genomic distribution, supporting the polygenic model of complex quantitative traits driven by minor-effect genes. Taken together, binGO-GS offers a powerful and interpretable approach to enhance GS performance, providing a methodological reference for accelerating plant breeding and germplasm innovation.
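The comparison the description reports (GBLUP on the full marker set versus a selected SNP subset) can be illustrated with a minimal sketch. This is not the binGO-GS procedure, whose GO-based and bin-based selection steps are not detailed in this record. The data are simulated, and the function names `vanraden_grm`, `gblup_predict`, and `cv_accuracy`, the fixed heritability `h2=0.5`, and the `candidate` marker subset are all assumptions made for illustration.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix (VanRaden method 1) from an
    n x m matrix of 0/1/2 allele counts."""
    p = M.mean(axis=0) / 2.0                  # allele frequency per SNP
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs
    Z = M[:, keep] - 2.0 * p[keep]            # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p[keep] * (1.0 - p[keep])))

def gblup_predict(G, y, train, test, h2=0.5):
    """GBLUP prediction for test individuals. The heritability h2 is
    assumed here rather than estimated (e.g. by REML)."""
    lam = (1.0 - h2) / h2                     # ratio of residual to genetic variance
    mu = y[train].mean()
    K_tt = G[np.ix_(train, train)]
    K_st = G[np.ix_(test, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K_st @ alpha

def cv_accuracy(M, y, cols, k=5, seed=0):
    """Mean Pearson correlation between predicted and observed
    phenotypes over k cross-validation folds, using SNP columns `cols`."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    G = vanraden_grm(M[:, cols])
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = gblup_predict(G, y, train, test)
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))

# Simulated data (not the A. thaliana datasets from the article).
rng = np.random.default_rng(42)
n, m = 200, 1000
freqs = rng.uniform(0.1, 0.5, size=m)
M = rng.binomial(2, freqs, size=(n, m))       # genotypes as allele counts
causal = rng.choice(m, size=50, replace=False)
g = (M[:, causal] - 2.0 * freqs[causal]) @ rng.normal(size=50)
y = g + rng.normal(scale=g.std(), size=n)     # roughly half the variance is genetic

# Hypothetical stand-in for a prior-informed subset: the causal SNPs
# plus 100 randomly chosen markers.
candidate = np.union1d(causal, rng.choice(m, size=100, replace=False))

G_full = vanraden_grm(M)
acc_full = cv_accuracy(M, y, np.arange(m))
acc_subset = cv_accuracy(M, y, candidate)
print(f"full set: {acc_full:.3f}  subset: {acc_subset:.3f}")
```

Both runs use the same fold assignment, so any difference in the two correlations comes from the marker set alone. In real data the subset would have to be chosen without reference to the test phenotypes for the comparison to be fair.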
id doaj-art-8daa370a8b3f49f0a5f4e6ba0e924995
institution OA Journals
issn 2673-7647
volume 5
doi 10.3389/fbinf.2025.1607119
topic genomic selection/prediction
SNP
subset selection
gene ontology
biological priors
Arabidopsis thaliana