Incorporating information of causal variants in genomic prediction using GBLUP or machine learning models in a simulated livestock population

Bibliographic Details
Main Authors: Jifan Yang, Mario P. L. Calus, Yvonne C. J. Wientjes, Theo H. E. Meuwissen, Pascal Duenk
Format: Article
Language: English
Published: BMC 2025-08-01
Series: Journal of Animal Science and Biotechnology
Online Access: https://doi.org/10.1186/s40104-025-01250-5
Collection: DOAJ

Abstract

Background: Genomic prediction has revolutionized animal breeding, with GBLUP being the most widely used prediction model. In theory, the accuracy of genomic prediction could be improved by incorporating information from QTL. This strategy could be especially beneficial for machine learning models, which are able to distinguish informative from uninformative features. The objective of this study was to assess the benefit of incorporating QTL genotypes in GBLUP and machine learning models. We simulated a selected livestock population in which the QTL and their effects were known. We used four genomic prediction models: GBLUP, (weighted) 2GBLUP, random forest (RF), and support vector regression (SVR). These models were used to predict breeding values of young animals in scenarios that varied in the proportion of genetic variance explained by the included QTL.

Results: 2GBLUP resulted in the highest accuracy. Its accuracy increased as the included QTL explained up to 80% of the genetic variance, after which the accuracy dropped. With a weighted 2GBLUP model, accuracy always increased when more QTL were included. The prediction accuracy of GBLUP was consistently higher than that of SVR, and the accuracy of both models increased slightly as more QTL information was included. The RF model resulted in the lowest prediction accuracy and did not improve when QTL information was included.

Conclusions: Our results show that incorporating QTL information in GBLUP and SVR can improve prediction accuracy, but the extent of improvement varies across models. RF had a much lower prediction accuracy than the other models and did not improve when QTL information was added. Two possible reasons for this result are that the structure of our data does not allow RF to realize its full potential, and that RF is not well suited to this particular prediction problem. Our study highlighted the importance of selecting appropriate models for genomic prediction and underscored the potential limitations of machine learning models when applied to genomic prediction in livestock.

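To make the contrast between GBLUP and 2GBLUP concrete, the following Python sketch uses toy data. It is not the authors' simulation or code. It assumes that 2GBLUP means a GBLUP model with two genomic relationship matrices: one built from the included QTL and one from the remaining markers. Further assumptions are genotypes coded 0/1/2 at unlinked loci, VanRaden's first genomic relationship matrix, and variance components fixed by hand (a hypothetical 90/10 split between the QTL and SNP matrices) rather than estimated. Accuracy is taken as the correlation between predicted and true breeding values of animals without phenotypes. The helper names `vanraden_g` and `blup_predict` are hypothetical. The weighted 2GBLUP, RF, and SVR models are not shown.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix (VanRaden 2008, method 1) from 0/1/2 genotype codes."""
    p = M.mean(axis=0) / 2.0                 # allele frequency per locus
    Z = M - 2.0 * p                          # centre each locus on its mean genotype
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def blup_predict(K, y, train, test, var_e):
    """BLUP of genetic values for `test` animals from phenotypes of `train` animals.

    K is the covariance matrix of genetic values (relationship matrices already
    scaled by their variance components); residuals are i.i.d. with variance var_e.
    """
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))
    ones = np.ones(len(train))
    # Generalised least-squares estimate of the overall mean
    mu = (ones @ np.linalg.solve(V, y[train])) / (ones @ np.linalg.solve(V, ones))
    return K[np.ix_(test, train)] @ np.linalg.solve(V, y[train] - mu)


rng = np.random.default_rng(1)
n, m, n_qtl = 300, 1000, 20
freq = rng.uniform(0.05, 0.95, size=m)
M = rng.binomial(2, freq, size=(n, m)).astype(float)   # toy genotypes, unlinked loci (no LD)
qtl = rng.choice(m, size=n_qtl, replace=False)          # hypothetical causal loci
snp = np.setdiff1d(np.arange(m), qtl)                   # all other markers
tbv = M[:, qtl] @ rng.normal(size=n_qtl)                # true breeding values
var_g = tbv.var()
y = tbv + rng.normal(scale=np.sqrt(var_g), size=n)     # residual variance = var_g, so h2 is about 0.5
train, test = np.arange(200), np.arange(200, 300)       # test animals' phenotypes are never used

# GBLUP: a single relationship matrix built from all loci
pred_gblup = blup_predict(var_g * vanraden_g(M), y, train, test, var_g)

# 2GBLUP: separate QTL and SNP matrices with an assumed (not estimated) 90/10 variance split
K2 = 0.9 * var_g * vanraden_g(M[:, qtl]) + 0.1 * var_g * vanraden_g(M[:, snp])
pred_2gblup = blup_predict(K2, y, train, test, var_g)

acc_gblup = np.corrcoef(pred_gblup, tbv[test])[0, 1]
acc_2gblup = np.corrcoef(pred_2gblup, tbv[test])[0, 1]
print(f"GBLUP accuracy: {acc_gblup:.2f}, 2GBLUP accuracy: {acc_2gblup:.2f}")
```

Fixing the variance split by hand keeps the sketch short. In practice, the two variance components would be estimated from the data, for example by REML. Changing the split or the number of QTL placed in the first matrix is a simple way to mimic scenarios that differ in the proportion of genetic variance explained by the included QTL.
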
Record ID: doaj-art-915c1e4fc3124c34aaadd129e7ef10f6
Holding institution: Kabale University
ISSN: 2049-1891
Affiliations: Animal Breeding and Genomics, Wageningen University & Research (Jifan Yang, Mario P. L. Calus, Yvonne C. J. Wientjes, Pascal Duenk); Faculty of Life Sciences, Norwegian University of Life Sciences (Theo H. E. Meuwissen)
Keywords: GBLUP; Genomic prediction; Machine learning; QTL; Random forest; Support vector regression