Machine learning for genomic and pedigree prediction in sugarcane

Abstract Sugarcane (Saccharum spp.) plays a crucial role in global sugar production; however, the efficiency of breeding programs has been hindered by its heterozygous polyploid genomes. Considering non‐additive genetic effects is essential in genome prediction (GP) models of crops with highly heter...

Bibliographic Details
Main Authors: Minoru Inamori, Tatsuro Kimura, Masaaki Mori, Yusuke Tarumoto, Taiichiro Hattori, Michiko Hayano, Makoto Umeda, Hiroyoshi Iwata
Format: Article
Language: English
Published: Wiley 2024-09-01
Series: The Plant Genome
Online Access: https://doi.org/10.1002/tpg2.20486
collection DOAJ
description Abstract Sugarcane (Saccharum spp.) plays a crucial role in global sugar production; however, the efficiency of breeding programs has been hindered by its heterozygous polyploid genomes. Considering non‐additive genetic effects is essential in genomic prediction (GP) models of crops with highly heterozygous polyploid genomes. This study incorporates non‐additive genetic effects and pedigree information using machine learning methods to track sugarcane breeding lines and enhance the prediction by assessing the degree of association between genotypes. This study measured the stalk biomass and sugar content of 297 clones from 87 families within a breeding population used in the Japanese sugarcane breeding program. Subsequently, we conducted analyses based on the marker genotypes of 33,149 single‐nucleotide polymorphisms. To validate the accuracy of GP in the population, we first evaluated the prediction accuracy of the best linear unbiased prediction (BLUP) based on a genomic relationship matrix. Prediction accuracy was assessed using two different cross‐validation methods: repeated 10‐fold cross‐validation and leave‐one‐family‐out cross‐validation. The accuracy of GP under the first and second methods ranged from 0.36 to 0.74 and from 0.15 to 0.63, respectively. Next, we compared the prediction accuracy of BLUP and two machine learning methods: random forests and simulation annealing ensemble (SAE), a newly developed machine learning method that explicitly models the interaction between variables. Both pedigree and genomic information were utilized as input in these methods. Through repeated 10‐fold cross‐validation, we found that the accuracy of the machine learning methods surpassed that of BLUP in most cases. In leave‐one‐family‐out cross‐validation, SAE demonstrated the highest accuracy among the methods. These results underscore the effectiveness of GP in Japanese sugarcane breeding and highlight the significant potential of machine learning methods.
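The two cross‐validation schemes named in the description can be illustrated with a minimal sketch. This is not the authors' code: the data are simulated, markers use a diploid-style 0/1/2 dosage coding (a simplification for a polyploid crop), the relationship matrix follows the common VanRaden construction, the heritability `h2` is fixed rather than estimated, and accuracy is taken to be the Pearson correlation between predicted and observed values. All function and variable names are hypothetical.

```python
import numpy as np

def genomic_relationship_matrix(markers):
    """VanRaden-style G from an (n_clones, n_snps) matrix; 0/1/2 coding is assumed."""
    freq = markers.mean(axis=0) / 2.0                 # allele frequency per SNP
    centered = markers - 2.0 * freq                   # center each SNP column
    return centered @ centered.T / (2.0 * np.sum(freq * (1.0 - freq)))

def gblup_predict(G, y, train, test, h2=0.5):
    """BLUP of test phenotypes from training ones; h2 is an assumed, fixed value."""
    lam = (1.0 - h2) / h2                             # residual-to-genetic variance ratio
    mu = y[train].mean()
    lhs = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(lhs, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

def kfold_splits(n, k, rng):
    """Random k-fold split: relatives of a test clone may remain in training."""
    idx = rng.permutation(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

def leave_one_family_out_splits(families):
    """Each family is held out in turn, so no full sibs remain in training."""
    for fam in np.unique(families):
        yield np.flatnonzero(families != fam), np.flatnonzero(families == fam)

def cv_accuracy(G, y, splits):
    """Correlation between out-of-sample predictions and observed phenotypes."""
    pred = np.empty_like(y)
    for train, test in splits:
        pred[test] = gblup_predict(G, y, train, test)
    return np.corrcoef(pred, y)[0, 1]

# Simulated stand-in data (hypothetical sizes, far smaller than the study's).
rng = np.random.default_rng(0)
n_families, per_family, n_snps = 12, 5, 200
families = np.repeat(np.arange(n_families), per_family)
base = rng.uniform(0.1, 0.9, n_snps)
fam_freq = np.clip(base + rng.normal(0, 0.15, (n_families, n_snps)), 0.05, 0.95)
markers = rng.binomial(2, fam_freq[families]).astype(float)
genetic = markers @ rng.normal(0, 1, n_snps)
y = genetic + rng.normal(0, genetic.std(), len(genetic))

G = genomic_relationship_matrix(markers)
acc_kfold = np.mean([cv_accuracy(G, y, kfold_splits(len(y), 10, rng))
                     for _ in range(5)])              # repeated 10-fold
acc_lofo = cv_accuracy(G, y, leave_one_family_out_splits(families))
print(f"repeated 10-fold: {acc_kfold:.2f}, leave-one-family-out: {acc_lofo:.2f}")
```

The only difference between the two schemes is how clones are assigned to the test set. Holding out whole families removes close relatives from the training data, which is one plausible reason the description reports lower accuracies under leave‐one‐family‐out cross‐validation.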
id doaj-art-44887a344fbb47caa5da6b7fe615c02f
institution Kabale University
issn 1940-3372
spelling doaj-art-44887a344fbb47caa5da6b7fe615c02f | 2024-11-17T09:46:07Z | eng | Wiley | The Plant Genome | 1940-3372 | 2024-09-01 | 17 | 3 | n/a | n/a | 10.1002/tpg2.20486
Author affiliations:
Minoru Inamori (0): Laboratory of Biometry and Bioinformatics, Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
Tatsuro Kimura (1): Toyota Motor Corporation, New Business Planning Division, Agriculture & Biotechnology Business Department, Toyota, Japan
Masaaki Mori (2): Toyota Motor Corporation, Environment Affairs and Engineering Management Division, CN Advanced Engineering Development Center, Tokyo, Japan
Yusuke Tarumoto (3): NARO Kyushu Okinawa Agricultural Research Center, Tanegashima Sugarcane Breeding Site, Nishinoomote, Japan
Taiichiro Hattori (4): NARO Kyushu Okinawa Agricultural Research Center, Tanegashima Sugarcane Breeding Site, Nishinoomote, Japan
Michiko Hayano (5): NARO Kyushu Okinawa Agricultural Research Center, Tanegashima Sugarcane Breeding Site, Nishinoomote, Japan
Makoto Umeda (6): NARO Kyushu Okinawa Agricultural Research Center, Tanegashima Sugarcane Breeding Site, Nishinoomote, Japan
Hiroyoshi Iwata (7): Laboratory of Biometry and Bioinformatics, Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan