Integrated phenotypic analysis, predictive modeling, and identification of novel trait-associated loci in a diverse Theobroma cacao collection

Bibliographic Details
Main Authors: Insuck Baek, Minhyeok Cha, Seunghyun Lim, Brian M. Irish, Sookyung Oh, Jishnu Bhatt, Rakesh K. Upadhyay, Moon S. Kim, Lyndel W. Meinhardt, Sunchung Park, Ezekiel Ahn
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Plant Biology
Online Access:https://doi.org/10.1186/s12870-025-07128-y
collection DOAJ
description Abstract. Background: Cacao (Theobroma cacao L.) breeding and improvement rely on understanding germplasm diversity and trait architecture. This study characterized a cacao collection of 173 accessions evaluated in Puerto Rico, examining phenotypic diversity and trait interrelationships and comparing the results with published datasets from Trinidad and Colombia. We also developed machine learning (ML) models for yield prediction and identified yield-associated SNP markers. Results: The cacao collection showed significant phenotypic variation and strong intra-collection trait correlations. Comparative analyses revealed conserved trait responses across environments, notably linking susceptibility to black pod rot in Puerto Rico with susceptibility to Witches' Broom Disease in Colombia, suggesting a broad-spectrum disease response mechanism. The ML models modeled yield effectively and quantified a hierarchy of predictor importance, with ‘Total pods’, ‘Infection rate’, and ‘Pod weight’ the most influential. By integrating existing SNP data for 28 accessions shared with this collection, we identified multiple SNPs significantly associated (FDR < 0.01) with key horticultural traits, including ‘Total pods’, ‘Infection rate’, and ‘Yield’. Notably, a single marker on chromosome 5 (TcSNP475), located within a putative zinc finger stress-associated protein gene (Tc05_t008610), was associated with both ‘Total pods’ and ‘Yield’, making it a prime target for marker-assisted selection. Conclusions: This research provides a detailed characterization of a diverse germplasm collection, robust yield predictors, and a suite of novel trait-linked genetic markers, offering valuable resources for cacao breeding. These integrated findings provide a foundation for targeted breeding strategies and for deeper molecular investigation of the mechanisms underlying yield and stress resilience in this vital global crop.
id doaj-art-784c7087427f407f84b6fddadbcc4835
institution Kabale University
issn 1471-2229
affiliations Insuck Baek, Minhyeok Cha, Sookyung Oh, Moon S. Kim: Environmental Microbial and Food Safety Laboratory, Agricultural Research Service, Department of Agriculture
Seunghyun Lim, Jishnu Bhatt, Lyndel W. Meinhardt, Sunchung Park, Ezekiel Ahn: Sustainable Perennial Crops Laboratory, Agricultural Research Service, Department of Agriculture
Brian M. Irish: Plant Germplasm Introduction and Testing Research Unit, Agricultural Research Service, Department of Agriculture
Rakesh K. Upadhyay: Department of Natural Sciences, College of Arts and Sciences, Bowie State University