QTG-LGBM: A method of prioritizing causal genes in quantitative trait loci in maize

Bibliographic Details
Main Authors: Chuang Wang, Shenshen Wu, Zhou Yao, En Luo, Junli Deng, Jianxiao Liu
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2025-06-01
Series: Crop Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2214514125000686
Description
Summary: Efficient and accurate identification of candidate causal genes within quantitative trait loci (QTL) is a significant challenge in genetic research. Conventional linkage analysis methods often require substantial time and resources to identify causal genes. This paper proposes QTG-LGBM, a method for prioritizing causal genes in maize based on the LightGBM algorithm. QTG-LGBM dynamically adjusts gene weights and sample proportions during training to mitigate the effects of class imbalance, and it introduces a regularization term to prevent overfitting on small-sample datasets. Experimental results on the maize traits plant height (PH), flowering time (FT), and tassel branch number (TBN) demonstrated that QTG-LGBM outperforms the commonly used methods QTG-Finder, GBDT, XGBoost, BernoulliNB, SVM, CNN, and ensemble learning. We validated the generalizability of QTG-LGBM using Arabidopsis, rice, Setaria, and sorghum. We also applied QTG-LGBM to reported QTL affecting PH, FT, and TBN in maize and FT in Arabidopsis, rice, and sorghum, together with the known causal genes within those QTL. Among the top 20% of ranked genes, QTG-LGBM achieved a significantly higher recall of causal genes than random selection. Feature importance analysis identified the key gene features affecting phenotypes. QTG-LGBM is available at http://www.deepcba.com/QTG-LGBM.
ISSN: 2214-5141
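
The summary evaluates the method by the recall of known causal genes among the top 20% of ranked genes, compared with random selection. The following is a minimal sketch of how such a metric might be computed, not the authors' evaluation code. The function name `recall_at_top_fraction`, the gene identifiers, and the scores are all hypothetical, and the rounding rule for the cutoff (ceiling, at least one gene) is an assumption.

```python
import math


def recall_at_top_fraction(scores, causal, fraction=0.2):
    """Share of known causal genes that fall in the top `fraction` of the ranking.

    scores: dict mapping gene ID -> model score (higher = more likely causal)
    causal: set of gene IDs known to be causal
    """
    if not causal:
        raise ValueError("need at least one known causal gene")
    # Rank genes from highest to lowest score; ties are broken by gene ID
    # so the ranking is deterministic.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    # Assumed cutoff rule: round up, and always keep at least one gene.
    k = max(1, math.ceil(fraction * len(ranked)))
    top = set(ranked[:k])
    return len(top & set(causal)) / len(causal)


# Hypothetical scores for ten genes within one QTL (illustrative values only).
scores = {
    "gene01": 0.91, "gene02": 0.12, "gene03": 0.40, "gene04": 0.08,
    "gene05": 0.77, "gene06": 0.33, "gene07": 0.25, "gene08": 0.05,
    "gene09": 0.18, "gene10": 0.02,
}
causal = {"gene01", "gene07"}

recall = recall_at_top_fraction(scores, causal, fraction=0.2)

# Under random selection of k out of n genes, the expected recall is k / n.
k = max(1, math.ceil(0.2 * len(scores)))
random_baseline = k / len(scores)

print(f"recall in top 20%: {recall:.2f}, random expectation: {random_baseline:.2f}")
```

In this toy case the top 20% contains two genes, one of which is causal, so the recall exceeds the random expectation. A real comparison would aggregate over many QTL and test the difference for significance.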