QTG-LGBM: A method of prioritizing causal genes in quantitative trait loci in maize

Efficient and accurate identification of candidate causal genes within quantitative trait loci (QTL) is a significant challenge in genetic research. Conventional linkage analysis methods often require substantial time and resources to identify causal genes. This paper proposes a QTG-LGBM method for...

Bibliographic Details
Main Authors: Chuang Wang, Shenshen Wu, Zhou Yao, En Luo, Junli Deng, Jianxiao Liu
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2025-06-01
Series: Crop Journal
Subjects: Genetic analysis; Quantitative trait loci; LightGBM; Zea mays
Online Access: http://www.sciencedirect.com/science/article/pii/S2214514125000686
author Chuang Wang
Shenshen Wu
Zhou Yao
En Luo
Junli Deng
Jianxiao Liu
collection DOAJ
description Efficient and accurate identification of candidate causal genes within quantitative trait loci (QTL) is a significant challenge in genetic research. Conventional linkage analysis methods often require substantial time and resources to identify causal genes. This paper proposes QTG-LGBM, a method based on the LightGBM algorithm for prioritizing causal genes in maize. QTG-LGBM dynamically adjusts gene weights and sample proportions during training to mitigate the effects of class imbalance, and it introduces a regularization term to prevent overfitting in small-sample datasets. Experimental results on the maize traits plant height (PH), flowering time (FT), and tassel branch number (TBN) showed that QTG-LGBM outperforms the commonly used methods QTG-Finder, GBDT, XGBoost, BernoulliNB, SVM, CNN, and ensemble learning. We validated the generalization of QTG-LGBM using Arabidopsis, rice, Setaria, and sorghum. We also applied QTG-LGBM to reported QTL, together with the known causal genes within them, that affect PH, FT, and TBN in maize and FT in Arabidopsis, rice, and sorghum. Among the top 20% of ranked genes, QTG-LGBM recalled a significantly higher share of causal genes than random selection. We identified key gene features affecting phenotypes through feature importance analysis. QTG-LGBM is available at http://www.deepcba.com/QTG-LGBM.
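The top-20% recall figure mentioned in the description can be illustrated with a minimal sketch. It assumes each candidate gene in a QTL has already received a model score. The function name `recall_at_top_fraction`, the gene identifiers, and the scores are hypothetical and are not taken from the QTG-LGBM software. Under random ranking, the expected recall is roughly the selected fraction itself (about 0.2).

```python
import math


def recall_at_top_fraction(scores, causal, fraction=0.2):
    """Share of known causal genes that fall in the top `fraction` of the ranking.

    scores: dict mapping gene id -> model score (higher = more likely causal).
    causal: set of gene ids known to be causal (hypothetical labels here).
    """
    causal = set(causal)
    if not causal:
        raise ValueError("need at least one known causal gene")
    # Rank genes from highest to lowest score; ties are broken by gene id.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    # Keep at least one gene even for very small candidate lists.
    k = max(1, math.ceil(fraction * len(ranked)))
    top = set(ranked[:k])
    return len(top & causal) / len(causal)


# Hypothetical toy QTL: ten candidate genes with made-up scores.
scores = {"g01": 0.91, "g02": 0.15, "g03": 0.78, "g04": 0.05, "g05": 0.33,
          "g06": 0.12, "g07": 0.08, "g08": 0.41, "g09": 0.02, "g10": 0.27}
causal = {"g01", "g05"}

recall = recall_at_top_fraction(scores, causal, fraction=0.2)
print(recall)
```

In this toy case the top 20% holds two genes (g01 and g03), so one of the two assumed causal genes is recovered.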
format Article
id doaj-art-83296645ca6c42b1a386ff3e79d2f69d
institution OA Journals
issn 2214-5141
language English
publishDate 2025-06-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Crop Journal
spelling doaj-art-83296645ca6c42b1a386ff3e79d2f69d
2025-08-20T02:10:32Z
eng
KeAi Communications Co., Ltd.
Crop Journal
2214-5141
2025-06-01
13
3
873
886
10.1016/j.cj.2025.03.004
QTG-LGBM: A method of prioritizing causal genes in quantitative trait loci in maize
Chuang Wang [0]
Shenshen Wu [1]
Zhou Yao [2]
En Luo [3]
Junli Deng [4]
Jianxiao Liu [5]
[0] National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China; Hubei Hongshan Laboratory, Wuhan 430070, Hubei, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China
[1] National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China
[2] National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China; Hubei Hongshan Laboratory, Wuhan 430070, Hubei, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China
[3] National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China
[4] College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China; Corresponding authors.
[5] National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China; Hubei Hongshan Laboratory, Wuhan 430070, Hubei, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China; Corresponding authors.
title QTG-LGBM: A method of prioritizing causal genes in quantitative trait loci in maize
topic Genetic analysis
Quantitative trait loci
LightGBM
Zea mays
url http://www.sciencedirect.com/science/article/pii/S2214514125000686