QTG-LGBM: A method of prioritizing causal genes in quantitative trait loci in maize
| Main Authors: | Chuang Wang, Shenshen Wu, Zhou Yao, En Luo, Junli Deng, Jianxiao Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications Co., Ltd., 2025-06-01 |
| Series: | Crop Journal |
| Subjects: | Genetic analysis; Quantitative trait loci; LightGBM; Zea mays |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2214514125000686 |
| Field | Value |
|---|---|
| author | Chuang Wang, Shenshen Wu, Zhou Yao, En Luo, Junli Deng, Jianxiao Liu |
| collection | DOAJ |
| description | Efficient and accurate identification of candidate causal genes within quantitative trait loci (QTL) is a significant challenge in genetic research. Conventional linkage analysis methods often require substantial time and resources to identify causal genes. This paper proposes QTG-LGBM, a method for prioritizing causal genes in maize based on the LightGBM algorithm. QTG-LGBM dynamically adjusts gene weights and sample proportions during training to mitigate class imbalance, and it adds a regularization term to prevent overfitting on small datasets. Experiments on maize traits, including plant height (PH), flowering time (FT), and tassel branch number (TBN), showed that QTG-LGBM outperforms the commonly used methods QTG-Finder, GBDT, XGBoost, BernoulliNB, SVM, CNN, and ensemble learning. We validated the generalization of QTG-LGBM using Arabidopsis, rice, Setaria, and sorghum. We also applied QTG-LGBM to reported QTL, and the known causal genes within them, for maize PH, FT, and TBN and for FT in Arabidopsis, rice, and sorghum. Among the top 20% of ranked genes, QTG-LGBM recalled significantly more causal genes than random selection. Feature importance analysis identified key gene features affecting phenotypes. QTG-LGBM is available at http://www.deepcba.com/QTG-LGBM. |
| format | Article |
| id | doaj-art-83296645ca6c42b1a386ff3e79d2f69d |
| institution | OA Journals |
| issn | 2214-5141 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | KeAi Communications Co., Ltd. |
| record_format | Article |
| series | Crop Journal |
| spelling | doaj-art-83296645ca6c42b1a386ff3e79d2f69d; 2025-08-20T02:10:32Z; eng; KeAi Communications Co., Ltd.; Crop Journal; ISSN 2214-5141; 2025-06-01; vol. 13, no. 3, pp. 873-886; doi:10.1016/j.cj.2025.03.004; QTG-LGBM: A method of prioritizing causal genes in quantitative trait loci in maize. Authors: Chuang Wang (a, b, c, d); Shenshen Wu (a); Zhou Yao (a, b, c, d); En Luo (a); Junli Deng (d; corresponding author); Jianxiao Liu (a, b, c, d; corresponding author). Affiliations: (a) National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China; (b) Hubei Hongshan Laboratory, Wuhan 430070, Hubei, China; (c) Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China; (d) College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China. Keywords: Genetic analysis; Quantitative trait loci; LightGBM; Zea mays. URL: http://www.sciencedirect.com/science/article/pii/S2214514125000686 |
| title | QTG-LGBM: A method of prioritizing causal genes in quantitative trait loci in maize |
| topic | Genetic analysis; Quantitative trait loci; LightGBM; Zea mays |
| url | http://www.sciencedirect.com/science/article/pii/S2214514125000686 |