Effective Gene Expression Prediction and Optimization from Protein Sequences
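| Main Authors: | Tuoyu Liu, Yiyang Zhang, Yanjun Li, Guoshun Xu, Han Gao, Pengtao Wang, Tao Tu, Huiying Luo, Ningfeng Wu, Bin Yao, Bo Liu, Feifei Guan, Huoqing Huang, Jian Tian |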
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2025-02-01 |
| Series: | Advanced Science |
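| Subjects: | amino acid expression index; mutant generation; predicting protein expression; soluble expression; transfer learning |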
| Online Access: | https://doi.org/10.1002/advs.202407664 |

| author | Tuoyu Liu, Yiyang Zhang, Yanjun Li, Guoshun Xu, Han Gao, Pengtao Wang, Tao Tu, Huiying Luo, Ningfeng Wu, Bin Yao, Bo Liu, Feifei Guan, Huoqing Huang, Jian Tian |
|---|---|
| collection | DOAJ |
| description | High soluble protein expression in heterologous hosts is crucial for various research and applications. Despite considerable research on the impact of codon usage on expression levels, the relationship between protein sequence and expression is often overlooked. In this study, a novel connection between protein expression and sequence is uncovered, leading to the development of SRAB (Strength of Relative Amino Acid Bias) based on AEI (Amino Acid Expression Index). The AEI served as an objective measure of this correlation, with higher AEI values enhancing soluble expression. Subsequently, the pre‐trained protein model MP‐TRANS (MindSpore Protein Transformer) is developed and fine‐tuned using transfer learning techniques to create 88 prediction models (MPB‐EXP) for predicting heterologous expression levels across 88 species. This approach achieved an average accuracy of 0.78, surpassing conventional machine learning methods. Additionally, a mutant generation model, MPB‐MUT, is devised and utilized to enhance expression levels in specific hosts. Experimental validation demonstrated that the top 3 mutants of xylanase (previously not expressed in Escherichia coli) successfully achieved high‐level soluble expression in E. coli. These findings highlight the efficacy of the developed model in predicting and optimizing gene expression based on protein sequences. |
| id | doaj-art-16ee03eeeef94a2f81bb441001ac4af9 |
| institution | DOAJ |
| issn | 2198-3844 |
| affiliations | State Key Laboratory of Animal Nutrition and Feeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; National Key Laboratory of Agricultural Microbiology, Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China |
| title | Effective Gene Expression Prediction and Optimization from Protein Sequences |
| topic | amino acid expression index; mutant generation; predicting protein expression; soluble expression; transfer learning |