HPClas: A data‐driven approach for identifying halophilic proteins based on catBoost
Abstract Halophilic proteins possess unique structural properties and show high stability under extreme conditions. This distinct characteristic makes them invaluable for application in various aspects such as bioenergy, pharmaceuticals, environmental clean‐up, and energy production. Generally, halo...
| Main Authors: | Shantong Hu, Xiaoyu Wang, Zhikang Wang, Menghan Jiang, Shihui Wang, Wenya Wang, Jiangning Song, Guimin Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley 2024-12-01 |
| Series: | mLife |
| Subjects: | feature engineering; halophilic protein; machine learning |
| Online Access: | https://doi.org/10.1002/mlf2.12125 |
| _version_ | 1850101970841370624 |
|---|---|
| author | Shantong Hu, Xiaoyu Wang, Zhikang Wang, Menghan Jiang, Shihui Wang, Wenya Wang, Jiangning Song, Guimin Zhang |
| author_sort | Shantong Hu |
| collection | DOAJ |
| description | Abstract Halophilic proteins possess unique structural properties and show high stability under extreme conditions. This distinct characteristic makes them invaluable for application in various aspects such as bioenergy, pharmaceuticals, environmental clean‐up, and energy production. Generally, halophilic proteins are discovered and characterized through labor‐intensive and time‐consuming wet lab experiments. In this study, we introduce the Halophilic Protein Classifier (HPClas), a machine learning‐based classifier developed using the catBoost ensemble learning technique to identify halophilic proteins. Extensive in silico calculations were conducted on a large public dataset of 12,574 samples and HPClas achieved an area under the receiver operating characteristic curve (AUROC) of 0.844 on an independent test set of 200 samples. The source code and curated dataset of HPClas are publicly available at https://github.com/Showmake2/HPClas. In conclusion, HPClas can be explored as a promising tool to aid in the identification of halophilic proteins and accelerate their application in different fields. |
| format | Article |
| id | doaj-art-bc84685175984b05ab8ae83b3b209888 |
| institution | DOAJ |
| issn | 2770-100X |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Wiley |
| record_format | Article |
| series | mLife |
| spelling | doaj-art-bc84685175984b05ab8ae83b3b209888; 2025-08-20T02:39:51Z; eng; Wiley; mLife; 2770-100X; 2024-12-01; 3; 4; 515; 526; 10.1002/mlf2.12125; HPClas: A data‐driven approach for identifying halophilic proteins based on catBoost; Shantong Hu (0), Xiaoyu Wang (1), Zhikang Wang (2), Menghan Jiang (3), Shihui Wang (4), Wenya Wang (5), Jiangning Song (6), Guimin Zhang (7); (0) College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China; (1) Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia; (2) Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia; (3) College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China; (4) College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China; (5) College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China; (6) Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia; (7) College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China; Abstract Halophilic proteins possess unique structural properties and show high stability under extreme conditions. This distinct characteristic makes them invaluable for application in various aspects such as bioenergy, pharmaceuticals, environmental clean‐up, and energy production. Generally, halophilic proteins are discovered and characterized through labor‐intensive and time‐consuming wet lab experiments. In this study, we introduce the Halophilic Protein Classifier (HPClas), a machine learning‐based classifier developed using the catBoost ensemble learning technique to identify halophilic proteins. Extensive in silico calculations were conducted on a large public dataset of 12,574 samples and HPClas achieved an area under the receiver operating characteristic curve (AUROC) of 0.844 on an independent test set of 200 samples. The source code and curated dataset of HPClas are publicly available at https://github.com/Showmake2/HPClas. In conclusion, HPClas can be explored as a promising tool to aid in the identification of halophilic proteins and accelerate their application in different fields.; https://doi.org/10.1002/mlf2.12125; feature engineering; halophilic protein; machine learning |
| spellingShingle | Shantong Hu; Xiaoyu Wang; Zhikang Wang; Menghan Jiang; Shihui Wang; Wenya Wang; Jiangning Song; Guimin Zhang; HPClas: A data‐driven approach for identifying halophilic proteins based on catBoost; mLife; feature engineering; halophilic protein; machine learning |
| title | HPClas: A data‐driven approach for identifying halophilic proteins based on catBoost |
| title_sort | hpclas a data driven approach for identifying halophilic proteins based on catboost |
| topic | feature engineering; halophilic protein; machine learning |
| url | https://doi.org/10.1002/mlf2.12125 |
| work_keys_str_mv | AT shantonghu hpclasadatadrivenapproachforidentifyinghalophilicproteinsbasedoncatboost AT xiaoyuwang hpclasadatadrivenapproachforidentifyinghalophilicproteinsbasedoncatboost AT zhikangwang hpclasadatadrivenapproachforidentifyinghalophilicproteinsbasedoncatboost AT menghanjiang hpclasadatadrivenapproachforidentifyinghalophilicproteinsbasedoncatboost AT shihuiwang hpclasadatadrivenapproachforidentifyinghalophilicproteinsbasedoncatboost AT wenyawang hpclasadatadrivenapproachforidentifyinghalophilicproteinsbasedoncatboost AT jiangningsong hpclasadatadrivenapproachforidentifyinghalophilicproteinsbasedoncatboost AT guiminzhang hpclasadatadrivenapproachforidentifyinghalophilicproteinsbasedoncatboost |
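
The description field reports an AUROC of 0.844 on an independent test set of 200 samples. For readers unfamiliar with the metric, the following is a minimal sketch of how AUROC can be computed from classifier scores using the rank-sum (Mann-Whitney U) formulation. The function name `auroc` and the toy labels and scores are illustrative assumptions; they are not taken from the HPClas code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 class labels (1 = positive, e.g. halophilic).
    scores: iterable of classifier scores, higher meaning "more positive".
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of items sharing the same score.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank over the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy data for illustration only (hypothetical, not HPClas predictions).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auroc(toy_labels, toy_scores))
```

The value returned is the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half, so 0.5 corresponds to random ranking and 1.0 to perfect separation.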