A Machine Learning Approach for the Prediction of Thermostable β-Glucosidases

Bibliographic Details
Main Author: Diego Mariano
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/9/4839
Description
Summary: Thermostable β-glucosidases (E.C. 3.2.1.21) are essential enzymes used in second-generation biofuel production. However, little is known about the structural characteristics that lead to their thermostability. In this study, I used graph-based structural signatures to represent three-dimensional structures of β-glucosidase enzymes extracted from thermophilic organisms. I collected 1717 structures from thermophilic (n = 890) and non-thermophilic (n = 827) organisms and divided them into two datasets: training (n = 1134) and test (n = 583). I then used seven machine learning algorithms to classify them as thermophilic or non-thermophilic. In training with 10-fold cross-validation, the best model was logistic regression, with 77.1% accuracy; in testing, the best model was CatBoost, with 81.6% accuracy. I hypothesize that the signature model proposed here can help understand the structural patterns in thermostable enzymes and shed light on the design of more efficient enzymes for biofuel production.
ISSN:2076-3417
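
The dataset sizes and the 10-fold cross-validation mentioned in the summary can be illustrated with a minimal sketch. It only checks the reported counts and shows how 1134 training samples would split into ten folds. The helper `k_fold_indices`, the random seed, and the plain (non-stratified) shuffling are assumptions made for illustration; the summary does not say how the folds were actually built.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and partition them into k near-equal folds.

    Hypothetical helper for illustration; not taken from the article.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds

# Counts reported in the summary.
N_THERMOPHILIC, N_NON_THERMOPHILIC = 890, 827
N_TRAIN, N_TEST = 1134, 583
assert N_THERMOPHILIC + N_NON_THERMOPHILIC == N_TRAIN + N_TEST == 1717

# Each fold is held out once for validation; the other nine are used for fitting.
folds = k_fold_indices(N_TRAIN, k=10)
for i, held_out in enumerate(folds):
    n_fit = N_TRAIN - len(held_out)
    print(f"fold {i}: fit on {n_fit}, validate on {len(held_out)}")
```

Because 1134 is not divisible by 10, four folds hold 114 samples and six hold 113. A cross-validated accuracy such as the 77.1% in the summary is typically the mean over the ten held-out folds.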