MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model

Abstract A crucial mechanism for controlling the actions of proteins is allostery. Allosteric modulators have the potential to provide many benefits compared to orthosteric ligands, such as increased selectivity and saturability of their effect. The identification of new allosteric sites presents pr...

Bibliographic Details
Main Authors: Sadettin Y. Ugurlu, David McDonald, Shan He
Format: Article
Language: English
Published: BMC 2024-10-01
Series: Journal of Cheminformatics
Subjects: Allosteric binding site; Allostery; Binding site; Multimodel Ensemble Feature selection
Online Access: https://doi.org/10.1186/s13321-024-00882-5
_version_ 1850202376650096640
author Sadettin Y. Ugurlu
David McDonald
Shan He
author_facet Sadettin Y. Ugurlu
David McDonald
Shan He
author_sort Sadettin Y. Ugurlu
collection DOAJ
description Abstract: A crucial mechanism for controlling the actions of proteins is allostery. Allosteric modulators have the potential to provide many benefits compared to orthosteric ligands, such as increased selectivity and saturability of their effect. The identification of new allosteric sites presents prospects for the creation of innovative medications and enhances our comprehension of fundamental biological mechanisms. Allosteric sites are increasingly found in different protein families through various techniques, such as machine learning applications, which opens up possibilities for creating completely novel medications with a diverse variety of chemical structures. Machine learning methods, such as PASSer, exhibit limited efficacy in accurately finding allosteric binding sites when relying solely on 3D structural information. Scientific Contribution: Prior to conducting feature selection for allosteric binding site identification, integrating supporting amino-acid–based information with 3D structural knowledge is advantageous. This approach can enhance performance by ensuring accuracy and robustness. Therefore, we have developed an accurate and robust model called Multimodel Ensemble Feature Selection for Allosteric Site Identification (MEF-AlloSite) after collecting 9460 relevant and diverse features from the literature to characterise pockets. The model employs an accurate and robust multimodal feature selection technique for the small training set size of only 90 proteins to improve predictive performance. This state-of-the-art technique increased the performance in allosteric binding site identification by selecting promising features from 9460 features. Also, analysing the selected features shed light on their relationship with allosteric binding sites and on the complex allostery of proteins.
MEF-AlloSite and state-of-the-art allosteric site identification methods such as PASSer2.0 and PASSerRank have been tested on three test cases 51 times, each with a different split of the training set. Student’s t test and Cohen’s D have been used to evaluate the distributions of the average precision and ROC AUC scores. On the three test cases, most of the p-values ($< 0.05$) and the majority of Cohen’s D values ($> 0.5$) showed that MEF-AlloSite’s 1–6% higher mean average precision and ROC AUC, compared with state-of-the-art allosteric site identification methods, is statistically significant.
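The evaluation described in the abstract (repeated runs compared with Student's t test and Cohen's D) can be illustrated with a minimal sketch. The record does not say whether the test was paired or independent, so an independent two-sample test with equal variances is assumed here. The function names `cohens_d` and `compare_scores` are hypothetical, and the score arrays are synthetic placeholders, not results from the paper.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def compare_scores(scores_a, scores_b, alpha=0.05, min_effect=0.5):
    """Compare two score distributions (e.g. ROC AUC over repeated runs).

    The thresholds mirror the ones quoted in the abstract (p < 0.05, d > 0.5);
    combining them into a single flag is an assumption made for illustration.
    """
    # ttest_ind defaults to equal_var=True, i.e. the classic Student's t test.
    t_stat, p_value = stats.ttest_ind(scores_a, scores_b)
    d = cohens_d(scores_a, scores_b)
    return {
        "t": float(t_stat),
        "p": float(p_value),
        "d": float(d),
        "significant": bool(p_value < alpha and abs(d) > min_effect),
    }


# Synthetic placeholder scores for 51 repeated runs of two hypothetical methods.
rng = np.random.default_rng(0)
n_runs = 51
method_a = rng.normal(loc=0.80, scale=0.03, size=n_runs)
method_b = rng.normal(loc=0.77, scale=0.03, size=n_runs)

print(compare_scores(method_a, method_b))
```

Reporting an effect size alongside the p-value is useful with many repeated runs, because a large number of runs can make even a very small difference in means reach p < 0.05.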
format Article
id doaj-art-3a8037ccea0f4fe8b224105974c350cd
institution OA Journals
issn 1758-2946
language English
publishDate 2024-10-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj-art-3a8037ccea0f4fe8b224105974c350cd
2025-08-20T02:11:47Z
eng
BMC
Journal of Cheminformatics
1758-2946
2024-10-01
161129
10.1186/s13321-024-00882-5
MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model
Sadettin Y. Ugurlu (School of Computer Science, University of Birmingham)
David McDonald (AIA Insights Ltd)
Shan He (School of Computer Science, University of Birmingham)
https://doi.org/10.1186/s13321-024-00882-5
Allosteric binding site
Allostery
Binding site
Multimodel Ensemble Feature selection
spellingShingle Sadettin Y. Ugurlu
David McDonald
Shan He
MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model
Journal of Cheminformatics
Allosteric binding site
Allostery
Binding site
Multimodel Ensemble Feature selection
title MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model
title_full MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model
title_fullStr MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model
title_full_unstemmed MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model
title_short MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model
title_sort mef allosite an accurate and robust multimodel ensemble feature selection for the allosteric site identification model
topic Allosteric binding site
Allostery
Binding site
Multimodel Ensemble Feature selection
url https://doi.org/10.1186/s13321-024-00882-5
work_keys_str_mv AT sadettinyugurlu mefallositeanaccurateandrobustmultimodelensemblefeatureselectionfortheallostericsiteidentificationmodel
AT davidmcdonald mefallositeanaccurateandrobustmultimodelensemblefeatureselectionfortheallostericsiteidentificationmodel
AT shanhe mefallositeanaccurateandrobustmultimodelensemblefeatureselectionfortheallostericsiteidentificationmodel