Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings
Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring protein tertiary structure and function. Machine Learning and in...
Main Authors: | Sotiris Chatzimiltis, Michalis Agathocleous, Vasilis J. Promponas, Chris Christodoulou |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-01-01 |
Series: | Computational and Structural Biotechnology Journal |
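Subjects: | Convolutional neural networks; Deep learning; Embeddings; Hessian free optimisation; Protein secondary structure prediction |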
Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037024004446 |
_version_ | 1841553793446051840 |
---|---|
author | Sotiris Chatzimiltis Michalis Agathocleous Vasilis J. Promponas Chris Christodoulou |
author_sort | Sotiris Chatzimiltis |
collection | DOAJ |
description | Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring protein tertiary structure and function. Machine Learning and, in particular, Deep Learning approaches show promising results for the PSSP problem. In this paper, we deploy a Convolutional Neural Network (CNN) trained with the Subsampled Hessian Newton (SHN) method (a Hessian Free Optimisation variant), with a two-dimensional input representation of embeddings extracted from a language model pretrained with protein sequences. Utilising a CNN trained with the SHN method and the input embeddings, we achieved an average per-residue (Q3) accuracy of 79.96% on the CB513 dataset and 81.45% on the PISCES dataset (without any post-processing techniques applied). The application of ensembles and filtering techniques to the results of the CNN improved the overall prediction performance: Q3 accuracy increased to 93.65% on the CB513 dataset and to 87.13% on the PISCES dataset. Moreover, our method was evaluated using the CASP13 dataset, where we showed that prediction performance increased as the post-processing window size increased. In fact, with the biggest post-processing window size (limited by the smallest CASP13 protein), we achieved a Q3 accuracy of 98.12% and a Segment Overlap (SOV) score of 96.98 on the CASP13 dataset when the CNNs were trained with the PISCES dataset. Finally, we showed that input representations from embeddings can perform as well as representations extracted from multiple sequence alignments. |
format | Article |
id | doaj-art-0b167a8f74f8481d9f471f9b9a1bd774 |
institution | Kabale University |
issn | 2001-0370 |
language | English |
publishDate | 2025-01-01 |
publisher | Elsevier |
record_format | Article |
series | Computational and Structural Biotechnology Journal |
spelling | doaj-art-0b167a8f74f8481d9f471f9b9a1bd774; 2025-01-09T06:13:46Z; eng; Elsevier; Computational and Structural Biotechnology Journal; ISSN 2001-0370; 2025-01-01; vol. 27, pp. 243-251. Sotiris Chatzimiltis (University of Cyprus, Department of Computer Science, Nicosia, Cyprus; 5G/6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, United Kingdom); Michalis Agathocleous (University of Cyprus, Department of Computer Science, Nicosia, Cyprus; University of Nicosia, Department of Computer Science, Nicosia, Cyprus); Vasilis J. Promponas (University of Cyprus, Department of Biological Sciences, Nicosia, Cyprus; corresponding author); Chris Christodoulou (University of Cyprus, Department of Computer Science, Nicosia, Cyprus). |
title | Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings |
topic | Convolutional neural networks Deep learning Embeddings Hessian free optimisation Protein secondary structure prediction |
url | http://www.sciencedirect.com/science/article/pii/S2001037024004446 |