Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings

Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring protein tertiary structure and their functions. Machine Learning and in...

Bibliographic Details
Main Authors: Sotiris Chatzimiltis, Michalis Agathocleous, Vasilis J. Promponas, Chris Christodoulou
Format: Article
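Language: English
Published: Elsevier 2025-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Convolutional neural networks; Deep learning; Embeddings; Hessian free optimisation; Protein secondary structure prediction
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037024004446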
_version_ 1841553793446051840
author Sotiris Chatzimiltis
Michalis Agathocleous
Vasilis J. Promponas
Chris Christodoulou
author_sort Sotiris Chatzimiltis
collection DOAJ
description Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring protein tertiary structure and function. Machine Learning, and in particular Deep Learning, approaches show promising results for the PSSP problem. In this paper, we deploy a Convolutional Neural Network (CNN) trained with the Subsampled Hessian Newton (SHN) method (a Hessian Free Optimisation variant), with a two-dimensional input representation of embeddings extracted from a language model pretrained with protein sequences. Utilising a CNN trained with the SHN method and the input embeddings, we achieved on average a 79.96% per residue (Q3) accuracy on the CB513 dataset and 81.45% Q3 accuracy on the PISCES dataset (without any post-processing techniques applied). The application of ensembles and filtering techniques to the results of the CNN improved the overall prediction performance. The Q3 accuracy on the CB513 dataset increased to 93.65% and on the PISCES dataset to 87.13%. Moreover, our method was evaluated using the CASP13 dataset, where we showed that as the post-processing window size increased, the prediction performance increased as well. In fact, with the biggest post-processing window size (limited by the smallest CASP13 protein), we achieved a Q3 accuracy of 98.12% and a Segment Overlap (SOV) score of 96.98 on the CASP13 dataset when the CNNs were trained with the PISCES dataset. Finally, we showed that input representations from embeddings can perform as well as representations extracted from multiple sequence alignments.
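Note on the Q3 metric quoted in the description: Q3 is conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below illustrates that standard definition only; the function name `q3_accuracy` and the toy strings are hypothetical and are not taken from the article or its evaluation code.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the predicted
    three-state label (H, E or C) equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count residue positions where both labels agree.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical toy example (not data from the article):
# the two strings differ at exactly one of ten positions.
observed = "CCHHHHEEEC"
predicted = "CCHHHCEEEC"
print(q3_accuracy(predicted, observed))  # 90.0
```

Per-dataset figures such as those in the description are typically obtained by pooling or averaging this quantity over all proteins in the test set; which of the two the authors used is not stated in this record.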
format Article
id doaj-art-0b167a8f74f8481d9f471f9b9a1bd774
institution Kabale University
issn 2001-0370
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj-art-0b167a8f74f8481d9f471f9b9a1bd774
2025-01-09T06:13:46Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2025-01-01
27
243
251
Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings
Sotiris Chatzimiltis (0)
Michalis Agathocleous (1)
Vasilis J. Promponas (2)
Chris Christodoulou (3)
(0) University of Cyprus, Department of Computer Science, Nicosia, Cyprus; 5G/6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, United Kingdom
(1) University of Cyprus, Department of Computer Science, Nicosia, Cyprus; University of Nicosia, Department of Computer Science, Nicosia, Cyprus
(2) University of Cyprus, Department of Biological Sciences, Nicosia, Cyprus; Corresponding author.
(3) University of Cyprus, Department of Computer Science, Nicosia, Cyprus
title Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings
topic Convolutional neural networks
Deep learning
Embeddings
Hessian free optimisation
Protein secondary structure prediction
url http://www.sciencedirect.com/science/article/pii/S2001037024004446