Navigating the unstructured by evaluating AlphaFold's efficacy in predicting missing residues and structural disorder in proteins.

Bibliographic Details
Main Author: Sen Zheng
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0313812
author Sen Zheng
collection DOAJ
description This study investigated regions with undefined structure, known as "missing" segments in X-ray crystallography and cryo-electron microscopy (cryo-EM) data, by assessing their predicted structural confidence and disorder scores. Using a comprehensive dataset from the Protein Data Bank (PDB), residues were categorized as "modeled", "hard missing", or "soft missing" based on their visibility in structural datasets. Key features were determined, including the predicted local distance difference test (pLDDT) confidence score from AlphaFold2, an advanced structure prediction tool, and a disorder score from IUPred, a traditional disorder prediction method. To improve prediction performance for unstructured residues, we employed a Long Short-Term Memory (LSTM) model that integrates both scores with amino acid sequences. Notable patterns in composition, region length, and prediction scores were observed in the unstructured residues and regions identified through structural experiments over the studied period. Our findings also indicate that "hard missing" residues often align with low confidence scores, whereas "soft missing" residues exhibit dynamic behavior that can complicate predictions. Incorporating pLDDT, IUPred scores, and sequence data into the LSTM model improved the differentiation between structured and unstructured residues, particularly for shorter unstructured regions. This research clarifies the relationship between established computational predictions and experimental structural data, improving our ability to target structurally significant areas for research and guiding experimental designs toward functionally relevant regions.
format Article
id doaj-art-3366300955fb453eb2014ffe8b32a250
institution OA Journals
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
title Navigating the unstructured by evaluating AlphaFold's efficacy in predicting missing residues and structural disorder in proteins.
url https://doi.org/10.1371/journal.pone.0313812