Interpretable protein-DNA interactions captured by structure-sequence optimization

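Sequence-specific DNA recognition underlies essential processes in gene regulation, yet methods for simultaneous predictions of genomic DNA recognition sites and their binding affinity remain lacking. Here, we present the Interpretable protein-DNA Energy Associative (IDEA) model, a residue-level, interpretable biophysical model capable of predicting binding sites and affinities of DNA-binding proteins. By fusing structures and sequences of known protein-DNA complexes into an optimized energy model, IDEA enables direct interpretation of physicochemical interactions among individual amino acids and nucleotides. We demonstrate that this energy model can accurately predict DNA recognition sites and their binding strengths across various protein families. Additionally, the IDEA model is integrated into a coarse-grained simulation framework that quantitatively captures the absolute protein-DNA binding free energies. Overall, IDEA provides an integrated computational platform that alleviates experimental costs and biases in assessing DNA recognition and can be utilized for mechanistic studies of various DNA-recognition processes.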

Bibliographic Details
Main Authors: Yafan Zhang, Irene Silvernail, Zhuyang Lin, Xingcheng Lin
Format: Article
Language: English
Published: eLife Sciences Publications Ltd 2025-07-01
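Series: eLife
Subjects: data-driven modeling; structure-sequence integration; protein-DNA binding affinity prediction; genomic binding sites predictions; sequence-specific simulation
Online Access: https://elifesciences.org/articles/105565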
_version_ 1849719738019610624
author Yafan Zhang
Irene Silvernail
Zhuyang Lin
Xingcheng Lin
collection DOAJ
description Sequence-specific DNA recognition underlies essential processes in gene regulation, yet methods for simultaneous predictions of genomic DNA recognition sites and their binding affinity remain lacking. Here, we present the Interpretable protein-DNA Energy Associative (IDEA) model, a residue-level, interpretable biophysical model capable of predicting binding sites and affinities of DNA-binding proteins. By fusing structures and sequences of known protein-DNA complexes into an optimized energy model, IDEA enables direct interpretation of physicochemical interactions among individual amino acids and nucleotides. We demonstrate that this energy model can accurately predict DNA recognition sites and their binding strengths across various protein families. Additionally, the IDEA model is integrated into a coarse-grained simulation framework that quantitatively captures the absolute protein-DNA binding free energies. Overall, IDEA provides an integrated computational platform that alleviates experimental costs and biases in assessing DNA recognition and can be utilized for mechanistic studies of various DNA-recognition processes.
format Article
id doaj-art-1086ffac8a0d4af4a10b03bd4ed84eaa
institution DOAJ
issn 2050-084X
language English
publishDate 2025-07-01
publisher eLife Sciences Publications Ltd
record_format Article
series eLife
spelling doaj-art-1086ffac8a0d4af4a10b03bd4ed84eaa 2025-08-20T03:12:05Z
doi 10.7554/eLife.105565
volume 14
author_orcid Yafan Zhang https://orcid.org/0000-0002-7867-2873
Irene Silvernail https://orcid.org/0009-0003-3070-974X
Zhuyang Lin https://orcid.org/0009-0009-0480-7024
Xingcheng Lin https://orcid.org/0000-0002-9378-6174
author_affiliation Yafan Zhang: Bioinformatics Research Center, North Carolina State University, Raleigh, United States
Irene Silvernail: Department of Physics, North Carolina State University, Raleigh, United States
Zhuyang Lin: Bioinformatics Research Center, North Carolina State University, Raleigh, United States
Xingcheng Lin: Bioinformatics Research Center, North Carolina State University, Raleigh, United States; Department of Physics, North Carolina State University, Raleigh, United States
title Interpretable protein-DNA interactions captured by structure-sequence optimization
topic data-driven modeling
structure-sequence integration
protein-DNA binding affinity prediction
genomic binding sites predictions
sequence-specific simulation
url https://elifesciences.org/articles/105565