HiCDiffusion - diffusion-enhanced, transformer-based prediction of chromatin interactions from DNA sequences

Bibliographic Details
Main Authors: Mateusz Chiliński, Dariusz Plewczynski
Format: Article
Language: English
Published: BMC 2024-10-01
Series: BMC Genomics
Online Access: https://doi.org/10.1186/s12864-024-10885-z
Description
Summary: Predicting chromatin interactions from DNA sequence has been a significant research challenge in recent years. Several solutions have been proposed, most of them based on an encoder-decoder architecture in which the 1D sequence is convolved, encoded into a latent representation, and then decoded with 2D convolutions into the Hi-C matrix of pairwise chromatin spatial proximity. Although these methods obtain high correlation scores and improved metrics, the Hi-C matrices they produce look artificial: they are blurred as a consequence of the deep learning model architecture. In our study, we propose HiCDiffusion, a sequence-only model that addresses this problem. We first train the encoder-decoder neural network and then use it as a component of a diffusion model, guiding the diffusion with the latent representation of the sequence as well as the final output of the encoder-decoder. In this way, we obtain high-resolution Hi-C matrices that better resemble the experimental results, improving the Fréchet inception distance by a factor of 11 on average and by a factor of 56 at best, while achieving classic metrics similar to those of current state-of-the-art encoder-decoder architectures used for the task.
ISSN: 1471-2164