HiCDiffusion - diffusion-enhanced, transformer-based prediction of chromatin interactions from DNA sequences

Bibliographic Details
Main Authors: Mateusz Chiliński, Dariusz Plewczynski
Format: Article
Language: English
Published: BMC, 2024-10-01
Series: BMC Genomics
Online Access: https://doi.org/10.1186/s12864-024-10885-z
Collection: DOAJ (Directory of Open Access Journals)

Abstract
Predicting chromatin interactions from DNA sequence has been a significant research challenge in recent years. Several solutions have been proposed, most of them based on an encoder-decoder architecture: the 1D sequence is passed through convolutions, encoded into a latent representation, and then decoded with 2D convolutions into a Hi-C matrix of pairwise chromatin spatial proximity. Although these methods achieve high correlation scores and improved metrics, the Hi-C matrices they produce look artificial: they are blurred as a result of the deep learning model architecture. In this study, we propose HiCDiffusion, a sequence-only model that addresses this problem. We first train the encoder-decoder neural network and then use it as a component of a diffusion model, guiding the diffusion with both the latent representation of the sequence and the final output of the encoder-decoder. In this way, we obtain high-resolution Hi-C matrices that not only resemble the experimental results more closely, improving the Fréchet inception distance by an average of 11 times (and by up to 56 times), but also achieve classic metrics similar to those of the current state-of-the-art encoder-decoder architectures used for this task.
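For reference, the Fréchet inception distance (FID) cited in the abstract is, in its standard definition, the Fréchet distance between two multivariate Gaussians fitted to feature vectors that an Inception network extracts from real and from generated images. This record does not say how the authors extracted features from Hi-C matrices, so the symbols below follow the generic definition and are assumptions, not the paper's notation: $\mu_r, \Sigma_r$ are the mean and covariance of features from experimental Hi-C matrices, and $\mu_g, \Sigma_g$ are those from predicted matrices.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

FID is zero when the two fitted distributions coincide, and lower values indicate that predicted matrices are statistically closer to experimental ones. An improvement "by an average of 11 times" therefore presumably means an FID roughly 11 times lower than that of the encoder-decoder baseline.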
ISSN: 1471-2164
Volume: 25
Affiliation (both authors): Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology
Keywords: 3D genomics; Hi-C; Machine learning; Artificial intelligence