HiCDiffusion - diffusion-enhanced, transformer-based prediction of chromatin interactions from DNA sequences
Abstract: Prediction of chromatin interactions from DNA sequence has been a significant research challenge in the last couple of years. Several solutions have been proposed, most of which are based on an encoder-decoder architecture, where the 1D sequence is convolved, encoded into the latent representatio...
| Main Authors: | Mateusz Chiliński, Dariusz Plewczynski |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2024-10-01 |
| Series: | BMC Genomics |
| Subjects: | 3D genomics; Hi-C; Machine learning; Artificial intelligence |
| Online Access: | https://doi.org/10.1186/s12864-024-10885-z |
| _version_ | 1850182323201376256 |
|---|---|
| author | Mateusz Chiliński Dariusz Plewczynski |
| author_sort | Mateusz Chiliński |
| collection | DOAJ |
| description | Abstract: Prediction of chromatin interactions from DNA sequence has been a significant research challenge in the last couple of years. Several solutions have been proposed, most of which are based on an encoder-decoder architecture, where the 1D sequence is convolved, encoded into the latent representation, and then decoded using 2D convolutions into the Hi-C pairwise chromatin spatial proximity matrix. Those methods, while obtaining high correlation scores and improved metrics, produce Hi-C matrices that are artificial - they are blurred due to the deep learning model architecture. In our study, we propose HiCDiffusion, a sequence-only model that addresses this problem. We first train the encoder-decoder neural network and then use it as a component of the diffusion model - where we guide the diffusion using a latent representation of the sequence, as well as the final output from the encoder-decoder. That way, we obtain high-resolution Hi-C matrices that not only better resemble the experimental results - improving the Fréchet inception distance by a factor of 11 on average, with the highest improvement a factor of 56 - but also achieve classic metrics similar to those of current state-of-the-art encoder-decoder architectures used for the task. |
| format | Article |
| id | doaj-art-c7b7196adf184ba6a41bcce9375e5cb0 |
| institution | OA Journals |
| issn | 1471-2164 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Genomics |
| spelling | doaj-art-c7b7196adf184ba6a41bcce9375e5cb0; 2025-08-20T02:17:39Z; eng; BMC; BMC Genomics; 1471-2164; 2024-10-01; 25; 1; 1; 12; 10.1186/s12864-024-10885-z; HiCDiffusion - diffusion-enhanced, transformer-based prediction of chromatin interactions from DNA sequences; Mateusz Chiliński (0); Dariusz Plewczynski (1); Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology; Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology; Abstract: Prediction of chromatin interactions from DNA sequence has been a significant research challenge in the last couple of years. Several solutions have been proposed, most of which are based on an encoder-decoder architecture, where the 1D sequence is convolved, encoded into the latent representation, and then decoded using 2D convolutions into the Hi-C pairwise chromatin spatial proximity matrix. Those methods, while obtaining high correlation scores and improved metrics, produce Hi-C matrices that are artificial - they are blurred due to the deep learning model architecture. In our study, we propose HiCDiffusion, a sequence-only model that addresses this problem. We first train the encoder-decoder neural network and then use it as a component of the diffusion model - where we guide the diffusion using a latent representation of the sequence, as well as the final output from the encoder-decoder. That way, we obtain high-resolution Hi-C matrices that not only better resemble the experimental results - improving the Fréchet inception distance by a factor of 11 on average, with the highest improvement a factor of 56 - but also achieve classic metrics similar to those of current state-of-the-art encoder-decoder architectures used for the task.; https://doi.org/10.1186/s12864-024-10885-z; 3D genomics; Hi-C; Machine learning; Artificial intelligence |
| title | HiCDiffusion - diffusion-enhanced, transformer-based prediction of chromatin interactions from DNA sequences |
| topic | 3D genomics Hi-C Machine learning Artificial intelligence |
| url | https://doi.org/10.1186/s12864-024-10885-z |