Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks.

Bibliographic Details
Main Authors: Ramzan Kh Umarov, Victor V Solovyev
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0171410&type=printable
collection DOAJ
description Accurate computational identification of promoters remains a challenge because these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we use convolutional neural networks (CNNs) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build predictive models for them. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). A CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (sensitivity Sn = 0.90, specificity Sp = 0.96, correlation coefficient CC = 0.84). The CNN model for Bacillus subtilis promoter identification achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs to identify two well-known promoter classes (TATA and non-TATA promoters). The CNN models recognize these complex functional regions well. For human promoters, the Sn/Sp/CC of prediction reached 0.95/0.98/0.90 on TATA and 0.90/0.98/0.89 on non-TATA promoter sequences. For Arabidopsis we observed Sn/Sp/CC of 0.95/0.97/0.91 (TATA) and 0.94/0.94/0.86 (non-TATA). Thus the CNN models, implemented in the CNNProm program, demonstrate the ability of the deep learning approach to capture complex promoter sequence characteristics and achieve significantly higher accuracy than previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. Because the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in the sequences of many other, and especially newly sequenced, genomes. The CNNProm program is available to run at the web server http://www.softberry.com.
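The Sn, Sp and CC figures quoted in the abstract can be related to raw classification counts with a short sketch. It assumes that CC denotes the Matthews correlation coefficient, which the abstract does not spell out; the function name `promoter_metrics` and the example counts are hypothetical and are not taken from the paper or from CNNProm.

```python
import math


def promoter_metrics(tp, tn, fp, fn):
    """Return (Sn, Sp, CC) from confusion-matrix counts.

    tp: promoters predicted as promoters
    tn: non-promoters predicted as non-promoters
    fp: non-promoters predicted as promoters
    fn: promoters predicted as non-promoters
    """
    # Sensitivity: fraction of true promoters that were recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-promoters correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Assumption: CC is the Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, cc


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 100 + 100 sequences.
    sn, sp, cc = promoter_metrics(tp=90, tn=96, fp=4, fn=10)
    print(f"Sn = {sn:.2f}, Sp = {sp:.2f}, CC = {cc:.2f}")
```

Because CC depends on the ratio of promoter to non-promoter sequences in the test set, the same Sn and Sp can yield different CC values; the counts above are illustrative only and are not meant to reproduce the reported results.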
format Article
id doaj-art-851c70b4513544b5852dbfefec85eaba
institution DOAJ
issn 1932-6203
spelling doaj-art-851c70b4513544b5852dbfefec85eaba | 2025-08-20T02:45:19Z | eng | Public Library of Science (PLoS) | PLoS ONE | ISSN 1932-6203 | 2017-01-01 | vol. 12, no. 2 | e0171410 | doi:10.1371/journal.pone.0171410