Finding Transcription Factor Binding Motifs for Coregulated Genes by Combining Sequence Overrepresentation with Cross-Species Conservation

Novel computational methods for finding transcription factor binding motifs have long been sought because identifying them experimentally is tedious. However, the current prevailing methods yield a large number of false positive predictions due to the short, variable nature of transcriptional fa...

Bibliographic Details
Main Authors: Hui Jia, Jinming Li
Format: Article
Language: English
Published: Wiley 2012-01-01
Series: Journal of Probability and Statistics
Online Access: http://dx.doi.org/10.1155/2012/830575
author Hui Jia
Jinming Li
collection DOAJ
description Novel computational methods for finding transcription factor binding motifs have long been sought because identifying them experimentally is tedious. However, the current prevailing methods yield a large number of false positive predictions due to the short, variable nature of transcription factor binding sites (TFBSs). We propose here a method that combines sequence overrepresentation and cross-species sequence conservation to detect TFBSs in the upstream regions of a given set of coregulated genes. We applied the method to 35 S. cerevisiae transcription factors with known DNA binding motifs (with the support of orthologous sequences from the genomes of S. mikatae, S. bayanus, and S. paradoxus), and the proposed method outperformed the single-genome-based motif finding methods MEME and AlignACE as well as the multiple-genome-based methods PHYME and Footprinter for the majority of these transcription factors. Compared with the prevailing motif finding software, our method has some advantages in finding transcription factor binding motifs for potentially coregulated genes when the upstream sequences of these genes are available for multiple closely related species. Although we used yeast genomes to assess our method in this study, it might also be applied to other organisms if suitable related species are available and the upstream sequences of the coregulated genes can be obtained for them.
format Article
id doaj-art-d190fe1c3d484acbac8e1832d68a12d7
institution DOAJ
issn 1687-952X
1687-9538
language English
publishDate 2012-01-01
publisher Wiley
record_format Article
series Journal of Probability and Statistics
spelling doaj-art-d190fe1c3d484acbac8e1832d68a12d7; 2025-08-20T03:24:00Z; eng; Wiley; Journal of Probability and Statistics; 1687-952X; 1687-9538; 2012-01-01; 2012; 10.1155/2012/830575; 830575; Finding Transcription Factor Binding Motifs for Coregulated Genes by Combining Sequence Overrepresentation with Cross-Species Conservation; Hui Jia (0); Jinming Li (1); School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore; School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore; http://dx.doi.org/10.1155/2012/830575
title Finding Transcription Factor Binding Motifs for Coregulated Genes by Combining Sequence Overrepresentation with Cross-Species Conservation
url http://dx.doi.org/10.1155/2012/830575
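
The description in this record names two signals, sequence overrepresentation in the upstream regions of coregulated genes and conservation across closely related species, but does not give the scoring scheme. The Python sketch below is only an illustration of how two such signals could be combined for fixed-length words (k-mers); it is not the authors' algorithm. The function names, the add-one smoothing, the "present in every ortholog" conservation rule, the product score, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers found in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_kmers(target, background, orthologs, k=5):
    """Score each k-mer by enrichment times conservation (hypothetical scheme).

    target:     dict gene -> upstream sequence in the reference species
    background: list of upstream sequences from genes outside the set
    orthologs:  dict gene -> list of orthologous upstream sequences
    """
    # Count, per k-mer, how many sequences contain it at least once.
    fg = Counter()
    for seq in target.values():
        fg.update(kmers(seq, k))
    bg = Counter()
    for seq in background:
        bg.update(kmers(seq, k))

    scores = {}
    for kmer, n_fg in fg.items():
        # Overrepresentation: fraction of target sequences with the k-mer
        # divided by a smoothed fraction of background sequences with it.
        fg_frac = n_fg / len(target)
        bg_frac = (bg[kmer] + 1) / (len(background) + 1)
        enrichment = fg_frac / bg_frac

        # Conservation: among target genes carrying the k-mer, the fraction
        # whose orthologous upstream sequences all carry it as well.
        carriers = [g for g, s in target.items() if kmer in s]
        conserved = sum(
            1 for g in carriers
            if orthologs.get(g) and all(kmer in o for o in orthologs[g])
        )
        conservation = conserved / len(carriers)

        scores[kmer] = enrichment * conservation
    return scores


# Toy data, invented for illustration only (not real yeast sequences).
target = {"g1": "AAACGCGTAA", "g2": "TTCGCGTGG", "g3": "CCCGCGTCC"}
background = ["AAAAAAAAAA", "TTTTTTTTTT", "GGGGGGGGGG", "CCCCCCCCCC"]
orthologs = {
    "g1": ["TACGCGTTT", "GGCGCGTAA"],
    "g2": ["ACGCGTA"],
    "g3": ["CCCCCCCC"],
}

scores = score_kmers(target, background, orthologs, k=5)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Multiplying the two terms means a word that is frequent in the gene set but never conserved scores zero, which reflects the idea in the description that conservation is used to suppress false positives. A real implementation would need degenerate motifs, reverse complements, and aligned orthologous positions, none of which this sketch attempts.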