In silico gene prioritization by integrating multiple data sources.

Identifying disease genes is crucial to the understanding of disease pathogenesis, and to the improvement of disease diagnosis and treatment. In recent years, many researchers have proposed approaches to prioritize candidate genes by considering the relationship of candidate genes and existing known...

Bibliographic Details
Main Authors: Yixuan Chen, Wenhui Wang, Yingyao Zhou, Robert Shields, Sumit K Chanda, Robert C Elston, Jing Li
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2011-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0021137&type=printable
collection DOAJ
description Identifying disease genes is crucial to understanding disease pathogenesis and to improving disease diagnosis and treatment. In recent years, many researchers have proposed approaches that prioritize candidate genes by considering the relationship between candidate genes and known disease genes, as reflected in other data sources. In this paper, we propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graphic representation. Gene-gene relationships and gene-disease relationships are then defined based on the overall topology of each network using a diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is used to rank all candidate genes. Based on the informativeness of the available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed a large-scale cross-validation analysis on 110 disease families using three data sources. The results show that our approach consistently outperforms two other state-of-the-art programs. A case study on Parkinson disease (PD) identified four candidate genes (UBB, SEPT5, GPR37 and TH) that ranked higher than our adaptive threshold, all of which are involved in the PD pathway. In particular, a very recent study observed a deletion of TH in a patient with PD, which supports the importance of the TH gene in PD pathogenesis. A web tool has been implemented to assist scientists in their genetic studies.
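The ranking idea in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the kernel parameter, the normalization, or the adaptive threshold, so the sketch assumes the standard diffusion kernel K = exp(-beta * L) on an undirected network, a z-score normalization within each network, and a plain sum across networks. The function names, the value of `beta`, and the two toy networks are hypothetical.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Diffusion kernel K = exp(-beta * L) with L = D - A (undirected network).

    beta is an assumed diffusion parameter; the record does not state one.
    """
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # L is symmetric, so the matrix exponential follows from its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def zscore(x):
    """Standardize scores within one network (assumed normalization)."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def prioritize(networks, seeds, candidates, beta=0.5):
    """Rank candidate genes by summed, normalized kernel similarity to seed genes."""
    total = np.zeros(len(candidates))
    for adjacency in networks:
        k = diffusion_kernel(adjacency, beta)
        # Gene-disease score: kernel similarity of each candidate to all known disease genes.
        scores = k[np.ix_(candidates, seeds)].sum(axis=1)
        total += zscore(scores)
    order = np.argsort(-total)
    return [(candidates[i], float(total[i])) for i in order]


# Toy data: four genes (indices 0..3) observed in two hypothetical networks.
net_a = np.array([[0, 1, 0, 0],   # path 0 - 1 - 2 - 3
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
net_b = np.array([[0, 1, 1, 0],   # path 1 - 0 - 2 - 3
                  [1, 0, 0, 0],
                  [1, 0, 0, 1],
                  [0, 0, 1, 0]])

# Gene 0 plays the role of a known disease gene; genes 1..3 are candidates.
ranking = prioritize([net_a, net_b], seeds=[0], candidates=[1, 2, 3])
for gene, score in ranking:
    print(gene, round(score, 3))
```

Because the kernel depends on the whole network topology rather than only on direct edges, a candidate several steps from a known disease gene still receives a nonzero score. The adaptive threshold mentioned in the description is not sketched here, since the record gives no detail on how it is computed.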
id doaj-art-5c4d69caf8604ed482ed02ae2c8e1bcc
institution OA Journals
issn 1932-6203
citation Yixuan Chen, Wenhui Wang, Yingyao Zhou, Robert Shields, Sumit K Chanda, Robert C Elston, Jing Li. In silico gene prioritization by integrating multiple data sources. PLoS ONE 6(6): e21137 (2011). doi:10.1371/journal.pone.0021137