Self-contained gene-set analysis of expression data: an evaluation of existing and novel methods.

| Main Authors: | Brooke L Fridley, Gregory D Jenkins, Joanna M Biernacka |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2010-09-01 |
| Series: | PLoS ONE |
| Online Access: | https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0012693&type=printable |


| Field | Value |
|---|---|
| author | Brooke L Fridley, Gregory D Jenkins, Joanna M Biernacka |
| collection | DOAJ |
| description | Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as a disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed; they can be divided into two types: competitive and self-contained. Self-contained methods can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete, and time-to-event phenotypes. To assess the power and type I error rate of the various previously proposed and novel approaches, we completed an extensive simulation study in which the scenarios varied in the number of genes in a gene set, the number of genes associated with the phenotype, the effect sizes, the correlation between the expression of genes within a gene set, and the sample size. In addition to the simulated data, the methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that, overall, Fisher's method and the global model with random effects had the highest power across a wide range of scenarios, while the analysis based on the first principal component and the Kolmogorov-Smirnov test tended to have the lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits. |
| id | doaj-art-116d20bef5684f2b92362fa882ff4fbb |
| issn | 1932-6203 |
| citation | PLoS ONE 5(9): e12693 (2010). doi:10.1371/journal.pone.0012693 |
| title | Self-contained gene-set analysis of expression data: an evaluation of existing and novel methods. |
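
The description names Fisher's method as one of the most powerful self-contained approaches but does not spell it out. What follows is a minimal sketch in Python (standard library only). It rests on assumptions of our own, not details stated in this record. Each gene's association with a continuous phenotype is tested with a Pearson correlation, and an approximate p-value comes from Fisher's z-transform. The gene-level p-values are combined as X = -2 Σ ln p_i, and the significance of X is assessed by permuting the phenotype across samples, which leaves the correlation between genes in the set intact. The chi-square reference with 2k degrees of freedom holds only if the k genes are independent, so it is included for comparison only. All function names are hypothetical, and this is not presented as the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def fisher_statistic(pvalues: Sequence[float]) -> float:
    """Fisher's combined statistic X = -2 * sum(ln p_i)."""
    return -2.0 * sum(math.log(p) for p in pvalues)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > x) for an even number of df,
    via the closed form exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_pvalue_independent(pvalues: Sequence[float]) -> float:
    """Reference p-value assuming the k genes are independent (X ~ chi2 with 2k df).
    With positively correlated genes this is typically anti-conservative."""
    return chi2_sf_even_df(fisher_statistic(pvalues), 2 * len(pvalues))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant vector: treat as no association
    return sxy / math.sqrt(sxx * syy)


def gene_pvalue(expr: Sequence[float], phenotype: Sequence[float]) -> float:
    """Approximate two-sided p-value for H0: rho = 0 (assumed gene-level test),
    using z = atanh(r) * sqrt(n - 3) against a standard normal; needs n > 3."""
    n = len(phenotype)
    r = max(min(pearson_r(expr, phenotype), 1.0 - 1e-12), -1.0 + 1e-12)
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return max(p, 1e-300)  # avoid log(0) when erfc underflows


def fisher_permutation_test(
    genes: List[Sequence[float]],  # one expression vector per gene, same sample order
    phenotype: Sequence[float],
    n_perm: int = 999,
    seed: int = 0,
) -> Tuple[float, float]:
    """Self-contained gene-set test: shuffle the phenotype across samples, which
    keeps each sample's expression profile (and so gene-gene correlation) intact."""
    rng = random.Random(seed)
    observed = fisher_statistic([gene_pvalue(g, phenotype) for g in genes])
    y = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        stat = fisher_statistic([gene_pvalue(g, y) for g in genes])
        if stat >= observed:
            exceed += 1
    # add-one correction so the permutation p-value is never exactly zero
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    n_samples = 40
    genes = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(5)]
    # toy phenotype driven by the first gene only (simulated, not the study's data)
    phenotype = [2.0 * genes[0][i] + rng.gauss(0.0, 0.5) for i in range(n_samples)]
    stat, p_perm = fisher_permutation_test(genes, phenotype, n_perm=199, seed=2)
    print(f"Fisher X = {stat:.1f}, permutation p = {p_perm:.3f}")
```

Permuting the phenotype rather than sampling random gene sets is what makes the test self-contained: the null hypothesis is that no gene in the set is associated with the phenotype, not that the set is less associated than other genes. Discrete or time-to-event phenotypes would need a different gene-level test in place of the correlation used here.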