ReliefSeq: a gene-wise adaptive-K nearest-neighbor feature selection tool for finding gene-gene interactions and main effects in mRNA-Seq gene expression data.

Bibliographic Details
Main Authors: Brett A McKinney, Bill C White, Diane E Grill, Peter W Li, Richard B Kennedy, Gregory A Poland, Ann L Oberg
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
ISSN: 1932-6203
Online Access: https://doi.org/10.1371/journal.pone.0081527

Description

Relief-F is a nonparametric, nearest-neighbor machine learning method that has been used successfully to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests.

We demonstrate performance on a panel of simulated data sets with a range of distributional properties observed in real mRNA-seq data, including multiple transcripts with main effects and interaction effects of varying sizes. For simulated main effects, gwak-Relief-F feature selection performs comparably to the standard tools DESeq and edgeR at ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes, except in the highest fold-change/highest-signal situations, where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests at detecting relevant genes in all simulation experiments. In addition, the computational time of Relief-F is comparable to that of the other methods. We also apply ReliefSeq to an RNA-Seq study of a smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples.

ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has the power to detect both main effects and interaction effects. Software availability: http://insilico.utulsa.edu/ReliefSeq.php.
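The core idea, scoring each gene by how much it differs between a sample and its k nearest neighbors of the other class versus its k nearest neighbors of the same class, and then choosing k separately for each gene, can be sketched in a few lines. This is a minimal Python sketch, not the ReliefSeq implementation. It assumes two classes, uses every sample as a target instance, measures distance as the Manhattan distance on range-scaled values, and interprets "adapting k to optimize the score" as keeping each gene's maximum score over a user-supplied range of k. The function names `relieff_scores` and `gwak_relieff` are hypothetical.

```python
"""Minimal sketch of Relief-F with a gene-wise adaptive k (hypothetical, not ReliefSeq).

Assumptions: two classes, every sample used as a target instance, Manhattan
distance on range-scaled values, and each gene keeps its maximum score over k.
"""
from typing import Dict, List, Sequence, Tuple


def relieff_scores(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[float]:
    """Relief-F importance score of every gene (column of X) using k hits and k misses."""
    n, p = len(X), len(X[0])
    class_sizes = [sum(1 for c in y if c == label) for label in set(y)]
    if len(class_sizes) != 2 or not 1 <= k <= min(class_sizes) - 1:
        raise ValueError("need two classes and 1 <= k <= (smallest class size - 1)")

    # Per-gene range, so each gene's difference is scaled into [0, 1].
    ranges = []
    for a in range(p):
        column = [row[a] for row in X]
        ranges.append((max(column) - min(column)) or 1.0)

    def diff(a: int, i: int, j: int) -> float:
        return abs(X[i][a] - X[j][a]) / ranges[a]

    def dist(i: int, j: int) -> float:
        # Neighbors are found using all genes at once, which is what lets
        # Relief-F respond to genes whose effect only shows up jointly.
        return sum(diff(a, i, j) for a in range(p))

    w = [0.0] * p
    for i in range(n):
        neighbors = sorted((dist(i, j), j) for j in range(n) if j != i)
        hits = [j for _, j in neighbors if y[j] == y[i]][:k]
        misses = [j for _, j in neighbors if y[j] != y[i]][:k]
        for a in range(p):
            # Penalize differences among nearest same-class samples (hits) and
            # reward differences from nearest other-class samples (misses).
            w[a] -= sum(diff(a, i, h) for h in hits) / (n * k)
            w[a] += sum(diff(a, i, m) for m in misses) / (n * k)
    return w


def gwak_relieff(X: Sequence[Sequence[float]], y: Sequence[int],
                 k_values: Sequence[int]) -> List[Tuple[float, int]]:
    """For each gene, return (best score over k_values, the k that achieved it)."""
    per_k: Dict[int, List[float]] = {k: relieff_scores(X, y, k) for k in k_values}
    best = []
    for a in range(len(X[0])):
        k_best = max(k_values, key=lambda k: per_k[k][a])
        best.append((per_k[k_best][a], k_best))
    return best


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    X, y = [], []
    # Balanced toy design: class is the XOR of genes 0 and 1 (a pure interaction,
    # neither gene's expected value differs between classes); gene 2 is noise.
    for _ in range(10):
        for g0 in (0, 1):
            for g1 in (0, 1):
                X.append([g0 + rng.gauss(0, 0.1), g1 + rng.gauss(0, 1) * 0.1, rng.gauss(0, 1)])
                y.append(g0 ^ g1)
    for gene, (score, k) in enumerate(gwak_relieff(X, y, range(1, 10))):
        print(f"gene {gene}: best score {score:.3f} at k = {k}")
```

Because neighbors are found in the space of all genes, the two interacting genes in the demo can receive positive scores even though neither carries a marginal class signal on its own. The sketch works directly on the values it is given and does no count normalization or transformation, which a real RNA-seq analysis would typically apply first.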