A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data.

Bibliographic Details
Main Authors: Aaditya V Rangan, Caroline C McGrouther, John Kelsoe, Nicholas Schork, Eli Stahl, Qian Zhu, Arjun Krishnan, Vicky Yao, Olga Troyanskaya, Seda Bilaloglu, Preeti Raghavan, Sarah Bergen, Anders Jureus, Mikael Landen, Bipolar Disorders Working Group of the Psychiatric Genomics Consortium
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-05-01
Series: PLoS Computational Biology
Online Access: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006105&type=printable
collection DOAJ
description A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., 'loops') within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).
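The loop-counting idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the data-matrix has already been centered and is binarized by sign, and it omits the corrections for controls, covariates, and sparsity. The function names `loop_scores` and `loop_count_bicluster`, the target sizes `n_rows` and `n_cols`, and the removal fraction `frac` are hypothetical choices made for this illustration.

```python
import numpy as np


def loop_scores(B):
    """Signed loop counts for a matrix B with entries in {-1, +1}.

    A 2-by-2 submatrix of signs has rank 1 exactly when the product of its
    four entries is +1, so summing that product over all loops through a row
    (or column) gives (#rank-1 loops) - (#rank-2 loops), plus a constant from
    degenerate loops that is the same for every row (or column).
    """
    G = B @ B.T  # G[j, j'] = sum_k B[j, k] * B[j', k]
    H = B.T @ B  # H[k, k'] = sum_j B[j, k] * B[j, k']
    row_scores = (G ** 2).sum(axis=1)
    col_scores = (H ** 2).sum(axis=1)
    return row_scores, col_scores


def loop_count_bicluster(D, n_rows, n_cols, frac=0.05):
    """Repeatedly discard the lowest-scoring rows and columns.

    Stops once n_rows rows and n_cols columns remain and returns their
    indices in the original matrix D.
    """
    B = np.where(np.asarray(D) >= 0, 1, -1)  # assumption: binarize by sign
    rows = np.arange(B.shape[0])
    cols = np.arange(B.shape[1])
    while len(rows) > n_rows or len(cols) > n_cols:
        r, c = loop_scores(B[np.ix_(rows, cols)])
        if len(rows) > n_rows:
            drop = min(max(1, int(frac * len(rows))), len(rows) - n_rows)
            rows = rows[np.sort(np.argsort(r)[drop:])]
        if len(cols) > n_cols:
            drop = min(max(1, int(frac * len(cols))), len(cols) - n_cols)
            cols = cols[np.sort(np.argsort(c)[drop:])]
    return rows, cols
```

Calling `loop_count_bicluster(D, 20, 25)` on a matrix with a planted rank-1 block would be expected to return mostly the planted rows and columns when the block is large relative to the noise. In practice the target size is not known in advance, so a real analysis would need some criterion for where to stop.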
id doaj-art-3516fd8d65ac40969c8cda7731788d4e
institution OA Journals
issn 1553-734X
1553-7358
citation PLoS Computational Biology 14(5): e1006105 (2018-05-01)
doi 10.1371/journal.pcbi.1006105