The effect of non-linear signal in classification problems using gene expression.

Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second,...

Bibliographic Details
Main Authors:	Benjamin J Heil, Jake Crawford, Casey S Greene
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2023-03-01
Series:	PLoS Computational Biology
Online Access:	https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010984&type=printable

author	Benjamin J Heil, Jake Crawford, Casey S Greene
collection	DOAJ
description	Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because, while biological systems are high-dimensional, effective dividing lines for predictive models may not be.
format	Article
id	doaj-art-1a94026c384a4ba5bca5cb2647c9c2bd
institution	DOAJ
issn	1553-734X 1553-7358
language	English
publishDate	2023-03-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS Computational Biology
spelling	doaj-art-1a94026c384a4ba5bca5cb2647c9c2bd | 2025-08-20T02:49:29Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X | 1553-7358 | 2023-03-01 | 19 | 3 | e1010984 | 10.1371/journal.pcbi.1010984 | The effect of non-linear signal in classification problems using gene expression. | Benjamin J Heil | Jake Crawford | Casey S Greene | Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because while biological systems are high-dimensional, effective dividing lines for predictive models may not be. | https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010984&type=printable
title	The effect of non-linear signal in classification problems using gene expression.
url	https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010984&type=printable
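
The central experiment in the description (remove the linear signal tied to a label, then check whether a linear and a non-linear classifier can still predict it) can be illustrated with a small synthetic sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the data are simulated rather than drawn from GTEx or Recount3, per-class mean-centering stands in for Limma, and a k-nearest-neighbours classifier stands in for the multi-layer neural network. All function names are hypothetical.

```python
import math
import random


def make_data(n, shift=3.0, seed=0):
    """Toy 'expression' matrix (assumption, not real data).

    x1 and x2 carry XOR-style non-linear signal (label = sign agreement);
    x3 carries linear signal through a class-dependent mean shift.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        label = 1 if x1 * x2 > 0 else 0
        X.append([x1, x2, rng.gauss(0, 1) + shift * label])
        y.append(label)
    return X, y


def remove_linear_signal(X, y):
    """Stand-in for Limma: subtract each feature's per-class mean.

    For a categorical label this equals the residual of an ordinary
    least-squares fit of each feature on the label (with intercept).
    """
    n_feat = len(X[0])
    means = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
    return [[x[j] - means[lab][j] for j in range(n_feat)] for x, lab in zip(X, y)]


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Logistic regression by full-batch gradient descent on centered features."""
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, lab in zip(X, y):
            z = b + sum(w[j] * (x[j] - mu[j]) for j in range(d))
            err = 1.0 / (1.0 + math.exp(-z)) - lab
            gb += err
            for j in range(d):
                gw[j] += err * (x[j] - mu[j])
        b -= lr * gb / n
        w = [w[j] - lr * gw[j] / n for j in range(d)]
    return lambda x: 1 if b + sum(w[j] * (x[j] - mu[j]) for j in range(d)) > 0 else 0


def fit_knn(X, y, k=15):
    """Non-linear stand-in for the neural network: majority vote of k neighbours."""
    def predict(x):
        nearest = sorted(
            zip(X, y), key=lambda p: sum((a - c) ** 2 for a, c in zip(p[0], x))
        )[:k]
        return 1 if 2 * sum(lab for _, lab in nearest) > k else 0
    return predict


def accuracy(model, X, y):
    return sum(model(x) == lab for x, lab in zip(X, y)) / len(y)


X, y = make_data(600)
X_resid = remove_linear_signal(X, y)
results = {}
for name, data in (("raw", X), ("linear signal removed", X_resid)):
    train_X, test_X = data[:400], data[400:]
    train_y, test_y = y[:400], y[400:]
    for model_name, fit in (("logistic", fit_logistic), ("knn", fit_knn)):
        acc = accuracy(fit(train_X, train_y), test_X, test_y)
        results[(name, model_name)] = acc
        print(f"{name:>22} | {model_name:<8} | accuracy = {acc:.2f}")
```

In this toy setting the linear model should do well only while the mean shift in the third feature is present, whereas the non-linear model can still use the XOR-style structure after that shift is removed. The sketch says nothing about real expression data, where, as the description notes, the presence of non-linear signal was not always enough for neural networks to outperform logistic regression.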


Similar Items