Population modeling with machine learning can enhance measures of mental health - Open-data replication

Efforts to predict trait phenotypes based on functional MRI data from large cohorts have been hampered by low prediction accuracy and/or small effect sizes. Although these findings are highly replicable, the small effect sizes are somewhat surprising given the presumed brain basis of phenotypic trai...

Bibliographic Details
Main Authors: Ty Easley, Ruiqi Chen, Kayla Hannon, Rosie Dutt, Janine Bijsterbosch
Format: Article
Language: English
Published: Elsevier 2023-06-01
Series: NeuroImage: Reports
Subjects: Replication; Prediction; Neuroticism; Intelligence; Data pollution; Resting state fMRI
Online Access: http://www.sciencedirect.com/science/article/pii/S2666956023000089
author Ty Easley
Ruiqi Chen
Kayla Hannon
Rosie Dutt
Janine Bijsterbosch
collection DOAJ
description Efforts to predict trait phenotypes based on functional MRI data from large cohorts have been hampered by low prediction accuracy and/or small effect sizes. Although these findings are highly replicable, the small effect sizes are somewhat surprising given the presumed brain basis of phenotypic traits such as neuroticism and fluid intelligence. We aim to replicate previous work and additionally test multiple data manipulations that may improve prediction accuracy by addressing data pollution challenges. Specifically, we added additional fMRI features, averaged the target phenotype across multiple measurements to obtain more accurate estimates of the underlying trait, balanced the target phenotype's distribution through undersampling of majority scores, and identified data-driven subtypes to investigate the impact of between-participant heterogeneity. Our results replicated prior results from Dadi et al. (2021) in a larger sample. Each data manipulation further led to small but consistent improvements in prediction accuracy, which were largely additive when combining multiple data manipulations. Combining data manipulations (i.e., extended fMRI features, averaged target phenotype, balanced target phenotype distribution) led to a three-fold increase in prediction accuracy for fluid intelligence compared to prior work. These findings highlight the benefit of several relatively easy and low-cost data manipulations, which may positively impact future work.
format Article
id doaj-art-06550f2f7ac04d56ac62eafa7d5459c5
institution DOAJ
issn 2666-9560
language English
publishDate 2023-06-01
publisher Elsevier
record_format Article
series NeuroImage: Reports
spelling doaj-art-06550f2f7ac04d56ac62eafa7d5459c5
timestamp: 2025-08-20T03:17:18Z
language: eng
publisher: Elsevier
series: NeuroImage: Reports
issn: 2666-9560
date: 2023-06-01
volume 3, issue 2, article 100163
doi: 10.1016/j.ynirp.2023.100163
title: Population modeling with machine learning can enhance measures of mental health - Open-data replication
authors: Ty Easley (0), Ruiqi Chen (1), Kayla Hannon (2), Rosie Dutt (3), Janine Bijsterbosch (4)
affiliation 0: Department of Radiology, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
affiliation 1: Division of Biology and Biomedical Sciences, Washington University in St. Louis, Saint Louis, Missouri, 63110, USA
affiliation 2: Department of Radiology, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
affiliation 3: Department of Radiology, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
affiliation 4: Department of Radiology, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA; Corresponding author.
keywords: Replication; Prediction; Neuroticism; Intelligence; Data pollution; Resting state fMRI
url: http://www.sciencedirect.com/science/article/pii/S2666956023000089
title Population modeling with machine learning can enhance measures of mental health - Open-data replication
topic Replication
Prediction
Neuroticism
Intelligence
Data pollution
Resting state fMRI
url http://www.sciencedirect.com/science/article/pii/S2666956023000089