Fundamentals of FAIR biomedical data analyses in the cloud using custom pipelines.

Bibliographic Details
Main Authors: Seth R Berke, Kanika Kanchan, Mary L Marazita, Eric Tobin, Ingo Ruczinski
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-07-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1013215
Description: As the biomedical data ecosystem increasingly embraces the findable, accessible, interoperable, and reusable (FAIR) data principles to publish multimodal datasets to the cloud, opportunities for cloud-based research continue to expand. Besides the potential for accelerated and diverse biomedical discovery that comes from a harmonized data ecosystem, the cloud also presents a shift away from the standard practice of duplicating data to computational clusters or local computers for analysis. However, despite these benefits, researcher migration to the cloud has lagged, in part due to insufficient educational resources to train biomedical scientists on cloud infrastructure. There exists a conceptual lack especially around the crafting of custom analytic pipelines that require software not pre-installed by cloud analysis platforms. We here present three fundamental concepts necessary for custom pipeline creation in the cloud. These overarching concepts are workflow and cloud provider agnostic, extending the utility of this education to serve as a foundation for any computational analysis running any dataset in any biomedical cloud platform. We illustrate these concepts using one of our own custom analyses, a study using the case-parent trio design to detect sex-specific genetic effects on orofacial cleft (OFC) risk, which we crafted in the biomedical cloud analysis platform CAVATICA.
ISSN: 1553-734X, 1553-7358
Volume: 21, Issue: 7, Article: e1013215