Enabling pan-repository reanalysis for big data science of public metabolomics data

Bibliographic Details
Main Authors: Yasin El Abiead, Michael Strobel, Thomas Payne, Eoin Fahy, Claire O’Donovan, Shankar Subramaniam, Juan Antonio Vizcaíno, Ozgur Yurekten, Victoria Deleray, Simone Zuffa, Shipei Xing, Helena Mannochio-Russo, Ipsita Mohanty, Haoqi Nina Zhao, Andres M. Caraballo-Rodriguez, Paulo Wender P. Gomes, Nicole E. Avalon, Trent R. Northen, Benjamin P. Bowen, Katherine B. Louie, Pieter C. Dorrestein, Mingxun Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-60067-y
collection DOAJ
description Abstract: Public untargeted metabolomics data is a growing resource for metabolite and phenotype discovery; however, accessing and utilizing these data across repositories pose significant challenges. Therefore, here we develop pan-repository universal identifiers and harmonized cross-repository metadata. This ecosystem facilitates discovery by integrating diverse data sources from public repositories including MetaboLights, Metabolomics Workbench, and GNPS/MassIVE. Our approach simplifies data handling and unlocks previously inaccessible reanalysis workflows, fostering unmatched research opportunities.
id doaj-art-1879a9528dd94a30b698df6037cfb56e
institution OA Journals
issn 2041-1723
spelling doaj-art-1879a9528dd94a30b698df6037cfb56e
2025-08-20T02:34:15Z
Nature Communications (Nature Portfolio), ISSN 2041-1723, 2025-05-01, volume 16, issue 1, pages 1-7
https://doi.org/10.1038/s41467-025-60067-y
Author affiliations:
Yasin El Abiead: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Michael Strobel: Department of Computer Science and Engineering, University of California Riverside
Thomas Payne: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Eoin Fahy: Department of Bioengineering, and San Diego Supercomputer Center, University of California, San Diego
Claire O’Donovan: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Shankar Subramaniam: Department of Bioengineering, and San Diego Supercomputer Center, University of California, San Diego
Juan Antonio Vizcaíno: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Ozgur Yurekten: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Victoria Deleray: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Simone Zuffa: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Shipei Xing: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Helena Mannochio-Russo: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Ipsita Mohanty: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Haoqi Nina Zhao: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Andres M. Caraballo-Rodriguez: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Paulo Wender P. Gomes: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Nicole E. Avalon: Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego
Trent R. Northen: Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Lab
Benjamin P. Bowen: Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Lab
Katherine B. Louie: The DOE Joint Genome Institute, Lawrence Berkeley National Laboratory
Pieter C. Dorrestein: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
Mingxun Wang: Department of Computer Science and Engineering, University of California Riverside