The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations

Bibliographic Details
Main Authors: R. D. Groussman, S. N. Coesel, B. P. Durham, M. J. Schatz, E. V. Armbrust
Format: Article
Language: English
Published: Nature Portfolio 2024-10-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-024-04005-5
collection DOAJ
description Abstract Marine microbial eukaryotes (protists) perform essential metabolic functions in oceanic ecosystems. The diversity of protist functions remains poorly understood as few species have been isolated in laboratory settings. Metatranscriptomes provide an invaluable tool for exploring protist diversity and genetic capacities within their natural habitats. Here, we introduce the North Pacific Eukaryotic Gene Catalog, a compilation of metatranscriptome data derived from a total of 261 metatranscriptomes: 169 metatranscriptomes were derived from samples collected on three meridional surface transects along 158°W, each spanning ~20 degrees of latitude from the North Pacific Subtropical Gyre (NPSG) to the North Pacific Transition Zone (NPTZ); 92 metatranscriptomes were derived from two diel-resolved field studies, one in the NPSG at 157°W, 23°N, and one in the NPTZ at 158°W, 41°N. The metatranscriptome sequences were de novo assembled into 175 assemblies and pooled into five datasets each containing between 22 M and 49 M contigs clustered at 99% protein identity. Assemblies were annotated by taxonomy and function, and enumerated by short read alignment. All data are available in the Zenodo repository, with underlying code available on GitHub.
id doaj-art-710c142528154a75bb192ae7e34b46f6
institution OA Journals
issn 2052-4463
citation Scientific Data, vol. 11, no. 1, pp. 1-10 (2024-10-01)
affiliations R. D. Groussman, S. N. Coesel, M. J. Schatz, E. V. Armbrust: School of Oceanography, University of Washington
B. P. Durham: Department of Biology, Genetics Institute, University of Florida