Annotation of protein-coding genes in 49 diatom genomes from the Bacillariophyta clade
Abstract Diatoms, a major group of microalgae, play a critical role in global carbon cycling and primary production. Despite their ecological significance, comprehensive genomic resources for diatoms are limited. To address this, we have annotated previously unannotated genome assemblies of 49 diato...
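| Main Authors: | Natalia Nenasheva, Clara Pitzschel, Cynthia N. Webster, Alexander J. Hart, Jill L. Wegrzyn, Mia M. Bengtsson, Katharina J. Hoff |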
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-06-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-05306-z |
| _version_ | 1849335657544024064 |
|---|---|
| author | Natalia Nenasheva Clara Pitzschel Cynthia N. Webster Alexander J. Hart Jill L. Wegrzyn Mia M. Bengtsson Katharina J. Hoff |
| author_sort | Natalia Nenasheva |
| collection | DOAJ |
| description | Abstract Diatoms, a major group of microalgae, play a critical role in global carbon cycling and primary production. Despite their ecological significance, comprehensive genomic resources for diatoms are limited. To address this, we have annotated previously unannotated genome assemblies of 49 diatom species. Genome assemblies were obtained from NCBI Datasets and processed for repeat elements using RepeatModeler2 and RepeatMasker. For gene prediction, BRAKER2 was employed in the absence of transcriptomic data, while BRAKER3 was utilised when transcriptome short read data were available from the Sequence Read Archive. The quality of genome assemblies and predicted protein sets was evaluated using BUSCO, ensuring high-quality genomic resources. Functional annotation was performed using EnTAP, providing insights into the biological roles of the predicted proteins. Our study enhances the genomic toolkit available for diatoms, facilitating future research in diatom biology, ecology, and evolution. |
| format | Article |
| id | doaj-art-34928713c60241a99d561ea8fd840607 |
| institution | Kabale University |
| issn | 2052-4463 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Data |
| spelling | doaj-art-34928713c60241a99d561ea8fd840607; 2025-08-20T03:45:11Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2025-06-01; 121119; 10.1038/s41597-025-05306-z; Annotation of protein-coding genes in 49 diatom genomes from the Bacillariophyta clade; Natalia Nenasheva (University of Greifswald, Institute of Mathematics and Computer Science and Center for Functional Genomics of Microbes); Clara Pitzschel (University of Greifswald, Institute of Mathematics and Computer Science and Center for Functional Genomics of Microbes); Cynthia N. Webster (University of Connecticut, Department of Ecology and Evolutionary Biology, Plant Computational Genomics Lab, 75 N. Eagleville Road, Unit 3043); Alexander J. Hart (University of Connecticut, Department of Ecology and Evolutionary Biology, Plant Computational Genomics Lab, 75 N. Eagleville Road, Unit 3043); Jill L. Wegrzyn (University of Connecticut, Department of Ecology and Evolutionary Biology, Plant Computational Genomics Lab, 75 N. Eagleville Road, Unit 3043); Mia M. Bengtsson (University of Greifswald, Institute of Microbiology, Felix-Hausdorff-Straße 8); Katharina J. Hoff (University of Greifswald, Institute of Mathematics and Computer Science and Center for Functional Genomics of Microbes); https://doi.org/10.1038/s41597-025-05306-z |
| title | Annotation of protein-coding genes in 49 diatom genomes from the Bacillariophyta clade |
| url | https://doi.org/10.1038/s41597-025-05306-z |