dataFishing: An efficient Python tool and user-friendly web-form for mining mitochondrial and chloroplast sequences, taxonomic, and biodiversity data
Format: Article
Language: English
Published: Elsevier, 2025-03-01
Series: Ecological Informatics
Online Access: http://www.sciencedirect.com/science/article/pii/S1574954124005120
Summary: NCBI GenBank and BOLD Systems are important databases for biodiversity research. Their deposited data support species identification, evolutionary studies, biodiversity monitoring, and assessments of how climate change may affect species distributions. Other information, such as taxonomy, collection-site locations, and conservation status, is often critical for these studies. Some databases, such as GBIF, BOLD Systems, and GenBank, provide data on the taxonomy, habitat, and geographic distribution of many taxonomic groups, while others, such as WoRMS and IUCN, hold specific data on marine species and on conservation status, respectively. However, depending on the taxonomic group studied, a search can require dozens or hundreds of queries in each database, a time-consuming and error-prone process. To facilitate and automate access to this information, we introduce dataFishing, a Python script and web form. dataFishing is faster and more efficient than R packages such as bold, taxize, rgbif, rredlist, and worrms at retrieving taxonomic information from the consulted databases. It also retrieves DNA sequences, common names, synonyms, conservation status, and species occurrence points. The tool is free and enables more systematic, time-efficient searches of these databases.
ISSN: 1574-9541
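The summary does not describe dataFishing's internals, so the following is only a minimal Python sketch of the kind of batch automation it motivates: one query per species name, duplicates skipped, and failures recorded rather than aborting the run. It is not the tool's implementation. It assumes GBIF's public v1 species-match endpoint (`https://api.gbif.org/v1/species/match?name=...`) and its JSON keys (`scientificName`, `rank`, `status`, `family`, `matchType`); the function names `build_match_url`, `fetch_json`, and `batch_match` and the chosen output columns are hypothetical.

```python
"""Hypothetical sketch: batch taxonomic lookup against GBIF's species-match API.

Not dataFishing's code; it only illustrates automating one query per name.
"""
import json
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: GBIF's public v1 species-match service.
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"
# Assumption: fields copied from the GBIF match response into each row.
FIELDS = ("scientificName", "rank", "status", "family", "matchType")


def build_match_url(name: str) -> str:
    """Build a species-match query URL for one scientific name."""
    return f"{GBIF_MATCH_URL}?{urlencode({'name': name})}"


def fetch_json(url: str) -> dict:
    """Default fetcher: HTTP GET the URL and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def batch_match(names: Iterable[str],
                fetch: Callable[[str], dict] = fetch_json) -> List[Dict[str, object]]:
    """Query each distinct, non-empty name once; return one flat row per name.

    Any exception raised while fetching or reading a response is stored in
    the row's 'error' column so the rest of the batch still runs.
    """
    rows: List[Dict[str, object]] = []
    seen = set()
    for raw in names:
        name = " ".join(raw.split())  # collapse stray whitespace
        if not name or name in seen:
            continue
        seen.add(name)
        row: Dict[str, object] = {"query": name, "error": None}
        row.update({key: None for key in FIELDS})
        try:
            data = fetch(build_match_url(name))
            for key in FIELDS:
                row[key] = data.get(key)
        except Exception as exc:  # network, HTTP, or decoding failures
            row["error"] = str(exc)
        rows.append(row)
    return rows
```

Calling `batch_match(["Chelonia mydas", "Thunnus albacares"])` would issue two live requests; passing a custom `fetch` callable keeps the batching logic testable offline and makes it easy to point the same loop at another service.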