dataFishing: An efficient Python tool and user-friendly web-form for mining mitochondrial and chloroplast sequences, taxonomic, and biodiversity data

Bibliographic Details
Main Authors: Luan Rabelo, Davidson Sodré, Oscar David Albito Balcázar, Murilo Furtado do Rosário, Aurycéia Jaquelyne Guimarães-Costa, Grazielle Gomes, Iracilda Sampaio, Marcelo Vallinoto
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: Ecological Informatics
Subjects: Taxonomy; Genetic data; Multirepository; Bioinformatics
Online Access: http://www.sciencedirect.com/science/article/pii/S1574954124005120
author Luan Rabelo
Davidson Sodré
Oscar David Albito Balcázar
Murilo Furtado do Rosário
Aurycéia Jaquelyne Guimarães-Costa
Grazielle Gomes
Iracilda Sampaio
Marcelo Vallinoto
collection DOAJ
description NCBI GenBank and BOLD Systems are important databases for biodiversity research; their deposited data can be used for purposes such as species identification, evolutionary studies, biodiversity monitoring, and assessing the possible effects of climate change on species distributions. Other information, such as taxonomy, collection site locations, and conservation status, is often critical for these studies. Some databases, such as GBIF, BOLD Systems, and GenBank, provide data on the taxonomy, habitat, and geographic distribution of various taxonomic groups, while others, such as WoRMS and IUCN, hold specific data on marine species and conservation status, respectively. However, depending on the taxonomic group studied, a search can require dozens or hundreds of queries in each database, which is time-consuming and error-prone. To facilitate and automate access to this information, we introduce dataFishing, a Python script and a web form. dataFishing is faster and more efficient than R packages such as bold, taxize, rgbif, rredlist, and worrms for obtaining taxonomic information from the consulted databases. Moreover, it allows the retrieval of DNA sequences, common names, synonyms, conservation status, and species occurrence points. The tool is free and enables a more systematic and time-efficient search, facilitating such data inquiries.
format Article
id doaj-art-961c22c3eb0e4c7c845ff4d23ba669cc
institution Kabale University
issn 1574-9541
language English
publishDate 2025-03-01
publisher Elsevier
record_format Article
series Ecological Informatics
spelling doaj-art-961c22c3eb0e4c7c845ff4d23ba669cc, 2025-01-19T06:24:41Z, eng
Elsevier, Ecological Informatics, ISSN 1574-9541, 2025-03-01, volume 85, article 102970
Author affiliations, in author order:
Luan Rabelo: Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil; Instituto Tecnológico Vale (ITV), Belém, Pará, Brazil (corresponding author, at LEVO)
Davidson Sodré: Universidade Federal Rural da Amazônia (UFRA), Campus de Capitão Poço, Capitão Poço, Brazil
Oscar David Albito Balcázar: Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil
Murilo Furtado do Rosário: Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil
Aurycéia Jaquelyne Guimarães-Costa: Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil; AFYA, Faculdade de Ciências Médicas, Bragança, Pará, Brazil
Grazielle Gomes: Laboratório de Genética Aplicada (LAGA), Universidade Federal do Pará, Campus de Bragança, IECOS, Bragança, Brazil
Iracilda Sampaio: Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil
Marcelo Vallinoto: Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil; CIBIO-InBIO, Centro de Investigação em Biodiversidade and Recursos Genéticos, Universidade do Porto, Porto, Portugal (corresponding author, at LEVO)
title dataFishing: An efficient Python tool and user-friendly web-form for mining mitochondrial and chloroplast sequences, taxonomic, and biodiversity data
topic Taxonomy
Genetic data
Multirepository
Bioinformatics
url http://www.sciencedirect.com/science/article/pii/S1574954124005120