dataFishing: An efficient Python tool and user-friendly web-form for mining mitochondrial and chloroplast sequences, taxonomic, and biodiversity data
NCBI GenBank and BOLD Systems are important databases for biodiversity research, in which the deposited data can be used for various purposes, such as species identification analysis, evolutionary studies, biodiversity monitoring, as well as assessing the effects of possible climate changes on speci...
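Main Authors: | Luan Rabelo, Davidson Sodré, Oscar David Albito Balcázar, Murilo Furtado do Rosário, Aurycéia Jaquelyne Guimarães-Costa, Grazielle Gomes, Iracilda Sampaio, Marcelo Vallinoto |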
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-03-01 |
Series: | Ecological Informatics |
Subjects: | Taxonomy, Genetic data, Multirepository, Bioinformatics |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1574954124005120 |
_version_ | 1832595411866484736 |
---|---|
author | Luan Rabelo Davidson Sodré Oscar David Albito Balcázar Murilo Furtado do Rosário Aurycéia Jaquelyne Guimarães-Costa Grazielle Gomes Iracilda Sampaio Marcelo Vallinoto |
author_sort | Luan Rabelo |
collection | DOAJ |
description | NCBI GenBank and BOLD Systems are important databases for biodiversity research, in which the deposited data can be used for various purposes, such as species identification analysis, evolutionary studies, biodiversity monitoring, as well as assessing the effects of possible climate changes on species distributions. Other information, such as taxonomy, collection site locations, and conservation status, is often critical for these studies. Some databases, such as GBIF, BOLD Systems, and GenBank, provide data on the taxonomy, habitat, and geographic distribution of various taxonomic groups, while others, such as WoRMS and IUCN, have specific data on marine species and conservation status. However, depending on the taxonomic group studied, searches in these databases can encompass dozens or hundreds of queries, forcing researchers to conduct extensive searches in each database, which is a time-consuming and error-prone process. To facilitate and automate access to this information, we introduce dataFishing, a Python script and a web form. dataFishing is faster and more efficient than other R packages, such as bold, taxize, rgbif, rredlist, and worrms, for obtaining taxonomic information from the consulted databases. Moreover, it allows the retrieval of DNA sequences, common names, synonyms, conservation status, and species occurrence points. This tool is free and will enable a more systematized and time-efficient search, which tends to facilitate such data inquiries. |
format | Article |
id | doaj-art-961c22c3eb0e4c7c845ff4d23ba669cc |
institution | Kabale University |
issn | 1574-9541 |
language | English |
publishDate | 2025-03-01 |
publisher | Elsevier |
record_format | Article |
series | Ecological Informatics |
spelling | doaj-art-961c22c3eb0e4c7c845ff4d23ba669cc 2025-01-19T06:24:41Z eng Elsevier Ecological Informatics 1574-9541 2025-03-01 85 102970 dataFishing: An efficient Python tool and user-friendly web-form for mining mitochondrial and chloroplast sequences, taxonomic, and biodiversity data Luan Rabelo [0] Davidson Sodré [1] Oscar David Albito Balcázar [2] Murilo Furtado do Rosário [3] Aurycéia Jaquelyne Guimarães-Costa [4] Grazielle Gomes [5] Iracilda Sampaio [6] Marcelo Vallinoto [7] [0] Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil; Instituto Tecnológico Vale (ITV), Belém, Pará, Brazil; Corresponding authors at: Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil. [1] Universidade Federal Rural da Amazônia (UFRA), Campus de Capitão Poço, Capitão Poço, Brazil [2] Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil [3] Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil [4] Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil; AFYA, Faculdade de Ciências Médicas, Bragança, Pará, Brazil [5] Laboratório de Genética Aplicada (LAGA), Universidade Federal do Pará, Campus de Bragança, IECOS, Bragança, Brazil [6] Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil [7] Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil; CIBIO-InBIO, Centro de Investigação em Biodiversidade and Recursos Genéticos, Universidade do Porto, Porto, Portugal; Corresponding authors at: Laboratório de Evolução (LEVO), Instituto de Estudos Costeiros (IECOS), Universidade Federal do Pará (UFPA), Campus de Bragança, Bragança, Pará, Brazil. NCBI GenBank and BOLD Systems are important databases for biodiversity research, in which the deposited data can be used for various purposes, such as species identification analysis, evolutionary studies, biodiversity monitoring, as well as assessing the effects of possible climate changes on species distributions. Other information, such as taxonomy, collection site locations, and conservation status, is often critical for these studies. Some databases, such as GBIF, BOLD Systems, and GenBank, provide data on the taxonomy, habitat, and geographic distribution of various taxonomic groups, while others, such as WoRMS and IUCN, have specific data on marine species and conservation status. However, depending on the taxonomic group studied, searches in these databases can encompass dozens or hundreds of queries, forcing researchers to conduct extensive searches in each database, which is a time-consuming and error-prone process. To facilitate and automate access to this information, we introduce dataFishing, a Python script and a web form. dataFishing is faster and more efficient than other R packages, such as bold, taxize, rgbif, rredlist, and worrms, for obtaining taxonomic information from the consulted databases. Moreover, it allows the retrieval of DNA sequences, common names, synonyms, conservation status, and species occurrence points. This tool is free and will enable a more systematized and time-efficient search, which tends to facilitate such data inquiries. http://www.sciencedirect.com/science/article/pii/S1574954124005120 Taxonomy Genetic data Multirepository Bioinformatics |
title | dataFishing: An efficient Python tool and user-friendly web-form for mining mitochondrial and chloroplast sequences, taxonomic, and biodiversity data |
topic | Taxonomy Genetic data Multirepository Bioinformatics |
url | http://www.sciencedirect.com/science/article/pii/S1574954124005120 |