ProTaxoVis—protein taxonomic visualisation of presence

Abstract Background: Protein presence information is an essential component of biological pathway identification. Presence of certain enzymes in an organism points towards the metabolic pathways that occur within it, whereas the absence of these enzymes indicates either the existence of alternative...

Bibliographic Details
Main Authors: Yin-Chen Hsieh, Mathias Bockwoldt, Ines Heiland
Format: Article
Language: English
Published: BMC 2025-05-01
Series: BMC Bioinformatics
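Subjects: Protein distribution; Protein taxonomy; Enzyme presence-absence; Sequence query; Comparative pathway analysis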
Online Access: https://doi.org/10.1186/s12859-025-06146-9
_version_ 1849730948588896256
author Yin-Chen Hsieh
Mathias Bockwoldt
Ines Heiland
author_facet Yin-Chen Hsieh
Mathias Bockwoldt
Ines Heiland
author_sort Yin-Chen Hsieh
collection DOAJ
description Abstract Background: Protein presence information is an essential component of biological pathway identification. Presence of certain enzymes in an organism points towards the metabolic pathways that occur within it, whereas the absence of these enzymes indicates either the existence of alternative pathways or a lack of these pathways altogether. The same inference applies to regulatory pathways such as gene regulation and signal transduction. Protein presence information therefore forms the basis for biological pathway studies, and patterns in presence-absence across multiple organisms allow for comparative pathway analyses. Results: Here we present ProTaxoVis, a novel bioinformatic tool that extracts protein presence information from database queries and maps it to a taxonomic tree or heatmap. ProTaxoVis generates a large-scale overview of presence patterns in taxonomic clades of interest. This overview reveals protein distribution patterns, which can be used to deduce pathway evolution or to probe other biological questions. ProTaxoVis combines and filters sequence query results to extract information on the distribution of proteins and translates this information into two types of visual outputs: taxonomic trees and heatmaps. The trees supplement their topology with scaled pie-chart representations per node of the presence of target proteins and combinations of these proteins, such that patterns in taxonomic groups can easily be identified. The heatmap visualisation shows presence and conservation of these proteins for a user-determined set of species, allowing for a more detailed view over a larger group of proteins as compared to the trees. ProTaxoVis also allows for visual quality checks of hits based on a coverage plot and a length histogram, which can be used to determine e-value and minimum protein length cutoffs. Tabular outputs of the data resulting from the query, combining, and heatmap-building steps are saved and easily accessible for further analyses. Conclusions: We evaluate our tool with the phosphoribosyltransferases, a transferase enzyme family with notable distribution patterns amongst organisms of varying complexities and across Eukaryota, Bacteria, and Archaea. ProTaxoVis is open-source and available at: https://github.com/MolecularBioinformatics/ProTaxoVis
format Article
id doaj-art-1ae06a0f41cb48ee81caf400ca70e341
institution DOAJ
issn 1471-2105
language English
publishDate 2025-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj-art-1ae06a0f41cb48ee81caf400ca70e341
2025-08-20T03:08:43Z
eng
BMC
BMC Bioinformatics
1471-2105
2025-05-01
261117
10.1186/s12859-025-06146-9
ProTaxoVis—protein taxonomic visualisation of presence
Yin-Chen Hsieh 0
Mathias Bockwoldt 1
Ines Heiland 2
Department of Arctic and Marine Biology, Faculty of Biosciences, Fisheries and Economics, UiT Arctic University of Norway
Department of Arctic and Marine Biology, Faculty of Biosciences, Fisheries and Economics, UiT Arctic University of Norway
Department of Arctic and Marine Biology, Faculty of Biosciences, Fisheries and Economics, UiT Arctic University of Norway
Abstract Background: Protein presence information is an essential component of biological pathway identification. Presence of certain enzymes in an organism points towards the metabolic pathways that occur within it, whereas the absence of these enzymes indicates either the existence of alternative pathways or a lack of these pathways altogether. The same inference applies to regulatory pathways such as gene regulation and signal transduction. Protein presence information therefore forms the basis for biological pathway studies, and patterns in presence-absence across multiple organisms allow for comparative pathway analyses. Results: Here we present ProTaxoVis, a novel bioinformatic tool that extracts protein presence information from database queries and maps it to a taxonomic tree or heatmap. ProTaxoVis generates a large-scale overview of presence patterns in taxonomic clades of interest. This overview reveals protein distribution patterns, and this can be used to deduce pathway evolution or to probe other biological questions. ProTaxoVis combines and filters sequence query results to extract information on the distribution of proteins and translates this information into two types of visual outputs: taxonomic trees and heatmaps. The trees supplement their topology with scaled pie-chart representations per node of the presence of target proteins and combinations of these proteins, such that patterns in taxonomic groups can easily be identified. The heatmap visualisation shows presence and conservation of these proteins for a user-determined set of species, allowing for a more detailed view over a larger group of proteins as compared to the trees. ProTaxoVis also allows for visual quality checks of hits based on a coverage plot and a length histogram, which can be used to determine e-value and minimum protein length cutoffs. Tabular output of resulting data from the query, combined, and heatmap building step are saved and easily accessible for further analyses. Conclusions: We evaluate our tool with the phosphoribosyltransferases, a transferase enzyme family with notable distribution patterns amongst organisms of varying complexities and across Eukaryota, Bacteria, and Archaea. ProTaxoVis is open-source and available at: https://github.com/MolecularBioinformatics/ProTaxoVis .
https://doi.org/10.1186/s12859-025-06146-9
Protein distribution
Protein taxonomy
Enzyme presence-absence
Sequence query
Comparative pathway analysis
spellingShingle Yin-Chen Hsieh
Mathias Bockwoldt
Ines Heiland
ProTaxoVis—protein taxonomic visualisation of presence
BMC Bioinformatics
Protein distribution
Protein taxonomy
Enzyme presence-absence
Sequence query
Comparative pathway analysis
title ProTaxoVis—protein taxonomic visualisation of presence
title_full ProTaxoVis—protein taxonomic visualisation of presence
title_fullStr ProTaxoVis—protein taxonomic visualisation of presence
title_full_unstemmed ProTaxoVis—protein taxonomic visualisation of presence
title_short ProTaxoVis—protein taxonomic visualisation of presence
title_sort protaxovis protein taxonomic visualisation of presence
topic Protein distribution
Protein taxonomy
Enzyme presence-absence
Sequence query
Comparative pathway analysis
url https://doi.org/10.1186/s12859-025-06146-9
work_keys_str_mv AT yinchenhsieh protaxovisproteintaxonomicvisualisationofpresence
AT mathiasbockwoldt protaxovisproteintaxonomicvisualisationofpresence
AT inesheiland protaxovisproteintaxonomicvisualisationofpresence