GeneSetCluster 2.0: a comprehensive toolset for summarizing and integrating gene-sets analysis

Bibliographic Details
Main Authors: Asier Ortega-Legarreta, Alberto Maillo, Daniel Mouzo, Ana Rosa López-Pérez, Lara Kular, Majid Pahlevan Kakhki, Maja Jagodic, Jesper Tegner, Vincenzo Lagani, Ewoud Ewing, David Gomez-Cabrero
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Bioinformatics
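Subjects: Gene-set analysis; Gene-set enrichment analysis; Functional annotation; Seriation-based clustering; Web application; Data-mining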
Online Access:https://doi.org/10.1186/s12859-025-06249-3
ISSN: 1471-2105
Collection: DOAJ

Abstract

Background: Gene-Set Analysis (GSA) is commonly used to analyze high-throughput experiments. However, GSA cannot readily disentangle clusters or pathways because of redundancies in upstream knowledge bases, which hinders the comprehensive exploration and interpretation of biological findings. To address this challenge, we developed GeneSetCluster, an R package designed to summarize and integrate GSA results. Over time, we and our users identified limitations in the original version, such as difficulty managing redundancies across multiple gene-sets, long computation times, and a lack of accessibility for users without programming expertise.

Results: We present GeneSetCluster 2.0, a comprehensive upgrade that delivers methodological, computational, interpretative, and user-experience enhancements. Methodologically, GeneSetCluster 2.0 introduces a novel approach to handling duplicated gene-sets and implements a seriation-based clustering algorithm that reorders results to aid pattern identification. Computationally, the package is optimized for parallel processing, which significantly reduces execution time. To improve biological interpretation, GeneSetCluster 2.0 enhances cluster annotation by associating clusters with relevant tissues and biological processes, particularly for human and mouse data. To broaden accessibility, we developed a user-friendly web application that enables non-programmers to use the tool. This version also ensures seamless integration between the R package, which caters to users with programming expertise, and the web application, which serves broader audiences. We evaluated the updates on a public single-cell RNA dataset.

Conclusion: GeneSetCluster 2.0 offers substantial improvements over its predecessor. Furthermore, by bridging the gap between bioinformaticians and clinicians in multidisciplinary teams, GeneSetCluster 2.0 facilitates collaborative research.

The source code of the R package and the web application, along with detailed installation and usage guides, is available on GitHub (https://github.com/TranslationalBioinformaticsUnit/GeneSetCluster2.0). The web application can be accessed at https://translationalbio.shinyapps.io/genesetcluster/
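
The abstract names seriation-based clustering but does not describe how GeneSetCluster 2.0 performs it. As a rough illustration of the general idea only (ordering gene-sets so that overlapping ones sit next to each other), the sketch below scores overlap with Jaccard similarity and builds a greedy nearest-neighbour ordering. Both choices, the helper names `jaccard` and `greedy_seriation`, and the toy gene-sets are assumptions for illustration; this is not the package's implementation, which is written in R.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two gene-sets (1.0 = identical membership)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_seriation(gene_sets: dict) -> list:
    """Hypothetical helper: order gene-set names so overlapping sets are adjacent.

    Assumption for illustration: start from the set with the highest total
    similarity to all others, then repeatedly append the most similar
    not-yet-placed set (a greedy nearest-neighbour path). Ties are broken
    by alphabetical order of the names.
    """
    names = sorted(gene_sets)
    if not names:
        return []
    # Pairwise similarities, stored once per unordered pair.
    sim = {(x, y): jaccard(gene_sets[x], gene_sets[y])
           for x, y in combinations(names, 2)}

    def s(x, y):
        return sim.get((x, y), sim.get((y, x), 0.0))

    # Seed with the most "central" gene-set.
    start = max(names, key=lambda n: sum(s(n, m) for m in names if m != n))
    order = [start]
    remaining = [n for n in names if n != start]
    while remaining:
        nxt = max(remaining, key=lambda n: s(order[-1], n))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # Toy gene-sets (hypothetical): two immune-like and two cell-cycle-like sets.
    toy = {
        "A_immune": {"CD3E", "CD4", "CD8A", "IL2"},
        "B_cellcycle": {"CDK1", "CCNB1", "MKI67"},
        "C_tcell": {"CD3E", "CD4", "CD8A", "LCK"},
        "D_mitosis": {"CDK1", "CCNB1", "PLK1"},
    }
    print(greedy_seriation(toy))
    # prints ['A_immune', 'C_tcell', 'B_cellcycle', 'D_mitosis']
```

A greedy path is used here only because it is short and easy to follow; dedicated seriation methods optimize an explicit ordering criterion and generally give better orderings on larger sets.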

Affiliations
Asier Ortega-Legarreta, Daniel Mouzo, Ana Rosa López-Pérez: Translational Bioinformatics Unit, Navarrabiomed, Hospital Universitario de Navarra (HUN), Universidad Pública de Navarra (UPNA), IdiSNA
Alberto Maillo, Jesper Tegner, Vincenzo Lagani, David Gomez-Cabrero: Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology
Lara Kular, Majid Pahlevan Kakhki, Maja Jagodic, Ewoud Ewing: Department of Clinical Neuroscience, Karolinska Institutet, and Center for Molecular Medicine, Karolinska University Hospital