Towards a standard benchmark for phenotype-driven variant and gene prioritisation algorithms: PhEval - Phenotypic inference Evaluation framework

Abstract Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the p...

Bibliographic Details
Main Authors: Yasemin Bridges, Vinicius de Souza, Katherina G. Cortes, Melissa Haendel, Nomi L. Harris, Daniel R. Korn, Nikolaos M. Marinakis, Nicolas Matentzoglu, James A. McLaughlin, Christopher J. Mungall, Aaron Odell, David Osumi-Sutherland, Peter N. Robinson, Damian Smedley, Julius O. B. Jacobsen
Format: Article
Language: English
Published: BMC 2025-03-01
Series: BMC Bioinformatics
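Subjects: Variant prioritisation; Phenopackets; Benchmarking Framework; Phenotype-driven analysis; Bioinformatics; Rare disease diagnosis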
Online Access: https://doi.org/10.1186/s12859-025-06105-4
author Yasemin Bridges
Vinicius de Souza
Katherina G. Cortes
Melissa Haendel
Nomi L. Harris
Daniel R. Korn
Nikolaos M. Marinakis
Nicolas Matentzoglu
James A. McLaughlin
Christopher J. Mungall
Aaron Odell
David Osumi-Sutherland
Peter N. Robinson
Damian Smedley
Julius O. B. Jacobsen
author_sort Yasemin Bridges
collection DOAJ
description Abstract Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs, ultimately hindering the development of effective prioritisation tools. Results: In this paper, we present our benchmarking tool, PhEval, which aims to provide a standardised and empirical framework to evaluate phenotype-driven VGPAs. The inclusion of standardised test corpora and test corpus generation tools in the PhEval suite of tools allows open benchmarking and comparison of methods on standardised data sets. Conclusions: PhEval and the standardised test corpora solve the issues of patient data availability and experimental tooling configuration when benchmarking and comparing rare disease VGPAs. By providing standardised data on patient cohorts from real-world case reports and controlling the configuration of evaluated VGPAs, PhEval enables transparent, portable, comparable and reproducible benchmarking of VGPAs. As these tools are often a key component of many rare disease diagnostic pipelines, a thorough and standardised method of assessment is essential for improving patient diagnosis and care.
format Article
id doaj-art-bb1e55d7dbdf4cbd8c7b908b67733505
institution Kabale University
issn 1471-2105
language English
publishDate 2025-03-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj-art-bb1e55d7dbdf4cbd8c7b908b67733505 (record timestamp 2025-08-20T03:41:47Z; language eng)
BMC Bioinformatics (BMC), ISSN 1471-2105, published 2025-03-01, volume 26, issue 1, pages 1-18, DOI 10.1186/s12859-025-06105-4
Author affiliations:
Yasemin Bridges: William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London
Vinicius de Souza: European Bioinformatics Institute (EMBL-EBI)
Katherina G. Cortes: School of Public Health, University of Colorado Anschutz Medical Campus
Melissa Haendel: Department of Genetics, University of North Carolina, Chapel Hill
Nomi L. Harris: Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory
Daniel R. Korn: Department of Genetics, University of North Carolina, Chapel Hill
Nikolaos M. Marinakis: Laboratory of Medical Genetics, National and Kapodistrian University of Athens
Nicolas Matentzoglu: Semanticly
James A. McLaughlin: Samples, Phenotypes, and Ontologies (SPOT), European Bioinformatics Institute (EMBL-EBI)
Christopher J. Mungall: Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory
Aaron Odell: Department of Genetics, University of North Carolina, Chapel Hill
David Osumi-Sutherland: Wellcome Trust Sanger Institute
Peter N. Robinson: Berlin Institute of Health, Charité – Universitätsmedizin Berlin
Damian Smedley: William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London
Julius O. B. Jacobsen: William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London
title Towards a standard benchmark for phenotype-driven variant and gene prioritisation algorithms: PhEval - Phenotypic inference Evaluation framework
topic Variant prioritisation
Phenopackets
Benchmarking Framework
Phenotype-driven analysis
Bioinformatics
Rare disease diagnosis
url https://doi.org/10.1186/s12859-025-06105-4