G4 & the balanced metric family – a novel approach to solving binary classification problems in medical device validation & verification studies

Bibliographic Details
Main Author: Andrew Marra
Format: Article
Language: English
Published: BMC 2024-10-01
Series: BioData Mining
Subjects: G4; P4; MCC; Matthew; Binary classification; MRMC
Online Access: https://doi.org/10.1186/s13040-024-00402-z
author Andrew Marra
collection DOAJ
description Abstract. Background: In medical device validation and verification studies, the area under the receiver operating characteristic curve (AUROC) is often used as a primary endpoint despite multiple reports showing its limitations. Hence, researchers are encouraged to consider alternative metrics as primary endpoints. A new metric called G4 is presented, which is the geometric mean of sensitivity, specificity, the positive predictive value, and the negative predictive value. G4 is part of a balanced metric family which includes the Unified Performance Measure (also known as P4) and the Matthews’ Correlation Coefficient (MCC). The purpose of this manuscript is to unveil the benefits of using G4 together with the balanced metric family when analyzing the overall performance of binary classifiers.
Results: Simulated datasets encompassing different prevalence rates of the minority class were analyzed under a multi-reader-multi-case study design. In addition, data from an independently published study that tested the performance of a unique ultrasound artificial intelligence algorithm in the context of breast cancer detection was also considered. Within each dataset, AUROC was reported alongside the balanced metric family for comparison. When the dataset prevalence and bias of the minority class approached 50%, all three balanced metrics provided equivalent interpretations of an AI’s performance. As the prevalence rate increased / decreased and the data became more imbalanced, AUROC tended to overvalue / undervalue the true classifier performance, while the balanced metric family was resistant to such imbalance. Under certain circumstances where data imbalance was strong (minority-class prevalence < 10%), MCC was preferred for standalone assessments while P4 provided a stronger effect size when evaluating between-groups analyses. G4 acted as a middle ground for maximizing both standalone assessments and between-groups analyses.
Conclusions: Use of AUROC as the primary endpoint in binary classification problems provides misleading results as the dataset becomes more imbalanced. This is explicitly noticed when incorporating AUROC in medical device validation and verification studies. G4, P4, and MCC do not share this limitation and paint a more complete picture of a medical device’s performance in a clinical setting. Therefore, researchers are encouraged to explore the balanced metric family when evaluating binary classification problems.
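As a rough illustration of the definitions in the abstract, the sketch below computes G4 (the geometric mean of sensitivity, specificity, PPV, and NPV) alongside P4 and MCC from a 2x2 confusion matrix. It is not the author's code. The function name `balanced_metrics` and the example counts are hypothetical, and the P4 (harmonic mean of the same four rates) and MCC formulas are the commonly used definitions, assumed here because this record does not state them. The sketch also assumes every row and column total of the confusion matrix is non-zero.

```python
import math


def balanced_metrics(tp, fp, tn, fn):
    """Compute G4, P4 and MCC from confusion-matrix counts.

    Assumes tp+fn, tn+fp, tp+fp and tn+fn are all non-zero.
    """
    sens = tp / (tp + fn)  # sensitivity (recall)
    spec = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    rates = [sens, spec, ppv, npv]

    # G4: geometric mean of the four rates, as described in the abstract.
    g4 = math.prod(rates) ** 0.25

    # P4: harmonic mean of the same four rates (assumed standard definition);
    # defined as 0 when any rate is 0 to avoid dividing by zero.
    p4 = 0.0 if min(rates) == 0 else 4 / sum(1 / r for r in rates)

    # MCC: assumed standard definition; ranges over [-1, 1] rather than [0, 1].
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"sens": sens, "spec": spec, "ppv": ppv, "npv": npv,
            "g4": g4, "p4": p4, "mcc": mcc}


# Hypothetical counts: the same 90% sensitivity and 90% specificity,
# first at 50% prevalence and then at 5% prevalence of the positive class.
balanced = balanced_metrics(tp=90, fp=10, tn=90, fn=10)
imbalanced = balanced_metrics(tp=90, fp=190, tn=1710, fn=10)

for name, result in (("50% prevalence", balanced), ("5% prevalence", imbalanced)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In this made-up example, sensitivity and specificity are identical in both scenarios, but the positive predictive value falls sharply at 5% prevalence, and all three summary metrics fall with it. This only illustrates how the metrics respond to class imbalance; it does not reproduce the study's simulations or its AUROC comparison.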
format Article
id doaj-art-e6b326cf971d402dbe77259dfc44f050
institution OA Journals
issn 1756-0381
language English
publishDate 2024-10-01
publisher BMC
record_format Article
series BioData Mining
spelling doaj-art-e6b326cf971d402dbe77259dfc44f050; 2025-08-20T02:11:24Z; eng; BMC; BioData Mining; 1756-0381; 2024-10-01; 17; 1; 1; 20; 10.1186/s13040-024-00402-z; G4 & the balanced metric family – a novel approach to solving binary classification problems in medical device validation & verification studies; Andrew Marra (Clinical Biostatistician at GE Healthcare); https://doi.org/10.1186/s13040-024-00402-z; G4; P4; MCC; Matthew; Binary classification; MRMC
title G4 & the balanced metric family – a novel approach to solving binary classification problems in medical device validation & verification studies
topic G4
P4
MCC
Matthew
Binary classification
MRMC
url https://doi.org/10.1186/s13040-024-00402-z