A decomposition of Fisher’s information to inform sample size for developing or updating fair and precise clinical prediction models for individual risk—part 1: binary outcomes

Bibliographic Details
Main Authors: Richard D. Riley, Gary S. Collins, Rebecca Whittle, Lucinda Archer, Kym I. E. Snell, Paula Dhiman, Laura Kirton, Amardeep Legha, Xiaoxuan Liu, Alastair K. Denniston, Frank E. Harrell, Laure Wynants, Glen P. Martin, Joie Ensor
Format: Article
Language: English
Published: BMC 2025-07-01
Series: Diagnostic and Prognostic Research
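Subjects: Clinical prediction models; Sample size; Uncertainty intervals; Instability; Classification; Fisher’s information matrix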
Online Access: https://doi.org/10.1186/s41512-025-00193-9
collection DOAJ
description Abstract
Background: When using a dataset to develop or update a clinical prediction model, small sample sizes increase concerns of overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample size calculations that target low overfitting and a precise overall risk estimate. However, more guidance is needed for targeting precise and fair individual-level risk estimates.
Methods: We propose a decomposition of Fisher’s information matrix to help examine the sample sizes required for developing or updating a model, aiming for precise and fair individual-level risk estimates. We outline a five-step process for use before data collection or when an existing dataset or pilot study is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model and an assumed ‘core model’, either specified directly (i.e. a logistic regression equation is provided) or based on a specified C-statistic and the relative effects of (standardised) predictors.
Results: We produce closed-form solutions that decompose the variance of an individual’s risk estimate into Fisher’s unit information matrix, the individual’s predictor values and the total sample size. This allows researchers to quickly calculate and examine the anticipated precision of individual-level predictions and classifications for specified sample sizes. The information can be presented to key stakeholders (e.g. health professionals, patients, grant funders) to inform target sample sizes for prospective data collection, or to judge whether an existing dataset is sufficient. Our proposal is implemented in our new software module pmstabilityss. We provide two real examples and emphasise the importance of clinical context, including any risk thresholds for decision making and fairness checks.
Conclusions: Our approach helps researchers examine the sample sizes potentially required to target precise and fair individual-level predictions when developing or updating prediction models for binary outcomes.
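The Results describe a closed-form decomposition: the variance of an individual's estimated risk depends on the unit (per-participant) Fisher information, that individual's predictor values and the total sample size N. For logistic regression this is commonly written as var(logit p̂) ≈ xᵀ I_unit⁻¹ x / N, with I_unit = E[p(1 − p) x xᵀ] taken over the predictor distribution. The Python sketch below illustrates that idea under loudly stated assumptions. It is not the authors' pmstabilityss module. The core model coefficients (−2.0, 0.8, 0.5), the predictor distribution (x1 standard normal, x2 Bernoulli with probability 0.3), the Monte Carlo estimate of I_unit, the normal approximation on the logit scale and all helper names (`unit_information`, `var_logit`, `risk_interval`, `prob_other_side`) are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

# Hypothetical 'core model' (illustrative assumption, not taken from the paper):
# logit(p) = -2.0 + 0.8*x1 + 0.5*x2, with x1 ~ N(0, 1) and x2 ~ Bernoulli(0.3).
BETA = [-2.0, 0.8, 0.5]


def draw_predictors(rng):
    """Draw one covariate vector, including the intercept term, from the assumed distribution."""
    return [1.0, rng.gauss(0.0, 1.0), 1.0 if rng.random() < 0.3 else 0.0]


def linear_predictor(x, beta=BETA):
    return sum(b * xi for b, xi in zip(beta, x))


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def unit_information(n_draws=50_000, seed=2025):
    """Monte Carlo estimate of E[p(1-p) x x'], the Fisher information per participant."""
    rng = random.Random(seed)
    k = len(BETA)
    info = [[0.0] * k for _ in range(k)]
    for _ in range(n_draws):
        x = draw_predictors(rng)
        p = expit(linear_predictor(x))
        w = p * (1.0 - p)
        for r in range(k):
            for c in range(k):
                info[r][c] += w * x[r] * x[c]
    return [[v / n_draws for v in row] for row in info]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def var_logit(x, n, info):
    """var(logit p_hat) ~= x' I_unit^{-1} x / n: unit information, predictor values, sample size."""
    return sum(xi * zi for xi, zi in zip(x, solve(info, x))) / n


def risk_interval(x, n, info, level=0.95):
    """Risk and Wald-type interval, built on the logit scale and back-transformed."""
    eta = linear_predictor(x)
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var_logit(x, n, info))
    return expit(eta), expit(eta - half), expit(eta + half)


def prob_other_side(x, n, info, threshold):
    """Probability that the estimated risk lands on the opposite side of `threshold` to the
    assumed true risk, treating logit(p_hat) as Normal(true logit, var_logit)."""
    eta = linear_predictor(x)
    cut = math.log(threshold / (1.0 - threshold))
    below = NormalDist(eta, math.sqrt(var_logit(x, n, info))).cdf(cut)
    return below if eta > cut else 1.0 - below


if __name__ == "__main__":
    info = unit_information()
    person = [1.0, 1.5, 1.0]  # hypothetical individual: x1 = 1.5, x2 = 1
    for n in (250, 1000, 4000):
        p, lo, hi = risk_interval(person, n, info)
        flip = prob_other_side(person, n, info, threshold=0.2)
        print(f"N={n}: risk {p:.3f}, 95% interval {lo:.3f} to {hi:.3f}, "
              f"P(other side of 0.2) = {flip:.3f}")
```

Because N appears only as a divisor, quadrupling the sample size halves the standard error on the logit scale for every individual. Looping over candidate values of N, as in the usage block, reflects the idea of examining anticipated individual-level precision and classification at a decision threshold for specified sample sizes. The threshold of 0.2 is an arbitrary illustration.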
id doaj-art-de3e2d11358a47499ebc1b11f0a3fe54
institution Kabale University
issn 2397-7523
Volume: 9 (issue 1)
Affiliations:
Richard D. Riley: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Gary S. Collins: Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford
Rebecca Whittle: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Lucinda Archer: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Kym I. E. Snell: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Paula Dhiman: Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford
Laura Kirton: Cancer Research UK Clinical Trials Unit, Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham
Amardeep Legha: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Xiaoxuan Liu: National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre
Alastair K. Denniston: National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre
Frank E. Harrell: Department of Biostatistics, Vanderbilt University School of Medicine
Laure Wynants: Department of Epidemiology, Care and Public Health Research Institute (CAPHRI), Maastricht University
Glen P. Martin: Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre
Joie Ensor: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
topic Clinical prediction models
Sample size
Uncertainty intervals
Instability
Classification
Fisher’s information matrix