A decomposition of Fisher’s information to inform sample size for developing or updating fair and precise clinical prediction models for individual risk—part 1: binary outcomes

Abstract Background When using a dataset to develop or update a clinical prediction model, small sample sizes increase concerns of overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample s...

Bibliographic Details
Main Authors:	Richard D. Riley, Gary S. Collins, Rebecca Whittle, Lucinda Archer, Kym I. E. Snell, Paula Dhiman, Laura Kirton, Amardeep Legha, Xiaoxuan Liu, Alastair K. Denniston, Frank E. Harrell, Laure Wynants, Glen P. Martin, Joie Ensor
Format:	Article
Language:	English
Published:	BMC 2025-07-01
Series:	Diagnostic and Prognostic Research
Subjects:	Clinical prediction models Sample size Uncertainty intervals Instability Classification Fisher’s information matrix
Online Access:	https://doi.org/10.1186/s41512-025-00193-9

collection	DOAJ
description	Abstract
Background: When using a dataset to develop or update a clinical prediction model, small sample sizes increase concerns of overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample size calculations that target low overfitting and a precise overall risk estimate. However, more guidance is needed for targeting precise and fair individual-level risk estimates.
Methods: We propose a decomposition of Fisher’s information matrix to help examine sample sizes required for developing or updating a model, aiming for precise and fair individual-level risk estimates. We outline a five-step process for use before data collection or when an existing dataset or pilot study is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model and an assumed ‘core model’ either specified directly (i.e. a logistic regression equation is provided) or based on a specified C-statistic and relative effects of (standardised) predictors.
Results: We produce closed-form solutions that decompose the variance of an individual’s risk estimate into the Fisher’s unit information matrix, predictor values and the total sample size. This allows researchers to quickly calculate and examine the anticipated precision of individual-level predictions and classifications for specified sample sizes. The information can be presented to key stakeholders (e.g. health professionals, patients, grant funders) to inform target sample sizes for prospective data collection or whether an existing dataset is sufficient. Our proposal is implemented in our new software module pmstabilityss. We provide two real examples and emphasise the importance of clinical context, including any risk thresholds for decision making and fairness checks.
Conclusions: Our approach helps researchers examine potential sample sizes required to target precise and fair individual-level predictions when developing or updating prediction models for binary outcomes.
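The decomposition described in the abstract can be illustrated with a minimal sketch. It relies on the standard large-sample result for logistic regression that the variance of an individual's linear predictor is approximately x' I^{-1} x / n, where I is Fisher's unit information matrix and n is the total sample size. This is not the pmstabilityss implementation: the function names, the single standard-normal predictor, the coefficient values and the candidate sample sizes are all assumptions chosen for illustration only.

```python
import numpy as np

def expit(z):
    """Inverse logit: maps a linear predictor to a risk in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def unit_information(X, beta):
    """Approximate Fisher's unit information E[p(1-p) x x'] for a logistic
    model by averaging over the rows of X (a large draw from the assumed
    predictor distribution; first column is the intercept)."""
    p = expit(X @ beta)
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / X.shape[0]

def risk_interval(x_new, beta, unit_info, n, z=1.96):
    """Anticipated 95% uncertainty interval for one individual's risk at
    total sample size n, using var(logit risk) ~= x' I^{-1} x / n."""
    lp = float(x_new @ beta)
    var_lp = float(x_new @ np.linalg.solve(unit_info, x_new)) / n
    half = z * np.sqrt(var_lp)
    return expit(lp - half), expit(lp), expit(lp + half)

# Hypothetical inputs (assumptions, not values from the article):
rng = np.random.default_rng(2025)
X = np.column_stack([np.ones(100_000), rng.normal(size=100_000)])
beta = np.array([-2.0, 0.8])      # assumed 'core model' coefficients
I_unit = unit_information(X, beta)
x_new = np.array([1.0, 1.5])      # intercept term plus predictor value 1.5

for n in (500, 2000, 8000):
    lo, est, hi = risk_interval(x_new, beta, I_unit, n)
    print(f"n={n}: risk {est:.3f}, 95% interval {lo:.3f} to {hi:.3f}")
```

Because the unit information matrix does not depend on n, it only needs to be computed once; the interval for any candidate sample size then follows by rescaling, which is what makes it quick to compare several sample sizes for the same individual.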
id	doaj-art-de3e2d11358a47499ebc1b11f0a3fe54
institution	Kabale University
issn	2397-7523
affiliations	Richard D. Riley: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Gary S. Collins: Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford
Rebecca Whittle: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Lucinda Archer: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Kym I. E. Snell: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Paula Dhiman: Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford
Laura Kirton: Cancer Research UK Clinical Trials Unit, Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham
Amardeep Legha: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
Xiaoxuan Liu: National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre
Alastair K. Denniston: National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre
Frank E. Harrell: Department of Biostatistics, Vanderbilt University School of Medicine
Laure Wynants: Department of Epidemiology, Care and Public Health Research Institute (CAPHRI), Maastricht University
Glen P. Martin: Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre
Joie Ensor: Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham
citation	Diagnostic and Prognostic Research, volume 9, issue 1, pages 1-17 (2025-07-01)
