A decomposition of Fisher’s information to inform sample size for developing or updating fair and precise clinical prediction models for individual risk—part 1: binary outcomes
Abstract Background When using a dataset to develop or update a clinical prediction model, small sample sizes increase concerns of overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample s...
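| Main Authors: | Richard D. Riley, Gary S. Collins, Rebecca Whittle, Lucinda Archer, Kym I. E. Snell, Paula Dhiman, Laura Kirton, Amardeep Legha, Xiaoxuan Liu, Alastair K. Denniston, Frank E. Harrell, Laure Wynants, Glen P. Martin, Joie Ensor |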
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-07-01 |
| Series: | Diagnostic and Prognostic Research |
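| Subjects: | Clinical prediction models; Sample size; Uncertainty intervals; Instability; Classification; Fisher’s information matrix |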
| Online Access: | https://doi.org/10.1186/s41512-025-00193-9 |
| _version_ | 1849388735519522816 |
|---|---|
| author | Richard D. Riley; Gary S. Collins; Rebecca Whittle; Lucinda Archer; Kym I. E. Snell; Paula Dhiman; Laura Kirton; Amardeep Legha; Xiaoxuan Liu; Alastair K. Denniston; Frank E. Harrell; Laure Wynants; Glen P. Martin; Joie Ensor |
| collection | DOAJ |
| description | Abstract. Background: When using a dataset to develop or update a clinical prediction model, small sample sizes increase concerns of overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample size calculations that target low overfitting and a precise overall risk estimate. However, more guidance is needed for targeting precise and fair individual-level risk estimates. Methods: We propose a decomposition of Fisher’s information matrix to help examine sample sizes required for developing or updating a model, aiming for precise and fair individual-level risk estimates. We outline a five-step process for use before data collection or when an existing dataset or pilot study is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model and an assumed ‘core model’ either specified directly (i.e. a logistic regression equation is provided) or based on a specified C-statistic and relative effects of (standardised) predictors. Results: We produce closed-form solutions that decompose the variance of an individual’s risk estimate into the Fisher’s unit information matrix, predictor values and the total sample size. This allows researchers to quickly calculate and examine the anticipated precision of individual-level predictions and classifications for specified sample sizes. The information can be presented to key stakeholders (e.g. health professionals, patients, grant funders) to inform target sample sizes for prospective data collection or whether an existing dataset is sufficient. Our proposal is implemented in our new software module pmstabilityss. We provide two real examples and emphasise the importance of clinical context, including any risk thresholds for decision making and fairness checks. Conclusions: Our approach helps researchers examine potential sample sizes required to target precise and fair individual-level predictions when developing or updating prediction models for binary outcomes. |
| format | Article |
| id | doaj-art-de3e2d11358a47499ebc1b11f0a3fe54 |
| institution | Kabale University |
| issn | 2397-7523 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | BMC |
| record_format | Article |
| series | Diagnostic and Prognostic Research |
| spelling | Record: doaj-art-de3e2d11358a47499ebc1b11f0a3fe54 (2025-08-20T03:42:10Z). Language: eng. Publisher: BMC. Journal: Diagnostic and Prognostic Research, ISSN 2397-7523, 2025-07-01, vol. 9, no. 1, pp. 1–17. DOI: 10.1186/s41512-025-00193-9. Title: A decomposition of Fisher’s information to inform sample size for developing or updating fair and precise clinical prediction models for individual risk—part 1: binary outcomes. Authors and affiliations: Richard D. Riley (Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham); Gary S. Collins (Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford); Rebecca Whittle (Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham); Lucinda Archer (Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham); Kym I. E. Snell (Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham); Paula Dhiman (Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford); Laura Kirton (Cancer Research UK Clinical Trials Unit, Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham); Amardeep Legha (Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham); Xiaoxuan Liu (National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre); Alastair K. Denniston (National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre); Frank E. Harrell (Department of Biostatistics, Vanderbilt University School of Medicine); Laure Wynants (Department of Epidemiology, Care and Public Health Research Institute (CAPHRI), Maastricht University); Glen P. Martin (Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre); Joie Ensor (Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham). URL: https://doi.org/10.1186/s41512-025-00193-9. Keywords: Clinical prediction models; Sample size; Uncertainty intervals; Instability; Classification; Fisher’s information matrix. |
| title | A decomposition of Fisher’s information to inform sample size for developing or updating fair and precise clinical prediction models for individual risk—part 1: binary outcomes |
| topic | Clinical prediction models; Sample size; Uncertainty intervals; Instability; Classification; Fisher’s information matrix |
| url | https://doi.org/10.1186/s41512-025-00193-9 |
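
The abstract's Results describe decomposing the variance of an individual's risk estimate into Fisher's unit information matrix, the individual's predictor values and the total sample size. The sketch below illustrates that idea using the standard large-sample result for logistic regression, var(x'β̂) ≈ x' I⁻¹ x / n, where I is the unit information E[p(1−p) x x']. It is not the authors' pmstabilityss module and may differ from the paper's exact formulas. The predictor distribution, the coefficients in `beta`, the example individual `x_new` and the function names are all hypothetical values chosen for illustration.

```python
import numpy as np


def unit_information(X, beta):
    """Fisher's unit information for logistic regression: mean of p(1-p) x x^T."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / X.shape[0]


def risk_interval(x_new, beta, unit_info, n, z=1.96):
    """Approximate risk and 95% interval for one individual at sample size n.

    The variance of the linear predictor is x' I^{-1} x / n (asymptotic
    approximation); the interval is built on the logit scale, then
    back-transformed to the risk scale.
    """
    lp = float(x_new @ beta)
    var_lp = float(x_new @ np.linalg.solve(unit_info, x_new)) / n
    se = np.sqrt(var_lp)

    def expit(v):
        return 1.0 / (1.0 + np.exp(-v))

    return expit(lp), expit(lp - z * se), expit(lp + z * se)


# Hypothetical anticipated predictor distribution: intercept, one
# standardised continuous predictor, one binary predictor (30% prevalence).
rng = np.random.default_rng(2025)
m = 100_000
X = np.column_stack([
    np.ones(m),
    rng.normal(size=m),
    rng.binomial(1, 0.3, size=m),
])

# Hypothetical 'core model' coefficients (assumed, not taken from the paper).
beta = np.array([-2.0, 0.8, 0.5])
I_unit = unit_information(X, beta)

# Hypothetical individual: continuous predictor 1 SD above the mean, binary = 1.
x_new = np.array([1.0, 1.0, 1.0])

widths = {}
for n in (500, 2000, 8000):
    risk, lower, upper = risk_interval(x_new, beta, I_unit, n)
    widths[n] = upper - lower
    print(f"n={n}: risk={risk:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

Because the unit information depends only on the assumed core model and predictor distribution, it is computed once and reused for every candidate sample size. Only the division by n changes, which is what makes it quick to compare the anticipated interval widths across sample sizes.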