Comparing imputation approaches to handle systematically missing inputs in risk calculators.

Bibliographic Details
Main Authors: Anja Mühlemann, Philip Stange, Antoine Faul, Serena Lozza-Fiacco, Rowan Iskandar, Manuela Moraru, Susanne Theis, Petra Stute, Ben D Spycher, David Ginsbourger
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLOS Digital Health
Online Access: https://doi.org/10.1371/journal.pdig.0000712
collection DOAJ
description Risk calculators based on statistical and/or mechanistic models have flourished and are increasingly available for a variety of diseases. However, in day-to-day practice, their usage may be hampered by missing input variables. Certain measurements needed to calculate disease risk may be difficult to acquire, e.g., because they necessitate blood draws, and may be systematically missing in the population of interest. We compare several deterministic and probabilistic imputation approaches to obtain surrogate predictions from risk calculators while accounting for uncertainty due to systematically missing inputs. The considered approaches predict missing inputs from available ones. In the case of probabilistic imputation, this leads to a probabilistic prediction of the risk. We compare the methods using scoring techniques for forecast evaluation, with a focus on the Brier score and the continuous ranked probability score (CRPS). We also discuss the classification of patients into risk groups defined by thresholding predicted probabilities. While the considered procedures are not meant to replace fully informed risk calculations, employing them to get first indications of the risk distribution in the absence of at least one input parameter may find useful applications in medical practice. To illustrate this, we use the SCORE2 risk calculator for cardiovascular disease and a data set including medical data from 359 women, obtained from the gynecology department at the Inselspital in Bern, Switzerland. Using this data set, we mimic the situation where some input parameters (blood lipids and blood pressure) are systematically missing and compute the SCORE2 risk by probabilistic imputation of the missing variables based on the remaining input variables. We compare this approach to established imputation techniques like MICE by means of scoring rules and visualize in turn how probabilistic imputation can be used in sample size considerations.
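The workflow summarized in the description (impute a missing input probabilistically, propagate the draws through a risk calculator, then score the result) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: `toy_risk` is a made-up logistic function standing in for a risk calculator (it is not the SCORE2 formula), the data are synthetic, the Gaussian linear-regression imputer is an assumed choice, and the 0.10 risk threshold is hypothetical.

```python
import numpy as np

def toy_risk(age, sbp):
    """Hypothetical logistic risk model; a stand-in, NOT the SCORE2 formula."""
    z = -9.0 + 0.08 * np.asarray(age, dtype=float) + 0.02 * np.asarray(sbp, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def fit_gaussian_imputer(age, sbp):
    """Least-squares fit of sbp ~ age; returns (intercept, slope, residual sd)."""
    X = np.column_stack([np.ones_like(age, dtype=float), age])
    coef, *_ = np.linalg.lstsq(X, sbp, rcond=None)
    resid = sbp - X @ coef
    return coef[0], coef[1], resid.std(ddof=2)

def impute_risk_samples(age_new, imputer, n_draws, rng):
    """Draw plausible sbp values given age, then push each draw through the risk model."""
    intercept, slope, sd = imputer
    sbp_draws = rng.normal(intercept + slope * age_new, sd, size=n_draws)
    return toy_risk(age_new, sbp_draws)

def crps_ensemble(samples, y):
    """CRPS of the empirical distribution of samples: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

def brier_score(prob, outcome):
    """Mean squared difference between predicted probabilities and binary outcomes."""
    return np.mean((np.asarray(prob, dtype=float) - np.asarray(outcome, dtype=float)) ** 2)

# Synthetic data (assumption): systolic blood pressure loosely depends on age.
rng = np.random.default_rng(0)
n = 300
age = rng.uniform(40, 70, size=n)
sbp = 90 + 0.7 * age + rng.normal(0, 12, size=n)

# Fit the imputer where sbp is observed; pretend it is missing for the rest.
train, test = slice(0, 200), slice(200, n)
imputer = fit_gaussian_imputer(age[train], sbp[train])

threshold = 0.10  # hypothetical cut-off defining the "high risk" group
crps_vals, probs_high, truly_high = [], [], []
for a, s in zip(age[test], sbp[test]):
    true_risk = toy_risk(a, s)                    # fully informed risk
    draws = impute_risk_samples(a, imputer, 500, rng)
    crps_vals.append(crps_ensemble(draws, true_risk))
    probs_high.append(np.mean(draws > threshold))  # predicted P(high risk)
    truly_high.append(true_risk > threshold)

mean_crps = float(np.mean(crps_vals))
brier = float(brier_score(probs_high, truly_high))
print(f"mean CRPS: {mean_crps:.4f}, Brier score for risk-group membership: {brier:.4f}")
```

In this sketch the CRPS compares the whole predictive distribution of the risk with the fully informed risk, while the Brier score evaluates only the predicted probability of falling into the high-risk group. Lower values are better for both.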
id doaj-art-e25b8b298c554543a36cd8484e18d91c
institution Kabale University
issn 2767-3170
spelling doaj-art-e25b8b298c554543a36cd8484e18d91c 2025-02-07T05:31:11Z eng Public Library of Science (PLoS) PLOS Digital Health 2767-3170 2025-01-01 4 1 e0000712 10.1371/journal.pdig.0000712