Limitations of Binary Classification for Long-Horizon Diagnosis Prediction and Advantages of a Discrete-Time Time-to-Event Approach: Empirical Analysis

Bibliographic Details
Main Authors: De Rong Loh, Elliot D Hill, Nan Liu, Geraldine Dawson, Matthew M Engelhard
Format: Article
Language: English
Published: JMIR Publications 2025-03-01
Series: JMIR AI
Online Access:https://ai.jmir.org/2025/1/e62985
Collection: DOAJ

Abstract

Background: A major challenge in using electronic health records (EHR) is the inconsistency of patient follow-up, which results in right-censored outcomes. This becomes particularly problematic in long-horizon event prediction, such as predicting autism and attention-deficit/hyperactivity disorder (ADHD) diagnoses, where a large number of patients are lost to follow-up before the outcome can be observed. Consequently, fully supervised methods such as binary classification (BC), which are trained to predict observed diagnoses, are substantially affected by the probability of sufficient follow-up, leading to biased results.

Objective: This empirical analysis aims to characterize BC’s inherent limitations for long-horizon diagnosis prediction from EHR and to quantify the benefits of a specific time-to-event (TTE) approach, the discrete-time neural network (DTNN).

Methods: Records within the Duke University Health System EHR were analyzed, extracting features such as ICD-10 (International Classification of Diseases, Tenth Revision) …

Results: TTE models consistently had comparable or higher …

Conclusions: BC models substantially underpredicted diagnosis likelihood and inappropriately assigned lower probability scores to individuals with earlier censoring. Common filtering strategies did not adequately address this limitation. TTE approaches, particularly the DTNN, effectively mitigated bias from the censoring distribution, resulting in superior discrimination and calibration performance and more accurate prediction of clinical prevalence. Machine learning practitioners should recognize the limitations of BC for long-horizon diagnosis prediction and adopt TTE approaches. The DTNN in particular is well suited to mitigating the effects of right-censoring and maximizing prediction performance in this setting.
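
The mechanism described in the Background and Conclusions, where patients lost to follow-up before diagnosis become negative BC labels, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' DTNN, features, or cohort. It assumes a single constant per-bin hazard, follow-up that is uniformly distributed and independent of the outcome (non-informative censoring), and no covariates. Without covariates, the per-bin hazards that minimize the discrete-time negative log-likelihood reduce to the classic life-table estimator. All function names (`simulate`, `naive_bc_prevalence`, `discrete_time_nll`, `life_table_hazards`, `cumulative_incidence`) are hypothetical.

```python
import math
import random


def simulate(n, hazard, n_bins, seed=0):
    """Hypothetical toy cohort: constant per-bin event hazard, follow-up length
    drawn uniformly from 0..n_bins complete bins, independent of the outcome
    (non-informative censoring is an assumption of this sketch)."""
    rng = random.Random(seed)
    data = []  # list of (bin_index, observed) pairs
    for _ in range(n):
        event = None  # true event bin; None means no event within the horizon
        for k in range(n_bins):
            if rng.random() < hazard:
                event = k
                break
        # Number of complete bins observed before loss to follow-up.
        follow_up = rng.randint(0, n_bins)
        if event is not None and event < follow_up:
            data.append((event, True))       # diagnosis observed in bin `event`
        else:
            data.append((follow_up, False))  # censored after `follow_up` bins
    return data


def naive_bc_prevalence(data):
    """BC-style labels: 1 if a diagnosis was observed, else 0, so censored
    patients are silently treated as negatives."""
    return sum(observed for _, observed in data) / len(data)


def discrete_time_nll(hazards, bin_index, observed):
    """Negative log-likelihood of one patient under a discrete-time model.
    hazards[k] = P(event in bin k | no event before bin k).
    Observed event in bin e: survive bins 0..e-1, then event in bin e.
    Censored after c complete bins: survive bins 0..c-1 only."""
    nll = -sum(math.log(1.0 - hazards[k]) for k in range(bin_index))
    if observed:
        nll -= math.log(hazards[bin_index])
    return nll


def life_table_hazards(data, n_bins):
    """Shared per-bin hazards minimizing the summed discrete_time_nll when
    there are no covariates: events in bin k / patients at risk in bin k."""
    events = [0] * n_bins
    at_risk = [0] * n_bins
    for bin_index, observed in data:
        # Bins in which this patient was at risk (same convention as the NLL).
        last = bin_index + 1 if observed else bin_index
        for k in range(min(last, n_bins)):
            at_risk[k] += 1
        if observed:
            events[bin_index] += 1
    return [e / r if r else 0.0 for e, r in zip(events, at_risk)]


def cumulative_incidence(hazards):
    """P(event by the end of the last bin) = 1 - prod_k (1 - h_k)."""
    survival = 1.0
    for h in hazards:
        survival *= 1.0 - h
    return 1.0 - survival


if __name__ == "__main__":
    n_bins, hazard = 10, 0.05
    data = simulate(20_000, hazard, n_bins)
    h = life_table_hazards(data, n_bins)
    print("true cumulative incidence:", round(1 - (1 - hazard) ** n_bins, 3))
    print("naive BC prevalence:      ", round(naive_bc_prevalence(data), 3))
    print("discrete-time estimate:   ", round(cumulative_incidence(h), 3))
```

With these illustrative settings, the expected naive BC prevalence is about 0.22 against a true cumulative incidence of about 0.40, because every patient censored before diagnosis is counted as a negative; the discrete-time estimate should land close to the true value. A discrete-time neural model, as the approach is generally formulated, would replace the shared hazards with per-bin hazards predicted from each patient's features and train on the same censoring-aware likelihood.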

Record ID: doaj-art-3bdd851d1bcd4fdebcde9bab6a540daa
ISSN: 2817-1705
DOI: 10.2196/62985
Citation: JMIR AI, vol. 4, article e62985 (2025-03-01)
Author ORCID iDs:
De Rong Loh: http://orcid.org/0000-0003-4090-8649
Elliot D Hill: http://orcid.org/0009-0004-1987-3749
Nan Liu: http://orcid.org/0000-0003-3610-4883
Geraldine Dawson: http://orcid.org/0000-0003-1410-2764
Matthew M Engelhard: http://orcid.org/0000-0003-4112-9639