Digitalizing English-language CT Interpretation for Positive Haemorrhage Evaluation Reporting: the DECIPHER study

Bibliographic Details
Main Authors: Stephen H Thomas, Jason Pott, Imogen Skene, Ben Bloom, Michael Cheetham, Adrian Haimovich, Raine Astin-Chamberlain, Sophie L Williams, Sandra Langsted
Format: Article
Language: English
Published: BMJ Publishing Group 2025-07-01
Series: BMJ Health & Care Informatics
Online Access: https://informatics.bmj.com/content/32/1/e101433.full
Description
Summary:
Objectives Identifying whether there is a traumatic intracranial bleed (ICB+) on head CT is critical for clinical care and research. Free-text CT reports are unstructured and therefore must undergo time-consuming manual review. Existing artificial intelligence classification schemes are not optimised for the emergency department endpoint of classification of ICB+ or ICB−. We sought to assess three methods for classifying CT reports: a text classification (TC) programme, a commercial natural language processing programme (Clinithink) and a generative pretrained transformer large language model (Digitalizing English-language CT Interpretation for Positive Haemorrhage Evaluation Reporting (DECIPHER)-LLM).
Methods Primary objective: determine the diagnostic classification performance of the dichotomous categorisation of each of the three approaches. Secondary objective: determine whether the LLM could achieve a substantial reduction in CT report review workload while maintaining 100% sensitivity. Anonymised radiology reports of head CT scans performed for trauma were manually labelled as ICB+/−. Training and validation sets were randomly created to train the TC and natural language processing models. Prompts were written to train the LLM.
Results 898 reports were manually labelled. Sensitivity and specificity (95% CI) were, respectively: TC, 87.9% (76.7% to 95.0%) and 98.2% (96.3% to 99.3%); Clinithink, 75.9% (62.8% to 86.1%) and 96.2% (93.8% to 97.8%); DECIPHER-LLM (with the probability of ICB+ threshold set at 10%), 100% (93.8% to 100%) and 97.4% (95.3% to 98.8%). With a DECIPHER-LLM probability of ICB+ threshold of 10% used to identify CT reports requiring manual evaluation, the number of CT reports requiring manual classification fell by an estimated 385/449 cases (85.7% (95% CI 82.1% to 88.9%)) while maintaining 100% sensitivity.
Discussion and conclusion DECIPHER-LLM outperformed the other tested free-text classification methods.
ISSN: 2632-1009
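
The threshold-based triage and the workload arithmetic reported in the summary can be illustrated with a minimal Python sketch. It is not the study's code. The function names, the toy probabilities and labels, and the use of `>=` at the 10% threshold are assumptions made for illustration. The summary does not state which confidence interval method the authors used, so the Wilson score interval here is a swapped-in choice and its bounds need not match the published ones exactly. Only the figures 385 and 449 come from the summary.

```python
import math


def needs_manual_review(p_icb, threshold=0.10):
    """Flag a report for manual review when the model's ICB+ probability
    reaches the threshold (assumption: the boundary value is flagged)."""
    return p_icb >= threshold


def sensitivity_specificity(probs, labels, threshold=0.10):
    """Compute sensitivity and specificity of the flag against manual labels.

    probs  -- model probabilities of ICB+ (hypothetical inputs)
    labels -- True for ICB+, False for ICB- (manual reference standard)
    """
    tp = fn = tn = fp = 0
    for p, is_positive in zip(probs, labels):
        flagged = needs_manual_review(p, threshold)
        if is_positive and flagged:
            tp += 1
        elif is_positive:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (approx. 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Toy data, invented for illustration only.
toy_probs = [0.95, 0.40, 0.12, 0.02, 0.05, 0.30]
toy_labels = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(toy_probs, toy_labels)
print(f"toy sensitivity={sens:.2f}, specificity={spec:.2f}")

# Workload reduction reported in the summary: 385 of 449 reports.
reduction = 385 / 449
low, high = wilson_interval(385, 449)
print(f"reduction={reduction:.1%}, Wilson 95% CI {low:.1%} to {high:.1%}")
```

Lowering the threshold flags more reports for manual review, which protects sensitivity at the cost of a smaller workload reduction; the 10% value is the one reported in the summary.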