23 Generative artificial intelligence for automated unstructured MRI data extraction in prostate cancer care

Objectives/Goals: Magnetic resonance imaging (MRI) reports are stored as unstructured text in the electronic health record (EHR), rendering the data inaccessible. Large language models (LLM) are a new tool for analyzing and generating unstructured text. We aimed to evaluate how well an LLM extracts...

Bibliographic Details
Main Authors:	William Pace, Andrew Liu, Marvin Carlisle, Robert Krumm, Janet Cowan, Peter Carroll, Matthew Cooperberg, Anobel Odisho
Format:	Article
Language:	English
Published:	Cambridge University Press 2025-04-01
Series:	Journal of Clinical and Translational Science
Online Access:	https://www.cambridge.org/core/product/identifier/S2059866124007143/type/journal_article

collection	DOAJ
description	Objectives/Goals: Magnetic resonance imaging (MRI) reports are stored as unstructured text in the electronic health record (EHR), rendering the data inaccessible. Large language models (LLMs) are a new tool for analyzing and generating unstructured text. We aimed to evaluate how well an LLM extracts data from MRI reports compared to manually abstracted data. Methods/Study Population: The University of California, San Francisco has deployed a HIPAA-compliant internal LLM tool that uses GPT-4 technology and is approved for PHI use. We developed a detailed prompt instructing the LLM to extract data elements from prostate MRI reports and to output the results in a structured, computer-readable format. A data pipeline was built using the OpenAI Application Programming Interface (API) to automatically extract distinct data elements from the MRI report that are important in prostate cancer care. Each prompt was executed five times, and the responses were compared with the modal response to determine response variability. Accuracy was also assessed. Results/Anticipated Results: Across 424 prostate MRI reports, GPT-4 response accuracy was consistently above 95% for most parameters. Individual field accuracies were 98.3% (96.3–99.3%) for PSA density, 97.4% (95.4–98.7%) for extracapsular extension, and 98.1% (96.3–99.2%) for TNM stage. Across all fields, the median accuracy was 98.1% (96.3–99.2%), the mean was 97.2% (95.2–98.3%), and the range was 87.7% (84.2–90.7%) to 99.8% (98.7–100.0%). Response variability over five repeated runs ranged from 0.14% to 3.61% and differed based on the data element extracted (p […]). Discussion/Significance of Impact: GPT-4 was highly accurate in extracting data points from prostate cancer MRI reports with low upfront programming requirements. This represents an effective tool to expedite medical data extraction for clinical and research use cases.
id	doaj-art-951dbb67f3c54fa5b832b40d6761f7d3
institution	DOAJ
issn	2059-8661
spelling	doaj-art-951dbb67f3c54fa5b832b40d6761f7d3 | 2025-08-20T02:40:52Z | eng | Cambridge University Press | Journal of Clinical and Translational Science | 2059-8661 | 2025-04-01 | 988 | 10.1017/cts.2024.714 | 23 Generative artificial intelligence for automated unstructured MRI data extraction in prostate cancer care | William Pace, Andrew Liu, Marvin Carlisle, Robert Krumm, Janet Cowan, Peter Carroll, Matthew Cooperberg, Anobel Odisho (all University of California, San Francisco)
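The abstract describes running each prompt five times, comparing the responses with the modal response to measure variability, and scoring accuracy against manual abstraction. A minimal Python sketch of that bookkeeping follows. It is illustrative only: the function names and toy values are hypothetical, and the Wilson score interval is an assumed choice, since the record does not say how the reported intervals were computed.

```python
from collections import Counter
from math import sqrt


def modal_response(runs):
    """Return the most frequent value among repeated runs of one prompt."""
    return Counter(runs).most_common(1)[0][0]


def variability(runs):
    """Fraction of repeated runs that differ from the modal response."""
    mode = modal_response(runs)
    return sum(r != mode for r in runs) / len(runs)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def field_accuracy(extracted, reference):
    """Accuracy of extracted values against manually abstracted values."""
    correct = sum(e == r for e, r in zip(extracted, reference))
    n = len(reference)
    return correct / n, wilson_interval(correct, n)


# Hypothetical toy data: five runs of one prompt for one report field.
runs = ["T3a", "T3a", "T2", "T3a", "T3a"]
print(modal_response(runs), variability(runs))

# Hypothetical modal responses for four reports versus manual abstraction.
print(field_accuracy(["T3a", "T2", "T2", "T3b"], ["T3a", "T2", "T2", "T3a"]))
```

Scoring the modal response rather than a single run keeps one atypical generation from being counted as an extraction error, while the variability figure tracks how often such atypical generations occur.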

Similar Items