Leveraging large language models to mimic domain expert labeling in unstructured text-based electronic healthcare records in non-English languages

Bibliographic Details
Main Authors: Izzet Turkalp Akbasli, Ahmet Ziya Birbilen, Ozlem Teksam
Format: Article
Language: English
Published: BMC 2025-03-01
Series: BMC Medical Informatics and Decision Making
Online Access: https://doi.org/10.1186/s12911-025-02871-6
Description
Summary: Abstract
Background: The integration of big data and artificial intelligence (AI) in healthcare, particularly through the analysis of electronic health records (EHR), presents significant opportunities for improving diagnostic accuracy and patient outcomes. However, processing and accurately labeling vast amounts of unstructured data remains a critical bottleneck that calls for efficient and reliable solutions. This study investigates the ability of domain-specific, fine-tuned large language models (LLMs) to classify unstructured EHR texts containing typographical errors through named entity recognition tasks, with the aim of improving the efficiency and reliability of supervised learning AI models in healthcare.
Methods: Turkish clinical notes from pediatric emergency room admissions at Hacettepe University İhsan Doğramacı Children’s Hospital from 2018 to 2023 were analyzed. The data were preprocessed with open-source Python libraries and categorized using a pretrained GPT-3 model, “text-davinci-003,” before and after fine-tuning with domain-specific data on respiratory tract infections (RTI). The model’s predictions were compared against ground truth labels established by pediatric specialists.
Results: Out of 24,229 patient records classified as poorly labeled, 18,879 were identified without typographical errors and confirmed for RTI through filtering methods. Among the remaining records, the fine-tuned model identified RTI cases with 99.88% accuracy, significantly outperforming the pretrained model’s 78.54% accuracy. The fine-tuned model also outperformed the pretrained model on all other evaluated performance metrics.
Conclusions: Fine-tuned LLMs can categorize unstructured EHR data with high accuracy, closely approximating the performance of domain experts. This approach significantly reduces the time and costs associated with manual data labeling, demonstrating the potential to streamline the processing of large-scale healthcare data for AI applications.
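The Methods compare model predictions against ground truth labels set by pediatric specialists and report accuracy alongside other metrics. The record does not say how those metrics were computed, so the following is only a minimal sketch of such a comparison for a binary RTI label. The function name binary_metrics and the toy label lists are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compare model labels with expert labels (1 = RTI, 0 = not RTI)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if len(y_true) == 0:
        raise ValueError("label sequences must not be empty")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positives predicted or present).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy, made-up labels for illustration only; NOT data from the study.
expert_labels = [1, 1, 1, 0, 0, 1, 0, 1]
model_labels = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(expert_labels, model_labels)
print(metrics)
```

Running the same function once on the pretrained model's labels and once on the fine-tuned model's labels would give a like-for-like comparison against the same expert labels.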
ISSN: 1472-6947