A Pipeline for Automating Emergency Medicine Documentation Using LLMs with Retrieval-Augmented Text Generation
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2025-12-01 |
| Series: | Applied Artificial Intelligence |
| Online Access: | https://www.tandfonline.com/doi/10.1080/08839514.2025.2519169 |
| Summary: | Accurate and efficient documentation of patient information is vital in emergency healthcare settings. Traditional manual documentation methods are often time-consuming and prone to errors, potentially affecting patient outcomes. Large Language Models (LLMs) offer a promising solution for enhancing medical communication systems; however, their clinical deployment, particularly in non-English languages such as German, presents challenges related to content accuracy, clinical relevance, and data privacy. This study addresses these challenges by developing and evaluating an automated pipeline for emergency medical documentation in German. The research objectives are (1) to generate synthetic dialogues with known ground-truth data, creating controlled datasets for evaluating NLP performance, and (2) to design a pipeline that retrieves essential clinical information from these dialogues. A subset of 100 anonymized patient records from the MIMIC-IV-ED dataset was selected, ensuring diversity in demographics, chief complaints, and conditions. A Retrieval-Augmented Generation (RAG) system extracted key nominal and numerical features using chunking, embedding, and dynamic prompts. Evaluation metrics included precision, recall, F1-score, and sentiment analysis. Initial results demonstrated high extraction accuracy, particularly for medication data (F1-scores: 86.21%–100%), though performance declined on nuanced clinical language, indicating that further refinement is needed for real-world emergency settings. |
| ISSN: | 0883-9514, 1087-6545 |
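
The summary describes the extraction step as a RAG pipeline built from chunking, embedding, and dynamic prompts. The following Python sketch only illustrates that general approach and is not the authors' implementation; `embed` and `generate` are placeholder callables standing in for an embedding model and a (German-capable) LLM, and the chunk size, top-k value, and feature names are assumptions made for the example.

```python
from typing import Callable, List
import numpy as np


def chunk_dialogue(transcript: str, max_words: int = 80) -> List[str]:
    """Split a doctor-patient dialogue into fixed-size word chunks."""
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def retrieve(chunks: List[str], query: str,
             embed: Callable[[str], np.ndarray], top_k: int = 3) -> List[str]:
    """Rank chunks by cosine similarity to the query and return the top_k."""
    q = embed(query)
    scores = [
        float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for v in (embed(c) for c in chunks)
    ]
    order = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in order]


def extract_feature(transcript: str, feature: str,
                    embed: Callable[[str], np.ndarray],
                    generate: Callable[[str], str]) -> str:
    """Build a dynamic prompt from retrieved context and ask the LLM for one
    clinical feature (e.g. 'chief complaint', 'current medication', 'heart rate')."""
    context = "\n".join(retrieve(chunk_dialogue(transcript), feature, embed))
    prompt = (
        f"Context from the emergency-room dialogue:\n{context}\n\n"
        f"Extract the patient's {feature}. Answer with the value only."
    )
    return generate(prompt)
```

In a setup like this, the values returned per feature would be compared against reference annotations to compute the precision, recall, and F1 figures of the kind reported in the summary.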