Evaluation of a large language model to simplify discharge summaries and provide cardiological lifestyle recommendations


Bibliographic Details
Main Authors: Paul Rust, Julian Frings, Sven Meister, Leonard Fehring
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Communications Medicine
Online Access: https://doi.org/10.1038/s43856-025-00927-2
Description
Summary:
Background: Hospital discharge summaries are essential for the continuity of care. However, medical jargon, abbreviations, and technical language often make them too complex for patients to understand, and they frequently omit lifestyle recommendations important for self-management. This study explored using a large language model (LLM) to enhance discharge summary readability and augment it with lifestyle recommendations.
Methods: We collected 20 anonymized cardiology discharge summaries. GPT-4o was prompted using full-text and segment-wise approaches to simplify each summary and generate lifestyle recommendations. Readability was measured via three standardized metrics (modified Flesch-Reading-Ease, Vienna Non-fiction Text Formula, Lesbarkeitsindex), and multiple quality dimensions were evaluated by 12 medical experts.
Results: LLM-generated summaries from both prompting approaches are significantly more readable than the original summaries across all metrics (p < 0.0001). Based on 60 expert ratings for the full-text approach and 60 for the segment-wise approach, experts ‘(strongly) agree’ that LLM-generated summaries are correct (full-text: 85%; segment-wise: 80%), complete (78%; 92%), harmless (83%; 88%), and comprehensible for patients (88%; 97%). Across 60 ratings, experts ‘(strongly) agree’ that LLM-generated recommendations are relevant (92%), evidence-based (88%), personalized (70%), complete (88%), consistent (93%), and harmless (88%).
Conclusions: LLM-generated summaries achieve a 10th-grade readability level and high quality ratings. While LLM-generated lifestyle recommendations are generally of high quality, personalization is limited. These findings suggest that LLMs could help create more patient-centric discharge summaries. Further research is needed to confirm clinical utility and address quality assurance, regulatory compliance, and clinical integration challenges.
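Note: Of the three readability metrics named in the summary, the Lesbarkeitsindex (LIX) is the simplest to illustrate. The Python sketch below assumes the standard LIX definition (average sentence length plus the percentage of words longer than six letters). The abstract does not say how the study computed its scores, so the function name `lix`, the naive sentence splitting, and the word tokenization are illustrative assumptions, not the authors' implementation.

```python
import re


def lix(text: str) -> float:
    """Return the LIX readability score of `text` (standard definition).

    LIX = (words / sentences) + 100 * (long words / words),
    where a long word has more than six letters.
    """
    # Assumption: sentences end at '.', '!' or '?'. Abbreviations such as
    # "z.B." would be split incorrectly by this naive rule.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of Unicode letters, so hyphenated
    # compounds are counted as separate words.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the study data.
    sample = "Der Patient hat Herzinsuffizienz. Er soll sich bewegen."
    print(round(lix(sample), 1))
```

Higher LIX values indicate harder text, so a simplified summary would be expected to score lower than its original.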
ISSN:2730-664X