Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-based electronic medical records search engines

Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of...

Full description

Saved in:
Bibliographic Details
Main Author: Cheng Ye
Format: Article
Language:English
Published: KeAi Communications Co., Ltd. 2024-09-01
Series:Informatics and Health
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2949953424000146
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of search results to support RAG-based search engines. Methods: Given a prompt or search query, the system first asks the user to label a few randomly selected documents, which contain some keywords, as relevant to the prompt or not. The system then identifies relevant sentences and adjusts word similarities by updating a medical semantic embedding. New documents are ranked by the number of relevant sentences identified by the weighted embedding. Only the top-ranked documents and sentences are provided to a Large-Language-Model (LLM) to generate answers for further review. Findings: To evaluate our approach, four medical researchers labeled documents based on their relevance to specific diseases. We measured the information retrieval performance of our approach and two baseline methods. Results show that our approach achieved at least a 0.60 Precision-at-10 (P @ 10) score with only ten positive labels, outperforming the baseline methods. In our pilot study, we demonstrate that the learned semantic preference can transfer to the analysis of unseen datasets, boosting the accuracy of an RAG model in extracting and explaining cancer progression diagnoses from 0.14 to 0.50. Interpretation: This study demonstrates that a customized learning-to-rank method can enhance state-of-the-art natural language models, such as LLMs, by quickly adapting to users' semantics. This approach supports EMR document retrieval and helps RAG models generate clinically meaningful answers to specific questions, underscoring the potential of user-tailored learning-to-rank methods in clinical practice.
ISSN:2949-9534