Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-based electronic medical records search engines

Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of...

Bibliographic Details
Main Author: Cheng Ye
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2024-09-01
Series: Informatics and Health
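Subjects: Retrieval Augmented Generation; Electronic medical records; Information retrieval; Large Language Model; Learning to rank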
Online Access: http://www.sciencedirect.com/science/article/pii/S2949953424000146
author Cheng Ye
collection DOAJ
description Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of search results to support RAG-based search engines. Methods: Given a prompt or search query, the system first asks the user to label a few randomly selected documents that contain some of the keywords as relevant to the prompt or not. The system then identifies relevant sentences and adjusts word similarities by updating a medical semantic embedding. New documents are ranked by the number of relevant sentences identified by the weighted embedding. Only the top-ranked documents and sentences are provided to a large language model (LLM) to generate answers for further review. Findings: To evaluate our approach, four medical researchers labeled documents based on their relevance to specific diseases. We measured the information retrieval performance of our approach and two baseline methods. Results show that our approach achieved a Precision-at-10 (P@10) score of at least 0.60 with only ten positive labels, outperforming the baseline methods. In our pilot study, we demonstrate that the learned semantic preference can transfer to the analysis of unseen datasets, boosting the accuracy of a RAG model in extracting and explaining cancer progression diagnoses from 0.14 to 0.50. Interpretation: This study demonstrates that a customized learning-to-rank method can enhance state-of-the-art natural language models, such as LLMs, by quickly adapting to users' semantics. This approach supports EMR document retrieval and helps RAG models generate clinically meaningful answers to specific questions, underscoring the potential of user-tailored learning-to-rank methods in clinical practice.
format Article
id doaj-art-83e8f56cf986434f8a6c5f344f3d4b6c
institution DOAJ
issn 2949-9534
language English
publishDate 2024-09-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Informatics and Health
spelling doaj-art-83e8f56cf986434f8a6c5f344f3d4b6c | 2025-08-20T03:21:19Z | eng | KeAi Communications Co., Ltd. | Informatics and Health | 2949-9534 | 2024-09-01 | 129399 | 10.1016/j.infoh.2024.07.001 | Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-based electronic medical records search engines | Cheng Ye | 0 | Correspondence to: Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave # 1475, Nashville, TN 37203, USA.; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA | Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of search results to support RAG-based search engines. Methods: Given a prompt or search query, the system first asks the user to label a few randomly selected documents that contain some of the keywords as relevant to the prompt or not. The system then identifies relevant sentences and adjusts word similarities by updating a medical semantic embedding. New documents are ranked by the number of relevant sentences identified by the weighted embedding. Only the top-ranked documents and sentences are provided to a large language model (LLM) to generate answers for further review. Findings: To evaluate our approach, four medical researchers labeled documents based on their relevance to specific diseases. We measured the information retrieval performance of our approach and two baseline methods. Results show that our approach achieved a Precision-at-10 (P@10) score of at least 0.60 with only ten positive labels, outperforming the baseline methods. In our pilot study, we demonstrate that the learned semantic preference can transfer to the analysis of unseen datasets, boosting the accuracy of a RAG model in extracting and explaining cancer progression diagnoses from 0.14 to 0.50. Interpretation: This study demonstrates that a customized learning-to-rank method can enhance state-of-the-art natural language models, such as LLMs, by quickly adapting to users' semantics. This approach supports EMR document retrieval and helps RAG models generate clinically meaningful answers to specific questions, underscoring the potential of user-tailored learning-to-rank methods in clinical practice. | http://www.sciencedirect.com/science/article/pii/S2949953424000146 | Retrieval Augmented Generation | Electronic medical records | Information retrieval | Large Language Model | Learning to rank
title Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-based electronic medical records search engines
topic Retrieval Augmented Generation
Electronic medical records
Information retrieval
Large Language Model
Learning to rank
url http://www.sciencedirect.com/science/article/pii/S2949953424000146
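
The description reports results as Precision-at-10 (P@10). As a reading aid, the following is a minimal sketch of the standard definition of that metric: the fraction of the top 10 ranked documents that are labeled relevant. It is not the article's evaluation code; the function name `precision_at_k` and the document identifiers are hypothetical and chosen only for illustration.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Return the fraction of the top-k ranked documents that are relevant.

    ranked_doc_ids: document ids ordered from best to worst rank.
    relevant_ids:   set of ids that a reviewer labeled as relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalizes rankings shorter than k.
    return hits / k


# Hypothetical ranking of 12 documents; 6 of the top 10 are labeled relevant.
ranked = [f"doc{i}" for i in range(1, 13)]
relevant = {"doc1", "doc2", "doc4", "doc5", "doc7", "doc9"}
score = precision_at_k(ranked, relevant, k=10)
print(score)
```

Under this definition, a P@10 of 0.60 means that six of the ten highest-ranked documents were judged relevant.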