Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-based electronic medical records search engines

Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of...

Bibliographic Details
Main Author:	Cheng Ye
Format:	Article
Language:	English
Published:	KeAi Communications Co., Ltd. 2024-09-01
Series:	Informatics and Health
Subjects:	Retrieval Augmented Generation; Electronic medical records; Information retrieval; Large Language Model; Learning to rank
Online Access:	http://www.sciencedirect.com/science/article/pii/S2949953424000146

_version_	1849690353805819904
author	Cheng Ye
collection	DOAJ
description	Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of search results to support RAG-based search engines. Methods: Given a prompt or search query, the system first asks the user to label a few randomly selected documents, each containing some of the query keywords, as relevant or not relevant to the prompt. The system then identifies relevant sentences and adjusts word similarities by updating a medical semantic embedding. New documents are ranked by the number of relevant sentences identified by the weighted embedding. Only the top-ranked documents and sentences are provided to a large language model (LLM) to generate answers for further review. Findings: To evaluate our approach, four medical researchers labeled documents based on their relevance to specific diseases. We measured the information retrieval performance of our approach and two baseline methods. Results show that our approach achieved a Precision-at-10 (P@10) score of at least 0.60 with only ten positive labels, outperforming the baseline methods. In our pilot study, we demonstrate that the learned semantic preference can transfer to the analysis of unseen datasets, boosting the accuracy of a RAG model in extracting and explaining cancer progression diagnoses from 0.14 to 0.50. Interpretation: This study demonstrates that a customized learning-to-rank method can enhance state-of-the-art natural language models, such as LLMs, by quickly adapting to users' semantics. This approach supports EMR document retrieval and helps RAG models generate clinically meaningful answers to specific questions, underscoring the potential of user-tailored learning-to-rank methods in clinical practice.
format	Article
id	doaj-art-83e8f56cf986434f8a6c5f344f3d4b6c
institution	DOAJ
issn	2949-9534
language	English
publishDate	2024-09-01
publisher	KeAi Communications Co., Ltd.
record_format	Article
series	Informatics and Health
spelling	doaj-art-83e8f56cf986434f8a6c5f344f3d4b6c | 2025-08-20T03:21:19Z | eng | KeAi Communications Co., Ltd. | Informatics and Health | 2949-9534 | 2024-09-01 | 1 | 2 | 93 | 99 | 10.1016/j.infoh.2024.07.001 | Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-based electronic medical records search engines | Cheng Ye (Correspondence to: Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave # 1475, Nashville, TN 37203, USA.; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA) | http://www.sciencedirect.com/science/article/pii/S2949953424000146 | Retrieval Augmented Generation | Electronic medical records | Information retrieval | Large Language Model | Learning to rank
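The abstract describes ranking documents by their number of relevant sentences and scoring the result with Precision-at-10. The following is a minimal Python sketch of those two steps only, not the author's implementation. The function names, the toy documents, and the keyword-based `is_relevant` test are all hypothetical; the study itself identifies relevant sentences with a user-weighted medical semantic embedding, which is not reproduced here.

```python
from typing import Callable, Dict, List, Set


def rank_by_relevant_sentences(
    docs: Dict[str, List[str]],
    is_relevant: Callable[[str], bool],
) -> List[str]:
    """Return document ids ordered by descending count of relevant sentences.

    `is_relevant` is a stand-in (assumption) for the sentence-level relevance
    decision; ties are broken by document id so the order is deterministic.
    """
    counts = {
        doc_id: sum(1 for sentence in sentences if is_relevant(sentence))
        for doc_id, sentences in docs.items()
    }
    return sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


# Toy data (hypothetical): each document is a list of sentences.
toy_docs = {
    "a": ["tumor progression noted", "no change"],
    "b": ["patient stable"],
    "c": ["progression of disease", "clear progression", "follow up"],
}

# Hypothetical relevance test: a plain keyword match.
ranked = rank_by_relevant_sentences(toy_docs, lambda s: "progression" in s)
score = precision_at_k(ranked, relevant_ids={"b", "c"}, k=2)
```

With these toy inputs, document "c" ranks first because it has two matching sentences. Swapping the keyword test for an embedding-based similarity check would leave the ranking and scoring functions unchanged.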

Similar Items