Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-based electronic medical records search engines
Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of...
| Main Author: | Cheng Ye |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications Co., Ltd. 2024-09-01 |
| Series: | Informatics and Health |
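| Subjects: | Retrieval Augmented Generation; Electronic medical records; Information retrieval; Large Language Model; Learning to rank |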
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2949953424000146 |
| _version_ | 1849690353805819904 |
|---|---|
| author | Cheng Ye |
| collection | DOAJ |
| description | Background: This study addresses the challenge of enhancing Retrieval Augmented Generation (RAG) search engines for electronic medical records (EMR) by learning users' distinct search semantics. The specific aim is to develop a learning-to-rank system that improves the accuracy and relevance of search results to support RAG-based search engines. Methods: Given a prompt or search query, the system first asks the user to label a few randomly selected documents, which contain some keywords, as relevant to the prompt or not. The system then identifies relevant sentences and adjusts word similarities by updating a medical semantic embedding. New documents are ranked by the number of relevant sentences identified by the weighted embedding. Only the top-ranked documents and sentences are provided to a large language model (LLM) to generate answers for further review. Findings: To evaluate our approach, four medical researchers labeled documents based on their relevance to specific diseases. We measured the information retrieval performance of our approach and two baseline methods. Results show that our approach achieved at least a 0.60 Precision-at-10 (P@10) score with only ten positive labels, outperforming the baseline methods. In our pilot study, we demonstrate that the learned semantic preference can transfer to the analysis of unseen datasets, boosting the accuracy of a RAG model in extracting and explaining cancer progression diagnoses from 0.14 to 0.50. Interpretation: This study demonstrates that a customized learning-to-rank method can enhance state-of-the-art natural language models, such as LLMs, by quickly adapting to users' semantics. This approach supports EMR document retrieval and helps RAG models generate clinically meaningful answers to specific questions, underscoring the potential of user-tailored learning-to-rank methods in clinical practice. |
| format | Article |
| id | doaj-art-83e8f56cf986434f8a6c5f344f3d4b6c |
| institution | DOAJ |
| issn | 2949-9534 |
| language | English |
| publishDate | 2024-09-01 |
| publisher | KeAi Communications Co., Ltd. |
| record_format | Article |
| series | Informatics and Health |
| spelling | doaj-art-83e8f56cf986434f8a6c5f344f3d4b6c; 2025-08-20T03:21:19Z; eng; KeAi Communications Co., Ltd.; Informatics and Health; ISSN 2949-9534; 2024-09-01; vol. 1, no. 2, pp. 93-99; DOI 10.1016/j.infoh.2024.07.001; Cheng Ye, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA (correspondence to: Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave # 1475, Nashville, TN 37203, USA); http://www.sciencedirect.com/science/article/pii/S2949953424000146; Retrieval Augmented Generation; Electronic medical records; Information retrieval; Large Language Model; Learning to rank |
| title | Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-based electronic medical records search engines |
| topic | Retrieval Augmented Generation Electronic medical records Information retrieval Large Language Model Learning to rank |
| url | http://www.sciencedirect.com/science/article/pii/S2949953424000146 |
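
The ranking and evaluation steps summarized in the abstract (rank documents by the number of relevant sentences, then score the ranking with Precision-at-k) can be illustrated with a minimal Python sketch. This is not the author's implementation: the record does not say how sentences are compared or how the embedding is updated, so the averaged word vectors, the cosine similarity, the 0.8 threshold, the toy two-dimensional embedding, and every function name below are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(sentence, embedding):
    """Average the vectors of known words; None if no word is in the embedding."""
    vectors = [embedding[w] for w in sentence.lower().split() if w in embedding]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def count_relevant_sentences(document, query_vector, embedding, threshold=0.8):
    """Count sentences whose similarity to the query meets the (assumed) threshold."""
    count = 0
    for sentence in document.split("."):
        vec = sentence_vector(sentence, embedding)
        if vec is not None and cosine(vec, query_vector) >= threshold:
            count += 1
    return count


def rank_documents(documents, query_vector, embedding, threshold=0.8):
    """Return document ids sorted by descending number of relevant sentences."""
    scores = {
        doc_id: count_relevant_sentences(text, query_vector, embedding, threshold)
        for doc_id, text in documents.items()
    }
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are labeled relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


# Hypothetical toy data: a 2-D "embedding" and three short notes.
embedding = {
    "tumor": [1.0, 0.0],
    "progression": [0.9, 0.1],
    "fracture": [0.0, 1.0],
    "cast": [0.1, 0.9],
}
documents = {
    "note_a": "Tumor progression. Tumor.",
    "note_b": "Fracture cast. Tumor progression.",
    "note_c": "Fracture. Cast.",
}
query_vector = sentence_vector("tumor progression", embedding)
ranked = rank_documents(documents, query_vector, embedding)
print(ranked)
print(precision_at_k(ranked, {"note_a", "note_b"}, k=2))
```

In the approach the abstract describes, the embedding itself is adjusted from the user's labels before ranking; that update step is omitted here because the record gives no detail about it.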