358 Rare disease study identification (RDSI): A natural language processing assisted search and visualization tool for clinical studies of rare diseases

Bibliographic Details
Main Authors: Michael Lin, Jennifer Weis, H M Abdul Fattah, Jungwei Fan
Format: Article
Language: English
Published: Cambridge University Press 2025-04-01
Series: Journal of Clinical and Translational Science
Online Access: https://www.cambridge.org/core/product/identifier/S2059866124009841/type/journal_article
Collection: DOAJ
Description:

Objectives/Goals: Identifying and indexing rare disease studies is labor intensive, especially in research centers with a large number of trials. To address this gap, we applied natural language processing (NLP) and visualization techniques to develop an efficient pipeline and user-friendly web interface. Our goal is to offer the rare disease study identification (RDSI) tool for adoption by other sites.

Methods/Study Population: The RDSI retrieves study information (short and long titles, study abstract) from the IRB system. These descriptive fields are then processed by the MetaMap Lite NLP program for identifying disease terms and standardizing them to UMLS concepts. By terminology identifier mapping, the diseases intersecting with concepts in rare disease databases (Genetic and Rare Disease program and Orphanet) are further scored to pinpoint studies that focus on a rare disease. The web interface displays a scatter bubble chart as an overview of all the rare diseases, with each bubble size proportional to the number of studies for that disease. In addition to the visual navigation, users can search studies by disease name, PI, or IRB number. Search results contain detailed study information as well as the evidence used by algorithms of the pipeline.

Results/Anticipated Results: The RDSI identification results and functions were verified manually and spot-checked by several study investigators. The web interface is a self-contained solution available to our staff for various use cases like reporting or environment scan. We have built in a versioning mechanism that logs the date of each major result in the process. Therefore, even as the rare disease data sources evolve over time, we will be able to preserve any historical context or perform updates as needed. The RDSI outputs are replicated to Mayo Clinic’s enterprise data warehouse daily, allowing tech-savvy users to leverage any useful intermediate results at the backend. We anticipate the performance of the rare disease identification to be further enhanced by employing the advancements in AI technology.

Discussion/Significance of Impact: The RDSI represents an informatics solution that offers efficiency in identifying and navigating rare disease clinical studies. It features the use of public databases and open-source tools, manifesting return on investment from the broad translational science ecosystem. These considerations are informative and adoptable by other institutions.
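
The concept-intersection, scoring, and bubble-sizing steps described in the abstract might be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not say how studies are scored, so the per-field weights, the threshold, and every name here (`Mention`, `FIELD_WEIGHTS`, `score_study`, `rare_disease_focus`, `bubble_sizes`) are hypothetical, and the NLP step is assumed to have already produced UMLS concept identifiers (CUIs) for each study field.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One disease concept found in one descriptive field of a study."""
    cui: str    # UMLS concept identifier produced by the NLP step
    field: str  # "short_title", "long_title", or "abstract"


# Hypothetical weights: a mention in a title counts more than one in the abstract.
FIELD_WEIGHTS = {"short_title": 3.0, "long_title": 2.0, "abstract": 1.0}


def score_study(mentions, rare_cuis):
    """Sum field weights per concept, keeping only concepts in the rare disease set."""
    scores = {}
    for m in mentions:
        if m.cui in rare_cuis:
            scores[m.cui] = scores.get(m.cui, 0.0) + FIELD_WEIGHTS.get(m.field, 0.0)
    return scores


def rare_disease_focus(mentions, rare_cuis, threshold=3.0):
    """Return the rare disease concepts a study appears to focus on.

    The threshold is an assumed cut-off, chosen here only for illustration.
    """
    scores = score_study(mentions, rare_cuis)
    return sorted(cui for cui, score in scores.items() if score >= threshold)


def bubble_sizes(studies, rare_cuis, threshold=3.0):
    """Count studies per rare disease; the count would drive each bubble's size."""
    counts = Counter()
    for mentions in studies.values():
        counts.update(rare_disease_focus(mentions, rare_cuis, threshold))
    return dict(counts)


if __name__ == "__main__":
    # Placeholder identifiers, not real CUIs or IRB numbers.
    rare = {"C0000001", "C0000002"}
    studies = {
        "IRB-001": [Mention("C0000001", "short_title"), Mention("C0000001", "abstract")],
        "IRB-002": [Mention("C0000001", "abstract"), Mention("C0000002", "long_title"),
                    Mention("C0000002", "abstract")],
        "IRB-003": [Mention("C0000003", "short_title")],
    }
    print(bubble_sizes(studies, rare))
```

Keeping the per-concept scores separate from the focus decision would also make it straightforward to show the supporting evidence alongside each search result, as the abstract describes.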
Record ID: doaj-art-2ca7865fb30a49f4b40e37998cf1b968
Institution: Kabale University
ISSN: 2059-8661
DOI: 10.1017/cts.2024.984
Author Affiliations: Michael Lin (Mayo Clinic), Jennifer Weis (Mayo Clinic), H M Abdul Fattah (University of Arizona), Jungwei Fan (Mayo Clinic)