A refined set of RxNorm drug names for enhancing unstructured data analysis in drug safety surveillance

Adverse drug events are harms associated with drug use, whether the drug is used correctly or incorrectly. Identifying adverse drug events is vital in pharmacovigilance to safeguard public health. Drug safety surveillance can be performed using unstructured data. A comprehensive and accurate list of...

Bibliographic Details
Main Authors: Wenjing Guo, Fan Dong, Jie Liu, Aasma Aslam, Tucker A. Patterson, Huixiao Hong
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-05-01
Series: Experimental Biology and Medicine
Subjects: adverse drug events; pharmacovigilance; natural language processing; database; DrugBank
Online Access: https://www.ebm-journal.org/articles/10.3389/ebm.2025.10374/full
collection DOAJ
description Adverse drug events are harms associated with drug use, whether the drug is used correctly or incorrectly. Identifying adverse drug events is vital in pharmacovigilance to safeguard public health. Drug safety surveillance can be performed using unstructured data. A comprehensive and accurate list of drug names is essential for effective identification of adverse drug events. While there are numerous sources for drug names, RxNorm is widely recognized as a leading resource. However, its effectiveness for unstructured data analysis in drug safety surveillance has not been thoroughly assessed. To address this, we evaluated the drug names in RxNorm for their suitability in unstructured data analysis and developed a refined set of drug names. First, we removed duplicates, names exceeding 199 characters, and names that only describe administrative details. Drug names with four or fewer characters were then checked against 18,000 drug-related PubMed abstracts to remove names that rarely appear in unstructured data. The remaining names, which ranged from five to 199 characters, were further refined to exclude those that could lead to inaccurate drug counts in unstructured data analysis. We compared the efficiency and accuracy of the refined set with the original RxNorm set by testing both on the same 18,000 drug-related PubMed abstracts. The results showed a decrease in both computational cost and the number of false drug names identified. Further analysis of the removed names revealed that most originated from only one of the 14 sources. Our findings suggest that the refined set can enhance drug identification in unstructured data analysis, thereby improving pharmacovigilance.
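The first two filtering steps named in the description can be illustrated with a minimal Python sketch. This is not the authors' code. The 199-character cutoff and the four-character boundary come from the description; the `MIN_HITS` threshold, the whole-word matching rule, and the `is_administrative` predicate are assumptions introduced here, since the record does not say how "rarely" or "administrative details" were defined. The later refinement of names of five to 199 characters is not sketched because the record gives no criteria for it.

```python
import re
from collections import Counter

MAX_LEN = 199    # from the description: longer names are dropped
SHORT_LEN = 4    # from the description: names this short get a corpus check
MIN_HITS = 2     # assumption: the record does not define "rarely"


def count_mentions(names, abstracts):
    """Count how many abstracts mention each name as a whole word."""
    hits = Counter()
    for name in names:
        # Assumed matching rule: case-insensitive, not embedded in a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        hits[name] = sum(1 for text in abstracts if pattern.search(text))
    return hits


def refine(names, abstracts, is_administrative=lambda name: False):
    """Apply the duplicate, length, administrative, and short-name filters."""
    # Drop exact duplicates while preserving the original order.
    unique = list(dict.fromkeys(n.strip() for n in names if n.strip()))
    # Drop over-long names and names flagged by the (hypothetical) predicate.
    kept = [n for n in unique if len(n) <= MAX_LEN and not is_administrative(n)]
    # Short names survive only if enough abstracts mention them.
    short = [n for n in kept if len(n) <= SHORT_LEN]
    hits = count_mentions(short, abstracts)
    return [n for n in kept if len(n) > SHORT_LEN or hits[n] >= MIN_HITS]
```

Calling `refine(names, abstracts)` with a list of candidate names and a list of abstract strings returns the surviving names in their original order. Checking only the short names against the corpus keeps the expensive text scan limited to the group most likely to collide with ordinary words and abbreviations.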
id doaj-art-e926be9891f846cab5c0f3abf66279e3
institution OA Journals
issn 1535-3699
volume 250
doi 10.3389/ebm.2025.10374
topic adverse drug events
pharmacovigilance
natural language processing
database
DrugBank