Increased Confidence in Deduplication of Drug Safety Reports with Natural Language Processing of Narratives at the US Food and Drug Administration

Bibliographic Details
Main Authors: Kory Kreimeyer, Oanh Dang, Jonathan Spiker, Paula Gish, Jessica Weintraub, Eileen Wu, Robert Ball, Taxiarchis Botsis
Format: Article
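Language: English
Published: Frontiers Media S.A. 2022-06-01
Series: Frontiers in Drug Safety and Regulation
Subjects: pharmacovigilance; deduplication; decision support; natural language processing; safety surveillance
Online Access: https://www.frontiersin.org/articles/10.3389/fdsfr.2022.918897/full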
collection DOAJ
description The US Food and Drug Administration (FDA) receives millions of postmarket adverse event reports for drug and therapeutic biologic products every year. One of the most salient issues with these submissions is report duplication, where an adverse event experienced by one patient is reported multiple times to the FDA. Duplication has important negative implications for data analysis. We improved and optimized an existing deduplication algorithm that used both structured and free-text data, developed a web-based application to support data processing, and conducted a 6-month dedicated evaluation to assess the potential operationalization of the deduplication process in the FDA. Comparing algorithm predictions with reviewer determinations of duplicates for twenty-seven files for case series reviews (with a median size of 281 reports), the average pairwise recall and precision were equal to 0.71 (SD ± 0.32) and 0.67 (SD ± 0.34). Overall, reviewers felt confident about the algorithm and expressed their interest in using it. These findings support the operationalization of the deduplication process for case series review as a supplement to human review.
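The pairwise recall and precision quoted in the description can be illustrated with a minimal sketch. It assumes that both the reviewer determinations and the algorithm predictions are available as groups of report IDs, that every group is expanded into unordered report pairs, and that an empty denominator yields 0.0; the function names and report IDs are hypothetical, and the article's own evaluation code may differ.

```python
from itertools import combinations


def pairs_from_groups(groups):
    """Expand each duplicate group into the set of unordered report pairs it implies."""
    pairs = set()
    for group in groups:
        # Sorting makes (a, b) and (b, a) collapse into the same tuple.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_precision_recall(predicted_groups, reference_groups):
    """Return (precision, recall) over duplicate pairs.

    Assumption: an empty denominator is reported as 0.0; the source does not
    say how such cases were handled.
    """
    predicted = pairs_from_groups(predicted_groups)
    reference = pairs_from_groups(reference_groups)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical report IDs, for illustration only.
reference = [["R1", "R2", "R3"], ["R4", "R5"]]               # reviewer determinations
predicted = [["R1", "R2"], ["R4", "R5"], ["R6", "R7"]]       # algorithm output
precision, recall = pairwise_precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the reviewers' groups imply four duplicate pairs and the algorithm proposes three, two of which match, so precision is 2/3 and recall is 1/2. Averaging such per-file values across the evaluated files would give summary figures of the kind reported in the description.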
id doaj-art-7eec6d5ca3354cf68344523790afa879
institution Kabale University
issn 2674-0869
spelling doaj-art-7eec6d5ca3354cf68344523790afa879; 2024-11-20T15:49:07Z; eng; Frontiers Media S.A.; Frontiers in Drug Safety and Regulation; 2674-0869; 2022-06-01; 2; 10.3389/fdsfr.2022.918897; 918897
Kory Kreimeyer (0): The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, United States
Oanh Dang (1): Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, United States
Jonathan Spiker (2): The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, United States
Paula Gish (3): Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, United States
Jessica Weintraub (4): Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, United States
Eileen Wu (5): Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, United States
Robert Ball (6): Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, United States
Taxiarchis Botsis (7): The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, United States
topic pharmacovigilance
deduplication
decision support
natural language processing
safety surveillance