PharmacoNER Tagger: a deep learning-based tool for automatically finding chemicals and drugs in Spanish medical texts

Automatically detecting mentions of pharmaceutical drugs and chemical substances is key for the subsequent extraction of relations of chemicals with other biomedical entities such as genes, proteins, diseases, adverse reactions or symptoms. The identification of drug mentions is also a prior step fo...

Bibliographic Details
Main Authors: Jordi Armengol-Estapé, Felipe Soares, Montserrat Marimon, Martin Krallinger
Format: Article
Language: English
Published: BioMed Central 2019-06-01
Series: Genomics & Informatics
Subjects: machine learning; natural language processing; neural networks (computer)
Online Access: http://genominfo.org/upload/pdf/gi-2019-17-2-e15.pdf
author Jordi Armengol-Estapé
Felipe Soares
Montserrat Marimon
Martin Krallinger
collection DOAJ
description Automatically detecting mentions of pharmaceutical drugs and chemical substances is key to the subsequent extraction of relations between chemicals and other biomedical entities such as genes, proteins, diseases, adverse reactions or symptoms. The identification of drug mentions is also a prior step for complex event types such as drug dosage recognition, duration of medical treatments or drug repurposing. Formally, this task is known as named entity recognition (NER): automatically identifying mentions of predefined entities of interest in running text. In the domain of medical texts, techniques based on hand-crafted rules and graph-based models can provide adequate performance for chemical entity recognition (CER). In recent years, the field of natural language processing has largely pivoted to deep learning, and state-of-the-art results for most natural language tasks are usually obtained with artificial neural networks. Competitive resources for drug name recognition in English medical texts are already available and heavily used, while for other languages such as Spanish these tools, although clearly needed, were missing. In this work, we adapt an existing neural NER system, NeuroNER, to the particular domain of Spanish clinical case texts, and extend the neural network so that it can take into account additional features apart from the plain text. NeuroNER can be considered a competitive baseline system for Spanish drug and CER promoted by the Spanish national plan for the advancement of language technologies (Plan TL).
format Article
id doaj-art-232b7f00e0dc4286a018f3dd2a16d0a6
institution Kabale University
issn 2234-0742
language English
publishDate 2019-06-01
publisher BioMed Central
record_format Article
series Genomics & Informatics
spelling doaj-art-232b7f00e0dc4286a018f3dd2a16d0a6; 2025-02-02T13:28:54Z; eng; BioMed Central; Genomics & Informatics; 2234-0742; 2019-06-01; vol. 17, no. 2; 10.5808/GI.2019.17.2.e15; 557
Jordi Armengol-Estapé (0), Felipe Soares (1), Montserrat Marimon (2), Martin Krallinger (3)
Universitat Politècnica de Catalunya (UPC), 08034 Barcelona, Spain; Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
title PharmacoNER Tagger: a deep learning-based tool for automatically finding chemicals and drugs in Spanish medical texts
topic machine learning
natural language processing
neural networks (computer)
url http://genominfo.org/upload/pdf/gi-2019-17-2-e15.pdf