Extending TextAE for annotation of non-contiguous entities


Bibliographic Details
Main Authors:	Jake Lever, Russ Altman, Jin-Dong Kim
Format:	Article
Language:	English
Published:	BioMed Central, 2020-06-01
Series:	Genomics & Informatics
Subjects:	editor; text annotation; text mining; visualization
Online Access:	http://genominfo.org/upload/pdf/gi-2020-18-2-e15.pdf

author	Jake Lever, Russ Altman, Jin-Dong Kim
collection	DOAJ
description	Named entity recognition tools are used to identify mentions of biomedical entities in free text and are essential components of high-quality information retrieval and extraction systems. Without good entity recognition, methods will mislabel searched text and will miss important information or identify spurious text that will frustrate users. Most tools do not capture non-contiguous entities, which are separate spans of text that together refer to an entity, e.g., the entity “type 1 diabetes” in the phrase “type 1 and type 2 diabetes.” This type of entity is common in biomedical texts, especially in lists, where multiple biomedical entities are named in shortened form to avoid repeating words. Most text annotation systems, which let users view and edit entity annotations, do not support non-contiguous entities. Therefore, experts cannot even visualize non-contiguous entities, let alone annotate them to build valuable datasets for machine learning methods. To address this problem, and as part of the BLAH6 hackathon, we extended the TextAE platform to allow visualization and annotation of non-contiguous entities. The extension enables users to add new subspans to existing entities by selecting additional text. We integrated this new functionality with TextAE’s existing editing features, allowing easy changes to entity annotations and editing of relation annotations that involve non-contiguous entities, along with import from and export to the PubAnnotation format. Finally, we roughly quantify the problem across the entire accessible biomedical literature, showing that a substantial number of non-contiguous entities appear in lists and would be missed by most text mining systems.
format	Article
id	doaj-art-47dba2f8424c44e6a55f6c0ae53900cd
institution	Kabale University
issn	2234-0742
language	English
publishDate	2020-06-01
publisher	BioMed Central
record_format	Article
series	Genomics & Informatics
spelling	doaj-art-47dba2f8424c44e6a55f6c0ae53900cd 2025-02-02T20:41:01Z; eng; BioMed Central; Genomics & Informatics; ISSN 2234-0742; 2020-06-01; vol. 18, no. 2, e15; doi:10.5808/GI.2020.18.2.e15; 604; Extending TextAE for annotation of non-contiguous entities; Jake Lever (Department of Bioengineering, Stanford University, Stanford, CA 94305, USA), Russ Altman (Department of Bioengineering, Stanford University, Stanford, CA 94305, USA), Jin-Dong Kim (Database Center for Life Science, Research Organization of Information and Systems, Kashiwa 277-0871, Japan); http://genominfo.org/upload/pdf/gi-2020-18-2-e15.pdf; editor; text annotation; text mining; visualization
title	Extending TextAE for annotation of non-contiguous entities
topic	editor; text annotation; text mining; visualization
url	http://genominfo.org/upload/pdf/gi-2020-18-2-e15.pdf
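The description field gives “type 1 diabetes” in “type 1 and type 2 diabetes” as an example of a non-contiguous entity. The following is a minimal Python sketch of one way such an entity could be represented. It models an entity as a sorted list of character spans and lets a caller add a subspan, mirroring the “select additional text” interaction the description mentions. It then writes a dictionary shaped like a PubAnnotation document (`text`, `denotations`, `relations`). Every name here (`Entity`, `add_subspan`, `surface`, `to_pubannotation_like`, `CHAIN_PRED`) is hypothetical. Emitting one denotation per subspan and linking the pieces with a placeholder predicate is an assumption for illustration. It is not the encoding used by TextAE or PubAnnotation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical predicate linking the pieces of one entity; a placeholder,
# not a name taken from the PubAnnotation specification.
CHAIN_PRED = "chainedTo"


@dataclass
class Entity:
    """An entity made of one or more character spans [begin, end)."""
    id: str
    label: str
    spans: list[tuple[int, int]] = field(default_factory=list)

    def add_subspan(self, begin: int, end: int) -> None:
        """Attach another span, like selecting extra text for an existing entity."""
        if not 0 <= begin < end:
            raise ValueError("span must satisfy 0 <= begin < end")
        for b, e in self.spans:
            if begin < e and b < end:
                raise ValueError("subspans of one entity must not overlap")
        self.spans.append((begin, end))
        self.spans.sort()  # keep subspans in document order

    @property
    def is_contiguous(self) -> bool:
        return len(self.spans) == 1

    def surface(self, text: str) -> str:
        """Join the subspans in order with single spaces (a simplification)."""
        return " ".join(text[b:e] for b, e in self.spans)


def to_pubannotation_like(text: str, entities: list[Entity]) -> dict:
    """Emit one denotation per subspan; chain the pieces of each entity with relations."""
    denotations: list[dict] = []
    relations: list[dict] = []
    for ent in entities:
        piece_ids = []
        for i, (b, e) in enumerate(ent.spans):
            piece_id = ent.id if i == 0 else f"{ent.id}-{i}"
            denotations.append(
                {"id": piece_id, "span": {"begin": b, "end": e}, "obj": ent.label}
            )
            piece_ids.append(piece_id)
        # Link consecutive pieces of the same entity (assumed scheme, see above).
        for subj, obj in zip(piece_ids, piece_ids[1:]):
            relations.append(
                {"id": f"R{len(relations) + 1}", "subj": subj, "obj": obj, "pred": CHAIN_PRED}
            )
    return {"text": text, "denotations": denotations, "relations": relations}


if __name__ == "__main__":
    text = "type 1 and type 2 diabetes"
    t1d = Entity("T1", "Disease", [(0, 6)])    # "type 1"
    t1d.add_subspan(18, 26)                    # "diabetes"
    t2d = Entity("T2", "Disease", [(11, 26)])  # "type 2 diabetes"
    print(t1d.surface(text))  # type 1 diabetes
    print(t2d.surface(text))  # type 2 diabetes
    print(to_pubannotation_like(text, [t1d, t2d])["relations"])
```

Joining subspans with a single space is a simplification. Depending on the text, the pieces may need their original separators or no separator at all. The overlap check applies only within one entity, so “diabetes” can belong to both the non-contiguous T1 and the contiguous T2.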

Similar Items