Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of Genomics & Informatics

Bibliographic Details
Main Author: Hyun-Seok Park
Format: Article
Language: English
Published: BioMed Central 2018-12-01
Series: Genomics & Informatics
Online Access: http://genominfo.org/upload/pdf/gi-2018-16-4-e40.pdf
Author Affiliation: Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea
Volume/Issue: 16(4)
ISSN: 2234-0742
DOI: 10.5808/GI.2018.16.4.e40
Keywords: biomedical text mining, corpus, text analytics
Collection: DOAJ
Record ID: doaj-art-88d5117789994eb7a87272fc34e213c4
Institution: Kabale University

Abstract: There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.