Chinese Sequence Labeling Based on Stack Pre-training Model


Bibliographic Details
Main Authors: LIU Yu-peng, LI Guo-dong
Format: Article
Language: Chinese (zho)
Published: Harbin University of Science and Technology Publications, 2022-02-01
Series: Journal of Harbin University of Science and Technology
Online Access: https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2050
Description
Summary: Sequence labeling is an important task in natural language processing. In this paper, exploiting the relatedness of the tasks, we use a stacked pre-training model to extract features and to perform word segmentation, named entity recognition, and chunk tagging. Through an in-depth study of the internal structure of BERT, the Bidirectional Encoder Representations from Transformers (BERT) model is optimized while preserving the accuracy of the original model, reducing its complexity and the time cost of training and prediction. In the upper-layer structure, in contrast to the traditional single long short-term memory network (LSTM), this paper uses a two-layer bidirectional LSTM structure: the bottom layer performs word segmentation with a bidirectional LSTM (Bi-LSTM), and the top layer handles the sequence labeling task. In the New Semi-Conditional Random Field (NSCRF), the traditional semi-Markov conditional random field (Semi-CRF) and conditional random field (CRF) are combined, so that segmentation and word labeling are considered jointly, improving accuracy in both training and decoding. The model was trained on the CCKS2019, MSRANER, and BosonNLP datasets and achieved substantial improvements, with F1 measures of 92.37%, 95.69%, and 93.75%, respectively.
ISSN:1007-2683
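
The abstract describes a CRF-style layer decoding the tag sequence produced by the Bi-LSTM stack. As a rough illustration of that final decoding step (not the authors' NSCRF itself), the following is a minimal Viterbi decoder for a linear-chain CRF in plain Python; the emission and transition scores are hypothetical stand-ins for what the network would output.

```python
# Minimal sketch: Viterbi decoding for a linear-chain CRF layer, as would sit
# on top of a Bi-LSTM encoder. Scores here are illustrative placeholders, not
# the NSCRF formulation from the paper.

def viterbi_decode(emissions, transitions):
    """Return (best tag path, its score).

    emissions[t][i]   -- score of tag i at position t (e.g. from a Bi-LSTM)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    n_tags = len(emissions[0])
    # score[i]: best score of any path ending in tag i at the current position
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            bp.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(bp)
    # Trace the best path backwards from the highest-scoring final tag.
    best_last = max(range(n_tags), key=lambda i: score[i])
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, score[best_last]
```

A Semi-CRF extends this recursion over variable-length segments rather than single positions, which is what lets the NSCRF score segmentations and word labels jointly.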