Chinese Sequence Labeling Based on Stack Pre-training Model
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | Chinese (zho) |
| Published: | Harbin University of Science and Technology Publications, 2022-02-01 |
| Series: | Journal of Harbin University of Science and Technology |
| Subjects: | |
| Online Access: | https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2050 |
| Summary: | Sequence labeling is an important task in natural language processing. In this paper, exploiting the relevance between tasks, we use a stacked pre-training model to extract features, segment words, and perform named entity recognition / chunk tagging. Through in-depth study of the internal structure of the Bidirectional Encoder Representations from Transformers (BERT) model, BERT is optimized while preserving the accuracy of the original model, which reduces the complexity and time cost of training and prediction. In the upper structure, in place of a single traditional long short-term memory network (LSTM), this paper uses a two-layer bidirectional LSTM: the bottom bidirectional long short-term memory (Bi-LSTM) layer performs word segmentation, and the top layer handles the sequence labeling task. A New Semi-Conditional Random Field (NSCRF) combines the traditional semi-Markov conditional random field (Semi-CRF) with the conditional random field (CRF), jointly considering segmentation and word-level labeling, which improves accuracy in training and decoding. We trained the model on the CCKS2019, MSRANER, and BosonNLP datasets and achieved clear improvements; the F1 measures reached 92.37%, 95.69%, and 93.75%, respectively. (A minimal sketch of the stacked architecture follows this record.) |
| ISSN: | 1007-2683 |
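
The summary describes a three-stage stack: BERT extracts character features, a bottom Bi-LSTM learns word segmentation, and a top Bi-LSTM performs the NER / chunk tagging that the NSCRF layer decodes. The following PyTorch sketch shows only that stacking, under stated assumptions: it consumes pre-extracted 768-dimensional BERT features, the layer width and tag counts are illustrative, and since the NSCRF decoder is the paper's own contribution, plain linear tag projections stand in for it.

```python
# Minimal sketch of the stacked Bi-LSTM architecture described in the
# abstract: BERT features -> bottom Bi-LSTM (word segmentation) ->
# top Bi-LSTM (NER / chunk tagging). The paper's NSCRF decoder is not
# reproduced; linear tag projections are an illustrative stand-in.
import torch
import torch.nn as nn

class StackedBiLSTMTagger(nn.Module):
    def __init__(self, bert_dim=768, hidden=256, n_seg_tags=4, n_ner_tags=9):
        super().__init__()
        # Bottom layer: word segmentation (e.g. 4 BMES tags).
        self.seg_lstm = nn.LSTM(bert_dim, hidden,
                                batch_first=True, bidirectional=True)
        self.seg_head = nn.Linear(2 * hidden, n_seg_tags)
        # Top layer: sequence labeling, fed by the segmentation layer's
        # hidden states so the tagger sees segmentation information.
        self.ner_lstm = nn.LSTM(2 * hidden, hidden,
                                batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * hidden, n_ner_tags)  # NSCRF stand-in

    def forward(self, bert_feats):               # (B, T, bert_dim)
        seg_h, _ = self.seg_lstm(bert_feats)     # (B, T, 2 * hidden)
        ner_h, _ = self.ner_lstm(seg_h)          # (B, T, 2 * hidden)
        return self.seg_head(seg_h), self.ner_head(ner_h)

# Smoke test with random tensors standing in for real BERT output.
model = StackedBiLSTMTagger()
feats = torch.randn(2, 16, 768)                  # batch of 2, 16 characters
seg_logits, ner_logits = model(feats)
print(seg_logits.shape, ner_logits.shape)        # (2, 16, 4) (2, 16, 9)
```

In the paper, segmentation and labeling decisions are decoded jointly by the NSCRF, which combines Semi-CRF segment scores with CRF token scores; the two independent tag heads above are the main simplification in this sketch.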