Enhancing named entity recognition with a novel BERT‐BiLSTM‐CRF‐RC joint training model for biomedical materials database


Bibliographic Details
Main Authors: Mufei Li, Yan Zhuang, Ke Chen, Lin Han, Xiangfeng Li, Yongtao Wei, Xiangdong Zhu, Mingli Yang, Guangfu Yin, Jiangli Lin, Xingdong Zhang
Format: Article
Language: English
Published: Wiley-VCH 2025-03-01
Series: Materials Genome Engineering Advances
Online Access: https://doi.org/10.1002/mgea.70001
Description
Summary: This study proposes a novel joint training model for named entity recognition (NER) that combines BERT, BiLSTM, CRF, and a reading comprehension (RC) mechanism. Traditional BERT‐BiLSTM‐CRF models lack specialized vocabulary, so they often detect entity boundaries inaccurately and split named entities into too many fragments. The proposed model addresses these issues by integrating an RC mechanism, which refines fragmented results by identifying entity boundaries more precisely without relying on an expert‐annotated dictionary. Segmentation issues are further mitigated through a segmented combined voting‐ and positive‐sample‐coverage technique. The model was applied to build a database for mesoporous bioactive glass (MBG), and a classifier was developed to automatically detect whether a paragraph contains pertinent information. For this study, 200 articles were retrieved using MBG‐related keywords, and the data were split into training and test sets in a 9:1 ratio: 492 paragraphs were automatically extracted for training and 50 for testing. The joint training model achieves an NER accuracy of 92.8%, which is 4.3 percentage points higher than the 88.5% accuracy of the traditional BERT‐BiLSTM‐CRF model.
ISSN: 2940-9489, 2940-9497
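
Note: The accuracy and data-split figures quoted in the summary can be compared in more than one way. The following is a minimal sketch that only re-derives numbers from the values stated in the summary; the variable names are illustrative and are not taken from the article, and the error-rate calculation assumes that error is simply 100% minus the reported accuracy.

```python
# Figures quoted in the record's summary (all other names are illustrative).
joint_acc = 92.8       # joint BERT-BiLSTM-CRF-RC model, accuracy in percent
baseline_acc = 88.5    # traditional BERT-BiLSTM-CRF model, accuracy in percent
train_paragraphs = 492
test_paragraphs = 50

# Absolute difference between the two accuracies, in percentage points.
abs_gain = round(joint_acc - baseline_acc, 1)

# The same difference expressed relative to the baseline accuracy, in percent.
rel_gain = round(100 * (joint_acc - baseline_acc) / baseline_acc, 2)

# Relative reduction in error rate, assuming error = 100 - accuracy.
error_reduction = round(
    100 * (joint_acc - baseline_acc) / (100 - baseline_acc), 1
)

# Share of extracted paragraphs used for training (the stated split is 9:1).
train_share = round(train_paragraphs / (train_paragraphs + test_paragraphs), 3)

print(f"absolute gain:   {abs_gain} percentage points")
print(f"relative gain:   {rel_gain}% over the baseline")
print(f"error reduction: {error_reduction}%")
print(f"training share:  {train_share}")
```

Under these assumptions the absolute gain is 4.3 percentage points, which corresponds to a relative gain of about 4.86% over the baseline, and the 492/50 paragraph counts give a training share of roughly 0.908, close to the stated 9:1 split.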