Enhancing named entity recognition with a novel BERT‐BiLSTM‐CRF‐RC joint training model for biomedical materials database
| Main Authors: | , , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley-VCH, 2025-03-01 |
| Series: | Materials Genome Engineering Advances |
| Online Access: | https://doi.org/10.1002/mgea.70001 |
| Summary: | In this study, we propose a novel joint training model for named entity recognition (NER) that combines BERT, BiLSTM, CRF, and a reading comprehension (RC) mechanism. Traditional BERT‐BiLSTM‐CRF models often detect entity boundaries inaccurately and fragment named entities excessively because they lack a specialized vocabulary. Our model addresses these issues by integrating an RC mechanism, which refines fragmented results by enabling the model to identify entity boundaries more precisely without relying on an expert‐annotated dictionary. Segmentation errors are further mitigated by a segmented technique that combines voting with positive‐sample coverage. We applied this model to build a database for mesoporous bioactive glass (MBG), and we also developed a classifier that automatically detects whether a paragraph contains pertinent information. For this study, 200 articles were retrieved using MBG‐related keywords, and the data were split into training and test sets in a 9:1 ratio. In total, 492 paragraphs were automatically extracted for training and 50 for testing. Our joint training model achieves an NER accuracy of 92.8%, which is 4.3 percentage points higher than the 88.5% accuracy of the traditional BERT‐BiLSTM‐CRF model. |
| ISSN: | 2940-9489, 2940-9497 |
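
The summary says that an RC mechanism and a segmented voting and positive-sample-coverage step repair fragmented BERT‐BiLSTM‐CRF output, but it gives no algorithm. The Python sketch below illustrates the general idea only, under assumptions that are not from the article. It assumes BIO tagging. It treats "voting" as a per-token majority vote over several tag predictions for the same segment. It takes RC spans as externally supplied `(start, end, label)` tuples. Its merge rule, in which one RC span replaces two or more same-label CRF fragments that it fully covers, is hypothetical. The function names `vote_tags`, `bio_to_spans`, and `merge_fragments` and the label `MAT` are illustrative inventions, not the authors' published method.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical span type: (start token, end token exclusive, entity label)
Span = Tuple[int, int, str]


def vote_tags(predictions: List[List[str]]) -> List[str]:
    """Per-token majority vote over several BIO tag sequences of equal length."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    # Counter.most_common orders equal counts by first occurrence,
    # so on a tie the earlier prediction's tag wins
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into (start, end, label) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last open span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
            if tag != "O":
                # B- starts a span; a stray I- with no matching open span also starts one
                start, label = i, tag[2:]
    return spans


def merge_fragments(crf_spans: List[Span], rc_spans: List[Span]) -> List[Span]:
    """Replace two or more same-label CRF fragments that one RC span fully covers."""
    used: set = set()
    merged: List[Span] = []
    for r_start, r_end, r_label in rc_spans:
        covered = [k for k, (s, e, lab) in enumerate(crf_spans)
                   if r_start <= s and e <= r_end and lab == r_label]
        # Merge only when the RC span joins fragments that no earlier RC span claimed
        if len(covered) >= 2 and used.isdisjoint(covered):
            used.update(covered)
            merged.append((r_start, r_end, r_label))
    # Keep every CRF span that was not absorbed into an RC span
    merged.extend(sp for k, sp in enumerate(crf_spans) if k not in used)
    return sorted(merged)


if __name__ == "__main__":
    tokens = ["mesoporous", "bioactive", "glass", "was", "prepared"]
    # Three hypothetical predictions for one segment; two of them fragment the entity
    predictions = [
        ["B-MAT", "I-MAT", "B-MAT", "O", "O"],
        ["B-MAT", "I-MAT", "B-MAT", "O", "O"],
        ["B-MAT", "I-MAT", "I-MAT", "O", "O"],
    ]
    voted = vote_tags(predictions)       # the majority keeps B-MAT at index 2
    crf_spans = bio_to_spans(voted)      # [(0, 2, 'MAT'), (2, 3, 'MAT')]
    rc_spans = [(0, 3, "MAT")]           # hypothetical span proposed by an RC extractor
    for s, e, lab in merge_fragments(crf_spans, rc_spans):
        print(lab, " ".join(tokens[s:e]))  # prints: MAT mesoporous bioactive glass
```

In this toy run, the vote alone keeps the false boundary before "glass", and only the covering RC span joins the two fragments. Requiring at least two covered fragments stops an RC span from overriding a CRF entity that was already whole.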