Medical named entity recognition based on domain knowledge and position encoding
| Main Authors: | Shuifa Sun, Qin Hu, Fengjiao Xu, Feng Hu, Yirong Wu, Ben Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-07-01 |
| Series: | BMC Medical Informatics and Decision Making |
| Subjects: | Named entity recognition; Medical domain dictionary; RoPE; BERT; Star-Transformer |
| Online Access: | https://doi.org/10.1186/s12911-025-03037-0 |

| author | Shuifa Sun, Qin Hu, Fengjiao Xu, Feng Hu, Yirong Wu, Ben Wang |
|---|---|
| collection | DOAJ |
| description | This paper proposes a named entity recognition model for Chinese electronic medical records that leverages medical domain knowledge and positional encoding to detect entity boundaries accurately. First, a lexical adapter integrates medical domain-specific terms into the BERT module; after pre-training, the model produces dynamic character representations that carry both lexical and boundary information. In the feature encoding module, Star-Transformer and BiLSTM extract local and long-distance features, respectively, and together generate the feature representation of the sequence. In addition, because the relative positions of characters in the text influence recognition results, Rotary Position Embedding (RoPE) is incorporated into the Star-Transformer to strengthen its extraction of semantic features. On the CCKS2020 dataset, the model achieves an F1-score of 85.78%, an increase of 2.96% over the baseline model. An improvement is also observed on a self-built breast cancer ultrasound report dataset, demonstrating the effectiveness and applicability of the model in the medical domain. |
| format | Article |
| id | doaj-art-1e7dcc3b464043b681b7457f7cc3d6a3 |
| institution | DOAJ |
| issn | 1472-6947 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Informatics and Decision Making |
| affiliations | Shuifa Sun, Qin Hu, Ben Wang: School of Information Science and Technology, Hangzhou Normal University; Fengjiao Xu: College of Computer and Information Technology, China Three Gorges University; Feng Hu: School of Government, Beijing Normal University; Yirong Wu: Institute of Advanced Studies in Humanities and Social Sciences, Beijing Normal University |
| title | Medical named entity recognition based on domain knowledge and position encoding |
| topic | Named entity recognition; Medical domain dictionary; RoPE; BERT; Star-Transformer |
| url | https://doi.org/10.1186/s12911-025-03037-0 |
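
The abstract states that a lexical adapter injects medical dictionary terms into BERT. A common way to prepare such input is to attach to every character the dictionary words that cover it. The sketch below builds those character-word lists by brute-force substring lookup; the helper name `match_lexicon`, the `max_len` limit and the toy dictionary are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def match_lexicon(sentence: str, lexicon: Set[str], max_len: int = 8) -> List[List[str]]:
    """For each character, return the lexicon words (length >= 2) that cover it.

    Hypothetical helper: enumerates every substring of up to max_len characters,
    which is acceptable for short clinical sentences.
    """
    matches: List[List[str]] = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Only multi-character words are matched; single characters add no lexical signal.
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Every character inside the matched span is covered by this word.
                for i in range(start, end):
                    matches[i].append(word)
    return matches


# Toy dictionary (assumed): mammary gland, breast cancer, ultrasound.
lexicon = {"乳腺", "乳腺癌", "超声"}
print(match_lexicon("乳腺癌超声", lexicon))
# [['乳腺', '乳腺癌'], ['乳腺', '乳腺癌'], ['乳腺癌'], ['超声'], ['超声']]
```

The nested words "乳腺" and "乳腺癌" both reach the first two characters, which is the kind of boundary cue a lexicon can supply to a character-level model.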
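
The abstract assigns local feature extraction to the Star-Transformer. In the Star-Transformer design, each satellite node (one per character) attends only to its immediate neighbors, its own input embedding and a shared relay node, while the relay attends to all satellites. The sketch below performs one such update with single-head dot-product attention and omits the learned projections, multiple heads, layer normalization and nonlinearity of a real layer. The helper names, the mean-initialized relay and the edge handling (missing neighbors are simply skipped) are assumptions for illustration, and the paper's RoPE modification is not included.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention, using the keys as values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * key[j] for w, key in zip(weights, keys)) / total for j in range(d)]


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def star_step(embeddings: List[Vector], sats: List[Vector], relay: Vector):
    """One simplified Star-Transformer update (no projections, norm or ReLU)."""
    n = len(sats)
    new_sats = []
    for i in range(n):
        # Satellite context: itself, its embedding, the relay and its neighbors.
        context = [sats[i], embeddings[i], relay]
        if i > 0:
            context.append(sats[i - 1])  # left neighbor
        if i < n - 1:
            context.append(sats[i + 1])  # right neighbor
        new_sats.append(attend(sats[i], context))
    # The relay attends to itself and to every updated satellite.
    new_relay = attend(relay, [relay] + new_sats)
    return new_sats, new_relay


emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sats, relay = [list(v) for v in emb], mean(emb)  # satellites start at the embeddings
for _ in range(2):
    sats, relay = star_step(emb, sats, relay)
```

Because a satellite only sees its neighbors directly, information from distant characters reaches it solely through the relay, which keeps each update linear in sequence length.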
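
RoPE encodes position by rotating each consecutive pair of query/key dimensions by an angle proportional to the token position, so the dot product between a query rotated for position m and a key rotated for position n depends only on the offset m - n. The abstract does not say how RoPE is wired into the Star-Transformer attention, so the sketch below is a generic illustration using the commonly used frequency base 10000; the function names and the sample vectors are assumptions.

```python
import math
from typing import List


def rope(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.2]
near = dot(rope(q, 5), rope(k, 2))     # offset 3
far = dot(rope(q, 105), rope(k, 102))  # same offset 3, shifted by 100 positions
# near and far agree up to floating-point error: only the offset matters.
```

Since each pair is rotated rather than shifted, vector norms are preserved and the attention score carries relative position information without a separate position table.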
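
A common way to compute NER F1, and one that directly rewards accurate boundary detection, is strict entity-level matching: a predicted entity counts as correct only if its start, end and type all match a gold entity. The abstract does not state the exact evaluation protocol or tagging scheme used for CCKS2020, so the sketch below assumes BIO tags and micro-averaged strict matching; the entity type names `DIS` and `DRUG` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert BIO tags to spans; an I- tag with no matching open span starts a new one."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last open span
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag != "O":
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged strict entity-level F1 over tagged sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # exact boundary and type match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [["B-DIS", "I-DIS", "I-DIS", "O", "B-DRUG", "I-DRUG"]]
pred = [["B-DIS", "I-DIS", "O", "O", "B-DRUG", "I-DRUG"]]
print(entity_f1(gold, pred))  # 0.5
```

The predicted disease span is one character short, so it counts as both a false positive and a false negative even though most of its characters are tagged correctly; this is why boundary errors weigh heavily on entity-level F1.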