Medical named entity recognition based on domain knowledge and position encoding

Bibliographic Details
Main Authors: Shuifa Sun, Qin Hu, Fengjiao Xu, Feng Hu, Yirong Wu, Ben Wang
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Medical Informatics and Decision Making
Subjects: Named entity recognition; Medical domain dictionary; RoPE; BERT; Star-Transformer
Online Access: https://doi.org/10.1186/s12911-025-03037-0

Abstract
A model for recognizing named entities in Chinese electronic medical records is proposed. It leverages medical domain knowledge and positional encoding, with a focus on accurate entity boundary detection. First, medical domain-specific terms are integrated into a BERT module through a lexical adapter. After pre-training, the model captures dynamic character feature representations that contain both lexical and boundary information. In the feature encoding module, Star-Transformer and BiLSTM are employed to extract local features and long-distance features, respectively, and together they generate the feature representation of the sequence. Additionally, because the relative position between characters in the text influences recognition results, Rotary Position Embedding (RoPE) is incorporated into the Star-Transformer to strengthen its extraction of semantic features. Experimental results on the CCKS2020 dataset show an improved F1-score of 85.78%, an increase of 2.96% over the baseline model. Improvement is also observed on a self-built breast cancer ultrasound report dataset, which demonstrates the effectiveness and applicability of the model in the medical field.
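
The abstract does not specify how RoPE is wired into the Star-Transformer. As an illustration only, the following minimal sketch shows the commonly used rotary position embedding formulation (from the RoFormer paper) applied to a single query/key pair in plain Python. The function names `rope`, `dot`, and `rope_score`, the frequency base of 10000, and the pairing of dimensions (2i, 2i+1) are assumptions taken from that common formulation, not details confirmed for this model.

```python
import math
from typing import List


def rope(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Apply a rotary position embedding to one vector at position `pos`.

    Assumed layout (common RoPE formulation, hypothetical here): each
    dimension pair (2i, 2i+1) is rotated by the angle pos * base**(-2i/d).
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even vector dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        # 2-D rotation of the pair (x, y) by theta
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rope_score(q: List[float], k: List[float], m: int, n: int) -> float:
    """Unscaled attention logit between a query at position m and a key at n."""
    return dot(rope(q, m), rope(k, n))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# The same offset (2) at different absolute positions gives the same score.
print(abs(rope_score(q, k, 3, 1) - rope_score(q, k, 10, 8)) < 1e-9)  # True
```

Because each pair is rotated by an angle proportional to its position, the score reduces to q^T R((n - m)θ) k, so it depends only on the offset between the two characters. This is the relative-position property the abstract appeals to; how the authors combine it with the Star-Transformer's attention is not described here.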

ISSN: 1472-6947

Affiliations:
Shuifa Sun, Qin Hu, Ben Wang: School of Information Science and Technology, Hangzhou Normal University
Fengjiao Xu: College of Computer and Information Technology, China Three Gorges University
Feng Hu: School of Government, Beijing Normal University
Yirong Wu: Institute of Advanced Studies in Humanities and Social Sciences, Beijing Normal University