Medical named entity recognition based on domain knowledge and position encoding

Abstract: A model is proposed for recognizing named entities in Chinese electronic medical records. It focuses on accurate boundary detection and leverages medical domain knowledge and positional encoding. First, medical domain-specific terms are integrated into a BERT module through a lexical adapter. Af...

Bibliographic Details
Main Authors:	Shuifa Sun, Qin Hu, Fengjiao Xu, Feng Hu, Yirong Wu, Ben Wang
Format:	Article
Language:	English
Published:	BMC 2025-07-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Named entity recognition; Medical domain dictionary; RoPE; BERT; Star-Transformer
Online Access:	https://doi.org/10.1186/s12911-025-03037-0

author	Shuifa Sun; Qin Hu; Fengjiao Xu; Feng Hu; Yirong Wu; Ben Wang
collection	DOAJ
description	A model is proposed for recognizing named entities in Chinese electronic medical records, with a focus on accurate boundary detection, by leveraging medical domain knowledge and positional encoding. First, medical domain-specific terms are integrated into a BERT module through a lexical adapter. After pre-training, the model captures dynamic character feature representations that contain lexical and boundary information. In the feature encoding module, Star-Transformer and BiLSTM extract local and long-distance features, respectively, to generate the feature representation of the sequence. Additionally, because the relative position of characters in the text influences recognition results, Rotary Position Embedding (RoPE) is incorporated into Star-Transformer to improve its ability to extract semantic features. On the CCKS2020 dataset, the model reaches an F1-score of 85.78%, an increase of 2.96% over the baseline model. An improvement is also observed on a self-built breast cancer ultrasound report dataset, which supports the effectiveness and applicability of the model in the medical field.
format	Article
id	doaj-art-1e7dcc3b464043b681b7457f7cc3d6a3
institution	DOAJ
issn	1472-6947
language	English
publishDate	2025-07-01
publisher	BMC
record_format	Article
series	BMC Medical Informatics and Decision Making
spelling	doaj-art-1e7dcc3b464043b681b7457f7cc3d6a3; 2025-08-20T03:03:41Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-07-01; vol. 25, no. 1, pp. 1-16; 10.1186/s12911-025-03037-0; Medical named entity recognition based on domain knowledge and position encoding; Shuifa Sun (School of Information Science and Technology, Hangzhou Normal University); Qin Hu (School of Information Science and Technology, Hangzhou Normal University); Fengjiao Xu (College of Computer and Information Technology, China Three Gorges University); Feng Hu (School of Government, Beijing Normal University); Yirong Wu (Institute of Advanced Studies in Humanities and Social Sciences, Beijing Normal University); Ben Wang (School of Information Science and Technology, Hangzhou Normal University); https://doi.org/10.1186/s12911-025-03037-0; Named entity recognition; Medical domain dictionary; RoPE; BERT; Star-Transformer
title	Medical named entity recognition based on domain knowledge and position encoding
topic	Named entity recognition; Medical domain dictionary; RoPE; BERT; Star-Transformer
url	https://doi.org/10.1186/s12911-025-03037-0
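
The abstract in this record states that Rotary Position Embedding (RoPE) is used so that attention reflects the relative position of characters. The record gives no implementation details, so the following is only a minimal sketch of the generic RoPE rotation, not the authors' Star-Transformer integration. The function names `rope_rotate`, `dot`, and `rope_score`, the sample vectors, and the base of 10000 are assumptions chosen for illustration.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * base**(-2i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2-D rotation of the pair by theta.
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rope_score(query, key, q_pos, k_pos):
    """Attention logit between a query at q_pos and a key at k_pos after RoPE."""
    return dot(rope_rotate(query, q_pos), rope_rotate(key, k_pos))


if __name__ == "__main__":
    # Hypothetical 4-dimensional query and key vectors.
    q = [0.5, -1.0, 2.0, 0.25]
    k = [1.5, 0.75, -0.5, 1.0]
    # Both pairs of positions are 2 apart, so the two scores should agree
    # up to floating-point error.
    print(rope_score(q, k, 3, 1))
    print(rope_score(q, k, 10, 8))
```

Because each pair of dimensions is rotated by an angle proportional to the position, the dot product of a rotated query and a rotated key depends only on the offset between the two positions. That is the relative-position property the abstract refers to.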

Similar Items