Linguistic-visual based multimodal Yi character recognition

Abstract: The recognition of Yi characters is challenged by considerable variability in their morphological structures and by complex semantic relationships, both of which reduce recognition accuracy. This paper presents a multimodal Yi character recognition method that comprehensively incorporates linguistic and visual features. In the visual modeling phase, a vision transformer integrated with deformable convolution captures key features and adapts to variations in Yi character images, improving recognition accuracy, particularly for images with deformations and complex backgrounds. In the linguistic modeling phase, a Pyramid Pooling Transformer incorporates semantic contextual information across multiple scales, enhancing feature representation and capturing fine-grained linguistic structure. Finally, a fusion strategy based on the cross-attention mechanism refines the relationships between feature regions and combines features from the two modalities, achieving high-precision character recognition. Experimental results show that the proposed method reaches a recognition accuracy of 99.5%, surpassing baseline methods by 3.4% and validating its effectiveness.
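The visual modeling phase pairs a vision transformer with deformable convolution so that the sampling grid can follow stroke deformations. A minimal PyTorch sketch of such a deformable block, assuming torchvision's DeformConv2d; the channel sizes and input shape are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableStem(nn.Module):
    def __init__(self, in_ch: int = 1, out_ch: int = 64, k: int = 3):
        super().__init__()
        # A plain conv predicts 2 offset coordinates (dy, dx) per kernel
        # sampling point; DeformConv2d then samples at the shifted positions,
        # letting the receptive field follow stroke deformations.
        self.offsets = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offsets(x))

# e.g. one 32x32 grayscale character image -> a (1, 64, 32, 32) feature map
feats = DeformableStem()(torch.randn(1, 1, 32, 32))
```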

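For the linguistic phase, the core idea behind pyramid pooling is to let each query token attend to context at several scales at once. A hedged sketch, pooling the key/value token sequence to a few coarser lengths before attention; the pool sizes and module names here are assumptions, not the authors' exact Pyramid Pooling Transformer design:

```python
import torch
import torch.nn as nn

class PyramidPooledAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4, pool_sizes=(1, 4, 16)):
        super().__init__()
        # AdaptiveAvgPool1d pools over the last axis, so token sequences are
        # transposed to (batch, dim, length) before pooling.
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool1d(p) for p in pool_sizes)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # (B, L, D)
        x = tokens.transpose(1, 2)  # (B, D, L)
        # Concatenate the pooled summaries along the sequence axis.
        kv = torch.cat([p(x) for p in self.pools], dim=-1).transpose(1, 2)
        out, _ = self.attn(tokens, kv, kv)  # queries attend to all scales
        return out

# e.g. 50 linguistic tokens of width 256 -> same shape, multi-scale context
ctx = PyramidPooledAttention(256)(torch.randn(2, 50, 256))
```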

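The fusion stage combines the two modalities with cross-attention. A minimal sketch of bidirectional cross-attention fusion followed by a classification head; the dimensions, mean pooling, and class count are illustrative choices, not the paper's design:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int, heads: int, num_classes: int):
        super().__init__()
        self.v2l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.l2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, vis: torch.Tensor, lng: torch.Tensor) -> torch.Tensor:
        # Each modality queries the other to refine its own features.
        v, _ = self.v2l(vis, lng, lng)  # visual queries, linguistic keys/values
        l, _ = self.l2v(lng, vis, vis)  # linguistic queries, visual keys/values
        fused = torch.cat([v.mean(dim=1), l.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# e.g. 64 visual tokens and 50 linguistic tokens -> class logits
logits = CrossModalFusion(256, 4, 1200)(torch.randn(2, 64, 256),
                                        torch.randn(2, 50, 256))
```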
Bibliographic Details
Main Authors: Haipeng Sun, Xueyan Ding, Zimeng Li, Jian Sun, Hua Yu, Jianxin Zhang
Affiliations: Haipeng Sun (Key Laboratory of Ethnic Language Intelligent Analysis and Security Management of MOE, Minzu University of China); Xueyan Ding, Zimeng Li, Jian Sun, Jianxin Zhang (School of Computer Science and Engineering, Dalian Minzu University); Hua Yu (Yi Language Research Room, China Ethnic Languages Translation Centre)
Format: Article
Language: English
Published: Nature Portfolio, 2025-04-01
Series: Scientific Reports
ISSN: 2045-2322
Subjects: Deep learning; Character recognition; Transformer; Linguistic-visual model
Online Access: https://doi.org/10.1038/s41598-025-96397-6