Linguistic-visual based multimodal Yi character recognition
Abstract The recognition of Yi characters is challenged by considerable variability in their morphological structures and complex semantic relationships, leading to decreased recognition accuracy. This paper presents a multimodal Yi character recognition method comprehensively incorporating linguistic and visual features. The visual transformer, integrated with deformable convolution, captures key features during the visual modeling phase and adapts effectively to variations in Yi character images, improving recognition accuracy, particularly for images with deformations and complex backgrounds. In the linguistic modeling phase, a Pyramid Pooling Transformer incorporates semantic contextual information across multiple scales, enhancing feature representation and capturing detailed linguistic structure. Finally, a fusion strategy utilizing the cross-attention mechanism refines the relationships between feature regions and combines features from different modalities, achieving high-precision character recognition. Experimental results demonstrate that the proposed method achieves a recognition accuracy of 99.5%, surpassing baseline methods by 3.4% and validating its effectiveness.
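The abstract names three mechanisms: a deformable-convolution visual embedding, pyramid-pooling attention for multi-scale context, and cross-attention fusion of the two modalities. The PyTorch sketch below illustrates one plausible shape for such a pipeline. Every module name, dimension, and wiring choice here (`DeformablePatchEmbed`, `PyramidPoolingAttention`, `CrossAttentionFusion`, the 256-d embedding, the pooled stand-in for the linguistic branch) is an assumption for illustration, not the authors' implementation.

```python
# Illustrative sketch of the three mechanisms named in the abstract.
# All names, sizes, and wiring are assumptions; this is not the paper's code.
import torch
import torch.nn as nn
import torchvision.ops as ops


class DeformablePatchEmbed(nn.Module):
    """Patch embedding whose sampling grid adapts to stroke deformation
    via torchvision's deformable convolution."""

    def __init__(self, in_ch=3, dim=256, patch=8):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position, predicted from the input.
        self.offset = nn.Conv2d(in_ch, 2 * patch * patch, patch, stride=patch)
        self.weight = nn.Parameter(torch.randn(dim, in_ch, patch, patch) * 0.02)
        self.patch = patch

    def forward(self, x):                        # x: (B, C, H, W)
        feat = ops.deform_conv2d(x, self.offset(x), self.weight, stride=self.patch)
        return feat.flatten(2).transpose(1, 2)   # (B, N, dim) token sequence


class PyramidPoolingAttention(nn.Module):
    """Attention whose keys/values concatenate the token map pooled at
    several scales, giving each query multi-scale context (P2T-style)."""

    def __init__(self, dim=256, heads=8, pool_sizes=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in pool_sizes)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, hw):                    # x: (B, N, D); hw: token-grid (H, W)
        B, N, D = x.shape
        grid = x.transpose(1, 2).reshape(B, D, *hw)
        kv = torch.cat([p(grid).flatten(2).transpose(1, 2) for p in self.pools], dim=1)
        return self.attn(x, kv, kv)[0]           # queries attend to pooled context


class CrossAttentionFusion(nn.Module):
    """Each modality queries the other; the two attended summaries are
    concatenated and classified."""

    def __init__(self, dim=256, heads=8, n_classes=10):  # n_classes is a placeholder
        super().__init__()
        self.v2l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.l2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, vis, lng):                 # (B, Nv, D), (B, Nl, D)
        a = self.v2l(vis, lng, lng)[0]           # visual queries linguistic features
        b = self.l2v(lng, vis, vis)[0]           # linguistic queries visual features
        return self.head(torch.cat([a.mean(1), b.mean(1)], dim=-1))


# Shape check on a dummy batch; the "linguistic" tokens here are just the
# visual tokens run through pyramid-pooling attention as a stand-in.
img = torch.randn(2, 3, 64, 64)
tokens = DeformablePatchEmbed()(img)             # (2, 64, 256) from an 8x8 grid
ling = PyramidPoolingAttention()(tokens, hw=(8, 8))
logits = CrossAttentionFusion()(tokens, ling)    # (2, 10)
```

The script only confirms that the tensor shapes compose; in the paper's setting the linguistic tokens would come from a genuine language-modeling branch rather than the pooled stand-in used here.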
| Main Authors: | Haipeng Sun, Xueyan Ding, Zimeng Li, Jian Sun, Hua Yu, Jianxin Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Subjects: | Deep learning; Character recognition; Transformer; Linguistic-visual model |
| Online Access: | https://doi.org/10.1038/s41598-025-96397-6 |
| Field | Value |
|---|---|
| author | Haipeng Sun; Xueyan Ding; Zimeng Li; Jian Sun; Hua Yu; Jianxin Zhang |
| collection | DOAJ |
| description | Abstract The recognition of Yi characters is challenged by considerable variability in their morphological structures and complex semantic relationships, leading to decreased recognition accuracy. This paper presents a multimodal Yi character recognition method comprehensively incorporating linguistic and visual features. The visual transformer, integrated with deformable convolution, captures key features during the visual modeling phase and adapts effectively to variations in Yi character images, improving recognition accuracy, particularly for images with deformations and complex backgrounds. In the linguistic modeling phase, a Pyramid Pooling Transformer incorporates semantic contextual information across multiple scales, enhancing feature representation and capturing detailed linguistic structure. Finally, a fusion strategy utilizing the cross-attention mechanism refines the relationships between feature regions and combines features from different modalities, achieving high-precision character recognition. Experimental results demonstrate that the proposed method achieves a recognition accuracy of 99.5%, surpassing baseline methods by 3.4% and validating its effectiveness. |
| format | Article |
| id | doaj-art-6c9963a1c1c848d493aaafa4b6f3f7bc |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| affiliations | Haipeng Sun: Key Laboratory of Ethnic Language Intelligent Analysis and Security Management of MOE, Minzu University of China; Xueyan Ding, Zimeng Li, Jian Sun, Jianxin Zhang: School of Computer Science and Engineering, Dalian Minzu University; Hua Yu: Yi Language Research Room, China Ethnic Languages Translation Centre |
| citation | Scientific Reports, vol. 15, no. 1, pp. 1–14, 2025-04-01. doi:10.1038/s41598-025-96397-6 |
| title | Linguistic-visual based multimodal Yi character recognition |
| topic | Deep learning; Character recognition; Transformer; Linguistic-visual model |
| url | https://doi.org/10.1038/s41598-025-96397-6 |