Entity-level cross-modal fusion for multimodal Chinese agricultural diseases and pests named entity recognition
Named Entity Recognition (NER) [Footnote 1: To improve clarity and accessibility for readers unfamiliar with the topic, we provide definitions of key terms used throughout the paper, along with relevant references for further reading, as shown in Table 5 in Appendix A.], as one of the popular directions in natu...
| Main Authors: | Jingzhong Huang, Xia Hao, Yu Wang, Ruizhi Song, Zenan Mu, Wen Chu, Georgios Papadakis, Sijie Niu, Xuchao Guo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-12-01 |
| Series: | Smart Agricultural Technology |
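| Subjects: | Chinese named entity recognition; Multimodal named entity recognition; Multimodal alignment; Dynamic attention mechanism; Dual stream encoder |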
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2772375525004198 |
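
The record's description reports results as a macro-average F1 score over 10 entity types. As a rough illustration of what that metric measures, the sketch below computes per-class F1 and averages the classes with equal weight. It is a simplified, label-level version under stated assumptions: the function name `macro_f1` and the example labels are hypothetical, and the paper's own evaluation (most likely entity-span matching) may differ.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores (simplified, label-level)."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append(f1)
    # Every class counts equally, regardless of how many samples it has.
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["crop", "disease", "pest", "crop"]
pred = ["crop", "disease", "crop", "crop"]
score = macro_f1(gold, pred, ["crop", "disease", "pest"])
```

Because each class contributes equally, a rare entity type that is recognized poorly lowers the macro average as much as a frequent one would.
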
| _version_ | 1849714010993197056 |
|---|---|
| author | Jingzhong Huang Xia Hao Yu Wang Ruizhi Song Zenan Mu Wen Chu Georgios Papadakis Sijie Niu Xuchao Guo |
| author_facet | Jingzhong Huang Xia Hao Yu Wang Ruizhi Song Zenan Mu Wen Chu Georgios Papadakis Sijie Niu Xuchao Guo |
| author_sort | Jingzhong Huang |
| collection | DOAJ |
| description | Named Entity Recognition (NER) [Footnote 1: To improve clarity and accessibility for readers unfamiliar with the topic, we provide definitions of key terms used throughout the paper, along with relevant references for further reading, as shown in Table 5 in Appendix A.], as one of the popular directions in natural language processing, plays a critical role in fields such as information extraction and agricultural knowledge graph construction. However, traditional single-modal methods based on pure text often face limitations in agricultural entity recognition, such as ambiguous text descriptions, limited context, and a lack of information fusion capabilities. This paper overcomes these limitations by introducing an agricultural multimodal NER model that uses entity-level cross-modal alignment. First, we propose a Dual-Stream Entity-Level Feature Encoder. The text stream employs a Boundary-Middle (B-M) classification strategy to achieve fine-grained semantic unit segmentation, effectively addressing long-entity boundary ambiguity and parallel computing challenges. The visual stream focuses on region-of-interest detection to enhance multi-scale visual entity feature extraction. Second, we introduce a Dynamic Cross-modal Gated Attention (DCGA) mechanism that adaptively adjusts visual feature contributions through gating weights. This approach integrates cross-modal contrastive learning to strengthen entity-level semantic connections between images and text. To validate the model's effectiveness, we constructed a multimodal NER dataset containing 12,074 sample pairs across 10 entity categories, covering 10 crops, 82 typical diseases/pests, and related agrochemical data. The proposed method achieves a macro-average F1 score of 90.73% across 10 agricultural entity types, outperforming single-modal baselines by 5.96%, mainstream multimodal NER models by 3.06%, zero-shot GPT models by 11.41%, and fine-tuned multimodal large models by 2.1%. Comprehensive experimental results indicate that our multimodal collaborative learning framework can effectively enhance agricultural entity recognition accuracy, providing reliable technical support for downstream applications such as agricultural knowledge graph construction and intelligent question answering. |
| format | Article |
| id | doaj-art-4cc53f53e3de46989a31c8407f019367 |
| institution | DOAJ |
| issn | 2772-3755 |
| language | English |
| publishDate | 2025-12-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Smart Agricultural Technology |
| spelling | doaj-art-4cc53f53e3de46989a31c8407f019367 · 2025-08-20T03:13:49Z · eng · Elsevier · Smart Agricultural Technology · 2772-3755 · 2025-12-01 · 12 · 101188 · 10.1016/j.atech.2025.101188 · Entity-level cross-modal fusion for multimodal Chinese agricultural diseases and pests named entity recognition · Jingzhong Huang [0], Xia Hao [1], Yu Wang [2], Ruizhi Song [3], Zenan Mu [4], Wen Chu [5], Georgios Papadakis [6], Sijie Niu [7], Xuchao Guo [8] · [0] College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271000, China · [1] College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271000, China · [2] College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271000, China · [3] College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271000, China · [4] College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271000, China · [5] College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271000, China · [6] Digital Twin Agricultural Technology Research Center, Shandong Agricultural University, Tai'an 271018, China; Agricultural University of Athens, Dept of Natural Resources and Agricultural Engineering, Athens, Greece · [7] School of Information Science and Technology, University of Jinan, Jinan 250022, China; Shandong Key Laboratory of Ubiquitous Intelligent Computing, Jinan 250022, China · [8] College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271000, China; Corresponding author. · Named Entity Recognition (NER) [Footnote 1: To improve clarity and accessibility for readers unfamiliar with the topic, we provide definitions of key terms used throughout the paper, along with relevant references for further reading, as shown in Table 5 in Appendix A.], as one of the popular directions in natural language processing, plays a critical role in fields such as information extraction and agricultural knowledge graph construction. However, traditional single-modal methods based on pure text often face limitations in agricultural entity recognition, such as ambiguous text descriptions, limited context, and a lack of information fusion capabilities. This paper overcomes these limitations by introducing an agricultural multimodal NER model that uses entity-level cross-modal alignment. First, we propose a Dual-Stream Entity-Level Feature Encoder. The text stream employs a Boundary-Middle (B-M) classification strategy to achieve fine-grained semantic unit segmentation, effectively addressing long-entity boundary ambiguity and parallel computing challenges. The visual stream focuses on region-of-interest detection to enhance multi-scale visual entity feature extraction. Second, we introduce a Dynamic Cross-modal Gated Attention (DCGA) mechanism that adaptively adjusts visual feature contributions through gating weights. This approach integrates cross-modal contrastive learning to strengthen entity-level semantic connections between images and text. To validate the model's effectiveness, we constructed a multimodal NER dataset containing 12,074 sample pairs across 10 entity categories, covering 10 crops, 82 typical diseases/pests, and related agrochemical data. The proposed method achieves a macro-average F1 score of 90.73% across 10 agricultural entity types, outperforming single-modal baselines by 5.96%, mainstream multimodal NER models by 3.06%, zero-shot GPT models by 11.41%, and fine-tuned multimodal large models by 2.1%. Comprehensive experimental results indicate that our multimodal collaborative learning framework can effectively enhance agricultural entity recognition accuracy, providing reliable technical support for downstream applications such as agricultural knowledge graph construction and intelligent question answering. · http://www.sciencedirect.com/science/article/pii/S2772375525004198 · Chinese named entity recognition · Multimodal named entity recognition · Multimodal alignment · Dynamic attention mechanism · Dual stream encoder |
| spellingShingle | Jingzhong Huang Xia Hao Yu Wang Ruizhi Song Zenan Mu Wen Chu Georgios Papadakis Sijie Niu Xuchao Guo Entity-level cross-modal fusion for multimodal Chinese agricultural diseases and pests named entity recognition Smart Agricultural Technology Chinese named entity recognition Multimodal named entity recognition Multimodal alignment Dynamic attention mechanism Dual stream encoder |
| title | Entity-level cross-modal fusion for multimodal Chinese agricultural diseases and pests named entity recognition |
| title_full | Entity-level cross-modal fusion for multimodal Chinese agricultural diseases and pests named entity recognition |
| title_fullStr | Entity-level cross-modal fusion for multimodal Chinese agricultural diseases and pests named entity recognition |
| title_full_unstemmed | Entity-level cross-modal fusion for multimodal Chinese agricultural diseases and pests named entity recognition |
| title_short | Entity-level cross-modal fusion for multimodal Chinese agricultural diseases and pests named entity recognition |
| title_sort | entity level cross modal fusion for multimodal chinese agricultural diseases and pests named entity recognition |
| topic | Chinese named entity recognition Multimodal named entity recognition Multimodal alignment Dynamic attention mechanism Dual stream encoder |
| url | http://www.sciencedirect.com/science/article/pii/S2772375525004198 |
| work_keys_str_mv | AT jingzhonghuang entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition AT xiahao entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition AT yuwang entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition AT ruizhisong entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition AT zenanmu entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition AT wenchu entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition AT georgiospapadakis entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition AT sijieniu entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition AT xuchaoguo entitylevelcrossmodalfusionformultimodalchineseagriculturaldiseasesandpestsnamedentityrecognition |
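
The description mentions a gated attention mechanism that "adaptively adjusts visual feature contributions through gating weights". The sketch below shows one generic way such gating can work: a text entity vector attends over visual region vectors, and a sigmoid gate scales how much of the attended visual context is added back. This is not the paper's DCGA implementation; the function name `gated_cross_modal_fusion`, the scalar gate, the residual form, and all parameters are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_cross_modal_fusion(text_vec, region_vecs, gate_weights, gate_bias=0.0):
    """Fuse one text entity vector with visual region vectors (illustrative).

    gate_weights has length 2 * len(text_vec) and scores the concatenation
    [text_vec; visual_context]. All names here are hypothetical.
    """
    d = len(text_vec)
    # Scaled dot-product attention: the text vector queries the regions.
    scores = [dot(text_vec, r) / math.sqrt(d) for r in region_vecs]
    attn = softmax(scores)
    visual_ctx = [sum(a * r[i] for a, r in zip(attn, region_vecs)) for i in range(d)]
    # Scalar gate in (0, 1) controlling the visual contribution.
    logit = dot(gate_weights, list(text_vec) + visual_ctx) + gate_bias
    gate = 1.0 / (1.0 + math.exp(-logit))
    fused = [t + gate * v for t, v in zip(text_vec, visual_ctx)]
    return fused, gate, attn


text = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0]]
fused, gate, attn = gated_cross_modal_fusion(text, regions, [0.0] * 4)
```

When the gate is close to 0, the fused vector falls back to the text representation, which is the behavior one would want for images that carry little relevant information.
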