A large language model for multimodal identification of crop diseases and pests
Abstract Pests and diseases significantly impact the growth and development of crops. When attempting to precisely identify disease characteristics in crop images through dialogue, existing multimodal models face numerous challenges, often leading to misinterpretation and incorrect feedback regardin...
Saved in:
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Nature Portfolio
2025-07-01
|
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-01908-0 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Abstract Pests and diseases significantly impact the growth and development of crops. When attempting to precisely identify disease characteristics in crop images through dialogue, existing multimodal models face numerous challenges, often leading to misinterpretation and incorrect feedback regarding disease information. This paper proposed a large language model for multimodal identification of crop diseases and pests, which can be called LLMI-CDP. It builds up on the VisualGLM model and introduces improvements to achieve precise identification of agricultural crop disease and pest images, along with providing professional recommendations for relevant preventive measures. The use of Low-Rank Adaptation (LoRA) technology, which adjusts the weights of pre-trained models, achieves significant performance improvements with a minimal increase in parameters. This ensures the precise capture and efficient identification of crop pest and disease characteristics, greatly enhancing the model’s application flexibility and accuracy in the field of pest and disease recognition. Simultaneously, the model incorporates the Q-Former framework for effective modal alignment between language models and image features. Through this approach, the LLMI-CDP model is able to more deeply understand and process the complex relationships between language and visual information, further enhancing its performance in multimodal recognition tasks. Experiments are carried out in the homemade datasets, The results demonstrate that the LLMI-CDP model surpasses five leading multimodal large language models in relevant evaluation metrics, confirming its outstanding performance in Chinese multimodal dialogues related to agriculture. |
|---|---|
| ISSN: | 2045-2322 |