Optimization and application of vision-based large models in educational scenarios
With the rapid advancement of artificial intelligence technology, LLMs have achieved significant success across various fields. However, their application in the education domain still faces challenges such as difficulties in processing multimodal data, insufficient response accuracy, and l...
| Main Authors: | XU Yuepeng, XU Chaidi, GUO Jinjun, JIANG Yunqiao, WANG Shijia, LIU Yao |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | China InfoCom Media Group, 2025-01-01 |
| Series: | 大数据 |
| Subjects: | large language model; multimodal; smart education; RAG technology |
| Online Access: | http://www.j-bigdataresearch.com.cn/zh/article/111999691/ |
| _version_ | 1850121700068294656 |
|---|---|
| author | XU Yuepeng; XU Chaidi; GUO Jinjun; JIANG Yunqiao; WANG Shijia; LIU Yao |
| author_sort | XU Yuepeng |
| collection | DOAJ |
| description | With the rapid advancement of artificial intelligence technology, LLMs have achieved significant success across various fields. However, their application in the education domain still faces challenges such as difficulties in processing multimodal data, insufficient response accuracy, and limited information delivery methods. To address these issues, VELM was proposed. VELM was trained on public multimodal educational datasets and specialized educational datasets. Combined with model optimization techniques, this training improved response quality in educational scenarios and reduced computational resource consumption. Additionally, RAG technology was used to improve the accuracy and richness of generated content. For deployment, VELM was implemented through the Dify platform, which enabled flexible multi-end deployment, including WeChat mini programs, Web cloud platforms, and localized deployment, to meet the diverse needs of different educational scenarios. Evaluation experiments showed that VELM significantly outperformed open-source large models such as MiniCPM-V, DeepSeek-VL, and Yi-VL on standard benchmark datasets such as MathVista, OCRBench, and MMMU. On specialized educational evaluation datasets, the accuracy of VELM improved by 9.78% compared to the base model Qwen2-VL. |
| format | Article |
| id | doaj-art-0cc1e0771bbd4e568f2ccf18efddac29 |
| institution | OA Journals |
| issn | 2096-0271 |
| language | zho |
| publishDate | 2025-01-01 |
| publisher | China InfoCom Media Group |
| record_format | Article |
| series | 大数据 |
| title | Optimization and application of vision-based large models in educational scenarios |
| topic | large language model; multimodal; smart education; RAG technology |
| url | http://www.j-bigdataresearch.com.cn/zh/article/111999691/ |