Optimization and application of vision-based large models in educational scenarios

With the rapid advancement of artificial intelligence technology, LLMs have achieved significant success across various fields. However, their application in the education domain still faces challenges such as difficulties in processing multimodal data, insufficient response accuracy, and l...

Bibliographic Details
Main Authors: XU Yuepeng, XU Chaidi, GUO Jinjun, JIANG Yunqiao, WANG Shijia, LIU Yao
Format: Article
Language: zho
Published: China InfoCom Media Group 2025-01-01
Series: 大数据 (Big Data Research)
Subjects: large language model; multimodal; smart education; RAG technology
Online Access: http://www.j-bigdataresearch.com.cn/zh/article/111999691/
author XU Yuepeng
XU Chaidi
GUO Jinjun
JIANG Yunqiao
WANG Shijia
LIU Yao
collection DOAJ
description With the rapid advancement of artificial intelligence technology, LLMs have achieved significant success across various fields. However, their application in the education domain still faces challenges such as difficulties in processing multimodal data, insufficient response accuracy, and limited information delivery methods. To address these issues, VELM, a vision-based large model for educational scenarios, was proposed. VELM was trained on public multimodal educational datasets and specialized educational datasets. Combined with model optimization techniques, this training improves response quality in educational scenarios while reducing computational resource consumption. Additionally, RAG technology was used to improve the accuracy and richness of the generated content. For deployment, VELM was implemented through the Dify platform, which enables flexible multi-end deployment, including WeChat mini programs, Web cloud platforms, and localized deployment, to meet the diverse needs of different educational scenarios. Evaluation experiments showed that VELM significantly outperformed open-source large models such as MiniCPM-V, DeepSeek-VL, and Yi-VL on standard benchmark datasets such as MathVista, OCRBench, and MMMU. On specialized educational evaluation datasets, the accuracy of VELM was 9.78% higher than that of the base model, Qwen2-VL.
format Article
id doaj-art-0cc1e0771bbd4e568f2ccf18efddac29
institution OA Journals
issn 2096-0271
language zho
publishDate 2025-01-01
publisher China InfoCom Media Group
record_format Article
series 大数据
title Optimization and application of vision-based large models in educational scenarios
topic large language model
multimodal
smart education
RAG technology
url http://www.j-bigdataresearch.com.cn/zh/article/111999691/