Research on Categorical Recognition and Optimization of Hallucination Phenomenon in Large Language Models

With the widespread application of large language models in natural language understanding and generation tasks, their performance in high-precision fields such as healthcare, law, and scientific research has received increasing attention. However, hallucination, a common problem...

Bibliographic Details
Main Authors:	HE Jing, SHEN Yang, XIE Runfeng
Format:	Article
Language:	Chinese (zho)
Published:	Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2025-05-01
Series:	Jisuanji kexue yu tansuo
Subjects:	large language model; hallucination recognition; hallucination classification; model optimization
Online Access:	http://fcst.ceaj.org/fileup/1673-9418/PDF/2408080.pdf

author	HE Jing, SHEN Yang, XIE Runfeng
collection	DOAJ
description	With the widespread application of large language models in natural language understanding and generation tasks, their performance in high-precision fields such as healthcare, law, and scientific research has received increasing attention. However, hallucination, a common problem in large language models, greatly restricts their practical application in these fields. At present, the evaluation and optimization of hallucination in large language models have significant shortcomings. First, there is a lack of high-quality hallucination evaluation datasets for high-precision domains. Second, most existing hallucination assessment methods rely on a single model and fail to take full advantage of the differences between multiple models. Finally, models differ significantly in hallucination types and rates, and there is currently no effective method for reducing hallucination in models with high hallucination rates. This paper adopts a systematic process of dataset construction, swarm intelligence election, hallucination classification and quantification, and prior knowledge optimization to comprehensively evaluate and optimize the hallucination of large language models in medical question answering. First, based on the publicly available dataset Huatuo, a hallucination evaluation dataset for large models in the medical question answering field is constructed by combining GPT-generated questions and answers with manual annotation. Second, advanced large language models such as GPT4o, GPT4, ChatGLM4, Baichuan-13B, and Claude 3.5 are used to generate answers to the questions in the dataset. Using a swarm-intelligence-based method, a LeaderAI is elected, which compares each model's answers with the reference answers to determine each model's hallucination rate. Finally, hallucinations are further divided into two categories: factual hallucinations and fidelity hallucinations. The results indicate that under the guidance of LeaderAI, the hallucination rate of the evaluated large models decreases significantly, especially the fidelity hallucination rate.
format	Article
id	doaj-art-87dc648b880947239c19cd2719f19fa5
institution	OA Journals
issn	1673-9418
language	zho
publishDate	2025-05-01
publisher	Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
record_format	Article
series	Jisuanji kexue yu tansuo
spelling	doaj-art-87dc648b880947239c19cd2719f19fa5 | 2025-08-20T02:14:10Z | zho | Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press | Jisuanji kexue yu tansuo | 1673-9418 | 2025-05-01 | 19 | 5 | 1295 | 1301 | 10.3778/j.issn.1673-9418.2408080 | Research on Categorical Recognition and Optimization of Hallucination Phenomenon in Large Language Models | HE Jing, SHEN Yang, XIE Runfeng | 0 | 1. Institute for Advanced Studies in Humanities and Social Sciences, Beihang University, Beijing 100191, China 2. School of Journalism and Communication, Tsinghua University, Beijing 100084, China 3. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China | http://fcst.ceaj.org/fileup/1673-9418/PDF/2408080.pdf | large language model; hallucination recognition; hallucination classification; model optimization
title	Research on Categorical Recognition and Optimization of Hallucination Phenomenon in Large Language Models
topic	large language model; hallucination recognition; hallucination classification; model optimization
url	http://fcst.ceaj.org/fileup/1673-9418/PDF/2408080.pdf

Similar Items