Research on Categorical Recognition and Optimization of Hallucination Phenomenon in Large Language Models

With the widespread application of large language models in natural language understanding and generation tasks, their performance in high-precision fields such as healthcare, law, and scientific research has received increasing attention. However, the phenomenon of hallucinations, as a common problem...

Bibliographic Details
Main Authors: HE Jing, SHEN Yang, XIE Runfeng
Format: Article
Language: zho
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2025-05-01
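Series: Jisuanji kexue yu tansuo
Subjects: large language model; hallucination recognition; hallucination classification; model optimization
Online Access: http://fcst.ceaj.org/fileup/1673-9418/PDF/2408080.pdf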
author HE Jing, SHEN Yang, XIE Runfeng
collection DOAJ
description With the widespread application of large language models (LLMs) in natural language understanding and generation tasks, their performance in fields that demand high precision, such as healthcare, law, and scientific research, has received increasing attention. However, hallucination, a common problem in LLMs, greatly restricts their practical application in these fields. Current work on evaluating and reducing hallucination has three significant shortcomings. First, high-quality hallucination evaluation datasets for high-precision domains are lacking. Second, most existing hallucination assessment methods rely on a single model and so fail to exploit the differences between multiple models. Third, models differ significantly in the types and rates of hallucination they exhibit, and there is currently no effective method for reducing hallucination in models with high hallucination rates. This paper adopts a systematic process of dataset construction, swarm intelligence election, hallucination classification and quantification, and prior knowledge optimization to comprehensively evaluate and reduce the hallucination of LLMs in medical question answering. First, based on the publicly available dataset Huatuo, a hallucination evaluation dataset for medical question answering is constructed by combining GPT-generated questions and answers with manual annotation. Second, advanced LLMs such as GPT4o, GPT4, ChatGLM4, Baichuan-13B, and Claude 3.5 are used to generate answers to the questions in the dataset. Using a swarm-intelligence-based method, a LeaderAI is elected, which compares each model's answers with the reference answers to determine each model's hallucination rate. Finally, hallucinations are further divided into two categories: factual hallucinations and fidelity hallucinations. The results indicate that, under the guidance of LeaderAI, the hallucination rates of the evaluated models decrease significantly, especially the fidelity hallucination rate.
format Article
id doaj-art-87dc648b880947239c19cd2719f19fa5
institution OA Journals
issn 1673-9418
language zho
publishDate 2025-05-01
publisher Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
record_format Article
series Jisuanji kexue yu tansuo
spelling doaj-art-87dc648b880947239c19cd2719f19fa5; 2025-08-20T02:14:10Z; zho; Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press; Jisuanji kexue yu tansuo; 1673-9418; 2025-05-01; vol. 19, no. 5, pp. 1295-1301; doi:10.3778/j.issn.1673-9418.2408080; Research on Categorical Recognition and Optimization of Hallucination Phenomenon in Large Language Models; HE Jing, SHEN Yang, XIE Runfeng; 1. Institute for Advanced Studies in Humanities and Social Sciences, Beihang University, Beijing 100191, China; 2. School of Journalism and Communication, Tsinghua University, Beijing 100084, China; 3. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; http://fcst.ceaj.org/fileup/1673-9418/PDF/2408080.pdf; large language model; hallucination recognition; hallucination classification; model optimization
title Research on Categorical Recognition and Optimization of Hallucination Phenomenon in Large Language Models
topic large language model; hallucination recognition; hallucination classification; model optimization
url http://fcst.ceaj.org/fileup/1673-9418/PDF/2408080.pdf
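
Illustrative note: the abstract describes a judge model comparing each model's answers with reference answers to obtain a hallucination rate, split into factual and fidelity hallucinations. The record does not give the computation itself, so the following is only a minimal sketch of how such rates might be tallied once every answer has a label. The function name, the label strings (including "none" for answers judged correct), and the sample data are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical label set: the abstract names only the two hallucination
# categories; "none" is an assumed marker for answers judged correct.
LABELS = ("none", "factual", "fidelity")


def hallucination_rates(judgments):
    """Return per-model hallucination rates from (model, label) pairs."""
    counts = defaultdict(Counter)
    for model, label in judgments:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        counts[model][label] += 1

    rates = {}
    for model, c in counts.items():
        total = sum(c.values())
        rates[model] = {
            # Overall rate counts both hallucination categories.
            "overall": (c["factual"] + c["fidelity"]) / total,
            "factual": c["factual"] / total,
            "fidelity": c["fidelity"] / total,
        }
    return rates


# Made-up judgments for two placeholder models, for illustration only.
sample = [
    ("model_a", "none"), ("model_a", "factual"),
    ("model_a", "none"), ("model_a", "fidelity"),
    ("model_b", "none"), ("model_b", "none"),
    ("model_b", "none"), ("model_b", "fidelity"),
]
rates = hallucination_rates(sample)
```

In this sketch each answer carries exactly one label, so the factual and fidelity rates sum to the overall rate. If a single answer could show both kinds of hallucination, the tally would need multi-label counting instead.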