Research on Categorical Recognition and Optimization of Hallucination Phenomenon in Large Language Models

Bibliographic Details
Authors: HE Jing, SHEN Yang, XIE Runfeng
Format: Article
Language: Chinese (zho)
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press, 2025-05-01
Series: Jisuanji kexue yu tansuo
Online Access: http://fcst.ceaj.org/fileup/1673-9418/PDF/2408080.pdf
Description
Summary: With the widespread application of large language models in natural language understanding and generation tasks, their performance in high-precision fields such as healthcare, law, and scientific research has received increasing attention. However, hallucination, a common problem in large language models, greatly restricts their practical application in these fields. Current approaches to evaluating and mitigating hallucination in large language models have three significant shortcomings. First, there is a lack of high-quality, high-precision domain-specific hallucination evaluation datasets. Second, most existing hallucination assessment methods rely on a single model and therefore fail to exploit the differences between multiple models. Third, different models differ markedly in the types and rates of hallucination they exhibit, and there is currently no effective method for reducing hallucination in models with high hallucination rates. This paper adopts a systematic process of dataset construction, swarm intelligence election, hallucination classification and quantification, and prior-knowledge optimization to comprehensively evaluate and mitigate hallucination in large language models for medical question answering. First, starting from the publicly available Huatuo dataset, a hallucination evaluation dataset for medical question answering is constructed by combining GPT-generated question-answer pairs with manual annotation. Second, advanced large language models, namely GPT-4o, GPT-4, ChatGLM4, Baichuan-13B, and Claude 3.5, are used to generate answers to the questions in the dataset. A swarm-intelligence-based method is then used to elect a LeaderAI, which compares each model's answers with the reference answers to determine that model's hallucination rate.
Finally, hallucinations are further divided into two categories: factual hallucinations and fidelity hallucinations. The results show that, under the guidance of LeaderAI, the hallucination rates of the evaluated large models decrease significantly, particularly the fidelity hallucination rate.
ISSN:1673-9418
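The summary does not specify how the swarm-intelligence election or the LeaderAI's hallucination judgment is carried out. The Python sketch below shows only the general shape of such an evaluation pipeline, under loudly stated assumptions: the leader is elected by a simple peer-rating vote (a stand-in, not the paper's swarm-intelligence algorithm), and `toy_judge` is a toy word-overlap rule, not a LeaderAI prompt. All names (`elect_leader`, `hallucination_rates`, `toy_judge`) and the peer-score format are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical category labels; a judge returns None when it finds no hallucination.
FACTUAL = "factual"
FIDELITY = "fidelity"

# A judge maps (question, answer, reference) to a category label or None.
Judge = Callable[[str, str, str], Optional[str]]


def elect_leader(peer_scores: Dict[str, Dict[str, float]]) -> str:
    """Elect a leader by peer rating (a simple stand-in for swarm intelligence).

    peer_scores[rater][ratee] is an assumed quality score in [0, 1] that one
    model assigns to another model's answers. Self-ratings are ignored, and
    the model with the highest mean received score wins.
    """
    received: Dict[str, List[float]] = {}
    for rater, row in peer_scores.items():
        for ratee, score in row.items():
            if rater != ratee:
                received.setdefault(ratee, []).append(score)
    if not received:
        raise ValueError("no peer ratings supplied")
    # Iterate in name order so that ties go to the alphabetically first model.
    return max(sorted(received), key=lambda m: sum(received[m]) / len(received[m]))


def hallucination_rates(
    questions: Sequence[str],
    answers: Sequence[str],
    references: Sequence[str],
    judge: Judge,
) -> Dict[str, float]:
    """Return overall and per-category hallucination rates for one model."""
    if not questions or not (len(questions) == len(answers) == len(references)):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = [judge(q, a, r) for q, a, r in zip(questions, answers, references)]
    counts = Counter(label for label in labels if label is not None)
    n = len(labels)
    return {
        "overall": sum(counts.values()) / n,
        FACTUAL: counts[FACTUAL] / n,
        FIDELITY: counts[FIDELITY] / n,
    }


def toy_judge(question: str, answer: str, reference: str) -> Optional[str]:
    """Toy rule standing in for a LeaderAI judgment (illustrative only).

    An exact match with the reference means no hallucination; an answer that
    shares no words with the question is treated as unfaithful to it
    (fidelity); any other mismatch is treated as a factual error.
    """
    if answer.strip().lower() == reference.strip().lower():
        return None
    if not set(question.lower().split()) & set(answer.lower().split()):
        return FIDELITY
    return FACTUAL


if __name__ == "__main__":
    scores = {
        "model_a": {"model_b": 0.6, "model_c": 0.7},
        "model_b": {"model_a": 0.9, "model_c": 0.5},
        "model_c": {"model_a": 0.8, "model_b": 0.4},
    }
    print(elect_leader(scores))  # model_a

    questions = ["define x", "define y", "define z", "define w"]
    references = ["x means 1", "y means 2", "z means 3", "w means 4"]
    answers = ["x means 1", "y means 5", "sunny weather today", "w means 4"]
    print(hallucination_rates(questions, answers, references, toy_judge))
    # {'overall': 0.5, 'factual': 0.25, 'fidelity': 0.25}
```

Separating the election from the rate computation lets the same `hallucination_rates` function run once per candidate model with whichever judge was elected. Comparing a model's rates with and without some form of guidance would then amount to calling it twice on the two sets of answers.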