T2F: a domain-agnostic multi-agent framework for unstructured text to factuality evaluation items generation

Bibliographic Details
Main Authors: Xin Tong, Jingya Wang, Yasen Aizezi, Hanming Zhai, Bo Jin
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Discover Artificial Intelligence
Online Access: https://doi.org/10.1007/s44163-025-00294-w
Description
Summary: Large language models (LLMs) demonstrate exceptional linguistic capabilities in text generation but remain prone to factual errors, particularly in specialized domains. Traditional factuality evaluation methods rely primarily on human annotation, which is costly, time-consuming, and difficult to generalize across domains. To address these limitations, this study proposes an innovative multi-agent framework, T2F (Text-to-Factuality), designed to automatically convert unstructured text into high-quality factuality evaluation datasets. T2F operates through the coordinated efforts of four specialized agents: Analysis, Information Extraction, Generation, and Validation. By systematically processing input data, T2F autonomously generates multiple types of assessment items, including single-choice questions, fill-in-the-blank questions, and true/false statements, without requiring human annotation, while maintaining strong cross-domain applicability. Experimental results show that T2F achieves data conversion success rates of 99% in the World Heritage domain, 98% in the Medical domain, and 85% in the Film domain. The generated data effectively assess LLMs’ factual accuracy, highlighting T2F’s capability as a scalable and domain-agnostic factuality evaluation framework.
ISSN: 2731-0809
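
Illustrative sketch: the summary names four stages (Analysis, Information Extraction, Generation, Validation) but this record does not describe how each agent works. The Python below is only a toy, rule-based stand-in for that four-stage flow, with no LLM calls. Every function name, the `Item` type, the "X is Y" extraction rule, and the definition of conversion success rate used here (share of input texts that yield at least one validated item) are assumptions made for illustration, not the authors' method.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    kind: str    # "true_false" or "fill_in_the_blank"
    prompt: str
    answer: str


def analyze(text: str) -> List[str]:
    """Analysis stage (toy): split raw text into candidate sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract(sentences: List[str]) -> List[Tuple[str, str]]:
    """Information Extraction stage (toy): keep simple 'X is Y' facts."""
    facts = []
    for sentence in sentences:
        match = re.match(r"^(.+?) is (.+?)\.?$", sentence)
        if match:
            facts.append((match.group(1), match.group(2)))
    return facts


def generate(facts: List[Tuple[str, str]]) -> List[Item]:
    """Generation stage (toy): one true/false and one fill-in-the-blank item per fact."""
    items = []
    for subject, value in facts:
        items.append(Item("true_false", f"{subject} is {value}.", "True"))
        items.append(Item("fill_in_the_blank", f"{subject} is ____.", value))
    return items


def validate(items: List[Item], source: str) -> List[Item]:
    """Validation stage (toy): keep items whose answer is supported and not leaked."""
    kept = []
    for item in items:
        if item.kind == "true_false" and item.answer in {"True", "False"}:
            kept.append(item)
        elif (item.kind == "fill_in_the_blank"
              and item.answer in source
              and item.answer not in item.prompt):
            kept.append(item)
    return kept


def text_to_items(text: str) -> List[Item]:
    """Run the four toy stages in order on one input text."""
    return validate(generate(extract(analyze(text))), text)


def conversion_success_rate(texts: List[str]) -> float:
    """Assumed definition: fraction of texts producing at least one valid item."""
    if not texts:
        return 0.0
    converted = sum(1 for t in texts if text_to_items(t))
    return converted / len(texts)
```

Calling `text_to_items` on a short passage returns a list of `Item` objects, and `conversion_success_rate` can be applied to a list of passages. In the framework described in the summary, each stage would presumably be carried out by a specialized agent rather than by fixed rules like these.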