Ensemble-Based Uncertainty Quantification for Reliable Large Language Model Classification in Social Data Applications

Bibliographic Details
Main Authors: David T. Farr, Lynnette Hui Xian Ng, Iain J. Cruickshank, Nico Manzonelli, Nicholas Clark, Kate Starbird, Nathaniel D. Bastian, Jevin West
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11069263/
Description
Summary: Assessing classification confidence is essential for effectively leveraging Large Language Models (LLMs) in automated data labeling, particularly within the sensitive contexts of Computational Social Science (CSS) tasks. In this study, we evaluate five uncertainty quantification (UQ) strategies across three CSS classification problems: stance detection, ideology identification, and frame detection. We benchmark these strategies using three different LLMs. To enhance human-in-the-loop classification performance, we introduce an ensemble-based UQ aggregation method, C_ensemble, and propose a novel evaluation metric, Misclassified Recall, designed to better assess model uncertainty on mislabeled or ambiguous data points. Our results show that C_ensemble outperforms existing UQ techniques in six out of nine model-dataset combinations, achieving an average AUC improvement of 8.7%. These findings highlight the potential of UQ-driven methods to significantly improve the reliability and efficiency of human-in-the-loop data annotation pipelines.
ISSN: 2169-3536
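
The abstract describes an ensemble aggregation of per-sample uncertainty scores (C_ensemble) and a Misclassified Recall metric for gauging how well uncertainty flags mislabeled or ambiguous points. The following Python sketch illustrates one plausible reading of those two ideas; the normalize-then-average aggregation rule, the `review_fraction` parameter, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def c_ensemble_sketch(uq_scores: np.ndarray) -> np.ndarray:
    """Aggregate uncertainty scores from several UQ strategies.

    `uq_scores` has shape (n_strategies, n_samples). The rule used here
    (min-max normalize each strategy, then average across strategies) is
    an assumption for illustration; the paper's C_ensemble may differ.
    """
    lo = uq_scores.min(axis=1, keepdims=True)
    hi = uq_scores.max(axis=1, keepdims=True)
    normalized = (uq_scores - lo) / np.maximum(hi - lo, 1e-12)
    return normalized.mean(axis=0)

def misclassified_recall_sketch(uncertainty: np.ndarray,
                                predictions: np.ndarray,
                                labels: np.ndarray,
                                review_fraction: float = 0.2) -> float:
    """Fraction of misclassified samples that fall in the most-uncertain
    `review_fraction` of the data (i.e., the set routed to human review).
    This exact formulation is an assumption based on the abstract.
    """
    n_review = max(1, int(round(review_fraction * len(uncertainty))))
    flagged = set(np.argsort(-uncertainty)[:n_review])
    wrong = np.flatnonzero(predictions != labels)
    if len(wrong) == 0:
        return 1.0
    return sum(i in flagged for i in wrong) / len(wrong)
```

In a human-in-the-loop pipeline, a higher value of this recall-style metric would mean that most model errors are concentrated among the samples the aggregated uncertainty score sends to annotators, which is the behavior the abstract attributes to C_ensemble.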