Ensemble-Based Uncertainty Quantification for Reliable Large Language Model Classification in Social Data Applications
Assessing classification confidence is essential for effectively leveraging Large Language Models (LLMs) in automated data labeling, particularly within the sensitive contexts of Computational Social Science (CSS) tasks. In this study, we evaluate five uncertainty quantification (UQ) strategies across three CSS classification problems: stance detection, ideology identification, and frame detection. We benchmark these strategies using three different LLMs. To enhance human-in-the-loop classification performance, we introduce an ensemble-based UQ aggregation method, C_ensemble, and propose a novel evaluation metric, Misclassified Recall, designed to better assess model uncertainty on mislabeled or ambiguous data points. Our results show that C_ensemble outperforms existing UQ techniques in six out of nine model-dataset combinations, achieving an average AUC improvement of 8.7%. These findings highlight the potential of UQ-driven methods to significantly improve the reliability and efficiency of human-in-the-loop data annotation pipelines.
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11069263/ |
| Summary: | Assessing classification confidence is essential for effectively leveraging Large Language Models (LLMs) in automated data labeling, particularly within the sensitive contexts of Computational Social Science (CSS) tasks. In this study, we evaluate five uncertainty quantification (UQ) strategies across three CSS classification problems: stance detection, ideology identification, and frame detection. We benchmark these strategies using three different LLMs. To enhance human-in-the-loop classification performance, we introduce an ensemble-based UQ aggregation method, C_ensemble, and propose a novel evaluation metric, Misclassified Recall, designed to better assess model uncertainty on mislabeled or ambiguous data points. Our results show that C_ensemble outperforms existing UQ techniques in six out of nine model-dataset combinations, achieving an average AUC improvement of 8.7%. These findings highlight the potential of UQ-driven methods to significantly improve the reliability and efficiency of human-in-the-loop data annotation pipelines. |
| ISSN: | 2169-3536 |
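The record names an ensemble-based UQ aggregation (C_ensemble) and a Misclassified Recall metric but does not define either. The sketch below is a hypothetical illustration of the general idea, not the authors' actual method: per-example uncertainty scores from several UQ strategies are averaged, and "misclassified recall" is read as the fraction of misclassified examples caught within a fixed human-review budget of the most-uncertain items. The aggregation rule (simple mean), the budget-based reading of the metric, and all variable names are assumptions.

```python
import numpy as np

def ensemble_uncertainty(scores):
    """Aggregate several per-example uncertainty scores into one.
    Hypothetical aggregation (simple mean); the paper's C_ensemble
    is not defined in this record."""
    return np.mean(np.asarray(scores, dtype=float), axis=0)

def misclassified_recall(uncertainty, correct, budget):
    """Fraction of misclassified examples that fall within the
    `budget` most-uncertain items flagged for human review.
    Illustrative reading of the metric's stated goal."""
    flagged = np.argsort(uncertainty)[::-1][:budget]   # most uncertain first
    wrong = np.flatnonzero(~np.asarray(correct, dtype=bool))
    if wrong.size == 0:
        return 1.0  # nothing misclassified, nothing to recall
    return np.intersect1d(flagged, wrong).size / wrong.size

# Toy example: three UQ strategies scoring six examples
scores = [
    [0.9, 0.1, 0.4, 0.8, 0.2, 0.7],
    [0.8, 0.2, 0.5, 0.9, 0.1, 0.6],
    [0.7, 0.3, 0.4, 0.7, 0.2, 0.8],
]
u = ensemble_uncertainty(scores)
correct = [False, True, True, False, True, False]  # items 0, 3, 5 mislabeled
print(misclassified_recall(u, correct, budget=3))  # → 1.0 (all 3 errors flagged)
```

Under this toy setup the three misclassified items happen to receive the three highest aggregated uncertainty scores, so a review budget of three catches all of them.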