Benchmarking AI and human text classifications in the context of newspaper frames: A multi-label LLM classification approach

I examine how accurately large language models (LLMs) classify immigration-related topics in Spanish-language newspaper articles. I benchmark several LLMs (ChatGPT and Claude), as well as undergraduate coders, against my own codings. I prompt the models to label articles with either an 8-label scheme, directly analogous to the assignment given to the undergraduate coders, or a 4-label scheme that aggregates the 8 labels into broader themes. In my analyses, a few-shot ChatGPT 4o model with 8 labels emerges as the most reliable LLM classifier and comes close to the undergraduate coders; models using 8 labels generally outperform their 4-label counterparts. I also find that LLMs tend toward false-positive errors. This application provides practical methodological guidance for applied researchers using LLMs to code data. Overall, I demonstrate how LLMs can be effective supplements to human coders, while also showing the continued value of human coding as a benchmark for text classification.

Bibliographic Details
Main Author:	Alexander Tripp
Format:	Article
Language:	English
Published:	SAGE Publishing 2025-04-01
Series:	Research & Politics
ISSN:	2053-1680
Collection:	DOAJ
Online Access:	https://doi.org/10.1177/20531680251332353
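
The benchmarking described in the abstract, scoring multi-label codings against a reference coding and checking whether errors lean toward false positives, can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's method: the label names, the 8-to-4 mapping, the three example articles, and the helper names (`confusion_counts`, `micro_scores`, `aggregate`) are all hypothetical, and micro precision, recall and F1 are standard metrics that may differ from the reliability measures the article actually reports.

```python
from collections import Counter

# Hypothetical 8-label scheme and an illustrative mapping onto 4 broad themes.
# These names are placeholders, not the article's actual codebook.
FINE_LABELS = ["labor", "remittances", "crime", "border",
               "health", "education", "legislation", "elections"]
TO_BROAD = {
    "labor": "economic", "remittances": "economic",
    "crime": "security", "border": "security",
    "health": "social", "education": "social",
    "legislation": "political", "elections": "political",
}
BROAD_LABELS = sorted(set(TO_BROAD.values()))


def confusion_counts(reference, predicted, labels):
    """Count true positives, false positives and false negatives per label.

    `reference` and `predicted` are parallel lists holding one set of labels
    per article; `reference` plays the role of the benchmark coding.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must cover the same articles")
    counts = {label: Counter() for label in labels}
    for ref, pred in zip(reference, predicted):
        for label in labels:
            if label in ref and label in pred:
                counts[label]["tp"] += 1
            elif label in pred:
                counts[label]["fp"] += 1  # label assigned, benchmark disagrees
            elif label in ref:
                counts[label]["fn"] += 1  # benchmark label the classifier missed
    return counts


def micro_scores(counts):
    """Pool the per-label counts and compute micro precision, recall and F1."""
    total = Counter()
    for label_counts in counts.values():
        total.update(label_counts)
    tp, fp, fn = total["tp"], total["fp"], total["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


def aggregate(coding, mapping):
    """Collapse one article's fine-grained labels into broad themes."""
    return {mapping[label] for label in coding}


# Made-up codings for three articles, for illustration only.
REFERENCE = [{"labor"}, {"crime", "border"}, {"health"}]
PREDICTED = [{"labor", "remittances"}, {"crime"}, {"health", "legislation"}]

if __name__ == "__main__":
    fine = micro_scores(confusion_counts(REFERENCE, PREDICTED, FINE_LABELS))
    broad = micro_scores(confusion_counts(
        [aggregate(c, TO_BROAD) for c in REFERENCE],
        [aggregate(c, TO_BROAD) for c in PREDICTED],
        BROAD_LABELS,
    ))
    print("8 labels:", fine)
    print("4 themes:", broad)
```

With these made-up codings, the 8-label comparison records two false positives against one false negative, so comparing `fp` with `fn` is one simple check for a lean toward false positives. After both codings are mapped to broad themes, the missed `border` label and the spurious `remittances` label fall into themes that were already matched, leaving only the spurious `political` theme. This post-hoc mapping is just one way to put 8-label benchmark codings on a 4-label scale; it is not necessarily how the article scores its 4-label models.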
