Benchmarking AI and human text classifications in the context of newspaper frames: A multi-label LLM classification approach
I examine the abilities of large language models (LLMs) to accurately classify topics related to immigration from Spanish-language newspaper articles. I benchmark various LLMs (ChatGPT and Claude) and undergraduate coders with my own codings. I prompt models to label articles with either an 8 label...
| Main Author: | Alexander Tripp |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SAGE Publishing, 2025-04-01 |
| Series: | Research & Politics |
| Online Access: | https://doi.org/10.1177/20531680251332353 |
| _version_ | 1850172784555065344 |
|---|---|
| author | Alexander Tripp |
| author_facet | Alexander Tripp |
| author_sort | Alexander Tripp |
| collection | DOAJ |
| description | I examine the abilities of large language models (LLMs) to accurately classify topics related to immigration from Spanish-language newspaper articles. I benchmark various LLMs (ChatGPT and Claude) and undergraduate coders with my own codings. I prompt models to label articles with either an 8 label scheme—directly analogous to the assignment of the undergraduate coders—or a 4 label scheme—aggregating the 8 labels into broader themes. In my analyses, a Few Shot ChatGPT 4o model with 8 labels emerges as the most reliable LLM classifier and comes close to the undergraduate coders, with models using 8 labels generally outperforming their 4 label counterparts. I also find that LLMs tend toward false positive errors. This application provides practical methodological guidance for applied researchers using LLMs in data coding. Overall, I demonstrate how LLMs can be effective supplements to human coders, as well as the continued value of human coding as a benchmark for text classification. |
| format | Article |
| id | doaj-art-9c0b529d8fa44421bc3b3b5a63dec1a3 |
| institution | OA Journals |
| issn | 2053-1680 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | SAGE Publishing |
| record_format | Article |
| series | Research & Politics |
| title | Benchmarking AI and human text classifications in the context of newspaper frames: A multi-label LLM classification approach |
| url | https://doi.org/10.1177/20531680251332353 |