Can large language models be used to code text for thematic analysis? An explorative study

Bibliographic Details
Main Authors: Zhiyong Han, Aaron Tavasi, JuYoung Lee, Joshua Luzuriaga, Kevin Suresh, Michael Oppenheim, Fortunato Battaglia, Stanley R. Terlecky
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Discover Artificial Intelligence
Online Access: https://doi.org/10.1007/s44163-025-00441-3
Affiliation: Department of Medical Sciences, Hackensack Meridian School of Medicine (all authors)
ISSN: 2731-0809
Keywords: ChatGPT; OpenAI o1-preview; Text coding; Thematic analysis; Comprehension; Reasoning

Abstract: In practice, thematic analysis of text involves six stages, among which text coding is particularly cognitively demanding, labor-intensive, and time-consuming. This study investigates and compares the potential of two large language models (LLMs), namely ChatGPT-4 and OpenAI o1-preview, to perform text coding, with the goal of reducing the time and effort required by human researchers. Our results indicate that both models exhibit decreased coding comprehensiveness as document length increases, and both demonstrate low coding accuracy, primarily due to limitations in textual comprehension and reasoning. These findings highlight significant challenges in using LLMs to support thematic analysis, emphasizing the need for human oversight and rigorous validation to ensure analytic accuracy and validity.