Exploring Inflammatory Bowel Disease Discourse on Reddit Throughout the COVID-19 Pandemic Using OpenAI’s GPT-3.5 Turbo Model: Classification Model Validation and Case Study

Background: Inflammatory bowel disease (IBD) is a chronic autoimmune disorder with an increasing prevalence in the general population. Internet-based communities have become vital for communication among patients with IBD, especially throughout the COVID-19 pandemic. However, t...

Bibliographic Details
Main Authors: Tyler Babinski, Sara Karley, Marita Cooper, Salma Shaik, Y Ken Wang
Format: Article
Language: English
Published: JMIR Publications 2025-07-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2025/1/e53332
collection DOAJ
description Background: Inflammatory bowel disease (IBD) is a chronic autoimmune disorder with an increasing prevalence in the general population. Internet-based communities have become vital for communication among patients with IBD, especially throughout the COVID-19 pandemic. However, these internet-based patient-to-patient communications remain largely underexplored.
Objective: This study aims to analyze community posts from 3 of the largest IBD support groups on Reddit between March 1, 2020, and December 31, 2022, using a pretrained transformer model, and to validate the classification system’s results via comparison to human scoring.
Methods: We collected posts (N=53,333) from subreddits r/CrohnsDisease, r/UlcerativeColitis, and r/IBD and classified them using OpenAI’s GPT-3.5 Turbo model to determine sentiment, categorize topics, and identify demographic information and mentions of the COVID-19 pandemic. A subset of posts (n=397) was manually scored to measure interrater agreement between human raters and the GPT-3.5 Turbo model.
Results: Fleiss κ and Gwet AC1 coefficients indicated a high level of agreement between raters, with values ranging from 0.53 to 0.91. The raters demonstrated almost perfect agreement on the classification of gender, with a Fleiss κ of 0.91 (P<.001). Medications (14,909/53,333) and symptoms (14,939/53,333) emerged as the most discussed topics, and most posts conveyed a neutral sentiment. While most users did not disclose their age, those who did primarily belonged to the 20-29 years (2392/4828) and 30-39 years (859/4828) age groups. Based on self-reported gender, we identified 1509 men and 1502 women among our IBD Reddit users. When comparing the users on the IBD subreddits to the general IBD population, there was a significant difference in gender distribution (N=3,090,011; χ²(2)=69.53; P<.001; φ<0.001). After an initial spike in posts within the first month, most posts did not reference the COVID-19 pandemic.
Conclusions: Our study showcases the potential of generative pretrained transformer models in processing and extracting insights from medical social media data. Future research can benefit from further subanalyses of our validated dataset or use OpenAI’s model to analyze social media data for other conditions, particularly those for which patient experiences are challenging to collect.
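The abstract reports Fleiss κ as one of its agreement measures. As a minimal sketch of how that statistic is computed in general (not the authors' code), the Python below takes a table of per-post category counts, where each row is one post and each column is one category. The function name `fleiss_kappa` and the toy ratings are hypothetical, and the sketch assumes every post was scored by the same number of raters, with the model counted as one rater alongside the humans.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa for a table of category counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Assumption: every row sums to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 posts, 3 raters, 2 categories (not study data).
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))
```

Values near 1 indicate agreement well above chance and values near 0 indicate agreement close to chance. Gwet AC1, the other coefficient named in the abstract, uses a different chance-agreement term and is not covered by this sketch.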
id doaj-art-c28c0afd354847f4b00843bd3bfd84ee
institution Kabale University
issn 1438-8871
spelling doaj-art-c28c0afd354847f4b00843bd3bfd84ee; 2025-08-20T03:30:45Z; eng; JMIR Publications; Journal of Medical Internet Research; ISSN 1438-8871; 2025-07-01; volume 27; e53332; DOI 10.2196/53332
Tyler Babinski https://orcid.org/0009-0009-7458-7186
Sara Karley https://orcid.org/0009-0004-9823-5592
Marita Cooper https://orcid.org/0000-0002-3822-5809
Salma Shaik https://orcid.org/0000-0001-6416-4598
Y Ken Wang https://orcid.org/0000-0003-2829-3776
https://www.jmir.org/2025/1/e53332