Exploring Inflammatory Bowel Disease Discourse on Reddit Throughout the COVID-19 Pandemic Using OpenAI’s GPT-3.5 Turbo Model: Classification Model Validation and Case Study
| Main Authors: | Tyler Babinski, Sara Karley, Marita Cooper, Salma Shaik, Y Ken Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-07-01 |
| Series: | Journal of Medical Internet Research |
| Online Access: | https://www.jmir.org/2025/1/e53332 |
| _version_ | 1849423142310641664 |
|---|---|
| author | Tyler Babinski; Sara Karley; Marita Cooper; Salma Shaik; Y Ken Wang |
| author_sort | Tyler Babinski |
| collection | DOAJ |
| description | Background: Inflammatory bowel disease (IBD) is a chronic autoimmune disorder with an increasing prevalence in the general population. Internet-based communities have become vital for communication among patients with IBD, especially throughout the COVID-19 pandemic. However, these internet-based patient-to-patient communications remain largely underexplored. Objective: This study aims to analyze community posts from 3 of the largest IBD support groups on Reddit between March 1, 2020, and December 31, 2022, using a pretrained transformer model, and to validate the classification system’s results via comparison to human scoring. Methods: We collected posts (N=53,333) from subreddits r/CrohnsDisease, r/UlcerativeColitis, and r/IBD and classified them using OpenAI’s GPT-3.5 Turbo model to determine sentiment, categorize topics, and identify demographic information and mentions of the COVID-19 pandemic. A subset of posts (n=397) was manually scored to measure interrater agreement between human raters and the GPT-3.5 Turbo model. Results: Fleiss κ and Gwet AC1 coefficients indicated a high level of agreement between raters, with values ranging from 0.53 to 0.91. The raters demonstrated almost perfect agreement on the classification of gender, with a Fleiss κ of 0.91 (P<.001). Medications (14,909/53,333) and symptoms (14,939/53,333) emerged as the most discussed topics, and most posts conveyed a neutral sentiment. While most users did not disclose their age, those who did primarily belonged to the 20-29 years (2392/4828) and 30-39 years (859/4828) age groups. Based on self-reported gender, we identified 1509 men and 1502 women among our IBD Reddit users. When comparing the users on the IBD subreddits to the general IBD population, there was a significant difference in gender distribution (N=3,090,011; χ²₂=69.53; P<.001; φ<0.001). After an initial spike in posts within the first month, most posts did not reference the COVID-19 pandemic. Conclusions: Our study showcases the potential of generative pretrained transformer models in processing and extracting insights from medical social media data. Future research can benefit from further subanalyses of our validated dataset or use OpenAI’s model to analyze social media data for other conditions, particularly those for which patient experiences are challenging to collect. |
| format | Article |
| id | doaj-art-c28c0afd354847f4b00843bd3bfd84ee |
| institution | Kabale University |
| issn | 1438-8871 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | Journal of Medical Internet Research |
| spelling | doaj-art-c28c0afd354847f4b00843bd3bfd84ee; 2025-08-20T03:30:45Z; eng; JMIR Publications; Journal of Medical Internet Research; 1438-8871; 2025-07-01; 27; e53332; 10.2196/53332; Tyler Babinski (https://orcid.org/0009-0009-7458-7186); Sara Karley (https://orcid.org/0009-0004-9823-5592); Marita Cooper (https://orcid.org/0000-0002-3822-5809); Salma Shaik (https://orcid.org/0000-0001-6416-4598); Y Ken Wang (https://orcid.org/0000-0003-2829-3776); https://www.jmir.org/2025/1/e53332 |
| title | Exploring Inflammatory Bowel Disease Discourse on Reddit Throughout the COVID-19 Pandemic Using OpenAI’s GPT-3.5 Turbo Model: Classification Model Validation and Case Study |
| url | https://www.jmir.org/2025/1/e53332 |
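
The description in this record reports interrater agreement as Fleiss κ. As a rough illustration of how that statistic is computed, here is a minimal Python sketch. It is not the authors' code: the function name `fleiss_kappa`, the input layout (one list of labels per post, one label per rater), and the toy gender labels are all assumptions made for this example, and it assumes every post was rated by the same number of raters.

```python
from collections import Counter
from math import nan


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater.

    Hypothetical helper for illustration only; every item must carry the
    same number of ratings (at least 2).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # How many raters chose each category, per item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    pairs = n_raters * (n_raters - 1)
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / pairs for cnt in counts
    ) / n_items

    # Chance agreement: sum of squared overall category proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return nan
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (invented): 4 posts, 3 raters each, labelling self-reported gender.
toy = [
    ["male", "male", "male"],
    ["female", "female", "female"],
    ["unknown", "unknown", "male"],
    ["female", "female", "female"],
]
print(round(fleiss_kappa(toy), 3))
```

Values near 1 indicate agreement well above chance, values near 0 indicate chance-level agreement, and negative values indicate systematic disagreement.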