Combining Topic Modeling, Sentiment Analysis, and Corpus Linguistics to Analyze Unstructured Web-Based Patient Experience Data: Case Study of Modafinil Experiences

Background: Patient experience data from social media offer patient-centered perspectives on disease, treatments, and health service delivery. Current guidelines typically rely on systematic reviews, while qualitative health studies are often seen as anecdotal and nongeneraliza...

Bibliographic Details
Main Authors: Julia Walsh, Jonathan Cave, Frances Griffiths
Format: Article
Language: English
Published: JMIR Publications 2024-12-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2024/1/e54321
author Julia Walsh
Jonathan Cave
Frances Griffiths
collection DOAJ
description Background: Patient experience data from social media offer patient-centered perspectives on disease, treatments, and health service delivery. Current guidelines typically rely on systematic reviews, while qualitative health studies are often seen as anecdotal and nongeneralizable. This study explores combining personal health experiences from multiple sources to create generalizable evidence.
Objective: The study aims to (1) investigate how combining unsupervised natural language processing (NLP) and corpus linguistics can explore patient perspectives from a large unstructured dataset of modafinil experiences, (2) compare findings with Cochrane meta-analyses on modafinil’s effectiveness, and (3) develop a methodology for analyzing such data.
Methods: Using 69,022 posts from 790 sources, we applied a variety of NLP and corpus techniques, including data cleaning techniques to maximize post context, Python for NLP techniques, and Sketch Engine for linguistic analysis. We used multiple topic mining approaches, such as latent Dirichlet allocation, nonnegative matrix factorization, and word-embedding methods. Sentiment analysis used TextBlob and Valence Aware Dictionary and Sentiment Reasoner (VADER), while corpus methods included collocation, concordance, and n-gram generation. Previous work had mapped topic mining to themes, such as health conditions, reasons for taking modafinil, symptom impacts, dosage, side effects, effectiveness, and treatment comparisons.
Results: Key findings included modafinil use across 166 health conditions, most frequently narcolepsy, multiple sclerosis, attention-deficit disorder, anxiety, sleep apnea, depression, bipolar disorder, chronic fatigue syndrome, fibromyalgia, and chronic disease. Word-embedding topic modeling mapped 70% of posts to predefined themes, while sentiment analysis revealed 65% positive responses, 6% neutral responses, and 28% negative responses. Notably, the perceived effectiveness of modafinil for various conditions strongly contrasts with the findings of existing randomized controlled trials and systematic reviews, which conclude that evidence of effectiveness is insufficient or of low quality.
Conclusions: This study demonstrated the value of combining NLP with linguistic techniques for analyzing large unstructured text datasets. Despite varying opinions, findings were methodologically consistent and challenged existing clinical evidence. This suggests that patient-generated data could provide valuable insights into treatment outcomes, potentially improving clinical understanding and patient care.
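As a rough illustration of two of the corpus methods named in the abstract above (n-gram generation and collocation), the following is a minimal standard-library sketch. It is not the authors' pipeline, which the abstract says used Sketch Engine for linguistic analysis. Pointwise mutual information (PMI) is used here as one common collocation measure, which is an assumption; the function names (`tokenize`, `ngrams`, `bigram_pmi`), the `min_count` threshold, and the toy posts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_pmi(posts, min_count=2):
    """Rank bigrams by pointwise mutual information (PMI).

    PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), estimated from raw counts.
    Bigrams are counted within each post, never across post boundaries.
    """
    unigrams = Counter()
    bigrams = Counter()
    for post in posts:
        tokens = tokenize(post)
        unigrams.update(tokens)
        bigrams.update(ngrams(tokens, 2))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unstable PMI estimates
        p_xy = count / total_bigrams
        p_x = unigrams[x] / total_unigrams
        p_y = unigrams[y] / total_unigrams
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy posts for illustration only; not drawn from the study's data.
posts = [
    "Modafinil helped my sleep apnea fatigue",
    "Sleep apnea left me exhausted until modafinil",
    "Side effects were mild, mostly headache",
    "No side effects for me so far",
]

for pair, score in bigram_pmi(posts):
    print(pair, round(score, 2))
```

On a real corpus the `min_count` threshold would need to be much higher, since PMI tends to overrate word pairs that occur only a handful of times.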
format Article
id doaj-art-5b128e47779d480aa89b176550f93752
institution DOAJ
issn 1438-8871
language English
publishDate 2024-12-01
publisher JMIR Publications
record_format Article
series Journal of Medical Internet Research
spelling doaj-art-5b128e47779d480aa89b176550f93752 | 2025-08-20T02:50:08Z | eng | JMIR Publications | Journal of Medical Internet Research | 1438-8871 | 2024-12-01 | 26 | e54321 | 10.2196/54321 | Julia Walsh https://orcid.org/0000-0002-9787-0349 | Jonathan Cave https://orcid.org/0000-0002-9879-6507 | Frances Griffiths https://orcid.org/0000-0002-4173-1438
title Combining Topic Modeling, Sentiment Analysis, and Corpus Linguistics to Analyze Unstructured Web-Based Patient Experience Data: Case Study of Modafinil Experiences
url https://www.jmir.org/2024/1/e54321