Linguistic Markers of Pain Communication on X (Formerly Twitter) in US States With High and Low Opioid Mortality: Machine Learning and Semantic Network Analysis

Background: The opioid epidemic in the United States remains a major public health concern, with opioid-related deaths increasing more than 8-fold since 1999. Chronic pain, affecting 1 in 5 US adults, is a key contributor to opioid use and misuse. While previous research has ex...

Bibliographic Details
Main Authors: ShinYe Kim, Winson Fu Zun Yang, Zishan Jiwani, Emily Hamm, Shreya Singh
Format: Article
Language: English
Published: JMIR Publications 2025-05-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2025/1/e67506
author ShinYe Kim
Winson Fu Zun Yang
Zishan Jiwani
Emily Hamm
Shreya Singh
collection DOAJ
description Background: The opioid epidemic in the United States remains a major public health concern, with opioid-related deaths increasing more than 8-fold since 1999. Chronic pain, affecting 1 in 5 US adults, is a key contributor to opioid use and misuse. While previous research has explored clinical and behavioral predictors of opioid risk, less attention has been given to large-scale linguistic patterns in public discussions of pain. Social media platforms such as X (formerly Twitter) offer real-time, population-level insights into how individuals express pain, distress, and coping strategies. Understanding these linguistic markers matters because they can reveal underlying psychological states, perceptions of health care access, and community-level opioid risk factors, offering new opportunities for early detection and targeted public health response.
Objective: This study aimed to examine linguistic markers of pain communication on the social media platform X and assess whether language patterns differ among US states with high and low opioid mortality rates. We also evaluated the predictive power of these linguistic features using machine learning and identified key thematic structures through semantic network analysis.
Methods: We collected 1,438,644 pain-related tweets posted between January and December 2021 using tweepy and snscrape. Tweets from 2 high-opioid mortality states (Ohio and Florida) and 2 low-opioid mortality states (South and North Dakota) were selected, resulting in 31,994 tweets from high-death states (HDS) and 750 tweets from low-death states (LDS). Six machine learning algorithms (random forest, k-nearest neighbor, decision tree, naive Bayes, logistic regression, and support vector machine) were applied to predict state-level opioid mortality risk based on linguistic features derived from Linguistic Inquiry and Word Count. Synthetic Minority Oversampling Technique was used to address class imbalance. Semantic network analysis was conducted to visualize co-occurrence patterns and conceptual clustering.
Results: The random forest model demonstrated the strongest predictive performance, with an accuracy of 94.69%, balanced accuracy of 94.69%, κ of 0.89, and an area under the curve of 0.95 (P<.001). Tweets from HDS contained significantly more affective pain words (t(31,992)=10.84; P<.001; Cohen d=0.12), health care access references, and expressions of distress. LDS tweets showed greater use of authenticity markers (t(31,992)=−10.04; P<.001) and proactive health-seeking language. Semantic network analysis revealed denser discourse in HDS (density=0.28) focused on distress and barriers to care, while LDS discourse emphasized recovery and optimism.
Conclusions: Our findings demonstrated that linguistic markers in publicly shared pain-related discourse show distinct and predictable differences across regions with varying opioid mortality risks. These linguistic patterns reflect underlying psychological, social, and structural factors that contribute to opioid vulnerability. Importantly, they offer a scalable, real-time resource for identifying at-risk communities. Harnessing social media language analytics can strengthen early detection systems, guide geographically targeted public health messaging, and inform policy efforts aimed at reducing opioid-related harm and improving pain management equity.
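The three classification figures reported in the Results (accuracy, balanced accuracy, and κ) can all be derived from one confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes a class-balanced evaluation set, which the abstract does not state; the counts, the `Confusion` type, and the function names are hypothetical and chosen only so that accuracy comes out at 94.69%.

```python
from typing import NamedTuple


class Confusion(NamedTuple):
    """Binary confusion-matrix counts (hypothetical values, not from the paper)."""
    tp: int  # high-mortality tweets predicted as high-mortality
    fn: int  # high-mortality tweets predicted as low-mortality
    fp: int  # low-mortality tweets predicted as high-mortality
    tn: int  # low-mortality tweets predicted as low-mortality


def accuracy(c: Confusion) -> float:
    """Share of all tweets that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def balanced_accuracy(c: Confusion) -> float:
    """Mean of sensitivity and specificity, robust to class imbalance."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return (sensitivity + specificity) / 2


def cohen_kappa(c: Confusion) -> float:
    """Agreement between predictions and labels, corrected for chance."""
    n = c.tp + c.fn + c.fp + c.tn
    observed = accuracy(c)
    # Chance agreement from the row (true) and column (predicted) marginals.
    expected = ((c.tp + c.fn) * (c.tp + c.fp) + (c.fp + c.tn) * (c.fn + c.tn)) / n**2
    return (observed - expected) / (1 - expected)


# Assumed balanced test set: 10,000 tweets per class, 9,469 correct in each.
example = Confusion(tp=9469, fn=531, fp=531, tn=9469)

if __name__ == "__main__":
    print(f"accuracy          = {accuracy(example):.4f}")
    print(f"balanced accuracy = {balanced_accuracy(example):.4f}")
    print(f"kappa             = {cohen_kappa(example):.4f}")
```

When the true classes are evenly split, chance agreement is 0.5, so κ reduces to 2 × accuracy − 1. With an accuracy of 0.9469 that gives about 0.89, which is consistent with the reported κ, and equal per-class recall makes accuracy and balanced accuracy coincide.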
format Article
id doaj-art-983c9494ee8140e4a4efdc1e666a9bc2
institution OA Journals
issn 1438-8871
language English
publishDate 2025-05-01
publisher JMIR Publications
record_format Article
series Journal of Medical Internet Research
spelling doaj-art-983c9494ee8140e4a4efdc1e666a9bc2; 2025-08-20T01:49:22Z; eng; JMIR Publications; Journal of Medical Internet Research; 1438-8871; 2025-05-01; 27; e67506; 10.2196/67506
Linguistic Markers of Pain Communication on X (Formerly Twitter) in US States With High and Low Opioid Mortality: Machine Learning and Semantic Network Analysis
ShinYe Kim https://orcid.org/0000-0002-2323-5692
Winson Fu Zun Yang https://orcid.org/0000-0001-6208-0067
Zishan Jiwani https://orcid.org/0000-0001-9054-8982
Emily Hamm https://orcid.org/0000-0002-2954-6694
Shreya Singh https://orcid.org/0000-0002-2392-2724
https://www.jmir.org/2025/1/e67506
title Linguistic Markers of Pain Communication on X (Formerly Twitter) in US States With High and Low Opioid Mortality: Machine Learning and Semantic Network Analysis
url https://www.jmir.org/2025/1/e67506