Improving topic modeling performance on social media through semantic relationships within biomedical terminology.

Topic modeling utilizes unsupervised machine learning to detect underlying themes within texts and has been deployed routinely to analyze social media for insights into healthcare issues. However, the inherent messiness of social media hinders the full realization of this technique's potential....

Bibliographic Details
Main Authors: Yi Xin, Monika E Grabowska, Srushti Gangireddy, Matthew S Krantz, V Eric Kerchberger, Alyson L Dickson, Qiping Feng, Zhijun Yin, Wei-Qi Wei
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0318702
collection DOAJ
description Topic modeling utilizes unsupervised machine learning to detect underlying themes within texts and has been deployed routinely to analyze social media for insights into healthcare issues. However, the inherent messiness of social media hinders the full realization of this technique's potential. As such, we hypothesized that restricting medical concepts in social media texts to specific related semantic types and applying topic modeling to these concepts could be a feasible approach to overcome the challenge of traditional topic modeling for social media texts. Therefore, we developed a semantic-type-based topic modeling pipeline to discover self-reported health-related topics. This pipeline integrated semantic type information and Systematized Nomenclature of Medicine (SNOMED) precoordinated expressions into a traditional topic modeling approach to enhance effectiveness in clustering meaningful, distinct topics. Using social media texts regarding statins for illustration, we evaluated the efficacy of this new approach and validated a newly identified topic using real-world clinical data. Based on expert evaluations, this approach resulted in more novel, distinguishable, and meaningful health-related topics compared to traditional topic modeling. In addition, our electronic health record validation for a newly identified topic in two real-world clinical databases indicated that statin users had a higher prevalence of depression or anxiety compared to matched non-users. Our results indicate that this new topic modeling pipeline can improve the extraction of themes from noisy online discussions, thereby contributing to deeper insights for healthcare research.
id doaj-art-a95e03b8d4714c3a845a9f04adfcdfde
institution Kabale University
issn 1932-6203
spelling doaj-art-a95e03b8d4714c3a845a9f04adfcdfde; 2025-08-20T03:51:20Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; vol. 20, no. 2, e0318702; 10.1371/journal.pone.0318702; https://doi.org/10.1371/journal.pone.0318702