Improving topic modeling performance on social media through semantic relationships within biomedical terminology.
Topic modeling utilizes unsupervised machine learning to detect underlying themes within texts and has been deployed routinely to analyze social media for insights into healthcare issues. However, the inherent messiness of social media hinders the full realization of this technique's potential....
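| Main Authors: | Yi Xin, Monika E Grabowska, Srushti Gangireddy, Matthew S Krantz, V Eric Kerchberger, Alyson L Dickson, Qiping Feng, Zhijun Yin, Wei-Qi Wei |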
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0318702 |
| _version_ | 1849317076463779840 |
|---|---|
| author | Yi Xin, Monika E Grabowska, Srushti Gangireddy, Matthew S Krantz, V Eric Kerchberger, Alyson L Dickson, Qiping Feng, Zhijun Yin, Wei-Qi Wei |
| collection | DOAJ |
| description | Topic modeling utilizes unsupervised machine learning to detect underlying themes within texts and has been deployed routinely to analyze social media for insights into healthcare issues. However, the inherent messiness of social media hinders the full realization of this technique's potential. As such, we hypothesized that restricting medical concepts in social media texts to specific related semantic types and applying topic modeling to these concepts could be a feasible approach to overcome the challenge of traditional topic modeling for social media texts. Therefore, we developed a semantic-type-based topic modeling pipeline to discover self-reported health-related topics. This pipeline integrated semantic type information and Systematized Nomenclature of Medicine (SNOMED) precoordinated expressions into a traditional topic modeling approach to enhance effectiveness in clustering meaningful, distinct topics. Using social media texts regarding statins for illustration, we evaluated the efficacy of this new approach and validated a newly identified topic using real-world clinical data. Based on expert evaluations, this approach resulted in more novel, distinguishable, and meaningful health-related topics compared to traditional topic modeling. In addition, our electronic health record validation for a newly identified topic in two real-world clinical databases indicated that statin users had a higher prevalence of depression or anxiety compared to matched non-users. Our results indicate that this new topic modeling pipeline can improve the extraction of themes from noisy online discussions, thereby contributing to deeper insights for healthcare research. |
| format | Article |
| id | doaj-art-a95e03b8d4714c3a845a9f04adfcdfde |
| institution | Kabale University |
| issn | 1932-6203 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Public Library of Science (PLoS) |
| record_format | Article |
| series | PLoS ONE |
| title | Improving topic modeling performance on social media through semantic relationships within biomedical terminology. |
| url | https://doi.org/10.1371/journal.pone.0318702 |
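
The description in this record says the pipeline restricts medical concepts to specific semantic types before topic modeling. The following is a minimal sketch of that filtering step only, not the authors' implementation. The example posts, the semantic type labels (written in the style of UMLS semantic type names), the `KEEP_TYPES` set, and the function names are all assumptions made for illustration; the record does not say which types or tools the study used. The resulting concept counts are the kind of input a conventional topic model would then consume.

```python
from collections import Counter

# Hypothetical output of a concept-extraction step: each post is a list of
# (concept name, semantic type) pairs. These values are invented for
# illustration and are not data from the study.
posts = [
    [("atorvastatin", "Pharmacologic Substance"),
     ("muscle pain", "Sign or Symptom"),
     ("doctor", "Professional or Occupational Group")],
    [("simvastatin", "Pharmacologic Substance"),
     ("anxiety", "Mental or Behavioral Dysfunction"),
     ("pharmacy", "Organization")],
    [("muscle pain", "Sign or Symptom"),
     ("depression", "Mental or Behavioral Dysfunction")],
]

# Assumed set of semantic types to keep; the types chosen in the paper
# are not listed in this record.
KEEP_TYPES = {
    "Pharmacologic Substance",
    "Sign or Symptom",
    "Mental or Behavioral Dysfunction",
}


def filter_concepts(post, keep_types):
    """Return the concept names in one post whose semantic type is kept."""
    return [name for name, sem_type in post if sem_type in keep_types]


def bag_of_concepts(all_posts, keep_types):
    """Build a sorted vocabulary and a document-by-concept count matrix."""
    filtered = [filter_concepts(post, keep_types) for post in all_posts]
    vocab = sorted({concept for doc in filtered for concept in doc})
    matrix = []
    for doc in filtered:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


vocab, matrix = bag_of_concepts(posts, KEEP_TYPES)
print(vocab)
for row in matrix:
    print(row)
```

Filtering before counting drops concepts such as "doctor" and "pharmacy" that carry little health-topic signal, which is the intuition behind restricting the vocabulary to selected semantic types.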