Analysis of HIV high-risk population characteristics with Baidu Tieba data

Bibliographic Details
Main Authors: Shiyao XIAO, Wei LYU, Saran CHEN, Shuo QIN, Ge HUANG, Mengsi CAI, Yuejin TAN, Xu TAN, Xin LU
Format: Article
Language: Chinese (zho)
Published: China InfoCom Media Group 2019-01-01
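Series: 大数据 (Big Data Research)
Subjects: online high-risk populations; MSM; HIV; LDA topic model; Baidu Tieba; machine learning
Online Access: http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2019008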
collection DOAJ
description This study analyzes the textual content and temporal patterns of online activity among users of the “Fear of HIV Bar” on Baidu Tieba. An LDA topic model is used to identify the main differences between the topics discussed by HIV-infected and non-HIV-infected users. A keyword-based machine learning method is used to classify the sexual orientation of users who start discussions in the “Fear of HIV Bar” and to calculate the HIV epidemic rate among groups with different sexual orientations. The techniques used in this paper can serve as an important supplementary tool for research on high-risk populations. In addition, using machine learning to classify users' sexual orientation automatically makes it possible to assess the HIV epidemic in populations with different sexual orientations, which is of great significance for public health agencies.
format Article
id doaj-art-ffa8f426258941a7a9c765c9d08de1ee
institution OA Journals
issn 2096-0271
language zho
publishDate 2019-01-01
publisher China InfoCom Media Group
record_format Article
series 大数据
title Analysis of HIV high-risk population characteristics with Baidu Tieba data
topic online high-risk populations; MSM; HIV; LDA topic model; Baidu Tieba; machine learning
url http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2019008