Analysis of HIV high-risk population characteristics with Baidu Tieba data

Bibliographic Details
Main Authors: Shiyao XIAO, Wei LYU, Saran CHEN, Shuo QIN, Ge HUANG, Mengsi CAI, Yuejin TAN, Xu TAN, Xin LU
Format: Article
Language: Chinese (zho)
Published: China InfoCom Media Group 2019-01-01
Series: 大数据 (Big Data Research)
Online Access: http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2019008
Description
Summary: The textual content and temporal patterns of online activity were analyzed for users of the “Fear of HIV Bar” on Baidu Tieba. An LDA topic model was used to identify the main differences between the topics discussed by HIV-infected and non-HIV-infected users. A keyword-based machine learning method was used to classify the sexual orientation of users who start discussions in the “Fear of HIV Bar” and to calculate the HIV epidemic rate among groups with different sexual orientations. The techniques used in this paper can serve as an important supplementary tool for research on high-risk populations. In particular, automatically classifying users' sexual orientation with machine learning makes it possible to assess the HIV epidemic in populations with different sexual orientations, which is of great significance for public health agencies.
ISSN: 2096-0271