A topic clustering approach to finding similar questions from large question and answer archives.

Bibliographic Details
Main Authors: Wei-Nan Zhang, Ting Liu, Yang Yang, Liujuan Cao, Yu Zhang, Rongrong Ji
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2014-01-01
Series: PLoS ONE
Online Access:https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0071511&type=printable
Collection: DOAJ
ISSN: 1932-6203
Volume: 9
Issue: 3
Article: e71511
DOI: 10.1371/journal.pone.0071511

Description
With the rise of Web 2.0, Community Question Answering (CQA) services such as Yahoo! Answers (http://answers.yahoo.com), WikiAnswers (http://wiki.answers.com), and Baidu Zhidao (http://zhidao.baidu.com) have emerged as alternative sources of knowledge and information. Over time, these services have accumulated a large number of high-quality question and answer (Q&A) pairs contributed by human users, forming a comprehensive knowledge base. Unlike search engines, which return long lists of results, CQA services can deliver correct answers to a question query directly, by automatically finding similar questions that other users have already answered. This greatly improves the efficiency of online information retrieval. However, given a question query, finding similar and well-answered questions is a non-trivial task. The main challenge is the word mismatch between the question query (query) and the candidate question to be retrieved (question). To address this problem, we capture the semantic similarity between the words of the query and those of the question by introducing a topic modeling approach. We then propose an unsupervised machine-learning approach to finding similar questions in CQA Q&A archives. Experimental results show that the proposed approach significantly outperforms state-of-the-art methods.
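The word-mismatch problem described above, and why topic-level similarity helps with it, can be illustrated with a small sketch. This is not the authors' method. It assumes a topic model (for example LDA) has already been trained; the two-topic table `TOPIC_WORD` is invented for illustration, and a text's topic mixture is approximated by averaging per-word topic posteriors under a uniform prior rather than by proper LDA inference. The functions `topic_mixture`, `lexical_similarity`, `topic_similarity` and `rank`, and the mixing weight `alpha`, are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical p(word | topic) table standing in for a topic model (e.g. LDA)
# trained offline on a Q&A archive. Toy values, for illustration only.
TOPIC_WORD = {
    "computers": {"laptop": 0.2, "notebook": 0.2, "screen": 0.15, "display": 0.15,
                  "repair": 0.1, "fix": 0.1, "broken": 0.1},
    "cooking": {"recipe": 0.3, "bake": 0.2, "bread": 0.2, "oven": 0.2, "fix": 0.1},
}
TOPICS = sorted(TOPIC_WORD)


def topic_mixture(tokens):
    """Approximate a text's topic distribution by averaging p(topic | word)
    over its in-vocabulary tokens, assuming a uniform topic prior."""
    totals = [0.0] * len(TOPICS)
    seen = 0
    for word in tokens:
        likelihoods = [TOPIC_WORD[t].get(word, 0.0) for t in TOPICS]
        norm = sum(likelihoods)
        if norm == 0.0:
            continue  # word is unknown to the topic model; skip it
        seen += 1
        for k, p in enumerate(likelihoods):
            totals[k] += p / norm  # Bayes' rule with a uniform prior
    if seen == 0:
        return [1.0 / len(TOPICS)] * len(TOPICS)  # no evidence: uniform mixture
    return [v / seen for v in totals]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_similarity(query_tokens, question_tokens):
    """Bag-of-words cosine: zero whenever the two texts share no words."""
    cq, cd = Counter(query_tokens), Counter(question_tokens)
    vocab = sorted(set(cq) | set(cd))
    return cosine([cq[w] for w in vocab], [cd[w] for w in vocab])


def topic_similarity(query_tokens, question_tokens):
    """Cosine between topic mixtures: can be high despite zero word overlap."""
    return cosine(topic_mixture(query_tokens), topic_mixture(question_tokens))


def rank(query, archive, alpha=0.5):
    """Rank archived questions by a hypothetical linear mix of the two scores."""
    q = query.lower().split()
    scored = []
    for question in archive:
        d = question.lower().split()
        score = alpha * lexical_similarity(q, d) + (1 - alpha) * topic_similarity(q, d)
        scored.append((score, question))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    archive = ["bake bread in oven", "notebook display repair"]
    for score, question in rank("how to fix laptop screen", archive):
        print(f"{score:.3f}  {question}")
```

Here "how to fix laptop screen" and "notebook display repair" share no words, so their lexical cosine is 0, but both map mostly onto the toy "computers" topic, so the topic score lifts the laptop question above the baking one. A real system would infer topic mixtures with the trained model itself and might compare them with a divergence measure instead of cosine.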