Observation-Guided Answer Selection in Community-based Question Answering

Abstract

Finding similar questions in historical archives has been applied to question answering, with solid theoretical underpinnings and great practical success. Yet each question in the returned candidate pool is often associated with multiple answers, and hence users have to painstakingly browse through many answers to find the correct one. To alleviate this problem, we present a novel scheme to rank answer candidates via pairwise comparisons. In particular, it consists of one offline learning component and one online search component. In the offline learning component, we first automatically establish the positive, negative, and neutral training samples in terms of preference pairs guided by our data-driven observations. We then present a novel model to jointly incorporate these three types of training samples. The analytic solution of this model is derived. In the online search component, we first collect a pool of answer candidates for the given question by finding its similar questions. We then sort the answer candidates by leveraging the offline trained model to judge the preference orders. Extensive experiments on real-world vertical and general community-based question answering datasets have comparatively demonstrated its sensitivity, robustness, and promising performance. In addition, we have released the code and data to facilitate other researchers.
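To make the online search step more concrete, here is a minimal sketch of sorting answer candidates by pairwise preference judgments. It is not the released PLANE implementation: the linear scorer, the weight vector `w`, the feature vectors, and the win-counting rule are all assumptions chosen for illustration, and both function names are hypothetical.

```python
from typing import List, Sequence


def prefers(w: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical pairwise scorer: a positive value means answer a is
    preferred over answer b. Assumes a linear model over feature differences."""
    return sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))


def rank_by_pairwise_wins(w: Sequence[float],
                          candidates: List[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered by how many pairwise comparisons
    each candidate wins (assumed aggregation rule, most wins first)."""
    wins = [0] * len(candidates)
    for i, a in enumerate(candidates):
        for j, b in enumerate(candidates):
            if i != j and prefers(w, a, b) > 0:
                wins[i] += 1
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(range(len(candidates)), key=lambda i: wins[i], reverse=True)


# Toy example: three answer candidates with two made-up features each.
weights = [1.0, 0.5]
pool = [[0.2, 0.1], [0.9, 0.4], [0.5, 0.5]]
ranking = rank_by_pairwise_wins(weights, pool)
```

In this sketch the weights stand in for whatever the offline component learns from the positive, negative, and neutral preference pairs; the actual model and features are described in the paper and the released code.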

Data Description

Here we introduce the Question Answering Dataset, which contains HealthTap data and Zhihu.com data:

The data can be downloaded here

HealthTap Data

HealthTap is a vertical cQA site, and all questions on this site are about health. This dataset contains 39,998 questions, together with their corresponding 58,091 answers.

Zhihu.com

Zhihu.com is a general cQA site, where questions on all kinds of topics can be published. This dataset contains 114,200 questions, together with their corresponding 578,874 answers.


Code Download

The code of the PLANE model and the baselines compared in this work is available on GitHub. [link ]