
Quality Matters: Assessing cQA Pair Quality via Transductive Multi-view Learning

Abstract

Community-based question answering (cQA) sites have become important knowledge-sharing platforms, as massive numbers of cQA pairs are archived, but the uneven quality of these pairs leaves information seekers unsatisfied. Various efforts have been dedicated to predicting the quality of cQA content. Most of them concatenate different features into a single vector and then feed it into a regression model. In fact, the quality of a cQA pair is influenced by multiple views, and the agreement among these views is essential for quality assessment. Moreover, the lack of labeled data significantly hinders quality prediction performance. Toward this end, we present a transductive multi-view learning model. It is designed to find a latent common space by unifying and preserving information from various views, including the question, the answer, QA relevance, the asker, and the answerer. Additionally, the rich information in the unlabeled test cQA pairs is utilized via transductive learning to enhance the representation ability of the common space. Extensive experiments on real-world datasets well validate the proposed model.

TMvL Model Optimization

We adopt an alternating strategy to minimize our objective function. We first fix w and optimize O w.r.t. X^(0) via gradient descent. Then we fix X^(0) and compute the closed-form solution for w. Click here for the detailed optimization process.
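The snippet below is only a minimal sketch of such an alternating scheme, not the exact TMvL objective. It assumes a simplified weighted multi-view reconstruction loss O(X^(0), w) = Σ_v w_v ||X^(v) − X^(0)||_F² + γ||w||², with w constrained to the simplex; the function and variable names are hypothetical.

import numpy as np

def alternating_optimize(views, gamma=1.0, lr=0.01, n_iter=100):
    """Alternately update the common space X0 (gradient descent) and the
    view weights w (closed form), for a simplified illustrative objective.

    views : list of (n x d) arrays, one representation per view.
    """
    V = len(views)
    w = np.full(V, 1.0 / V)          # view weights, initialized uniformly
    X0 = np.mean(views, axis=0)      # common space, initialized as the view average

    for _ in range(n_iter):
        # Step 1: fix w, take a gradient step on X0
        grad = 2.0 * sum(w[v] * (X0 - views[v]) for v in range(V))
        X0 = X0 - lr * grad

        # Step 2: fix X0, update w in closed form from the Lagrangian of
        # min_w sum_v w_v * loss_v + gamma * ||w||^2  s.t.  sum_v w_v = 1
        losses = np.array([np.linalg.norm(views[v] - X0, 'fro') ** 2 for v in range(V)])
        w = -losses / (2.0 * gamma) + (1.0 + losses.sum() / (2.0 * gamma)) / V
        w = np.maximum(w, 0.0)       # clip negatives and renormalize: a simple simplex projection
        w /= w.sum()

    return X0, w

The sketch only mirrors the overall alternating structure; the actual gradient of O w.r.t. X^(0) and the closed-form solution for w follow from the full objective in the paper.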

Data Description

Two datasets crawled from StackExchange were used in our experiments. The first is from the English subsite, and the other is from the Game subsite. Both datasets contain information extracted from questions, answers, askers, and answerers.

  • The English dataset contains 26,746 questions and 28,271 users (askers and answerers).

  • The Game dataset contains 27,877 questions and 24,079 users (askers and answerers).

The datasets are formatted as JSON. Click here to download.
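As a quick sanity check, the JSON files can be inspected as follows. The file name and the assumed top-level structure (a list of cQA records) are placeholders; consult the downloaded files for the actual schema.

import json

# Hypothetical file name; replace with the actual file from the download link.
with open('english_cqa_pairs.json', 'r', encoding='utf-8') as f:
    pairs = json.load(f)

print(len(pairs))             # number of archived cQA records
print(list(pairs[0].keys()))  # available fields of the first record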

Code Publication

The code of our TMvL model is available. It is implemented in Python using NumPy.

Click here to download.

Volunteer Annotation Guideline

5: The question is clear, and the answer resolves the question well. The question and the answer are well organized, without typos or ambiguities.

4: The answer resolves the question. The question and the answer are understandable, with some minor typos.

3: The answer resolves the question, but users need to read the question or answer several times to understand it.

2: The question is only partly resolved, and the QA pair contains little useful information.

1: The question and answer are difficult to understand. The answer is not related to the question.
