
I Know What You Want to Express:
Sentence Element Inference by Incorporating External Knowledge Base
Abstract
Sentence auto-completion is an important feature that saves users many keystrokes by suggesting completions as they type. Despite its value, existing sentence auto-completion methods, such as query completion models, can hardly be applied to the object completion problem in sentences of the form (subject, verb, object), due to the complexity of natural language descriptions and the data deficiency problem. To address this, we treat an SVO sentence as a three-element triple (subject, sentence pattern, object) and cast sentence object completion as an element inference problem. The elements of all triples are encoded into a unified low-dimensional embedding space by our proposed TRANSFER model, which leverages an external knowledge base to strengthen representation learning. With these representations, a linear model can provide reliable candidates for the missing element. Extensive experiments on a real-world dataset validate our model. We have also successfully applied the model to factoid question answering systems for answer candidate selection, which further demonstrates the applicability of TRANSFER.
Data Description
The sentence auto-completion data comprises three datasets:
- Freebase Subgraph Dataset.
  This dataset contains 5,170,340 entities and 7,152 relations. The data takes the form of (head, relation, tail) triples, 140,785,671 in total.
- Wikipedia Sentence Dataset.
  This dataset contains 5,793 English Wikipedia sentences. For each sentence, the subject and the object are matched to knowledge base (KB) entities, and relation paths have been computed. The data is in JSON format.
- Question Dataset.
  This dataset contains 254 questions generated from the sentence corpus. For each question, the topic entity and the answer entity are identified and exactly matched to Freebase entities.
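As a rough illustration of working with (head, relation, tail) triples such as those in the Freebase subgraph, the sketch below streams triples from text lines, indexes the known tails for each (head, relation) pair, and counts distinct entities and relations. The on-disk layout is not specified here, so the one-triple-per-line, tab-separated format is an assumption, and the names `Triple`, `iter_triples`, `build_tail_index`, and `summarize`, as well as the sample identifiers, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def iter_triples(lines: Iterable[str], sep: str = "\t") -> Iterator[Triple]:
    """Yield one Triple per well-formed line (ASSUMED format: head<sep>relation<sep>tail)."""
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        # Skip blank lines and lines that do not have exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            yield Triple(*parts)


def build_tail_index(triples: Iterable[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (head, relation) pair to the set of tails observed with it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for t in triples:
        index[(t.head, t.relation)].add(t.tail)
    return index


def summarize(triples: Iterable[Triple]) -> Dict[str, int]:
    """Count distinct entities, distinct relations, and total triples."""
    entities: Set[str] = set()
    relations: Set[str] = set()
    total = 0
    for t in triples:
        entities.update((t.head, t.tail))
        relations.add(t.relation)
        total += 1
    return {"entities": len(entities), "relations": len(relations), "triples": total}


if __name__ == "__main__":
    # Hypothetical sample lines; real entity and relation identifiers will differ.
    sample = [
        "e1\tr_born_in\te2\n",
        "e1\tr_works_at\te3\n",
        "e4\tr_born_in\te2\n",
    ]
    print(summarize(iter_triples(sample)))
    print(build_tail_index(iter_triples(sample))[("e1", "r_born_in")])
```

Because the full subgraph holds over 140 million triples, the functions consume an iterator rather than requiring a fully materialized list. To read from a file, an open file handle can be passed directly to `iter_triples`.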
The data can be downloaded at: