Thesis (PDF available)

Leveraging Distributional and Relational Semantics for Knowledge Extraction from Textual Corpora


Abstract

Understanding natural language remains one of the big open challenges in artificial intelligence. Recently, many state-of-the-art results in natural language processing tasks have been achieved by adopting deep learning models. However, most of these approaches rely on supervised, end-to-end, complex neural networks with many parameters, and therefore require large amounts of training data to generalize effectively. This constraint makes such models unsuitable for scenarios where training examples are scarce. The main idea of this dissertation is to learn representations from pre-existing knowledge and transfer them to other domains where only minimal supervision is available. In detail, we exploit distributional semantics models, structured relational data sources, and their combination in order to learn representations that help existing models when little or no training data is available. First, we propose an unsupervised text summarization method that exploits the compositional capabilities of word embeddings. Evaluations on several text summarization datasets show the effectiveness of our approach. Then, we explore the integration of embedding-based features into a learning-to-rank approach to entity relatedness. Moreover, we describe a novel approach to learning representations of relations expressed by their textual mentions. We propose a method to build a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. This dataset is used to train a hierarchical siamese network that learns entity-entity embeddings which encode relational information through the different linguistic paraphrases expressing the same relation. The experiments show that the proposed model is able to transfer relational representations across different domains. Finally, we describe two real-world systems that implement the original contributions described in this dissertation.
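As an illustration of the first contribution, the following is a minimal sketch of how the compositionality of word embeddings can drive unsupervised extractive summarization: each sentence is embedded as the sum of its word vectors and ranked by cosine similarity to the document centroid. The word_vectors argument (any token-to-vector mapping, e.g. pretrained word2vec vectors loaded with gensim) and the centroid-plus-cosine scoring are assumptions made for illustration, not necessarily the exact method of the dissertation.

```python
import numpy as np

def summarize(sentences, word_vectors, topk=3):
    """Rank sentences by cosine similarity between their embedding
    (the sum of their word vectors) and the document centroid."""
    def embed(tokens):
        vecs = [word_vectors[w] for w in tokens if w in word_vectors]
        return np.sum(vecs, axis=0) if vecs else None

    embeddings = [embed(s.lower().split()) for s in sentences]
    valid = [(i, e) for i, e in enumerate(embeddings) if e is not None]
    centroid = np.mean([e for _, e in valid], axis=0)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    ranked = sorted(valid, key=lambda ie: cosine(ie[1], centroid), reverse=True)
    chosen = sorted(i for i, _ in ranked[:topk])  # restore document order
    return [sentences[i] for i in chosen]
```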
Conference Paper (full-text available)
In recent years we have witnessed a growing spread of social media footprints, a consequence of the widespread use of applications such as Facebook, Twitter or LinkedIn, which allow people to share content that may reveal personal preferences and aptitudes. Among the traits that can be inferred is empathy, the ability to feel and share another person's emotions, which we consider a relevant aspect for profiling and recommendation tasks. We propose a method that predicts a user's empathy level from their social media data using linear regression algorithms. The results show the most relevant correlations between the different groups of user features and the predicted empathy level.
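A minimal sketch of the kind of regression setup described above, using scikit-learn. The feature matrix and empathy scores below are synthetic placeholders, since the actual social media features and ground-truth empathy scores are not specified here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder per-user feature matrix (e.g. posting frequency, liked pages,
# text statistics) and placeholder empathy scores.
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

# Regularized linear regression evaluated with cross-validated MAE.
model = Ridge(alpha=1.0)
mae = -cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=5).mean()
print(f"cross-validated MAE: {mae:.3f}")
```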
Conference Paper
Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have however not been as successful. Several attempts at learning unsupervised representations of sentences have not reached performance satisfactory enough to be widely adopted. In this paper, we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks. Much like how computer vision uses ImageNet to obtain features which can then be transferred to other tasks, our work indicates the suitability of natural language inference for transfer learning to other NLP tasks. Our encoder is publicly available.
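A compact PyTorch sketch of the architecture family described above: a bidirectional LSTM sentence encoder with max pooling, trained on NLI by combining premise and hypothesis vectors. The dimensions, layer sizes and feature combination are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MaxPoolEncoder(nn.Module):
    """BiLSTM sentence encoder with max pooling over time steps."""
    def __init__(self, vocab_size, emb_dim=300, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                 # (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))
        return states.max(dim=1).values           # (batch, 2 * hidden)

class NLIClassifier(nn.Module):
    """Siamese use of the encoder: premise/hypothesis vectors are combined
    as [u; v; |u - v|; u * v] and classified into the 3 NLI labels."""
    def __init__(self, encoder, hidden=512):
        super().__init__()
        self.encoder = encoder
        self.mlp = nn.Sequential(nn.Linear(8 * hidden, 512), nn.ReLU(), nn.Linear(512, 3))

    def forward(self, premise, hypothesis):
        u, v = self.encoder(premise), self.encoder(hypothesis)
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=1)
        return self.mlp(feats)
```

After training on NLI, the encoder alone would be kept and reused as a fixed feature extractor for downstream transfer tasks.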
Conference Paper
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost: it takes less than a day to learn high-quality word vectors from a 1.6-billion-word data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
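For reference, the two architectures described above (CBOW and skip-gram) are implemented in the gensim library. The toy corpus below stands in for the billion-word corpora used in the paper and is only meant to show the training API.

```python
from gensim.models import Word2Vec

# Tiny toy corpus; real training would use a large tokenized corpus.
sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

# sg=1 selects the skip-gram architecture; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("king", topn=3))
```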
Conference Paper
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling the frequent words we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
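The phrase detection and training refinements described above (negative sampling and subsampling of frequent words) can be sketched with gensim as follows. The tiny corpus and the low min_count/threshold values are assumptions chosen only so that the "air canada" collocation is detected.

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases

corpus = [
    ["air", "canada", "flies", "to", "toronto"],
    ["air", "canada", "announced", "new", "routes"],
    ["fresh", "air", "in", "canada", "is", "cold"],
]

# Phrase detection: frequent collocations such as "air canada" are merged
# into single tokens ("air_canada"), mirroring the phrase-finding step above.
bigrams = Phrases(corpus, min_count=1, threshold=1)
phrased = [bigrams[s] for s in corpus]

# Skip-gram trained with negative sampling (negative=5) and subsampling of
# frequent words (sample=1e-3), the two training refinements described above.
model = Word2Vec(phrased, vector_size=50, sg=1, negative=5, sample=1e-3,
                 min_count=1, epochs=50)
print([t for t in model.wv.index_to_key if "_" in t])  # detected phrases
```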
Conference Paper
Distant supervised relation extraction (RE) has been an effective way of finding novel relational facts in text without labeled training data. Typically it can be formalized as a multi-instance multi-label problem. In this paper, we introduce a novel neural approach for distant supervised RE with a specific focus on attention mechanisms. Unlike the feature-based logistic regression model and compositional neural models such as CNNs, our approach includes two major attention-based memory components, which are capable of explicitly capturing the importance of each context word for modeling the representation of the entity pair, as well as the intrinsic dependencies between relations. Such importance degrees and dependency relationships are calculated with multiple computational layers, each of which is a neural attention model over an external memory. Experiments on real-world datasets show that our approach performs significantly and consistently better than various baselines.
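A much-simplified PyTorch sketch of word-level attention for distant-supervised relation extraction: context words are weighted by their relevance to the entity pair before classification. The single attention hop and the concatenation-based scoring are simplifications of the multi-layer memory networks summarized above, not the paper's exact model.

```python
import torch
import torch.nn as nn

class WordAttentionRE(nn.Module):
    """One attention hop over context words, queried by the entity pair,
    followed by a linear relation classifier."""
    def __init__(self, vocab_size, n_relations, emb_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.attn = nn.Linear(3 * emb_dim, 1)        # scores word vs. entity pair
        self.classifier = nn.Linear(emb_dim, n_relations)

    def forward(self, context_ids, head_ids, tail_ids):
        ctx = self.embed(context_ids)                                  # (B, L, D)
        pair = torch.cat([self.embed(head_ids), self.embed(tail_ids)], dim=-1)  # (B, 2D)
        pair_exp = pair.unsqueeze(1).expand(-1, ctx.size(1), -1)       # (B, L, 2D)
        scores = self.attn(torch.cat([ctx, pair_exp], dim=-1)).squeeze(-1)  # (B, L)
        weights = torch.softmax(scores, dim=-1)
        sentence = (weights.unsqueeze(-1) * ctx).sum(dim=1)            # (B, D)
        return self.classifier(sentence)                               # relation logits
```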