Natural language processing (NLP) may face the inexplicable "black-box" problem of parameters and unreasonable modeling due to the lack of embedding of some characteristics of natural language, while quantum-inspired models based on quantum theory may provide a potential solution. However, essential prior knowledge and pretrained text features were often ignored at the early stage of the development of quantum-inspired models. To attack the above challenges, a pretrained quantum-inspired deep neural network is proposed in this work, which is constructed based on quantum theory to achieve strong performance and great interpretability in related NLP fields. Concretely, a quantum-inspired pretrained feature embedding (QPFE) method is first developed to model superposition states for words to embed more textual features. Then, a QPFE-ERNIE model is designed by merging the semantic features learned from the prevalent pretrained model ERNIE, which is verified with two NLP downstream tasks: 1) sentiment classification and 2) word sense disambiguation (WSD). In addition, schematic quantum circuit diagrams are provided, which have potential impetus for the future realization of quantum NLP with quantum devices. Finally, the experimental results demonstrate that QPFE-ERNIE is significantly better for sentiment classification than gated recurrent unit (GRU), BiLSTM, and TextCNN on five datasets in all metrics, achieves better results than ERNIE in accuracy, F1-score, and precision on two datasets (CR and SST), and also has an advantage for WSD over the classical models, including BERT (improving F1-score by 5.2% on average) and ERNIE (improving F1-score by 4.2% on average), and it improves the F1-score by 8.7% on average compared with the previous quantum-inspired model QWSD. QPFE-ERNIE provides a novel pretrained quantum-inspired model for solving NLP problems, and it lays a foundation for exploring more quantum-inspired models in the future.
Most of the existing quantum-inspired models are based on amplitude-phase embedding to model natural language, which maps words into Hilbert space. In quantum-computing theory, the vectors corresponding to quantum states are all complex-valued, so there is a gap between these two areas. Complex-valued neural networks have been studied, but their practical applications are few, especially in downstream natural language processing tasks such as sentiment analysis and language modeling. In fact, a complex-valued neural network can use the imaginary part to embed hidden information and can express more complex information, which makes it suitable for modeling complex natural language. Meanwhile, quantum-inspired models are defined in Hilbert space, which is also a complex space, so it is natural to construct quantum-inspired models based on complex-valued neural networks. Therefore, we propose a new quantum-inspired model for NLP, ComplexQNN, which contains a complex-valued embedding layer, a quantum encoding layer, and a measurement layer. The modules of ComplexQNN are fully based on complex-valued neural networks. This design is more in line with quantum-computing theory and easier to transfer to quantum computers in the future to achieve exponential acceleration. We conducted experiments on six sentiment-classification datasets, comparing our model with five classical models (TextCNN, GRU, ELMo, BERT, and RoBERTa). The results show that our model improves accuracy by 10% compared with TextCNN and GRU, and achieves results competitive with ELMo, BERT, and RoBERTa.
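A minimal NumPy sketch of the three-layer pipeline named above; the specifics (amplitude-phase factorization, uniform mixing into a density matrix, projector measurement) are assumptions for illustration rather than ComplexQNN's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 100, 8

# Complex-valued embedding table: amplitude and phase parts
amplitude = rng.normal(size=(vocab_size, dim))
phase = rng.uniform(0, 2 * np.pi, size=(vocab_size, dim))

def embed(token_ids):
    """Complex embedding: z = r * exp(i * theta), normalized per word."""
    z = amplitude[token_ids] * np.exp(1j * phase[token_ids])
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def encode(states):
    """Quantum encoding: mix word states into one density matrix."""
    return np.mean([np.outer(s, s.conj()) for s in states], axis=0)

def measure(rho, observable):
    """Measurement: expectation value tr(rho O), real for Hermitian O."""
    return np.trace(rho @ observable).real

sentence = embed(np.array([3, 17, 42]))
rho = encode(sentence)
proj = np.outer(sentence[0], sentence[0].conj())  # projector observable
print(measure(rho, proj))
```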
Aiming at classifying the polarities over aspects, aspect-based sentiment analysis (ABSA) is a fine-grained task of sentiment analysis. The vector representations of current models are generally constrained to real values. Based on mathematical formulations of quantum theory, quantum language models have drawn increasing attention. Words in such models can be projected as physical particles in quantum systems and naturally represented by representation-rich complex-valued vectors in a Hilbert space, rather than real-valued ones. In this paper, the Hilbert space representation for ABSA models is investigated and complexifications of three strong real-valued baselines are constructed. Experimental results demonstrate the effectiveness of complexification and the superior performance of our complex-valued models, illustrating that the complex-valued embedding can carry additional information beyond the real embedding. In particular, a complex-valued RoBERTa model outperforms or approaches the previous state of the art on three standard benchmark datasets.
Motivated by quantum-inspired complex word embedding, interpretable complex-valued word embedding (ICWE) is proposed to design two end-to-end quantum-inspired deep neural networks (ICWE-QNN and CICWE-QNN, a convolutional complex-valued neural network based on ICWE) for binary text classification. They have proven feasibility and effectiveness in NLP applications and can solve the problem of text-information loss in the CE-Mix [1] model caused by neglecting important linguistic features of text, since linguistic feature extraction is performed in our model with deep learning algorithms: a gated recurrent unit (GRU) extracts the sequence information of sentences, an attention mechanism makes the model focus on important words in sentences, and a convolutional layer captures the local features of the projected matrix. The ICWE-QNN model can avoid random combination of word tokens, and CICWE-QNN fully considers textual features of the projected matrix. Experiments conducted on five benchmark classification datasets demonstrate that our proposed models have higher accuracy than the compared traditional models, including CaptionRep BOW, DictRep BOW, and Paragram-Phrase, and they also perform well on F1-score. Especially, the CICWE-QNN model achieves higher accuracy than the quantum-inspired model CE-Mix on four datasets: SST, SUBJ, CR, and MPQA.
In recent years, question answering systems have become popular and widely used. Despite their increasing popularity, their performance on textual data is still insufficient and requires further research. These systems consist of several parts, one of which is the answer selection component, which detects the most relevant answer from a list of candidate answers. The methods presented in previous research have attempted to provide an independent model to undertake the answer-selection task. An independent model cannot comprehend the syntactic and semantic features of questions and answers with a small training dataset. To fill this gap, language models can be employed in implementing the answer selection part. This enables the model to have a better understanding of the language and thus to understand questions and answers better than previous works. In this research, we present BAS (BERT Answer Selection), which uses the BERT language model to comprehend language. The empirical results of applying the model on the TrecQA Raw, TrecQA Clean, and WikiQA datasets demonstrate that using a robust language model such as BERT can enhance performance. Using a more robust classifier also enhances the effect of the language model on the answer selection component. The results demonstrate that language comprehension is an essential requirement in natural language processing tasks such as answer selection.
Quantum-inspired language models have been introduced to information retrieval due to their transparency and interpretability. While exciting progress has been made, current studies mainly investigate the relationship between density matrices of different sentence subspaces of a semantic Hilbert space. The Hilbert space as a whole, which has a unique density matrix, remains underexplored. In this paper, we propose a novel Quantum Expectation Value based Language Model (QEV-LM). A unique shared density matrix is constructed for the semantic Hilbert space. Words and sentences are viewed as different observables in this quantum model. Under this formulation, a matching score describing the similarity between a question-answer pair is naturally explained as the quantum expectation value of a joint question-answer observable. In addition to its theoretical soundness, experimental results on the TREC-QA and WIKIQA datasets demonstrate the computational efficiency of our proposed model, with excellent performance and low time consumption.
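The expectation-value scoring can be illustrated compactly. The symmetrized product below is one plausible Hermitian construction of a joint question-answer observable, not necessarily QEV-LM's exact form, and the shared density matrix here is a placeholder:

```python
import numpy as np

def joint_observable(q_states, a_states):
    """A Hermitian joint question-answer observable: the symmetrized
    product of the question and answer projector sums (an illustrative
    construction; QEV-LM's exact form may differ)."""
    O_q = sum(np.outer(w, w.conj()) for w in q_states)
    O_a = sum(np.outer(w, w.conj()) for w in a_states)
    return 0.5 * (O_q @ O_a + O_a @ O_q)

def matching_score(rho, O_qa):
    """Matching score as the quantum expectation value tr(rho O)."""
    return np.trace(rho @ O_qa).real

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
q = [unit(rng.normal(size=6) + 1j * rng.normal(size=6)) for _ in range(3)]
a = [unit(rng.normal(size=6) + 1j * rng.normal(size=6)) for _ in range(4)]
rho = np.eye(6) / 6  # placeholder shared density matrix
print(matching_score(rho, joint_observable(q, a)))
```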
Word embeddings that consider context have attracted great attention for various natural language processing tasks in recent years. In this paper, we utilize contextualized word embeddings with the transformer encoder for sentence similarity modeling in the answer selection task. We present two different approaches (feature-based and fine-tuning-based) for answer selection. In the feature-based approach, we utilize two types of contextualized embeddings, namely Embeddings from Language Models (ELMo) and Bidirectional Encoder Representations from Transformers (BERT), and integrate each of them with the transformer encoder. We find that integrating these contextual embeddings with the transformer encoder is effective in improving the performance of sentence similarity modeling. In the second approach, we fine-tune two pre-trained transformer encoder models for the answer selection task. Based on our experiments on six datasets, we find that the fine-tuning approach outperforms the feature-based approach on all of them. Among our fine-tuning-based models, the Robustly Optimized BERT Pretraining Approach (RoBERTa) model achieves new state-of-the-art performance across five datasets.
In this paper, we propose a novel method for a sentence-level answer-selection task that is a fundamental problem in natural language processing. First, we explore the effect of additional information by adopting a pretrained language model to compute the vector representation of the input text and by applying transfer learning from a large-scale corpus. Second, we enhance the compare-aggregate model by proposing a novel latent clustering method to compute additional information within the target corpus and by changing the objective function from listwise to pointwise. To evaluate the performance of the proposed approaches, experiments are performed with the WikiQA and TREC-QA datasets. The empirical results demonstrate the superiority of our proposed approach, which achieves state-of-the-art performance on both datasets.
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT (Devlin et al., 2019). Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.
Sentiment analysis aims to capture the diverse sentiment information expressed by authors in given natural language texts, and it has been a core research topic in many artificial intelligence areas. The existing machine-learning-based sentiment analysis approaches generally focus on employing popular textual feature representation methods, e.g., term frequency-inverse document frequency (tf-idf), n-gram features, and word embeddings, to construct vector representations of documents. These approaches can model rich syntactic and semantic information, but they largely fail to capture the sentiment information that is central to sentiment analysis. To address this issue, we propose a quantum-inspired sentiment representation (QSR) model. This model can not only represent the semantic content of documents but also capture the sentiment information. Specifically, since adjectives and adverbs are good indicators of subjective expression, this model first extracts sentiment phrases that match the designed sentiment patterns based on adjectives and adverbs. Then, both single words and sentiment phrases in the documents are modeled as a collection of projectors, which are finally encapsulated in density matrices through maximum likelihood estimation. These density matrices successfully integrate the sentiment information into the representations of documents. Extensive experiments are conducted on two widely used Twitter datasets, which are the Obama-McCain Debate (OMD) dataset and the Sentiment140 Twitter dataset. The experimental results show that our model significantly outperforms a number of state-of-the-art baselines and demonstrate the effectiveness of the QSR model for sentiment analysis.
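Maximum-likelihood estimation of a density matrix from word and phrase projectors can be sketched with the standard iterative fixed-point update used in quantum state tomography; QSR's exact estimator may differ in its details:

```python
import numpy as np

def estimate_density_matrix(projectors, n_iter=50):
    """Iterative maximum-likelihood estimate of a density matrix from
    a collection of rank-1 projectors (one per word/phrase occurrence).

    Uses the R-rho-R fixed-point update from quantum state tomography;
    each step preserves positivity and unit trace.
    """
    dim = projectors[0].shape[0]
    rho = np.eye(dim) / dim  # start from the maximally mixed state
    for _ in range(n_iter):
        R = sum(P / max(np.trace(rho @ P).real, 1e-12) for P in projectors)
        R /= len(projectors)
        rho = R @ rho @ R
        rho /= np.trace(rho).real
    return rho

# Toy example: projectors from three unit word vectors in R^4
rng = np.random.default_rng(1)
vecs = rng.normal(size=(3, 4))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
rho = estimate_density_matrix([np.outer(v, v) for v in vecs])
print(np.trace(rho).real)  # -> 1.0
```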
As an alternative to question answering methods based on feature engineering, deep learning approaches such as convolutional neural networks (CNNs) and long short-term memory models (LSTMs) have recently been proposed for semantic matching of questions and answers. To achieve good results, however, these models have been combined with additional features such as word overlap or BM25 scores. Without this combination, these models perform significantly worse than methods based on linguistic feature engineering. In this paper, we propose an attention-based neural matching model for ranking short answer text. We adopt a value-shared weighting scheme instead of a position-shared weighting scheme for combining different matching signals and incorporate question term importance learning using a question attention network. Using the popular benchmark TREC QA data, we show that the relatively simple aNMM model can significantly outperform other neural network models that have been used for the question answering task, and is competitive with models that are combined with additional features. When aNMM is combined with additional features, it outperforms all baselines.
Although information retrieval models based on Markov Random Fields (MRF), such as the Sequential Dependence Model and the Weighted Sequential Dependence Model (WSDM), have been shown to outperform bag-of-words probabilistic and language modeling retrieval models by taking into account term dependencies, it is not known how to effectively account for term dependencies in query expansion methods based on pseudo-relevance feedback (PRF) for retrieval models of this type. In this paper, we propose the Semantic Weighted Dependence Model (SWDM), a PRF-based query expansion method for WSDM, which utilizes distributed low-dimensional word representations (i.e., word embeddings). Our method finds the closest unigrams to each query term in the embedding space and top retrieved documents and directly incorporates them into the retrieval function of WSDM. Experiments on TREC datasets indicate statistically significant improvement of SWDM over state-of-the-art MRF retrieval models, PRF methods for MRF retrieval models, and embedding-based query expansion methods for bag-of-words retrieval models.
In recent years, deep neural networks have led to exciting breakthroughs in speech recognition, computer vision, and natural language processing (NLP) tasks. However, there have been few positive results of deep models on ad-hoc retrieval tasks. This is partially due to the fact that many important characteristics of the ad-hoc retrieval task have not been well addressed in deep models yet. Typically, the ad-hoc retrieval task is formalized as a matching problem between two pieces of text in existing work using deep models, and treated as equivalent to many NLP tasks such as paraphrase identification, question answering, and automatic conversation. However, we argue that the ad-hoc retrieval task is mainly about relevance matching while most NLP matching tasks concern semantic matching, and there are some fundamental differences between these two matching tasks. Successful relevance matching requires proper handling of the exact matching signals, query term importance, and diverse matching requirements. In this paper, we propose a novel deep relevance matching model (DRMM) for ad-hoc retrieval. Specifically, our model employs a joint deep architecture at the query term level for relevance matching. By using matching histogram mapping, a feed-forward matching network, and a term gating network, we can effectively deal with the three relevance matching factors mentioned above. Experimental results on two representative benchmark collections show that our model can significantly outperform some well-known retrieval models as well as state-of-the-art deep matching models.
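A matching histogram in the spirit of DRMM can be computed in a few lines; the bin boundaries and the log-count variant below are illustrative choices:

```python
import numpy as np

def matching_histogram(query_vec, doc_vecs, n_bins=5):
    """DRMM-style matching histogram for one query term.

    Cosine similarities between the query term and all document terms
    are counted into fixed bins over [-1, 1]; an exact match (cos = 1)
    falls into the last bin. Log-counts are a common variant.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    cos = np.clip(d @ q, -1.0, 1.0)
    hist, _ = np.histogram(cos, bins=n_bins, range=(-1.0, 1.0))
    return np.log1p(hist)  # log-count histogram

rng = np.random.default_rng(0)
print(matching_histogram(rng.normal(size=50), rng.normal(size=(30, 50))))
```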
Previous studies have shown that semantically meaningful representations of words and text can be acquired through neural embedding models. In particular, paragraph vector (PV) models have shown impressive performance in some natural language processing tasks by estimating a document (topic) level language model. Integrating the PV models with traditional language model approaches to retrieval, however, produces unstable performance and limited improvements. In this paper, we formally discuss three intrinsic problems of the original PV model that restrict its performance in retrieval tasks. We also describe modifications to the model that make it more suitable for the IR task, and show their impact through experiments and case studies. The three issues we address are (1) the unregulated training process of PV is vulnerable to short document over-fitting that produces length bias in the final retrieval model; (2) the corpus-based negative sampling of PV leads to a weighting scheme for words that overly suppresses the importance of frequent words; and (3) the lack of word-context information makes PV unable to capture word substitution relationships.
Semantic matching, which aims to determine the matching degree between two texts, is a fundamental problem for many NLP applications. Recently, deep learning approaches have been applied to this problem and significant improvements have been achieved. In this paper, we propose to view the generation of the global interaction between two texts as a recursive process: i.e., the interaction of two texts at each position is a composition of the interactions between their prefixes as well as the word-level interaction at the current position. Based on this idea, we propose a novel deep architecture, namely Match-SRNN, to model the recursive matching structure. Firstly, a tensor is constructed to capture the word-level interactions. Then a spatial RNN is applied to integrate the local interactions recursively, with importance determined by four types of gates. Finally, the matching score is calculated based on the global interaction. We show that, when degenerated to the exact matching scenario, Match-SRNN can approximate the dynamic programming process of longest common subsequence, so there exists a clear interpretation for Match-SRNN. Our experiments on two semantic matching tasks showed the effectiveness of Match-SRNN and its ability to visualize the learned matching structure.
Learning a similarity function between pairs of objects is at the core of learning-to-rank approaches. In information retrieval tasks we typically deal with query-document pairs; in question answering, question-answer pairs. However, before learning can take place, such pairs need to be mapped from the original space of symbolic words into some feature space encoding various aspects of their relatedness, e.g., lexical, syntactic, and semantic. Feature engineering is often a laborious task and may require external knowledge sources that are not always available or are difficult to obtain. Recently, deep learning approaches have gained a lot of attention from the research community and industry for their ability to automatically learn optimal feature representations for a given task, while claiming state-of-the-art performance in many tasks in computer vision, speech recognition, and natural language processing. In this paper, we present a convolutional neural network architecture for reranking pairs of short texts, where we learn the optimal representation of text pairs and a similarity function to relate them in a supervised way from the available training data. Our network takes only words as input, thus requiring minimal preprocessing. In particular, we consider the task of reranking short text pairs where elements of the pair are sentences. We test our deep learning system on two popular retrieval tasks from TREC: question answering and microblog retrieval. Our model demonstrates strong performance on the first task, beating previous state-of-the-art systems by about 3% absolute points in both MAP and MRR, and shows comparable results on tweet reranking, while enjoying the benefits of no manual feature engineering and no additional syntactic parsers.
Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural network in image recognition, where neurons can capture many complicated patterns based on the extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as the problem of image recognition. Firstly, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority against the baselines.
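A minimal PyTorch sketch of this matching-matrix-as-image idea, with an assumed tiny architecture (one convolution plus global max pooling) standing in for the paper's deeper network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchingCNN(nn.Module):
    """Text matching as image recognition: a word-similarity matrix is
    treated as a 1-channel image and scanned by a small CNN."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8, 1)

    def forward(self, emb_a, emb_b):
        # Matching matrix: cosine similarity between all word pairs
        a = F.normalize(emb_a, dim=-1)           # (len_a, dim)
        b = F.normalize(emb_b, dim=-1)           # (len_b, dim)
        m = (a @ b.T).unsqueeze(0).unsqueeze(0)  # (1, 1, len_a, len_b)
        h = F.relu(self.conv(m))
        h = F.adaptive_max_pool2d(h, 1).flatten(1)  # global max pooling
        return self.fc(h)                        # matching score

score = MatchingCNN()(torch.randn(7, 50), torch.randn(9, 50))
```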
Latent semantic models, such as LSA, are intended to map a query to its relevant documents at the semantic level, where keyword-based matching often fails. In this study we strive to develop a series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space, where the relevance of a document given a query is readily computed as the distance between them. The proposed deep structured semantic models are discriminatively trained by maximizing the conditional likelihood of the clicked documents given a query using clickthrough data. To make our models applicable to large-scale Web search applications, we also use a technique called word hashing, which is shown to effectively scale up our semantic models to handle the large vocabularies common in such tasks. The new models are evaluated on a Web document ranking task using a real-world data set. Results show that our best model significantly outperforms other latent semantic models that were considered state-of-the-art prior to the work presented in this paper.
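Word hashing is simple to illustrate; a sketch of the letter-trigram decomposition described in the paper:

```python
from collections import Counter

def letter_trigrams(word: str) -> Counter:
    """DSSM-style word hashing: break a word into letter trigrams
    after adding boundary markers, e.g. 'web' -> #we, web, eb#."""
    marked = f"#{word}#"
    return Counter(marked[i:i + 3] for i in range(len(marked) - 2))

print(letter_trigrams("web"))  # Counter({'#we': 1, 'web': 1, 'eb#': 1})
```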
This paper presents a syntax-driven approach to question answering, specifically the answer-sentence selection problem for short-answer questions. Rather than using syntactic features to augment existing statistical classifiers (as in previous work), we build on the idea that questions and their (correct) answers relate to each other via loose but predictable syntactic transformations. We propose a probabilistic quasi-synchronous grammar, inspired by one proposed for machine translation (D. Smith and Eisner, 2006), and parameterized by mixtures of a robust nonlexical syntax/alignment model with a(n optional) lexical-semantics-driven log-linear model. Our model learns soft alignments as a hidden variable in discriminative training. Experimental results using the TREC dataset are shown to significantly outperform strong state-of-the-art baselines.
In recent years, researchers have developed novel Quantum-Inspired Neural Network (QINN) frameworks for Natural Language Processing (NLP) tasks, inspired by theoretical investigations of quantum cognition. However, we have found that the training efficiency of QINNs is significantly lower than that of classical networks. We analyze the unitary transformation modules of existing QINNs based on the time-displacement symmetry of quantum mechanics and discover that they resemble the first-order Euler method in mathematical form. The high truncation error associated with the Euler method affects the training efficiency of QINNs. To enhance the training efficiency of QINNs, we generalize their unitary transformation modules to quantum-like high-order Runge-Kutta methods (QRKs). Moreover, we present the results of experiments on conversation emotion recognition and text classification tasks to validate the effectiveness of the proposed approach.
Language Modeling (LM) is a fundamental research topic in a range of areas. Recently, inspired by quantum theory, a novel Quantum Language Model (QLM) has been proposed for Information Retrieval (IR). In this paper, we aim to broaden the theoretical and practical basis of QLM. We develop a Neural Network based Quantum-like Language Model (NNQLM) and apply it to Question Answering. Specifically, based on word embeddings, we design a new density matrix, which represents a sentence (e.g., a question or an answer) and encodes a mixture of semantic subspaces. Such a density matrix, together with a joint representation of the question and the answer, can be integrated into neural network architectures (e.g., 2-dimensional convolutional neural networks). Experiments on the TREC-QA and WIKIQA datasets have verified the effectiveness of our proposed models.
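The density-matrix construction from word embeddings, and a trace-based question-answer overlap in the spirit of NNQLM, can be sketched as follows (the uniform mixture weights are an assumption):

```python
import numpy as np

def sentence_density(embs):
    """Density matrix of a sentence: a uniform mixture of projectors
    onto its (normalized) word-embedding vectors."""
    v = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    return np.einsum("ni,nj->ij", v, v) / len(v)

def qa_similarity(rho_q, rho_a):
    """Trace of the product of two density matrices, the overlap used
    in NNQLM-style scoring of a question-answer pair."""
    return np.trace(rho_q @ rho_a)

rng = np.random.default_rng(0)
rho_q = sentence_density(rng.normal(size=(5, 16)))
rho_a = sentence_density(rng.normal(size=(8, 16)))
print(qa_similarity(rho_q, rho_a))
```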
Language modeling is essential in Natural Language Processing and Information Retrieval related tasks. Following statistical language models, the Quantum Language Model (QLM) was proposed to unify both single words and compound terms in the same probability space without extending the term space exponentially. Although QLM achieved good performance in ad hoc retrieval, it still has two major limitations: (1) QLM cannot make use of supervised information, mainly due to the iterative and non-differentiable estimation of the density matrix, which represents both queries and documents in QLM. (2) QLM assumes the exchangeability of words or word dependencies, neglecting the order or position information of words.
This article aims to generalize QLM and make it applicable to more complicated matching tasks (e.g., Question Answering) beyond ad hoc retrieval. We propose a complex-valued neural network-based QLM solution called C-NNQLM to employ an end-to-end approach to build and train density matrices in a lightweight and differentiable manner, and it can therefore make use of external well-trained word vectors and supervised labels. Furthermore, C-NNQLM adopts complex-valued word vectors whose phase vectors can directly encode the order (or position) information of words. Note that complex numbers are also essential in quantum theory. We show that the real-valued NNQLM (R-NNQLM) is a special case of C-NNQLM.
The experimental results on the QA task show that both R-NNQLM and C-NNQLM achieve much better performance than the vanilla QLM, and C-NNQLM’s performance is on par with state-of-the-art neural network models. We also evaluate the proposed C-NNQLM on text classification and document retrieval tasks. The results on most datasets show that the C-NNQLM can outperform R-NNQLM, which demonstrates the usefulness of the complex representation for words and sentences in C-NNQLM.
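How a phase vector might encode position can be illustrated with a toy construction; the single shared frequency below is purely hypothetical, whereas C-NNQLM learns its phases:

```python
import numpy as np

def complex_word_vector(amplitude, position, freq=0.1):
    """A complex-valued word vector z = r * exp(i * theta) whose phase
    grows linearly with the word's position in the sentence
    (illustrative only; C-NNQLM trains its phase parameters)."""
    theta = freq * position * np.arange(len(amplitude))
    return amplitude * np.exp(1j * theta)

# The same word at positions 0 and 3 shares amplitudes but not phases
z0 = complex_word_vector(np.ones(8), position=0)
z3 = complex_word_vector(np.ones(8), position=3)
print(np.allclose(np.abs(z0), np.abs(z3)), np.allclose(z0, z3))
```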
Quantum language models (QLMs), in which words are modeled as a quantum superposition of sememes, have demonstrated a high level of model transparency and good post-hoc interpretability. Nevertheless, in the current literature, word sequences are basically modeled as a classical mixture of word states, which cannot fully exploit the potential of a quantum probabilistic description. A quantum-inspired neural network (NN) module is yet to be developed to explicitly capture the nonclassical correlations within word sequences. We propose an NN model with a novel entanglement embedding (EE) module, whose function is to transform the word sequence into an entangled pure-state representation. Strong quantum entanglement, which is the central concept of quantum information and an indication of parallelized correlations among the words, is observed within the word sequences. The proposed QLM with EE (QLM-EE) is implemented on classical computing devices with a quantum-inspired NN structure, and numerical experiments show that QLM-EE achieves superior performance compared with classical deep NN models and other QLMs on question answering (QA) datasets. In addition, the post-hoc interpretability of the model can be improved by quantifying the degree of entanglement among the word states.
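The degree of entanglement mentioned above is standardly quantified by the von Neumann entropy of a reduced state; a minimal sketch for a bipartite pure state (two words, each with its own subsystem dimension):

```python
import numpy as np

def entanglement_entropy(state, dim_a, dim_b):
    """Von Neumann entropy of subsystem A's reduced state, a standard
    entanglement measure for a bipartite pure state."""
    psi = state.reshape(dim_a, dim_b)
    # Schmidt coefficients via SVD of the reshaped state vector
    s = np.linalg.svd(psi, compute_uv=False)
    p = s ** 2
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

# Maximally entangled 2-qubit Bell state -> entropy = 1 bit
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(entanglement_entropy(bell, 2, 2))  # -> 1.0
```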
Equipped with quantum probability theory, quantum language models (QLMs), which aim at a principled approach to modeling term dependency, have drawn increasing attention. However, even though they are theoretically more general and perform effectively, current QLMs do not take context information into account. The most important element, namely the density matrix, is constructed as a summation of word projectors, whose representation is independent of context information. To address this problem, we propose a Context-based Quantum Language Model (C-QLM). Between the word embedding and the sentence density matrix, a bidirectional long short-term memory network is adopted to learn the hidden context information. Then a set of vectors is utilized to extract features of the density matrices for question and answer sentences, which can be used to calculate the matching score. Experimental results on the TREC-QA and WIKI-QA datasets demonstrate the effectiveness of our proposed model.
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.
This paper presents Conv-KNRM, a Convolutional Kernel-based Neural Ranking Model that models n-gram soft matches for ad-hoc search. Instead of exactly matching query and document n-grams, Conv-KNRM uses Convolutional Neural Networks to represent n-grams of various lengths and soft-matches them in a unified embedding space. The n-gram soft matches are then utilized by the kernel pooling and learning-to-rank layers to generate the final ranking score. Conv-KNRM can be learned end-to-end and fully optimized from user feedback. The learned model's generalizability is investigated by testing how well it performs in a related domain with small amounts of training data. Experiments on English search logs, Chinese search logs, and TREC Web track tasks demonstrated consistent advantages of Conv-KNRM over prior neural IR methods and feature-based methods.
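Kernel pooling (shared with K-NRM) has a simple closed form; the kernel centers, width, and log1p smoothing below are illustrative choices:

```python
import numpy as np

def kernel_pooling(sim_matrix, mus, sigma=0.1):
    """RBF kernel pooling over an n-gram similarity matrix, as in
    K-NRM/Conv-KNRM: each kernel soft-counts similarities near mu_k."""
    # sim_matrix: (query_ngrams, doc_ngrams) cosine similarities
    k = np.exp(-(sim_matrix[..., None] - mus) ** 2 / (2 * sigma ** 2))
    per_query = k.sum(axis=1)               # soft-TF per query n-gram
    return np.log1p(per_query).sum(axis=0)  # pooled feature per kernel

mus = np.linspace(-0.9, 1.0, 11)  # kernel centers spanning [-1, 1]
feats = kernel_pooling(np.random.uniform(-1, 1, (4, 20)), mus)
print(feats.shape)  # -> (11,), one feature per kernel
```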
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
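The negative-sampling objective for a single (center, context) pair can be written directly; the names and shapes below are assumptions for illustration:

```python
import numpy as np

def sgns_loss(v_center, u_context, u_negatives):
    """Skip-gram negative-sampling objective for one (center, context)
    pair: maximize log sigma(u_o . v_c) + sum_k log sigma(-u_k . v_c),
    returned here as a loss to be minimized."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.log(sigmoid(u_context @ v_center))
    neg = np.log(sigmoid(-(u_negatives @ v_center))).sum()
    return -(pos + neg)

d, k = 50, 5  # embedding dimension, number of negative samples
rng = np.random.default_rng(0)
print(sgns_loss(rng.normal(size=d), rng.normal(size=d),
                rng.normal(size=(k, d))))
```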
Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have however not been so successful. Several attempts at learning unsupervised representations of sentences have not reached sufficiently satisfactory performance to be widely adopted. In this paper, we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference dataset can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks. Much like how computer vision uses ImageNet to obtain features, which can then be transferred to other tasks, our work tends to indicate the suitability of natural language inference for transfer learning to other NLP tasks.
The quantum probabilistic framework has recently been applied to Information Retrieval (IR). A representative is the Quantum Language Model (QLM), which was developed for ad-hoc retrieval with single queries and has achieved significant improvements over traditional language models. In QLM, a density matrix, defined on the quantum probabilistic space, is estimated as a representation of the user's search intention with respect to a specific query. However, QLM is unable to capture the dynamics of the user's information need in query history. This limitation restricts its further application to dynamic search tasks, e.g., session search. In this paper, we propose a Session-based Quantum Language Model (SQLM) that deals with the multi-query session search task. In SQLM, a transformation model of density matrices is proposed to model the evolution of the user's information need in response to the user's interaction with the search engine, by incorporating features extracted from both positive feedback (clicked documents) and negative feedback (skipped documents). Extensive experiments conducted on TREC 2013 and 2014 session track data demonstrate the effectiveness of SQLM in comparison with the classic QLM.
User interactions in search systems represent a rich source of implicit knowledge about the user's cognitive state and information need, which continuously evolves over time. Despite massive efforts to exploit and incorporate this implicit knowledge in information retrieval, it is still a challenge to effectively capture the term dependencies and the user's dynamic information need (reflected by query modifications) in the context of user interaction. To tackle these issues, motivated by the recent Quantum Language Model (QLM), we develop a QLM-based retrieval model for session search, which naturally incorporates the complex term dependencies occurring in the user's historical queries and clicked documents with density matrices. To capture the dynamic information within users' search sessions, we propose a density matrix transformation framework and further develop an adaptive QLM ranking model. Extensive comparative experiments show the effectiveness of our session quantum language models.
In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. In order to capture the rich contextual structures in a query or a document, we start with each word within a temporal context window in a word sequence to directly capture contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are discovered by the model and are then aggregated to form a sentence-level feature vector. Finally, a non-linear transformation is applied to extract high-level semantic information to generate a continuous vector representation for the full text string. The proposed convolutional latent semantic model (CLSM) is trained on clickthrough data and is evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that the proposed model effectively captures salient semantic information in queries and documents for the task while significantly outperforming previous state-of-the-art semantic models.
Recent advances in neural variational inference have spawned a renaissance in deep latent variable models. In this paper we introduce a generic variational inference framework for generative and conditional models of text. While traditional variational methods derive an analytic approximation for the intractable distributions over latent variables, here we construct an inference network conditioned on the discrete text input to provide the variational distribution. We validate this framework on two very different text modelling applications, generative document modelling and supervised question answering. Our neural variational document model combines a continuous stochastic document representation with a bag-of-words generative model and achieves the lowest reported perplexities on two standard test corpora. The neural answer selection model employs a stochastic representation layer within an attention mechanism to extract the semantics between a question and answer pair. On two question answering benchmarks this model exceeds all previous published benchmarks.
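A minimal sketch of the inference-network idea, assuming an NVDM-style bag-of-words encoder and a Gaussian variational distribution with the reparameterization trick (the architecture sizes are arbitrary):

```python
import torch
import torch.nn as nn

class InferenceNetwork(nn.Module):
    """An inference network mapping a bag-of-words input to the mean
    and log-variance of a Gaussian variational distribution over a
    continuous document representation (NVDM-style sketch)."""
    def __init__(self, vocab_size, latent_dim=32, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)

    def forward(self, bow):
        h = self.enc(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

z, mu, logvar = InferenceNetwork(2000)(torch.rand(4, 2000))
```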
Traditional information retrieval (IR) models use bag-of-words as the basic representation and assume that some form of independence holds between terms. Representing term dependencies and defining a scoring function capable of integrating such additional evidence is theoretically and practically challenging. Recently, Quantum Theory (QT) has been proposed as a possible, more general framework for IR. However, only a limited number of investigations have been made and the potential of QT has not been fully explored and tested. We develop a new, generalized Language Modeling approach for IR by adopting the probabilistic framework of QT. In particular, quantum probability can account for both single and compound terms at once without having to extend the term space artificially as in previous studies. This naturally allows us to avoid the weight-normalization problem, which arises in current practice from mixing scores for matching compound terms and for matching single terms. Our model is the first practical application of quantum probability to show significant improvements over a robust bag-of-words baseline and achieves better performance on a stronger non-bag-of-words baseline.
In his investigations of the mathematical foundations of quantum mechanics, Mackey has proposed the following problem: determine all measures on the closed subspaces of a Hilbert space. A measure on the closed subspaces means a function $\mu$ which assigns to every closed subspace a non-negative real number such that if $\{A_i\}$ is a countable collection of mutually orthogonal subspaces having closed linear span $B$, then $\mu(B) = \sum_i \mu(A_i)$.
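For completeness, the additivity condition and the characterization this paper proves (Gleason's theorem, for separable Hilbert spaces of dimension at least three) can be written as:

```latex
% Additivity condition defining a measure on closed subspaces:
\mu(B) \;=\; \sum_i \mu(A_i)
\quad \text{for mutually orthogonal } \{A_i\}
\text{ with closed linear span } B .
% Gleason's theorem (\dim H \ge 3, H separable): every such measure
% has the form
\mu(A) \;=\; \operatorname{tr}(\rho\, P_A),
% where rho is a positive semi-definite self-adjoint operator of unit
% trace and P_A is the orthogonal projection onto the subspace A.
```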
An obstacle to research in automatic paraphrase identification and generation is the lack of large-scale, publicly available labeled corpora of sentential paraphrases. This paper describes the creation of the recently released Microsoft Research Paraphrase Corpus, which contains 5801 sentence pairs, each hand-labeled with a binary judgment as to whether the pair constitutes a paraphrase. The corpus was created using heuristic extraction techniques in conjunction with an SVM-based classifier to select likely sentence-level paraphrases from a large corpus of topic-clustered news data. These pairs were then submitted to human judges, who confirmed that 67% were in fact semantically equivalent. In addition to describing the corpus itself, we explore a number of issues that arose in defining guidelines for the human raters.
Quantum mechanics has opened a vast sector of physics to probability calculus. In fact, most of the physical interpretation of the formalism of quantum mechanics is expressed in terms of probability statements.
Automatic documentation systems which use the words contained in the individual documents as a principal source of document identifications may not perform satisfactorily under all circumstances. Methods have therefore been devised within the last few years for computing association measures between words and between documents, and for using such associated words, or information contained in associated documents, to supplement and refine the original document identifications. It is suggested in this study that bibliographic citations may provide a simple means for obtaining associated documents to be incorporated in an automatic documentation system. The standard associative retrieval techniques are first briefly reviewed. A computer experiment is then described which tends to confirm the hypothesis that documents exhibiting similar citation sets also deal with similar subject matter. Finally, a fully automatic document retrieval system is proposed which uses bibliographic information in addition to other standard criteria for the identification of document content, and for the detection of relevant information.