Figure 3 - uploaded by Prodromos Malakasiotis
Performance of our method's SVR ranking component with (SVR-REC) and without (SVR-BASE) the additional features of the paraphrase recognizer.

Source publication
Conference Paper
We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also...
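The generate-then-rank idea in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the rule format (left-hand side, right-hand side, rule score), the plain substring matching, and the example rules are hypothetical, and the scorer uses only the rule score, whereas the abstract says the actual ranking component considers more than rule quality.

```python
# Toy generate-then-rank sketch. The rule format, the example rules, and the
# scorer are hypothetical; they are not taken from the source publication.

def generate_candidates(sentence, rules):
    """Apply each rule once (first occurrence) wherever its left-hand side occurs."""
    candidates = []
    for lhs, rhs, rule_score in rules:
        if lhs in sentence:
            candidates.append((sentence.replace(lhs, rhs, 1), rule_score))
    return candidates


def rank_candidates(candidates, scorer):
    """Sort candidates from best to worst according to the given scorer."""
    return sorted(candidates, key=scorer, reverse=True)


# Hypothetical paraphrasing rules: (left-hand side, right-hand side, rule score).
rules = [
    ("is the author of", "wrote", 0.8),
    ("author", "writer", 0.5),
    ("purchased", "bought", 0.9),  # does not match the sentence below
]

sentence = "Tolstoy is the author of War and Peace"
candidates = generate_candidates(sentence, rules)

# Stand-in scorer: rank by rule score alone. A learned ranker (such as the
# SVR mentioned in the figure caption) would replace this function.
ranked = rank_candidates(candidates, scorer=lambda c: c[1])
```

Passing the scorer as a function keeps the generation step independent of how candidates are judged, so a trained model could be swapped in without touching the rule application.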

Context in source publication

Context 1
... The ρ² coefficient shows how well the scores returned by the SVR are correlated with the desired scores y(x_i); the higher the ρ², the higher the agreement. Figure 3 proceed to investigate how well our overall generate-and-rank method (with SVR-REC) compares against a state of the art paraphrase generator. ...
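As a minimal sketch of the measure described above, the code below computes a squared correlation coefficient between predicted and desired scores. It assumes that ρ² denotes the squared Pearson correlation, which the excerpt does not spell out; the function name and the example scores are hypothetical.

```python
# Squared Pearson correlation between predicted and desired scores.
# Assumption: rho^2 in the excerpt is the squared Pearson coefficient.

def squared_pearson(predicted, desired):
    """Return the squared Pearson correlation of two equal-length score lists."""
    if len(predicted) != len(desired) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_d = sum(desired) / n
    # Unnormalised covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((p - mean_p) * (d - mean_d) for p, d in zip(predicted, desired))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_d = sum((d - mean_d) ** 2 for d in desired)
    if var_p == 0 or var_d == 0:
        raise ValueError("correlation is undefined for constant scores")
    return (cov * cov) / (var_p * var_d)


# Hypothetical scores: ranker outputs versus desired scores y(x_i).
svr_scores = [0.9, 0.4, 0.7, 0.1]
desired_scores = [1.0, 0.5, 0.75, 0.0]
rho2 = squared_pearson(svr_scores, desired_scores)
```

The value lies between 0 and 1, and values closer to 1 indicate closer agreement between the returned and the desired scores.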

Citations

... A variety of paraphrase generation techniques have been proposed and studied (Barzilay and Lee 2003; Bannard and Callison-Burch 2005; Androutsopoulos and Malakasiotis 2010; Madnani and Dorr 2010; Malakasiotis and Androutsopoulos 2011; Li et al. 2019). Recently, Gupta et al. (2018) use a variational autoencoder to generate paraphrases from sentences and Li et al. (2018) use deep reinforcement learning to generate paraphrases. ...
Article
We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs which come from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.
... Less studied is the analysis of changes between drafts – a comparison of revisions and the properties of the differences. Research on this topic can support applications involving revision analysis (Zhang and Litman, 2015), paraphrase (Malakasiotis and Androutsopoulos, 2011) and correction detection (Swanson and Yamangil, 2012; Xue and Hwa, 2014). Although there are some corpora resources for NLP research on writing comparisons, most tend to be between individual sentences/phrases for tasks such as paraphrase comparison (Dolan and Brockett, 2005; Tan and Lee, 2014) or grammar error correction (Dahlmeier et al., 2013; Yannakoudakis et al., 2011). ...
Article
Paraphrase generation is an important task in Natural Language Processing (NLP) and is successfully applied in various applications such as question-answering, information retrieval & extraction, text summarization and augmentation of machine translation training data. A lot of research has been carried out on paraphrase generation, but only for English. No approach is available for paraphrase generation in the Punjabi language. Hence, this paper aims to fill this gap by developing a paraphrase generation and evaluation model for Punjabi. The proposed approach is divided into two phases: paraphrase generation and evaluation. To generate paraphrases, the current state-of-the-art transformer with an improved encoder is used, as transformers can learn long-term dependencies. For evaluation, sentence embeddings are used to check whether the generated paraphrase is similar to the given sentence or not. The sentence embeddings have been created using two approaches: Seq2Seq with attention and transformers. The proposed model is compared with the currently available state-of-the-art models on the Quora Question pair dataset. For Punjabi, the proposed approach is evaluated on three datasets: news headlines, a sentential dataset from news articles, and a translation of the Quora Question pair dataset into Punjabi. The automatic evaluation metrics BLEU, METEOR and ROUGE are used for depth evaluation along with human judgments. The proposed approach is straightforward and is successfully applied to augmenting machine translation training data and to sentence compression. The proposed approach establishes a new baseline for paraphrase generation in Indian regional languages in the future.
Article
Paraphrasing is an act of generating similar text to the source text with different expressions. Paraphrase generation is an important task in various Natural Language Processing applications such as machine translation, question-answering, information re
Article
Techniques for generating and recognizing paraphrases, i.e., semantically equivalent expressions, play an important role in a wide range of natural language processing tasks. In the last decade, the task of automatic acquisition of subsentential paraphrases, i.e., words and phrases with (approximately) the same meaning, has been drawing much attention in the research community. The core problem is to obtain paraphrases of high quality in large quantity. This article presents a method for tackling this issue by systematically expanding an initial seed lexicon made up of high-quality paraphrases. This involves automatically capturing morpho-semantic and syntactic generalizations within the lexicon and using them to leverage the power of large-scale monolingual data. Given an input set of paraphrases, our method starts by inducing paraphrase patterns that constitute generalizations over corresponding pairs of lexical variants, such as “amending” and “amendment,” in a fully empirical way. It then searches large-scale monolingual data for new paraphrases matching those patterns. The results of our experiments on English, French, and Japanese demonstrate that our method manages to expand seed lexicons by a large multiple. Human evaluation based on paraphrase substitution tests reveals that the automatically acquired paraphrases are also of high quality.
Article
Query paraphrasing aims to construct a better formulation of user queries in order to enhance retrieval. Formulating search queries remains complicated for a subset of Web users. In a typical situation, a user will not receive satisfactory results from the submitted search query and will subsequently attempt different query paraphrases. The Arabic vocabulary is rich in synonyms and hyponyms. Such richness of synonyms makes automation of the paraphrasing technique crucial for Arabic information retrieval systems in order to facilitate the process of paraphrasing synonyms. In this article, we propose an enhancement for Arabic information retrieval using a query paraphrasing technique. Furthermore, two query paraphrasing optimization techniques are proposed to overcome the time complexity and exhaustive calculation of existing query paraphrasing techniques. One of these techniques uses a genetic algorithm (GA–QP), and the other employs the artificial bee colony algorithm (ABC–QP). The performance of these two algorithms is compared. ABC–QP shows an improvement in Arabic information retrieval performance compared with the genetic algorithm query paraphrasing system.