Preprint

Computational Approaches to the Detection of Lesser-Known Rhetorical Figures: A Systematic Survey and Research Challenges


Abstract

Rhetorical figures play a major role in our everyday communication as they make text more interesting, more memorable, or more persuasive. Therefore, it is important to computationally detect rhetorical figures to fully understand the meaning of a text. We provide a comprehensive overview of computational approaches to lesser-known rhetorical figures. We explore the linguistic and computational perspectives on rhetorical figures, emphasizing their significance for the domain of Natural Language Processing. We present different figures in detail, delving into datasets, definitions, rhetorical functions, and detection approaches. We identify challenges such as dataset scarcity, language limitations, and reliance on rule-based methods.

Article
Metaphors are figurative expressions that appear frequently in everyday language. Given their significance for downstream natural language processing tasks such as machine translation and sentiment analysis, computational metaphor processing has seen an upsurge of interest in the community. Progress in Artificial Intelligence has spurred the development of several tools and frameworks in this domain. This article aims to comprehensively summarize and categorize previous computational metaphor processing approaches with respect to metaphor identification, interpretation, generation, and application. Although studies on metaphor identification have made significant progress, metaphor understanding, conceptual metaphor processing, and metaphor generation still require in-depth analysis. By comparing the strengths and weaknesses of previous work, we hope to identify future directions for prospective researchers.
Conference Paper
This paper presents our preliminary work-in-progress on automatic detection of the rhetorical figure, antithesis, including the challenges we have encountered so far.
Chapter
Sentences where two verbs share a single argument represent a complex and highly ambiguous syntactic phenomenon. The argument-sharing relations must be considered during detection from both a syntactic and a semantic perspective. Such expressions can represent ungrammatical constructions, denoted as zeugma, or idiomatic elliptical phrase combinations. Rule-based classification methods prove ineffective because they cannot reflect the meaning relations between the constituents of the analyzed sentence. This paper presents the development and evaluation of ZeugBERT, a language model tuned for sentence classification on top of a pre-trained Czech transformer model for language representation. The model was trained on a newly prepared dataset of 7,849 Czech sentences, published with this paper, to classify Czech syntactic structures containing coordinated verbs that share a valency argument (or an optional adjunct). ZeugBERT reaches 88% accuracy on the test set. The paper describes how the new dataset was created and annotated, and it offers a detailed error analysis of the classification model.
Article
Abusive language is becoming a problematic issue for our society. The spread of messages that reinforce social and cultural intolerance could have dangerous effects on victims' lives. State-of-the-art technologies are often effective at detecting explicit forms of abuse, but they leave unidentified those utterances that use only weakly offensive language yet have a strongly hurtful effect. Scholars have advanced theoretical and qualitative observations on specific indirect forms of abusive language that are hard to recognize automatically. In this work, we propose a battery of statistical and computational analyses that support these considerations, with a focus on the creative and cognitive aspects of implicitness, in texts from different sources such as social media and news. We experiment with transformers, multi-task learning, and a set of linguistic features to reveal the elements involved in implicit and explicit manifestations of abuse, providing a solid basis for computational applications.
Conference Paper
The proliferation of mis/disinformation in the media has had a profound impact on social discourse and politics in the United States. Some argue that democracy itself is threatened by the lies, chicanery, and flimflam (in short, propaganda) emanating from the highest pulpits, podiums, and soapboxes in the land. Propaganda differs from mis/disinformation in that it need not be false; instead, it relies on rhetorical devices that aim to manipulate the audience into a particular belief or behavior. While falsehoods can be debunked, albeit with disputable efficacy, beliefs are harder to cut through. The detection of “Fake News” has recently received a lot of attention, with some impressive results; propaganda detection, however, remains challenging. This proposal aims to further research into propaganda detection by constructing an ontology with this specific goal in mind, drawing on multiple disciplines within Computer Science and the Social Sciences.
Conference Paper
This paper deals with different methods, particularly statistical analysis and text mining, that help in stylistic research. As an example, we present an examination of the lexical and semantic features of meiosis and litotes in the novel The Catcher in the Rye by Jerome David Salinger, carried out with the programming language R. To ensure high-quality research, we explored the specific features of litotes and meiosis thoroughly: we describe the broad range of possible scientific views and, from them, derive general assumptions about the typical linguistic patterns of meiosis and litotes. Using these insights, different text-mining tools can be applied in stylistic research. The paper outlines in detail the creation of concordances, word frequencies, and sentiment analysis, using R and R packages distributed by members of the community. For concordances, we discuss the concept of Key Word in Context and introduce the advantages of using concordances in stylistic research. We propose and discuss a possible implementation of statistical analysis in the research of litotes. Within the framework of sentiment analysis, we focus on negation and how it affects opinion orientation. The paper thus also aims to validate the importance of litotes in sentiment analysis, since litotes are directly linked to the effects of negation. The results of each stage of the research are presented and discussed in detail.
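A Key Word in Context (KWIC) concordance lists every occurrence of a search term together with a fixed number of neighbouring tokens on each side. The paper works in R; the following is only a minimal Python sketch of the same idea, not the authors' code. The function name `kwic`, the regex tokenizer, and the default window of three tokens are assumptions made for illustration.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit of `keyword`.

    Tokenization is a naive regex over ASCII letters and apostrophes
    (an assumption; a real study would use a proper tokenizer).
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


sample = "He was not unkind, and she was not a bad sort."
for left, kw, right in kwic(sample, "not", window=2):
    print(f"{left:>10} [{kw}] {right}")
```

Reading the right-hand contexts of a negator in this way is one manual route to spotting candidate litotes such as "not unkind".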
Conference Paper
We study the usefulness of hateful metaphors as features for the identification of the type and target of hate speech in Dutch Facebook comments. For this purpose, all hateful metaphors in the Dutch LiLaH corpus were annotated and interpreted in line with Conceptual Metaphor Theory and Critical Metaphor Analysis. We provide SVM and BERT/RoBERTa results, and investigate the effect of different metaphor information encoding methods on hate speech type and target detection accuracy. The results of the conducted experiments show that hateful metaphor features improve model performance for both tasks. To our knowledge, this is the first time that the effectiveness of hateful metaphors as an information source for hate speech classification has been investigated.
Conference Paper
The paper is devoted to the automatic detection of rhythm in fiction and, based on the results of such detection, to an investigation of how the rhythm of prose texts changed over the 19th to 21st centuries. The authors developed algorithms that extract rhythm figures related to word repetition (anaphora, epiphora, polysyndeton, etc.) and visualized their statistical features in plots and heat maps by decade, using British and Russian literature as material. The experiments made it possible to identify changes in rhythm across periods and to interpret their causes from a linguistic point of view.
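Repetition figures such as anaphora and epiphora are defined by position: the same word(s) at the start, or at the end, of successive units. The sketch below is a minimal illustration of that positional check over adjacent sentences, not the authors' algorithm. The naive sentence splitter, the helper names, and the parameter `n` (number of shared words) are assumptions, and no stopword or accidental-repetition filtering is attempted.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence):
    """Lower-cased word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", sentence.lower())


def find_anaphora_epiphora(text, n=1):
    """Flag adjacent sentence pairs sharing their first n words (anaphora)
    or their last n words (epiphora). Returns two lists of
    (sentence index, next sentence index, shared words)."""
    sents = [words(s) for s in split_sentences(text)]
    anaphora, epiphora = [], []
    for i in range(len(sents) - 1):
        a, b = sents[i], sents[i + 1]
        if len(a) < n or len(b) < n:
            continue  # too short to compare
        if a[:n] == b[:n]:
            anaphora.append((i, i + 1, " ".join(a[:n])))
        if a[-n:] == b[-n:]:
            epiphora.append((i, i + 1, " ".join(a[-n:])))
    return anaphora, epiphora


ana, epi = find_anaphora_epiphora("This house is mine. This car is mine. You are mine.")
print(ana)  # [(0, 1, 'this')]
print(epi)  # [(0, 1, 'mine'), (1, 2, 'mine')]
```

Counting such hits per text and aggregating them by decade would give the kind of statistics that can be plotted as heat maps.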
Article
In the last decade, computational metaphor processing has garnered immense attention in computational linguistics and cognition. A wide range of approaches, from hand-coded rule systems to deep learning techniques, has been proposed to automate different aspects of metaphor processing. In this article, we systematically examine the major theoretical views on metaphor and present their classification. We discuss the existing literature to provide a concise yet representative picture of computational metaphor processing. We conclude the article with possible research directions.
Article
Analyzing the functional equivalence of an original text and its translation in terms of rhythm equivalence is an extremely important task in modern linguistics. Moreover, the rhythm component is an integral part of functional equivalence, which cannot be achieved without conveying the rhythm figures of the text. To analyze rhythm figures in an original literary text and its translation, the authors developed the ProseRhythmDetector software tool, which finds and visualizes lexical and syntactic figures in English- and Russian-language prose: anaphora, epiphora, symploce, anadiplosis, epanalepsis, reduplication, epistrophe, polysyndeton, and aposiopesis. The goal of this work is to present the results of testing ProseRhythmDetector on two works by English authors and their Russian translations: Ch. Bronte's “Villette” and I. Murdoch's “The Black Prince”. Based on the tool's output, the authors compared the rhythm figures in the original and translated texts, both in terms of rhythm and in terms of their contexts. This experiment made it possible to identify how features of the author's style are conveyed by the translator, and to detect and explain mismatches between rhythm figures in the original and translated texts. Applying ProseRhythmDetector significantly reduced the amount of work required of linguistic experts, since it detects lexical and syntactic figures automatically with fairly high precision (62% to 93%, depending on the figure).
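Two of the listed figures can be stated as simple token checks: epanalepsis (a sentence begins and ends with the same word) and anadiplosis (the last word of one clause begins the next). The following is a minimal sketch of those checks under the assumption that clauses can be approximated by splitting on punctuation; it is not how ProseRhythmDetector works, and the function names are hypothetical.

```python
import re


def tokenize(fragment):
    """Lower-cased word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", fragment.lower())


def is_epanalepsis(sentence):
    """True if the sentence begins and ends with the same word."""
    toks = tokenize(sentence)
    return len(toks) > 1 and toks[0] == toks[-1]


def find_anadiplosis(text):
    """Return words that end one clause and begin the next.

    Clauses are approximated by splitting on punctuation (an assumption).
    """
    clauses = [tokenize(c) for c in re.split(r"[.,;:!?]+", text)]
    clauses = [c for c in clauses if c]  # drop empty fragments
    return [prev[-1] for prev, nxt in zip(clauses, clauses[1:]) if prev[-1] == nxt[0]]


print(is_epanalepsis("Nothing can come of nothing."))  # True
print(find_anadiplosis("Fear leads to anger; anger leads to hate; hate leads to suffering."))
# ['anger', 'hate']
```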
Article
With the rapid development of the internet, social media has become an essential tool for obtaining information and has attracted a large number of people to its platforms because of its low cost, accessibility, and appealing content. It greatly enriches our lives. However, its rapid growth and wide reach have also made it easy for fake news to spread, and people are constantly exposed to it and suffer from it. Fake news usually uses hyperbole to catch people's attention with dishonest intent. More importantly, it often misleads readers and gives them wrong perceptions of society, with potential negative impacts on society and individuals. Research on detecting fake news is therefore significant. In this paper, we build a model named SMHA-CNN (Self Multi-Head Attention-based Convolutional Neural Networks) that judges the authenticity of news with high accuracy based only on content, using convolutional neural networks and a self multi-head attention mechanism. To demonstrate its validity, we conducted experiments on a public dataset and achieved a precision of 95.5% and a recall of 95.6% under 5-fold cross-validation. Our experimental results indicate that the model is more effective at detecting fake news.
Article
As online content continues to grow, so does the spread of hate speech. We identify and examine challenges faced by online automatic approaches for hate speech detection in text. Among these difficulties are subtleties in language, differing definitions of what constitutes hate speech, and limitations of data availability for training and testing of these systems. Furthermore, many recent approaches suffer from an interpretability problem, that is, it can be difficult to understand why the systems make the decisions that they do. We propose a multi-view SVM approach that achieves near state-of-the-art performance, while being simpler and producing more easily interpretable decisions than neural methods. We also discuss both technical and practical challenges that remain for this task.
Article
There is a driving need to computationally interrogate large bodies of text for a range of non-denotative meaning (e.g., to plot chains of reasoning, detect sentiment, diagnose genre, and so forth). But such meaning has always proven computationally elusive. It is often implicit, ‘hidden’ meaning, evoked by linguistic cues, stylistic arrangement, or conceptual structure, features that have hitherto been difficult for Natural Language Processing systems to recognize and use. Non-denotative textual effects are the historical concern of rhetorical studies, and we have turned to rhetoric in order to find new ways to advance NLP, especially for sophisticated tasks like Argument Mining. This paper highlights certain rhetorical devices that encode levels of meaning that have been overlooked in Computational Linguistics generally and Argument Mining particularly, and yet lend themselves to automated detection. These devices are the linguistic configurations known as Rhetorical Figures. We argue for the importance of these devices for Argument Mining, especially in collocations, and we present an XML annotation scheme for Rhetorical Figures to make figuration more tractable for computational approaches, particularly with an eye on the improvements they offer Argument Mining. We also discuss the intellectual and technical challenges involved in figure annotation and the implications for Machine Learning.
Article
Rhetorical figures are valuable linguistic data for literary analysis. In this article, we target the detection of three rhetorical figures that belong to the family of repetitive figures: chiasmus (“I go where I please, and I please where I go.”), epanaphora, also called anaphora (“Poor old European Commission! Poor old European Council.”), and epiphora (“This house is mine. This car is mine. You are mine.”). Detecting repetition of words is easy for a computer, but detecting only the repetitions that produce a rhetorical effect is difficult because of the many accidental and irrelevant repetitions. For all figures, we train a log-linear classifier on a corpus of political debates. The corpus is only very partially annotated, but we nevertheless obtain good results, with more than 50% precision for all figures. We then apply our models to entirely different genres and perform a comparative analysis of corpora of fiction, science, and quotes. Thanks to the automatic detection of rhetorical figures, we discover that chiasmus is more likely to appear in scientific contexts, whereas epanaphora and epiphora are more common in fiction.
Conference Paper
Figurative language identification is a hard problem for computers. In this paper we handle a subproblem: chiasmus detection. By chiasmus we understand a rhetorical figure that consists in repeating two elements in reverse order: “First shall be last, last shall be first”. Chiasmus detection is a needle-in-the-haystack problem, with a couple of true positives for millions of false positives. Due to a lack of annotated data, prior work on detecting chiasmus in running text has only considered hand-tuned systems. In this paper, we explore the use of machine learning on a partially annotated corpus. With only 31 positive instances and partial annotation of negative instances, we manage to build a system that improves both precision and recall compared to a hand-tuned system using the same features. Comparing the feature weights learned by the machine to those given by the human, we discover common characteristics of chiasmus.
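The needle-in-the-haystack character of the task is easy to see by mining every A ... B ... B ... A pattern of identical words (antimetabole) within a window. The brute-force sketch below is only an illustration of candidate extraction, not the authors' system: the window size of 30 tokens and the optional stopword set are assumptions, and in practice each candidate would then be scored by a classifier.

```python
import re


def antimetabole_candidates(text, window=30, stopwords=frozenset()):
    """Brute-force search for A ... B ... B ... A patterns of identical words.

    Returns (i, j, k, m, A, B) tuples with i < j < k < m and m - i < window.
    Roughly O(n * window**3), acceptable only for a short illustration.
    """
    toks = re.findall(r"[a-z']+", text.lower())
    found = []
    for i, a in enumerate(toks):
        if a in stopwords:
            continue
        # closing A must leave room for two B tokens in between
        for m in range(i + 3, min(len(toks), i + window)):
            if toks[m] != a:
                continue
            for j in range(i + 1, m - 1):
                b = toks[j]
                if b == a or b in stopwords:
                    continue
                for k in range(j + 1, m):
                    if toks[k] == b:
                        found.append((i, j, k, m, a, b))
    return found


text = "First shall be last, last shall be first."
print(antimetabole_candidates(text, stopwords={"shall", "be"}))
# [(0, 3, 4, 7, 'first', 'last')]
print(len(antimetabole_candidates(text)))  # 5
```

Even this eight-word sentence yields five candidates without stopword filtering, most of them trivial, which illustrates why ranking with learned feature weights matters.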
Article
Social media provides a platform for seeking information from a large user base. Information seeking in social media, however, occurs simultaneously with users expressing their viewpoints by making statements. Rhetorical questions have the form of a question but serve the function of a statement and are an important tool employed by users to express their viewpoints. Therefore, rhetorical questions might mislead platforms assisting information seeking in social media. Rhetorical questions are difficult to identify because they are not syntactically different from other questions. In this article, we develop a framework to identify rhetorical questions by modeling some of the users' motivations for posting them. We focus on two motivations, drawn from linguistic theories: to implicitly convey a message and to modify the strength of a statement previously made. From these motivations we develop a quantitative framework to identify rhetorical questions in social media. We evaluate the framework using two datasets of questions posted on the social media platform Twitter and demonstrate its effectiveness in identifying rhetorical questions. To the best of our knowledge, this is the first framework to model the possible motivations for posting rhetorical questions in order to identify them on social media platforms.
Article
Litotes, often confused with meiosis and understatement, has long suffered neglect. By comparing synonymous key words in previous definitions, this essay defines litotes as "a trope in which an affirmative is expressed by the negation of its opposite," and for the first time classifies litotes into three subtypes based upon Aristotle's study of opposition: Contradictory, contrary and relative. Focusing particularly on the strong contradictory type of litotes and its realization in The Analects of Confucius, with its nearly one hundred double negation litotes, we find they prominently serve the logical functions of achieving high probability, providing a sophisticated major premise, and elaborating strict definitions. Further, the essay argues that this strong argumentative function in logic leads to litotes' epistemic power, a promising dimension for future research.
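The definition above ("an affirmative expressed by the negation of its opposite") suggests a crude surface heuristic: a negator followed by a word that is itself negated or an antonym. The regex below is a hypothetical, deliberately small sketch of that heuristic, not a method from the cited essay; the negator list, the prefix list, and the handful of "opposite" words are all assumptions, and straight apostrophes are assumed.

```python
import re

# Hypothetical pattern: a negator, an optional article or degree word,
# then a word with a negative prefix or one of a few "opposite" words.
LITOTES_RE = re.compile(
    r"(?:\bnot|n't|\bnever|\bno)\s+"
    r"(?:(?:a|an|too|very)\s+)?"
    r"(?:(?:un|in|im|ir|il|dis)[a-z]{3,}|without|bad|small|few)\b",
    re.IGNORECASE,
)


def find_litotes(text):
    """Return candidate litotes spans matching the double-negation pattern."""
    return [m.group(0) for m in LITOTES_RE.finditer(text)]


print(find_litotes("The results were not unimportant, and it wasn't bad."))
# ['not unimportant', "n't bad"]
```

Such a pattern over-generates: "not interesting" or "no interest" match the prefix rule without being litotes, which is one reason why the distinction from negation in general, meiosis, and understatement needs more than surface form.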
Article
The generalised, automated reconstruction of the reasoning structures underlying persuasive communication is an enormously challenging task. While this work in argument mining is increasingly informed by the rich tradition of argumentation studies outside the computational field, the rhetorical perspective on argumentation has thus far been largely ignored. To explore the application of rhetorical insights in argument mining, we conduct a pilot study on the connection between rhetorical figures and argumentation structure. Rhetorical figures are linguistic devices that perform a variety of functions in argumentative discourse. The textual form of some of these figures is easy to identify automatically, so an established connection between a figure and a preponderance of argumentative content would improve the performance of argument mining techniques. Furthermore, the automated mining of rhetorical figures could serve as an empirical, corpus-based testing ground for the claims made about these figures in the rhetorical literature. In the pilot study, we explore the connection between argumentation structure and eight rhetorical figures whose forms we expect to be relatively easy to identify computationally (concretely, we consider the six schemes 'anadiplosis', 'epanaphora', 'epistrophe', 'epizeuxis', 'eutrepismus', and 'polyptoton', and the two tropes 'antithesis' and 'dirimens copulatio', and relate their occurrences to relations of inference and conflict). The data for the study are drawn from the MM2012c corpus of 39,694 words of argumentatively annotated transcripts from BBC Radio 4's MoralMaze discussion program. We show that some of the figures indeed correspond to passages of high argumentative density, relative to the text as a whole.
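Epizeuxis (immediate repetition of a word, as in "words, words, words") is perhaps the simplest of the listed schemes to identify from form alone. The sketch below, which is not the pilot study's implementation, finds runs of identical consecutive tokens; the function name and the `min_reps` threshold are assumptions, and punctuation between the repeated words is ignored by the tokenizer.

```python
import re


def find_epizeuxis(text, min_reps=2):
    """Return (word, run length) for each run of identical consecutive tokens."""
    toks = re.findall(r"[a-z']+", text.lower())
    hits = []
    i = 0
    while i < len(toks):
        j = i
        # extend the run while the next token repeats toks[i]
        while j + 1 < len(toks) and toks[j + 1] == toks[i]:
            j += 1
        run = j - i + 1
        if run >= min_reps:
            hits.append((toks[i], run))
        i = j + 1
    return hits


print(find_epizeuxis("Words, words, words."))  # [('words', 3)]
```

Grammatical repetitions such as "had had" or "that that" would also be returned, so counts from such a detector need filtering before being related to argumentative density.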
Conference Paper
Social media provides a platform for seeking information from a large user base. Information seeking in social media, however, occurs simultaneously with users expressing their viewpoints by making statements. Rhetorical questions have the form of a question but serve the function of a statement and might mislead platforms assisting information seeking in social media. Rhetorical questions are difficult to identify because they are not syntactically different from other questions. In this paper, we develop a framework to identify rhetorical questions by modeling the users' motivations for posting them. We focus on one motivation, drawn from linguistic theories: to implicitly convey a message. We develop a framework from this motivation to identify rhetorical questions in social media and evaluate it using questions posted on Twitter. This is the first framework to model the motivations for posting rhetorical questions in order to identify them on social media platforms.
Article
Zeugma (“The storm sank my boat and my dreams”) is a well-recognized figure of speech whose mechanism of operation is less well understood. We suggest treating zeugma as a breach of syntactic iconicity: the syntactic form of the coordinative construction statement implies an equivalence or semantic proximity between the two objects of the verb (boat and dreams), while the objects supplied are semantically very distant. Unlike nominal metaphors and similes, in zeugmas two metaphorically-related, nonsymmetrical objects are put in syntactically symmetrical positions. This feature, the breach of iconicity, registers as a surprise, an effect wholly different from that of metaphors and similes. Seeing zeugma in these terms makes it possible not just to explain its functioning beyond broad pronouncements about yoking together different items, but to tease apart syntactic and semantic factors that contribute to the level of the breach of iconicity and subsequently to the zeugma’s strength. Moreover, understanding zeugmas as a surprising breach of iconicity leads to the question of how this breach may be accommodated or made sense of. In the second part of the essay, we introduce three types of accommodation strategies, each with a distinct focus: the language, the objects, and the speaker.
Chapter
Rhetorical figures are form/function alignments in which the form (1) serves to convey the function(s) and (2) supports their computational detection; therefore, (3) they are particularly rich for various text mining activities and other Natural Language Understanding purposes. The figures which are especially valuable in these ways are known as schemes, figures that are defined by their material (phonological, orthographical, morpholexical, or syntactic) form, in distinction particularly from tropes, which are defined by their conceptual (semantic) form. Rhetorical schemes, however, have been almost universally ignored by linguists, including computational linguists. This article illustrates form/function alignments for a small handful of rhetorical schemes, with some examples of how they communicate specific meanings. The communicative functions of rhetorical schemes rely not so much on individual schemes as on certain collocations of schemes (sometimes with tropes or other figures as well). These collocations, in turn, are coordinated by linguistic features such that the relevant expressions fit the notion of a construction as understood in the Construction Grammar framework. Examples are drawn from epanalepsis, as well as antimetabole, mesodiplosis, and parison, which collate frequently as a group and also collectively with the trope, antithesis. The communicative functions explored include Semantic-Feature Promotion, Reciprocal Specification, Reciprocal Energy, Irrelevance of Order/Rank, Subclassification, and Reject-Replace. Rhetorical figure detection is also discussed in some detail with respect to figural and grammatical collocation.
Conference Paper
GRhOOT, the German RhetOrical OnTology, is a domain ontology of 110 rhetorical figures in the German language. The overall goal of building an ontology of rhetorical figures in German is not only the formal representation of different rhetorical figures, but also allowing for their easier detection, thus improving sentiment analysis, argument mining, detection of hate speech and fake news, machine translation, and many other tasks in which recognition of non-literal language plays an important role. The challenge of building such ontologies lies in classifying the figures and assigning adequate characteristics to group them, while considering their distinctive features. The ontology of rhetorical figures in the Serbian language was used as a basis for our work. Besides transferring and extending the concepts of the Serbian ontology, we ensured completeness and consistency by using description logic and SPARQL queries. Furthermore, we show a decision tree to identify figures and suggest a usage scenario on how the ontology can be utilized to collect and annotate data.
Article
Metonymy resolution (MR) is a challenging task in the field of natural language processing. The task of MR aims to identify the metonymic usage of a word that employs an entity name to refer to another target entity. Recent BERT-based methods yield state-of-the-art performance. However, they neither make full use of the entity information nor explicitly consider syntactic structure. In contrast, in this paper, we argue that the metonymic process should be completed in a collaborative manner, relying on both lexical semantics and syntactic structure (syntax). This paper proposes a novel approach to enhancing BERT-based MR models with hard and soft syntactic constraints by using different types of convolutional neural networks to model dependency parse trees. Experimental results on benchmark datasets (e.g., ReLocaR, SemEval 2007, and WiMCor) confirm that incorporating syntactic information into fine-tuned pre-trained language models benefits MR tasks.
Conference Paper
Automatic detection of stylistic devices is an important tool for literary studies, e.g., for stylometric analysis or argument mining. A particularly striking device is the rhetorical figure called chiasmus, which involves the inversion of semantically or syntactically related words. Existing works focus on a special case of chiasmi that involve identical words in an A B B A pattern, so-called antimetaboles. In contrast, we propose an approach targeting the more general and challenging case A B B’ A’, where the words A, A’ and B, B’ constituting the chiasmus do not need to be identical but just related in meaning. To this end, we generalize the established candidate phrase mining strategy from antimetaboles to general chiasmi and propose novel features based on word embeddings and lemmata for capturing both semantic and syntactic information. These features serve as input for a logistic regression classifier, which learns to distinguish between rhetorical chiasmi and coincidental chiastic word orders without special meaning. We evaluate our approach on two datasets consisting of classical German dramas, four texts with annotated chiasmi and 500 unannotated texts. Compared to previous methods for chiasmus detection, our novel features improve the average precision from 17% to 28% and the precision among the top 100 results from 13% to 35%.
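The core idea of the generalized A B B' A' case is to replace string identity with graded relatedness: embedding similarity between A and A' and between B and B', plus lemma identity, fed to a logistic regression. The following is a minimal sketch of such a feature vector and a logistic score, not the authors' feature set. The two-dimensional vectors, the lemma table, and the weights and bias are hypothetical toy values; real weights would be learned from annotated data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def chiasmus_features(a, b, b2, a2, vec, lemma):
    """Features for a candidate A B B' A' (word order as in the text)."""
    return [
        cosine(vec[a], vec[a2]),        # semantic relatedness of A and A'
        cosine(vec[b], vec[b2]),        # semantic relatedness of B and B'
        float(lemma[a] == lemma[a2]),   # A and A' share a lemma
        float(lemma[b] == lemma[b2]),   # B and B' share a lemma
    ]


def score(features, weights, bias):
    """Logistic-regression probability that the candidate is a rhetorical chiasmus."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy embeddings and lemmas for illustration only.
vec = {"live": [1.0, 0.0], "living": [0.9, 0.1], "eat": [0.0, 1.0], "eating": [0.1, 0.9]}
lemma = {"live": "live", "living": "live", "eat": "eat", "eating": "eat"}
f = chiasmus_features("live", "eat", "eating", "living", vec, lemma)
print(round(score(f, weights=[2.0, 2.0, 1.0, 1.0], bias=-3.0), 3))
```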
Conference Paper
Ploke is a rhetorical device of lexical repetition, with multiple variations contingent on place of occurrence. It is widespread in all natural and artificial languages because it manages stability of reference and predication. Syllogisms, for instance, are heavily dependent on positional repetition. Ploke also influences the reader’s/hearer’s attention because of its appeal to neurocognitive affinities. A formal knowledge representation of ploke is therefore valuable for any AI/NLP system. This paper proposes an ontological model for ploke. We discuss components of different types of plokes and rhetorical figures in general, in terms of their form, their function, and the associated neurocognitive affinities that affect attention.
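Because the variants of ploke are distinguished by where the repeated word occurs, a toy classifier can simply branch on position. The following hypothetical sketch labels the repetition between two consecutive clauses; it does not reproduce the ontology described in the paper, the clause tokenization is assumed to happen upstream, and the decision order (symploce before anaphora or epiphora) is an assumption.

```python
def classify_repetition(prev, nxt):
    """Label positional repetition between two consecutive clauses.

    `prev` and `nxt` are lower-cased token lists. Returns a figure name or None.
    """
    if not prev or not nxt:
        return None
    same_start = prev[0] == nxt[0]
    same_end = prev[-1] == nxt[-1]
    if same_start and same_end:
        return "symploce"      # repetition at both start and end
    if same_start:
        return "anaphora"      # repetition at the start
    if same_end:
        return "epiphora"      # repetition at the end
    if prev[-1] == nxt[0]:
        return "anadiplosis"   # end of one clause starts the next
    return None


print(classify_repetition(["i", "came"], ["i", "saw"]))  # anaphora
```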
Article
The aim of this paper is to clarify the distinctive and the shared features of three phenomena: irony, understatement, and litotes. These rhetorical figures have been defined as synonymous, distinct, or overlapping in various accounts. This indicates an interrelation but also a need for clearer definitions. Here, each of these rhetorical figures is defined via two jointly necessary conditions. This approach sharpens the categories, enables clear-cut distinctions, and helps to explain cases of overlap. German corpus data and examples from the literature serve as a basis for differentiating between cases of understatement as a means of irony and cases of litotes as a means of understatement. Beyond that, litotes and understatement allow for non-ironic uses. Interestingly, litotic irony is built on litotic understatement. This is due to the overt contrast necessary for both understatement and irony.
Chapter
In recent years, widespread rumors and fake news have given rise to many social and political problems. Most information today is acquired from digital sources. In digital media, it is difficult to assign accountability for opinions, so the data received cannot be authenticated. The lack of constant supervision has motivated miscreants to spread fake information. Fake news articles planted in digital media share important linguistic features, such as immoderate use of unconfirmed hyperbole and unverified quotes. An automated mechanism is needed to identify fake news and to minimize its impact by restricting its spread. This survey comprehensively and systematically studies different methodologies for the detection of fake news in digital media. The survey identifies and specifies fundamental theories in Machine Learning to facilitate and enhance research on fake news detection. By understanding the different methodologies in fake news studies, we highlight some potential research gaps at the end of this survey.
Article
The frequent use of figurative language on online social networks, especially on Twitter, has the potential to mislead traditional sentiment analysis and recommender systems. Due to the extensive use of slang, bashes, flames, and non-literal texts, tweets are a great source of figurative language, such as sarcasm, irony, metaphor, simile, hyperbole, humor, and satire. Starting with a brief introduction of figurative language and its various categories, this article presents an in-depth survey of the state-of-the-art techniques for computational detection of seven different figurative language categories, mainly on Twitter. For each figurative language category, we present details about the characterizing features, datasets, and state-of-the-art computational detection approaches. Finally, we discuss open challenges and future directions of research for each figurative language category.
Article
This article contributes to our knowledge about the prosodic realisation of rhetorical questions (RQs) as compared to information-seeking questions (ISQs). It reports on a production experiment testing the prosody of English wh- and polar RQs and ISQs in a Canadian variety. In previous literature, the contribution of prosody to the distinction between the two illocution types has often been limited to the intonational realisation of the terminus of the utterance, i.e. whether it ends in a rise or a fall. Along with edge tones, we tested other phonological and phonetic parameters. Our results are as follows: (i) The intonational terminus was distinctive only for polar questions (rise vs plateau), not for wh-questions (low throughout). (ii) Moreover, the semantic difference between RQs and ISQs is signalled by pitch accents. It is reflected in nuclear pitch accent type for wh-questions, and accent type and position for polar questions. (iii) Phonetically, RQs are produced with longer constituent durations and, for wh-questions, a softer voice quality in the wh-word. Taken together, several intonational categories and phonetic parameters contribute to the distinction between RQs and ISQs. A simple distinction between rising and falling intonation is in any case insufficient. First view at: http://dx.doi.org/10.1017/S1360674319000157
Article
Many different text features influence text readability and content comprehension. Negation is commonly suggested as one such feature, but few general-purpose tools exist to discover negation and studies of the impact of negation on text readability are rare. In this paper, we introduce a new negation parser (NegAIT) for detecting morphological, sentential, and double negation. We evaluated the parser using a human-annotated gold standard containing 500 Wikipedia sentences and achieved 95%, 89%, and 67% precision with 100%, 80%, and 67% recall, respectively. We also investigate two applications of this new negation parser. First, we performed a corpus statistics study to demonstrate different negation usage in easy and difficult text. Negation usage was compared in six corpora: patient blogs (4K sentences), Cochrane reviews (91K sentences), PubMed abstracts (20K sentences), clinical trial texts (48K sentences), and English and Simple English Wikipedia articles for different medical topics (60K and 6K sentences). The most difficult text contained the least negation. However, when comparing negation types, difficult texts (i.e., Cochrane, PubMed, English Wikipedia, and clinical trials) contained significantly (p<.01) more morphological negations. Second, we conducted a predictive analytics study to show the importance of negation in distinguishing between easy and difficult text. Five binary classifiers (Naïve Bayes, SVM, decision tree, logistic regression, and linear regression) were trained using only negation information. All classifiers achieved better performance than the majority baseline. The Naïve Bayes classifier achieved the highest accuracy at 77% (9% higher than the majority baseline). NOTE: Download the parser here: http://nlp.lab.arizona.edu/content/resources
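The precision and recall pairs reported for NegAIT can be combined into a single F1 score per negation type, F1 = 2PR / (P + R). The short worked calculation below uses only the figures stated in the abstract; the F1 values themselves are derived here and are not reported by the authors.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# (precision, recall) per negation type, as stated in the abstract
reported = {"morphological": (0.95, 1.00), "sentential": (0.89, 0.80), "double": (0.67, 0.67)}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
# morphological: F1 = 0.974
# sentential: F1 = 0.843
# double: F1 = 0.670
```

The calculation makes the gap between types explicit: double negation, the category most relevant to litotes, is detected far less reliably than morphological negation.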
Article
(An updated version of this paper has been 'accepted with minor revisions' at ACM Computing Surveys journal) Automatic detection of sarcasm has witnessed interest from the sentiment analysis research community. With diverse approaches, datasets and analyses that have been reported, there is an essential need to have a collective understanding of the research in this area. In this survey of automatic sarcasm detection, we describe datasets, approaches (both supervised and rule-based), and trends in sarcasm detection research. We also present a research matrix that summarizes past work, and list pointers to future work.