Conference Paper

The Elephant in the Room: Ten Challenges of Computational Detection of Rhetorical Figures

Preprint
Rhetorical figures play an important role in our communication. They are used to convey subtle, implicit meaning or to emphasize statements, and they also appear in hate speech, fake news, and propaganda. By improving systems for the computational detection of rhetorical figures, we can also improve tasks such as hate speech and fake news detection, sentiment analysis, opinion mining, and argument mining. Unfortunately, there is a lack of annotated data, as well as of qualified annotators who could help build large corpora to train machine learning models for the detection of rhetorical figures. The situation is particularly difficult for languages other than English and for rhetorical figures other than metaphor, sarcasm, and irony. To overcome this issue, we develop a web application called "Find your Figure" that facilitates the identification and annotation of German rhetorical figures. The application is based on the German rhetorical ontology GRhOOT, which we have specially adapted for this purpose. In addition, we improve the user experience with Retrieval Augmented Generation (RAG). In this paper, we present the restructuring of the ontology, the development of the web application, and the built-in RAG pipeline. We also identify the optimal RAG settings for our application. Our approach is one of the first to use rhetorical ontologies in practice in combination with RAG, and it shows promising results.
Conference Paper
This paper deals with different methods, particularly statistical analysis and text mining, that support stylistic research. As an example, it examines the lexical and semantic features of meiosis and litotes in the novel The Catcher in the Rye by Jerome David Salinger. The examination was carried out in the programming language R, using R packages distributed by members of the community. To ensure high-quality research, the specific features of litotes and meiosis were explored carefully: the broad range of existing scholarly views is described, and from it we derive a general hypothesis about the typical linguistic patterns of meiosis and litotes. Using these insights, different text-mining tools can be applied in stylistic research. The paper describes in detail the creation of concordances, word frequencies, and sentiment analysis. For concordances, the concept of Key Word in Context (KWIC) is discussed, and the advantages of using concordances in stylistic research are presented. A possible application of statistical analysis to the study of litotes is proposed and discussed. Within sentiment analysis, we focus on negation and how it affects opinion orientation. The paper thus also aims to demonstrate the importance of litotes for sentiment analysis, since litotes are directly linked to the effects of negation. The results of each stage of the research are presented and discussed in detail.
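The Key Word in Context (KWIC) concordance mentioned above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `kwic`, the regex tokenizer, and the `width` parameter (number of context tokens on each side) are assumptions chosen for illustration, and the sample sentence is invented.

```python
import re


def kwic(text: str, keyword: str, width: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every match of keyword.

    Hypothetical helper: tokenization is a simple regex over word characters
    and apostrophes, and matching is case-insensitive on whole tokens.
    """
    tokens = re.findall(r"[\w']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Invented sample text containing negated evaluative phrases.
sample = ("It wasn't too bad. I'm not exactly crazy about it, "
          "but it's not bad at all, if you want to know the truth.")

for left, kw, right in kwic(sample, "not", width=3):
    # Right-align the left context so the keyword column lines up.
    print(f"{left:>25} [{kw}] {right}")
```

Aligning every occurrence of "not" this way makes negated evaluative phrases such as "not bad" easy to spot, which is the kind of pattern a search for litotes candidates would start from.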
Conference Paper
We study the usefulness of hateful metaphors as features for identifying the type and target of hate speech in Dutch Facebook comments. For this purpose, all hateful metaphors in the Dutch LiLaH corpus were annotated and interpreted in line with Conceptual Metaphor Theory and Critical Metaphor Analysis. We provide SVM and BERT/RoBERTa results and investigate the effect of different methods of encoding metaphor information on the accuracy of hate speech type and target detection. The results of the experiments show that hateful metaphor features improve model performance on both tasks. To our knowledge, this is the first investigation of the effectiveness of hateful metaphors as an information source for hate speech classification.
Article
In the last decade, the problem of computational metaphor processing has garnered immense attention in computational linguistics and cognition. A wide panorama of approaches, ranging from hand-coded rule systems to deep learning techniques, has been proposed to automate different aspects of metaphor processing. In this article, we systematically examine the major theoretical views on metaphor and present their classification. We discuss the existing literature to provide a concise yet representative picture of computational metaphor processing. We conclude the article with possible research directions.
Article
Analyzing the functional equivalence of an original text and its translation through rhythm equivalence is an important task of modern linguistics. The rhythm component is an integral part of functional equivalence, which cannot be achieved unless the translation conveys the rhythm figures of the text. To analyze rhythm figures in an original literary text and its translation, the authors developed the ProseRhythmDetector software tool, which finds and visualizes lexical and syntactic figures in English- and Russian-language prose: anaphora, epiphora, symploce, anadiplosis, epanalepsis, reduplication, epistrophe, polysyndeton, and aposiopesis. This work presents the results of testing ProseRhythmDetector on two works by English authors and their Russian translations: Ch. Brontë's “Villette” and I. Murdoch's “The Black Prince”. Based on the tool's output, the authors compared the rhythm figures of the original and translated texts with respect to both rhythm and context. The experiment made it possible to identify how the translator conveys features of the author's style, and to detect and explain mismatches of rhythm figures between the original and translated texts. ProseRhythmDetector significantly reduced the workload of linguistic experts by detecting lexical and syntactic figures automatically with fairly high precision (from 62% to 93%, depending on the figure).
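The abstract does not describe how ProseRhythmDetector finds its figures. As a rough illustration of why positional repetition figures are tractable, here is a minimal Python sketch that flags anaphora (adjacent sentences sharing their first word) and epiphora (adjacent sentences sharing their last word). The function names and the naive sentence splitter are hypothetical; a real tool would also need stop-word filtering, clause-level segmentation, and multi-word matching.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence: str) -> list[str]:
    # Lower-cased word tokens; punctuation is dropped.
    return re.findall(r"[\w']+", sentence.lower())


def find_anaphora_epiphora(text: str) -> list[tuple[str, int, str]]:
    """Flag adjacent sentence pairs sharing their first word (anaphora) or
    their last word (epiphora). Returns (figure, index of first sentence, word)."""
    sents = [words(s) for s in split_sentences(text)]
    found = []
    for i, (a, b) in enumerate(zip(sents, sents[1:])):
        if not a or not b:
            continue
        if a[0] == b[0]:
            found.append(("anaphora", i, a[0]))
        if a[-1] == b[-1]:
            found.append(("epiphora", i, a[-1]))
    return found


example = ("We shall fight on the beaches. We shall fight on the landing grounds. "
           "We shall never surrender.")
print(find_anaphora_epiphora(example))
```

Comparing only the first and last tokens keeps the sketch short. It also shows the false-positive risk: any two sentences that happen to start with "The" would be flagged, which is one reason reported precision varies between figures.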
Article
Social media provides a platform for seeking information from a large user base. Information seeking in social media, however, occurs alongside users expressing their viewpoints through statements. Rhetorical questions have the form of a question but serve the function of a statement, and they are an important tool users employ to express their viewpoints. Rhetorical questions might therefore mislead platforms that assist information seeking in social media. They are difficult to identify because they are not syntactically different from other questions. In this article, we develop a framework to identify rhetorical questions by modeling some of the motivations users have for posting them. Drawing on linguistic theories, we focus on two motivations: to implicitly convey a message and to modify the strength of a previously made statement. From these motivations, we develop a quantitative framework to identify rhetorical questions in social media. We evaluate the framework on two datasets of questions posted on the social media platform Twitter and demonstrate its effectiveness in identifying rhetorical questions. To the best of our knowledge, this is the first framework that models possible motivations for posting rhetorical questions in order to identify them on social media platforms.
Article
The generalised, automated reconstruction of the reasoning structures underlying persuasive communication is an enormously challenging task. While work in argument mining is increasingly informed by the rich tradition of argumentation studies outside the computational field, the rhetorical perspective on argumentation has thus far been largely ignored. To explore the application of rhetorical insights in argument mining, we conduct a pilot study on the connection between rhetorical figures and argumentation structure. Rhetorical figures are linguistic devices that perform a variety of functions in argumentative discourse. The textual form of some of these figures is easy to identify automatically, so an established connection between such a figure and a preponderance of argumentative content would improve the performance of argument mining techniques. Furthermore, the automated mining of rhetorical figures could serve as an empirical, corpus-based testing ground for the claims made about these figures in the rhetorical literature. In the pilot study, we explore the connection between argumentation structure and eight rhetorical figures whose forms we expect to be relatively easy to identify computationally: the six schemes 'anadiplosis', 'epanaphora', 'epistrophe', 'epizeuxis', 'eutrepismus', and 'polyptoton', and the two tropes 'antithesis' and 'dirimens copulatio'. We relate their occurrences to relations of inference and conflict. The data for the study come from the MM2012c corpus, which contains 39,694 words of argumentatively annotated transcripts from BBC Radio 4's Moral Maze discussion program. We show that some of the figures indeed correspond to passages of high argumentative density, relative to the text as a whole.
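To illustrate why schemes such as epizeuxis and anadiplosis count as "relatively easy to identify computationally", here is a minimal Python sketch of their surface patterns. It is not the detection method used in the pilot study. The function names are hypothetical, and clause boundaries are approximated by punctuation. Epizeuxis is treated as immediate repetition of a word, and anadiplosis as the last word of one clause reappearing as the first word of the next. The example sentences are illustrative only.

```python
import re


def tokens(text: str) -> list[str]:
    # Lower-cased word tokens; punctuation is dropped.
    return re.findall(r"[\w']+", text.lower())


def find_epizeuxis(text: str) -> list[tuple[int, str]]:
    """Token positions where a word is immediately repeated, e.g. 'never, never'."""
    toks = tokens(text)
    return [(i, toks[i]) for i in range(len(toks) - 1) if toks[i] == toks[i + 1]]


def find_anadiplosis(text: str) -> list[str]:
    """Words that end one clause and begin the next (clauses split on punctuation)."""
    clauses = [tokens(c) for c in re.split(r"[,;:.!?]", text)]
    clauses = [c for c in clauses if c]  # drop empty pieces after final punctuation
    return [a[-1] for a, b in zip(clauses, clauses[1:]) if a[-1] == b[0]]


print(find_epizeuxis("Never, never, never give up."))
print(find_anadiplosis("Fear leads to anger; anger leads to hate; hate leads to suffering."))
```

Such pattern matchers overgenerate. For example, "that that" in "I know that that is true" would be flagged as epizeuxis. This is why matches still have to be checked against annotations before any claim about argumentative density can be made.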
Article
This paper surveys the ontological modeling of rhetorical concepts developed for use in argument mining and other applications of computational rhetoric, and projects their future directions. We include ontological models of argument schemes applying Rhetorical Structure Theory (RST); the RhetFig proposal for modeling; the related RetFig Ontology of Rhetorical Figures for Serbian (developed by two of the authors); and the Lassoing Rhetoric project (developed by another of the authors). The Lassoing Rhetoric project is interesting for its multifaceted approach to linguistic devices, prominently including rhetorical figures, but also RST relations and stylistic models such as the use of the historic present. This application takes natural language text as input and uses syntactic parsing tools to produce a knowledge base of linguistic entities with references to an OWL ontological framework, locating these devices with Semantic Web Rule Language (SWRL) logic rules. The paper also reports on a similar approach used to detect ironic tweets in a Serbian Twitter corpus. The rhetorical schemes used for argument mining are also presented, along with some suggestions for novel argument schemes based on the ontological approach to rhetorical figuration.
Conference Paper
The paper presents RetFig, a formal domain ontology of rhetorical figures for Serbian. This ontology is one of the necessary steps in developing Natural Language Processing tools for the Serbian language, especially tools for discourse analysis, sentiment analysis, and opinion mining. The RetFig ontology was developed taking into account the plethora of rhetorical figures in the morphologically rich Serbian language, as well as the various existing classifications of rhetorical figures. We propose a system of linguistic classes and properties best suited to this ontology, as well as some possible uses of this particular ontology of rhetorical figures. Keywords: domain ontology, rhetorical figures, Semantic web
Article
Metaphor is highly frequent in language, which makes its computational processing indispensable for real-world NLP applications addressing semantic tasks. Previous approaches to metaphor modeling rely on task-specific hand-coded knowledge and operate on a limited domain or a subset of phenomena. We present the first integrated open-domain statistical model of metaphor processing in unrestricted text. Our method first identifies metaphorical expressions in running text and then replaces them with literal paraphrases. Such a text-to-text model of metaphor interpretation is compatible with other NLP applications that can benefit from metaphor resolution. Our approach is minimally supervised, relies on state-of-the-art parsing and lexical acquisition technologies (distributional clustering and selectional preference induction), and operates with high accuracy.
Conference Paper
GRhOOT, the German RhetOrical OnTology, is a domain ontology of 110 rhetorical figures in the German language. The goal of building an ontology of rhetorical figures in German is not only the formal representation of different rhetorical figures but also to facilitate their detection, thus improving sentiment analysis, argument mining, detection of hate speech and fake news, machine translation, and many other tasks in which recognizing non-literal language plays an important role. The challenge of building such ontologies lies in classifying the figures and assigning adequate characteristics to group them while considering their distinctive features. The ontology of rhetorical figures in the Serbian language served as the basis for our work. Besides transferring and extending the concepts of the Serbian ontology, we ensured completeness and consistency by using description logic and SPARQL queries. Furthermore, we present a decision tree for identifying figures and suggest a usage scenario showing how the ontology can be used to collect and annotate data.
Conference Paper
Automatic detection of stylistic devices is an important tool for literary studies, e.g., for stylometric analysis or argument mining. A particularly striking device is the rhetorical figure called chiasmus, which involves the inversion of semantically or syntactically related words. Existing works focus on a special case of chiasmi that involve identical words in an A B B A pattern, so-called antimetaboles. In contrast, we propose an approach targeting the more general and challenging case A B B’ A’, where the words A, A’ and B, B’ constituting the chiasmus do not need to be identical but just related in meaning. To this end, we generalize the established candidate phrase mining strategy from antimetaboles to general chiasmi and propose novel features based on word embeddings and lemmata for capturing both semantic and syntactic information. These features serve as input for a logistic regression classifier, which learns to distinguish between rhetorical chiasmi and coincidental chiastic word orders without special meaning. We evaluate our approach on two datasets consisting of classical German dramas, four texts with annotated chiasmi and 500 unannotated texts. Compared to previous methods for chiasmus detection, our novel features improve the average precision from 17% to 28% and the precision among the top 100 results from 13% to 35%.
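The candidate-mining step that the chiasmus approach generalizes can be sketched for the simpler antimetabole case (identical words in an A B B A pattern). This Python sketch is not the authors' code. The function name and the `max_span` window are hypothetical, and the general A B B' A' case from the paper would additionally need lemmata or embeddings to relate A to A' and B to B'.

```python
import re


def antimetabole_candidates(sentence: str, max_span: int = 30) -> list[tuple[str, str]]:
    """Mine (A, B) pairs whose tokens occur in the order A ... B ... B ... A.

    Only identical lower-cased word forms are matched (antimetabole). The
    max_span window limits how far apart the two A tokens may be.
    """
    toks = re.findall(r"[\w']+", sentence.lower())
    n = len(toks)
    found = set()
    for i in range(n):
        # m is the position of the closing A; leave room for B ... B in between.
        for m in range(i + 3, min(n, i + max_span)):
            if toks[m] != toks[i]:
                continue
            for j in range(i + 1, m - 1):
                for k in range(j + 1, m):
                    # B must repeat inside the A ... A frame and differ from A.
                    if toks[j] == toks[k] and toks[j] != toks[i]:
                        found.add((toks[i], toks[j]))
    return sorted(found)


line = "Ask not what your country can do for you, ask what you can do for your country."
cands = antimetabole_candidates(line)
print(len(cands), cands)
```

On Kennedy's well-known line, the rhetorically meaningful pair ("country", "you") is found, but so are ten coincidental pairs such as ("your", "can"). That overgeneration is why the paper feeds candidates to a classifier that separates rhetorical chiasmi from chance chiastic word orders.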
Conference Paper
Ploke is a rhetorical device of lexical repetition, with multiple variations contingent on place of occurrence. It is widespread in all natural and artificial languages because it manages stability of reference and predication. Syllogisms, for instance, are heavily dependent on positional repetition. Ploke also influences the reader’s/hearer’s attention because of its appeal to neurocognitive affinities. A formal knowledge representation of ploke is therefore valuable for any AI/NLP system. This paper proposes an ontological model for ploke. We discuss components of different types of plokes and rhetorical figures in general, in terms of their form, their function, and the associated neurocognitive affinities that affect attention.
Conference Paper
Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been investigated only for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG): models that combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations: one that conditions on the same retrieved passages across the whole generated sequence, and another that can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
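The retrieval half of a RAG pipeline can be sketched with the standard library alone. This is a simplification, not the paper's setup. Bag-of-words cosine similarity stands in for the pre-trained dense neural retriever, a softmax over the top-k scores stands in for the retrieval distribution over passages, and the seq2seq generator is omitted: the sketch only assembles the augmented prompt it would receive. All function names and the toy passages are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    # Bag-of-words vector: lower-cased token counts (stand-in for a dense embedding).
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Return the top-k passages with softmax-normalised weights over their scores."""
    q = bow(query)
    scored = sorted(((cosine(q, bow(p)), p) for p in passages), reverse=True)[:k]
    z = sum(math.exp(s) for s, _ in scored)
    return [(p, math.exp(s) / z) for s, p in scored]


def build_prompt(query: str, retrieved: list[tuple[str, float]]) -> str:
    # The generator (not shown) would condition on this retrieved context.
    context = "\n".join(f"- {p}" for p, _ in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


passages = [
    "Anaphora repeats a word or phrase at the beginning of successive clauses.",
    "Epiphora repeats a word or phrase at the end of successive clauses.",
    "Chiasmus reverses the order of related words in two parallel phrases.",
]
query = "Which figure repeats words at the beginning of clauses?"

for passage, weight in retrieve(query, passages, k=2):
    print(f"{weight:.2f} {passage}")
print(build_prompt(query, retrieve(query, passages, k=2)))
```

Swapping `bow` and `cosine` for a trained encoder and an inner-product index would bring this closer to the dense retriever described above. The prompt-assembly step would stay the same.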
Article
(An updated version of this paper has been 'accepted with minor revisions' at ACM Computing Surveys journal) Automatic detection of sarcasm has witnessed interest from the sentiment analysis research community. With diverse approaches, datasets and analyses that have been reported, there is an essential need to have a collective understanding of the research in this area. In this survey of automatic sarcasm detection, we describe datasets, approaches (both supervised and rule-based), and trends in sarcasm detection research. We also present a research matrix that summarizes past work, and list pointers to future work.
Article
Irony is a fundamental rhetorical device. It is a uniquely human mode of communication, curious in that the speaker says something other than what he or she intends. Recently, computationally detecting irony has attracted attention from the natural language processing (NLP) and machine learning (ML) communities. While some progress has been made toward this end, I argue that current machine learning methods rely too heavily on shallow, unstructured, syntactic modeling of text to consistently discern ironic intent. Irony detection is an interesting machine learning problem because, in contrast to most text classification tasks, it requires a semantics that cannot be inferred directly from word counts over documents alone. To support this position, I survey the large body of existing philosophical/literary work investigating ironic communication. I then survey more recent computational efforts to operationalize irony detection in the fields of NLP and ML. I identify where the latter diverge from the former. Specifically, I highlight a major conceptual problem in all existing computational models of irony: none maintain an explicit model of the speaker/environment. I argue that without such an internal model of the speaker, irony detection is hopeless, as this model is necessary to represent expectations, which play a key role in ironic communication. I sketch possible means of embedding such models into computational approaches to irony detection. In particular, I introduce the pragmatic context model, which aims to computationally operationalize existing theories of irony. This work is a step toward unifying work on irony from literary, empirical and philosophical perspectives with modern computational models.
Multilingual domain ontologies of rhetorical figures and their applications
Ramona Kühn and Jelena Mitrović. 2023. Multilingual domain ontologies of rhetorical figures and their applications. In UniDive 1st General Meeting.
Status Quo der Entwicklungen von Ontologien Rhetorischer Figuren in Englisch, Deutsch und Serbisch
Ramona Kühn and Jelena Mitrović. 2024. Status Quo der Entwicklungen von Ontologien Rhetorischer Figuren in Englisch, Deutsch und Serbisch. In Book of Abstracts - DHd2024. Zenodo.
Hidden in plain sight: Can German Wiktionary and wordnets facilitate the detection of antithesis?
Ramona Kühn, Jelena Mitrović, and Michael Granitzer. 2023. Hidden in plain sight: Can German Wiktionary and wordnets facilitate the detection of antithesis? In Proceedings of the 12th Global Wordnet Conference, pages 106-116.
Using pre-trained language models in an end-to-end pipeline for antithesis detection
Ramona Kühn, Khouloud Saadi, Jelena Mitrović, and Michael Granitzer. 2024. Using pre-trained language models in an end-to-end pipeline for antithesis detection. In Proceedings of the 14th Language Resources and Evaluation Conference. European Language Resources Association.
Automatic detection of zeugma
Helena Medková. 2020. Automatic detection of zeugma. In RASLAN, pages 79-86.
Kelabteam: A statistical approach on figurative language sentiment analysis in Twitter
Hoang Long Nguyen, Trung Duc Nguyen, Dosam Hwang, and Jason J Jung. 2015. Kelabteam: A statistical approach on figurative language sentiment analysis in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 679-683.
Sarcasm detection: A comparative study
Hamed Yaghoobian, Hamid R Arabnia, and Khaled Rasheed. 2021. Sarcasm detection: A comparative study. arXiv preprint arXiv:2107.02276.
Configure: Exploring discourse-level Chinese figures of speech
Dawei Zhu, Qiusi Zhan, Zhejian Zhou, Yifan Song, Jiebin Zhang, and Sujian Li. 2022. Configure: Exploring discourse-level Chinese figures of speech. arXiv preprint arXiv:2209.07678.