Lucie Flek's research while affiliated with Philipps University of Marburg and other places

Publications (25)

Chapter
With synthetic data generation, the required amount of human-generated training data can be reduced significantly. In this work, we explore the usage of automatic paraphrasing models such as GPT-2 and CVAE to augment template phrases for task-oriented dialogue systems while preserving the slots. Additionally, we systematically analyze how far manua...
Preprint
Clinical NLP tasks, such as mental health assessment from text, must take social constraints into account: performance maximization must be constrained by the utmost importance of guaranteeing privacy of user data. Consumer protection regulations, such as GDPR, generally handle privacy by restricting data availability, such as requiring to limi...
Preprint
Studies on interpersonal conflict have a long history and contain many suggestions for conflict typology. We use this as the basis of a novel annotation scheme and release a new dataset of situations and conflict aspect annotations. We then build a classifier to predict whether someone will perceive the actions of one individual as right or wrong i...
Preprint
Large pre-trained neural language models have supported the effectiveness of many NLP tasks, yet are still prone to generating toxic language hindering the safety of their use. Using empathetic data, we improve over recent work on controllable text generation that aims to reduce the toxicity of generated text. We find we are able to dramatically re...
Preprint
Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID, monitoring political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users wit...
Preprint
Medical diagnosis is the process of making a prediction of the disease a patient is likely to have, given a set of symptoms and observations. This requires extensive expert knowledge, in particular when covering a large variety of diseases. Such knowledge can be coded in a knowledge graph -- encompassing diseases, symptoms, and diagnosis paths. Sin...
Preprint
There is an increasing need for the ability to model fine-grained opinion shifts of social media users, as concerns about the potential polarizing social effects increase. However, the lack of publicly available datasets that are suitable for the task presents a major challenge. In this paper, we introduce an innovative annotated dataset for modeli...
Preprint
The performance cost of differential privacy has, for some applications, been shown to be higher for minority groups; fairness, conversely, has been shown to disproportionally compromise the privacy of members of such groups. Most work in this area has been restricted to computer vision and risk assessment. In this paper, we evaluate the impact of...
Preprint
We introduce the problem of proficiency modeling: Given a user's posts on a social media platform, the task is to identify the subset of posts or topics for which the user has some level of proficiency. This enables the filtering and ranking of social media posts on a given topic as per user proficiency. Unlike experts on a given topic, proficient...
Preprint
Existing sarcasm detection systems focus on exploiting linguistic markers, context, or user-level priors. However, social studies suggest that the relationship between the author and the audience can be equally relevant for the sarcasm usage and interpretation. In this work, we propose a framework jointly leveraging (1) a user context from their hi...
Chapter
Natural Language Generation (NLG) has received much attention with rapidly developing models and ever-more available data. As a result, a growing amount of work attempts to personalize these systems for better human interaction experience. Still, diverse sets of research across multiple dimensions and numerous levels of depth exist and are scattere...

Citations

... Based on such reasoning, they argued that since human annotators need context, contextual information should also be provided to the machine learning algorithms. Many subsequent studies were encouraged by the findings of Wallace et al. [145] to include context in their sarcasm detection algorithms ([146], [102], [147]). We came across several architectures that took context into account. ...
... State-of-the-art models for this task typically employ large contextual models (Matero et al. 2019; Losada, Crestani, and Parapar 2019; Zirikly et al. 2019). Leveraging additional data, such as a user's history on social media, augments the predictive power (Zirikly et al. 2019; Sawhney et al. 2021). However, like every machine learning model, these can be prone to learning undesired data artifacts, for example users mentioning certain locations, per- ...
... The recent advances in graph representation learning (Wu et al., 2021) in various domains provide a promising, under-explored research direction in the context of fake news spreader detection. More specifically, Graph Attention Networks (GAT) (Veličković et al., 2018) have achieved state-of-the-art results in various natural language processing tasks (Plepi and Flek, 2021; Sawhney et al., 2021; Kacupaj et al., 2021; Ren and Zhang, 2020). However, this method has not been explored on user graphs in the context of fake news spreader detection. ...
... The plausible extensions include the inclusion of more affective phenomena correlated with hate speech, such as sarcasm/irony [53], "big five" personality traits [54], and emotion role labeling [55]. ...