
Dumitru-Clementin Cercel
Polytechnic University of Bucharest | UPB · Faculty of Automatic Control and System Engineering
50
Publications
6,812
Reads
193
Citations
Publications (50)
In recent times, the detection of hate-speech, offensive, or abusive language in online media has become an important topic in NLP research due to the exponential growth of social media and the propagation of such messages, as well as their impact. Misogyny detection, even though it plays an important part in hate-speech detection, has not received...
Complex word identification (CWI) is a cornerstone of proper text simplification. CWI is highly dependent on context, and its difficulty is compounded by the scarcity of available datasets, which vary greatly in terms of domains and languages. As such, it becomes increasingly difficult to develop a robust model that generalizes...
As transfer learning from large-scale pre-trained language models has become prevalent in Natural Language Processing, running these models in computationally constrained environments remains a challenging problem that has yet to be addressed. Several solutions, including knowledge distillation, network quantization, or network pruning, have been proposed; however,...
The 2020 outbreak of the coronavirus pandemic generated a wave of rumours, misinformation, and conspiracy theories; these theories and uninformed speculations gained significant traction through social media platforms. In this paper, we focus on a particular conspiracy theory, related to the unfounded connection between 5G networks and the spread of C...
Dialect identification is a task with applicability in a vast array of domains, ranging from automatic speech recognition to opinion mining. This work presents our architectures used for the VarDial 2021 Romanian Dialect Identification subtask. We introduced a series of solutions based on Romanian or multilingual Transformers, as well as adversaria...
The real-world impact of polarization and toxicity in the online sphere marked the end of 2020 and the beginning of this year in a negative way. SemEval-2021 Task 5, Toxic Spans Detection, is based on a novel annotation of a subset of the Jigsaw Unintended Bias dataset and is the first language toxicity detection task dedicated to identifying the...
Reading is a complex process which requires proper understanding of texts in order to create coherent mental representations. However, comprehension problems may arise due to hard-to-understand sections, which can prove troublesome for readers, while accounting for their specific language skills. As such, steps towards simplifying these sections ca...
Detecting humor is a challenging task since words might share multiple valences and, depending on the context, the same words can even be used in offensive expressions. Neural network architectures based on the Transformer obtain state-of-the-art results on several Natural Language Processing tasks, especially text classification. Adversarial learning,...
Extracting semantic information on measurements and counts is an important topic in terms of analyzing scientific discourses. The 8th task of SemEval-2021: Counts and Measurements (MeasEval) aimed to boost research in this direction by providing a new dataset on which participants train their models to extract meaningful information on measurements...
Certain events or political situations prompt online users to express themselves through different modalities. One of these is the Internet meme, which combines text with a representative image to convey a wide range of emotions, from humor to sarcasm and even hate. In this paper, we describe our approach for the...
Dialect identification represents a key aspect for improving a series of tasks, such as opinion mining, considering that the location of the speaker can greatly influence the attitude towards a subject. In this work, we describe the systems developed by our team for VarDial 2020: Romanian Dialect Identification, a task specifically created for chal...
Financial causality detection is centered on identifying connections between different assets from financial news in order to improve trading strategies. FinCausal 2020 (Causality Identification in Financial Documents) is a competition that aims to boost results in financial causality by obtaining an explanation of how different individual events or...
This paper describes our models for the Moldavian vs. Romanian Cross-Topic Identification (MRC) evaluation campaign, part of the VarDial 2019 workshop. We focus on the three subtasks for MRC: binary classification between the Moldavian (MD) and the Romanian (RO) dialects and two cross-dialect multi-class classifications between six news topics, MD...
The aim of this study is to detect flooding events by analyzing both texts published by African online news outlets and the accompanying article images. The data is provided by MediaEval 2019 within the Multimedia Satellite Task. Our contributions are related to the image- and text-based subtasks. In order to solve the required classification...
Aggressiveness and several other related problems, such as hate speech, offensive language, or harassment, are experiencing a growing online presence in the context of contemporary social media platforms. The research efforts towards detecting, isolating, and stopping these disturbing behaviors have intensified, in tight relation with the increasin...
Offensive language detection is one of the most challenging problems in the natural language processing field, made pressing by the rising presence of this phenomenon in online social media. This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages (i.e., English, Arabic, Danish, Greek, and...
Complex Word Identification (CWI) is a task centered on detecting hard-to-understand words, or groups of words, in texts from different areas of expertise. The purpose of CWI is to highlight problematic structures that non-native speakers would usually find difficult to understand. Our approach uses zero-shot, one-shot, and few-shot learning techni...
Manipulative and misleading news has become a commodity for some online news outlets, and such news has gained a significant impact on the global mindset of people. Propaganda is a frequently employed manipulation method whose goal is to influence readers by spreading ideas meant to distort or manipulate their opinions. This paper describes our...
This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks (DeftEval). This competition consists of three subtasks with different levels of granularity: (1) classification of sentences as definitional or non-definitional, (2) labeling of definitional sentences, and (3) rela...
Online users can create different ways of expressing their thoughts, opinions, or conception of amusement. Internet memes were created specifically for these situations. Their main purpose is to transmit ideas by using combinations of images and texts such that they will create a certain state in the recipient, depending on the...
Sentiment analysis is a process widely used in opinion mining campaigns conducted today. It has applications in a variety of fields, especially in collecting information related to the attitude or satisfaction of users concerning a particular subject. However, the task of managing such a process becomes noticeably more difficult w...
The overwhelming amount of online text information available today has increased the need for more research on its automatic summarization. In this work, we describe our participation in GermEval-2020, Task 3: German Text Summarization. We compare two BERT-based metrics, Sentence-BERT and BERTScore, to automatically evaluate the quality of summar...
In this paper, we describe our participation in GermEval-2019 Task 2, which requires identifying and classifying offensive content in German tweets. For all three challenging subtasks, i.e. i) Subtask 1, a binary classification between Offensive and Non-Offensive tweets, ii) Subtask 2, a fine-grained classification into three different categories: Pr...
In this paper, we focus on the Natural Language Processing (NLP) techniques that influence the precision of the opinion mining results. We analyze the challenges in opinion mining from an NLP perspective in order to describe a method with better precision. In this way, we select the different NLP techniques that can be used in opini...
This paper treats the phenomenon of opinion influence in online forum threads. Influence among users' opinions is analyzed by taking into consideration the changes in their opinions. Therefore, a change in a user's opinion is modeled as a change of his/her posts' polarity. The hypothesis that underlies our research is that users' opinions may chang...
Online discussions such as forums are very popular and enable participants to read other users’ previous interventions and also to express their own opinions on various subjects of interest. In online discussion forums, there is often a mixture of positive and negative opinions because users may have similar or conflicting opinions on the same subj...
In recent years, opinion propagation in online social networks has become a widespread phenomenon. There are several applications of this phenomenon such as viral marketing and election campaigns, and thus the detection of opinion propagation is a topical issue. Some models have been proposed in order to explain this phenomenon. In this paper, we p...
Abstract. Part-of-speech tagging (POS tagging) is the process of grammatically labeling each word in a clause, sentence, or paragraph with its corresponding part of speech. This process is a component of other natural language processing applications and, therefore, the results must be...