Dumitru-Clementin Cercel

Polytechnic University of Bucharest | UPB · Faculty of Automatic Control and System Engineering

About

Publications: 50
Reads: 6,812
Citations: 193
Citations since 2017: 187 (43 research items)
[Chart: yearly counts, x-axis 2017 to 2023, y-axis 0 to 80]

Publications (50)
Preprint
Full-text available
In recent times, the detection of hate-speech, offensive, or abusive language in online media has become an important topic in NLP research due to the exponential growth of social media and the propagation of such messages, as well as their impact. Misogyny detection, even though it plays an important part in hate-speech detection, has not received...
Preprint
Full-text available
Complex word identification (CWI) is a cornerstone process towards proper text simplification. CWI is highly dependent on context, and its difficulty is compounded by the scarcity of available datasets, which vary greatly in terms of domains and languages. As such, it becomes increasingly difficult to develop a robust model that generalizes...
Preprint
Full-text available
As transfer learning from large-scale pre-trained language models has become prevalent in Natural Language Processing, running these models in computationally constrained environments remains a challenging problem that has yet to be addressed. Several solutions, including knowledge distillation, network quantization, and network pruning, have been proposed; however,...
Article
Full-text available
The 2020 outbreak of the coronavirus pandemic generated a wave of rumours, misinformation, and conspiracy theories; these theories and uninformed speculations gained significant traction through social media platforms. In this paper, we focus on a particular conspiracy theory, related to the unfounded connection between 5G networks and the spread of C...
Conference Paper
Full-text available
Dialect identification is a task with applicability in a vast array of domains, ranging from automatic speech recognition to opinion mining. This work presents our architectures used for the VarDial 2021 Romanian Dialect Identification subtask. We introduced a series of solutions based on Romanian or multilingual Transformers, as well as adversaria...
Preprint
Full-text available
The real-world impact of polarization and toxicity in the online sphere marked the end of 2020 and the beginning of this year in a negative way. SemEval-2021 Task 5, Toxic Spans Detection, is based on a novel annotation of a subset of the Jigsaw Unintended Bias dataset and is the first language toxicity detection task dedicated to identifying the...
Preprint
Full-text available
Reading is a complex process which requires proper understanding of texts in order to create coherent mental representations. However, comprehension problems may arise due to hard-to-understand sections, which can prove troublesome for readers, while accounting for their specific language skills. As such, steps towards simplifying these sections ca...
Preprint
Full-text available
Detecting humor is a challenging task since words might share multiple valences and, depending on the context, the same words can even be used in offensive expressions. Neural network architectures based on the Transformer obtain state-of-the-art results on several Natural Language Processing tasks, especially text classification. Adversarial learning,...
Preprint
Full-text available
Extracting semantic information on measurements and counts is an important topic in terms of analyzing scientific discourses. The 8th task of SemEval-2021: Counts and Measurements (MeasEval) aimed to boost research in this direction by providing a new dataset on which participants train their models to extract meaningful information on measurements...
Conference Paper
Full-text available
Certain events or political situations prompt online users to express themselves through different modalities. One of them is the Internet meme, which combines text with a representative image to convey a wide range of emotions, from humor to sarcasm and even hate. In this paper, we describe our approach for the...
Conference Paper
Full-text available
Dialect identification represents a key aspect for improving a series of tasks, such as opinion mining, considering that the location of the speaker can greatly influence the attitude towards a subject. In this work, we describe the systems developed by our team for VarDial 2020: Romanian Dialect Identification, a task specifically created for chal...
Conference Paper
Full-text available
Financial causality detection is centered on identifying connections between different assets from financial news in order to improve trading strategies. FinCausal 2020 (Causality Identification in Financial Documents) is a competition that aims to boost results in financial causality by obtaining an explanation of how different individual events or...
Article
Full-text available
This paper describes our models for the Moldavian vs. Romanian Cross-Topic Identification (MRC) evaluation campaign, part of the VarDial 2019 workshop. We focus on the three subtasks for MRC: binary classification between the Moldavian (MD) and the Romanian (RO) dialects and two cross-dialect multi-class classifications between six news topics, MD...
Conference Paper
Full-text available
The aim of this study is to detect flooding events by analyzing both texts published by African online news outlets and the accompanying article images. The data is provided by MediaEval 2019 within the Multimedia Satellite Task. Our contributions are related to the image- and text-based subtasks. In order to solve the required classification...
Conference Paper
Full-text available
Aggressiveness and several other related problems, such as hate speech, offensive language, or harassment, are experiencing a growing online presence in the context of contemporary social media platforms. The research efforts towards detecting, isolating, and stopping these disturbing behaviors have intensified, in tight relation with the increasin...
Preprint
Full-text available
Offensive language detection is one of the most challenging problems in the natural language processing field, driven by the rising presence of this phenomenon in online social media. This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages (i.e., English, Arabic, Danish, Greek, and...
Preprint
Full-text available
Complex Word Identification (CWI) is a task centered on detecting hard-to-understand words, or groups of words, in texts from different areas of expertise. The purpose of CWI is to highlight problematic structures that non-native speakers would usually find difficult to understand. Our approach uses zero-shot, one-shot, and few-shot learning techni...
Preprint
Full-text available
Manipulative and misleading news has become a commodity for some online news outlets, and this news has gained a significant impact on the global mindset of people. Propaganda is a frequently employed manipulation method whose goal is to influence readers by spreading ideas meant to distort or manipulate their opinions. This paper describes our...
Preprint
Full-text available
This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks (DeftEval). This competition consists of three subtasks with different levels of granularity: (1) classification of sentences as definitional or non-definitional, (2) labeling of definitional sentences, and (3) rela...
Preprint
Full-text available
Online users can create different ways of expressing their thoughts, opinions, or conception of amusement. Internet memes were created specifically for these situations. Their main purpose is to transmit ideas by using combinations of images and texts such that they will create a certain state in the recipient, depending on the...
Preprint
Full-text available
Sentiment analysis is a process widely used in opinion mining campaigns conducted today. It has applications in a variety of fields, especially in collecting information related to the attitude or satisfaction of users concerning a particular subject. However, the task of managing such a process becomes noticeably more difficult w...
Preprint
Full-text available
Manipulative and misleading news has become a commodity for some online news outlets, and this news has gained a significant impact on the global mindset of people. Propaganda is a frequently employed manipulation method whose goal is to influence readers by spreading ideas meant to distort or manipulate their opinions. This paper describes our...
Conference Paper
Full-text available
The overwhelming amount of online text information available today has increased the need for more research on its automatic summarization. In this work, we describe our participation in GermEval-2020, Task 3: German Text Summarization. We compare two BERT-based metrics, Sentence-BERT and BERTScore, to automatically evaluate the quality of summar...
Conference Paper
Full-text available
In this paper, we describe our participation in GermEval-2019 Task 2, which requires identifying and classifying offensive content in German tweets. For all three challenging subtasks, i.e. i) Subtask 1, a binary classification between Offensive and Non-Offensive tweets, ii) Subtask 2, a fine-grained classification into three different categories: Pr...
Article
Full-text available
In this paper, we focus on the Natural Language Processing (NLP) techniques that influence the precision of opinion mining results. We analyze the challenges in opinion mining from an NLP perspective in order to describe a method that yields more precise results. In this way, we select the different NLP techniques that can be used in opini...
Article
Full-text available
This paper treats the phenomenon of opinion influence in online forum threads. Influence among users' opinions is analyzed by taking into consideration the changes in their opinions. Therefore, a change in a user's opinion is modeled as a change of his/her posts' polarity. The hypothesis that underlies our research is that users' opinions may chang...
Conference Paper
Full-text available
Online discussions such as forums are very popular and enable participants to read other users’ previous interventions and also to express their own opinions on various subjects of interest. In online discussion forums, there is often a mixture of positive and negative opinions because users may have similar or conflicting opinions on the same subj...
Article
Full-text available
In recent years, opinion propagation in online social networks has become a widespread phenomenon. There are several applications of this phenomenon such as viral marketing and election campaigns, and thus the detection of opinion propagation is a topical issue. Some models have been proposed in order to explain this phenomenon. In this paper, we p...
Article
Full-text available
Abstract. Part-of-speech (POS) tagging is the process of grammatically labeling each word in a clause, sentence, or paragraph with its corresponding part of speech. This process is a component of other natural language processing applications and, therefore, the results must be...
