Paolo Rosso's research while affiliated with Universitat Politècnica de València and other places
Publications (577)
Detecting offensive language in under-resourced languages presents a significant real-world challenge for social media platforms. This paper is the first work focused on the issue of offensive language detection in Arabizi, an under-explored topic in an under-resourced form of Arabic. For the first time, a comprehensive and critical overview of the...
This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically gathered a total of 49 available datasets and their corresponding papers and conducted a thorough analysis, considering 16 criteria across three primary dimensions: content, annotation process, and reusability. This analysis enabled us to i...
Bias in news search engines has been shown to influence users' perceptions of a news topic and contribute to the polarisation of society. As a result, there is a need for news search engines that increase user awareness of biases in the search results. While technical approaches have been developed to mitigate biases in search, very few studies hav...
The paper gives a brief overview of the four shared tasks organized at the PAN 2023 lab on digital text forensics and stylometry, hosted at the CLEF 2023 conference. The tasks include authorship verification across discourse types, multi-author writing style analysis, profiling cryptocurrency influencers with few-shot learning, and trig...
Linguistic literature on irony discusses sarcasm as a form of irony characterized by its biting nature and the intention to mock a victim. This particular trait makes sarcasm apt to convey hate speech and not only humour. Previous work on abusive language has stressed the need to address ironic language so that systems correctly recognize hate s...
This work studies the generalization capabilities of supervised Machine-Generated Text (MGT) detectors across model families and parameter scales of text generation models. In addition, we explore the feasibility of identifying the family and scale of the generator behind an MGT, instead of attributing the text to a particular language model. We le...
In recent years, the rapid increase in the dissemination of offensive and discriminatory material aimed at women through social media platforms has emerged as a significant concern. This trend has had adverse effects on women’s well-being and their ability to freely express themselves. The EXIST campaign has been promoting research in online sexism...
This paper proposes a novel approach to mitigate the negative transfer problem. In the field of machine learning, the common strategy is to apply the Single-Task Learning approach in order to train a supervised model to solve a specific task. Training a robust model requires a lot of data and a significant amount of computational resources, makin...
Internet and social media have revolutionised the way news is distributed and consumed. However, the constant flow of massive amounts of content has made it difficult to discern between truth and falsehood, especially in online platforms plagued with malicious actors who create and spread harmful stories. Debunking disinformation is costly, which h...
The possibility that social networks offer to attach audio, video, and images to textual information has led many users to create messages with multimodal irony. Over the last years, a series of approaches have emerged trying to leverage all these formats to address the problem of multimodal irony detection. The question that the present work tries...
Irony is a complex linguistic phenomenon that has been extensively studied in computational linguistics across many languages. Existing research has relied heavily on annotated corpora, which are inherently biased due to their creation process. This study focuses on the problem of bias in cross-domain and cross-language irony detection and aims to...
Depression has long been studied in the NLP field, with most works focusing on individuals' negative emotions. People with depression also experience happiness, but this was not extensively studied. Previous works have shown that approaches relying on sentiment or emotion classification are unsuitable for extracting the expressions of feelings tha...
The spread of health misinformation has the potential to cause serious harm to public health, from leading to vaccine hesitancy to adoption of unproven disease treatments. In addition, it could have other effects on society such as an increase in hate speech towards ethnic groups or medical experts. To counteract the sheer amount of misinformation,...
This study uses natural language processing (NLP) tools to analyze how politicians recreate the stereotype of immigrants in the Spanish Parliament. An interdisciplinary approach from computational linguistics and social psychology has been used to construct a variety of indices about content and linguistic styles. The analysis of 2,516 parliamentar...
Depression detection from user-generated content on the internet has been a long-lasting topic of interest in the research community, providing valuable screening tools for psychologists. The ubiquitous use of social media platforms lays out the perfect avenue for exploring mental health manifestations in posts and interactions with other users. Cu...
This paper describes our participation in the shared task of hate speech detection, which is one of the subtasks of the CERIST NLP Challenge 2022. Our experiments evaluate the performance of six transformer models and their combination using 2 ensemble approaches. The best results on the training set, in a five-fold cross validation scenario, were...
Check-worthiness detection is the task of identifying claims worthy of investigation by fact-checkers. Resource scarcity for non-world languages and model learning costs remain major challenges for the creation of models supporting multilingual check-worthiness detection. This paper proposes cross-training adapters on a subset of world languages,...
The paper describes the lab on Sexism identification in social networks (EXIST 2023) that will be hosted as a lab at the CLEF 2023 conference. The lab consists of three tasks, two of which are continuations of EXIST 2022 (sexism detection and sexism categorization), and a third, novel one on source intention identification. For this edition new te...
The paper gives a brief overview of the four shared tasks organized at the PAN 2023 lab on digital text forensics and stylometry to be hosted at the CLEF 2023 conference. The general goal of the PAN lab is to advance the state-of-the-art in text forensics and stylometry while ensuring objective evaluation of new and established methods on newly dev...
We present experiments on detecting hyperpartisanship in news using a ‘masking’ method that allows us to assess the role of style vs. content for the task at hand. Our results corroborate previous research on this task in that topic related features yield better results than stylistic ones. We additionally show that competitive results can be achie...
We describe several approaches to text- and graph-based classification for detecting COVID-19 conspiracies on Twitter. We tackle the tasks of text classification with and without graph data, and classification of Twitter users based on user graph. To this end, we experiment with large transformer ensembles, GPT-3-based techniques, and a variety of g...
Check-worthiness detection is the task of identifying claims worthy of investigation by fact-checkers. Resource scarcity for non-world languages and model learning costs remain major challenges for the creation of models supporting multilingual check-worthiness detection. This paper proposes cross-training adapters on a subset of world languages...
In this paper we raise the research question of whether fake news and hate speech spreaders share common patterns in language. We compute a novel index, the ingroup vs outgroup index, in three different datasets and we show that both phenomena share an "us vs them" narrative.
Irony is nowadays a pervasive phenomenon in social networks. The multimodal functionalities of these platforms (i.e., the possibility to attach audio, video, and images to textual information) are increasingly leading their users to employ combinations of information in different formats to express their ironic thoughts. The present work focuses on...
The paper describes the organization, goals, and results of the sEXism Identification in Social neTworks (EXIST) 2022 challenge, a shared task proposed for the second year at IberLEF. EXIST 2022 consists of two challenges: sexism identification and sexism categorization of tweets and gabs, both in Spanish and English. We have received a total of 45...
Mental disorders are an important public health issue. Computational methods have the potential to aid with the detection of risky behaviors online, through extracting information from social media in order to detect users at risk of developing mental disorders. In this domain, understanding the behavior of the computational models used is crucial....
This work proposes a transformer architecture for user-level classification of gambling addiction and depression that is trainable end-to-end. As opposed to other methods that operate at the post level, we process a set of social media posts from a particular individual, to make use of the interactions between posts and eliminate label noise at the...
Abusive language is becoming a serious problem for our society. The spread of messages that reinforce social and cultural intolerance could have dangerous effects on victims’ lives. State-of-the-art technologies are often effective at detecting explicit forms of abuse, while missing utterances with very weak offensive language but a str...
The paper gives a brief overview of three shared tasks which have been organized at the PAN 2022 lab on digital text forensics and stylometry hosted at the CLEF 2022 conference. The tasks include authorship verification across discourse types, multi-author writing style analysis and author profiling. Some of the tasks continue and advance past edit...
Tracking news stories in documents is a way to deal with the large amount of information that surrounds us every day, to reduce the noise and to detect emergent topics in news. Since the Covid-19 outbreak, the world has faced a new problem: the infodemic. News article titles are massively shared on social networks and the analysis of trends and growing...
Among the many tasks of the authorship field, Authorship Identification aims at uncovering the author of a document, while Author Profiling focuses on the analysis of personal characteristics of the author(s), such as gender, age, etc. Methods devised for such tasks typically focus on the style of the writing, and are expected not to make inference...
This overview paper describes the first shared task on fake news detection in the Urdu language. The task was posed as a binary classification task, in which the goal is to differentiate between real and fake news. We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing. The dataset contained news i...
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language. This is a binary classification task in which the goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing. The dataset contains news in five domains: (i) Health...
In recent years, aspect category detection has become popular due to the rapid growth in customer reviews data on e-commerce and other online platforms. Aspect Category Detection, a sub-task of Aspect-Based Sentiment Analysis, categorizes the reviews based on the features of a product such as a laptop’s display, or an aspect of an entity such as th...
This paper describes our participation in the shared task Fine-Grained Hate Speech Detection on Arabic Twitter at the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT). The shared task is divided into three detection subtasks: (i) Detect whether a tweet is offensive or not; (ii) Detect whether a tweet contains hate speech or n...
The high prevalence of depression in society has given rise to the need for new digital tools to assist in its early detection. To this end, existing research has mainly focused on detecting depression in the domain of social media, where there is a sufficient amount of data. However, with the rise of conversational agents like Siri or Alexa, the c...
Facebook's advertising platform provides political parties with an electoral tool that enables them to reach an extremely detailed audience. Unlike television, the sponsored content on Facebook is seen only by the targeted users. This opacity was an obstacle to political communications research until Facebook released advertiser-sponsored content i...
Zero-shot text classifiers based on label descriptions embed an input text and a set of labels into the same space: measures such as cosine similarity can then be used to select the most similar label description to the input text as the predicted label. In a true zero-shot setup, designing good label descriptions is challenging because no developm...
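The label-description setup summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function below is a toy bag-of-words stand-in for a real sentence encoder, and the label names and descriptions in `LABELS` are hypothetical, hand-written examples.

```python
import math
import re
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words counts.

    A real system would use a learned sentence encoder instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)


def zero_shot_classify(text: str, label_descriptions: Dict[str, str]) -> str:
    """Return the label whose description is most similar to the text."""
    text_vec = embed(text)
    return max(
        label_descriptions,
        key=lambda label: cosine(text_vec, embed(label_descriptions[label])),
    )


# Hypothetical label descriptions, written by hand for illustration only.
LABELS = {
    "sports": "a text about sports, games, teams and players",
    "politics": "a text about politics, government, elections and parties",
}

prediction = zero_shot_classify("The players won the game for their team", LABELS)
print(prediction)
```

Because the text and the label descriptions share one vector space, changing the wording of a description changes the prediction without any retraining, which is why the choice of descriptions matters in this setting.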
Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to make inferences grounded on the topics the authors usual...
Hate speech detection has received a lot of attention in recent years. However, there are still a number of challenges to monitor hateful content in social media, especially in scenarios with few data. In this paper we propose HaGNN, a convolutional graph neural network that is capable of performing an accurate text classification in a supervised w...
Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID, monitoring political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users wit...
Recent years have seen a tremendous increase in the propagation of different types of misinformation and disinformation, including among others fake news, rumours, clickbait and conspiracy theories. Misinformation involved in satire and clickbait, among others, has a different intention from disinformation. Despite many attempts by the research com...
Fake news is a threat to society and can confuse people about what is true and what is not. Fake news usually contains manipulated content, such as text or images, that attracts readers' interest with the aim of convincing them of its truthfulness. In this article, we propose SceneFND (Scene Fake News Detection), a...
Despite recent achievements in predicting personality traits and some other human psychological features with digital traces, prediction of subjective well-being (SWB) appears to be a relatively new task with few solutions. The COVID-19 pandemic has added both a stronger need for rapid SWB screening and new opportunities for it, with online mental heal...
The paper gives a brief overview of the four shared tasks to be organized at the PAN 2022 lab on digital text forensics and stylometry hosted at the CLEF 2022 conference. The tasks include authorship verification across discourse types, multi-author writing style analysis, author profiling, and content profiling. Some of the tasks continue and adva...
In the last decade, the need to automatically detect irony in order to correctly recognize the sentiment and hate speech in online texts has increased research on humorous figures of speech in NLP. The subtle boundaries among various types of irony lead us to think of irony as a linguistic phenomenon that covers sarcasm, satire, humor and parody...
In recent years, SenticNet and OntoSenticNet have represented important developments in the novel interdisciplinary field of research known as sentic computing, enabling the development of a variety of Sentic applications. In this paper, we propose an extension of the OntoSenticNet ontology, named DomainSenticNet, and contribute an unsupervised met...
Health misinformation on search engines is a significant problem that could negatively affect individuals or public health. To mitigate the problem, TREC organizes a health misinformation track. This paper presents our submissions to this track. We use a BM25 and a domain-specific semantic search engine for retrieving initial documents. Later, we e...
Fake news is a threat to society. A huge amount of fake news is posted every day on social networks which is read, believed and sometimes shared by a number of users. On the other hand, with the aim to raise awareness, some users share posts that debunk fake news by using information from fact-checking websites. In this paper, we are interested in...
Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up,...
Making machines understand language and reason about it has been one of the most challenging problems addressed by Artificial Intelligence researchers. This challenge increases when figurative language is used for communicating complex meanings, intentions, emotions and attitudes in creative and funny ways. In fact, sentiment analysis approaches str...
Identifying check-worthy claims is often the first step of automated fact-checking systems. Tackling this task in a multilingual setting has been understudied. Encoding inputs with multilingual text representations could be one approach to solve the multilingual check-worthiness detection. However, this approach could suffer if cultural bias exists...
The paper gives a brief overview of the three shared tasks organized at the PAN 2021 lab on digital text forensics and stylometry hosted at the CLEF conference. The tasks include authorship verification across domains, author profiling for hate speech spreaders, and style change detection for multi-author documents. In part the tasks are new and in...
The history of journalism and news diffusion is tightly coupled with the effort to dispel hoaxes, misinformation, propaganda, unverified rumours, poor reporting, and messages containing hate and divisions. With the explosive growth of online social media and billions of individuals engaged with consuming, creating, and sharing news, this ancient pr...
The proliferation of harmful content on social media affects a large part of the user community. Therefore, several approaches have emerged to control this phenomenon automatically. However, this is still a quite challenging task. In this paper, we explore offensive language as a particular case of harmful content and focus our study on the ana...
Fake news is spread by exploiting specific linguistic patterns aimed at triggering negative emotions and persuading consumers. One way to counter this phenomenon is to analyse the psychological factors underlying consumers’ vulnerabilities. This paper is situated in this research context: first, we study the correlation between psycho-linguisti...
The automatic detection of figurative language, such as irony and sarcasm, is one of the most challenging tasks of Natural Language Processing (NLP). In this paper, we investigate the generalization capabilities of figurative language detection models, focusing on the case of irony and sarcasm. Firstly, we compare the most promising approaches of t...
Mental disorders are an important public health issue, and computational methods have the potential to aid with detection of risky behaviors online, through extracting information from social media in order to retrieve users at risk of developing mental disorders. At the same time, state-of-the-art machine learning models are based on neural networ...
Mental disorders can severely affect quality of life, constitute a major predictive factor of suicide, and are usually underdiagnosed and undertreated. Early detection of signs of mental health problems is particularly important, since unattended, they can be life-threatening. This is why a deep understanding of the complex manifestations of mental...
Fake news is considered one of the main threats to our society. The aim of fake news is usually to confuse readers and trigger intense emotions in them so that it spreads through social networks. Even though recent studies have explored the effectiveness of different linguistic patterns for fake news detection, the role of emotional signals...
Stereotype is a type of social bias massively present in texts that computational models use. There are stereotypes that present special difficulties because they do not rely on personal attributes. This is the case of stereotypes about immigrants, a social category that is a preferred target of hate speech and discrimination. We propose a new appr...
The paper gives a brief overview of the three shared tasks to be organized at the PAN 2021 lab on digital text forensics and stylometry hosted at the CLEF conference. The tasks include authorship verification across domains, author profiling for hate speech spreaders, and style change detection for multi-author documents. In part the tasks are new...
Misogyny is a multifaceted phenomenon and can be linguistically manifested in numerous ways. The evaluation campaigns of EVALITA and IberEval in 2018 proposed a shared task of Automatic Misogyny Identification (AMI) based on Italian, English and Spanish tweets. Since the participating teams’ results were pretty low in the misogynistic behaviour cat...
The rise of social media has offered a fast and easy way for the propagation of conspiracy theories and other types of disinformation. Despite the research attention it has received, fake news detection remains an open problem, and users keep sharing articles that contain false statements but that they consider real. In this article, we focus on...
Fake news articles often stir the readers' attention by means of emotional appeals that arouse their feelings. Unlike in short news texts, authors of longer articles can exploit such affective factors to manipulate readers by adding exaggerations or fabricating events, in order to affect the readers' emotions. To capture this, we propose in this pa...