Cristina Bosco

Università degli Studi di Torino | UNITO · Dipartimento di Informatica

PhD in computer science

About

113
Publications
11,345
Reads
1,720
Citations
Additional affiliations
January 2000 - present
Università degli Studi di Torino
Position
  • Research Assistant

Publications (113)
Article
In this work, we propose BERT-WMAL, a hybrid model that combines information learned from data by the recent transformer deep learning model with information obtained from a polarized lexicon. The result is a model for sentence polarity whose performance is comparable to the state of the art, but with the advantage of be...
Chapter
The Hate and Morality (HaMor) submission for the Profiling Hate Speech Spreaders on Twitter task at PAN 2021 ranked 19th out of 67 participating teams, according to the averaged accuracy value of \(73\%\) over the two languages: English (\(62\%\)) and Spanish (\(84\%\)). The method proposed four types of features for inferring use...
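As a quick check of the figures above, and assuming the averaged accuracy is the unweighted mean of the two per-language accuracies (an assumption; the truncated abstract does not spell out the averaging), the arithmetic is:

\[
\frac{62\% + 84\%}{2} = \frac{146\%}{2} = 73\%
\]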
Preprint
Full-text available
Within the NLP community, a considerable number of language resources are created, annotated and released every day with the aim of studying specific linguistic phenomena. Although a variety of attempts to organize such resources have been carried out, a lack of systematic methods and of possible interoperability between resources are sti...
Article
This article describes an ongoing project for the development of a novel Italian treebank in Universal Dependencies format: VALICO-UD. It consists of texts written by Italian L2 learners of different mother tongues (German, French, Spanish and English) drawn from VALICO, an Italian learner corpus elicited by comic strips. Aiming at building a paral...
Article
Full-text available
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number...
Article
In the last decade, the need to automatically detect irony in order to correctly recognize the sentiment and hate speech involved in online texts has increased research on humorous figures of speech in NLP. The blurred boundaries among various types of irony lead us to think of irony as a linguistic phenomenon that covers sarcasm, satire, humor and parody...
Article
Full-text available
The present work introduces MoralConvITA, the first Italian corpus of conversations on Twitter about immigration whose annotation is focused on how moral beliefs shape users' interactions. The corpus currently consists of a set of 1,724 tweets organized in adjacency pairs and annotated by referring to a pluralistic social psychology theory about mor...
Poster
Full-text available
In this paper we describe the largest corpus annotated with hate speech in the political domain in Italian. Policycorpus XL has 7000 manually annotated tweets and a presence of hate labels above 40%, whereas in other corpora of the same type it is usually below 30%. Here we describe the collection of data and test some baselines with simple classificat...
Conference Paper
Full-text available
In this paper we describe the Hate and Morality (HaMor) submission for the Profiling Hate Speech Spreaders on Twitter task at PAN 2021. We ranked 19th out of 66 participating teams, according to the averaged accuracy value of 73% reached by our proposed models over the two languages. We obtained the 43rd highest accuracy for English (62...
Article
Full-text available
Hate Speech in social media is a complex phenomenon, whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica play an impor...
Conference Paper
Full-text available
Polarity imbalance is an asymmetric situation that occurs while using parametric threshold values in lexicon-based Sentiment Analysis (SA). The variation across the thresholds may have an opposite impact on the prediction of negative and positive polarity. We hypothesize that this may be due to asymmetries in the data or in the lexicon, or both. W...
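The threshold mechanism mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' method: the toy lexicon, the averaging of word scores, and the names `TOY_LEXICON`, `sentence_score`, `classify` and `label_counts` are all hypothetical and chosen only to show how moving one threshold can affect positive and negative predictions differently.

```python
# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
# These values are illustrative assumptions, not taken from the paper.
TOY_LEXICON = {"good": 0.7, "great": 0.9, "bad": -0.6, "awful": -0.9, "fine": 0.2}


def sentence_score(tokens, lexicon=TOY_LEXICON):
    """Average the scores of the tokens found in the lexicon (0.0 if none match)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def classify(tokens, threshold, lexicon=TOY_LEXICON):
    """Label a sentence positive or negative only if its score clears the threshold."""
    score = sentence_score(tokens, lexicon)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def label_counts(sentences, threshold):
    """Count the predicted labels over a list of tokenized sentences."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for tokens in sentences:
        counts[classify(tokens, threshold)] += 1
    return counts


if __name__ == "__main__":
    sentences = [["good", "fine"], ["bad"], ["great"], ["awful", "fine"]]
    for threshold in (0.1, 0.4):
        print(threshold, label_counts(sentences, threshold))
```

On this toy data, raising the threshold from 0.1 to 0.4 removes one negative prediction while leaving both positive predictions in place, which is the kind of asymmetry the abstract refers to. Whether such an effect comes from the data, the lexicon, or both is the question the paper itself investigates.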
Conference Paper
Full-text available
SardiStance is the first shared task for Italian on the automatic classification of stance in tweets. It is articulated in two different settings: A) Textual Stance Detection, exploiting only the information provided by the tweet, and B) Contextual Stance Detection, with the addition of information on the tweet itself such as the number of retweets...
Conference Paper
Full-text available
The Hate Speech Detection (HaSpeeDe 2) task is the second edition of a shared task on the detection of hateful content in Italian Twitter messages. HaSpeeDe 2 is composed of a Main task (hate speech detection) and two Pilot tasks (stereotype and nominal utterance detection). Systems were challenged along two dimensions: (i) time, with test data co...
Article
Starting from the first edition held in 2007, EVALITA is the initiative for the evaluation of Natural Language Processing tools for Italian. This paper describes the EVALITA4ELG project, whose main aim is to systematically collect the resources released as benchmarks for this evaluation campaign, and to make them easily accessible through the Eur...
Preprint
Full-text available
This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Depe...
Preprint
Full-text available
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number...
Article
Full-text available
The paper describes the Web platform built within the project “Contro l’Odio”, for monitoring and contrasting discrimination and hate speech against immigrants in Italy. It applies a combination of computational linguistics techniques for hate speech detection and data visualization tools on data drawn from Twitter. It allows users to access a huge...
Article
Full-text available
Stance Detection is the task of automatically determining whether the author of a text is in favor, against, or neutral towards a given target. In this paper we investigate the portability of tools performing this task across different languages, by analyzing the results achieved by a Stance Detection system (i.e. MultiTACOS) trained and tested in...
Chapter
The paper describes the first task on Part of Speech tagging of spoken language held at the Evalita evaluation campaign, KIPoS. Benefiting from the availability of a resource of transcribed spoken Italian (i.e. the KIParla corpus), which has been newly annotated and released for KIPoS, the task includes three evaluation exercises focused on formal...
Conference Paper
Full-text available
The paper describes the Web platform built within the project "Contro l'odio", for monitoring and contrasting discrimination and hate speech against immigrants in Italy. It applies a combination of computational linguistics techniques for hate speech detection and data visualization tools on data drawn from Twitter. It allows users to access a huge...
Conference Paper
Full-text available
Sentiment Analysis (SA) based on an affective lexicon is popular because it is straightforward to implement and robust against data in specific, narrow domains. However, the morpho-syntactic pre-processing needed to match words in the affective lexicon (lemmatization in particular) may be prone to errors. In this paper, we show how such errors have a sub...
Conference Paper
In the present paper we describe the UPV-28-UNITO system's submission to the RumourEval 2019 shared task. The approach we applied for addressing both the subtasks of the contest exploits both classical machine learning algorithms and word embeddings, and it is based on diverse groups of features: stylistic, lexical, emotional, sentiment, meta-struc...
Article
Given the difficulties that still affect a correct identification of irony within the context of Sentiment Analysis tasks, in this paper we describe the main issues that emerged during the development of a novel resource for Italian annotated for irony. The project mainly consists in the application to the Twitter corpus TWITTIRÒ of a multi-layered s...
Conference Paper
In this paper we present a data visualization platform designed to support the Natural Language Processing (NLP) scholar to study and analyze different corpora collected with the purpose to understand the hate speech phenomenon in social media. The project started with the creation of a corpus which collects tweets addressed to specific groups of e...
Conference Paper
Full-text available
This document contains the Guidelines for Participants to the task IronITA (Irony Detection in Italian Tweets) @ EVALITA 2018. The task consists in automatically annotating messages from Twitter for irony and sarcasm and it is organized in a main task (Task A) centered on irony, and a subtask (Subtask B) centered on sarcasm, whose results will be s...
Conference Paper
Full-text available
In this paper we describe the main issues that emerged in the application of a multi-layered scheme for the fine-grained annotation of irony (Karoui et al., 2017) to an Italian Twitter corpus, i.e. TWITTIRÒ, which is composed of about 1,500 tweets of various provenance. A discussion is proposed about the limits and advantages of the application of...
Chapter
The presence of figurative language represents a big challenge for sentiment analysis. In this work, we address the task of assigning sentiment polarity to Twitter texts when figurative language is employed, with a special focus on the presence of ironic devices. We introduce a pipeline model which aims to assign a polarity value exploiting, on the...
Chapter
Full-text available
The purpose of this paper is the analysis of the auxiliary selection in intransitive verbs in Italian. The applied methodology consists in comparing the linguistic theory with the data extracted from two different annotated corpora: UD-IT and PoSTWITA-UD. The analyzed verbs have been classified in different semantic categories depending on the ling...
Conference Paper
Full-text available
In this paper we describe our work concerning the application of a multi-layered scheme for the fine-grained annotation of irony (Karoui et al., 2017) on a new Italian social media corpus. In applying the annotation on this corpus containing tweets, i.e. TWITTIRÒ, we outlined both strengths and weaknesses of the scheme when applied on Italian, thus...
Article
During the last decade, the development of Artificial Intelligence techniques for human language processing has opened new frontiers in the analysis of the data generated by users through social media. The possibility of investigating the human behavior with large scale data provides not only new opportunities for research and industry, but also ne...
Conference Paper
Full-text available
This article describes a Twitter corpus of social media contents in the Subjective Well-Being domain. A multi-layered manual annotation for exploring attitudes on fertility and parenthood has been applied. The corpus was further analysed by using sentiment and emotion lexicons in order to highlight relationships between the use of affective languag...
Conference Paper
Full-text available
The paper introduces a new annotated French data set for Sentiment Analysis, which is a currently missing resource. It focuses on the collection from Twitter of data related to the socio-political debate about the reform of the bill for wedding in France. The design of the annotation scheme is described, which extends a polarity label set by making...
Conference Paper
Full-text available
The paper introduces a new annotated Spanish and Catalan data set for Sentiment Analysis about the Catalan separatism and the related debate held in social media at the end of 2015. It focuses on the collection of data, where we dealt with the exploitation in the debate of two languages, i.e. Spanish and Catalan, and on the design of the annotation...
Conference Paper
The paper describes a research about the socio-political debate on the reform of the education sector in Italy. It includes the development of an Italian dataset for sentiment analysis from two different comparable sources: Twitter and the online institutional platform implemented for supporting the debate. We describe the collection methodology, w...
Conference Paper
Full-text available
The paper describes a project for the development of a French corpus for sentiment analysis focused on the texts generated by the participants in a debate about a political reform, i.e. the bill on homosexual wedding in France. Beyond the description of the data set, the paper shows the methodologies applied in the collection and annotation of data....
Conference Paper
Full-text available
The paper proposes a new evaluation exercise, meant to shed light on the syntax-semantics interface for the analysis of written Italian and resulting from the combination of the EVALITA 2014 dependency parsing and event extraction tasks. It aims at investigating the cross-fertilization of tasks, generating a new resource combining dependency and ev...
Data
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Ma...
Conference Paper
Political debates about a reform may spark national controversies, by leading members of the community to polarize their opinions and sentiment about the topic addressed. With the rise of social media like Twitter, users are encouraged to voice and share their strong and polarized views and in general people are exposed to broader viewpoi...
Article
Full-text available
Shared task evaluation campaigns represent a well established form of competitive evaluation, an important opportunity to propose and tackle new challenges for a specific research area and a way to foster the development of benchmarks, tools and resources. The advantages of this approach are evident in any experimental field, including the area of...
Article
Full-text available
In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) the analysis of similarities and differences of considered source and targ...
Article
In this paper, we introduce an ongoing project for the development of a parallel treebank for Italian, English and French. The treebank is annotated in a dependency format, namely the one designed in the Turin University Treebank (TUT), hence the choice to call such new resource Par(allel)TUT. The project aims at creating a resource which can be us...
Article
This paper describes some results about the way syntactic representations and parsing methodologies affect the performance of systems for parsing Italian. Italian has a rich morphology, especially with respect to verbal suffixes, that can provide a parser with useful information for making the correct choices. With respect to syntactic representati...