About
Publications: 314
Reads: 169,461
Citations: 21,311
Introduction
Social computing and web mining: https://chato.cl/
Additional affiliations
September 2017 - present
February 2016 - August 2017
Eurecat, Barcelona, Spain
Position
- Managing Director
September 2015 - January 2016
Education
July 2000 - November 2004
January 1994 - July 2000
Publications (314)
Social media is becoming more and more integrated in the distribution and consumption of news. How is news in social media different from mainstream news? This paper presents a comparative analysis covering a span of 17 months and hundreds of news events, using a method that combines automatic and manual annotations. We focus on climate change, a t...
The use of social media to communicate timely information during crisis situations has become a common practice in recent years. In particular, the one-to-many nature of Twitter has created an opportunity for stakeholders to disseminate crisis-relevant messages, and to access vast amounts of information they may not otherwise have. Our goal is to u...
Social media platforms provide active communication channels during mass convergence and emergency events such as disasters caused by natural hazards. As a result, first responders, decision makers, and the public can use this information to gain insight into the situation as it unfolds. In particular, many social media messages communicated during...
Although freelancing work has grown substantially in recent years, in part facilitated by a number of online labor marketplaces (e.g., Guru, Freelancer, Amazon Mechanical Turk), traditional forms of "in-sourcing" work continue being the dominant form of employment. This means that, at least for the time being, freelancing and...
User-generated content online is shaped by many factors, including endogenous elements such as platform affordances and norms, as well as exogenous elements, in particular significant events. These impact what users say, how they say it, and when they say it. In this paper, we focus on quantifying the impact of violent events on various types of ha...
The work of Emergency Management (EM) agencies requires timely collection of relevant data to inform decision-making for operations and public communication before, during, and after a disaster. However, the limited human resources available to deploy for field data collection is a persistent problem for EM agencies. Thus, many of these agencies ha...
In this paper, we study the effects of using an algorithm-based risk assessment instrument (RAI) to support the prediction of risk of violent recidivism upon release. The instrument we used is a machine learning version of RisCanvi used by the Justice Department of Catalonia, Spain. It was hypothesized that people can improve their performance on...
Recommender systems typically suggest to users content similar to what they consumed in the past. A user, if happening to be exposed to strongly polarized content, might be steered towards more and more radicalized content by subsequent recommendations, eventually being trapped in what we call a "radicalization pathway". In this paper, we investiga...
We present the results of a 12-week longitudinal user study wherein the participants, 110 subjects from Southern Europe, received on a daily basis Electronic Music (EM) diversified recommendations. By analyzing their explicit and implicit feedback, we show that exposure to specific levels of music recommendation diversity may be responsible for lon...
Among the seven key requirements to achieve trustworthy AI proposed by the High-Level Expert Group on Artificial Intelligence (AI-HLEG) established by the European Commission, the fifth requirement ("Diversity, non-discrimination and fairness") declares: "In order to achieve Trustworthy AI, we must enable inclusion and diversity throughout the enti...
The COVID-19 pandemic has fueled the spread of misinformation on social media and the Web as a whole. The phenomenon dubbed 'infodemic' has taken the challenges of information veracity and trust to new heights by massively introducing seemingly scientific and technical elements into misleading content. Despite the existing body of work on modeling...
Among the seven key requirements to achieve trustworthy AI proposed by the High-Level Expert Group on Artificial Intelligence (AI-HLEG) established by the European Commission (EC), the fifth requirement ("Diversity, non-discrimination and fairness") declares: "In order to achieve Trustworthy AI, we must enable inclusion and diversity throughout the...
Before urban flooding actually happens, weather forecasts with varying degrees of precision are available to emergency managers. In the aftermath of the event, authoritative information including Earth Observation (EO) data can be used to estimate precisely the flood extent, possibly after several hours. This study aims to determine how social medi...
We study university admissions under a centralized system that uses grades and standardized test scores to match applicants to university programs. In the context of this system, we explore affirmative action policies that seek to narrow the gap between the admission rates of different socio-demographic groups while still accepting students with hi...
Digital mental health applications promise scalable and cost-effective solutions to mitigate the gap between the demand and supply of mental healthcare services. However, very little attention is paid to differential impact and potential discrimination in digital mental health services with respect to different sensitive user groups (e.g., race, ag...
Social media has been described as a mechanism for understanding a situation using information spread across many minds, i.e., a form of distributed cognition (Hutchins 1995). Gaining situational awareness in a disaster is critical and time-sensitive. Social media provides a vast data source that might help improve response in the early hours and d...
People recommender systems may affect the exposure that users receive in social networking platforms, influencing attention dynamics and potentially strengthening pre-existing inequalities that disproportionately affect certain groups. In this paper we introduce a model to simulate the feedback loop created by multiple rounds of interactions betwee...
Shared practices to assess the diversity of retrieval system results are still debated in the Information Retrieval community, partly because of the challenges of determining what diversity means in specific scenarios, and of understanding how diversity is perceived by end-users. The field of Music Information Retrieval is not exempt from this issu...
Artificial Intelligence (AI) is increasingly used to build Decision Support Systems (DSS) across many domains. This paper describes a series of experiments designed to observe human response to different characteristics of a DSS such as accuracy and bias, particularly the extent to which participants rely on the DSS, and the performance they achiev...
In this paper, we study the effects of using an algorithm-based risk assessment instrument to support the prediction of risk of criminal recidivism. The instrument we use in our experiments is a machine learning version of RiskEval (name changed for double-blind review), which is the main risk assessment instrument used by the Justice Department of Coun...
Music listening in today's digital spaces is highly characterized by the availability of huge music catalogues, accessible by people all over the world. In this scenario, recommender systems are designed to guide listeners in finding tracks and artists that best fit their requests, having therefore the power to influence the diversity of the music...
High-quality human annotations are necessary for creating effective machine learning-driven stream processing systems. We study hybrid stream processing systems based on a Human-In-The-Loop Machine Learning (HITL-ML) paradigm, in which one or many human annotators and an automatic classifier (trained at least partially by the human annotators) labe...
Ranking items or people is a fundamental operation at the basis of several processes and services, not all of them happening online. Ranking is required for different tasks, including search, personalization, recommendation, and filtering. While traditionally ranking has been aimed solely at maximizing some global utility function, recently the awa...
Music Recommender Systems (Music RS) are nowadays pivotal in shaping the listening experience of people all around the world. Partly driven by the commercial application of this technology, music recommendation research has gained increasing attention both within and outside the Music Information Retrieval (MIR) community. Thanks also to the widesp...
This paper describes SciClops, a method to help combat online scientific misinformation. Although automated fact-checking methods have gained significant attention recently, they require pre-existing ground-truth evidence, which, in the scientific context, is sparse and scattered across a constantly-evolving scientific literature. Existing methods...
In this paper we investigate risk prediction of criminal re-offense among juvenile defendants using general-purpose machine learning (ML) algorithms. We show that in our dataset, containing hundreds of cases, ML models achieve better predictive power than a structured professional risk assessment tool, the Structured Assessment of Violence Risk in...
Artificial Intelligence (AI) and its relation with societies has become an increasingly interesting subject of study for the social sciences. Nevertheless, there is still an important lack of interdisciplinary and empirical research applying social theories to the field of AI. We here aim to shed light on the interactions between humans and autonom...
This paper summarizes key opportunities and challenges identified during the workshop "Social Media for Disaster Risk Management: Researchers Meet Practitioners" which took place online in November 2020. It constitutes a work-in-progress towards identifying new directions for research and development of systems that can better serve the information...
In this paper, we consider the prediction of violent recidivism in criminal justice as currently done through machine learning methods. Specifically, we consider sequential evaluations performed on jail inmates with a state-of-the-art risk assessment instrument, RisCanvi. In this protocol, evaluations are done periodically every six months to all i...
In this work, the problem of predicting dropout risk in undergraduate studies is addressed from a perspective of algorithmic fairness. We develop a machine learning method to predict the risks of university dropout and underperformance. The objective is to understand if such a system can identify students at risk while avoiding potential discrimina...
Research in adversarial machine learning has shown how the performance of machine learning models can be seriously compromised by injecting even a small fraction of poisoning points into the training data. While the effects on model accuracy of such poisoning attacks have been widely studied, their potential effects on other model performance metri...
In this report we provide an improvement of the significance adjustment from the FA*IR algorithm of Zehlike et al., which did not work for very short rankings in combination with a low minimum proportion $p$ for the protected group. We show how the minimum number of protected candidates per ranking position can be calculated exactly and provide a m...
Social media can be used for disaster risk reduction as a complement to traditional information sources, and the literature has suggested numerous ways to achieve this. In the case of floods, for instance, data collection from social media can be triggered by a severe weather forecast and/or a flood prediction. By way of contrast, in this paper we...
Social media has become an alternative communication mechanism for the public to reach out to emergency services during time-sensitive events. However, the information overload of social media experienced by these services, coupled with their limited human resources, challenges them to timely identify, prioritize, and organize critical requests for...
Music Recommender Systems (mRS) are designed to give personalised and meaningful recommendations of items (i.e. songs, playlists or artists) to a user base, thereby reflecting and further complementing individual users' specific music preferences. Whilst accuracy metrics have been widely applied to evaluate recommendations in mRS literature, evalua...
We demonstrate the SciLens News Platform, a novel system for evaluating the quality of news articles. The SciLens News Platform automatically collects contextual information about news articles in real-time and provides quality indicators about their validity and trustworthiness. These quality indicators derive from i) social media discussions rega...
We study the problem of selecting the top-k candidates from a pool of applicants, where each candidate is associated with a score indicating his/her aptitude. Depending on the specific scenario, such as job search or college admissions, these scores may be the results of standardized tests or other predictors of future performance and utility. We c...
In this paper, we study university admissions under a centralized system that uses grades and standardized test scores to match applicants to university programs. We consider affirmative action policies that seek to increase the number of admitted applicants from underrepresented groups. Since such a policy has to be announced before the start of t...
This document presents the contributions discussed at the second institutional workshop on Artificial Intelligence (AI), organized by the Joint Research Centre (JRC) of the European Commission. This workshop was held on 5 July 2019 at the premises of the JRC in Ispra (Italy), with video-conference to all JRC's sites. The workshop aimed to gather...
Evaluating (and mitigating) the potential negative effects of algorithms has become a central issue in computer science. While research on algorithmic bias in ranking systems has dealt with disparate exposure of products or individuals, less attention has been devoted to the analysis of the disparate exposure of subgroups of online users. In this pa...
This article describes a method for early detection of disaster-related damage to cultural heritage. It is based on data from social media, a timely and large-scale data source that is nevertheless quite noisy. First, we collect images posted on social media that may refer to a cultural heritage site. Then, we automatically categorize these images...
The Fairness, Accountability, and Transparency in Machine Learning (FAT-ML) literature proposes a varied set of group fairness metrics to measure discrimination against socio-demographic groups that are characterized by a protected feature, such as gender or race. Such a system can be deemed as either fair or unfair depending on the choice of the me...
Although freelancing work has grown substantially in recent years, in part facilitated by a number of online labor marketplaces, (e.g., Guru, Freelancer, Amazon Mechanical Turk), traditional forms of "in-sourcing" work continue being the dominant form of employment. This means that, at least for the time being, freelancing and salaried employment w...
The usage of non-authoritative data for disaster management presents the opportunity of accessing timely information that might not be available through other means, as well as the challenge of dealing with several layers of biases. Wikipedia, a collaboratively-produced encyclopedia, includes in-depth information about many natural and human-made d...
Despite the significant efforts made by the research community in recent years, automatically acquiring valuable information about high impact-events from social media remains challenging. We present EviDense, a graph-based approach for finding high-impact events (such as disaster events) in social media. One of the challenges we address in our wor...
Terror attacks have been linked in part to online extremist content. Online conversations are cloaked in religious ambiguity, with deceptive intentions, often twisted from mainstream meaning to serve a malevolent ideology. Although tens of thousands of Islamist extremism supporters consume such content, they are a small fraction relative to peacefu...
Music recommendations are increasingly part of the listening experience of people all over the world, especially in the context of streaming services. In this scenario, recommender systems' role is to help users in finding music that can fit their interests and tastes. However, Western-centric perspectives in systems' design are often subject to c...
Artificial Intelligence (AI) and its relation with societies is increasingly becoming an interesting object of study from the perspective of sociology and other disciplines. Theories such as the Economy of Conventions (EC) are usually applied in the context of interpersonal relations but there is still a clear lack of studies around how this and ot...
Terror attacks have been linked in part to online extremist content. Although tens of thousands of Islamist extremism supporters consume such content, they are a small fraction relative to peaceful Muslims. The efforts to contain the ever-evolving extremism on social media platforms have remained inadequate and mostly ineffective. Divergent extremi...
This document presents the contributions made at the first internal workshop on Artificial Intelligence (AI), organized by the Joint Research Centre (JRC) of the European Commission. This workshop was held on 23rd May at the premises of the JRC in Ispra (Italy), with video-conference to all JRC's sites. The workshop aimed to gather JRC special...