Felipe Bravo-Marquez

University of Chile · Departamento de Ciencias de la Computación

PhD in Computer Science


58
Publications
27,253
Reads
2,634
Citations
Additional affiliations
June 2017 - February 2019
University of Waikato
Position
  • Research Associate
March 2011 - December 2013
Yahoo! Labs Santiago
Position
  • Engineer
Education
March 2014 - July 2017
University of Waikato
Field of study
  • Computer Science
March 2011 - September 2013
University of Chile
Field of study
  • Computer Science
March 2003 - October 2010
University of Chile
Field of study
  • Industrial Engineering

Publications (58)
Conference Paper
Full-text available
We present the SemEval-2018 Task 1: Affect in Tweets, which includes an array of subtasks on inferring the affectual state of a person from their tweet. For each task, we created labeled data from English, Arabic, and Spanish tweets. The individual tasks are: 1. emotion intensity regression, 2. emotion intensity ordinal classification, 3. valence (...
Conference Paper
Full-text available
Word embeddings are known to exhibit stereotypical biases towards gender, race, religion, among other criteria. Several fairness metrics have been proposed in order to automatically quantify these biases. Although all metrics have a similar objective, the relationship between them is by no means clear. Two issues that prevent a clean comparison i...
Conference Paper
Full-text available
Chile experienced a series of important protests between October and December 2019. This social unrest, as it was called, was fueled by social inequity and radically affected the nation's status quo. A large portion of the population demanded a new Constitution and changes to the current government, whereas another part of the population rejected...
Conference Paper
Full-text available
Word embeddings (WE) have been shown to capture biases from the text they are trained on, which has led to the development of several bias measurement metrics and bias mitigation algorithms (i.e., methods that transform the embedding space to reduce bias). This study identifies three confounding factors that hinder the comparison of bias mitigation...
Conference Paper
Full-text available
Word embeddings have become essential components in various information retrieval and natural language processing tasks, such as ranking, document classification, and question answering. However, despite their widespread use, traditional word embedding models present a limitation in their static nature, which hampers their ability to adapt to the c...
Conference Paper
Full-text available
Participatory society has often been regarded positively, frequently associated with the ideals of a more democratic and equitable civilization. Nevertheless, the idea of participation may act as a two-sided phenomenon in terms of empowerment, especially in the realm of social media platforms. This dichotomy is evident as increased participation o...
Conference Paper
Full-text available
Word embeddings (WEs) often reflect biases present in their training data, and various bias mitigation and evaluation techniques have been proposed to address this. Existing benchmarks for comparing different debiasing methods overlook two factors: the choice of training words and model hyper-parameters. We propose a robust comparison methodology t...
Article
Full-text available
There has been extensive work on human word sense annotation, i.e., manually labeling word uses in natural texts according to their senses. Such labels were primarily created for the tasks of Word Sense Disambiguation (WSD) and Word Sense Induction (WSI). However, almost all datasets annotated with word senses are synchronic datasets, i.e., contain...
Article
Full-text available
Modern language models, represented by virtual assistants and chatbots such as ChatGPT and Google Bard, have transformed the way we relate to machines, allowing us to interact with them in the same way we interact with our fellow humans: through language. These impressive capabilities...
Conference Paper
Full-text available
Temporal video grounding is a fundamental task in computer vision, aiming to localize a natural language query in a long, untrimmed video. It has a key role in the scientific community, in part due to the large amount of video generated every day. Although we find extensive work in this task, we note that research remains focused on a small selecti...
Article
Full-text available
Music inpainting is a sub-task of automated music generation that aims to infill incomplete musical pieces to help musicians in their musical composition process. Many methods have been developed for this task. However, we observe a tendency for each method to be evaluated using different datasets and metrics in the papers where they are presented....
Article
Full-text available
Here we describe a new clinical corpus rich in nested entities and a series of neural models to identify them. The corpus comprises de-identified referrals from the waiting list in Chilean public hospitals. A subset of 5,000 referrals (58.6% medical and 41.4% dental) was manually annotated with 10 types of entities, six attributes, and pairs of rel...
Article
Full-text available
Chile experienced a series of important protests between October and December 2019. This social unrest, as it was called, was fueled by social inequity and radically affected the nation's status quo. A large portion of the population demanded a new Constitution and changes to the current government, whereas another part of the population rejected t...
Preprint
Full-text available
We present the first shared task on semantic change discovery and detection in Spanish and create the first dataset of Spanish words manually annotated for semantic change using the DURel framework (Schlechtweg et al., 2018). The task is divided into two phases: 1) Graded Change Discovery, and 2) Binary Change Detection. In addition to introducing a...
Preprint
Full-text available
In recent years there have been considerable advances in pre-trained language models, where non-English language versions have also been made available. Due to their increasing use, many lightweight versions of these models (with reduced parameters) have also been released to speed up training and inference times. However, versions of these lighter...
Preprint
Full-text available
Due to the success of pre-trained language models, versions of languages other than English have been released in recent years. This fact implies the need for resources to evaluate these models. In the case of Spanish, there are few ways to systematically assess the models' quality. In this paper, we narrow the gap by building two evaluation benchm...
Article
Full-text available
The popularity of mobile devices with GPS capabilities, along with the worldwide adoption of social media, has created a rich source of text data combined with spatio-temporal information. Text data collected from location-based social networks can be used to gain space–time insights into human behavior and provide a view of time and space from th...
Article
Full-text available
A sentiment lexicon is a list of expressions annotated according to affect categories such as positive, negative, anger and fear. Lexicons are widely used in sentiment classification of tweets, especially when labeled messages are scarce. Sentiment lexicons are prone to obsolescence due to: 1) the arrival of new sentiment-conveying expressions such...
Article
Full-text available
Background: Three popular application domains of sentiment and emotion analysis are: 1) the automatic rating of movie reviews, 2) extracting opinions and emotions on Twitter, and 3) inferring sentiment and emotion associations of words. The textual elements of these domains differ in their length, i.e., movie reviews are usually longer than tweets an...
Article
Full-text available
We describe an ecosystem for teaching data science (DS) to engineers that blends theory, methods, and applications, developed at the Faculty of Physical and Mathematical Sciences (FCFM is its Spanish acronym), Universidad de Chile, over the last three years. This initiative has been motivated by the increasing demand for DS qualifications both from...
Preprint
Full-text available
To avoid the "meaning conflation deficiency" of word embeddings, a number of models have aimed to embed individual word senses. These methods at one time performed well on tasks such as word sense induction (WSI), but they have since been overtaken by task-specific techniques which exploit contextualized embeddings. However, sense embeddings and co...
Preprint
Full-text available
We describe an ecosystem for teaching data science (DS) to engineers which blends theory, methods, and applications, developed at the Faculty of Physical and Mathematical Sciences, Universidad de Chile, over the last three years. This initiative has been motivated by the increasing demand for DS qualifications both from academic and professional en...
Article
Full-text available
GPS-enabled devices and social media popularity have created an unprecedented opportunity for researchers to collect, explore, and analyze text data with fine-grained spatial and temporal metadata. In this sense, text, time and space are different domains with their own representation scales and methods. This poses a challenge on how to detect rele...
Article
Full-text available
Twitter constitutes a rich resource for investigating language contact phenomena. In this paper, we report findings from the analysis of a large-scale diachronic corpus of over one million tweets, containing loanwords from te reo Māori, the indigenous language spoken in New Zealand, into (primarily, New Zealand) English. Our analysis focuses on has...
Conference Paper
Full-text available
This paper describes a submission to the Word-in-Context competition for the IJCAI 2019 SemDeep-5 workshop. The task is to determine whether a given focus word is used in the same or different senses in two contexts. We took an ELMo-inspired approach similar to the baseline model in the task description paper, where contextualized representations a...
Conference Paper
Full-text available
Māori loanwords are widely used in New Zealand English for various social functions by New Zealanders within and outside of the Māori community. Motivated by the lack of linguistic resources for studying how Māori loanwords are used in social media, we present a new corpus of New Zealand English tweets. We collected tweets containing selected Māori...
Article
Full-text available
AffectiveTweets is a set of programs for analyzing emotion and sentiment of social media messages such as tweets. It is implemented as a package for the Weka machine learning workbench and provides methods for calculating state-of-the-art affect analysis features from tweets that can be fed into machine learning algorithms implemented in Weka. It a...
Article
Full-text available
Deep learning is a branch of machine learning that generates multi-layered representations of data, commonly using artificial neural networks, and has improved the state-of-the-art in various machine learning tasks (e.g., image classification, object detection, speech recognition, and document classification). However, most popular deep learning fr...
Conference Paper
Full-text available
This article presents WekaCoin, a peer-to-peer cryptocurrency based on a new distributed consensus protocol called Proof-of-Learning. Proof-of-learning achieves distributed consensus by ranking machine learning systems for a given task. The aim of this protocol is to alleviate the computational waste involved in hashing-based puzzles and to create...
Article
Full-text available
Message-level and word-level polarity classification are two popular tasks in Twitter sentiment analysis. They have been commonly addressed by training supervised models from labelled data. The main limitation of these models is the high cost of data annotation. Transferring existing labels from a related problem domain is one possible solution for...
Article
Full-text available
We present the first shared task on detecting the intensity of emotion felt by the speaker of a tweet. We create the first datasets of tweets annotated for anger, fear, joy, and sadness intensities using a technique called best-worst scaling (BWS). We show that the annotations lead to reliable fine-grained intensity scores (rankings of tweets by i...
Article
Full-text available
This paper examines the task of detecting intensity of emotion from text. We create the first datasets of tweets annotated for anger, fear, joy, and sadness intensities. We use a technique called best-worst scaling (BWS) that improves annotation consistency and obtains reliable fine-grained scores. We show that emotion-word hashtags often impact e...
Conference Paper
Full-text available
Message-level and word-level polarity classification are two popular tasks in Twitter sentiment analysis. They have been commonly addressed by training supervised models from labelled data. The main limitation of these models is the high cost of data annotation. Transferring existing labels from a related problem domain is one possible solution for...
Conference Paper
Full-text available
The automatic detection of emotions in Twitter posts is a challenging task due to the informal nature of the language used in this platform. In this paper, we propose a methodology for expanding the NRC word-emotion association lexicon for the language used in Twitter. We perform this expansion using multi-label classification of words and compare...
Conference Paper
Full-text available
The classification of tweets into polarity classes is a popular task in sentiment analysis. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. A drawback of these approaches is the high cost involved in data annotation. Two freely available resources that can be explo...
Article
Full-text available
Opinion lexicons, which are lists of terms labelled by sentiment, are widely used resources to support automatic sentiment analysis of textual passages. However, existing resources of this type exhibit some limitations when applied to social media messages such as tweets (posts in Twitter), because they are unable to capture the diversity of inform...
Conference Paper
Full-text available
In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: a bag-of-words vector and a semantic vector based on word-clusters. We propose a distributional representation for words by treat...
Conference Paper
Full-text available
We present a supervised framework for expanding an opinion lexicon for tweets. The lexicon contains part-of-speech (POS) disambiguated entries with a three-dimensional probability distribution for positive, negative, and neutral polarities. To obtain this distribution using machine learning, we propose word-level attributes based on POS tags and i...
Article
Full-text available
Plagiarism refers to the act of presenting external words, thoughts, or ideas as one’s own, without providing references to the sources from which they were taken. The exponential growth of different digital document sources available on the Web has facilitated the spread of this practice, making the accurate detection of it a crucial task for educ...
Article
Full-text available
This work proposes an extension of Bing Liu’s aspect-based opinion mining approach in order to apply it to the tourism domain. The extension addresses the fact that users refer differently to different kinds of products when writing reviews on the Web. Since Liu’s approach is focused on physical product reviews, it could not be directly applied...
Article
Full-text available
People react to events, topics and entities by expressing their personal opinions and emotions. These reactions can correspond to a wide range of intensities, from very mild to strong. An adequate processing and understanding of these expressions has been the subject of research in several fields, such as business and politics. In this context, Twi...
Conference Paper
Full-text available
In this paper, we propose OpinionZoom, a modular software tool that helps users easily understand the vast number of tourism opinions scattered across the Web. We also successfully implemented and tested OpinionZoom, encompassing the situation of the tourism industry in Los Lagos, also known as the Lake District, in Chile. Results showed...
Conference Paper
Full-text available
In this study we extend Bing Liu’s aspect-based opinion mining technique to apply it to the tourism domain. Using this extension, we also offer an approach for considering a new alternative to discover consumer preferences about tourism products, particularly hotels and restaurants, using opinions available on the Web as reviews. An experiment is a...
Conference Paper
Full-text available
Twitter sentiment analysis, or the task of automatically retrieving opinions from tweets, has received increasing interest from the web mining community. This is due to its importance in a wide range of fields such as business and politics. People express sentiments about specific topics or entities with different strengths and intensities, where...
Conference Paper
Full-text available
Time series are usually controlled by generative processes that change over time. On many occasions, two or more generative processes may switch, forcing the abrupt replacement of a fitted time series model by another one. We claim that the incorporation of past data can be useful in the presence of concept shift. We believe that history t...
Conference Paper
Full-text available
The retrieval of similar documents in the Web from a given document is different in many aspects from information retrieval based on queries generated by regular search engine users. In this work, a new method is proposed for Web similarity document retrieval based on generative language models and meta search engines. Probabilistic language models...
Conference Paper
Full-text available
In this work we conduct an empirical study of opinion time series created from Twitter data regarding the 2008 U.S. elections. The focus of our proposal is to establish whether a time series is appropriate or not for generating a reliable predictive model. We analyze time series obtained from Twitter messages related to the 2008 U.S. elections usin...
Conference Paper
Full-text available
This work presents a sentence ranking strategy based on distant supervision for the multi-document summarization problem. Due to the difficulty of obtaining large training datasets formed by document clusters and their respective human-made summaries, we propose building a training and a testing corpus from Wikinews. Wikinews articles are modeled...
Conference Paper
Full-text available
The retrieval of similar documents from the Web using documents as input instead of key-term queries is not currently supported by traditional Web search engines. One approach for solving the problem consists of fingerprinting the document's content into a set of queries that are submitted to a list of Web search engines. Afterward, results are merged...
Conference Paper
Full-text available
Reading comprehension is one of the main concerns for educational institutions, as it forges the students' ability to accurately comprehend and learn from a given information source (e.g. textbooks, articles, papers, etc.). However, there are few approaches that integrate digital sources of educational information with automated systems to detect wheth...
Conference Paper
Full-text available
The retrieval of similar documents from large scale datasets has been one of the main concerns in knowledge management environments, such as plagiarism detection, news impact analysis, and the matching of ideas within sets of documents. In all of these applications, a light-weight architecture can be considered as fundamental for the large scal...
