Felipe Bravo-Marquez

University of Chile · Departamento de Ciencias de la Computación

PhD in Computer Science

About

48
Publications
20,993
Reads
1,672
Citations
Additional affiliations
June 2017 - February 2019
The University of Waikato
Position
  • Research Associate
March 2011 - December 2013
Yahoo! Labs Santiago
Position
  • Engineer
Education
March 2014 - July 2017
The University of Waikato
Field of study
  • Computer Science
March 2011 - September 2013
University of Chile
Field of study
  • Computer Science
March 2003 - October 2010
University of Chile
Field of study
  • Industrial Engineering


Publications (48)
Conference Paper
Full-text available
We present the SemEval-2018 Task 1: Affect in Tweets, which includes an array of subtasks on inferring the affectual state of a person from their tweet. For each task, we created labeled data from English, Arabic, and Spanish tweets. The individual tasks are: 1. emotion intensity regression, 2. emotion intensity ordinal classification, 3. valence (...
Article
Full-text available
Deep learning is a branch of machine learning that generates multi-layered representations of data, commonly using artificial neural networks, and has improved the state-of-the-art in various machine learning tasks (e.g., image classification, object detection, speech recognition, and document classification). However, most popular deep learning fr...
Article
Full-text available
AffectiveTweets is a set of programs for analyzing emotion and sentiment of social media messages such as tweets. It is implemented as a package for the Weka machine learning workbench and provides methods for calculating state-of-the-art affect analysis features from tweets that can be fed into machine learning algorithms implemented in Weka. It a...
Conference Paper
Full-text available
Word embeddings are known to exhibit stereotypical biases towards gender, race, religion, among other criteria. Several fairness metrics have been proposed in order to automatically quantify these biases. Although all metrics have a similar objective, the relationship between them is by no means clear. Two issues that prevent a clean comparison i...
Article
Full-text available
GPS-enabled devices and social media popularity have created an unprecedented opportunity for researchers to collect, explore, and analyze text data with fine-grained spatial and temporal metadata. In this sense, text, time and space are different domains with their own representation scales and methods. This poses a challenge on how to detect rele...
Article
Full-text available
Here we describe a new clinical corpus rich in nested entities and a series of neural models to identify them. The corpus comprises de-identified referrals from the waiting list in Chilean public hospitals. A subset of 5,000 referrals (58.6% medical and 41.4% dental) was manually annotated with 10 types of entities, six attributes, and pairs of rel...
Conference Paper
Full-text available
Chile experienced a series of important protests between October and December 2019. This social unrest, as it was called, was fueled by social inequity and radically affected the nation's status quo. A large portion of the population demanded a new Constitution and changes to the current government, whereas another part of the population rejected...
Preprint
We present the first shared task on semantic change discovery and detection in Spanish and create the first dataset of Spanish words manually annotated for semantic change using the DURel framework (Schlechtweg et al., 2018). The task is divided into two phases: 1) Graded Change Discovery, and 2) Binary Change Detection. In addition to introducing a...
Preprint
Full-text available
In recent years there have been considerable advances in pre-trained language models, where non-English language versions have also been made available. Due to their increasing use, many lightweight versions of these models (with reduced parameters) have also been released to speed up training and inference times. However, versions of these lighter...
Preprint
Due to the success of pre-trained language models, versions for languages other than English have been released in recent years. This fact implies the need for resources to evaluate these models. In the case of Spanish, there are few ways to systematically assess the models' quality. In this paper, we narrow the gap by building two evaluation benchm...
Article
Full-text available
The popularity of mobile devices with GPS capabilities, along with the worldwide adoption of social media, has created a rich source of text data combined with spatio-temporal information. Text data collected from location-based social networks can be used to gain space–time insights into human behavior and provide a view of time and space from th...
Article
Full-text available
A sentiment lexicon is a list of expressions annotated according to affect categories such as positive, negative, anger and fear. Lexicons are widely used in sentiment classification of tweets, especially when labeled messages are scarce. Sentiment lexicons are prone to obsolescence due to: 1) the arrival of new sentiment-conveying expressions such...
Article
Full-text available
Background: Three popular application domains of sentiment and emotion analysis are: 1) the automatic rating of movie reviews, 2) extracting opinions and emotions on Twitter, and 3) inferring sentiment and emotion associations of words. The textual elements of these domains differ in their length, i.e., movie reviews are usually longer than tweets an...
Article
Full-text available
We describe an ecosystem for teaching data science (DS) to engineers that blends theory, methods, and applications, developed at the Faculty of Physical and Mathematical Sciences (FCFM is its Spanish acronym), Universidad de Chile, over the last three years. This initiative has been motivated by the increasing demand for DS qualifications both from...
Preprint
Full-text available
To avoid the "meaning conflation deficiency" of word embeddings, a number of models have aimed to embed individual word senses. These methods at one time performed well on tasks such as word sense induction (WSI), but they have since been overtaken by task-specific techniques which exploit contextualized embeddings. However, sense embeddings and co...
Preprint
Full-text available
We describe an ecosystem for teaching data science (DS) to engineers which blends theory, methods, and applications, developed at the Faculty of Physical and Mathematical Sciences, Universidad de Chile, over the last three years. This initiative has been motivated by the increasing demand for DS qualifications both from academic and professional en...
Article
Full-text available
Twitter constitutes a rich resource for investigating language contact phenomena. In this paper, we report findings from the analysis of a large-scale diachronic corpus of over one million tweets, containing loanwords from te reo Māori, the indigenous language spoken in New Zealand, into (primarily, New Zealand) English. Our analysis focuses on has...
Conference Paper
Full-text available
This paper describes a submission to the Word-in-Context competition for the IJCAI 2019 SemDeep-5 workshop. The task is to determine whether a given focus word is used in the same or different senses in two contexts. We took an ELMo-inspired approach similar to the baseline model in the task description paper, where contextualized representations a...
Conference Paper
Full-text available
Māori loanwords are widely used in New Zealand English for various social functions by New Zealanders within and outside of the Māori community. Motivated by the lack of linguistic resources for studying how Māori loanwords are used in social media, we present a new corpus of New Zealand English tweets. We collected tweets containing selected Māori...
Conference Paper
Full-text available
This article presents WekaCoin, a peer-to-peer cryptocurrency based on a new distributed consensus protocol called Proof-of-Learning. Proof-of-Learning achieves distributed consensus by ranking machine learning systems for a given task. The aim of this protocol is to alleviate the computational waste involved in hashing-based puzzles and to create...
Article
Full-text available
Message-level and word-level polarity classification are two popular tasks in Twitter sentiment analysis. They have been commonly addressed by training supervised models from labelled data. The main limitation of these models is the high cost of data annotation. Transferring existing labels from a related problem domain is one possible solution for...
Article
Full-text available
We present the first shared task on detecting the intensity of emotion felt by the speaker of a tweet. We create the first datasets of tweets annotated for anger, fear, joy, and sadness intensities using a technique called best–worst scaling (BWS). We show that the annotations lead to reliable fine-grained intensity scores (rankings of tweets by i...
Article
Full-text available
This paper examines the task of detecting intensity of emotion from text. We create the first datasets of tweets annotated for anger, fear, joy, and sadness intensities. We use a technique called best–worst scaling (BWS) that improves annotation consistency and obtains reliable fine-grained scores. We show that emotion-word hashtags often impact e...
Conference Paper
Full-text available
Message-level and word-level polarity classification are two popular tasks in Twitter sentiment analysis. They have been commonly addressed by training supervised models from labelled data. The main limitation of these models is the high cost of data annotation. Transferring existing labels from a related problem domain is one possible solution for...
Conference Paper
Full-text available
The automatic detection of emotions in Twitter posts is a challenging task due to the informal nature of the language used on this platform. In this paper, we propose a methodology for expanding the NRC word-emotion association lexicon for the language used in Twitter. We perform this expansion using multi-label classification of words and compare...
Conference Paper
Full-text available
The classification of tweets into polarity classes is a popular task in sentiment analysis. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. A drawback of these approaches is the high cost involved in data annotation. Two freely available resources that can be explo...
Article
Full-text available
Opinion lexicons, which are lists of terms labelled by sentiment, are widely used resources to support automatic sentiment analysis of textual passages. However, existing resources of this type exhibit some limitations when applied to social media messages such as tweets (posts in Twitter), because they are unable to capture the diversity of inform...
Conference Paper
Full-text available
In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: a bag-of-words vector and a semantic vector based on word-clusters. We propose a distributional representation for words by treat...
Conference Paper
Full-text available
We present a supervised framework for expanding an opinion lexicon for tweets. The lexicon contains part-of-speech (POS) disambiguated entries with a three-dimensional probability distribution for positive, negative, and neutral polarities. To obtain this distribution using machine learning, we propose word-level attributes based on POS tags and i...
Article
Full-text available
Plagiarism refers to the act of presenting external words, thoughts, or ideas as one’s own, without providing references to the sources from which they were taken. The exponential growth of different digital document sources available on the Web has facilitated the spread of this practice, making the accurate detection of it a crucial task for educ...
Article
Full-text available
This work proposes an extension of Bing Liu’s aspect-based opinion mining approach in order to apply it to the tourism domain. The extension addresses the fact that users refer differently to different kinds of products when writing reviews on the Web. Since Liu’s approach is focused on physical product reviews, it could not be directly applied...
Article
Full-text available
People react to events, topics and entities by expressing their personal opinions and emotions. These reactions can correspond to a wide range of intensities, from very mild to strong. An adequate processing and understanding of these expressions has been the subject of research in several fields, such as business and politics. In this context, Twi...
Conference Paper
Full-text available
In this paper, we propose OpinionZoom, a modular software tool that helps users easily understand the vast amount of tourism opinions scattered across the Web. We also successfully implemented and tested OpinionZoom on the tourism industry in Los Lagos, also known as the Lake District, in Chile. Results showed...
Conference Paper
Full-text available
In this study we extend Bing Liu’s aspect-based opinion mining technique to apply it to the tourism domain. Using this extension, we also offer a new approach to discovering consumer preferences about tourism products, particularly hotels and restaurants, using opinions available on the Web as reviews. An experiment is a...
Conference Paper
Full-text available
Twitter sentiment analysis, or the task of automatically retrieving opinions from tweets, has received increasing interest from the web mining community. This is due to its importance in a wide range of fields such as business and politics. People express sentiments about specific topics or entities with different strengths and intensities, where...
Conference Paper
Full-text available
Time series are usually controlled by generative processes which display changes over time. On many occasions, two or more generative processes may switch, forcing the abrupt replacement of a fitted time series model by another one. We claim that the incorporation of past data can be useful in the presence of concept shift. We believe that history t...
Conference Paper
Full-text available
The retrieval of similar documents on the Web from a given document is different in many aspects from information retrieval based on queries generated by regular search engine users. In this work, a new method is proposed for Web similarity document retrieval based on generative language models and meta search engines. Probabilistic language models...
Conference Paper
Full-text available
In this work we conduct an empirical study of opinion time series created from Twitter data regarding the 2008 U.S. elections. The focus of our proposal is to establish whether a time series is appropriate or not for generating a reliable predictive model. We analyze time series obtained from Twitter messages related to the 2008 U.S. elections usin...
Conference Paper
Full-text available
This work presents a sentence ranking strategy based on distant supervision for the multi-document summarization problem. Due to the difficulty of obtaining large training datasets formed by document clusters and their respective human-made summaries, we propose building a training and a testing corpus from Wikinews. Wikinews articles are modeled...
Conference Paper
Full-text available
The retrieval of similar documents from the Web using documents as input instead of key-term queries is not currently supported by traditional Web search engines. One approach for solving the problem consists of fingerprinting the document's content into a set of queries that are submitted to a list of Web search engines. Afterward, results are merged...
Conference Paper
Full-text available
Reading comprehension is one of the main concerns for educational institutions, as it forges the students' ability to comprehend and accurately learn from a given information source (e.g., textbooks, articles, and papers). However, there are few approaches that integrate digital sources of educational information with automated systems to detect wheth...
Conference Paper
Full-text available
The retrieval of similar documents from large-scale datasets has been one of the main concerns in knowledge management environments, such as plagiarism detection, news impact analysis, and the matching of ideas within sets of documents. In all of these applications, a light-weight architecture can be considered as fundamental for the large scal...


Projects (2)
Project
Implement a Weka package for sentiment analysis of tweets. More info at: https://github.com/felipebravom/AffectiveTweets