Ricardo Ribeiro

Iscte – University Institute of Lisbon | ISCTE

About

135
Publications
22,160
Reads
1,346
Citations
Additional affiliations
January 2002 - December 2009
Inesc-ID

Publications (135)
Article
Full-text available
Sentiment analysis, or opinion mining, is an important task of natural language processing (NLP) that extracts opinions, attitudes, and emotions from text. With the growth of digital platforms like blogs and social networks, opinion mining has become a key tool for organizations to understand public sentiment. In recent research, machine learning a...
Article
Offense and hate speech are a source of online conflicts which have become common in social media and, as such, their study is a growing topic of research in machine learning and natural language processing. This article presents two Portuguese language offense-related datasets that deepen the study of the subject: an Aggressiveness dataset and a C...
Article
Full-text available
The rapid proliferation of hate speech on social media poses significant challenges to maintaining a safe and inclusive digital environment. This paper presents a comprehensive review of automatic hate speech detection methods, with a particular focus on the evolution of approaches from traditional machine learning and deep learning models to the m...
Preprint
Full-text available
The article presents a Multivocal Literature Review (MLR) on the use of Artificial Intelligence (AI) in the real estate sector, aiming to analyze existing applications, literature gaps, and current challenges. The methodology involved defining keywords, searching online repositories, and applying inclusion/exclusion criteria. In total, 185 document...
Article
Full-text available
The rapid rise of social media has brought about new ways of digital communication, along with a worrying increase in online hate speech (HS), which, in turn, has led researchers to develop several Natural Language Processing methods for its detection. Although significant strides have been made in automating HS detection, research focusing on the...
Chapter
Full-text available
In the current digital era, language technologies are playing an increasingly vital role in the legal domain, assisting users, lawyers, judges, and legal professionals to solve many real-world problems. While open datasets and innovative deep learning methodologies have led to recent breakthroughs in the area, significant efforts are still being ma...
Article
Full-text available
Social media platforms offer cost-effective digital marketing opportunities to monitor the market, create user communities, and spread positive opinions. They allow companies with fewer budgets, like startups, to achieve their goals and grow. In fact, studies found that startups with active engagement on those platforms have a higher chance of succ...
Conference Paper
Full-text available
From the perspective of a dialog system, the identification of the intention behind the segments in a dialog is important, as it provides cues regarding the information present in the segments and how they should be interpreted. The ISO 24617-2 standard for dialog act annotation defines a hierarchically organized set of general-purpose communicativ...
Article
This paper addresses the specificities of online hate speech against the Afro-descendant, Roma, and LGBTQ+ communities in Portugal. The research is based on the analysis of CO-HATE, a corpus composed of 20,590 YouTube comments, which were manually annotated following detailed guidelines created for that purpose. We applied methods from corpus lingu...
Preprint
Full-text available
Social media platforms offer cost-effective digital marketing opportunities to monitor the market, create user communities, and spread positive opinions. They allow companies with fewer budgets, like startups, to achieve their goals and grow. In fact, studies found that startups with active engagement on those platforms have a higher chance of succe...
Article
Sentiment analysis of stock-related tweets is a challenging task, not only due to the specificity of the domain but also because of the short nature of the texts. This work proposes SA-MAIS, a two-step lightweight methodology, specially adapted to perform sentiment analysis in domain-constrained short-text messages. To tackle the issue of domain sp...
Chapter
This paper aims to provide to all entities involved in Lisbon tourism activities a geospatial, statistical, and longitudinal analysis tool based on data provided by a mobile operator in cooperation with Lisbon City council, which allows obtaining knowledge about the behaviors and habits of tourists and visitors of the city. The main intention is to...
Article
Full-text available
Social media platforms have become powerful tools for startups, helping them find customers and raise funding. In this study, we applied a social media intelligence-based methodology to analyze startups’ content and to understand how their communication strategies may differ during their scaling process. To understand if a startup’s social media co...
Preprint
Social media platforms have become powerful tools for startups, helping them find customers and raise funding. Analysing the contents posted through social media would help them make the best use of this communication and scale their business. To understand if a startup’s social media content reflects its position in its business maturation, we sta...
Article
Full-text available
Consumers use technologies to share their experiences, leading to the creation of online platforms where the main objective is to allow users to share their opinion about products or services, such as hotels, books, restaurants, and search for the opinions of other users. The emergence of these online platforms has changed the business dynamics, th...
Article
The rapid spread of COVID-19 around the world had a significant impact on daily life. As in other countries, measures were taken in Portugal to combat the exponential increase of cases, such as curfews and the use of masks. Thus, in parallel with the direct consequences on health and the healthcare sector, the pandemic also caused changes in human...
Article
Given Airbnb's changes since its inception and the dynamism of customer preferences, a study that sheds light on how customer satisfaction is evolving is relevant. An automated method is proposed for identifying these satisfaction tendencies at a large scale. This study follows a text mining approach to analyse 590,070 reviews posted between 2010 a...
Article
Purpose Considering the importance of the content created by the host for Airbnb consumers while making purchasing decisions, this study aims to analyze how the Airbnb hosts promote their properties by revealing the predominant attributes considered by hosts when advertising them. Design/methodology/approach The unstructured textual content of onl...
Preprint
Full-text available
In a previous paper, we have proposed a set of concepts, axiom schemata and algorithms that can be used by agents to learn to describe their behaviour, goals, capabilities, and environment. The current paper proposes a new set of concepts, axiom schemata and algorithms that allow the agent to learn new descriptions of an observed behaviour (e.g., p...
Preprint
Full-text available
The development of artificial agents able to learn through dialog without domain restrictions has the potential to allow machines to learn how to perform tasks in a similar manner to humans and change how we relate to them. However, research in this area is practically nonexistent. In this paper, we identify the modifications required for a dialog...
Article
Full-text available
From the perspective of a dialog system, it is important to identify the intention behind the segments in a dialog, since it provides an important cue regarding the information that is present in the segments and how they should be interpreted. ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-pu...
Article
Full-text available
Embodied Cognition (EC) states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience, making this biological semantic machinery noisy with resp...
Article
Purpose Real estate agents are professionals who need up-to-date and accurate information about their clients in order to maintain profitable and long-lasting relationships with each of them. A satisfied customer can be very valuable and profitable in the long term. This research focuses on solving the problem of the lack of a mobile Customer Relat...
Article
Full-text available
Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We explore this aspect of cognition, by considering dance as an overt expression of semantic aspects of mus...
Technical Report
Full-text available
This report was prepared by a team of researchers from Iscte - Instituto Universitário de Lisboa and LNEG - Laboratório Nacional de Energia e Geologia for ANI - Agência Nacional de Inovação, under a consulting services contract to carry out a study aimed at understanding the public procurement market...
Article
Restaurant management requires customer responsiveness to deal with increasingly higher expectations and market competitiveness. This study proposes an approach to simplify the decision-making process of restaurant managers by combining both live social media customer feedback and historical sales data in a sales forecast model (based on TripAdviso...
Article
Full-text available
Different forms of verbal aggression are often present in cyberbullying, which may impair executive function skills that enable the regulation of emotions and behavior. Emotion and behavioral regulation has been associated with better social adjustment and more positive interactions between peers. This study aimed to understand if fostering emotion...
Preprint
Full-text available
Context: Profiling developers is challenging since many factors, such as their skills, experience, development environment and behaviors, may influence a detailed analysis and the delivery of coherent interpretations. Objective: We aim at profiling software developers by mining their software development process. To do so, we performed a controll...
Conference Paper
Football players’ performance can be measured in an objective way (e.g., goals scored, assists, interceptions), this being seldom a method to compare and rank the best players by categories. Over years of study, many other factors that can influence the players’ performance were discovered and studied, considering not only objective factors, but als...
Article
Full-text available
This paper presents DisBot, the first Portuguese speaking chatbot that uses social media retrieved knowledge to support citizens and first-responders in disaster scenarios, in order to improve community resilience and decision-making. It was developed and tested using Design Science Research Methodology (DSRM), being progressively matured with fiel...
Article
Full-text available
Resources such as FrameNet, which provide sets of semantic frame definitions and annotated textual data that maps into the evoked frames, are important for several NLP tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised mann...
Article
Full-text available
This research is aimed at creating and presenting DisKnow, a data extraction system with the capability of filtering and abstracting tweets, to improve community resilience and decision-making in disaster scenarios. Nowadays most people act as human sensors, exposing detailed information regarding occurring disasters, in social media. Through a pip...
Chapter
Considering the wide offer of mobile applications available nowadays, effective search engines are imperative for a user to find applications that provide a specific desired functionality. Retrieval approaches that leverage topic similarity between queries and applications have shown promising results in previous studies. However, the search engin...
Article
Full-text available
The process of protecting sensitive data is continually growing and becoming increasingly important, especially as a result of the directives and laws imposed by the European Union. The effort to create automatic systems is continuous, but, in most cases, the processes behind them are still manual or semi-automatic. In this work, we have developed...
Preprint
ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-purpose communicative functions. The automatic recognition of these functions, although practically unexplored, is relevant for a dialog system, since they provide cues regarding the intention behind the segments and how they should be interpreted...
Conference Paper
Full-text available
Resources such as FrameNet provide semantic information that is important for multiple tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspe...
Article
Full-text available
Automatic dialog act recognition is a task that has been widely explored over the years. In recent works, most approaches to the task explored different deep neural network architectures to combine the representations of the words in a segment and generate a segment representation that provides cues for intention. In this study, we explore means to...
Preprint
Full-text available
Dialog acts reveal the intention behind the uttered words. Thus, their automatic recognition is important for a dialog system trying to understand its conversational partner. The study presented in this article approaches that task on the DIHANA corpus, whose three-level dialog act annotation scheme poses problems which have not been explored in re...
Article
Full-text available
Dialog acts reveal the intention behind the uttered words. Thus, their automatic recognition is important for a dialog system trying to understand its conversational partner. The study presented in this article approaches that task on the DIHANA corpus, whose three-level dialog act annotation scheme poses problems that have not...
Preprint
Full-text available
Embodied cognition states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience history, making this biological semantic machinery noisy with r...
Article
This empirical data-driven research aims to unveil thought-provoking insights on the U.S. hotel offer across its 50 states. Information of more than 30,000 hotels was collected through web scraping from TripAdvisor. Using such data, 50 support vector machine models were trained to model the TripAdvisor score, one per state, to assess the convergent...
Article
Full-text available
We claim that it is possible to have artificial software agents for which their actions and the world they inhabit have first-person or intrinsic meanings. The first-person or intrinsic meaning of an entity to a system is defined as its relation with the system's goals and capabilities, given the properties of the environment in which it operates....
Article
Automatic cyberbullying detection is a task of growing interest, particularly in the Natural Language Processing and Machine Learning communities. Not only is it challenging, but it is also a relevant need given how social networks have become a vital part of individuals' lives and how dire the consequences of cyberbullying can be, especially among...
Preprint
Full-text available
Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We leverage this aspect of cognition, by considering dance as a proxy for music perception, in a statistica...
Article
Full-text available
Automatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by its conversational partners. Although most approaches on the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention....
Conference Paper
Full-text available
The three-level dialog act annotation scheme of the DIHANA corpus poses a multi-level classification problem in which the bottom levels allow multiple or no labels for a single segment. We approach automatic dialog act recognition on the three levels using an end-to-end approach, in order to implicitly capture relations between them. Our deep n...
Article
Purpose This study aims to propose a data-driven approach, based on open-source tools, that makes it possible to understand customer satisfaction of the accommodation offer of a whole country. Design/methodology/approach The method starts by extracting information from all hotels of Portugal available at TripAdvisor through Web scraping. Then, a...
Article
Purpose: To develop a model to predict online review ratings from multiple sources, which can be used to detect fraudulent reviews, create proprietary rating indexes, or which can be employed as a measure of selection in recommender systems. Methodology: This study applies machine learning and natural language processing approaches to combine feat...
Chapter
Full-text available
Dialog act recognition is an important step for dialog systems since it reveals the intention behind the uttered words. Most approaches on the task use word-level tokenization. In contrast, this paper explores the use of character-level tokenization. This is relevant since there is information at the sub-word level that is related to the function o...
Chapter
Sentence compression produces a shorter sentence by removing redundant information, preserving the grammaticality and the important content. We propose an improvement to current neural deletion systems. These systems output a binary sequence of labels for an input sentence: one indicates that the token from the source sentence remains in the compre...
Preprint
Full-text available
A dialog act is a representation of an intention transmitted in the form of words. In this sense, when someone wants to transmit some intention, it is revealed both in the selected words and in how they are combined to form a structured segment. Furthermore, the intentions of a speaker depend not only on her intrinsic motivation, but also on the hi...
Conference Paper
Full-text available
The IT incident management process requires a correct categorization to attribute incident tickets to the right resolution group and restore an operational system as quickly as possible, with minimal impact on the business and customers. In this work, we introduce automatic text classification, demonstrating the application of several nat...
Preprint
Full-text available
Dialog act recognition is an important step for dialog systems since it reveals the intention behind the uttered words. Most approaches on the task use word-level tokenization. In contrast, this paper explores the use of character-level tokenization. This is relevant since there is information at the sub-word level that is related to the function o...
Article
Full-text available
Online reviews are one of the main influencers of hotel purchase decisions. This study performs an analysis of reviews extracted from well-known online review sources in combination with hotel sales data and concludes that ratings differ according to the language of reviews. Data science tools have been applied to English, Spanish, and Portuguese r...
Article
Full-text available
Modeling of music audio semantics has been previously tackled through learning of mappings from audio data to high-level tags or latent unsupervised spaces. The resulting semantic spaces are theoretically limited, either because the chosen high-level tags do not cover all of music semantics or because audio data itself is not enough to determine mu...