
Ricardo Ribeiro
Iscte – University Institute of Lisbon | ISCTE
135 Publications
22,160 Reads
1,346 Citations
Additional affiliations
January 2002 - December 2009
Publications (135)
Sentiment analysis, or opinion mining, is an important task of natural language processing (NLP) that extracts opinions, attitudes, and emotions from text. With the growth of digital platforms like blogs and social networks, opinion mining has become a key tool for organizations to understand public sentiment. In recent research, machine learning a...
Offense and hate speech are a source of online conflicts which have become common in social media and, as such, their study is a growing topic of research in machine learning and natural language processing. This article presents two Portuguese language offense-related datasets that deepen the study of the subject: an Aggressiveness dataset and a C...
The rapid proliferation of hate speech on social media poses significant challenges to maintaining a safe and inclusive digital environment. This paper presents a comprehensive review of automatic hate speech detection methods, with a particular focus on the evolution of approaches from traditional machine learning and deep learning models to the m...
The article presents a Multivocal Literature Review (MLR) on the use of Artificial Intelligence (AI) in the real estate sector, aiming to analyze existing applications, literature gaps, and current challenges. The methodology involved defining keywords, searching online repositories, and applying inclusion/exclusion criteria. In total, 185 document...
The rapid rise of social media has brought about new ways of digital communication, along with a worrying increase in online hate speech (HS), which, in turn, has led researchers to develop several Natural Language Processing methods for its detection. Although significant strides have been made in automating HS detection, research focusing on the...
In the current digital era, language technologies are playing an increasingly vital role in the legal domain, assisting users, lawyers, judges, and legal professionals to solve many real-world problems. While open datasets and innovative deep learning methodologies have led to recent breakthroughs in the area, significant efforts are still being ma...
Social media platforms offer cost-effective digital marketing opportunities to monitor the market, create user communities, and spread positive opinions. They allow companies with smaller budgets, like startups, to achieve their goals and grow. In fact, studies found that startups with active engagement on those platforms have a higher chance of succ...
From the perspective of a dialog system, the identification of the intention behind the segments in a dialog is important, as it provides cues regarding the information present in the segments and how they should be interpreted. The ISO 24617-2 standard for dialog act annotation defines a hierarchically organized set of general-purpose communicativ...
This paper addresses the specificities of online hate speech against the Afro-descendant, Roma, and LGBTQ+ communities in Portugal. The research is based on the analysis of CO-HATE, a corpus composed of 20,590 YouTube comments, which were manually annotated following detailed guidelines created for that purpose. We applied methods from corpus lingu...
Social media platforms offer cost-effective digital marketing opportunities to monitor the market, create user communities, and spread positive opinions. They allow companies with smaller budgets, like startups, to achieve their goals and grow. In fact, studies found that startups with active engagement on those platforms have a higher chance of succe...
Sentiment analysis of stock-related tweets is a challenging task, not only due to the specificity of the domain but also because of the short nature of the texts. This work proposes SA-MAIS, a two-step lightweight methodology, specially adapted to perform sentiment analysis in domain-constrained short-text messages. To tackle the issue of domain sp...
This paper aims to provide all entities involved in Lisbon tourism activities with a geospatial, statistical, and longitudinal analysis tool based on data provided by a mobile operator in cooperation with Lisbon City Council, which makes it possible to obtain knowledge about the behaviors and habits of tourists and visitors to the city. The main intention is to...
Social media platforms have become powerful tools for startups, helping them find customers and raise funding. In this study, we applied a social media intelligence-based methodology to analyze startups’ content and to understand how their communication strategies may differ during their scaling process. To understand if a startup’s social media co...
Social media platforms have become powerful tools for startups, helping them find customers and raise funding. Analysing the contents posted through social media would help them make the best use of this communication and scale their business. To understand if a startup’s social media content reflects its position in its business maturation, we sta...
Consumers use technologies to share their experiences, leading to the creation of online platforms whose main objective is to allow users to share their opinions about products or services, such as hotels, books, and restaurants, and to search for the opinions of other users. The emergence of these online platforms has changed the business dynamics, th...
The rapid spread of COVID-19 around the world had a significant impact on daily life. As in other countries, measures were taken in Portugal to combat the exponential increase of cases, such as curfews and the use of masks. Thus, in parallel with the direct consequences on health and the healthcare sector, the pandemic also caused changes in human...
Given Airbnb's changes since its inception and the dynamism of customer preferences, a study that sheds light on how customer satisfaction is evolving is relevant. An automated method is proposed for identifying these satisfaction tendencies at a large scale. This study follows a text mining approach to analyse 590,070 reviews posted between 2010 a...
Purpose
Considering the importance of the content created by the host for Airbnb consumers while making purchasing decisions, this study aims to analyze how the Airbnb hosts promote their properties by revealing the predominant attributes considered by hosts when advertising them.
Design/methodology/approach
The unstructured textual content of onl...
In a previous paper, we have proposed a set of concepts, axiom schemata and algorithms that can be used by agents to learn to describe their behaviour, goals, capabilities, and environment. The current paper proposes a new set of concepts, axiom schemata and algorithms that allow the agent to learn new descriptions of an observed behaviour (e.g., p...
The development of artificial agents able to learn through dialog without domain restrictions has the potential to allow machines to learn how to perform tasks in a similar manner to humans and change how we relate to them. However, research in this area is practically nonexistent. In this paper, we identify the modifications required for a dialog...
From the perspective of a dialog system, it is important to identify the intention behind the segments in a dialog, since it provides an important cue regarding the information that is present in the segments and how they should be interpreted. ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-pu...
Embodied Cognition (EC) states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience, making this biological semantic machinery noisy with resp...
Purpose
Real estate agents are professionals who need up-to-date and accurate information about their clients in order to maintain profitable and long-lasting relationships with each of them. A satisfied customer can be very valuable and profitable in the long term. This research focuses on solving the problem of the lack of a mobile Customer Relat...
Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We explore this aspect of cognition, by considering dance as an overt expression of semantic aspects of mus...
This report was developed by a team of researchers from Iscte - Instituto Universitário de Lisboa and LNEG - Laboratório Nacional de Energia e Geologia for ANI - Agência Nacional de Inovação, as part of a procurement of consulting services to carry out a study aimed at understanding the public procurement market...
Restaurant management requires customer responsiveness to deal with increasingly higher expectations and market competitiveness. This study proposes an approach to simplify the decision-making process of restaurant managers by combining both live social media customer feedback and historical sales data in a sales forecast model (based on TripAdviso...
Different forms of verbal aggression are often present in cyberbullying, which may impair executive function skills that enable the regulation of emotions and behavior. Emotion and behavioral regulation has been associated with better social adjustment and more positive interactions between peers. This study aimed to understand if fostering emotion...
Context:
Profiling developers is challenging since many factors, such as their skills, experience, development environment and behaviors, may influence a detailed analysis and the delivery of coherent interpretations.
Objective:
We aim at profiling software developers by mining their software development process. To do so, we performed a controll...
A football player's performance can be measured objectively (e.g., goals scored, assists, interceptions), but this is seldom a method to compare and rank the best players by category. Over years of study, many other factors that can influence players' performance were discovered and studied, considering not only objective factors, but als...
This paper presents DisBot, the first Portuguese-speaking chatbot that uses knowledge retrieved from social media to support citizens and first responders in disaster scenarios, in order to improve community resilience and decision-making. It was developed and tested using Design Science Research Methodology (DSRM), being progressively matured with fiel...
Resources such as FrameNet, which provide sets of semantic frame definitions and annotated textual data that maps into the evoked frames, are important for several NLP tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised mann...
This research is aimed at creating and presenting DisKnow, a data extraction system with the capability of filtering and abstracting tweets, to improve community resilience and decision-making in disaster scenarios. Nowadays, most people act as human sensors, sharing detailed information about ongoing disasters on social media. Through a pip...
Considering the wide range of mobile applications available nowadays, effective search engines are imperative for a user to find applications that provide a specific desired functionality. Retrieval approaches that leverage topic similarity between queries and applications have shown promising results in previous studies. However, the search engin...
The process of protecting sensitive data is continually growing and becoming increasingly important, especially as a result of the directives and laws imposed by the European Union. The effort to create automatic systems is continuous, but, in most cases, the processes behind them are still manual or semi-automatic. In this work, we have developed...
ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-purpose communicative functions. The automatic recognition of these functions, although practically unexplored, is relevant for a dialog system, since they provide cues regarding the intention behind the segments and how they should be interpreted...
Resources such as FrameNet provide semantic information that is important for multiple tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspe...
Automatic dialog act recognition is a task that has been widely explored over the years. In recent works, most approaches to the task explored different deep neural network architectures to combine the representations of the words in a segment and generate a segment representation that provides cues for intention. In this study, we explore means to...
Dialog acts reveal the intention behind the uttered words. Thus, their automatic recognition is important for a dialog system trying to understand its conversational partner. The study presented in this article approaches that task on the DIHANA corpus, whose three-level dialog act annotation scheme poses problems which have not been explored in re...
Dialog acts reveal the intention behind the uttered words. Therefore, their automatic recognition is important for a dialog system trying to understand its interlocutor. The study presented in this article approaches this task on the DIHANA corpus, whose three-level dialog act annotation scheme poses problems that have not...
Embodied cognition states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience history, making this biological semantic machinery noisy with r...
This empirical data-driven research aims to unveil thought-provoking insights on the U.S. hotel offer across its 50 states. Information on more than 30,000 hotels was collected through web scraping from TripAdvisor. Using such data, 50 support vector machine models were trained to model the TripAdvisor score, one per state, to assess the convergent...
We claim that it is possible to have artificial software agents for which their actions and the world they inhabit have first-person or intrinsic meanings. The first-person or intrinsic meaning of an entity to a system is defined as its relation with the system's goals and capabilities, given the properties of the environment in which it operates....
Automatic cyberbullying detection is a task of growing interest, particularly in the Natural Language Processing and Machine Learning communities. Not only is it challenging, but it is also a relevant need given how social networks have become a vital part of individuals' lives and how dire the consequences of cyberbullying can be, especially among...
Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We leverage this aspect of cognition, by considering dance as a proxy for music perception, in a statistica...
Automatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by their conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention....
The three-level dialog act annotation scheme of the DIHANA corpus poses a multi-level classification problem in which the bottom levels allow multiple or no labels for a single segment. We approach automatic dialog act recognition on the three levels using an end-to-end approach, in order to implicitly capture relations between them. Our deep n...
Purpose
This study aims to propose a data-driven approach, based on open-source tools, that makes it possible to understand customer satisfaction of the accommodation offer of a whole country.
Design/methodology/approach
The method starts by extracting information from all hotels of Portugal available at TripAdvisor through Web scraping. Then, a...
Purpose:
To develop a model to predict online review ratings from multiple sources, which can be used to detect fraudulent reviews, create proprietary rating indexes, or which can be employed as a measure of selection in recommender systems.
Methodology:
This study applies machine learning and natural language processing approaches to combine feat...
Dialog act recognition is an important step for dialog systems since it reveals the intention behind the uttered words. Most approaches to the task use word-level tokenization. In contrast, this paper explores the use of character-level tokenization. This is relevant since there is information at the sub-word level that is related to the function o...
Sentence compression produces a shorter sentence by removing redundant information, preserving the grammaticality and the important content. We propose an improvement to current neural deletion systems. These systems output a binary sequence of labels for an input sentence: one indicates that the token from the source sentence remains in the compre...
A dialog act is a representation of an intention transmitted in the form of words. In this sense, when someone wants to transmit some intention, it is revealed both in the selected words and in how they are combined to form a structured segment. Furthermore, the intentions of a speaker depend not only on her intrinsic motivation, but also on the hi...
The IT incident management process requires correct categorization to assign incident tickets to the right resolution group and restore an operational system as quickly as possible, with minimal impact on the business and customers. In this work, we introduce automatic text classification, demonstrating the application of several nat...
Online reviews are one of the main influencers of hotel purchase decisions. This study performs an analysis of reviews extracted from well-known online review sources in combination with hotel sales data and concludes that ratings differ according to the language of reviews. Data science tools have been applied to English, Spanish, and Portuguese r...
Modeling of music audio semantics has been previously tackled through learning of mappings from audio data to high-level tags or latent unsupervised spaces. The resulting semantic spaces are theoretically limited, either because the chosen high-level tags do not cover all of music semantics or because audio data itself is not enough to determine mu...