
Rafael Geraldeli Rossi
Federal University of Mato Grosso do Sul, Três Lagoas, Brazil
PhD
Senior Data Scientist at iFood
About
61
Publications
15,229
Reads
597
Citations
Introduction
Additional affiliations
August 2011 - October 2015
Publications (61)
The advancement of techniques and computational tools for data mining has been boosting the music market with applications focused on user experience. These techniques explore musical data looking for patterns and trends that can guide business strategies. One of the key steps in these applications is the vector representation of the original text....
Context: Mobile app reviews are a rich source of information for software evolution and maintenance. Several studies have shown the effectiveness of exploring relevant reviews in the software development lifecycle, such as release planning and requirements engineering tasks. Popular apps can receive millions of reviews, thereby making manual extrac...
In this paper, we introduce the concept of learning to sense, which aims to emulate a complex characteristic of human reasoning: the ability to monitor and understand a set of interdependent events for decision-making processes. Event datasets are composed of textual data and spatio-temporal features that determine where and when a given phenomenon...
A massive amount of text is currently being produced in the digital universe. This large body of text may contain knowledge useful to many fields, both academic and commercial. One way to extract knowledge from and manage large volumes of text is automatic classification. One way to make this more...
Events are phenomena that occur at a specific time and place. Their detection can benefit society, since knowledge can be extracted from these events. Event detection is a multimodal task since these events have textual, geographical, and temporal components. Most multimodal research in the literature uses the concatenation of the c...
The dynamism of fake news evolution and dissemination plays a crucial role in influencing and confirming personal beliefs. To minimize the spread of disinformation, approaches proposed in the literature for automatic fake news detection generally learn models through binary supervised algorithms considering textual and contextual information. However,...
Fake news can rapidly spread through internet users and can deceive a large audience. Due to those characteristics, they can have a direct impact on political and economic events. Machine Learning approaches have been used to assist fake news identification. However, since the spectrum of real news is broad, hard to characterize, and expensive to l...
Positive and Unlabeled Learning (PUL) uses unlabeled documents and a few positive documents for retrieving a set of "interest" documents from a text collection. Usually, PUL approaches are based on the vector space model. However, when dealing with semi-supervised learning for text classification or information retrieval, graph-based approaches hav...
Sentiment Analysis is a process whose main goal is to extract the polarities of the sentiments expressed in opinions about a topic of interest. This research area has been gaining attention, both on the Web and in academia, since institutions, people, and companies are interested in knowing the real opinion of a group of people...
Computational techniques can be used to identify musical trends and patterns, helping people filter and select music according to their preferences. In this scenario, researchers claim that the future of music permeates artificial intelligence, which will play the role of composing music that best fits the tastes of consumers. So, extracting p...
Event analysis from news and social networks is a promising way to understand complex social phenomena. Each event consists of different components, which indicate what happened, when, where, and the people and organizations involved. Heterogeneous networks are useful for modeling large event datasets, where we map different types of objects (e.g....
Given the massive volume of text being produced today, automatic text classification has become attractive for both academic and business purposes. Traditionally, automatic text classification is performed through multi-class machine learning, which requires the user to provide labeled texts...
Computational techniques can be used to identify musical trends and patterns, helping people filter and select music according to their preferences. In this scenario, research suggests that the future of music permeates artificial intelligence, which will play the role of composing music that best meets the tastes of consumers...
Abstract. Spanish is today the second most spoken language in the world and is increasingly present in the lives of Brazilians, whether in academic life, travel, or business, or because it is the predominant language in the Americas. In this context, with the advent and evolution of the internet, several online translation tools have emerged to provide...
Text automatic classification (TAC) has become interesting for academic and business purposes due to the massive volume of texts being produced. TAC is usually performed through multi-class learning, in which a user must provide labeled texts for all classes of an application domain. However, in scenarios in which the intent is to verify if a...
Accurate semantic representation models are essential in text mining applications. For a successful application of the text mining process, the text representation adopted must keep the interesting patterns to be discovered. Although competitive results for automatic text classification may be achieved with traditional bag of words, such representa...
Aspect-Based Sentiment Analysis (ABSA) is a promising approach to analyze consumer reviews at a high level of detail, where the opinion about each feature of the product or service is considered. ABSA usually explores supervised inductive learning algorithms, which require intense human effort for the labeling process. In this paper, we investigat...
Due to the popularization of the internet and the rise of social networks, thousands of pieces of textual data are produced every day, especially in environments that enable rapid message sharing among users, as seen on Twitter. Since many of these texts are opinionated and express the sentiments of users...
Nowadays there is a large volume of road traffic, which accounts for most of the transport of cargo and people in the country. Monitoring and maintaining these roads is a major challenge for the responsible agencies. In most cases, manual inspection is preferred for road monitoring, since it is believed to be the most economical option...
Given the massive volume of text circulating on the internet and its projected growth, automatically classifying the texts that circulate on the World Wide Web has become attractive for academic and business purposes. One way to perform this automatic classification is to use machine learning techniques...
Events can be defined as "something that occurs at specific place and time associated with some specific actions". In general, events extracted from news articles and social networks are used to map information from the web to the various phenomena that occur in our physical world. One of the main steps to perform this relationship is the use of ma...
An event is defined as “a particular thing which happens at a specific time and place” and can be extracted from news articles, social networks, forums, as well as any digital documents associated with metadata describing temporal and geographical information. In practice, this knowledge is a digital representation (virtual world) of various phenom...
Many real-world applications, such as those related to sensors, allow collecting large amounts of inexpensive unlabeled sequential data. However, the use of supervised machine learning methods is frequently hindered by the high costs involved in gathering labels for such data. These methods assume the availability of a considerable amount of labele...
The quality of the pavement of roads and streets has a significant influence on the final price of goods and services, on the safety of pedestrians, and on the driver's comfort. Thus, the development of tools for continuous monitoring of the pavement, intended to produce a more precise and adequate maintenance plan, is essential. In order to reduc...
Due to the volume of texts available in digital form, organizing, managing, and extracting knowledge from them is laborious and frequently infeasible. To cope with these tasks automatically, classification models are usually generated through supervised learning techniques. Unfortunately, this type of learning usually demands a huge hum...
Aspect-Based Sentiment Analysis (ABSA) makes it possible to analyze the sentiment toward each product aspect, e.g., the camera quality, operating system, and storage capacity of a smartphone. The two main tasks in ABSA are: (i) identifying the terms/words related to the aspects and (ii) performing sentiment analysis for each identified aspect. Several approaches to...
Transductive classification is a useful way to classify a collection of unlabelled textual documents when only a small fraction of this collection can be manually labelled. Graph-based algorithms have attracted considerable interest in recent years for transductive classification, since the graph-based representation facilitates label propaga...
Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms to perform transductive classification considering text collections represented in a vector space model have been proposed. However, the use of these algorithms is unfeasible in practical applications due to the independ...
Transductive classification is a useful way to classify texts when just a few labeled examples are available. Transductive classification algorithms rely on term frequency to directly classify texts represented in a vector space model or to build networks and perform label propagation. Related terms tend to belong to the same class and this information...
In Aspect-Based Sentiment Analysis (ABSA), it is possible to analyze the sentiment toward each aspect of a product, for example, the camera quality, operating system, and storage capacity of a smartphone. Existing work using machine learning for ABSA requires (i) knowing the possible aspects in advance or (ii)...
Causative verbs can assist in the identification of causative relations. Portuguese has a large number of verbs, which makes the manual labelling of causative verbs an expensive task. This paper presents a classification strategy which uses the characteristics of causative verbs co-occurring with common nouns to classify Brazilian Portu...
The popularization of music distribution in electronic format has increased the amount of music with incomplete metadata. The incompleteness of data can hamper some important tasks, such as music and artist recommendation. In this scenario, transductive classification can be used to classify the whole dataset considering just a few labeled instances....
A bipartite heterogeneous network is one of the simplest ways to represent a textual document collection. In such a case, the network consists of two types of vertices, representing documents and terms, and links connecting terms to the documents. Transductive algorithms are usually applied to perform classification of networked objects. This type of...
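The document-term structure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_bipartite_network`, the toy collection, and the use of raw term frequency as the link weight are all assumptions, since the abstract does not specify them.

```python
from collections import defaultdict


def build_bipartite_network(documents):
    """Build a document-term bipartite network from tokenized documents.

    `documents` maps a document id to a list of tokens. Returns two
    adjacency maps: doc -> {term: weight} and term -> {doc: weight}.
    The weight is the raw term frequency (an assumption; the abstract
    does not state how links are weighted).
    """
    doc_to_terms = defaultdict(dict)
    term_to_docs = defaultdict(dict)
    for doc_id, tokens in documents.items():
        for term in tokens:
            # Increment the frequency of `term` in this document and
            # mirror the updated weight on the term-side adjacency map.
            freq = doc_to_terms[doc_id].get(term, 0) + 1
            doc_to_terms[doc_id][term] = freq
            term_to_docs[term][doc_id] = freq
    return dict(doc_to_terms), dict(term_to_docs)


# Hypothetical toy collection, for illustration only.
toy_docs = {
    "d1": ["graph", "label", "graph"],
    "d2": ["label", "text"],
}
doc_to_terms, term_to_docs = build_bipartite_network(toy_docs)
```

Links only ever connect a document vertex to a term vertex, which is what makes the network bipartite; a transductive algorithm would then propagate class information across these links.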
Algorithms for numeric data classification have been applied for text classification. Usually the vector space model is used to represent text collections. The characteristics of this representation such as sparsity and high dimensionality sometimes impair the quality of general-purpose classifiers. Networks can be used to represent text collection...
Incremental clustering is a very useful approach to organize dynamic text collections. Due to the time/space restrictions for incremental clustering, the textual documents must be preprocessed to maintain only their most important information. Domain independent statistical keyword extraction methods are useful in this scenario, since they analyze...
Several text mining techniques have been proposed to deal with the huge number of textual documents that are available and that have been published nowadays, mainly classification techniques, which assign pre-defined labels to new documents, and clustering techniques, which separate texts into clusters. The techniques proposed in the literature are us...
Recommending given names is a special case of recommender systems that has been little explored but has recently attracted great interest. Examples of name recommendation applications include indicating names related to a user's query and suggesting names to parents choosing a name for their unborn child. In this paper, we present results f...
Terms are the basis for general text mining and natural language processing applications. However, manual term extraction is unfeasible due to the huge number of words present in a domain corpus and the human effort required to do the extraction. For the term extraction task, machine learning techniques have been used to perform automati...
Incremental clustering is a very useful approach to organize dynamic text collections. Due to the time/space restrictions for incremental clustering, the textual documents must be preprocessed to maintain only their most important information. Statistical keyword extraction methods from single documents are useful in this scenario. However, differe...
Usually, algorithms for categorization of numeric data have been applied for text categorization after a preprocessing phase which assigns weights for textual terms deemed as attributes. However, due to characteristics of textual data, some algorithms for data categorization are not efficient for text categorization. Characteristics of textual data...
A simple and intuitive way to organize a huge document collection is by a topic hierarchy. Generally two steps are carried out to build a topic hierarchy automatically: 1) hierarchical document clustering and 2) cluster labeling. For both steps, a good textual document representation is essential. The bag-of-words is the common way to represent tex...
Technological progress in designing new devices and scientific growth in the field of Human-Computer Interaction are enabling new interaction modalities to move from research to commercial products. However, developing multimodal interfaces is still a difficult task due to the lack of tools that consider not only code generation, but usability...
Considering the huge growth in the number of documents in the digital universe and the possibility of obtaining some competitive advantage by processing them, this paper describes some of the difficulties of working with text collections. More specifically, it shows some of the challenges in the step considered one of the most important of the Te...
Abstract. This technical report presents the IEsystem tool, which extracts metadata from collections of scientific articles. The tool can extract metadata even when the scientific articles come from different sources or are written in different languages. The metadata extraction process is based on model...
The amount of textual documents available in digital format is incredibly large. Sometimes, it is impossible for a human being to manage and extract knowledge from a large amount of textual documents. In order to deal with these challenges, automatic techniques to organize, manage and extract knowledge from textual data are becoming very importan...