
Solange Oliveira Rezende
- Professor
- University of São Paulo
About
256 Publications
41,769 Reads
1,873 Citations
Introduction
Solange Oliveira Rezende currently works at the Departamento de Ciência da Computação (SCC), São Carlos, University of São Paulo. Solange does research in Artificial Intelligence and Data Mining. Their current project is 'Learning Websensors from Textual Data'.
Current institution
Additional affiliations
August 1991 - present
Education
March 1990 - December 1993
March 1987 - March 1990
Publications (256)
Forecasting trends in the financial market is a classic and challenging problem that attracts economists’ and computer scientists’ attention. This research area, characterized by its dynamic, chaotic, and nonlinear nature, is further complicated by the overarching influence of the efficient market hypothesis (EMH). The EMH posits that all available...
Cyberbullying is a form of bullying that has become a serious concern with the exponential growth in the number of social media users. Social networks provide a suitable environment for bullies to attack their victims and cause them serious psychological problems. To mitigate these issues, proactive measures are essential to detect and prevent c...
The project aims to develop an educational platform with chatbots based on Large Language Models (LLMs), AI systems that interact with users in an active and inquisitive manner, stimulating students' critical and logical thinking, particularly those in high school years. This interaction methodology seeks to prevent users from becoming passive rece...
The project aims to develop an educational platform with chatbots based on Large Language Models (LLMs), Artificial Intelligence systems that interact with users in an active and questioning manner in order to stimulate students' critical and logical thinking, especially that of high school students. This interaction methodology aims to...
Complex and large software-intensive systems are increasingly present in several application domains, including Industry 4.0, connected health, smart cities, and smart agriculture, to mention a few. These systems are commonly composed of diverse other systems often developed by different organizations using various technologies and, as a consequenc...
This dataset involves a collection of soybean market news through web scraping from a Brazilian website. The news articles gathered span from January 2015 to June 2023 and have undergone a labeling process to categorize them as relevant or non-relevant. The news labeling process was conducted under the guidance of an agricultural economics expert,...
Fake news detection (FND) tools are essential to increase the reliability of information in social media. FND can be approached as a machine learning classification problem so that discriminative features can be automatically extracted. However, this requires a large news set, which in turn implies a considerable amount of human experts' effort for...
The National School Feeding Program (Programa Nacional de Alimentação Escolar) contributes to the improvement and development of schoolchildren through the transfer of funds. However, there is no assessment that monitors the quality of menus to guide the program's actions in promoting food and nutritional security. This article presents the construction of...
The recovery of degraded pastures has been an important topic with regard to food security. Despite the large volume of scientific articles on “degraded pastures”, retrieving these documents for knowledge extraction remains a major challenge. In this article, two classification approaches were explored, one supe...
Myocardial revascularization surgery is one of the recommended approaches for the treatment of chronic coronary disease. Several complications related to mortality, sequelae, length of stay, and hospital costs are also associated with this procedure. Death rates and complications depend on the characteristics of each patient. Knowing the factors re...
The amount of news generated on the internet has increased significantly in recent years. As a trend, text data has gained attention from industry, government, academia, and the financial market. This information is potentially valuable to assist domain experts in decision making. Therefore, related applications based on machine learning have been...
Forecasting models of the financial market generally use time series data. However, external factors can influence time series, such as political events, economic crises, government macroeconomic policy, and the foreign exchange market. This information is not explicit in the time series and can influence the prediction of the variable values. Text...
The World Health Organization (WHO) and the Global Burden of Disease study estimate that nearly 800,000 people die by suicide each year. Social media are emerging surveillance tools that can help researchers track suicide risk factors in real time. Text Mining is therefore a natural fit for studies of media such as Twitt...
Forecasting models in the financial market generally use quantitative time-series data. However, external factors can influence data in time-series, such as weather events, economic crises, and the foreign exchange market. This information is not explicit in the time-series and can influence the prediction of the variable values. Textual data can b...
The dynamism of fake news evolution and dissemination plays a crucial role in influencing and confirming personal beliefs. To minimize the spread of disinformation, approaches proposed in the literature for automatic fake news detection generally learn models through binary supervised algorithms considering textual and contextual information. However,...
Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine Learning approaches have been used to assist fake news identification. However, since the spectrum of real news is broad, hard to characterize, and expensive to l...
Dealing with an uncertain and dynamic environment when facing multiple and fast-paced challenges forces professionals to adopt agile practices in different environments, resulting in the use of hybrid project management models. However, identifying the right practice to adopt can be challenging, given the variety of project types and environmental...
Data mining is often described as the process of finding interesting patterns in large databases. Given the large amount of digital data that is continuously being generated and stored, data mining offers a solution to the problem of summarizing and quickly searching for non-obvious relationships in...
Deep learning and neural language models have obtained state-of-the-art results in aspect extraction tasks, in which the objective is to automatically extract the characteristics of products and services that are the target of consumer opinion. However, these methods require a large amount of labeled data to achieve such results. Since data labeling i...
Prototypes enable the materialization of information and generate a common reference for those involved in the design process. The literature discusses how different types of prototypes support user involvement in the design of healthcare products. However, it does not discuss how to improve the efficiency and effectiveness of the process of prot...
Background: A systematic review can be defined as a summary of the evidence found in the literature via a systematic search in the available scientific databases. One of the steps involved is article selection, which is typically a laborious task. Machine learning and artificial intelligence can be important tools in automating this st...
The commodities corn and soybean are products consumed on a large scale in the world. Fluctuations in market prices have far-reaching effects on consumers, farmers, and grain processors. Thus, forecasting the prices of these grains has attracted significant attention from researchers. Forecasting models generally use quantitative time-series data....
Introduction: A systematic review can be defined as a summary of the evidence found in the literature through a systematic search of the available scientific databases. One of its steps is article selection, which is generally a laborious task. Machine learning and artificial intelligence can be a tool...
A recommender system is an information filtering technology that can be used to recommend items that may be of interest to users. Additionally, there are the context‐aware recommender systems that consider contextual information to generate the recommendations. Reviews can provide relevant information that can be used by recommender systems, includ...
Event analysis from news and social networks is a promising way to understand complex social phenomena. Each event consists of different components, which indicate what happened, when, where, and the people and organizations involved. Heterogeneous networks are useful for modeling large event datasets, where we map different types of objects (e.g....
Dengue has been endemic in Brazil since the 1980s and in Piauí since 1996. The number of cases increases each year, with the incidence of more severe symptoms. This research aimed to evaluate the use of an automatic knowledge identification technique in factors related to the number of dengue occurrences. We built a dataset formed by data av...
The recommendation task is a prominent and challenging area of study in Machine Learning. The goal is to recommend items (such as products, movies, and services) to users according to what they have liked in the past. In general, most recommender systems consider only structured data. For example, when recommending movies to...
According to the World Health Organization, a person dies by suicide every 40 seconds worldwide. Among young people aged 15 to 29, suicide is the second leading cause of death. Yet such deaths can be prevented. In this scenario, social networks like Twitter can become sources of information in real time and help in suicide prevention. The present work...
Goal: In this work, from a set of textual scientific documents from a public database, we carried out an automatic analysis to detect the dominant terms in the field of knowledge governance. Design / Methodology / Approach: We apply text mining techniques from Artificial Intelligence combined with complex network metrics to identify the keywords....
In this paper, the Filtered-Association Rules Network (Filtered-ARN) is presented to structure, prune, and analyze a set of association rules in order to construct candidate hypotheses. The Filtered-ARN algorithm selects association rules with the use of asymmetric objective measures, Added Value and Gain, then builds a network allowing more explora...
The discovery of knowledge in textual databases is an approach that seeks implicit relationships between different concepts in different documents written in natural language, in order to identify new useful knowledge. This process can be supported by Text Mining techniques. Despite all the progress ma...
The emergence of new and challenging text mining applications is demanding the development of novel text processing and knowledge extraction techniques. One important challenge of text mining is the proper treatment of text meaning, which may be addressed by incorporating different types of information (e.g., syntactic or semantic) into the text re...
Bug localization (BL) from bug reports is a strategic activity in the software maintenance process. Because BL is costly and tedious, information retrieval-based and machine learning-based BL techniques could aid software engineers. We propose a method for BUg Localization with word embeddings and Network Regularization (BULNER). Th...
A recommender system is an information filtering technology that can be used to recommend items that may be of interest to users. In their traditional form, recommender systems do not consider information that might enrich the recommendation process, such as contextual information. This motivates context-aware recommender systems, which consider...
The ability to imagine sets humans apart from every other living being. Analyzing the past, planning the future, and acting in the present based on lived experience is a human prerogative that has allowed the species to reach the level of development known so far. Issues related to Olympism have mobilized society and the com...
The food sector is one of the most critical areas of the economy, and consumers are seeking safer, more readily available, more affordable, and better quality food. Therefore, organic agriculture has become a possible approach for optimizing the characteristics of processed foods. Vegetables have essential uses as green manure, but the greatest dif...
Recommender systems help users by recommending items, such as products and services, that can be of interest to these users. Context-aware recommender systems have been widely investigated in both academia and industry because they can make recommendations based on a user’s current context (e.g., location and time). Moreover, the advent of Web 2.0...
Recommender systems based on matrix factorization techniques are considered the state of the art in the literature, owing to their good scalability and accuracy. These systems can be used in the traditional form, considering only user and item information, as well as in the context-aware form, also considering the...
Nowadays, with the growth of the digital universe, e-commerce and social networks, a great diversity of information, products and services is available on the Web. A recommender system can aid in user decisions like which product to buy, which movie to watch and which hotel to book. Traditional recommender systems focus on user and item data to gen...
The popularization of web platforms promoted a significant increase in the publication of financial news and reports in digital media. In this sense, a multidisciplinary research area called “learning to sense” (or sensor learning) has received attention recently. Unlike traditional machine learning methods, in sensor learning there is an interest...
Event analysis has received attention recently due to the popularization of web platforms for publishing content, especially news portals, social networks, blogs, and forums. These platforms store events as texts about various sectors of society and can be seen as a digital representation...
Modern agribusiness management incorporates instruments for risk management with the objective of mitigating uncertainties for the producer. In this context, the (risk-averse) producer transfers the risk of price oscillation to companies or individuals that operate in the futures market and who expect to receive a payment (risk premium) for assuming...
In this paper, the Filtered-Association Rules Network (Filtered-ARN) is presented to structure, prune, and analyze a set of association rules to construct candidate hypotheses. The Filtered-ARN algorithm selects association rules with the use of asymmetric objective measures, Added Value and Gain, then builds a ne...
Accurate semantic representation models are essential in text mining applications. For a successful application of the text mining process, the text representation adopted must keep the interesting patterns to be discovered. Although competitive results for automatic text classification may be achieved with traditional bag of words, such representa...
Aspect-Based Sentiment Analysis (ABSA) is a promising approach to analyze consumer reviews at a high level of detail, where the opinion about each feature of the product or service is considered. ABSA usually explores supervised inductive learning algorithms, which require intense human effort for the labeling process. In this paper, we investigat...
The quality of any text mining technique is highly dependent on the features that are used to represent the document collection. A classical form of document representation is the vector space model (VSM), according to which the documents are represented as vectors of weights that correspond to the features of the documents. The bag-of-words model...
The focus of this paper is on the evaluation of sixteen labeling methods for hierarchical document clusters over five datasets. All of the methods are independent from clustering algorithms, applied subsequently to the dendrogram construction and based on probabilistic dependence relations among labels and clusters. To reach a fair comparison as we...
BEST sports text collection version 1.1 is a collection of sports news written in Portuguese. It was collected, prepared, and provided to be used as a benchmarking collection in text mining research. We created four datasets, considering real application scenarios, in which different users or situations require different organizations (or classificat...
The availability of labeled text collections is a common need in the text mining research community. These collections are used for both learning and evaluating text mining models. In this technical report, we present the BEST sports collection. This collection of documents written in Portuguese was collected, prepared, and provided to be used as b...
The association rules (ARs) post-processing step is challenging, since many patterns are extracted and only a few of them are useful to the user. One of the most traditional approaches to finding interesting rules is the use of objective measures (OMs). Due to their frequent use, many of them exist (over 50). Therefore, when a user dec...
The text mining process is essential to knowledge discovery in textual databases. In order to extract patterns from a textual collection, it is important to provide meaning to data, reflecting its characteristics through an efficient representation that conveys the original relationships of the database. One of the most used representation techniques is Ba...
As text semantics plays an important role in text meaning, the term semantics appears in a wide variety of text mining studies. However, there is a lack of studies that integrate the different research branches and summarize the developed works. This paper reports a systematic mapping of semantics-concerned text mining studies. This systematic...
In recent years, latent Dirichlet allocation (LDA), a state-of-the-art topic modeling method, has been applied in several text mining tasks. LDA solutions can be used as either a clustering solution or a low-dimensional document representation. The low-dimensional space obtained by LDA is normally called semantic space, as alternative forms expre...
An event is defined as “a particular thing which happens at a specific time and place” and can be extracted from news articles, social networks, forums, as well as any digital documents associated with metadata describing temporal and geographical information. In practice, this knowledge is a digital representation (virtual world) of various phenom...
Many real-world applications, such as those related to sensors, allow collecting large amounts of inexpensive unlabeled sequential data. However, the use of supervised machine learning methods is frequently hindered by the high costs involved in gathering labels for such data. These methods assume the availability of a considerable amount of labele...
Modern agricultural processes increasingly scrutinize the use of chemicals, so the search for organic alternatives for fertilization has become constant. Data mining with association rule networks (ARN) can aid in the analysis of the parameters involved in choosing which plant to use as green manure. In this work, an analysis o...
Fuzzy document clustering aims at automatically organizing related documents into clusters in a flexible way. In this context, the topics addressed by the documents in each cluster are identified by automatically discovering cluster descriptors, which are relevant terms present in these documents. Since documents are represented by a high...
Context: Due to the large number of incident reports stored in Bug Tracking System repositories and the need to prioritize them by severity, it is necessary to investigate tools that support the prediction of incident report severity. Objective: Apply Text Mining (TM) techniques and learning methods to help the pr...
In a competitive market, adopting good business strategies is a fundamental requirement for satisfying and retaining customers. This work aimed to identify the profiles of users who remain loyal to a telecommunications operator and the profiles of those who leave it. The study considers the occurrence reports of...
Due to the volume of texts available in digital form, their organization, management, and knowledge extraction are laborious and frequently infeasible. To cope with these tasks automatically, classification models are usually generated through supervised learning techniques. Unfortunately, this type of learning usually demands a huge hum...
This paper deals with the entity extraction task (named entity recognition) of a text mining process that aims at unveiling non-trivial semantic structures, such as relationships and interaction between entities or communities. In this paper we present a simple and efficient named entity extraction algorithm. The method, named PAMPO (PAttern Matchi...
With the explosion of text content available on the internet, Sentiment Analysis (SA) has attracted growing attention by offering ways to automatically extract opinion information from text. As the internet extended its reach throughout the globe, the need for tools to enable information exchange between people who do not...
In order to achieve good Text Mining results, the representation model used to structure the text collection must preserve important information hidden in the text documents. In this context, a text representation based on expressions of a specific domain was developed in a previous work. In this paper, we propose a generalization of that represent...
Aspect-Based Sentiment Analysis (ABSA) makes it possible to analyze the sentiment toward each product aspect, e.g., the camera quality, operating system, and storage capacity of a smartphone. The two main tasks in ABSA are: (i) identifying the terms/words related to the aspects and (ii) performing sentiment analysis for each identified aspect. Several approaches to...
Recommendation of textual documents requires indexing mechanisms to extract structured metadata for attribute-aware recommender systems. Applying a variety of text mining algorithms has the advantage of capturing different aspects of unstructured content, resulting in richer descriptions. However, it is difficult to integrate them into a unique mod...
A recommender system is used in various fields to recommend items of interest to the users. Most recommender approaches focus only on the users and items to make the recommendations. However, in many applications, it is also important to incorporate contextual information into the recommendation process. Although the use of contextual information h...
Association rules are widely used to find relations among items in a given database. However, the number of generated rules is too large to be manually explored. Traditionally, this task is done by post-processing approaches that explore and direct the user to the interesting rules. Recently, the user’s knowledge has been considered to post-process...
Many objective measures (OMs) have been proposed, since they are frequently used to discover interesting association rules. Therefore, an important challenge is to decide which OM to use. For that, one can: (a) reduce the number of OMs to choose from; (b) aggregate the OMs’ values into a single importance value as a means of avoiding the selection of a suitable OM. The probl...
Industrial Product-Service Systems (IPS2) are an innovative way of doing business, potentially capable of answering sustainability challenges. The interest in IPS2 has opened several research branches, such as one concerned with how to properly develop them. Well-known methods and tools have been adapted for IPS2 development, or new ones have been created by researchers....