Andre de Carvalho

  • PhD
  • University of São Paulo

About

429
Publications
172,184
Reads
9,467
Citations
Current institution
University of São Paulo
Current position
  • Universidade de São Paulo
Additional affiliations
January 2000 - December 2000
University of Guelph
Position
  • Professor (Associate)
January 2009 - present
University of São Paulo
Education
September 1990 - July 1994
University of Kent
Field of study
  • Neural Networks

Publications

Article
Full-text available
The adoption of deep learning algorithms in the medical imaging area is a prominent research issue, with high potential for advancing AI-based Computer-aided diagnosis solutions. However, current solutions face challenges due to a lack of interpretability features and high data demands, prompting recent efforts to address these issues. In this stud...
Article
Full-text available
Legislative houses all over the world are adopting tools based on artificial intelligence to support their work. The incorporation of these tools can improve the analysis of the text of the proposed new laws and speed the preparation and discussion of new laws. The performance of artificial intelligence tools for text processing tasks is largely af...
Conference Paper
As the storage of biological sequences grows, extracting information becomes crucial for advances in health. The complexity of these sequences demands sophisticated techniques, such as Machine Learning (ML). However, developing robust ML solutions requires specialized knowledge, often beyond the reach of many researche...
Article
Full-text available
Machine Learning (ML) algorithms have been important tools for the extraction of useful knowledge from biological sequences, particularly in healthcare, agriculture, and the environment. However, the categorical and unstructured nature of these sequences usually requires additional feature engineering steps before an ML algorithm can be efficient...
Chapter
The growing influx of lawsuits in judicial systems presents a pressing challenge for timely case resolution. The Sao Paulo Justice Court is particularly noteworthy, boasting the world’s largest caseload with an 84% congestion rate and an average processing time of over seven years. To address this issue, this article introduces LegalClass, a comput...
Conference Paper
Full-text available
The increasing volume and complexity of legal documents have led to a growing interest in text summarization for legal texts. In this context, this paper presents LegalSum, a tool for automatically summarizing lawsuits in Portuguese, aiming to improve the efficiency of legal professionals and researchers. The tool is equipped with a legal-domain expr...
Article
Full-text available
Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations and their complex interactions, it is common to use optimization techniques to find settings that lead to high predicti...
Preprint
Full-text available
Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations, and their complex interactions, it is common to use optimization techniques to find settings that lead to high predict...
Chapter
Machine Learning has revolutionized the categorization of vast legal documents, minimizing costs and improving evaluations. However, conventional models struggle with unseen data categories in real-world scenarios, a challenge termed Open Set Classification. Our study tackles the issue faced by the Court of Justice in São Paulo, Brazil, to identify...
Conference Paper
Given the increasing number of biological sequences stored in databases, there is a large source of information that can benefit several sectors such as agriculture and health. Machine Learning (ML) algorithms can extract useful and new information from these data, increasing social and economic benefits, in addition to productivity. However, the c...
Conference Paper
Applying Machine Learning (ML) algorithms to a dataset can be time-consuming. It usually involves not only selecting and fine-tuning the algorithm, but also other steps, such as data preprocessing. To reduce this time, the whole or a subset of this process has been automated by Automated ML (AutoML) techniques, which can include Bayesian Optimizat...
Chapter
Data stream applications in highly dynamic environments often face concept drift problems, a phenomenon in which the statistical properties of the variables change over time, which can degrade the performance of Machine Learning models. This work presents a new model monitoring tool through the use of Meta Learning. The algorithm was conceived for...
Article
Full-text available
Several AutoML tools aim to facilitate the usability of machine learning algorithms, automatically recommending algorithms using techniques such as meta-learning, grid search, and genetic programming. However, the preprocessing step is usually not well handled by those tools. Thus, in this work, we present a systematic review of preprocessing algor...
Article
Due to their unique optical and electronic functionalities, chalcogenide glasses are materials of choice for numerous microelectronic and photonic devices. However, to extend the range of compositions and applications, profound knowledge about composition-property relationships is necessary. To this end, we collected a large quantity of composition...
Article
Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. These recommendations are made based on meta-data, consisting of performance evaluations of algorithms and characterizations on prior datasets. These characterizations, also called meta-features, describe properties of the data...
Chapter
Initializing the hyper-parameters (HPs) of machine learning (ML) techniques has become an important step in the area of automated ML (AutoML). The main premise in HP initialization is that an HP setting that performs well for a certain dataset(s) will also be suitable for a similar dataset. Thus, evaluation of similarities of datasets based on their cha...
Article
Choosing the most suitable algorithm to perform a machine learning task for a new problem is a recurrent and complex task. In multi-target regression tasks, when problem transformation methods are applied, this choice is even harder. The reason is the need to simultaneously choose the problem transformation method and the base learning algorithm. T...
Chapter
Metalearning has been widely used in recent years to recommend machine learning algorithms for new problems based on past experience. To this end, the first step is the creation of a metabase, or metadataset, containing metafeatures extracted from several datasets along with the performance of a pool of candidate algorithm(s). The next step is the...
Preprint
Full-text available
Due to their unique optical and electronic functionalities, chalcogenide glasses are materials of choice for numerous microelectronic and photonic devices. However, to extend the range of compositions and applications, profound knowledge about composition-property relationships is necessary. To this end, we collected a large quantity of composition...
Article
Full-text available
With the advent of powerful computer simulation techniques, it is time to move from the widely used knowledge-guided empirical methods to approaches driven by data science, mainly machine learning algorithms. We investigated the predictive performance of three machine learning algorithms for six different glass properties. For such, we used an exte...
Book
For some time now, the field of artificial intelligence has no longer been seen as merely theoretical, intended for application to small “curious” problems, and has become a growing research field in search of solutions to real problems of society. Winner of the 2012 Jabuti Prize (Technology and Informatics category) when it was released, Inteligênci...
Article
Data stream mining needs to deal with scenarios where data distribution can change over time. As a result, different learning algorithms can be more suitable in different time periods. This paper proposes micro-MetaStream, a meta-learning based method to recommend the most suitable learning algorithm for each new example arriving in a data stream....
Article
Full-text available
A central aspect of online decision trees is evaluating the incoming data and performing model growth. To do so, trees must deal with different kinds of input features. Numerical features are no exception, and they pose additional challenges compared to other kinds of features, as there is no trivial strategy to choose the best point to make a spli...
Article
Full-text available
Human Activity Recognition is focused on the use of sensing technology to classify human activities and to infer human behavior. While traditional machine learning approaches use hand-crafted features to train their models, recent advancements in neural networks allow for automatic feature extraction. Auto-encoders are a type of neural network that...
Article
Imbalanced datasets are an important challenge in supervised Machine Learning (ML). According to the literature, class imbalance does not necessarily impose difficulties for ML algorithms. Difficulties mainly arise from other characteristics, such as overlapping between classes and complex decision boundaries. For binary classification tasks, calcu...
Preprint
Full-text available
A central aspect of online decision tree solutions is evaluating the incoming data and enabling model growth. To do so, trees must deal with different kinds of input features and partition them to learn from the data. Numerical features are no exception, and they pose additional challenges compared to other kinds of features, as there is no trivial...
Conference Paper
Meta-learning has been successfully applied to time series forecasting. To do so, it uses meta-datasets created by previous machine learning applications. Each row in a meta-dataset represents a time series dataset. Each column, apart from the last, is a meta-feature describing aspects of the related dataset. The last column is a target value, a meta-la...
Chapter
Incremental machine learning algorithms have been effective alternatives to deal with stream data. The Hoeffding Tree framework is one of the most successful solutions for supervised online prediction tasks. Although online regression tasks are present in several forms, and in many real-life problems, most of the research efforts have been devoted...
Chapter
Classification tasks using imbalanced datasets are not challenging on their own. Classification models perform poorly on the minority class when the datasets present other difficulties, such as class overlap and complex decision border. Data complexity measures can identify such difficulties, better dealing with imbalanced datasets. They can captur...
Article
Meta-Learning has been largely used over the last years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied...
Preprint
Full-text available
This paper presents an experimental comparison among four Automated Machine Learning (AutoML) methods for recommending the best classification algorithm for a given input dataset. Three of these methods are based on Evolutionary Algorithms (EAs), and the other is Auto-WEKA, a well-known AutoML method based on the Combined Algorithm Selection and Hy...
Preprint
Full-text available
With the advent of powerful computer simulation techniques, it is time to move from the widely used knowledge-guided empirical methods to approaches driven by data science, mainly machine learning algorithms. Due to their (hidden) smooth composition-property relationships, this strategy is especially relevant for the development of new glasses. We...
Article
Full-text available
Investigating strategies that are able to efficiently deal with multi-label classification tasks is a current research topic in machine learning. Many methods have been proposed, making the selection of the most suitable strategy a challenging issue. From this premise, this paper presents an extensive empirical analysis of the binary transformation...
Preprint
Machine Learning (ML) algorithms have been successfully employed by a vast range of practitioners with different backgrounds. One of the reasons for ML's popularity is its capability to consistently deliver accurate results, which can be further boosted by adjusting hyperparameters (HP). However, some practitioners have limited knowledge about the...
Article
Full-text available
Several studies in the field of human–computer interaction have focused on the importance of emotional factors related to the interaction of humans with computer systems. According to the knowledge of the users’ emotions, intelligent software can be developed for interacting and even influencing users. However, such a scenario is still a challenge...
Article
Full-text available
Modern technologies demand the development of new glasses with unusual properties. Most of the previous developments occurred by slow, expensive trial-and-error approaches, which have produced a considerable amount of data over the past 100 years. By finding patterns in such types of data, Machine Learning (ML) algorithms can extract useful knowled...
Chapter
Human Activity Recognition is a machine learning task for the classification of human physical activities. Applications for that task have been extensively researched in recent literature, especially due to the benefits of improving quality of life. Since wearable technologies and smartphones have become more ubiquitous, a large amount of informatio...
Article
Full-text available
Automated recommendation of machine learning algorithms is receiving a great deal of attention, not only because it can recommend the most suitable algorithms for a new task, but also because it can support efficient hyper-parameter tuning, leading to better machine learning solutions. The automated recommendation can be implemented using meta-...
Article
Image segmentation is a key issue in image processing. New image segmentation algorithms have been proposed in the last years. However, there is no optimal algorithm for every image processing task. The selection of the most suitable algorithm usually occurs by testing every possible algorithm or using knowledge from previous problems. These proces...
Chapter
In data streams new classes can appear over time due to changes in the data statistical distribution. Consequently, models can become outdated, which requires the use of incremental learning algorithms capable of detecting and learning the changes over time. However, when a single classification model is used for novelty detection, there is a risk...
Chapter
Data streams are related to large amounts of data that can continuously arrive with a probability distribution that may change over time. Depending on the changes in the data distribution, different phenomena can occur, such as the appearance of new classes or concept drift in existing classes. Machine Learning algorithms have often been used to mo...
Article
Full-text available
Human mobility has a significant impact on several layers of society, from infrastructural planning and economics to the spread of diseases and crime. Representing the system as a complex network, in which nodes are assigned to regions (e.g., a city) and links indicate the flow of people between two of them, physics-inspired models have been propos...
Preprint
For many machine learning algorithms, predictive performance is critically affected by the hyperparameter values used to train them. However, tuning these hyperparameters can come at a high computational cost, especially on larger datasets, while the tuned settings do not always significantly outperform the default values. This paper proposes a rec...
Article
For many machine learning algorithms, predictive performance is critically affected by the hyperparameter values used to train them. However, tuning these hyperparameters can come at a high computational cost, especially on larger datasets, while the tuned settings do not always significantly outperform the default values. This paper proposes a rec...
Preprint
The amount of available data is growing rapidly. Developing machine learning strategies to cope with high-throughput, changing data streams is highly relevant. Among the prediction tasks in online machine learning, multi-target regression has gained increased attention due to its high applicability and relation with real-world p...
Chapter
Full-text available
Humans are frequently looking for patterns and uniformity to support their choices and decisions. Whatever falls outside the expected can be said to be an anomaly. However, in many practical situations, the presence of anomalies can provide valuable insights, which can point out useful novelties. Thus, in predictive maintenance, for example, anomal...
Article
Imbalanced datasets may negatively impact the predictive performance of most classical classification algorithms. This problem, commonly found in real-world applications, is known in the machine learning domain as imbalanced learning. Most techniques for imbalanced learning have been proposed and applied only to binary classification. When applied...
Article
Hierarchical Multi-Label Classification is a challenging classification task where the classes are hierarchically structured, with superclass and subclass relationships. It is a very common task, for instance, in Protein Function Prediction, where a protein can simultaneously perform multiple functions. In these tasks it is very difficult to achiev...
Chapter
Human Activity Recognition has been primarily investigated as a machine learning classification task, which forces it to deal with two main limitations. First, it must assume that the testing data has the same distribution as the training sample. However, the inherent structure of an activity recognition system is prone to distribution changes ove...
Chapter
Data stream is a challenging research topic in which data can continuously arrive with a probability distribution that may change over time. Depending on the changes in the data distribution, different phenomena can occur, for example, a concept drift. A concept drift occurs when the concepts associated with a dataset change when new data arrive. T...
Preprint
Full-text available
Human activity recognition (HAR) is a classification task that aims to classify human activities or predict human behavior by means of features extracted from sensor data. Typical HAR systems use wearable sensors and/or handheld and mobile devices with built-in sensing capabilities. Due to the widespread use of smartphones and to the inclusion of...
Conference Paper
Recently, several classification algorithms capable of dealing with potentially infinite data streams have been proposed. One of the main challenges of this task is to continuously update predictive models to address concept drifts without compromising their predictive performance. Moreover, the classification algorithm used must be able to efficient...
Article
Meta-learning has been successfully used for algorithm recommendation tasks. It uses machine learning to induce meta-models able to predict the best algorithms for a new dataset. In this paper, meta-models are applied to a set of meta-features, describing a dataset, to predict the performance of clustering algorithms applied to this dataset. The pa...
Conference Paper
Full-text available
As Collaborative Filtering becomes increasingly important in both academia and industry recommendation solutions, it also becomes imperative to study the algorithm selection task in this domain. This problem aims at finding automatic solutions which enable the selection of the best algorithms for a new problem, without performing full-fledged train...
Preprint
Full-text available
Algorithm selection using Metalearning aims to find mappings between problem characteristics (i.e. metafeatures) and relative algorithm performance to predict the best algorithm(s) for new datasets. Therefore, it is of the utmost importance that the metafeatures used are informative. In Collaborative Filtering, recent research has created an exten...
Article
Dealing with memory and time constraints is a current challenge when learning from data streams with a massive amount of data. Many algorithms have been proposed to handle these difficulties, among them the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in recent years, several authors hav...
Article
Noise is often present in real datasets used for training Machine Learning classifiers. Its disruptive effects on the learning process may include increased complexity of the induced models, higher processing time, and reduced predictive power in the classification of new examples. Therefore, treating noisy data in a preprocessing step i...
Preprint
Full-text available
Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. Such recommendations are made based on meta-data, consisting of performance evaluations of algorithms on prior datasets, as well as characterizations of these datasets. These characterizations, also called meta-features, describ...
Article
Full-text available
Motivation: With the recent advances in DNA sequencing technologies, the study of the genetic composition of living organisms has become more accessible for researchers. Several advances have been achieved because of it, especially in the health sciences. However, many challenges which emerge from the complexity of sequencing projects remain unsol...
Article
Full-text available
The glass transition temperature (Tg) is a kinetic property of major importance for both fundamental and applied glass science. In this study, we designed and trained an artificial neural network to induce a model that can predict the Tg of multicomponent oxide glasses. To do this, we used a dataset containing more than 55,000 inorganic glass compo...
Preprint
Full-text available
To select the best algorithm for a new problem is an expensive and difficult task. However, there are automatic solutions to address this problem: using Metalearning, which takes advantage of problem characteristics (i.e. metafeatures), one is able to predict the relative performance of algorithms. In the Collaborative Filtering scope, recent works...
Article
Full-text available
Food trucks are a widely popular fast food restaurant alternative, whose differentiating factor is their proximity to customers. Their popularity has stimulated the expansion of available options, which now includes several different types of cuisines, consequently making the choice by customers a challenging issue. From data obtained via a market...
Chapter
Full-text available
This paper addresses the Cluster Editing problem. The objective of this problem is to transform a graph into a disjoint union of cliques using a minimum number of edge modifications. This problem has been considered in the context of bioinformatics, document clustering, image segmentation, consensus clustering, qualitative data clustering among oth...
Article
Many real-world situations constantly generate concept-drifting data streams at high speed. These situations demand adaptive algorithms able to learn online in accordance with the most recent target function (concept). This paper presents Online Adaptive Classifier Ensemble, a new ensemble algorithm able to learn from concept-drifting data streams....
Chapter
This chapter describes a new group of predictive learning algorithms – search‐based and optimization‐based algorithms – which allow us to deal efficiently with more complex classification tasks. Decision tree induction algorithms (DTIAs) induce models with a tree‐shaped decision structure where each internal node is associated with one or more pred...
Chapter
This chapter describes the three current fields of data analytics that are attracting a great deal of attention due to their wide application in different domains: text mining, social network analysis (SNA) and recommendation systems. Text mining is a very active area of data analytics. Text mining is an important part of several other tasks, like...
Chapter
Classification is one of the most common tasks in analytics, and the most common in predictive analytics. In addition to being the most common classification task, binary classification is the simplest classification task. This chapter illustrates how the data are distributed in the data set. The main concern of most data analytic applications is t...
Chapter
Predictive tasks are divided between classification tasks and regression tasks. This chapter focusses mainly on regression. It describes the concepts that are meaningful for both regression and classification, namely generalization and model validation. The chapter also describes some of the most popular regression methods. The methods described ar...
Chapter
This chapter explores the advanced subjects in predictive analytics. The individual classifiers whose predictions will be combined will be referred to here as the “base” classifiers. Each base classifier can be induced using the same, original, training set, or parts of the original training set. Two important requirements in developing ensembles w...
Chapter
This chapter focuses on how a data set can be described by descriptive statistics and by visualization techniques for single attributes and pairs of attributes. It presents several univariate and bivariate statistical formulae and data visualization techniques. The chapter describes the different scale types that exist to describe data. There are t...
Chapter
This chapter presents a cheat sheet of descriptive analytics. The main purpose of descriptive analytics is to understand the data, providing relevant knowledge for future decisions in the project development. It presents the main aspects of the univariate methods, that is, methods used to summarize a single attribute. It also presents a summary of b...
Chapter
This chapter presents an important family of techniques for descriptive tasks. They can describe a data set by partitioning it, so that objects in the same group are similar to each other. These “clustering” techniques have been developed and extensively used to partition data sets into groups. Clustering techniques use only predictive attributes t...
Chapter
This chapter discusses frequent itemset mining, describing the three main approaches: Apriori, Eclat and frequent pattern growth (FP‐Growth). Frequent pattern mining methods were developed to deal with very large data sets recorded in hypermarkets and social media sites. The chapter discusses the min_sup threshold, a hyper‐parameter with high i...
Chapter
This chapter discusses the aspects of data quality and describes the preprocessing techniques frequently used in data analytics. The quality of a dataset strongly affects the results of a data quality project. The chapter also discusses the techniques for data‐type conversions, a necessary operation when the values of a predictive attribute need to...
Chapter
This introduction presents an overview of key concepts discussed in the subsequent chapters of the book. The book describes two real‐world problems from different areas as an introduction to the different subjects. It explains the multi‐layer perceptron neural networks and k‐means. The book explores the methodologies for planning and developing pro...
Chapter
This chapter explores a project that relates to the CRoss‐Industry Standard Process for Data Mining (CRISP‐DM) methodology. The data used, entitled “Polish companies bankruptcy data”, can be easily obtained on the web from the UCI machine learning repository. The chapter presents a cheat sheet on predictive algorithms. Investors, banks and...
Chapter
This chapter describes simple multivariate methods from the three data analysis approaches – frequency, visualization and statistical. The multivariate frequency values can be computed independently for each attribute. The chapter explores how multivariate data can be visually represented in different ways and the main benefits of each of these alt...
Preprint
Dealing with memory and time constraints is a current challenge when learning from data streams with a massive amount of data. Many algorithms have been proposed to handle these difficulties, among them the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in recent years, several authors hav...
