Andre de Carvalho

University of São Paulo | USP · Departamento de Ciência da Computação (SCC) (Sao Carlos)

PhD

About

413
Publications
135,462
Reads
6,465
Citations
Additional affiliations
January 2009 - present
University of São Paulo
January 2000 - December 2000
University of Guelph
Position
  • Professor (Associate)
Education
September 1990 - July 1994
University of Kent
Field of study
  • Neural Networks

Publications (413)
Article
Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. These recommendations are made based on meta-data, consisting of performance evaluations of algorithms and characterizations on prior datasets. These characterizations, also called meta-features, describe properties of the data...
Article
Choosing the most suitable algorithm to perform a machine learning task for a new problem is a recurrent and complex task. In multi-target regression tasks, when problem transformation methods are applied, this choice is even harder. The reason is the need to simultaneously choose the problem transformation method and the base learning algorithm. T...
Chapter
Metalearning has been widely used in recent years to recommend machine learning algorithms for new problems based on past experience. To do so, the first step is the creation of a metabase, or metadataset, containing metafeatures extracted from several datasets along with the performance of a pool of candidate algorithms. The next step is the...
Preprint
Full-text available
Due to their unique optical and electronic functionalities, chalcogenide glasses are materials of choice for numerous microelectronic and photonic devices. However, to extend the range of compositions and applications, profound knowledge about composition-property relationships is necessary. To this end, we collected a large quantity of composition...
Article
Full-text available
With the advent of powerful computer simulation techniques, it is time to move from the widely used knowledge-guided empirical methods to approaches driven by data science, mainly machine learning algorithms. We investigated the predictive performance of three machine learning algorithms for six different glass properties. For such, we used an exte...
Book
For some time now, the field of artificial intelligence has no longer been seen as merely theoretical, intended for application to small "curious" problems, and has become a growing field of research in search of solutions to real problems in society. Winner of the Prêmio Jabuti 2012 (Technology and Informatics category) when it was released, Inteligênci...
Article
Data stream mining needs to deal with scenarios where data distribution can change over time. As a result, different learning algorithms can be more suitable in different time periods. This paper proposes micro-MetaStream, a meta-learning based method to recommend the most suitable learning algorithm for each new example arriving in a data stream....
Article
Full-text available
A central aspect of online decision trees is evaluating the incoming data and performing model growth. To do so, trees must deal with different kinds of input features. Numerical features are no exception, and they pose additional challenges compared to other kinds of features, as there is no trivial strategy to choose the best point to make a spli...
Article
Full-text available
Human Activity Recognition is focused on the use of sensing technology to classify human activities and to infer human behavior. While traditional machine learning approaches use hand-crafted features to train their models, recent advancements in neural networks allow for automatic feature extraction. Auto-encoders are a type of neural network that...
Chapter
Initializing the hyper-parameters (HPs) of machine learning (ML) techniques became an important step in the area of automated ML (AutoML). The main premise in HP initialization is that a HP setting that performs well for a certain dataset(s) will also be suitable for a similar dataset. Thus, evaluation of similarities of datasets based on their cha...
Article
Imbalanced datasets are an important challenge in supervised Machine Learning (ML). According to the literature, class imbalance does not necessarily impose difficulties for ML algorithms. Difficulties mainly arise from other characteristics, such as overlapping between classes and complex decision boundaries. For binary classification tasks, calcu...
Preprint
Full-text available
A central aspect of online decision tree solutions is evaluating the incoming data and enabling model growth. To do so, trees must deal with different kinds of input features and partition them to learn from the data. Numerical features are no exception, and they pose additional challenges compared to other kinds of features, as there is no trivial...
Conference Paper
Meta-learning has been successfully applied to time series forecasting. To do so, it uses meta-datasets created by previous machine learning applications. Each row in a meta-dataset represents a time series dataset. Each column, apart from the last, is a meta-feature describing aspects of the related dataset. The last column is a target value, a meta-la...
Chapter
Incremental machine learning algorithms have been effective alternatives to deal with stream data. The Hoeffding Tree framework is one of the most successful solutions for supervised online prediction tasks. Although online regression tasks are present in several forms, and in many real-life problems, most of the research efforts have been devoted...
Chapter
Classification tasks using imbalanced datasets are not challenging on their own. Classification models perform poorly on the minority class when the datasets present other difficulties, such as class overlap and complex decision borders. Data complexity measures can identify such difficulties, which helps in dealing with imbalanced datasets. They can captur...
Article
Meta-Learning has been largely used over the last years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied...
Preprint
Full-text available
This paper presents an experimental comparison among four Automated Machine Learning (AutoML) methods for recommending the best classification algorithm for a given input dataset. Three of these methods are based on Evolutionary Algorithms (EAs), and the other is Auto-WEKA, a well-known AutoML method based on the Combined Algorithm Selection and Hy...
Preprint
Full-text available
With the advent of powerful computer simulation techniques, it is time to move from the widely used knowledge-guided empirical methods to approaches driven by data science, mainly machine learning algorithms. Due to their (hidden) smooth composition-property relationships, this strategy is especially relevant for the development of new glasses. We...
Article
Full-text available
Investigating strategies that are able to efficiently deal with multi-label classification tasks is a current research topic in machine learning. Many methods have been proposed, making the selection of the most suitable strategy a challenging issue. From this premise, this paper presents an extensive empirical analysis of the binary transformation...
Preprint
Machine Learning (ML) algorithms have been successfully employed by a vast range of practitioners with different backgrounds. One of the reasons for ML's popularity is its capability to consistently deliver accurate results, which can be further boosted by adjusting hyperparameters (HP). However, some practitioners have limited knowledge about the...
Article
Full-text available
Several studies in the field of human–computer interaction have focused on the importance of emotional factors related to the interaction of humans with computer systems. According to the knowledge of the users’ emotions, intelligent software can be developed for interacting and even influencing users. However, such a scenario is still a challenge...
Article
Full-text available
Modern technologies demand the development of new glasses with unusual properties. Most of the previous developments occurred by slow, expensive trial-and-error approaches, which have produced a considerable amount of data over the past 100 years. By finding patterns in such types of data, Machine Learning (ML) algorithms can extract useful knowled...
Chapter
Human Activity Recognition is a machine learning task for the classification of human physical activities. Applications for that task have been extensively researched in recent literature, especially due to the benefits of improving quality of life. Since wearable technologies and smartphones have become more ubiquitous, a large amount of informatio...
Article
Full-text available
Automated recommendation of machine learning algorithms is receiving a great deal of attention, not only because it can recommend the most suitable algorithms for a new task, but also because it can support efficient hyper-parameter tuning, leading to better machine learning solutions. The automated recommendation can be implemented using meta-...
Article
Image segmentation is a key issue in image processing. New image segmentation algorithms have been proposed in the last years. However, there is no optimal algorithm for every image processing task. The selection of the most suitable algorithm usually occurs by testing every possible algorithm or using knowledge from previous problems. These proces...
Chapter
In data streams new classes can appear over time due to changes in the data statistical distribution. Consequently, models can become outdated, which requires the use of incremental learning algorithms capable of detecting and learning the changes over time. However, when a single classification model is used for novelty detection, there is a risk...
Chapter
Data streams are related to large amounts of data that can continuously arrive with a probability distribution that may change over time. Depending on the changes in the data distribution, different phenomena can occur, such as the appearance of new classes or concept drift in existing classes. Machine Learning algorithms have often been used to mo...
Article
Full-text available
Human mobility has a significant impact on several layers of society, from infrastructural planning and economics to the spread of diseases and crime. Representing the system as a complex network, in which nodes are assigned to regions (e.g., a city) and links indicate the flow of people between two of them, physics-inspired models have been propos...
Preprint
For many machine learning algorithms, predictive performance is critically affected by the hyperparameter values used to train them. However, tuning these hyperparameters can come at a high computational cost, especially on larger datasets, while the tuned settings do not always significantly outperform the default values. This paper proposes a rec...
Article
For many machine learning algorithms, predictive performance is critically affected by the hyperparameter values used to train them. However, tuning these hyperparameters can come at a high computational cost, especially on larger datasets, while the tuned settings do not always significantly outperform the default values. This paper proposes a rec...
Preprint
The amount of available data is growing rapidly. Developing machine learning strategies to cope with high-throughput, changing data streams is highly relevant. Among the prediction tasks in online machine learning, multi-target regression has gained increased attention due to its high applicability and relation with real-world p...
Chapter
Full-text available
Humans are frequently looking for patterns and uniformity to support their choices and decisions. Whatever falls outside the expected can be said to be an anomaly. However, in many practical situations, the presence of anomalies can provide valuable insights, which can point out useful novelties. Thus, in predictive maintenance, for example, anomal...
Article
Imbalanced datasets may negatively impact the predictive performance of most classical classification algorithms. This problem, commonly found in real-world data, is known in the machine learning domain as imbalanced learning. Most techniques for dealing with imbalanced learning have been proposed and applied only to binary classification. When applied...
Article
Hierarchical Multi-Label Classification is a challenging classification task where the classes are hierarchically structured, with superclass and subclass relationships. It is a very common task, for instance, in Protein Function Prediction, where a protein can simultaneously perform multiple functions. In these tasks it is very difficult to achiev...
Preprint
Full-text available
Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations, and their complex interactions, it is common to use optimization techniques to find settings that lead to high predict...
Chapter
Human Activity Recognition has been primarily investigated as a machine learning classification task, which imposes two main limitations. First, it must assume that the testing data has the same distribution as the training sample. However, the inherent structure of activity recognition systems is prone to distribution changes ove...
Chapter
Data stream is a challenging research topic in which data can continuously arrive with a probability distribution that may change over time. Depending on the changes in the data distribution, different phenomena can occur, for example, a concept drift. A concept drift occurs when the concepts associated with a dataset change when new data arrive. T...
Preprint
Full-text available
Human activity recognition (HAR) is a classification task that aims to classify human activities or predict human behavior by means of features extracted from sensors data. Typical HAR systems use wearable sensors and/or handheld and mobile devices with built-in sensing capabilities. Due to the widespread use of smartphones and to the inclusion of...
Conference Paper
Recently, several classification algorithms capable of dealing with potentially infinite data streams have been proposed. One of the main challenges of this task is to continuously update predictive models to address concept drifts without compromising their predictive performance. Moreover, the classification algorithm used must be able to efficient...
Article
Meta-learning has been successfully used for algorithm recommendation tasks. It uses machine learning to induce meta-models able to predict the best algorithms for a new dataset. In this paper, meta-models are applied to a set of meta-features, describing a dataset, to predict the performance of clustering algorithms applied to this dataset. The pa...
Conference Paper
Full-text available
As Collaborative Filtering becomes increasingly important in both academia and industry recommendation solutions, it also becomes imperative to study the algorithm selection task in this domain. This problem aims at finding automatic solutions which enable the selection of the best algorithms for a new problem, without performing full-fledged train...
Preprint
Full-text available
Algorithm selection using Metalearning aims to find mappings between problem characteristics (i.e. metafeatures) with relative algorithm performance to predict the best algorithm(s) for new datasets. Therefore, it is of the utmost importance that the metafeatures used are informative. In Collaborative Filtering, recent research has created an exten...
Article
Dealing with memory and time constraints is a current challenge when learning from data streams with a massive amount of data. Many algorithms have been proposed to handle these difficulties, among them the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in recent years, several authors hav...
Article
Noise is often present in real datasets used for training Machine Learning classifiers. Its disruptive effects on the learning process may include increased complexity of the induced models, higher processing time, and reduced predictive power in the classification of new examples. Therefore, treating noisy data in a preprocessing step i...
Preprint
Full-text available
Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. Such recommendations are made based on meta-data, consisting of performance evaluations of algorithms on prior datasets, as well as characterizations of these datasets. These characterizations, also called meta-features, describ...
Article
Full-text available
Motivation: With the recent advances in DNA sequencing technologies, the study of the genetic composition of living organisms has become more accessible for researchers. Several advances have been achieved because of it, especially in the health sciences. However, many challenges which emerge from the complexity of sequencing projects remain unsol...
Article
Full-text available
The glass transition temperature (Tg) is a kinetic property of major importance for both fundamental and applied glass science. In this study, we designed and trained an artificial neural network to induce a model that can predict the Tg of multicomponent oxide glasses. To do this, we used a dataset containing more than 55,000 inorganic glass compo...
Preprint
Full-text available
To select the best algorithm for a new problem is an expensive and difficult task. However, there are automatic solutions to address this problem: using Metalearning, which takes advantage of problem characteristics (i.e. metafeatures), one is able to predict the relative performance of algorithms. In the Collaborative Filtering scope, recent works...
Article
Full-text available
Food trucks are a widely popular fast food restaurant alternative, whose differentiating factor is their proximity to customers. Their popularity has stimulated the expansion of available options, which now includes several different types of cuisines, consequently making the choice by customers a challenging issue. From data obtained via a market...
Chapter
Full-text available
This paper addresses the Cluster Editing problem. The objective of this problem is to transform a graph into a disjoint union of cliques using a minimum number of edge modifications. This problem has been considered in the context of bioinformatics, document clustering, image segmentation, consensus clustering, qualitative data clustering among oth...
Article
Many real-world situations constantly generate concept-drifting data streams at high speed. These situations demand adaptive algorithms able to learn online in accordance with the most recent target function (concept). This paper presents Online Adaptive Classifier Ensemble, a new ensemble algorithm able to learn from concept-drifting data streams....
Chapter
This chapter describes a new group of predictive learning algorithms – search‐based and optimization‐based algorithms – which allow us to deal efficiently with more complex classification tasks. Decision tree induction algorithms (DTIAs) induce models with a tree‐shaped decision structure where each internal node is associated with one or more pred...
Chapter
This chapter describes the three current fields of data analytics that are attracting a great deal of attention due to their wide application in different domains: text mining, social network analysis (SNA) and recommendation systems. Text mining is a very active area of data analytics. Text mining is an important part of several other tasks, like...