
Anne M. P. Canuto - Federal University of Rio Grande do Norte
Publications: 197
Reads: 19,640
Citations: 1,763
Publications (197)
In this scenario of exponential growth in global data, the ability to process and analyze data becomes vital for companies and organizations. This article aims to analyze two fundamental approaches: real-time processing and batch processing. By analyzing comparison metrics, characteristics, and advantages/disadvantages, ...
This paper proposes the use of a more elaborated criterion for both selecting and/or labelling unlabelled instances of a wrapping-based SSL method. In order to assess the feasibility of the proposed method, an empirical analysis will be conducted, in which the proposed Self-training versions are compared to other Self-training versions, some existi...
Audio signal processing has been under investigation for the last decades. The majority of the works found in literature focus on signal analysis and classification. Most of them integrate Machine Learning (ML) algorithms with the audio signal processing techniques. As the performance of any ML algorithm depends on the features of a dataset used fo...
Semi-supervised learning is a machine learning approach that integrates supervised and unsupervised learning mechanisms. In this setting, most of the labels in the training set are unknown, while a small part of the data has known labels. Semi-supervised learning is attractive due to its potential to use labeled and unlabeled data to per...
It is well known that machine learning (ML) techniques have been playing an important role in several real world applications. However, one of the main challenges is the selection of the most accurate technique to be used in a specific application. In the classification context, for instance, two main approaches can be applied, model selection and...
Machine Learning (ML) is a field that aims to develop efficient techniques to provide intelligent decision making solutions to complex real problems. Among the different ML structures, a classifier ensemble has been successfully applied to several classification domains. A classifier ensemble is composed of a set of classifiers (specialists) organi...
Data stream applications generate a continuous stream of data at such a high rate that it is not possible to store all data in available memory. Hence, it is important to apply techniques that are capable of learning concepts according to data presentation, taking into consideration available time, processing and memory resources. This paper presents a...
Respiratory and lung problems affect a high number of people every year. As a result, a great amount of effort has been devoted to the understanding and diagnosis of these types of diseases, leading to an early search for treatment or even an adequate diagnosis. In this context, recent research has been using computational techniques o...
Although several automatic computer systems have been proposed to address facial expression recognition problems, the majority of them still fail to cope with some requirements of many practical application scenarios. In this paper, one of the most influential and common issues raised in practical application scenarios when applying automatic facia...
Hierarchical agglomerative clustering (HAC) is among the most widely adopted algorithms in unsupervised learning. This method employs a linkage criterion to measure the similarity between two clusters and the correct selection of this criterion highly influences the performance of HAC. This paper presents and evaluates a novel linkage criterion for...
Feature selection has become a mandatory step in several data exploration and Machine Learning applications since data quality can have a strong impact on the performance of machine learning models. Many feature selection strategies have been developed in the past decades, using different criteria to select the most relevant features. The use of dy...
Pokémon GO is one of the most popular Pokémon games. This game consists of walking around the world and collecting Pokémon characters using augmented reality. In addition, you can battle with friends, join a gym, or make attacks. These battles must happen between teams of the same size, and this poses a question that is related to the best combi...
The use of Semi-supervised Learning (SSL) methods has emerged as an efficient solution to smooth out the problem of availability of labelled instances. Several methods have been proposed in the literature, and Self-training and Co-training are two well-known methods. The main aim is to use only a few labelled instances to define a model and to appl...
Object Detection (OD) is an important task in Computer Vision with many practical applications. For some use cases, OD must be done on videos, where the object of interest has a periodic motion. In this paper, we formalize the problem of periodic OD, which consists in improving the performance of an OD model in the specific case where the object of...
In real world classification problems, the amount of labelled data is usually limited (very hard or expensive to manually label the instances). However, a natural limitation of a classification algorithm is that it needs to have a set of labelled instances with a reasonable size in order to achieve a reasonable performance. Therefore, one solution...
Clustering algorithms have been applied to different problems in many different real-world applications. Nevertheless, each algorithm has its own advantages and drawbacks, which can result in different solutions for the same problem. Therefore, the combination of different clustering algorithms (cluster ensembles) has emerged as an attempt to overco...
The need to process huge amounts of seismic signals has become increasingly common in academic and industrial environments. Several researchers have been seeking ways to improve and optimize the processing of these enormous amounts of data that are related to routine demands of geophysicists. One of these demands is the classification o...
Semi-supervised learning (SSL) is a paradigm that has been continuously used in data classification tasks in datasets that do not have enough labeled instances to train a supervised model with a minimum acceptable accuracy. In this context, data stream classification in dynamic environments appears as a natural application for this approach, becaus...
In the context of facial expression recognition (FER), this paper reviews the fundamental theories of emotions and further explains the key dimensions of a defined emotional space. The main contribution of this paper is to propose a set of novel categorization methods for facial expressions to be used in the design of an automatic FER system. This...
It is well known that relational databases still play an important role for many companies around the world. For this reason, the use of data mining methods to discover knowledge in large relational databases has become an interesting research issue. In the context of unsupervised data mining, for instance, the conventional clustering algorithms ca...
Supervised machine learning methods, also known as classification algorithms, have been widely used in the literature for many classification tasks. In this context, some aspects of these algorithms, such as the attributes used and the way they were built, have a direct impact on the system performance. Therefore, in this paper, we evaluate the ap...
Machine Learning (ML) has become popular in recent years as an efficient approach to problem solving. There are currently hundreds of classification methods, for example, which makes it practically impossible to analyze all possible results, given that, in addition to the many existing methods, there are many configurations for...
Semi-supervised learning algorithms are able to train classifiers from a small portion of initially labeled objects. The reliability of the classification process depends on several factors that include the type of classifier used and a set of parameters that customize them. One of the most important factors is a threshold that determine...
Metaheuristic algorithms have been applied to a wide range of global optimization problems. Basically, these techniques can be applied to problems in which a good solution must be found, providing imperfect or incomplete knowledge about the optimal solution. However, the concept of combining metaheuristics in an efficient way has emerged recently,...
The potential for processing car sensing data has increased in recent years due to the development of new technologies. Having this type of data is important, for instance, to analyze the way drivers behave when sitting behind the steering wheel. Many studies have addressed driver behavior by developing smartphone-based telematics systems. However,...
Semi-supervised learning algorithms are able to train classifiers from a small portion of initially labeled objects. The reliability of the classification process depends on several factors that include the type of classifier used and a set of parameters that customize them. One of the most important factors is a threshold that determines which ins...
One of the main issues of machine learning algorithms is the curse of dimensionality. With the fast growth of complex data in real-world scenarios, feature selection becomes a mandatory preprocessing step in any application to reduce both the complexity of the data and the computing time. Based on that, several works have been produced in orde...
Classifier ensembles are pattern recognition structures composed of a set of classification algorithms (members), organized in a parallel way, and a combination method with the aim of increasing the classification accuracy of a classification system. In this study, we investigate the application of generalized mixture (GM) functions as a new appr...
Classifier ensembles are pattern recognition structures composed of a set of classification algorithms (members), organized in a parallel way, and a combination method with the aim of increasing the classification accuracy of a classification system. In this study, we investigate the application of generalized mixture (GM) functions as a new appro...
Ensembles of classifiers are composed of parallel-organized components (individual classifiers) whose outputs are combined using a combination method that provides the final output for an ensemble. In this context, a Dynamic Ensemble System (DES) is an ensemble-based system in which, for each test pattern, a different ensemble structure is defined, in wh...
This paper performs an exploratory study of the use of metaheuristic optimization techniques to select important parameters (features and members) in the design of ensemble of classifiers. In order to do this, an empirical investigation, using 10 different optimization techniques applied to 23 classification problems, will be performed. Furthermore...
This paper proposes the use of an Information Theory measure in a dynamic feature selection approach. We tested this approach including elements of Information Theory in the process, such as Mutual Information, and compared it with classical methods like PCA and LDA as well as Mutual Information based algorithms. Results showed that the proposed method...
This paper introduces a new dynamic feature selection method for classification algorithms, which is based on individual similarity and uses a clustering algorithm to select the best features for an instance individually. In addition, an empirical analysis will be performed to evaluate the performance of the proposed method and to compare it with existi...
Usually, evaluating the performance of classifiers is not an easy task, mainly when we analyze different criteria (output parameters). In this evaluation process, we can use quantitative measures (accuracy, specificity, among others); however, when the output values are very close and we have several criteria, the results are dif...
Several clustering algorithms have been applied to a great variety of problems in different application domains. Each algorithm, however, has its own advantages and limitations, which can result in different solutions for the same problem. In this sense, combining different clustering algorithms (cluster ensembles) is one of the most used approache...
The main aim of this paper is to combine multiple partitions generated by different clustering algorithms into a single clustering solution (consensus partition), using a new bio-inspired optimization technique to optimize the cluster ensembles. In this proposed technique, the cluster ensembles are heterogeneously created and the initial partitions...
This work models the task of automatically configuring classifier ensembles as a multilabel classification problem and investigates a meta-learning approach to cope with it. More specifically, the role of the metalearner is to recommend the types of components that should comprise the best ensemble model for coping with a given classification problem. Si...
Ensemble systems are classification structures that apply a two-level decision-making process, in which the first level produces the outputs of the individual classifiers and the second level produces the output of the combination method (final output). Although ensemble systems have been proven to be efficient for pattern recognition tasks, their ef...
This paper describes an auxiliary environment in the process of teaching and learning for students and professors of medicine. The environment has a serious game available in various computing devices to simulate clinical cases in order to assess students' knowledge. Diagnostics are simulated using 3D environment, mobile application using voice syn...
The main goal of using data with interval nature is to represent numeric information endowed with imprecision, which is normally captured from real-world measurements. However, in order to do this, it is necessary to adapt real-valued techniques to be applied on interval-based data. For interval-based clustering applications, for instance, it is...
Ensembles of classifiers, or simply ensemble systems, have proved to be efficient for pattern recognition tasks. However, their design can become a difficult task. For instance, the choice of the individual classifiers and the use of feature selection methods are very difficult to define in the design of these systems. In order to smooth out this...
Behavioural biometric-based authentication systems can be considered an emerging area in the future of user identification, verification and access control systems. However, there is still much progress to be made in this field, especially related to system security and acceptable accuracy results for its practical use. One alternative solution...
In classification problems with hierarchical structures of labels, the target function must assign several labels that are hierarchically organized. The hierarchical structures of labels can be used either for single-label (one label per instance) or multi-label classification problems (more than one label per instance). In general, classification...
In this paper, we investigate two important problems in multi-label classification algorithms, which are: the number of labeled instances and the high dimensionality of the labeled instances. In the literature, we can find several papers about multi-label classification problems, where an instance can be associated with more than one label simultan...
This paper presents an experimental analysis of a revocable biometric verification problem using ensemble systems. Behavioural biometric-based systems are an emerging area in user identification, verification and access control systems. However, there is still progress to be made in this field, especially related to system security and a...
The main advantage of using an interval-based distance for interval-based data lies in the fact that it preserves the underlying imprecision of intervals, which is usually lost when real-valued distances are applied. One of the main problems when using interval-based distance in fuzzy clustering algorithms is the way to obtain the center of the grou...
In classification problems with hierarchical structures of labels, the target function must assign labels that are hierarchically organized and it can be used either for single-label (one label per instance) or multi-label classification problems (more than one label per instance). In parallel to these developments, the idea of semi-supervised lear...
Feature selection methods select a subset of the attributes (features) of a dataset based on a defined measure, eliminating the redundant and irrelevant ones. When a feature selection method is applied to a dataset, we aim to improve the quality of the dataset representation. For ensemble systems, feature selection techniques can supply...
The concept of cancellable biometrics has been introduced as a way to overcome privacy concerns surrounding the management of biometric data. The goal is to transform a biometric trait into a new but revocable representation for enrolment and identification/verification. Thus, if compromised, a new representation of original biometric data can be g...
This paper presents an experimental analysis of a revocable biometric verification problem. Behavioural biometric-based systems are an emerging area in the future of user identification, verification and access control systems. However, there is still much progress to be made in this field, especially related to system security and acceptable accur...
Currently, there is a concern about the security of biometric data in the identification systems, mainly due to the increase of fraudulent attacks in these systems. Therefore, in this paper, we propose a comparative analysis of traditional cryptographic algorithms and transformation functions to be used as biometric template protection methods in t...
Ensemble systems are composed of a set of individual classifiers, organized in a parallel way, that receive the input patterns and send their output to a combination method, which is responsible for providing the final output of the system. The use of feature selection methods in ensemble systems has been shown to be efficient, since it reduces the...
In this paper, we propose a comparative analysis of the use of cryptography and transformation functions as biometric (signature) template protection methods. The main goal is to investigate the increase in the security of the biometric dataset as well as the performance of the protected dataset in biometric-based systems. We use the we...
Biometric-based identification systems can offer several advantages over traditional forms of identity authentication. However, concerns have been raised about the privacy of the personal biometric data, since these systems need to ensure their integrity and public acceptance. In order to address these issues, the notion of cancellable biometrics w...
In a decision making process, we are usually oriented to take into consideration all the relevant features (characteristics) involved in a specific problem. In Machine Learning, for instance, a decision is made through the use of a learning algorithm and the characterization process is represented by the corresponding datasets. In this context, cla...
In most traditional classification methods, each instance is associated with one single nominal target variable (single-label problems). However, there are also cases where an instance can be associated with more than one label simultaneously, referred to as multi-label classification problems. One of the main problems with classification methods...
Similarity and dissimilarity (distance) between objects are important aspects that must be considered when clustering data. When clustering categorical data, for instance, these distance (similarity or dissimilarity) measures need to properly address the real particularities of categorical data. In this paper, we perform a comparative analysis wit...
This paper investigates the influence of measures of good and bad diversity when used explicitly to guide the search of a genetic algorithm to design ensemble systems. We then analyze which is the best set of objectives among classification error, good diversity and bad diversity, as well as all combinations of them. In this analysis, we make use of th...
Cancellable biometrics has recently been introduced in order to overcome some privacy issues about the management of biometric data, aiming to transform a biometric trait into a new but revocable representation for enrolment and identification (verification). Therefore, a new representation of original biometric data can be generated in case of bei...
This paper presents an analysis of applying ensemble systems in revocable biometric recognition. Biometric-based systems are the eminent future of identification techniques and user access control. However, there is still much progress to be made in this field, especially related to system security. One alternative to the security problem in b...
This paper proposes a multiagent approach for metaheuristics hybridization inspired by the popular technique called Particle Swarm Optimization (PSO). In the proposed approach, agents develop a society with collaboration to achieve their own individual as well as common goals and their decision-making process matches the basic nature of a particle...