Juan J. Rodríguez

Verified
Juan verified their affiliation via an institutional email.
  • PhD
  • Professor at University of Burgos

About

145
Publications
48,737
Reads
5,290
Citations
Current institution
University of Burgos
Current position
  • Professor

Publications (145)
Article
Full-text available
This paper presents a new low-cost technology for measuring crack propagation in quasi-fragile materials, based on a stereo pair of cameras and LED light spots. The two cameras record the displacement experienced by a series of white LED lights. For each frame, the X, Y and Z 3D coordinates of all the centroids of the LED points are obtained....
Article
Full-text available
The prediction of multiple numeric outputs at the same time is called multi-target regression (MTR), and it has gained attention during the last decades. This task is a challenging research topic in supervised learning because it poses additional difficulties to traditional single-target regression (STR), and many real-world problems involve the pr...
Article
Full-text available
Monitoring students in Learning Management Systems (LMS) throughout the teaching–learning process has been shown to be a very effective technique for detecting students at risk. Likewise, the teaching style in the LMS conditions the type of student behaviours on the platform and the learning outcomes. The main objective of this study was to test t...
Article
This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. For this, an extensive study is done using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out using 5 classifi...
Article
One of the main goals of Big Data research is to find new data mining methods that are able to process large amounts of data in acceptable times. In Big Data classification, as in traditional classification, class imbalance is a common problem that must be addressed, in the case of Big Data also looking for a solution that can be applied in an acc...
Article
Full-text available
Radar technology has evolved considerably in the last few decades. There are many areas where radar systems are applied, including air traffic control in airports, ocean surveillance, and research systems, to cite a few. Other types of sensors have recently appeared, which allow tracking sub-millimeter motion with high speed and accuracy rates. The...
Article
Datasets are growing in size and complexity at a pace never seen before, forming ever larger datasets known as Big Data. A common problem for classification, especially in Big Data, is that the numerous examples of the different classes might not be balanced. Some decades ago, imbalanced classification was therefore introduced, to correct the tende...
Article
Full-text available
Featured Application: This work has an important direct application for teachers or educational institutions working with Moodle, because it provides an open access software application, UBUMonitor, which facilitates the detection of students at risk. Abstract: In this study, we used a module for monitoring and detecting students at risk of dropping...
Article
The Rotation Forest classifier is a successful ensemble method for a wide variety of data mining applications. However, the way in which Rotation Forest transforms the feature space through PCA, although powerful, penalizes training and prediction times, making it unfeasible for Big Data. In this paper, a MapReduce Rotation Forest and its implement...
Preprint
Full-text available
In classification problems, the purpose of feature selection is to identify a small, highly discriminative subset of the original feature set. In many applications, the dataset may have thousands of features and only a few dozen samples (sometimes termed 'wide'). This study is a cautionary tale demonstrating why feature selection in such cases...
Article
Over the past few decades, the remarkable prediction capabilities of ensemble methods have been used within a wide range of applications. Maximization of base-model ensemble accuracy and diversity are the keys to the heightened performance of these methods. One way to achieve diversity for training the base models is to generate artificial/syntheti...
Article
Full-text available
The use of learning environments that apply Advanced Learning Technologies (ALTs) and Self-Regulated Learning (SRL) is increasingly frequent. In this study, eye-tracking technology was used to analyze scan-path differences in a History of Art learning task. The study involved 36 participants (students versus university teachers with and without pre...
Article
Random Balance strategy (RandBal) has been recently proposed for constructing classifier ensembles for imbalanced, two-class data sets. In RandBal, each base classifier is trained with a sample of the data with a random class prevalence, independent of the a priori distribution. Hence, for each sample, one of the classes will be undersampled while...
Article
Full-text available
In this paper, the focus is on the application of prototype selection to multi-label data sets as a preliminary stage in the learning process. There are two general strategies when designing Machine Learning algorithms that are capable of dealing with multi-label problems: data transformation and method adaptation. These strategies have been succes...
Article
Full-text available
The multi-label classification problem is an extension of traditional (single-label) classification, in which the output is a vector of values rather than a single categorical value. The multi-label problem is therefore a very different and much more challenging one than the single-label problem. Recently, multi-label classification has attracted i...
Article
Full-text available
High-dimensional data with very few instances are typical in many application domains. Selecting a highly discriminative subset of the original features is often the main interest of the end user. The widely used feature selection protocol for this type of data consists of two steps. First, features are selected from the data (possibly through cros...
Article
Full-text available
Detecting change in multivariate data is a challenging problem, especially when class labels are not available. There is a large body of research on univariate change detection, notably in control charts developed originally for engineering applications. We evaluate univariate change detection approaches —including those in the MOA framework — buil...
Article
We consider a problem where a set X of N objects (instances) coming from c classes have to be classified simultaneously. A restriction is imposed on X in that the maximum possible number of objects from each class is known, hence we dubbed the problem who-is-there? We compare three approaches to this problem: (1) independent classification whereby...
Article
The selection of the right cutting tool in manufacturing process design is always an open question, especially when different tools are available on the market with similar characteristics, but marked differences in price, ranging from low-cost to high-performance cutting tools. The ultimate decision of the engineer will depend on previous experien...
Article
Full-text available
Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems...
Article
Machine Learning has two central processes of interest that captivate the scientific community: classification and regression. Although instance selection for classification has shown its usefulness and has been researched in depth, instance selection for regression has not followed the same path and there are few published algorithms on the subjec...
Article
Ensembles are learning methods the operation of which relies on a combination of different base models. The diversity of ensembles is a fundamental aspect that conditions their operation. Random Feature Weights (\({\mathcal {RFW}}\)) was proposed as a classification-tree ensemble construction method in which diversity is introduced into each tree b...
Article
An important step in building expert and intelligent systems is to obtain the knowledge that they will use. This knowledge can be obtained from experts or, nowadays more often, from machine learning processes applied to large volumes of data. However, for some of these learning processes, if the volume of data is large, the knowledge extraction pha...
Conference Paper
Binarization techniques deal with multiclass classification problems by combining several binary classifiers. They were originally introduced for dealing with multiclass problems with methods that were only able to deal with two classes (e.g., SVM). Nevertheless, it has been shown that they can also be useful with classification methods able to deal di...
Article
Cutting tool breakage detection is an important task, due to its economic impact on mass production lines in the automobile industry. This task presents a central limitation: real data-sets are extremely imbalanced because breakage occurs in very few cases compared with normal operation of the cutting process. In this paper, we present an analysis...
Article
Two new methods for tree ensemble construction are presented: G-Forest and GAR-Forest. In a similar way to Random Forest, the tree construction process entails a degree of randomness. The same strategy used in the GRASP metaheuristic for generating random and adaptive solutions is used at each node of the trees. The source of diversity of the ensem...
Article
Full-text available
We propose a probabilistic framework for classifier combination, which gives rigorous optimality conditions (minimum classification error) for four combination methods: majority vote, weighted majority vote, recall combiner and the naive Bayes combiner. The framework is based on two assumptions: class-conditional independence of the classifier outp...
Conference Paper
Full-text available
The improvement of certain manufacturing processes often involves the challenge of how to optimize complex and multivariable processes under industrial conditions. Moreover, many of these processes can be treated as regression or classification problems. Although their outputs are in the form of continuous variables, industrial requirements define...
Article
Rotation Forest, originally proposed for the combination of classifiers, has shown itself to be very competitive, when compared with other ensemble construction methods. In this paper, the performance of Rotation Forest for combining regressors is investigated using a broad range of datasets, 61 in total, which vary in size from 13 to more than 40,...
Conference Paper
Full-text available
In the Random Oracle ensemble method, each base classifier is a mini-ensemble of two classifiers and a randomly generated oracle that selects one of the two classifiers. The performance of this method has been previously studied, but not for imbalanced data sets. This work studies its performance for this kind of data. As the Random Oracle ensembl...
Article
This work presents a novel approach to multivariate time series classification. The method exploits the multivariate structure of the time series and the possibilities of the stacking ensemble method. The basics of the method may be described in three steps: first, decomposing the multivariate time series on its constituent univariate time series;...
Conference Paper
Disturbing Neighbors (DN) is a method for generating classifier ensembles. Moreover, it can be combined with any other ensemble method, generally improving the results. This paper considers the application of these ensembles to imbalanced data: classification problems where the class proportions are significantly different. DN ensembles are compare...
Article
We present a method for constructing ensembles of classifiers using supervised projections of random subspaces. The method combines the philosophy of boosting, focusing on difficult instances, with the improved accuracy achieved by supervised projection methods to obtain very good results in terms of testing error. To achieve both accuracy and dive...
Article
Full-text available
The diagnosis of Chronic Obstructive Pulmonary Disease (COPD) is based on symptoms, clinical examination, exposure to risk factors (smoking and certain occupational dusts) and confirming lung airflow obstruction on spirometry. However, most people with COPD remain undiagnosed and controversies regarding spirometry persist. Developing accurate and relia...
Article
Full-text available
Event-related potential data can be used to index perceptual and cognitive operations. However, they are typically high-dimensional and noisy. This study examines the original raw data and six feature-extraction methods as a pre-processing step before classification. Four traditionally used feature-extraction methods were considered: principal comp...
Article
This paper proposes a method for constructing ensembles of decision trees, random feature weights (RFW). The method is similar to Random Forest: both introduce randomness into the construction of the decision trees. In Random Forest only a random subset of attributes is considered for each node, but RFW considers all of them....
Conference Paper
Two contexts may be considered, in which it is of interest to reduce the dimension of a data set. One of these arises when the intention is to mitigate the curse of dimensionality, when the data set will be used for training a data mining algorithm with a heavy computational load. The other is when one wishes to identify the data set attributes tha...
Conference Paper
Ensembles of decision trees are considered for imbalanced datasets. Conventional decision trees (C4.5) and trees for imbalanced data (CCPDT: Class Confidence Proportion Decision Tree) are used as base classifiers. Ensemble methods, based on undersampling and oversampling, for imbalanced data are considered. Conventional ensemble methods, not specif...
Article
Full-text available
Laser polishing of steel components is an emergent process in the automation of finishing operations in the industry. The aim of this work is to develop a soft computing tool for surface roughness prediction of laser polished components. The laser polishing process depends primarily on three factors: surface material, initial topography and energy...
Conference Paper
This work describes a new on-line sensor that includes a novel calibration process for the real-time condition monitoring of lubricating oil. The parameter studied with this sensor has been the variation of the Total Acid Number (TAN) since the beginning of oil’s operation, which is one of the most important laboratory parameters used to determine...
Article
Full-text available
This paper presents an experimental study using different projection strategies and techniques to improve the performance of Support Vector Machine (SVM) ensembles. The study has been made over 62 UCI datasets using Principal Component Analysis (PCA) and three types of Random Projections (RP), taking into account the size of the projected space an...
Article
Random Forests have recently been widely used for different kinds of classification problems. One of them is the classification of gene expression samples, which is known as a problem with extremely high dimensionality and therefore demands suitable classification techniques. Due to its strong robustness with respect to large feature sets, Random Forests...
Conference Paper
Data projections have been used extensively to reduce input space dimensionality. Such reduction is useful to get faster results, and sometimes can help to discard unnecessary or noisy input dimensions. Random Projections (RP) can be computed faster than other methods as for example Principal Component Analysis (PCA). This paper presents an experim...
Conference Paper
Full-text available
This paper proposes a method for constructing ensembles of decision trees: GRASP Forest. This method uses the metaheuristic GRASP, usually used in optimization problems, to increase the diversity of the ensemble. While Random Forest increases the diversity by randomly choosing a subset of attributes in each tree node, GRASP Forest takes into accoun...
Conference Paper
Model trees are decision trees with linear regression functions at the leaves. Although originally proposed for regression, they have also been applied successfully in classification problems. This paper studies their performance for imbalanced problems. These trees give better results than standard decision trees (J48, based on C4.5) and decision...
Chapter
Full-text available
This paper considers the use of Random Oracles in Ensembles for regression tasks. A Random Oracle model (Kuncheva and Rodríguez, 2007) consists of a pair of models and a fixed randomly created "oracle" (in the case of the Linear Random Oracle, it is a hyperplane that divides the dataset in two during training and, once the ensemble is trained, deci...
Article
The classification of genomic and proteomic data in extremely high dimensional datasets is a well-known problem which requires appropriate classification techniques. Classification methods are usually combined with gene selection techniques to provide optimal classification conditions—i.e. a lower dimensional classification environment. Another rea...
Conference Paper
This work presents an experimental study of ensemble methods for regression, using Multilayer Perceptrons (MLP) as the base method and 61 datasets. The considered ensemble methods are Randomization, Random Subspaces, Bagging, Iterated Bagging and AdaBoost.R2. Surprisingly, because it is in contradiction to previous studies, the best overall results...
Conference Paper
This work presents an on-line diagnosis algorithm for dynamic systems that combines model based diagnosis and machine learning techniques. The Possible Conflicts (PCs) method is used to perform consistency based diagnosis, providing fault detection and isolation. Machine learning methods are used to induce time series classifiers, that are applied...
Article
Full-text available
Functional magnetic resonance imaging (fMRI) is becoming a forefront brain-computer interface tool. To decipher brain patterns, fast, accurate and reliable classifier methods are needed. The support vector machine (SVM) classifier has been traditionally used. Here we argue that state-of-the-art methods from pattern recognition and machine learning,...
Conference Paper
Functional Trees are one type of multivariate trees. This work studies the performance of different ensemble methods (Bagging, Random Subspaces, AdaBoost, Rotation Forest) using three variants (multivariate internal nodes, multivariate leaves or both) of these trees as base classifiers. The best results, for all the ensemble methods, are obtained u...
Article
Full-text available
Classification of brain images obtained through functional magnetic resonance imaging (fMRI) poses a serious challenge to pattern recognition and machine learning due to the extremely large feature-to-instance ratio. This calls for revision and adaptation of the current state-of-the-art classification methods. We investigate the suitability of the...
Article
Full-text available
Ensemble methods are often able to generate more accurate classifiers than the individual classifiers. In multiclass problems, it is possible to obtain an ensemble combining binary classifiers. It is sensible to use a multiclass method for constructing the binary classifiers, because the ensemble of binary classifiers can be more accurate than the...
Chapter
Full-text available
Ensemble methods take their output from a set of base predictors. The ensemble accuracy depends on two factors: the base classifiers' accuracy and their diversity (how different these base classifiers' outputs are from each other). An approach for increasing the diversity of the base classifiers is presented in this paper. The method builds some new...
Article
Full-text available
This work presents an on-line diagnosis algorithm for dynamic systems that combines model based diagnosis and machine learning techniques. The Possible Conflicts method is used to perform consistency based diagnosis. Possible conflicts are in charge of fault detection and isolation. Machine learning methods are used to induce time series classifie...
Conference Paper
Full-text available
Ensembles need base classifiers that do not always agree on every prediction (diverse base classifiers). Disturbing Neighbors (\(\mathcal{DN}\)) is a method for improving the diversity of the base classifiers of any ensemble algorithm. \(\mathcal{DN}\) builds for each base classifier a set of extra features based on a 1-Nearest Neighbor (1-NN) output....
Article
Full-text available
Abstract: The development of final-year projects in current computer science degrees is one of the fundamental subjects at the conclusion of the studies. In the bachelor's and master's degree proposals this point remains fundamental, with hardly any debate about the need to carry out a project that brings together all the knowledge...
Conference Paper
Full-text available
Any change in the classification problem in the course of online classification is termed changing environments. Examples of changing environments include change in the underlying data distribution, change in the class definition, adding or removing a feature. The two general strategies for handling changing environments are (i) constant update of...
Conference Paper
Random Forests, Support Vector Machines and k-Nearest Neighbors are successful and proven classification techniques that are widely used for different kinds of classification problems. One of them is the classification of genomic and proteomic data, which is known as a problem with extremely high dimensionality and therefore demands suitable classification...
Article
Boosting is a set of methods for the construction of classifier ensembles. The distinguishing feature of these methods is that they allow a strong classifier to be obtained from the combination of weak classifiers. Therefore, it is possible to use boosting methods with very simple base classifiers. Among the simplest classifiers are decision stumps, d...
Chapter
In pattern recognition, many learning methods need numbers as inputs. This paper analyzes two-level classifier ensembles to improve numeric learning methods on nominal data. A different classifier was used at each level. The classifier at the base level transforms the nominal inputs into continuous probabilities that the classifier at the meta leve...
Conference Paper
Full-text available
We describe an artificial vision system used to recognize the Spanish car license plate numbers in raster images. The algorithm is designed to be independent of the distance from the car to the camera, the size of the plate number, the inclination and the light conditions. In the preprocessing steps, the algorithm takes a raster image as input and...
Chapter
This work explores the capacity of Stacking to generate multivariate time series classifiers from classifiers of their univariate time series components. The Stacking scheme proposed uses k-nearest neighbors (k-NN) with dynamic time warping (DTW) as a dissimilarity measure for the level 0 learners. Support vector machines and Naïve Bayes are appl...
Conference Paper
This paper explores an integrated approach to diagnosis of complex dynamic systems. Consistency-based diagnosis is capable of performing automatic fault detection and localization using just correct behaviour models. Nevertheless, it may exhibit low discriminative power among fault candidates. Hence, we combined the consistency based approach with...
Article
Scrapie is a neuro-degenerative disease in small ruminants. A data set of 3113 records of sheep reported to the Scrapie Notifications Database in Great Britain has been studied. Clinical signs were recorded as present/absent in each animal by veterinary officials (VO) and a post-mortem diagnosis was made. In an attempt to detect healthy animals wit...
Conference Paper
Classification methods are widely used in computer-based medical systems. Often, the accuracy of a classifier can be improved using a classifier ensemble, the combination of several classifiers. Two classifier ensembles and their results on several medical data sets will be presented: Rotation Forest (Rodriguez, Kuncheva and Alonso) and Random Ora...
Conference Paper
Consistency-based diagnosis automatically provides fault detection and localization capabilities, using just models for correct behavior. However, it may exhibit a lack of discrimination power. Knowledge about fault modes can be added to tackle the problem. Unfortunately, it brings additional complexity issues, since it will be necessary to discrim...
Conference Paper
Full-text available
Ensemble methods with Random Oracles have been proposed recently (Kuncheva and Rodríguez, 2007). A random-oracle classifier consists of a pair of classifiers and a fixed, randomly created oracle that selects between them. Ensembles of random-oracle decision trees were shown to fare better than standard ensembles. In that study, the oracle f...
Conference Paper
Full-text available
In pattern recognition, many methods need numbers as inputs. Using nominal datasets with these methods requires transforming such data into numerical form. Usually, this transformation consists of encoding nominal attributes into a group of binary attributes (one for each possible nominal value). This approach, however, can be enhanced for certain method...
Conference Paper
Full-text available
Rotation Forest is a recently proposed method for building classifier ensembles using independently trained decision trees. It was found to be more accurate than bagging, AdaBoost and Random Forest ensembles across a collection of benchmark data sets. This paper carries out a lesion study on Rotation Forest in order to find out which of the param...
Article
Full-text available
We propose a combined fusion-selection approach to classifier ensemble design. Each classifier in the ensemble is replaced by a miniensemble of a pair of subclassifiers with a random linear oracle to choose between the two. It is argued that this approach encourages extra diversity in the ensemble while allowing for high accuracy of the individual...
Conference Paper
We describe an application which allows the interactive use of Andrews curves variants. In this application we can use general and well established mechanisms such as brushing and linking, as well as new ones specific to Andrews curves. The graphic user interface of the application allows the visualization of the Andrews curves and the gra...
Article
Full-text available
We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the...
Conference Paper
Full-text available
Grafted trees are trees that are constructed using two methods. The first method creates an initial tree, while the second method is used to complete the tree. In this work, the first classifier is an unpruned tree from a 10% sample of the training data. Grafting is a method for constructing ensembles of decision trees, where each tree is a graft...
Conference Paper
Full-text available
Ensemble methods can improve the accuracy of classification methods. This work considers the application of one of these methods, named Rotation-based, when the classifiers to combine are RBF Networks. This ensemble method, for each member of the ensemble, transforms the data set using a pseudo-random rotation of the axes. Then the classif...
Conference Paper
In this paper we introduce a system for early classification of several fault modes in a continuous process. Early fault classification is basic in supervision and diagnosis systems, since a fault could arise at any time, and the system must identify the fault as soon as possible. We present a computational framework to deal with the problem of ear...
Article
In previous works, a time series classification system has been presented. It is based on boosting very simple classifiers, formed only by one literal. The used literals are based on temporal intervals. The obtained classifiers were simply a linear combination of literals, so it is natural to expect some improvements in the results if those literal...
Conference Paper
Full-text available
In Machine Learning, ensembles are combinations of classifiers. Their objective is to improve accuracy. In previous works, we have presented a method for the generation of ensembles, named rotation-based. It transforms the training data set; it groups, randomly, the attributes in different subgroups, and applies, for each group, an axis rotation...
