Héctor Allende

Universidad Técnica Federico Santa María · Department of Informatics

Dr. rer. nat. Statistik

About

185
Publications
19,912
Reads
1,729
Citations
Additional affiliations
January 1972 - January 2021
Universidad Técnica Federico Santa María
Position
  • Professor (Full)

Publications (185)
Article
Full-text available
The overload problem in semi-autogenous grinding (SAG) mills is critical in the mining industry, impacting the extraction of valuable metals and overall productivity. Overloads can lead to severe operational issues, including increased wear, reduced grinding efficiency, and unscheduled shutdowns, which result in financial losses. Various strategies...
Article
Full-text available
Mental illnesses are becoming one of the most common health concerns among the population. Despite the proven efficacy of psychological treatments, mental illnesses are largely underdiagnosed, particularly in developing countries. A key factor contributing to this is the scarcity of mental health providers capable of diagnosing. In this work, we pr...
Article
Full-text available
Alzheimer’s disease is the most common form of dementia, and early detection is essential to slow its progression. The availability of real data has been of paramount importance for progress in automatic detection, although it presents two major challenges: Multi-source observations containing magnetic resonance imaging (MRI), Positron emiss...
Article
Full-text available
When deep learning models are used to predict the probability distribution of future values, a task called probabilistic forecasting, they need to handle the epistemic and the aleatoric uncertainties. For the former, some are based on Monte Carlo dropout and have focused on prediction interval estimation assuming a normal distribution for the aleat...
Article
Full-text available
Multiple sclerosis (MS) lesion segmentation is a crucial task that helps to monitor the progression of the condition and to assess how effective the treatment provided to a patient is. Convolutional Neural Networks (CNNs) have been successfully employed in MS lesion segmentation in recent years, but they still have problems segmenting voxels in the bou...
Article
Full-text available
Assigning predefined classes to natural language texts, based on their content, is a necessary component in many tasks in organizations. This task is carried out by classifying documents within a set of predefined categories using models and computational methods. Text representation for classification purposes has traditionally been performed usin...
Chapter
Medical image segmentation has become a fundamental tool for making the assessment of complex diagnostic and surgical tasks more precise. In particular, this work focuses on multiple sclerosis (MS), in which lesion segmentation is useful for obtaining an accurate diagnosis and for tracking disease progression. In recent years, Convolutional Neural...
Chapter
The global energy transition to renewable sources is among the substantial challenges facing humanity. In this context, the precise estimation of the renewable potential of given areas is valuable to decision-makers. This is particularly difficult for the urban case. In Chile, valuable data for solving this problem is available; however, standard machi...
Chapter
The automatic detection of hate speech is a blooming field in the natural language processing community. In recent years there have been efforts in detecting hate speech in multiple languages, using models trained on multiple languages at the same time. Furthermore, there is special interest in the capabilities of language agnostic features to repr...
Article
In recent years, deep learning models have been developed to address probabilistic forecasting tasks, assuming an implicit stochastic process that relates past observed values to uncertain future values. These models are capable of capturing the inherent uncertainty of the underlying process, but they ignore the model uncertainty that comes from th...
Chapter
Integrating wind power into the electrical grid is complicated due to the stochastic nature of the wind, which makes its prediction a challenging task. It is therefore important to devise forecasting tools to support this task. For example, a network that integrates an Echo State Network architecture and Long Short-Term Memory blocks as hidden units (ESN...
Chapter
Multiple sclerosis lesion segmentation is an important step in the diagnosis of the disease and in tracking its evolution. Convolutional Neural Networks (CNNs) have obtained successful results in lesion segmentation in recent years, but still have problems segmenting the boundaries of the lesions. In this work we focus the learning...
Article
Full-text available
Abstract: Assigning one or more predefined categories to natural language texts, based on their content, is an important and necessary component of many tasks within organizations. This task is commonly carried out through automatic text classification, that is, by classifying documents within a set o...
Chapter
Using artificial neural networks for forecasting tasks is a popular approach that has proven to be very accurate. When used to estimate prediction intervals, a normal distribution is usually assumed as the data noise uncertainty term, as in MVE networks, while model parameters uncertainty is often ignored. Because of this, prediction intervals esti...
Article
Crucial to wind energy penetration in electrical systems is the precise forecasting of wind speed, which turns into accurate future wind power estimates. Current trends in wind speed forecasting involve using Recurrent Neural Networks to model complex temporal dynamics in the time-series. These networks, however, have problems learning long tempora...
Article
Functional Magnetic Resonance Imaging (fMRI) is a key neuroimaging technique. The classic fMRI analysis pipeline is based on the assumption that the haemodynamic response (HR) is the same across brain regions, time, and subjects. Although convenient, there is ample evidence that this assumption does not hold, and that these differences result in in...
Article
Full-text available
Ensemble learning is an active field of research with applications to a broad range of problems. Adaboost is a widely used ensemble approach; however, its computational burden is high because it uses an explicit diversity method for building the individual learners. To address this issue, we present a variant of Adaboost where the learners can be t...
Article
Full-text available
The rotary inverted pendulum (RIP) is an underactuated mechanical system that can be balanced in the upward position, by applying the linear control theory based on a nonlinear dynamical model linearized around that position. Its structure makes it possible to implement controllers designed for energy minimization in the non-linearizable region. Du...
Conference Paper
Robotic soccer provides an adversarial scenario where collaborative agents have to execute actions by following a hand-coded or a learned strategy, which in the case of the Small Size League, is given by a centralized decision maker. This work takes advantage of this centralized approach for modelling the keepaway strategy learning problem which is...
Chapter
Wind speed forecasting is crucial for the penetration of wind energy sources in electrical systems, since accurate wind speed forecasts translate directly into accurate wind power predictions. A framework called Multi-scale RNNs specifically addresses the issue of learning long-term dependencies in RNNs. Following that approach, we devised an LSTM-...
Chapter
Convolutional Neural Networks (CNNs) have obtained successful results in the task of image segmentation in recent years. These methods take as input samples obtained from square uniform patches centered on each voxel of the image, which may not be the optimal approach since it makes very limited use of global context. In this work we...
Chapter
Full-text available
Finding real-world applications whose records contain missing values is not uncommon. As many data analysis algorithms are not designed to work with missing data, a frequent approach is to remove all variables associated with such records from the analysis. A much better alternative is to employ data imputation techniques to estimate the missing va...
Chapter
In machine learning classification problems, it is common to assume train and test sets follow a similar underlying distribution. When this is not true, this can be seen as a transfer learning problem. Sometimes, there is a set of already trained source classification models available. This work focuses on how to best use these models as an ensembl...
Chapter
Wind power is the Non-Conventional Renewable Energy that has become more relevant in recent years. Given the stochastic behavior of wind speed, it is necessary to have efficient prediction models at different horizons. Several kinds of models have been used to forecast wind power, but using the same kind of model to forecast at different horizons is...
Preprint
In recent years, there has been a lot of interest in achieving some kind of complex reasoning using deep neural networks. To do that, models like Memory Networks (MemNNs) have combined external memory storage and attention mechanisms. These architectures, however, lack more complex reasoning mechanisms that could allow, for instance, rela...
Article
Full-text available
Wind power generation has presented an important development around the world. However, its integration into electrical systems presents numerous challenges due to the variable nature of the wind. Therefore, to maintain an economical and reliable electricity supply, it is necessary to accurately predict wind generation. The Wind Power Prediction To...
Chapter
Improvement of time series forecasting accuracy is an active research area that has significant importance in many practical domains. Ensemble methods have gained considerable attention from machine learning and soft computing communities in recent years. There are several practical and theoretical reasons, mainly statistical reasons, why an ensemb...
Chapter
The amount of information available nowadays is almost incalculable, presenting new opportunities to gain insight from this data. In this chapter we present some of the work done in the field of Distributed Machine Learning and discuss a problem not often mentioned in the literature. The problem arises when the distributed information comes from di...
Conference Paper
Due to its variability, the development of wind power entails several difficulties, including wind speed forecasting. The Long Short-Term Memory (LSTM) is a particular type of recurrent network that can be used to work with sequential data, and previous works showed good empirical results. However, its training algorithm is expensive in terms of co...
Conference Paper
In this work we propose a subsampled version of the Concurrent AdaBoost algorithm in order to deal with large datasets in an efficient way. The proposal is based on a concurrent computing approach focused on improving the distribution weight estimation in the algorithm, hence obtaining better capacity of generalization. On each round, we train in p...
Book
The book is an authoritative collection of contributions by leading experts on the topics of fuzzy logic, multi-valued logic and neural networks. Originally written as an homage to Claudio Moraga, seen by his colleagues as an example of concentration, discipline and passion for science, the book also represents a timely reference guide for advance s...
Article
Clustering is one of the most important techniques for the design of intelligent systems, and it has been incorporated into a large number of real applications. However, classical clustering algorithms cannot process high-dimensional data, such as text, in a reasonable amount of time. To address this problem, we use techniques based on locality-sen...
Article
Full-text available
State feedback controllers are appealing due to their structural simplicity. Nevertheless, when stabilizing a given plant, dynamics of this type of controllers could lead the static feedback gain to take higher values than desired. On the other hand, a dynamic state feedback controller is capable of achieving the same or even better performance by...
Article
Full-text available
This paper presents a new adaptive learning algorithm to automatically design a neural fuzzy model. This constructive learning algorithm attempts to identify the structure of the model based on an architectural self-organization mechanism with a data-driven approach. The proposed training algorithm self-organizes the model with intuitive adding, me...
Article
This paper addresses the problem of content-based image retrieval in a large-scale setting. Recently several graph-based image retrieval systems have been proposed to fuse different representations, with excellent results. However, most of them use one very precise representation, which does not scale as well as global dense representations with an...
Chapter
AdaBoost is one of the best-known ensemble approaches in the machine learning literature. Several AdaBoost approaches that use parallel processing to speed up computation on large datasets have recently been proposed. These approaches try to approximate the classic AdaBoost, thus sacrificing its generalization ability. In this w...
Article
In this paper we present a distributed regression framework to model distributed data with different contexts. Different context is defined as the change of the underlying laws of probability in the distributed sources. Most state of the art methods do not take into account the different context and assume that the data comes from the same statisti...
Conference Paper
A common assumption in the field of machine learning is that the data used for training and the target data to which the model is applied share the same distribution. While this may hold in many applications, in many other cases the assumption does not hold. We may want to do classification in a certain domain in which we do not have enough labeled d...
Conference Paper
In recent years, wind power has gained prominence as a renewable energy source. However, integrating wind power into the electric grid is a major challenge due to wind speed variations. Wind speed forecasting is therefore an alternative for the pre-dispatch of the power system. This paper proposes forecasting wind speed from 1 to 24 hours ahead using a multiva...
Conference Paper
In this work we present the effects of centralizing distributed data sources in order to perform automatic data analysis, without taking into account the different underlying laws of probability that these data sources could have. We compare a centralized approach and two distributed approaches for the distributed regression task. The experiments a...
Conference Paper
Full-text available
In real-world applications it is common to find data sets whose records contain missing values. As many data analysis algorithms are not designed to work with missing data, all variables associated with such records are generally removed from the analysis. A better alternative is to employ data imputation techniques to estimate the missing values u...
Conference Paper
The modelling and forecasting of volatility in Time Series has been receiving great attention from researchers over the past years. In this topic, GARCH models are one of the most popular models. In this work, the effects of choosing different distribution families for the innovation process on asymmetric GARCH models are investigated. In particula...
Conference Paper
This paper addresses the problem of content-based image retrieval in a large-scale setting. Recently several graph-based image retrieval systems to fuse different representations have been proposed with excellent results, however most of them use at least one representation based on local descriptors that does not scale very well with the number th...
Conference Paper
In this paper a new algorithm to perform edge detection based on a bootstrap approach is presented. This approach uses the estimated spatial conditional distribution of the pixels conditioned by their neighbors. The proposed algorithm approximates the original image by adjusting local 2D autoregressive models to different blocks of the image. The r...
Chapter
Forecasting is one of the main goals in time series analysis and it has had a great development in the last decades. In forecasting, the prediction intervals provide additional assessment of the uncertainty compared with a point forecast, which can better guide risk management decisions. The construction of prediction intervals requires fitting a m...
Article
When distributed data sources have different contexts the problem of Distributed Regression becomes severe. It is the underlying law of probability that constitutes the context of a source. A new Distributed Regression System is presented, which makes use of a discrete representation of the probability density functions (pdfs). Neighborhoods of sim...
Article
This work proposes a method to find the set of the most influential lags and the rule structure of a Takagi–Sugeno–Kang (TSK) fuzzy model for time series applications. The proposed method resembles the techniques that prioritize lags, evaluating the proximity of nearby samples in the input space using the closeness of the corresponding target value...
Article
Full-text available
This paper studies the performance of a Genetic Algorithm (GA) to find solutions to problems of robust design in multiobjective systems with many control and noise factors, representing the output vector in a single aggregation function. The results show that the GA is able to find solutions that achieve a good adjustment of the responses to their...
Article
This paper addresses the problem of content-based image retrieval in a large-scale setting. Most works in the area sample image patches using an affine invariant detector or in a dense fashion, but we show that both sampling methods are complementary. By using Fisher Vectors we show how several sampling methods can be combined in a simple fashion i...
Conference Paper
In recent years several models for financial high-frequency data have been proposed. One of the best-known models for this type of application is the ACM-ACD model. This model focuses on modelling the underlying joint distribution of both duration and price changes between consecutive transactions. However, this model imposes distributional assumpt...
Article
Time series prediction is of primary importance in a variety of applications from several scientific fields, such as engineering, finance, and earth sciences. Time series prediction can be divided into two main tasks, point and interval estimation. Estimating prediction intervals is in some cases more important than point estimation mainly because it...
Article
Full-text available
We present a Pareto Genetic Algorithm (PGA), which finds the Pareto frontier of solutions to problems of robust design in multiobjective systems. The PGA was designed to be applied using Taguchi's Parameter Design method, which is the approach most frequently used by practitioners to execute robust design studies. We tested the PGA using data obt...
Article
The success of current antiviral treatment for hepatitis C virus (HCV) recurrence in liver transplant (LT) recipients remains limited. We aimed at evaluating the value of IL28B genotype and early viral kinetics to predict response to standard treatment in the transplant setting. We retrospectively evaluated 104 LT recipients treated for HCV genotyp...
Conference Paper
In this work we present a Distributed Regression approach, which works in problems where distributed data sources may have different contexts. Different context is defined as the change of the underlying law of probability in the distributed sources. We present an approach which uses a discrete representation of the probability density functions (p...
Conference Paper
Recently, the sieve bootstrap method has been successfully used in prediction of nonlinear time series. In this work we study the performance of the prediction intervals based on the sieve bootstrap technique, which does not require the distributional assumption of normality as most techniques that are found in the literature. The construction of p...
Conference Paper
In this paper we apply a distributed learning approach to improve the performance of wind speed forecasting. We use data obtained from 54 different weather stations in the U.S. and, without sharing data between sites, we share model information between them to improve performance over local models trained with only local data. We show that sharing...
Conference Paper
Content-based image retrieval is an important area of research in Multimedia, since it is linked to numerous image applications. Few works in the field have used differently sampled descriptors jointly to improve accuracy, but most of the time the improvement is attributed to other factors. In this paper, firstly we show that a couple descriptor se...
Article
Automatic anomaly detection has become a key issue in many engineering applications due to the increasing amount of data in need of analysis. Addressing this kind of task using pattern recognition methods requires a proper design of the learning strategy, given the reduced amount of flawed cases available for training compared to that of normal ins...
Article
Full-text available
Recently, there has been a renewed interest in the machine learning community for variants of a sparse greedy approximation procedure for concave optimization known as the Frank-Wolfe (FW) method. In particular, this procedure has been successfully applied to train large-scale instances of non-linear Support Vector Machines (SVMs). Specializing F...
Conference Paper
In this paper we present a distributed regression framework to model data with different contexts. Different context is defined as the change of the underlying laws of probability in the distributed sources. State of the art methods do not take into account the different context and assume that the data comes from the same statistical distribution. W...
Conference Paper
Full-text available
Twitter has become the most widely used microblogging service, where people tell and spread, in short messages, what they are feeling or what is happening at that moment. For this reason, having insight into the behavior of the message stream inside the social network could be of great help in addressing difficult challenges such as e...
Conference Paper
In this paper we present the construction of prediction intervals for time series based on the sieve bootstrap technique, which does not require the distributional assumption of normality that most parametric techniques impose. The construction of prediction intervals in the presence of innovation outliers does not have distributional robustness, l...
Conference Paper
As machine learning gains attention for real-world problem solving, a growing number of new problems not previously considered have appeared. One such problem is the imbalance in class distributions, which is said to hinder the performance of traditional error-minimization-based classification algorithms. In this paper we propose an...
Conference Paper
In real-world pattern recognition problems, such as computer-assisted medical diagnosis, events of a given phenomenon are usually found in the minority, making it necessary to build algorithms that emphasize the effect of one of the classes at training time. In this paper we propose a variation of the well-known Adaboost algorithm that is able to improv...
Article
Ensemble methods learn models from examples by generating a set of hypotheses, which are then combined to make a single decision. We propose an algorithm to construct an ensemble for regression estimation. Our proposal generates the hypotheses sequentially using a simple procedure whereby the target map to be learned by the base learner at each ste...
Article
Full-text available
In this work, a Takagi-Sugeno-Kang (TSK) model is used for time series analysis, and some important questions about the identification of this kind of model are addressed: the identification of the model structure and of the set of the most influential regressors or lags. The main idea behind the proposed method resembles those techniques that p...
Article
Ensemble learning has gained considerable attention in different learning tasks including regression, classification, and clustering problems. One of the drawbacks of the ensemble is the high computational cost of training stages. Resampling local negative correlation (RLNC) is a technique that combines two well-known methods to generate ensemble d...
Conference Paper
Full-text available
This paper illustrates how a combination of information retrieval, machine learning, and NLP corpus annotation techniques was applied to a problem of text content reliability estimation in Web documents. Our proposal for text content reliability estimation is based on a model in which reliability is a similarity measure between the content of the...
Conference Paper
Full-text available
Twitter is one of the most important social networks, where extracting useful information is of paramount importance to many application areas. Many works to date have tried to mine this information by considering the network structure, the language itself, or even by searching for a pattern in the words employed by the users. Anyway, a simple idea that might...
Article
In a previous article, we presented a genetic algorithm (GA), which finds solutions to problems of robust design in multivariate systems. Based on that GA, we developed a new GA that uses a new desirability function, based on the aggregation of the observed variance of the responses and the squared deviation between the mean of each response and it...
Conference Paper
We present a model based on an ensemble of base classifiers that are combined using weighted majority voting, for the task of incremental classification. The definition of such voting weights becomes even more critical in non-stationary environments where the patterns underlying the observations change over time. Given an instance to classify, we propose...
Conference Paper
Full-text available
Valuable information about the iris is intrinsically located in its natural texture; therefore, preserving and extracting the most relevant features for biometric recognition is of paramount importance. The iris pattern is subject to translation, scaling and rotation; consequently, the variations produced by these artifacts must be minimized. The main contri...
Article
Artificial neural networks techniques have been successfully applied in vector quantization (VQ) encoding. The objective of VQ is to statistically preserve the topological relationships existing in a data set and to project the data to a lattice of lower dimensions, for visualization, compression, storage, or transmission purposes. However, one of...
Conference Paper
Full-text available
Automatic text classification is the task of assigning unseen documents to a predefined set of classes. Text representation for classification purposes has been traditionally approached using a vector space model due to its simplicity and good performance. On the other hand, multi-label automatic text classification has been typically addressed eit...
Conference Paper
It has been recently shown that the quadratic programming formulation underlying a number of kernel methods can be treated as a minimal enclosing ball (MEB) problem in a feature space where data has been previously embedded. Core Vector Machines (CVMs) in particular, make use of this equivalence in order to compute Support Vector Machines (SVMs) fr...
Article
This article presents an improved genetic algorithm (GA), which finds solutions to problems of robust design in multivariate systems with many control and noise factors. Since some values of responses of the system might not have been obtained from the robust design experiment, but may be needed in the search process, the GA uses response surface m...
Article
Ensemble learning has gained considerable attention in different tasks including regression, classification and clustering. Adaboost and Bagging are two popular approaches used to train these models. The former provides accurate estimations in regression settings but is computationally expensive because of its inherently sequential structure, while...
Conference Paper
Magnetic Resonance Image segmentation is a fundamental task in a wide variety of computer-based medical applications that support therapy and diagnosis. In this work, spatial information is included for estimating parameters of a finite mixture model, under a Gaussian distribution assumption, using a modified version of the well...
Conference Paper
The All-Distances SVM is a single-objective light extension of the binary μ-SVM for multi-category classification that is competitive against multi-objective SVMs, such as One-against-the-Rest SVMs and One-against-One SVMs. Although the model takes into account considerably fewer constraints than previous formulations, it lacks an efficient train...
Conference Paper
Arrhythmia diagnosis is commonly conducted through visual analysis of human electrocardiograms, a very resource consuming task for physicians. In this paper we present a computational approach for arrhythmia detection based on heart rate variability signal analysis and the application of a neuro-fuzzy classification model called SONFIS. The aforeme...