## About

747 Publications

175,402 Reads


43,206 Citations

## Publications


Training of neural networks amounts to nonconvex optimization problems that are typically solved by using backpropagation and (variants of) stochastic gradient descent. In this work we propose an alternative approach by viewing the training task as a nonlinear optimal control problem. Under this lens, backpropagation amounts to the sequential appro...

A number of decarbonization scenarios for the energy sector are built on simultaneous electrification of energy demand, and decarbonization of electricity generation through renewable energy sources. However, increased electricity demand due to heat and transport electrification and the variability associated with renewables have the potential to d...

We propose three techniques for improving accuracy and speed of margin stochastic gradient descent support vector machines (MSGDSVM). The first technique is to use sampling with full replacement. The second technique is to use the new update rule derived from the squared hinge loss function. The third technique is to limit the number of values...
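As context for the second technique, the update rule induced by the squared hinge loss can be illustrated with a generic Pegasos-style SGD sketch. This is plain NumPy and is not the authors' MSGDSVM implementation; the step-size schedule and all parameters are illustrative assumptions.

```python
import numpy as np

def sgd_svm_squared_hinge(X, y, lam=0.01, epochs=50, seed=0):
    """SGD on a squared-hinge SVM objective (illustrative sketch):
        minimize  lam/2 * ||w||^2 + mean( max(0, 1 - y_i * w.x_i)^2 ).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.integers(0, n, size=n):  # sampling with replacement
            t += 1
            eta = 1.0 / (lam * t)             # Pegasos-style decaying step
            margin = y[i] * (X[i] @ w)
            grad = lam * w
            if margin < 1:
                # d/dw of (1 - y w.x)^2 is -2 * y * x * (1 - margin)
                grad -= 2.0 * y[i] * (1.0 - margin) * X[i]
            w -= eta * grad
    return w
```

On linearly separable data the learned weight vector classifies the training points with near-perfect accuracy.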


We propose a fast training procedure for the Support Vector Machines (SVM) algorithm which returns a decision boundary with the same coefficients for any data set, that differs only in the number of support vectors and kernel function values. The modification is based on the recently proposed SVM without a regularization term based on Stochastic Gr...

In this paper, we propose a fast surrogate leverage weighted sampling strategy to generate refined random Fourier features for kernel approximation. Compared to the current state-of-the-art method that uses the leverage weighted scheme [Li-ICML2019], our new strategy is simpler and more effective. It uses kernel alignment to guide the sampling proc...

Designers rely on performance predictions to direct the design toward appropriate requirements. Machine learning (ML) models exhibit the potential for rapid and accurate predictions. Developing conventional ML models that can be generalized well in unseen design cases requires an effective feature engineering and selection. Identifying generalizabl...

Random Fourier features (RFFs) have been successfully employed to kernel approximation in large-scale situations. The rationale behind RFF relies on Bochner's theorem, but the condition is too strict and excludes many widely used kernels, e.g., dot-product kernels (violates the shift-invariant condition) and indefinite kernels [violates the positiv...
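For context, the classical RFF construction that the abstract builds on (for the shift-invariant RBF kernel, where Bochner's theorem does apply) can be sketched in a few lines of NumPy; function names and parameters here are illustrative.

```python
import numpy as np

def rff_features(X, n_features=500, gamma=1.0, seed=0):
    """Random Fourier features for the RBF kernel k(x,y)=exp(-gamma*||x-y||^2).

    By Bochner's theorem this kernel is the Fourier transform of a Gaussian
    density, so with w ~ N(0, 2*gamma*I) and b ~ U[0, 2*pi),
    z(x)^T z(y) approximates k(x, y) in expectation.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

The approximation error of the inner product decays like 1/sqrt(n_features), which is why a few thousand features usually suffice.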

We solve the problem of classification on graphs by generating a similarity matrix from a graph with virtual edges created using predefined rules. The rules are defined based on axioms for similarity spaces. Virtual edges are generated by solving the problem of computing paths with maximal fixed length. We perform experiments by using the similarit...

Hot water systems represent a substantial energy draw for most residential buildings. For design and operational optimization, they are usually either modelled by domain experts or through black-box models which make use of sensor data. However, given the wide variability in hot water systems, it is impractical for a domain expert to individually...

This paper introduces a novel framework for generative models based on Restricted Kernel Machines (RKMs) with joint multi-view generation and uncorrelated feature learning, called Gen-RKM. To enable joint multi-view generation, this mechanism uses a shared representation of data from various views. Furthermore, the model has a primal and dual formu...

Kernel regression models have been used as non-parametric methods for fitting experimental data. However, due to their non-parametric nature, they belong to the so-called “black box” models, indicating that the relation between the input variables and the output, depending on the kernel selection, is unknown. In this paper we propose a new methodol...

Selecting diverse and important items from a large set is a problem of interest in machine learning. As a specific example, in order to deal with large training sets, kernel methods often rely on low rank matrix approximations based on the selection or sampling of Nyström centers. In this context, we propose a deterministic and a randomized adapt...
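The Nyström approximation itself, given an already-selected set of centers, is a short computation; the sketch below is the standard construction in NumPy and does not reproduce the adaptive selection strategies the paper proposes.

```python
import numpy as np

def nystrom_approx(K, centers):
    """Standard Nystrom low-rank approximation of a PSD kernel matrix:
        K  ≈  K[:, C] @ pinv(K[C, C]) @ K[C, :]
    where C is the index set of selected centers."""
    C = np.asarray(centers)
    K_nc = K[:, C]                     # cross-kernel between all points and centers
    K_cc = K[np.ix_(C, C)]             # kernel among the centers
    return K_nc @ np.linalg.pinv(K_cc) @ K_nc.T
```

When all indices are used as centers the approximation is exact (K K⁺ K = K for PSD K), and fewer centers trade accuracy for a lower rank.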

In this paper, the efficient hinging hyperplanes (EHH) neural network is proposed based on the model of hinging hyperplanes (HH). The EHH neural network is a distributed representation, the training of which involves solving several convex optimization problems and is fast. It is proved that for every EHH neural network, there is an equivalent adap...

We propose a novel method designed for large-scale regression problems, namely the two-stage best-scored random forest (TBRF). "Best-scored" means to select one regression tree with the best empirical performance out of a certain number of purely random regression tree candidates, and "two-stage" means to divide the original random tree splitting p...

Increasing energy efficiency of thermostatically controlled loads has the potential to substantially reduce domestic energy demand. However, optimizing the efficiency of thermostatically controlled loads requires either an existing model or detailed data from sensors to learn it online. Often, neither is practical because of real-world constraints....

A graph-based classification method is proposed both for semi-supervised learning in the case of Euclidean data and for classification in the case of graph data. Our manifold learning technique is based on a convex optimization problem involving a convex regularization term and a concave loss function with a trade-off parameter carefully chosen so...

Long Short-Term Memory (LSTM) is a well-known method used widely on sequence learning and time series prediction. In this paper we deployed stacked LSTM model in an application of weather forecasting. We propose a 2-layer spatio-temporal stacked LSTM model which consists of independent LSTM models per location in the first LSTM layer. Subsequently,...

In multi-view regression the information from multiple representations of the input data is combined to improve the prediction. Inspired by the success of deep learning, this paper proposes a novel model called Weighted Multi-view Deep Neural Networks (MV-DNN) regression. The objective function used is a weighted version of the primal formulation o...

In many real-life applications data can be described through multiple representations, or views. Multi-view learning aims at combining the information from all views, in order to obtain a better performance. Most well-known multi-view methods optimize some form of correlation between two views, while in many applications there are three or more vie...

Hyper-kernels endowed by hyper-Reproducing Kernel Hilbert Space (hyper-RKHS) formulate the kernel learning task as learning on the space of kernels itself, which provides significant model flexibility for kernel learning with outstanding performance in real-world applications. However, the convergence behavior of these learning algorithms in hyper-...

Manual labeling of sufficient training data for diverse application domains is a costly, laborious task and often prohibitive. Therefore, designing models that can leverage rich labeled data in one domain and be applicable to a different but related domain is highly desirable. In particular, domain adaptation or transfer learning algorithms seek to...

In kernel methods, the kernels are often required to be positive definite, which restricts the use of many indefinite kernels. To accommodate non-positive definite kernels, in this paper we aim to build an indefinite kernel learning framework for kernel logistic regression (KLR). The proposed indefinite KLR (IKLR) model is analyzed in the reproduci...

We propose a new methodology for identifying Wiener systems using the data acquired from two separate experiments. In the first experiment, we feed the system with a sinusoid at a prescribed frequency and use the steady state response of the system to estimate the static nonlinearity. In the second experiment, the estimated nonlinearity is used to...

The one-bit quantization is implemented by one single comparator that operates at low power and a high rate. Hence one-bit compressive sensing (1bit-CS) becomes attractive in signal processing. When measurements are corrupted by noise during signal acquisition and transmission, 1bit-CS is usually modeled as minimizing a loss function with a sparsit...
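A common baseline for this setting is binary iterative hard thresholding (BIHT): a gradient step on the sign mismatch, followed by keeping the k largest-magnitude entries. The sketch below is a generic NumPy illustration of one-bit recovery, not the robust loss-based model the abstract discusses; all names and parameters are illustrative.

```python
import numpy as np

def biht(A, y, k, iters=30, tau=1.0):
    """Binary iterative hard thresholding for one-bit compressive sensing.

    Recovers a k-sparse, unit-norm x (up to scale) from y = sign(A x):
    take a step against the sign mismatch, hard-threshold to the k largest
    entries, then project back onto the unit sphere.
    """
    m, d = A.shape
    x = A.T @ y / m                              # matched-filter initialization
    for _ in range(iters):
        x = x + (tau / m) * A.T @ (y - np.sign(A @ x))
        small = np.argsort(np.abs(x))[:-k]       # all but the k largest entries
        x[small] = 0.0
        x /= np.linalg.norm(x)
    return x
```

Since one-bit measurements destroy amplitude information, only the direction of x is recoverable, which is why the estimate is normalized each iteration.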

Increasing sustainability requirements make evaluating different design options for identifying energy-efficient design ever more important. These requirements demand simulation models that are not only accurate but also fast. Machine Learning (ML) enables effective mimicry of Building Performance Simulation (BPS) while generating results much fast...

The use of indefinite kernels has attracted many research interests in recent years due to their flexibility. They do not possess the usual restrictions of being positive definite as in the traditional study of kernel methods. This paper introduces the indefinite unsupervised and semi-supervised learning in the framework of least squares support ve...

Entropy measures have been a major interest of researchers to measure the information content of a dynamical system. One of the well-known methodologies is sample entropy, which is a model-free approach and can be deployed to measure the information transfer in time series. Sample entropy is based on the conditional entropy where a major concern is...
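For reference, a minimal NumPy sketch of sample entropy in a common simplified form (template length m, absolute tolerance r, Chebyshev distance, self-matches excluded); the conditional-entropy refinement the abstract alludes to is not included, and this variant uses all overlapping templates at each length.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy of a 1-D series: -log(A / B), where B counts ordered
    pairs of length-m templates within tolerance r (Chebyshev distance),
    A counts the same for length m+1, and self-matches are excluded."""
    x = np.asarray(x, dtype=float)

    def count(mm):
        # all overlapping templates of length mm
        T = np.array([x[i:i + mm] for i in range(len(x) - mm + 1)])
        d = np.max(np.abs(T[:, None, :] - T[None, :, :]), axis=-1)
        return np.sum(d <= r) - len(T)   # drop the diagonal self-matches

    return -np.log(count(m + 1) / count(m))
```

A perfectly regular series yields a value near zero, reflecting that knowing m samples almost fully predicts the next one.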

Increasing energy efficiency in buildings can reduce costs and emissions substantially. Historically, this has been treated as a local, or single-agent, optimization problem. However, many buildings utilize the same types of thermal equipment, e.g. electric heaters and hot water vessels. During operation, occupants in these buildings interact with t...

This paper studies the matrix completion problems when the entries are contaminated by non-Gaussian noise or outliers. The proposed approach employs a nonconvex loss function induced by the maximum correntropy criterion. With the help of this loss function, we develop a rank constrained, as well as a nuclear norm regularized model, which is resista...

This paper introduces a novel hybrid deep neural kernel framework. The proposed deep learning model makes a combination of a neural networks based architecture and a kernel based model. In particular, here an explicit feature map, based on random Fourier features, is used to make the transition between the two architectures more straightforward as...

The design of sparse quadratures for the approximation of integral operators related to symmetric positive-semidefinite kernels is addressed. Particular emphasis is placed on the approximation of the main eigenpairs of an initial operator and on the assessment of the approximation accuracy. Special attention is drawn to the design of sparse quadrat...

In this paper we introduce a new method for Wiener system identification that relies on the data collected on two separate experiments. In the first experiment, the system is excited with a sine signal at fixed frequency and phase shift. Using the steady state response of the system, we estimate the static nonlinearity, which is assumed to be a pol...

In multi-view learning, data is described using different representations, or views. Multi-view classification methods try to exploit information from all views to improve the classification performance. Here a new model is proposed that performs classification when two or more views are available. The model is called Multi-View Least Squares Suppo...

In multi-view clustering, datasets are comprised of different representations of the data, or views. Although each view could individually be used, exploiting information from all views together could improve the cluster quality. In this paper a new model Multi-View Kernel Spectral Clustering (MVKSC) is proposed that performs clustering when two or...

A new methodology for identifying Multiple Input Multiple Output (MIMO) Hammerstein Systems is presented in this paper. The method consists of two stages. In the first stage, a Least Squares Support Vector Machine (LS-SVM) is used to model the nonlinear block of the Hammerstein System from its steady-state response. In the second stage, the interme...

In machine learning or statistics, it is often desirable to reduce the dimensionality of high dimensional data. We propose to obtain the low dimensional embedding coordinates as the eigenvectors of a positive semi-definite kernel matrix. This kernel matrix is the solution of a semi-definite program promoting a low rank solution and defined with the...

In this paper a new methodology for identifying Multiple Inputs Multiple Outputs (MIMO) Hammerstein systems is presented. The proposed method aims at incorporating the impulse response of the system into a Least Squares Support Vector Machine (LS-SVM) formulation and therefore the regularization capabilities of LS-SVM are applied to the system as a...

This paper introduces a novel hybrid deep neural kernel framework. The proposed deep learning model follows a combination of neural networks based architecture and a kernel based model. In particular, here an explicit feature map, based on random Fourier features, is used to make the transition between the two architectures more straightforward as...

Matlab demo for: “Multi-Label Semi-Supervised Kernel Spectral Clustering”.

Building energy predictions are playing an important role in steering the design towards the required sustainability regulations. Time-consuming nature of detailed Building Energy Modelling (BEM) has introduced simplified BEM and metamodels within the design process. The paper further elaborates the limitations of this method and proposes a compone...

The development and implementation of better control strategies to improve the overall performance of a plant is often hampered by the lack of available measurements of key quality variables. One way to resolve this problem is to develop a soft sensor that is capable of providing process information as often as necessary for control. One potential...

Domain adaptation learning is one of the fundamental research topics in pattern recognition and machine learning. This paper introduces a regularized semipaired kernel canonical correlation analysis formulation for learning a latent space for the domain adaptation problem. The optimization problem is formulated in the primal-dual least squares supp...

In this paper, we discuss how a suitable family of tensor kernels can be used to efficiently solve nonparametric extensions of $\ell^p$ regularized learning methods. Our main contribution is proposing a fast dual algorithm, and showing that it allows the problem to be solved efficiently. Our results contrast recent findings suggesting kernel methods ca...

Hammerstein systems are composed by a static nonlinearity followed by a linear dynamic system. The proposed method for identifying Hammerstein systems consists of a formulation within the Least Squares Support Vector Machines (LS-SVM) framework where the Impulse Response of the system is incorporated as a constraint. A fundamental aspect of this wo...

Artificial Neural Networks (ANN) are a universal approximator for any non-linear function. However, ANN approximation strongly depends on the architecture, i.e. the structure of the neurons and the training methods. This paper evaluates ANN architectures to model components that represent a building for its energy prediction. ANN architectures eval...


This work proposes a new algorithm for training a re-weighted L2 Support Vector Machine (SVM), inspired on the re-weighted Lasso algorithm of Candès et al. and on the equivalence between Lasso and SVM shown recently by Jaggi. In particular, the margin required for each training vector is set independently, defining a new weighted SVM model. These...

Structural health monitoring refers to the process of measuring damage-sensitive variables to assess the functionality of a structure. In principle, vibration data can capture the dynamics of the structure and reveal possible failures, but environmental and operational variability can mask this information. Thus, an effective outlier detection algo...

The aim of this letter is to propose a theory of deep restricted kernel machines offering new foundations for deep learning with kernel machines. From the viewpoint of deep learning, it is partially related to restricted Boltzmann machines, which are characterized by visible and hidden units in a bipartite graph without hidden-to-hidden connections...

Hammerstein systems are composed by the cascading of a static nonlinearity and a linear system. In this paper, a methodology for identifying such systems using a combination of least squares support vector machines (LS-SVM) and best linear approximation (BLA) techniques is proposed. To do this, a novel method for estimating the intermediate variabl...

Spectral clustering suffers from a scalability problem in both memory usage and computational time when the number of data instances N is large. To solve this issue, we present a fast spectral clustering algorithm able to effectively handle millions of datapoints at a desktop PC scale. The proposed technique relies on a kernel-based formulation of...
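The O(N²) memory and O(N³) eigendecomposition costs mentioned here come directly from the textbook pipeline, which the sketch below spells out in NumPy (normalized affinity, top-k eigenvectors, small k-means loop). This is the baseline the paper accelerates, not the proposed fast algorithm; the deterministic farthest-point k-means initialization is an illustrative choice.

```python
import numpy as np

def spectral_clustering(X, k=2, gamma=1.0, iters=100):
    """Textbook normalized spectral clustering: RBF affinity ->
    D^{-1/2} A D^{-1/2} -> top-k eigenvectors -> row-normalize -> k-means."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-gamma * sq)
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]                    # eigenvectors of the k largest eigenvalues
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    # small k-means with deterministic farthest-point initialization
    centers = [U[0]]
    for _ in range(k - 1):
        d2 = np.min([((U - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(d2))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        new = np.array([U[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

Every step touching the full N×N affinity matrix is what makes this impractical at millions of points, motivating kernel-based approximations.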

Local learning methods have been investigated by many researchers. While global learning methods consider the same weight for all training points in model fitting, local learning methods assume that the training samples in the test point region are more influential. In this paper, we propose Moving Least Squares Support Vector Machines (M-LSSVM) in...

Computational tools in modern data analysis must be scalable to satisfy business and research time constraints. In this regard, two alternatives are possible: (i) adapt available algorithms or design new approaches such that they can run on a distributed computing environment (ii) develop model-based learning techniques that can be trained efficien...

This brief proposes a truncated ℓ1 distance (TL1) kernel, which results in a classifier that is nonlinear in the global region but is linear in each subregion. With this kernel, the subregion structure can be trained using all the training data and...
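The TL1 kernel itself is a one-liner; the sketch below assumes the common form k(x, y) = max(ρ − ‖x − y‖₁, 0), with the truncation parameter ρ chosen by the user.

```python
import numpy as np

def tl1_kernel(X, Y, rho):
    """Truncated l1-distance (TL1) kernel matrix:
        k(x, y) = max(rho - ||x - y||_1, 0).
    Piecewise linear in x, hence linear within each subregion of the input
    space while still nonlinear globally."""
    d1 = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)
    return np.maximum(rho - d1, 0.0)
```

Points farther apart than ρ in ℓ1 distance get kernel value exactly zero, which gives the kernel matrix a naturally sparse, locality-respecting structure.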

This paper studies the nonparametric modal regression problem systematically from a statistical learning view. Originally motivated by pursuing a theoretical understanding of the maximum correntropy criterion based regression (MCCR), our study reveals that MCCR with a tending-to-zero scale parameter is essentially modal regression. We show that non...

Performing predictions using a non-linear support vector machine (SVM) can be too expensive in some large-scale scenarios. In the non-linear case, the complexity of storing and using the classifier is determined by the number of support vectors, which is often a significant fraction of the training data. This is a major limitation in applications w...

This issue marks the first anniversary issue since I was honored to serve as the Editor-in-Chief (EiC) of the IEEE Transactions on Neural Networks and Learning Systems (TNNLS). I am happy to report that we had a very successful year and here are a few highlights that I would like to share with the community. The latest impact factor of TNNLS is...

In pattern classification, polynomial classifiers are well-studied methods as they are capable of generating complex decision surfaces. Unfortunately, the use of multivariate polynomials is limited to support vector machine kernels, as polynomials quickly become impractical for high-dimensional problems. In this paper, we effectively overcome the c...

We live in the era of big data with dataset sizes growing steadily over the past decades. In addition, obtaining expert labels for all the instances is time-consuming and in many cases may not even be possible. This necessitates the development of advanced semi-supervised models that can learn from both labeled and unlabeled data points and also sc...

Wiener systems represent a linear time invariant (LTI) system followed by a static nonlinearity. The identification of these systems has been a research problem for a long time as it is not a trivial task. A new methodology for identifying Wiener systems is proposed in this paper. The proposed method is a combination of well known techniques, namel...

In this paper, Kernel PCA is reinterpreted as the solution to a convex optimization problem. Actually, there is a constrained convex problem for each principal component, so that the constraints guarantee that the principal component is indeed a solution, and not a mere saddle point. Although these insights do not imply any algorithmic improvement,...
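The constrained convex reformulation is not reproduced here, but the standard eigendecomposition view of kernel PCA that it starts from can be sketched in NumPy: double-center the kernel matrix, take its top eigenpairs, and scale the eigenvectors to obtain embedding coordinates.

```python
import numpy as np

def kernel_pca(K, n_components=2):
    """Kernel PCA via eigendecomposition of the double-centered kernel matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                            # center the kernel in feature space
    vals, vecs = np.linalg.eigh(Kc)           # ascending eigenvalue order
    vals, vecs = vals[::-1], vecs[:, ::-1]    # largest first
    # embedding coordinates: eigenvectors scaled by sqrt(eigenvalue)
    return vecs[:, :n_components] * np.sqrt(np.maximum(vals[:n_components], 0.0))
```

With a linear kernel on centered data this recovers ordinary PCA, so the embedding preserves pairwise Euclidean distances exactly when all components are kept.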

Problem setting
Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Due to the large choice of kernels they can be applied with, a large variety of data can be analysed using these tools. Machine learning owes its popularity to the good performance of the resulting models. However, interpreting...

Bivariate plot of the Iris data (training data).
(PDF)

Definition of the different contributions in the approximation of the SVM classifiers.
This appendix summarizes how the terms f(q) and f(p,q) used in the expansion of the SVM model, are calculated for the linear, polynomial and RBF kernel.
(PDF)

Setting of artificial examples.
This appendix illustrates the settings of the artificial examples.
(PDF)

Explanation of how a color based nomogram results from a risk prediction model.
This appendix explains in detail how a risk prediction model that can be represented by means of Eq (3) can be represented by the proposed color based nomogram.
(PDF)

Application of the method to the IRIS data set.
This video illustrates the possibilities of the R package by means of an R application using the IRIS data set.
(MP4)

Although time delay is an important element in both system identification and control performance assessment, its computation remains elusive. This paper proposes the application of a least squares support vector machines driven approach to the problem of determining constant time delay for a chemical process. The approach consists of two steps, wh...

This paper introduces a novel algorithm, called Supervised Aggregated FEature learning or SAFE, which combines both (local) instance level and (global) bag level information in a joint framework to address the multiple instance classification task. In this realm, the collective assumption is used to express the relationship between the instance lab...

Frank-Wolfe (FW) algorithms have been often proposed over the last few years as efficient solvers for a variety of optimization problems arising in the field of Machine Learning. The ability to work with cheap projection-free iterations and the incremental nature of the method make FW a very effective choice for many large-scale problems where comp...

Because of several successful applications, indefinite kernels have attracted many research interests in recent years. This paper addresses indefinite learning in the framework of least squares support vector machines (LS-SVM). Unlike existing indefinite kernel learning methods, which usually involve non-convex problems, the indefinite LS-SVM is st...
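For background, the standard LS-SVM classifier reduces training to one linear system, which is what makes the indefinite extension natural: the system is solvable for any symmetric kernel matrix. The sketch below is the classical (Suykens-style) dual formulation in NumPy, with illustrative parameters, not the indefinite analysis of the paper.

```python
import numpy as np

def lssvm_train(K, y, gamma=1.0):
    """LS-SVM classifier in the dual: solve the bordered linear system
        [ 0        y^T          ] [b]   [0]
        [ y   Omega + I/gamma   ] [a] = [1]
    with Omega_ij = y_i * y_j * K_ij. Only a symmetric K is needed,
    which is why indefinite kernels can be dropped in directly."""
    n = len(y)
    Omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, dual coefficients alpha
```

Prediction is sign(sum_i alpha_i y_i K(x_i, x) + b), i.e. the same expansion as a standard SVM but with every training point potentially a support vector.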

Networks represent patterns of interactions between components of complex systems present in nature, science, technology and society. Furthermore, graph theory allows insightful analysis of different kinds of data by representing the instances as nodes of a weighted network, where the weights characterize similarity between the data poi...

Evolutionary spectral clustering (ESC) represents a state-of-the-art algorithm for grouping objects evolving over time. It typically outperforms traditional static clustering by producing clustering results that can adapt to data drifts while being robust to short-term noise. A major drawback of ESC is given by its cubic complexity, i.e. O(N³), and...

Often in real-world applications such as web page categorization, automatic image annotations and protein function prediction, each instance is associated with multiple labels (categories) simultaneously. In addition, due to the labeling cost one usually deals with a large amount of unlabeled data while the fraction of labeled data points will typi...