Conference Paper

Alternating Binary Classifier and Graph Learning from Partial Labels


Abstract

Semi-supervised binary classifier learning is a fundamental machine learning task where only partial binary labels are observed, and the labels of the remaining data need to be interpolated. Leveraging advances in graph signal processing (GSP), binary classifier learning has recently been posed as a signal restoration problem regularized using a graph smoothness prior, where the undirected graph consists of a set of vertices and a set of weighted edges connecting vertices with similar features. In this paper, we improve the performance of such a graph-based classifier by simultaneously optimizing the feature weights used in the construction of the similarity graph. Specifically, we first interpolate the missing labels by formulating a Boolean quadratic program with a graph signal smoothness objective, then relaxing it to a convex semi-definite program, solvable in polynomial time. Next, we optimize the feature weights used for construction of the similarity graph by reusing the smoothness objective but with a convex set constraint for the weight vector. The reformulated convex but non-differentiable problem is solved via an iterative proximal gradient descent algorithm. The two steps are solved alternately until convergence. Experimental results show that our alternating classifier / graph learning algorithm outperforms existing graph-based methods and support vector machines with various kernels.
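As a rough illustration of the two ingredients named in the abstract (a similarity graph built from weighted features, and label interpolation under a graph smoothness objective), the sketch below may help. It deliberately uses a simpler real-valued relaxation followed by rounding, not the semi-definite relaxation described in the abstract, and it keeps the feature weights fixed. The function names, the exponential kernel, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def feature_graph_laplacian(F, w):
    """Combinatorial Laplacian L = D - W, with W_ij = exp(-sum_k w_k (F_ik - F_jk)^2)."""
    diff = F[:, None, :] - F[None, :, :]            # pairwise differences, shape (N, N, K)
    dist = np.einsum("ijk,k->ij", diff ** 2, w)     # feature-weighted squared distances
    W = np.exp(-dist)
    np.fill_diagonal(W, 0.0)                        # no self-loops
    return np.diag(W.sum(axis=1)) - W

def interpolate_labels(L, labels, known):
    """Minimise x^T L x with the known entries fixed, then round to {-1, +1}."""
    unknown = ~known
    x = np.zeros(L.shape[0])
    x[known] = labels[known]
    # Stationarity condition of the quadratic: L_uu x_u = -L_uk x_k
    rhs = -L[np.ix_(unknown, known)] @ x[known]
    x[unknown] = np.linalg.solve(L[np.ix_(unknown, unknown)], rhs)
    return np.where(x >= 0, 1, -1)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is irrelevant.
x1 = np.concatenate([np.linspace(-0.2, 0.2, 10), np.linspace(2.8, 3.2, 10)])
x2 = np.tile([0.0, 5.0], 10)
F = np.column_stack([x1, x2])
truth = np.array([1] * 10 + [-1] * 10)

known = np.zeros(20, dtype=bool)
known[[0, 10]] = True                               # one observed label per class

w = np.array([1.0, 0.0])                            # weights that ignore the irrelevant feature
L = feature_graph_laplacian(F, w)
pred = interpolate_labels(L, truth, known)
```

With weights that suppress the irrelevant feature, the two observed labels propagate to the rest of their clusters; choosing such weights automatically is the role of the second step in the abstract.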


... Extending on these previous works [22], [25], [28], in this paper we study spectral graph learning when the number of signal observations is extremely small: just one observation or even fewer (i.e., partial observation of one signal). This is typically the case for image restoration applications with nonstationary statistics [4]-[7], where the underlying graph for a target image patch needs to be estimated for graph spectral processing given just one or partial noisy patch observation. ...
... To ease the ill-posedness of the problem, we assume the availability of a relevant feature vector $\mathbf{f}_i$ per node $i$, $\mathbf{f}_i \in \mathbb{R}^K$ (e.g., the color pixel intensities), and that an edge weight is an inverse function of the feature distance (i.e., the larger the inter-node feature distance, the smaller the edge weight). Many previous graph constructions including bilateral filter [4], [28], [29] implicitly assume some notion of feature distance when assigning edge weights; our work is a more formal study of feature metric learning in a rigorous mathematical setting. ...
... Egilmez et al. [25] proposed graph learning under predefined graph structural and graph Laplacian constraints. Yang et al. [28] computed optimal feature weights in a similarity graph given a restored binary classifier signal. This is an earlier version of our feature metric learning, but restricts the search space only to diagonal matrices, which limits its effectiveness. ...
Article
Identifying an appropriate underlying graph kernel that reflects pairwise similarities is critical in many recent graph spectral signal restoration schemes, including image denoising, dequantization, and contrast enhancement. Existing graph learning algorithms compute the most likely entries of a properly defined graph Laplacian matrix $L$, but require a large number of signal observations $z$'s for a stable estimate. In this work, we assume instead the availability of a relevant feature vector $f_i$ per node $i$, from which we compute an optimal feature graph via optimization of a feature metric. Specifically, we alternately optimize the diagonal and off-diagonal entries of a Mahalanobis distance matrix $M$ by minimizing the graph Laplacian regularizer (GLR) $z^{T} L z$, where edge weight is $w_{i,j} = \exp\{-(f_i - f_j)^{T} M (f_i - f_j)\}$, given a single observation $z$. We optimize diagonal entries via proximal gradient (PG), where we constrain $M$ to be positive definite (PD) via linear inequalities derived from the Gershgorin circle theorem. To optimize off-diagonal entries, we design a block descent algorithm that iteratively optimizes one row and column of $M$. To keep $M$ PD, we constrain the Schur complement of sub-matrix $M_{2,2}$ of $M$ to be PD when optimizing via PG. Our algorithm mitigates full eigen-decomposition of $M$, thus ensuring fast computation speed even when feature vector $f_i$ has high dimension. To validate its usefulness, we apply our feature graph learning algorithm to the problem of 3D point cloud denoising, resulting in state-of-the-art performance compared to competing schemes in extensive experiments.
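To make the edge-weight definition and the graph Laplacian regularizer concrete, the following minimal sketch builds the weights from a Mahalanobis distance and checks that $z^{T} L z$ equals the weighted sum of squared signal differences over node pairs. The helper name `mahalanobis_laplacian` and the random test data are assumptions for illustration; the optimization of the metric matrix itself is not shown.

```python
import numpy as np

def mahalanobis_laplacian(F, M):
    """Return (W, L) with w_ij = exp(-(f_i - f_j)^T M (f_i - f_j)) and L = D - W."""
    diff = F[:, None, :] - F[None, :, :]                  # shape (N, N, K)
    dist = np.einsum("ijk,kl,ijl->ij", diff, M, diff)     # Mahalanobis distances
    W = np.exp(-dist)
    np.fill_diagonal(W, 0.0)                              # no self-loops
    return W, np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
F = rng.normal(size=(6, 3))                # 6 nodes, 3 features each (toy data)
A = rng.normal(size=(3, 3))
M = A @ A.T + 0.1 * np.eye(3)              # a symmetric positive definite metric
z = rng.normal(size=6)                     # a single signal observation

W, L = mahalanobis_laplacian(F, M)
glr = z @ L @ z
# Equivalent pairwise form: sum over unordered pairs of w_ij (z_i - z_j)^2
pairwise = 0.5 * np.sum(W * (z[:, None] - z[None, :]) ** 2)
```

The pairwise form shows why a small regularizer value means the signal varies little across strongly connected nodes.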
... While efficient, the assumed restricted search spaces often degrade the quality of sought metric M in defining the Mahalanobis distance. For example, low-rank methods explicitly assume reducibility of the K available features to a lower dimension, and hence exclude the simple yet important weighted feature metric case where M contains only positive diagonal entries [10], i.e., ...
... We require M to be a positive definite (PD) matrix. The special case where M is diagonal with strictly positive entries was studied in [10]. Instead, we study here a more general case: M must be a graph metric matrix, which we define formally as follows. ...
... GLR has been used in the GSP literature to solve a range of inverse problems, including image denoising [23], deblurring [24], dequantization and contrast enhancement [25], and soft decoding of JPEG [26]. We evaluate our method with the following competing schemes: three metric learning methods that only learn the diagonals of M, i.e., [27], [28], and [10], and two methods that learn the full matrix M, i.e., [6] and [29]. We perform classification tasks using one of the following two classifiers: 1) a k-nearest-neighbour classifier, and 2) a graph-based classifier with quadratic formulation $\min_{\mathbf{z}} \mathbf{z}^{\top} \mathbf{L}(\mathbf{M}) \mathbf{z}$ s.t. ...
Preprint
We propose a fast general projection-free metric learning framework, where the minimization objective $\min_{\mathbf{M} \in \mathcal{S}} Q(\mathbf{M})$ is a convex differentiable function of the metric matrix $\mathbf{M}$, and $\mathbf{M}$ resides in the set $\mathcal{S}$ of generalized graph Laplacian matrices for connected graphs with positive edge weights and node degrees. Unlike low-rank metric matrices common in the literature, $\mathcal{S}$ includes the important positive-diagonal-only matrices as a special case in the limit. The key idea for fast optimization is to rewrite the positive definite cone constraint in $\mathcal{S}$ as signal-adaptive linear constraints via Gershgorin disc alignment, so that the alternating optimization of the diagonal and off-diagonal terms in $\mathbf{M}$ can be solved efficiently as linear programs via Frank-Wolfe iterations. We prove that the Gershgorin discs can be aligned perfectly using the first eigenvector $\mathbf{v}$ of $\mathbf{M}$, which we update iteratively using Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) with warm start as diagonal / off-diagonal terms are optimized. Experiments show that our efficiently computed graph metric matrices outperform metrics learned using competing methods in terms of classification tasks.
... An alternative approach is to restore corrupted training labels by representing them as piece-wise smooth signals on graphs and applying a graph signal smoothness prior [18]-[21]. In [22], an image denoising scheme is proposed using graph Laplacian regularization (GLR), given either small- or large-scale datasets. ...
... In this paper, we further extend our conference contribution [24] to a more generalized end-to-end CNN-based approach given noisy binary classifier signal, to perform iteratively GLR (similar to [22]) as a classifier signal restoration operator, update the underlying graph and regularize CNNs. Compared to the previous graph-based classifiers [18], [20], [21], [23], [29], [40], [41], [45], [46], by adopting edge convolution, iteratively updating graph and operating GLR, we learn a deeper feature representation, and assign the degree of freedom for learning the underlying data structure. Given noisy training labels, in contrast to the classical robust DNN-based classifiers [4], [7], [15], [16], [35], [39], we bring together the regularization benefits of GLR and the benefits of the proposed loss functions to perform more robust deep metric learning. ...
... Typically, the edge weight is computed using a Gaussian kernel function with a fixed scaling factor σ, i.e., $\exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|_2^2}{2\sigma^2}\right)$, to quantify the node-to-node correlation. Instead of using a fixed σ as in [18], [20], [21], motivated by [47], we introduce an auto-sigma Gaussian kernel function to assign edge weight w r i,j in G r by maximizing the margin between the edge weights assigned to P-edges and Q-edges, as: ...
Preprint
Convolutional neural network (CNN)-based feature learning has become state of the art, since given sufficient training data, CNN can significantly outperform traditional methods for various classification tasks. However, feature learning becomes more difficult if some training labels are noisy. With traditional regularization techniques, CNN often overfits to the noisy training labels, resulting in sub-par classification performance. In this paper, we propose a robust binary classifier, based on CNNs, to learn deep metric functions, which are then used to construct an optimal underlying graph structure used to clean noisy labels via graph Laplacian regularization (GLR). GLR is posed as a convex maximum a posteriori (MAP) problem solved via convex quadratic programming (QP). To penalize samples around the decision boundary, we propose two regularized loss functions for semi-supervised learning. The binary classification experiments on three datasets, varying in number and type of features, demonstrate that given a noisy training dataset, our proposed networks outperform several state-of-the-art classifiers, including label-noise robust support vector machine, CNNs with three different robust loss functions, model-based GLR, and dynamic graph CNN classifiers.
... Extending on these previous works [21], [24], [27], in this paper we study spectral graph learning when the number of signal observations is extremely small: just one observation or even fewer (i.e., partial observation of one signal). This is typically the case for image restoration applications with non-stationary statistics [4]-[7], where the underlying graph for a target image patch needs to be estimated for graph spectral processing given just one noisy and/or partial patch observation. ...
... To ease the ill-posedness of the problem, we assume the availability of a relevant feature vector $\mathbf{f}_i$ per node $i$, $\mathbf{f}_i \in \mathbb{R}^K$ (e.g., the color pixel intensities), and that an edge weight is an inverse function of the feature distance (i.e., the larger the inter-node distance, the smaller the edge weight). Many previous graph constructions including bilateral filter [4], [27]-[30] implicitly assume some notion of feature distance when assigning edge weights; our work is a more formal study of feature metric learning in a rigorous mathematical setting. ...
... Egilmez et al. [44] proposed graph learning under predefined graph structural and graph Laplacian constraints. Yang et al. [27] computed optimal feature weights in a similarity graph given a restored binary classifier signal. This is an earlier version of our feature metric learning, but restricts the search space only to diagonal matrices, which limits its effectiveness. ...
Preprint
Identifying an appropriate underlying graph kernel that reflects pairwise similarities is critical in many recent graph spectral signal restoration schemes, including image denoising, dequantization, and contrast enhancement. Existing graph learning algorithms compute the most likely entries of a properly defined graph Laplacian matrix $\mathbf{L}$, but require a large number of signal observations $\mathbf{z}$'s for a stable estimate. In this work, we assume instead the availability of a relevant feature vector $\mathbf{f}_i$ per node $i$, from which we compute an optimal feature graph via optimization of a feature metric. Specifically, we alternately optimize the diagonal and off-diagonal entries of a Mahalanobis distance matrix $\mathbf{M}$ by minimizing the graph Laplacian regularizer (GLR) $\mathbf{z}^{\top} \mathbf{L} \mathbf{z}$, where edge weight is $w_{i,j} = \exp\{-(\mathbf{f}_i - \mathbf{f}_j)^{\top} \mathbf{M} (\mathbf{f}_i - \mathbf{f}_j) \}$, given a single observation $\mathbf{z}$. We optimize diagonal entries via proximal gradient (PG), where we constrain $\mathbf{M}$ to be positive definite (PD) via linear inequalities derived from the Gershgorin circle theorem. To optimize off-diagonal entries, we design a block descent algorithm that iteratively optimizes one row and column of $\mathbf{M}$. To keep $\mathbf{M}$ PD, we constrain the Schur complement of sub-matrix $\mathbf{M}_{2,2}$ of $\mathbf{M}$ to be PD when optimizing via PG. Our algorithm mitigates full eigen-decomposition of $\mathbf{M}$, thus ensuring fast computation speed even when feature vector $\mathbf{f}_i$ has high dimension. To validate its usefulness, we apply our feature graph learning algorithm to the problem of 3D point cloud denoising, resulting in state-of-the-art performance compared to competing schemes in extensive experiments.
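The abstract mentions enforcing positive definiteness through linear inequalities derived from the Gershgorin circle theorem. The sketch below only illustrates the underlying idea: for a symmetric matrix, every eigenvalue lies in some disc centred at a diagonal entry with radius equal to the sum of the absolute off-diagonal entries in that row, so requiring all disc left-ends to be positive is a sufficient condition for PD that is linear in the diagonal entries. The helper name and the example matrix are assumptions; the paper's exact constraints are not reproduced here.

```python
import numpy as np

def gershgorin_lower_bound(M):
    """Smallest Gershgorin disc left-end: min_i (M_ii - sum_{j != i} |M_ij|)."""
    radii = np.sum(np.abs(M), axis=1) - np.abs(np.diag(M))
    return np.min(np.diag(M) - radii)

# Hypothetical symmetric metric matrix
M = np.array([[ 2.0, -0.5, 0.3],
              [-0.5,  1.5, 0.4],
              [ 0.3,  0.4, 1.0]])

bound = gershgorin_lower_bound(M)        # a lower bound on every eigenvalue of M
lam_min = np.linalg.eigvalsh(M)[0]       # exact smallest eigenvalue, for comparison
is_pd_by_gershgorin = bound > 0          # sufficient (not necessary) test for PD
```

The bound is cheap to evaluate and avoids an eigen-decomposition, at the price of being conservative: a matrix can be PD even when this test fails.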
... The weight $a_{ij}$ reflects the level of correlation between the Nodes i and j (that is, between the features constructed in Event i and j). Following [55], we use the Gaussian kernel function and set: ...
Article
Microseismic monitoring has been increasingly used in the past two decades to illuminate (sub)surface processes such as landslides, due to its ability to record small seismic waves generated by soil movement and/or brittle behaviour of rock. Understanding the evolution of landslide processes is of paramount importance in predicting or even avoiding an imminent failure. Microseismic monitoring recordings are often continuous, noisy and consist of signals emitted by various sources. Manually detecting and distinguishing the signals emitted by an unstable slope is challenging. Research on automated end-to-end denoising, detection, and classification of microseismic events, as an early warning system, is still in its infancy. To this effect, our work is focused on jointly evaluating and developing suitable approaches for signal denoising, accurate event detection, non site-specific feature construction, feature selection and event classification. We propose an automated end-to-end system that can process big data sets of continuous seismic recordings fast and demonstrate applicability and robustness to a wide range of events (distant and local earthquakes, slidequakes, anthropogenic noise etc.). Algorithmic contributions lie in novel signal processing and analysis methods with fewer tunable parameters than the state of the art, evaluated on two field datasets and benchmarked against the state of the art.
... While efficient, the assumed restricted search spaces often degrade the quality of sought metric M in defining the Mahalanobis distance. For example, low-rank methods explicitly assume reducibility of the K available features to a lower dimension, and hence exclude the simple yet important weighted feature metric case where M is diagonal [14], i.e., with diagonal entries $m_{k,k} > 0, \forall k$. In this paper, we propose a fast, general metric learning framework, capable of optimizing any convex differentiable objective Q(M), that entirely circumvents eigen-decomposition-based projection on the PD cone. ...
Preprint
Given a convex and differentiable objective $Q(\mathbf{M})$ for a real, symmetric matrix $\mathbf{M}$ in the positive definite (PD) cone (used to compute Mahalanobis distances), we propose a fast general metric learning framework that is entirely projection-free. We first assume that $\mathbf{M}$ resides in a restricted space $\mathcal{S}$ of generalized graph Laplacian matrices (graph metric matrices) corresponding to balanced signed graphs. Unlike low-rank metric matrices common in the literature, $\mathcal{S}$ includes the important diagonal-only matrices as a special case. The key theorem to circumvent full eigen-decomposition and enable fast metric matrix optimization is Gershgorin disc alignment (GDA): given graph metric matrix $\mathbf{M} \in \mathcal{S}$ and diagonal matrix $\mathbf{S}$ where $S_{ii} = 1/v_i$ and $\mathbf{v}$ is the first eigenvector of $\mathbf{M}$, we prove that Gershgorin disc left-ends of similarity transform $\mathbf{B} = \mathbf{S} \mathbf{M} \mathbf{S}^{-1}$ are perfectly aligned at the smallest eigenvalue $\lambda_{\min}$. Using this theorem, we replace the PD cone constraint in the metric learning problem with tightest possible signal-adaptive linear constraints, so that the alternating optimization of the diagonal / off-diagonal terms in $\mathbf{M}$ can be solved efficiently as linear programs via Frank-Wolfe iterations. We update $\mathbf{v}$ using Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) with warm start as matrix entries in $\mathbf{M}$ are optimized successively. Experiments show that our graph metric optimization is significantly faster than cone-projection methods, and produces competitive binary classification performance.
Chapter
The image classification problem is to categorize elements of an image dataset into two or more pre‐defined classes based on inherent, detectable image features. Classification is typically posed in the context of supervised or semi‐supervised learning (SSL). This chapter discusses how SSL image classification can be mathematically formulated as optimization problems from a graph signal processing perspective. It also discusses the critical issue of graph construction for SSL graph‐based classifiers, assuming fixed and pre‐defined features. The chapter examines the effect of the constructed graph and regularization weight parameter on classification performance, and experimentally demonstrates the potential of different graph classifiers in the presence of noisy labels. It describes how deep learning techniques can be used to automatically learn relevant image features. Graph Fourier transform is one approach used to represent the smoothness and connectivity of an underlying graph.
Article
Given a convex and differentiable objective $Q(\mathbf{M})$ for a real symmetric matrix $\mathbf{M}$ in the positive definite (PD) cone (used to compute Mahalanobis distances), we propose a fast general metric learning framework that is entirely projection-free. We first assume that $\mathbf{M}$ resides in a space $\mathcal{S}$ of generalized graph Laplacian matrices corresponding to balanced signed graphs. $\mathbf{M} \in \mathcal{S}$ that is also PD is called a graph metric matrix. Unlike low-rank metric matrices common in the literature, $\mathcal{S}$ includes the important diagonal-only matrices as a special case. The key theorem to circumvent full eigen-decomposition and enable fast metric matrix optimization is Gershgorin disc perfect alignment (GDPA): given $\mathbf{M} \in \mathcal{S}$ and diagonal matrix $\mathbf{S}$, where $S_{ii} = 1/v_i$ and $\mathbf{v}$ is the first eigenvector of $\mathbf{M}$, we prove that Gershgorin disc left-ends of similarity transform $\mathbf{B} = \mathbf{S} \mathbf{M} \mathbf{S}^{-1}$ are perfectly aligned at the smallest eigenvalue $\lambda_{\min}$. Using this theorem, we replace the PD cone constraint in the metric learning problem with tightest possible linear constraints per iteration, so that the alternating optimization of the diagonal / off-diagonal terms in $\mathbf{M}$ can be solved efficiently as linear programs via the Frank-Wolfe method. We update $\mathbf{v}$ using Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) with warm start as entries in $\mathbf{M}$ are optimized successively. Experiments show that our graph metric optimization is significantly faster than cone-projection schemes, and produces competitive binary classification performance.
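The alignment property stated in the abstract can be checked numerically on a small example. The sketch below assumes the simplest member of the described family, a generalized graph Laplacian of a connected graph with positive edge weights plus positive self-loops (a positive graph being a special case of a balanced signed graph). It uses a dense eigen-decomposition purely for verification, whereas the abstract describes LOBPCG with warm start; the random matrix and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Symmetric positive edge weights on a complete (hence connected) graph
W = np.triu(rng.uniform(0.1, 1.0, size=(N, N)), 1)
W = W + W.T

# Generalized graph Laplacian: combinatorial Laplacian plus positive self-loops
M = np.diag(W.sum(axis=1)) - W + np.diag(rng.uniform(0.1, 1.0, size=N))

eigvals, eigvecs = np.linalg.eigh(M)
lam_min, v = eigvals[0], eigvecs[:, 0]
v = v if v.sum() > 0 else -v             # fix the sign so the first eigenvector is positive

S = np.diag(1.0 / v)                     # S_ii = 1 / v_i
B = S @ M @ np.linalg.inv(S)             # similarity transform, same eigenvalues as M

radii = np.sum(np.abs(B), axis=1) - np.abs(np.diag(B))
left_ends = np.diag(B) - radii           # Gershgorin disc left-ends of B
```

For this positive-graph case the alignment follows directly: row $i$ of the left-end computation reduces to $(\mathbf{M}\mathbf{v})_i / v_i = \lambda_{\min}$, so the Gershgorin bound on $\mathbf{B}$ is tight.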
Article
In this paper, we propose a new semi-supervised graph construction method, which is capable of adaptively learning the similarity relationship between data samples by fully exploiting the potential of pairwise constraints, a kind of weakly supervisory information. Specifically, to adaptively learn the similarity relationship, we linearly approximate each sample with others under the regularization of the low-rankness of the matrix formed by the approximation coefficient vectors of all the samples. Meanwhile, by taking advantage of the underlying local geometric structure of data samples that is empirically obtained, we enhance the dissimilarity information of the available pairwise constraints via propagation. We seamlessly combine the two adversarial learning processes to achieve mutual guidance. We cast our method as a constrained optimization problem and provide an efficient alternating iterative algorithm to solve it. Experimental results on five commonly-used benchmark datasets demonstrate that our method produces much higher classification accuracy than state-of-the-art methods, while running faster.
Article
Convolutional neural network (CNN)-based feature learning has become the state-of-the-art for many applications since, given sufficient training data, CNN can significantly outperform traditional methods for various classification tasks. However, feature learning is more challenging if training labels are noisy as CNN tends to overfit to the noisy training labels, resulting in sub-par classification performance. In this paper, we propose a robust binary classifier by learning CNN-based deep metric functions, to construct a graph, used to clean the noisy labels via graph Laplacian regularization (GLR). The denoised labels are then used in two proposed loss correction functions to regularize the deep metric functions. As a result, the node-to-node correlations in the graph are better reflected, leading to improved predictive performance. The experiments on three datasets, varying in number and type of features and under different levels of noise, demonstrate that given a noisy training dataset for the semi-supervised classification task, our proposed networks outperform several state-of-the-art classifiers, including label-noise robust support vector machine, CNNs with three different robust loss functions, model-based GLR, and dynamic graph CNN classifiers.
Article
Recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image / video processing. The topics covered include image compression, image restoration, image filtering and image segmentation.
Article
In a semi-supervised learning scenario, (possibly noisy) partially observed labels are used as input to train a classifier, in order to assign labels to unclassified samples. In this paper, we study this classifier learning problem from a graph signal processing (GSP) perspective. Specifically, by viewing a binary classifier as a piecewise constant graph-signal in a high-dimensional feature space, we cast classifier learning as a signal restoration problem via a classical maximum a posteriori (MAP) formulation. Unlike previous graph-signal restoration works, we consider in addition edges with negative weights that signify anti-correlation between samples. One unfortunate consequence is that the graph Laplacian matrix $\mathbf{L}$ can be indefinite, and previously proposed graph-signal smoothness prior $\mathbf{x}^T \mathbf{L} \mathbf{x}$ for candidate signal $\mathbf{x}$ can lead to pathological solutions. In response, we derive an optimal perturbation matrix $\boldsymbol{\Delta}$, based on a fast lower-bound computation of the minimum eigenvalue of $\mathbf{L}$ via a novel application of the Haynsworth inertia additivity formula, so that $\mathbf{L} + \boldsymbol{\Delta}$ is positive semi-definite, resulting in a stable signal prior. Further, instead of forcing a hard binary decision for each sample, we define the notion of generalized smoothness on graph that promotes ambiguity in the classifier signal. Finally, we propose an algorithm based on iterative reweighted least squares (IRLS) that solves the posed MAP problem efficiently. Extensive simulation results show that our proposed algorithm outperforms both SVM variants and graph-based classifiers using positive-edge graphs noticeably.
Conference Paper
We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of interpretability; and (iii) computational speed several orders of magnitude faster than the widely used LMNN and ITML methods. Furthermore, on standard benchmark datasets, our closed-form solution consistently attains higher classification accuracy.
Article
Inverse imaging problems are inherently under-determined, and hence, it is important to employ appropriate image priors for regularization. One recent popular prior—the graph Laplacian regularizer—assumes that the target pixel patch is smooth with respect to an appropriately chosen graph. However, the mechanisms and implications of imposing the graph Laplacian regularizer on the original inverse problem are not well understood. To address this problem, in this paper, we interpret neighborhood graphs of pixel patches as discrete counterparts of Riemannian manifolds and perform analysis in the continuous domain, providing insights into several fundamental aspects of graph Laplacian regularization for image denoising. Specifically, we first show the convergence of the graph Laplacian regularizer to a continuous-domain functional, integrating a norm measured in a locally adaptive metric space. Focusing on image denoising, we derive an optimal metric space assuming non-local self-similarity of pixel patches, leading to an optimal graph Laplacian regularizer for denoising in the discrete domain. We then interpret graph Laplacian regularization as an anisotropic diffusion scheme to explain its behavior during iterations, e.g., its tendency to promote piecewise smooth signals under certain settings. To verify our analysis, an iterative image denoising algorithm is developed. Experimental results show that our algorithm performs competitively with state-of-the-art denoising methods, such as BM3D for natural images, and outperforms them significantly for piecewise smooth images.
Article
While modern displays offer high dynamic range (HDR) with large bit-depth for each rendered pixel, the bulk of legacy image and video contents were captured using cameras with shallower bit-depth. In this paper, we study the bit-depth enhancement problem for images, so that a high bit-depth (HBD) image can be reconstructed from an input low bit-depth (LBD) image. The key idea is to apply appropriate smoothing given the constraints that reconstructed signal must lie within the per-pixel quantization bins. Specifically, we first define smoothness via a signal-dependent graph Laplacian, so that natural image gradients can nonetheless be interpreted as low frequencies. Given defined smoothness prior and observed LBD image, we then demonstrate that computing the most probable signal via maximum a posteriori (MAP) estimation can lead to large expected distortion. However, we argue that MAP can still be used to efficiently estimate the AC component of the desired HBD signal, which along with a distortion-minimizing DC component, can result in a good approximate solution that minimizes the expected distortion. Experimental results show that our proposed method outperforms existing bit-depth enhancement methods in terms of reconstruction error.
Article
We study the problem of selecting the best sampling set for bandlimited reconstruction of signals on graphs. A frequency domain representation for graph signals can be defined using the eigenvectors and eigenvalues of variation operators that take into account the underlying graph connectivity. Smoothly varying signals defined on the nodes are of particular interest in various applications, and tend to be approximately bandlimited in the frequency basis. Sampling theory for graph signals deals with the problem of choosing the best subset of nodes for reconstructing a bandlimited signal from its samples. Most approaches to this problem require a computation of the frequency basis (i.e., the eigenvectors of the variation operator), followed by a search procedure using the basis elements. This can be impractical, in terms of storage and time complexity, for real datasets involving very large graphs. We circumvent this issue in our formulation by introducing quantities called graph spectral proxies, defined using the powers of the variation operator, in order to approximate the spectral content of graph signals. This allows us to formulate a direct sampling set selection approach that does not require the computation and storage of the basis elements. We show that our approach also provides stable reconstruction when the samples are noisy or when the original signal is only approximately bandlimited. Furthermore, the proposed approach is valid for any choice of the variation operator, thereby covering a wide range of graphs and applications. We demonstrate its effectiveness through various numerical experiments.
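The idea of approximating spectral content with powers of the variation operator can be sketched as follows. The code assumes the proxy of order $k$ takes the form $(\|L^k x\| / \|x\|)^{1/k}$ with the combinatorial Laplacian $L$ as the variation operator; this form, the function name, and the path-graph example are assumptions made for illustration. The proxy itself needs only $k$ matrix-vector products; the eigen-decomposition below is used solely to construct a bandlimited test signal and to compare against its true bandwidth.

```python
import numpy as np

def spectral_proxy(L, x, k):
    """Order-k proxy (||L^k x|| / ||x||)^(1/k), using k matrix-vector products."""
    y = x.copy()
    for _ in range(k):
        y = L @ y
    return (np.linalg.norm(y) / np.linalg.norm(x)) ** (1.0 / k)

# Path graph on 8 nodes (toy example)
N = 8
W = np.diag(np.ones(N - 1), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W

# Test signal bandlimited to the three lowest graph frequencies
eigvals, U = np.linalg.eigh(L)
x = U[:, :3] @ np.array([1.0, 0.5, 0.25])
bandwidth = eigvals[2]

proxies = [spectral_proxy(L, x, k) for k in (1, 2, 4, 8)]
```

In this example each proxy stays below the true bandwidth and the values do not decrease as the order grows, which is what makes them usable as a basis-free estimate.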
Article
In applications such as social, energy, transportation, sensor, and neuronal networks, high-dimensional data naturally reside on the vertices of weighted graphs. The emerging field of signal processing on graphs merges algebraic and spectral graph theoretic concepts with computational harmonic analysis to process such signals on graphs. In this tutorial overview, we outline the main challenges of the area, discuss different ways to define graph spectral domains, which are the analogues to the classical frequency domain, and highlight the importance of incorporating the irregular structures of graph data domains when processing signals on graphs. We then review methods to generalize fundamental operations such as filtering, translation, modulation, dilation, and downsampling to the graph setting, and survey the localized, multiscale transforms that have been proposed to efficiently extract information from high-dimensional data on graphs. We conclude with a brief discussion of open issues and possible extensions.
Chapter
We describe graph implementations, a generic method for representing a convex function via its epigraph, described in a disciplined convex programming framework. This simple and natural idea allows a very wide variety of smooth and nonsmooth convex programs to be easily specified and efficiently solved, using interior-point methods for smooth or cone convex programs.
Article
In this article, we have provided general, comprehensive coverage of the SDR technique, from its practical deployments and scope of applicability to key theoretical results. We have also showcased several representative applications, namely MIMO detection, B1 shimming in MRI, and sensor network localization. Another important application, namely downlink transmit beamforming, is described in [1]. Due to space limitations, we are unable to cover many other beautiful applications of the SDR technique, although we have done our best to illustrate the key intuitive ideas that resulted in those applications. We hope that this introductory article will serve as a good starting point for readers who would like to apply the SDR technique to their applications, and to locate specific references either in applications or theory.
Article
Research in graph signal processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper, we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing, along with a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas. We then summarize recent advances in developing basic GSP tools, including methods for sampling, filtering, or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning.
Article
Distance metric learning (DML) has achieved great success in many computer vision tasks. However, most existing DML algorithms are based on point estimation, and thus are sensitive to the choice of training examples and tend to be over-fitting in the presence of label noise. In this paper, we present a robust DML algorithm based on Bayesian inference. In particular, our method is essentially a Bayesian extension to a previous classic DML method, large margin nearest neighbor classification, and we use stochastic variational inference to estimate the posterior distribution of the transformation matrix. Furthermore, we theoretically show that the proposed algorithm is robust against label noise in the sense that an arbitrary point with label noise has bounded influence on the learnt model. With some reasonable assumptions, we derive a generalization error bound of this method in the presence of label noise. We also show that the DML hypothesis class in which our model lies is probably approximately correct-learnable and give the sample complexity. The effectiveness of the proposed method is demonstrated with state-of-the-art performance on three popular data sets with different types of label noise. A MATLAB implementation of this method is made available at http://parnec.nuaa.edu.cn/xtan/Publication.htm
Conference Paper
The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
Conference Paper
Learning of a binary classifier from partial labels is a fundamental and important task in image classification. Leveraging on recent advances in graph signal processing (GSP), a recent work poses classifier learning as a graph-signal restoration problem from partial observations, where the ill-posed problem is regularized using a graph-signal smoothness prior. In this paper, we extend this work by using the same smoothness prior to refine the underlying similarity graph also, so that the same graph-signal projected on the modified graph will be even smoother. Specifically, assuming an edge weight connecting two vertices i and j is computed as the exponential kernel of the weighted sum of feature function differences at the two vertices, we find locally “optimal” feature weights via iterative Newton's method. We show that the conditioning of the Hessian matrix reveals redundancy in the feature functions, which thus can be eliminated for improved computation efficiency. Experimental results show that our joint optimization of the classifier graph-signal and the underlying graph has better classification performance than the previous work and spectral clustering.
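The edge-weight construction and smoothness prior described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the squared-difference form inside the exponential, the function names `edge_weights` and `smoothness`, and the toy features and labels are all hypothetical.

```python
import numpy as np

def edge_weights(F, theta):
    """W[i, j] = exp(-sum_k theta[k] * (F[i, k] - F[j, k])**2), with zero diagonal.

    F: (n, d) feature matrix; theta: (d,) non-negative feature weights.
    The squared-difference form is an assumption made for this sketch.
    """
    diff = F[:, None, :] - F[None, :, :]              # pairwise differences, shape (n, n, d)
    dist = np.einsum("ijk,k->ij", diff ** 2, theta)   # weighted sum over features
    W = np.exp(-dist)
    np.fill_diagonal(W, 0.0)                          # no self-loops
    return W

def smoothness(W, x):
    """Graph Laplacian quadratic form x^T L x with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(x @ L @ x)

# Toy data: two tight clusters of two points each.
F = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]])
theta = np.array([1.0, 1.0])
W = edge_weights(F, theta)
x_aligned = np.array([1.0, 1.0, -1.0, -1.0])   # labels follow the clusters
x_mixed = np.array([1.0, -1.0, 1.0, -1.0])     # labels cut across the clusters
```

A labeling that follows the clusters yields a smaller quadratic form than one that cuts across them, which is the sense in which the smoothness prior favors labels consistent with the similarity graph.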
Article
Block-based image or video coding standards (e.g. JPEG) compress an image lossily by quantizing transform coefficients of non-overlapping pixel blocks. If the chosen quantization parameters (QP) are large, then hard decoding of a compressed image—using indexed quantization bin centers as reconstructed transform coefficients—can lead to unpleasant blocking artifacts. Leveraging on recent advances in graph signal processing (GSP), we propose a dequantization scheme specifically for piecewise smooth (PWS) images: images with sharp object boundaries and smooth interior surfaces. We first mathematically define a PWS image as a low-frequency signal with respect to an inter-pixel similarity graph with edges of weights 1 or 0. Using quantization bin boundaries as constraints, we then jointly optimize the desired graph-signal and the similarity graph in a unified framework. A generalization to consider generalized piecewise smooth (GPWS) images—where sharp object boundaries are replaced by transition regions—is also proposed. Experimental results show that our proposed scheme outperforms a state-of-the-art dequantization method by 1 dB on average in PSNR.
Article
The metric learning problem is concerned with learning a distance function tuned to a particular task, and has been shown to be useful when used in conjunction with nearest-neighbor methods and other techniques that rely on distances or similarities. This survey presents an overview of existing research in metric learning, including recent progress on scaling to high-dimensional feature spaces and to data sets with an extremely large number of data points. A goal of the survey is to present as unified as possible a framework under which existing research on metric learning can be cast. The first part of the survey focuses on linear metric learning approaches, mainly concentrating on the class of Mahalanobis distance learning methods. We then discuss nonlinear metric learning approaches, focusing on the connections between the nonlinear and linear approaches. Finally, we discuss extensions of metric learning, as well as applications to a variety of problems in computer vision, text analysis, program analysis, and multimedia.
Chapter
Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.
Article
We show that various inverse problems in signal recovery can be formulated as the generic problem of minimizing the sum of two convex functions with certain regularity properties. This formulation makes it possible to derive existence, uniqueness, characterization, and stability results in a unified and standardized fashion for a large class of apparently disparate problems. Recent results on monotone operator splitting methods are applied to establish the convergence of a forward-backward algorithm to solve the generic problem. In turn, we recover, extend, and provide a simplified analysis for a variety of existing iterative methods. Applications to geometry/texture image decomposition schemes are also discussed. A novelty of our framework is to use extensively the notion of a proximity operator, which was introduced by Moreau in the 1960s.
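A forward-backward iteration for minimizing the sum of a smooth convex term and a nonsmooth convex term can be sketched as below. The specific instance (least squares plus an l1 penalty, whose proximity operator is soft-thresholding), the fixed step size, and the names `soft_threshold`, `forward_backward`, and `objective` are assumptions chosen for illustration; the article treats a far more general setting.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by forward-backward splitting."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                          # forward (gradient) step on the smooth term
        x = soft_threshold(x - step * grad, step * lam)   # backward (proximal) step on the l1 term
    return x

def objective(A, b, lam, x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

# Synthetic sparse recovery problem (hypothetical data).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[1, 6]] = [2.0, -3.0]
b = A @ x_true
x_hat = forward_backward(A, b, lam=0.1)
```

The step size `1 / ||A||_2^2` keeps each iteration from increasing the objective; other step-size rules are possible and are not explored here.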
Book
In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground, between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research. Semi-Supervised Learning first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning. The book then discusses SSL applications and offers guidelines for SSL practitioners by analyzing the results of extensive benchmark experiments. Finally, the book looks at interesting directions for SSL research. The book closes with a discussion of the relationship between semi-supervised learning and transduction.
Geometric mean metric learning
  • P H Zadeh
Comparison of righting reactions between elderly with and without stroke using solid angles
  • T Yagi
First-Order Methods in Optimization
  • A Beck