Article

Robust Volume Minimization-Based Matrix Factorization for Remote Sensing and Document Clustering


Abstract

This paper considers volume minimization (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix via finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper focuses on both theoretical and practical aspects of VolMin. On the theory side, exact equivalence of two independently developed sufficient conditions for VolMin identifiability is proven here, thereby providing a more comprehensive understanding of this aspect of VolMin. On the algorithm side, computational complexity and sensitivity to outliers are two key challenges associated with real-world applications of VolMin. These are addressed here via a new VolMin algorithm that handles volume regularization in a computationally simple way, and automatically detects and iteratively downweights outliers, simultaneously. Simulations and real-data experiments using a remotely sensed hyperspectral image and the Reuters document corpus are employed to showcase the effectiveness of the proposed algorithm.
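As a rough illustration of the criterion described above (the notation here is ours, not quoted from the paper: X = [x_1, ..., x_N] is the data, A the basis, S the coefficients, and λ, δ, p are tuning parameters), a robust VolMin objective couples an outlier-insensitive fitting term with a log-determinant volume surrogate:

\[
\min_{\mathbf{A},\,\mathbf{S}}\ \sum_{n=1}^{N}\big\|\mathbf{x}_n-\mathbf{A}\mathbf{s}_n\big\|_2^{p}
\;+\;\lambda\,\log\det\!\big(\mathbf{A}^{\top}\mathbf{A}+\delta\mathbf{I}\big)
\quad\text{s.t.}\ \mathbf{s}_n\ge\mathbf{0},\ \mathbf{1}^{\top}\mathbf{s}_n=1\ \ \forall n,
\]

with 0 < p < 2; such an ℓ_p fitting term is typically handled by iteratively reweighted least squares, which is one way to realize the automatic outlier downweighting mentioned in the abstract.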


... To begin with, we propose an EV-accounted noisy LMM for hyperspectral pixels. The presence of outlying pixels is also taken into consideration, as pixels disobeying the LMM are widely observed [9], [35]. Our idea is to formulate the endmember/abundance estimation problem as a marginal maximum likelihood (MML) criterion and design a variational inference (VI) algorithm to handle the MML objective. ...
... , a_N, E_t is a perturbation term, and A represents the reference endmember matrix. For example, the extended LMM has E_t = 0 [21]; the perturbed LMM has C_t = I [22]; the model in [23] considers the more general form in (9). In the existing literature, it is common to embed spatial smoothness of the endmember variations in the model [21], [22]. ...
... The spectra of these detected pixels are likely outliers, as they appear at the boundary of two materials. It has been widely reported in the literature that nonlinear mixing effects occur around the shore of the Moffett data; see [9], [43], [53], [54]. ...
Preprint
Full-text available
This work proposes a variational inference (VI) framework for hyperspectral unmixing in the presence of endmember variability (HU-EV). An EV-accounted noisy linear mixture model (LMM) is considered, and the presence of outliers is also incorporated into the model. Following the marginalized maximum likelihood (MML) principle, a VI algorithmic structure is designed for probabilistic inference for HU-EV. Specifically, a patch-wise static endmember assumption is employed to exploit spatial smoothness and to try to overcome the ill-posed nature of the HU-EV problem. The design facilitates lightweight, continuous optimization-based updates under a variety of endmember priors. Some of the priors, such as the Beta prior, were previously used under computationally heavy, sampling-based probabilistic HU-EV methods. The effectiveness of the proposed framework is demonstrated through synthetic, semi-real, and real-data experiments.
... A recent work [12] took the insight of MVES to design a DNN training loss, which was shown to be more effective than the anchor point-based methods. However, MVES is sensitive to outliers due to its geometric nature; see [24], [25], [28], [29]. As a result, when some data deviate from the model assumptions (e.g., when some outlying samples have confusion matrices different from that of the majority of samples), the MVES-based learning criterion may produce poor classification performance. ...
... In this work, we propose to robustify the MVES-based DNN learning criterion. Our approach is inspired by the outlier-robust MVES concept that is often adopted in the hyperspectral unmixing community; see, e.g., [24], [25]. Unlike the hyperspectral imaging works that often use (quasi-)norm-based loss functions, the proposed criterion utilizes a robust cross-entropy loss [9], which is better suited for data classification, as the data labels are integers rather than continuous values. ...
... A notable challenge of MVES is that such a geometric NMF criterion is sensitive to outliers [24], [25], [28]. Even if there exists a single outlying data point, the MVES criterion may produce largely undesired solutions, as the minimum-volume enclosing simplex can be quite different from the ground-truth one; see the illustration in Fig. 1. ...
... Therefore, the SSMF approach based on simplex volume minimization (SVmin) [14] can be posed as an approximation to the maximum likelihood problem. The same reference also introduces importance sampling and variational approaches to obtain the solution for the PRISM framework. ...
... In this article, we propose a Bayesian formulation for the use of the determinant minimization criterion in structured matrix factorization frameworks such as SSMF [14] and PMF [6]. For this purpose, we assume that the left factor H is a random matrix whose rows are drawn independently from a Gaussian distribution whose correlation matrix is posed as unknown. ...
... The optimization problem in (11) is in the same form as the robust simplex volume minimization problem in [14] for the domain D = ∆_r and for the choice Ψ = τI. This choice is motivated from an algorithmic-advantage point of view and is introduced as a computational heuristic. ...
Preprint
Full-text available
We introduce a Bayesian perspective for the structured matrix factorization problem. The proposed framework provides a probabilistic interpretation for existing geometric methods based on determinant minimization. We model input data vectors as linear transformations of latent vectors drawn from a distribution uniform over a particular domain reflecting structural assumptions, such as the probability simplex in Nonnegative Matrix Factorization and polytopes in Polytopic Matrix Factorization. We represent the rows of the linear transformation matrix as vectors generated independently from a normal distribution whose covariance matrix is inverse Wishart distributed. We show that the corresponding maximum a posteriori estimation problem boils down to the robust determinant minimization approach for structured matrix factorization, providing insights about parameter selections and potential algorithmic extensions.
... A remedy is to exploit structural prior information about the latent factors. For example, to establish model identifiability, an important line of work in HU uses the convex geometry (CG) of the abundances, e.g., the existence of the so-called "pure pixels" [3], [4], [12], [13], [18] or the "sufficiently scattered condition" [7], [19], [20]. CG-based identifiability analysis has also been used in many machine learning tasks; see [21]. ...
... The identifiability problem has been studied extensively, primarily from a CG-based simplex-structured MF (SSMF) viewpoint [21]. In a nutshell, it has been established that if the abundance matrix S satisfies certain geometric conditions, namely, the pure pixel condition [3], [4], [12], [13] and the sufficiently scattered condition [7], [20], [21], then C and S can be identified up to column and row permutations, respectively. These important results reflect long-standing postulations in the HU community, i.e., Winter's and Craig's beliefs [9], [11]. ...
... In the design of the φ(S_r) function, the ℓ_q function is an effective tool for promoting sparsity when q < 2 [7], [8]. Meanwhile, we set the parameter ε > 0 to make the function smooth and the objective function in (11) continuously differentiable. ...
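For reference, the sufficiently scattered condition invoked in the excerpts above is commonly stated as follows in the VolMin/NMF identifiability literature (a standard generic form, not quoted from the citing paper); with S ∈ ℝ_+^{r×N} the abundance/coefficient matrix and C a second-order cone,

\[
\mathcal{C}:=\{\mathbf{x}\in\mathbb{R}^{r}:\mathbf{1}^{\top}\mathbf{x}\ge\sqrt{r-1}\,\|\mathbf{x}\|_2\}\subseteq\mathrm{cone}(\mathbf{S}),
\qquad
\mathrm{cone}(\mathbf{S})^{*}\cap\mathrm{bd}\,\mathcal{C}^{*}=\{\lambda\mathbf{e}_k:\lambda\ge 0,\ k=1,\dots,r\},
\]

where cone(S)* denotes the dual cone of cone(S), C* = {x : 1ᵀx ≥ ‖x‖₂} is the dual cone of C, and bd denotes the boundary.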
Article
Full-text available
The block-term tensor decomposition model with multilinear rank-(L_r, L_r, 1) terms (or, the "LL1 tensor decomposition" in short) offers a valuable alternative formulation for hyperspectral unmixing (HU), which ensures identifiability of the endmembers/abundances in cases where classic matrix factorization (MF) approaches cannot provide such guarantees. However, existing LL1 tensor decomposition-based HU algorithms use a three-factor parameterization of the tensor (i.e., the hyperspectral image cube), which causes difficulties in incorporating structural prior information arising in HU. Consequently, their algorithms often exhibit high per-iteration complexity and slow convergence. This work focuses on LL1 tensor decomposition under structural constraints and regularization terms in HU. Our algorithm uses a two-factor re-parameterization of the tensor model. Like in the MF-based approaches, the factors correspond to the endmembers and abundances in the context of HU. Thus, it is natural for the proposed framework to incorporate physics-motivated priors in HU. To tackle the formulated optimization problem, a two-block alternating gradient projection (GP)-based algorithm is proposed. Carefully designed projection solvers are proposed to implement the GP algorithm with a relatively low per-iteration complexity. An extrapolation-based acceleration strategy is proposed to expedite the GP algorithm. Such extrapolated multi-block algorithms previously had only asymptotic convergence assurances in the literature. Our analysis shows that the algorithm converges to the vicinity of a stationary point within finite iterations, under reasonable conditions. Empirical studies show that the proposed algorithm often attains orders-of-magnitude speedup and substantial HU performance gains compared to the existing LL1 decomposition-based HU algorithms.
... Minimum-volume one-layer NMF (minVolNMF) is a well-known NMF variant [25,26,27] that encourages the basis vectors, that is, the columns of W , to have a small volume. Intuitively, this boils down to trying to make them as close as possible to the data points, which enhances the interpretability of the decomposition. ...
... To the best of our knowledge, minVolNMF has not been extended to the deep context yet. We extend the approach of [27] by incorporating a volume contribution at every layer to (3) and (4). Hence, we add the following quantity to each term of both loss functions: for l = 1, 2, . . . ...
... To solve the minVolNMF problem, a majorization-minimization (MM) framework is usually considered. This consists of minimizing a surrogate function, namely a strongly convex upper approximation of the loss function; see [27] and [28] for the details. The FPGM of Algorithm 4 can then be applied to this surrogate. ...
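To make the MM step above concrete, a commonly used surrogate (a generic construction under the assumption that the volume term is λ log det(WᵀW + δI), as in the cited min-vol NMF works) linearizes the concave log det at the current iterate W_k:

\[
\log\det(\mathbf{W}^{\top}\mathbf{W}+\delta\mathbf{I})\;\le\;
\log\det(\mathbf{W}_k^{\top}\mathbf{W}_k+\delta\mathbf{I})
+\mathrm{tr}\!\big[(\mathbf{W}_k^{\top}\mathbf{W}_k+\delta\mathbf{I})^{-1}\,(\mathbf{W}^{\top}\mathbf{W}-\mathbf{W}_k^{\top}\mathbf{W}_k)\big].
\]

Adding this bound to the Frobenius fitting term yields a strongly convex quadratic surrogate in W, which is what a fast projected gradient method (such as the FPGM mentioned above) can minimize at each outer iteration.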
Article
Full-text available
Deep matrix factorizations (deep MFs) are recent unsupervised data mining techniques inspired by constrained low-rank approximations. They aim to extract complex hierarchies of features within high-dimensional datasets. Most of the loss functions proposed in the literature to evaluate the quality of deep MF models and the underlying optimization frameworks are not consistent because different losses are used at different layers. In this paper, we introduce two meaningful loss functions for deep MF and present a generic framework to solve the corresponding optimization problems. We illustrate the effectiveness of this approach through the integration of various constraints and regularizations, such as sparsity, nonnegativity and minimum-volume. The models are successfully applied on both synthetic and real data, namely for hyperspectral unmixing and extraction of facial features.
... Over the last decades, the LMM has been widely used in remote sensing image processing for hyper-spectral unmixing (HU) [30][31][32][33][34][35][36], and many minimum-volume simplex analysis approaches based on this model have been proposed, where a hyper-spectral pixel is assumed to be a linear combination of endmembers weighted by their abundances [37], and these endmembers are the vertices of the simplex. Early research assumed that pure pixels exist in the hyper-spectral image for every endmember, and algorithms were designed to find such pure pixels as the vertices of the data simplex, e.g., vertex component analysis [30]. ...
... To obtain faster algorithms that do not require pure pixels, structured matrix factorization-based (SMF-based) approaches have been proposed, since the LMM can be reformulated as a matrix factorization. Fu et al. [34] introduce the volume minimization criterion into SMF and use the log-determinant as a simplified simplex volume measure. Since spectral data are usually nonnegative, [35] and [36] take advantage of the fast convergence of NMF and propose algorithms to handle the rank-deficient matrix problem and rare endmember estimation, respectively. ...
... Since the original matrix factorization problem in (3) is sensitive to initialization, the solution can fall into local optima. Inspired by [34], in this paper we first randomly choose Ĉ points that are close to each other in the given data according to Euclidean distance, and then compute their mean value. In total, C mean values are computed and arranged as the columns of the initial A. Note that the number of columns of A should be (C + 1) for noise estimation, so the last column is initialized randomly. ...
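A minimal sketch of the initialization heuristic described in the excerpt above (hypothetical code; C and Ĉ follow the excerpt's notation, where the mean of Ĉ mutually close points forms each of the C columns and one extra random column is appended for noise estimation):

```python
import numpy as np

def init_endmembers(X, C, C_hat, rng=None):
    """X: (bands, pixels) data matrix. Returns an initial A with C+1 columns."""
    rng = np.random.default_rng(rng)
    bands, pixels = X.shape
    cols = []
    for _ in range(C):
        j = rng.integers(pixels)                   # pick a random anchor pixel
        d = np.linalg.norm(X - X[:, [j]], axis=0)  # Euclidean distances to it
        nearest = np.argsort(d)[:C_hat]            # its C_hat closest pixels
        cols.append(X[:, nearest].mean(axis=1))    # average them into one column
    cols.append(rng.random(bands))                 # extra random column for noise estimation
    return np.stack(cols, axis=1)                  # shape (bands, C+1)
```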
Article
Full-text available
Scene attribute recognition is to identify the attribute labels of a scene image based on scene representation, for deeper semantic understanding of scenes. In the past decades, numerous algorithms for scene representation have been proposed based on feature engineering or deep convolutional neural networks. For models based on only one kind of image feature, it is still difficult to learn the representation of multiple attributes from local image regions. For models based on deep learning, although multi-label learning can be directly used to learn attribute representations, huge amounts of training data are usually necessary to build the multi-label model. In this paper, we investigate the problem by way of scene representation modeling with multiple features and non-deep learning. Firstly, we introduce the linear mixing model (LMM) for scene image modeling; then we present a novel approach, referred to as mini-batch minimum simplex estimation (MMSE), for attribute-based scene representation learning from highly complex image data. Finally, a two-stage multi-feature fusion method is proposed to further improve the feature representation for scene attribute recognition. The proposed method takes advantage of the fast convergence of nonnegative matrix factorization (NMF) schemes, and at the same time uses mini-batches to speed up the computation for large-scale scene datasets. Experimental results on real scene images demonstrate that the proposed method outperforms several other advanced scene attribute recognition approaches.
... Minimum-volume one-layer NMF (minVolNMF) is a well-known NMF variant [25,26,27] that encourages the basis vectors, that is, the columns of W , to have a small volume. Intuitively, this boils down to trying to make them as close as possible to the data points, which enhances the interpretability of the decomposition. ...
... To the best of our knowledge, minVolNMF has not been extended to the deep context yet. We extend the approach of [27] by incorporating a volume contribution at every layer to (3) and (4). Hence, we add the following quantity to each term of both loss functions: for l = 1, 2, . . . ...
... To solve the minVolNMF problem, a majorization-minimization (MM) framework is usually considered. This consists of minimizing a surrogate function, namely a strongly convex upper approximation of the loss function; see [27] and [28] for the details. The FPGM of Algorithm 4 can then be applied to this surrogate. ...
Preprint
Full-text available
Deep matrix factorizations (deep MFs) are recent unsupervised data mining techniques inspired by constrained low-rank approximations. They aim to extract complex hierarchies of features within high-dimensional datasets. Most of the loss functions proposed in the literature to evaluate the quality of deep MF models and the underlying optimization frameworks are not consistent because different losses are used at different layers. In this paper, we introduce two meaningful loss functions for deep MF and present a generic framework to solve the corresponding optimization problems. We illustrate the effectiveness of this approach through the integration of various constraints and regularizations, such as sparsity, nonnegativity and minimum-volume. The models are successfully applied on both synthetic and real data, namely for hyperspectral unmixing and extraction of facial features.
... Under the LMM, the endmembers and abundances can be considered as the two "latent factors" of a matrix factorization (MF) model, which are non-identifiable in general. The remedy is to exploit the convex geometry (CG) of the abundances, e.g., the existence of the so-called "pure pixels" [3], [5], [6], [15], [16], [22] or the sufficiently scattered condition [9], [23], [24]. CG-based identifiability analysis under the LMM has also influenced many BSS problems in other domains, in particular machine learning, where similar models arise [23], [25]. The CG- and MF-based HU algorithms have enjoyed many successes [23], [26]. ...
... The identifiability problem has been studied extensively, primarily from a CG-based simplex-structured MF (SSMF) viewpoint [23], [26]. In a nutshell, it has been established that if the abundance matrix S satisfies certain geometric conditions, namely, the pure pixel condition [3], [5], [6], [15], [16] and the sufficiently scattered condition [7], [9], [12], [14], [17], [23], then C and S can be identified up to column and row permutations, respectively. These important results reflect long-standing postulations in the HU community, i.e., Winter's and Craig's beliefs [11], [13]. ...
... The second subproblem, i.e., (19b), admits an efficient solver. That is, projecting a column of F^(k+1) onto the probability simplex can be solved in O(R log R) flops in the worst case by a water-filling-type algorithm; see [9] and the references therein. The subproblem in (19a) also admits simple solutions. ...
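For illustration, the O(R log R) simplex projection referred to above can be implemented with the classic sort-based (water-filling-type) algorithm; this is a generic implementation, not the authors' code:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                 # sort in descending order
    css = np.cumsum(u) - 1.0             # cumulative sums minus the unit simplex mass
    # largest index whose entry stays positive after the uniform shift
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)       # optimal shift ("water level")
    return np.maximum(v - theta, 0.0)
```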
Preprint
The block-term tensor decomposition model with multilinear rank-(L_r, L_r, 1) terms (or, the "LL1 tensor decomposition" in short) offers a valuable alternative for hyperspectral unmixing (HU) under the linear mixture model. Particularly, the LL1 decomposition ensures the endmember/abundance identifiability in scenarios where such guarantees are not supported by the classic matrix factorization (MF) approaches. However, existing LL1-based HU algorithms use a three-factor parameterization of the tensor (i.e., the hyperspectral image cube), which leads to a number of challenges including high per-iteration complexity, slow convergence, and difficulties in incorporating structural prior information. This work puts forth an LL1 tensor decomposition-based HU algorithm that uses a constrained two-factor re-parameterization of the tensor data. As a consequence, a two-block alternating gradient projection (GP)-based LL1 algorithm is proposed for HU. With carefully designed projection solvers, the GP algorithm enjoys a relatively low per-iteration complexity. Like in MF-based HU, the factors under our parameterization correspond to the endmembers and abundances. Thus, it is natural for the proposed framework to incorporate physics-motivated priors that arise in HU. The proposed algorithm often attains orders-of-magnitude speedup and substantial HU performance gains compared to the existing three-factor parameterization-based HU algorithms.
... sources. For example, nonnegative matrix factorization (NMF) [5,6,7,8] exploits the assumption that the sources are located in the nonnegative orthant. Antisparse bounded component analysis (BCA) algorithms make use of the sources' ℓ∞-norm-ball membership feature [9,10,11]. ...
... However, we do not use it as an approximation tool; instead, we use it as a standalone measure relevant to the linear inference setting of the BSS problem. The resulting optimization objective function resembles but is different from those used in determinant maximization-based matrix factorization approaches [7,8,16]. The following is the article's organization: Section 2 provides the BSS setup used throughout the article. ...
... In the case of noisy mixtures, we can remove the equality constraint in (3b) and replace the objective function with [7,16]: ...
Preprint
We introduce a new information maximization (infomax) approach for the blind source separation problem. The proposed framework provides an information-theoretic perspective for determinant maximization-based structured matrix factorization methods such as nonnegative and polytopic matrix factorization. For this purpose, we use an alternative joint entropy measure based on the log-determinant of covariance, which we refer to as log-determinant (LD) entropy. The corresponding (LD) mutual information between two vectors reflects a level of their correlation. We pose the infomax BSS criterion as the maximization of the LD-mutual information between the input and output of the separator under the constraint that the output vectors lie in a presumed domain set. In contrast to the ICA infomax approach, the proposed information maximization approach can separate both dependent and independent sources. Furthermore, we can provide a finite sample guarantee for the perfect separation condition in the noiseless case.
... The second condition, (NMF.SS.ii), limits the tightness of the enclosure by constraining the points of tangency between C and cone(S). Lin et al. [20] introduced an alternative but related [32] condition for SSMF, which is based on the set R(a) = (aB_2) ∩ ∆_r, i.e., the intersection of the origin-centered hypersphere with radius a and the unit simplex. The constant γ = sup{a ≤ 1 | R(a) ⊆ conv(S)}, where the columns of S are in ∆_r, is defined as the uniform pixel purity level. ...
... The main emphasis of the current article is laying out the PMF framework and the corresponding identifiability analysis. To illustrate its use, we adopt the iterative algorithm in [32] which is originally proposed for the SSMF framework. ...
... subject to Y = HS (56b) and S_{:,j} ∈ P, j = 1, . . . , N (56c). Similar to [32], we employ Lagrangian optimization, ...
Preprint
We introduce Polytopic Matrix Factorization (PMF) as a novel data decomposition approach. In this new framework, we model input data as unknown linear transformations of some latent vectors drawn from a polytope. In this sense, the article considers a semi-structured data model, in which the input matrix is modeled as the product of a full column rank matrix and a matrix containing samples from a polytope as its column vectors. The choice of polytope reflects the presumed features of the latent components and their mutual relationships. As the factorization criterion, we propose the determinant maximization (Det-Max) for the sample autocorrelation matrix of the latent vectors. We introduce a sufficient condition for identifiability, which requires that the convex hull of the latent vectors contains the maximum volume inscribed ellipsoid of the polytope with a particular tightness constraint. Based on the Det-Max criterion and the proposed identifiability condition, we show that all polytopes that satisfy a particular symmetry restriction qualify for the PMF framework. Having infinitely many polytope choices provides a form of flexibility in characterizing latent vectors. In particular, it is possible to define latent vectors with heterogeneous features, enabling the assignment of attributes such as nonnegativity and sparsity at the subvector level. The article offers examples illustrating the connection between polytope choices and the corresponding feature representations.
... sources. For example, nonnegative matrix factorization (NMF) [5,6,7,8] exploits the assumption that the sources are located in the nonnegative orthant. Antisparse bounded component analysis (BCA) algorithms make use of the sources' ℓ∞-norm-ball membership feature [9,10,11]. ...
... However, we do not use it as an approximation tool; instead, we use it as a standalone measure relevant to the linear inference setting of the BSS problem. The resulting optimization objective function resembles but is different from those used in determinant maximization-based matrix factorization approaches [7,8,16]. The following is the article's organization: Section 2 provides the BSS setup used throughout the article. ...
... In the case of noisy mixtures, we can remove the equality constraint in (3b) and replace the objective function with [7,16]: ...
Conference Paper
Full-text available
We introduce a new information maximization (infomax) approach for the blind source separation problem. The proposed framework provides an information-theoretic perspective for determinant maximization-based structured matrix factorization methods such as nonnegative and polytopic matrix factorization. For this purpose, we use an alternative joint entropy measure based on the log-determinant of covariance, which we refer to as log-determinant (LD) entropy. The corresponding (LD) mutual information between two vectors reflects a level of their correlation. We pose the infomax BSS criterion as the maximization of the LD-mutual information between the input and output of the separator under the constraint that the output vectors lie in a presumed domain set. In contrast to the ICA infomax approach, the proposed information maximization approach can separate both dependent and independent sources. Furthermore, we can provide a finite sample guarantee for the perfect separation condition in the noiseless case.
... The works of [45,81,102] showed that the NMF model is unique provided that the factors A and B have some zero entries. In the case where the latent factors are strictly positive (and hence have only non-zero entries), other identifiability results were established, based on separability or minimum-volume constraints [55]. ...
... First, some additional diversities used to establish uniqueness may lack physical interpretability. For instance, in NMF applications, only enforcing non-negativity of the factors is not enough [55] to guarantee an interpretable solution, hence other types of diversities must be envisioned, as described in the previous subsection. This is a scenario in which exploiting only "natural" diversities does not ensure uniqueness. ...
... However, related algorithms may suffer from high computational complexity. Moreover, uniqueness of the estimated SRI in the noiseless case can only be obtained under additional assumptions on the data or low-rank factors, such as non-negativity and minimum volume constraint or sparsity [55], see e.g., [105] that imposes sparsity on the low-rank factors. In the absence of such hypotheses, only a bound on the recovery error can be obtained [106]. ...
Thesis
Full-text available
Due to the recent emergence of new modalities, the amount of signals collected daily has been increasing. As a result, it frequently occurs that various signals provide information about the same phenomenon. However, a single signal may only contain partial information about this phenomenon. Multimodal data fusion was proposed to overcome this issue. It is defined as joint processing of datasets acquired from different modalities. The aim of data fusion is to enhance the capabilities of each modality to express their specific information about the phenomenon of interest; it is also expected from data fusion that it brings out additional information that would be ignored by separate processing. However, due to the complex interactions between the modalities, understanding the advantages and limits of data fusion may not be straightforward. In a lot of applications such as biomedical imaging or remote sensing, the observed signals are three-dimensional arrays called tensors, thus tensor-based data fusion can be envisioned. Tensor low-rank modeling preserves the multidimensional structure of the observations and enjoys interesting uniqueness properties arising from tensor decompositions. In this work, we address the problem of recovering a high-resolution tensor from tensor observations with lower resolutions. In particular, hyperspectral super-resolution (HSR) aims at reconstructing a tensor from two degraded versions. While one is degraded in two (spatial) modes, the second is degraded in the third (spectral) mode. Recently, tensor-based approaches were proposed for solving the problem at hand. These works are based on the assumption that the target tensor admits a given low-rank tensor decomposition. The first work addressing the problem of tensor-based HSR was based on a coupled canonical polyadic (CP) decomposition of the observations. This approach gave rise to numerous subsequent reconstruction methods based on coupled tensor models, including our work. The first part of this thesis is devoted to the design of tensor-based algorithms for solving the HSR problem. In Chapter 2, we propose to formulate the problem as a coupled Tucker decomposition. We introduce two simple but fast algorithms based on the higher-order singular value decomposition of the observations. Our experiments show that our algorithms have a competitive performance with state-of-the-art tensor and matrix methods, with a lower computational time. In Chapter 3, we consider spectral variability between the observations. We formulate the reconstruction problem as a coupled block-term decomposition. We impose non-negativity of the low-rank factors, so that they can be incorporated into a physically-informed mixing model. Thus the proposed approach provides a solution to the joint HSR and unmixing problems. The second part of this thesis addresses the performance analysis of the coupled tensor models. The aim of this part is to assess the efficiency of some algorithms introduced in the first part. In Chapter 4, we consider constrained Cramér-Rao lower bounds (CCRB) for coupled tensor CP models. We provide a closed-form expression for the constrained Fisher information matrix in two scenarios: i) when we only consider the fully-coupled reconstruction problem, or ii) when we are interested in comparing the performance of fully-coupled, partially-coupled and uncoupled approaches. We prove that the existing CP-based algorithms are asymptotically efficient.
Chapter 5 addresses a non-standard estimation problem in which the constraints on the deterministic model parameters involve a random parameter. We show that in this case, the standard CCRB is a non-informative bound. As a result, we introduce a new randomly constrained Cramér-Rao bound (RCCRB). The relevance of the RCCRB is illustrated using a coupled block-term decomposition model accounting for random uncertainties.
... Identifiability for min-vol NMF is a strong result that has been used successfully in many applications such as topic modeling and hyperspectral imaging [8], and audio source separation [7]. However, min-vol NMF is computationally hard to solve. ...
... ‖·‖_F is the Frobenius norm, λ > 0 is a parameter balancing the two terms in the objective function, I_r is the r × r identity matrix, and δ > 0 is a small parameter that prevents log det(W^⊤W) from going to −∞ if W is rank deficient [11]. The use of the logarithm of the determinant is less sensitive to very disparate singular values of W, leading to better practical performance [8], [12]. ...
... This is the so-called Craig's belief [10]. In document classification, M is a word-by-document matrix, so that the columns of W correspond to topics (that is, sets of words found simultaneously in several documents) while the columns of H allow one to assign each document to the topics it discusses [8]. ...
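A minimal numerical sketch of the regularized objective described in these excerpts (generic code, not the authors' implementation; lam and delta play the roles of the balancing and rank-deficiency-guarding parameters mentioned above, and the function only evaluates the objective, it does not optimize it):

```python
import numpy as np

def minvol_nmf_objective(M, W, H, lam=0.1, delta=1e-8):
    """0.5 * ||M - W H||_F^2 + lam * logdet(W^T W + delta * I_r)."""
    fit = 0.5 * np.linalg.norm(M - W @ H, "fro") ** 2
    r = W.shape[1]
    # slogdet gives a numerically stable log-determinant of the regularized Gram matrix
    _, logdet = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return fit + lam * logdet
```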
... For spectral unmixing, one is interested in a matrix factorization where the columns of the endmember matrix are constrained to lie in a simplex [2]. Over the decades, several simplex-based spectral unmixing algorithms have been proposed under volume minimization- or maximization-based matrix factorization [5], [6], [7], [8]. Equivalently, ellipsoid-geometry-based simplex construction has also been proposed [9]. ...
... In [18], a volume minimization regularizer is presented; however, it is sensitive to outliers and also comes with a heavy computational burden. To address these issues, a noise-robust version of the volume regularizer [7] was recently derived, which adopts a modified log-determinant (logdet) loss function. The works in [19] and [20] use volume minimization NMF with successive convex optimization and alternating optimization, respectively. ...
... Calculating the volume in the wavelet space converts the highly correlated endmember signatures to less correlated ones, yielding a better-conditioned endmember matrix. RVolMin [7] is a volume minimization technique developed under a noiseless setting and thus works well for data sets with higher SNR. The algorithm in [9] studies a maximum-volume ellipsoid inscribed in the convex hull of the data points, in contrast to the minimum-volume enclosing simplex framework. ...
Article
Full-text available
In this paper, a wavelet-based energy minimization framework is developed for joint estimation of endmembers and abundances without assuming pure pixels while considering the noisy scenario. Spectrally dense and overlapped hyperspectral data are represented using biorthogonal wavelet bases that yield a compact linear mixing model in the wavelet (sparse) domain. It acts as the data term and helps to reduce the solution space of the unmixed components. Three prior terms are added to better handle the ill-posedness: the logarithm-of-determinant volume regularizer enforces a minimum-volume endmember simplex, a spatial smoothness prior is applied to the individual abundance maps, and a spectral constraint is imposed through a learned dictionary of abundances. Alternating nonnegative least squares is employed to optimize the regularized nonnegative matrix factorization (NMF) functional in the wavelet domain. Experiments are conducted on synthetic data and three real benchmark hyperspectral datasets, AVIRIS Cuprite, HYDICE Urban, and AVIRIS Jasper Ridge. The efficacy of the proposed algorithm is evaluated by comparing results with the state of the art.
... or its variants (see, e.g., [7,23]). As can be seen in (2) and as illustrated in Figure 3, the goal is to find a simplex that encloses the data points and has the minimum volume. ...
... Also, note that γ = 1 corresponds to the pure-pixel case. Interested readers are referred to [41] for more explanations of γ, and to [23,26,37] for concurrent and more recent results on theoretical MVES recovery. Loosely speaking, the premise in Theorem 2 should be satisfied with high probability in practice as long as the data points are reasonably well spread. ...
Preprint
Consider a structured matrix factorization model where one factor is restricted to have its columns lying in the unit simplex. This simplex-structured matrix factorization (SSMF) model and the associated factorization techniques have spurred much interest in research topics over different areas, such as hyperspectral unmixing in remote sensing, topic discovery in machine learning, to name a few. In this paper we develop a new theoretical SSMF framework whose idea is to study a maximum volume ellipsoid inscribed in the convex hull of the data points. This maximum volume inscribed ellipsoid (MVIE) idea has not been attempted in prior literature, and we show a sufficient condition under which the MVIE framework guarantees exact recovery of the factors. The sufficient recovery condition we show for MVIE is much more relaxed than that of separable non-negative matrix factorization (or pure-pixel search); coincidentally it is also identical to that of minimum volume enclosing simplex, which is known to be a powerful SSMF framework for non-separable problem instances. We also show that MVIE can be practically implemented by performing facet enumeration and then by solving a convex optimization problem. The potential of the MVIE framework is illustrated by numerical results.
... where u = e^{j∠(Ax)} is an auxiliary vector of unknown unit-modulus variables with its mth component being u_m = e^{j∠(a_m^H x)}. In the presence of impulsive noise, the ℓ_p-(quasi-)norm has been recognized as an effective tool for promoting sparsity and fending against outliers [23]-[25]. Therefore, we propose to employ an ℓ_p-fitting based estimator instead of using the ℓ_2-norm, which has the form of ...
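One plausible way to write such an ℓ_p-fitting estimator, consistent with the unit-modulus auxiliary vector u defined above (a sketch of the general form; the exact equation is omitted from the excerpt), is

\[
\min_{\mathbf{x},\ |u_m|=1\ \forall m}\ \big\|\mathrm{diag}(\mathbf{y})\,\mathbf{u}-\mathbf{A}\mathbf{x}\big\|_p^{p},\qquad 0<p<2,
\]

where y collects the magnitude measurements; minimizing over u for fixed x gives back u = e^{j∠(Ax)} as in the excerpt.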
... In practice, the above 'extrapolation' technique greatly expedites the whole process in various applications [23], [32]. Some numerical evidence can be seen in Fig. 1, where a simple comparison between the plain Algorithm 2 and the extrapolated version is presented. ...
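A generic sketch of the extrapolation trick referred to above, in the style of Nesterov/FISTA-type accelerated projected gradient (placeholder names; this is not the authors' Algorithm 2). Here grad and project are user-supplied callables for the smooth gradient and the feasible-set projection:

```python
import numpy as np

def extrapolated_projected_gradient(grad, project, x0, step, iters=200):
    """Projected gradient descent with Nesterov-style extrapolation (momentum)."""
    x_prev, x = x0.copy(), x0.copy()
    t_prev = 1.0
    for _ in range(iters):
        t = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2))
        z = x + ((t_prev - 1.0) / t) * (x - x_prev)   # extrapolated point
        x_prev, t_prev = x, t
        x = project(z - step * grad(z))               # gradient step at z, then project
    return x
```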
Preprint
Phase retrieval has been mainly considered in the presence of Gaussian noise. However, the performance of the algorithms proposed under the Gaussian noise model severely degrades when grossly corrupted data, i.e., outliers, exist. This paper investigates techniques for phase retrieval in the presence of heavy-tailed noise, which is considered a better model for situations where outliers exist. An ℓ_p-norm (0 < p < 2) based estimator is proposed for fending against such noise, and two-block inexact alternating optimization is proposed as the algorithmic framework to tackle the resulting optimization problem. Two specific algorithms are devised by exploring different local approximations within this framework. Interestingly, the core conditional minimization steps can be interpreted as iteratively reweighted least squares and gradient descent. Convergence properties of the algorithms are discussed, and the Cramér-Rao bound (CRB) is derived. Simulations demonstrate that the proposed algorithms approach the CRB and outperform state-of-the-art algorithms in heavy-tailed noise.
... However, we do not require that the templates, i.e., the columns of W , be convex combinations of the original data in X. A volume minimization criterion has been included in the NMF previously to find tight models [3,12,20,22,26]. While related to our methodology, volume minimization is not the correct measure of model tightness to use in our new method. ...
... Including area minimization in an NMF has been looked at previously [3,12,20,22,26]. ...
Preprint
The nonnegative matrix factorization is a widely used, flexible matrix decomposition, finding applications in biology, image and signal processing and information retrieval, among other areas. Here we present a related matrix factorization. A multi-objective optimization problem finds conical combinations of templates that approximate a given data matrix. The templates are chosen so that as far as possible only the initial data set can be represented this way. However, the templates are not required to be nonnegative nor convex combinations of the original data.
... • Symmetric NMF with applications in topic modeling [9,8]. ...
... 1: % First step: Check the NC-SSC. 2: for i = 1 : r do 3: if there does not exist a feasible solution x to the system e − e_i = Hx such that e^⊤x = 1, H^⊤x ≥ 0, and −1 ≤ x_i ≤ 1 for all i ... 9: if q* > 1 or there is an optimal solution x* ≠ e_i for i ∈ [r] ...
Preprint
Full-text available
The sufficiently scattered condition (SSC) is a key condition in the study of identifiability of various matrix factorization problems, including nonnegative, minimum-volume, symmetric, simplex-structured, and polytopic matrix factorizations. The SSC allows one to guarantee that the computed matrix factorization is unique/identifiable, up to trivial ambiguities. However, this condition is NP-hard to check in general. In this paper, we show that it can however be checked in a reasonable amount of time in realistic scenarios, when the factorization rank is not too large. This is achieved by formulating the problem as a non-convex quadratic optimization problem over a bounded set. We use the global non-convex optimization software Gurobi, and showcase the usefulness of this code on synthetic data sets and on real-world hyperspectral images.
... Proof. Let the set of feasible solutions S be defined as in (30). Also, w.l.o.g. ...
... Suppose α + β = 0: then α min_i f_i + β = α(min_i f_i − 1) ≤ 0 (by the assumption that min_i f_i < 1). This violates the constraint in (30), so α + β must be non-zero. The argument for γ + δ follows similarly. ...
Preprint
The problem of foreground material signature extraction in an intimate (nonlinear) mixing setting is considered. It is possible for a foreground material signature to appear in combination with multiple background material signatures. We explore a framework for foreground material signature extraction based on a patch model that accounts for such background variation. We identify data conditions under which a foreground material signature can be extracted up to scaling and elementwise-inverse variations. We present algorithms based on volume minimization and endpoint member identification to recover foreground material signatures under these conditions. Numerical experiments on real and synthetic data illustrate the efficacy of the proposed algorithms.
... If we interpret the columns of W as a set of basis elements, then the matrix H contains the weights necessary to reconstruct the columns of X through linear combinations of these basis elements. This simple, yet powerful, data representation technique is applied in many domains, e.g., facial feature extraction [1], document clustering [2], blind source separation [3], [4], community detection [5], gene expression analysis [6], and recommender systems [7]. Depending on the application and the goal at hand (e.g., clustering, denoising, feature extraction), one may impose different structures/constraints on the factors W and/or H. ...
... This model, proposed in [9], [10], is also referred to as PMF, although it would have been more appropriate to refer to it as maximum-volume PMF. In fact, their proposed model could be viewed as a polytopic variant of minimum-volume semi-NMF [2], while our proposed model would rather be a polytopic variant of NMF. ...
Preprint
Full-text available
Polytopic matrix factorization (PMF) decomposes a given matrix as the product of two factors where the rows of the first factor belong to a given convex polytope and the columns of the second factor belong to another given convex polytope. In this paper we show that if the polytopes have certain invariant properties, and that if the rows of the first factor and the columns of the second factor are sufficiently scattered within their corresponding polytope, then this PMF is identifiable, that is, the factors are unique up to a signed permutation. The PMF framework is quite general, as it recovers other known structured matrix factorization models, and is highly customizable depending on the application. Hence, our result provides sufficient conditions that guarantee the identifiability of a large class of structured matrix factorization models.
... where the nonnegative matrix represents the K-polymer fractions in every sample. This can be written compactly in matrix form, which is called non-negative matrix factorization (NMF) (14)(15)(16). In this context, NMF is a mathematical expression of compositional analysis that simultaneously identifies the bases and their quantities, corresponding to "reference-free" quantitative MS (RQMS). ...
... Importantly, the first NMF is not unique because it is a spectral interpretation with no correct answer, whereas the second NMF should be unique to determine the polymer fraction with a single correct answer (20). The uniqueness of the second NMF can be assured by minimizing the volume of the simplex that is spanned by the row vectors of one factor and contains all the data points, i.e., the row vectors of the data matrix (16,21,22). The connection between the two non-unique and unique NMFs is key to the RQMS algorithm. ...
Preprint
Full-text available
A sequence—an arrangement of monomers—dominates polymer properties, as best exemplified by proteins; however, an efficient sequencing method for synthetic polymers has not been established yet. Herein, we propose a polymer sequencer based on mass spectrometry of pyrolyzed oligomeric fragments. By interpreting an observed fragment pattern as one generated from a mixture of sequence-defined copolymers, sequencing can be simplified to compositional analysis. Our key development is a reference-free quantitative mass spectrometry. The reference spectra of the hardly synthesizable sequence-defined copolymers were not actually measured but virtually inferred via unsupervised learning of the spectral dataset of easily synthesizable random copolymers. The polymer sequencer quantitatively evaluates complex sequence distribution in versatile multi-monomer systems, which would allow sequence–property correlation studies and practical sequence-controlled polymerization.
... , a_{N−1} − a_N], with a_i being the ith column of A. Recent studies have revealed that SVMin is more than an intuition. It is shown that, under some technical conditions which should hold for sufficiently well-spread s_t's, the optimal solution to the SVMin problem (2) is the ground truth A_0 or its column permutation [11,14,15]. In other words, SVMin is equipped with provable recovery guarantees. ...
... On the other hand, we want to leverage the merits of extrapolation demonstrated in prior works. It is known that, when f_0 is convex and has a Lipschitz continuous gradient, the extrapolated proximal gradient method can lead to faster convergence rates than the proximal gradient method, both provably and empirically [24]; and that, when f_0 is non-convex and has a Lipschitz continuous gradient, the extrapolated proximal gradient method yields some stationarity guarantees [25,26], and similar methods have empirically been found to lead to faster convergence in some applications [11,22,27,28]. Our empirical experience with Algorithm 2 is good in terms of runtime speed and stability. ...
... where det(TT^⊤)/(N!) denotes the volume of the r-parallelogram defined by the non-square matrix T in geometry [8,10]. Since N! is a constant, we discard this denominator in the experiments. ...
... Since N! is a constant, we discard this denominator in the experiments. The log(·) function is used to stabilize the optimization of the computed determinant, following [8]. ...
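As a small illustration of the volume term discussed in these excerpts (a generic sketch, not the authors' implementation; T stands for the non-square matrix whose rows span the simplex), the log-determinant can be evaluated stably with a slogdet call:

```python
import numpy as np

def log_volume_reg(T, eps=1e-12):
    """log det(T T^T): a (log-)volume surrogate for the simplex spanned by the rows of T."""
    G = T @ T.T + eps * np.eye(T.shape[0])  # small ridge keeps the Gram matrix nonsingular
    sign, logdet = np.linalg.slogdet(G)     # numerically stable log-determinant
    return logdet                           # the constant log(N!) offset is dropped, as in the excerpt
```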
Preprint
Full-text available
This paper studies a practical domain adaptive (DA) semantic segmentation problem where only pseudo-labeled target data is accessible through a black-box model. Due to the domain gap and label shift between two domains, pseudo-labeled target data contains mixed closed-set and open-set label noises. In this paper, we propose a simplex noise transition matrix (SimT) to model the mixed noise distributions in DA semantic segmentation and formulate the problem as estimation of SimT. By exploiting computational geometry analysis and properties of segmentation, we design three complementary regularizers, i.e., volume regularization, anchor guidance, and convex guarantee, to approximate the true SimT. Specifically, volume regularization minimizes the volume of the simplex formed by the rows of the non-square SimT, which ensures that the outputs of the segmentation model fit the ground-truth label distribution. To compensate for the lack of open-set knowledge, anchor guidance and convex guarantee are devised to facilitate the modeling of the open-set noise distribution and enhance the discriminative feature learning among closed-set and open-set classes. The estimated SimT is further utilized to correct noise issues in pseudo labels and promote the generalization ability of the segmentation model on target domain data. Extensive experimental results demonstrate that the proposed SimT can be flexibly plugged into existing DA methods to boost the performance. The source code is available at https://github.com/CityU-AIM-Group/SimT.
... Following Theorem 1, it admits an essentially unique CPD representation if F ≤ 1024. This is far more relaxed compared to uniqueness conditions for matrix factorization, where the rank has to be lower than the matrix dimensions and nonnegativity, sparsity, and geometric conditions are required [31]- [34]. ...
Preprint
Hyperspectral super-resolution refers to the problem of fusing a hyperspectral image (HSI) and a multispectral image (MSI) to produce a super-resolution image (SRI) that has fine spatial and spectral resolution. State-of-the-art methods approach the problem via low-rank matrix approximations to the matricized HSI and MSI. These methods are effective to some extent, but a number of challenges remain. First, HSIs and MSIs are naturally third-order tensors (data "cubes") and thus matricization is prone to loss of structural information--which could degrade performance. Second, it is unclear whether or not these low-rank matrix-based fusion strategies can guarantee identifiability or exact recovery of the SRI. However, identifiability plays a pivotal role in estimation problems and usually has a significant impact on performance in practice. Third, the majority of the existing methods assume that there are known (or easily estimated) degradation operators applied to the SRI to form the corresponding HSI and MSI--which is hardly the case in practice. In this work, we propose to tackle the super-resolution problem from a tensor perspective. Specifically, we utilize the multidimensional structure of the HSI and MSI to propose a coupled tensor factorization framework that can effectively overcome the aforementioned issues. The proposed approach guarantees the identifiability of the SRI under mild and realistic conditions. Furthermore, it works with little knowledge of the degradation operators, which is clearly an advantage over the existing methods. Semi-real numerical experiments are included to show the effectiveness of the proposed approach.
... Identifiability is critical in parameter estimation and model recovery. In signal processing, many NMF-based approaches have therefore been proposed to handle problems such as blind source separation [3], spectrum sensing [4], and hyperspectral unmixing [5,6], where model identifiability plays an essential role. In machine learning, identifiability of NMF is also considered essential for applications such as latent mixture model recovery [7], topic mining [8], and social network clustering [9], where model identifiability is entangled with interpretability of the results. ...
Preprint
In this letter, we propose a new identification criterion that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) model, under mild conditions. Specifically, using the proposed criterion, it suffices to identify the latent factors if the rows of one factor are \emph{sufficiently scattered} over the nonnegative orthant, while no structural assumption is imposed on the other factor except being full-rank. This is by far the mildest condition under which the latent factors are provably identifiable from the NMF model.
... The second NMF finds the K pure constituents and their concentrations in each sample. The second NMF uses the so-called volume minimization algorithm (VolMin) [3], which finds the volume-minimized simplex spanned by the rows of one factor and enclosing all the rows of the data matrix. This can be formulated as follows: ...
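The omitted formulation is presumably the standard VolMin criterion; a generic statement (our assumption, since the excerpt is truncated, using the C, P, X notation of this article's other excerpt below) is

\[
\min_{\mathbf{P},\ \mathbf{C}\ge 0}\ \mathrm{vol}(\mathbf{P})\quad\text{s.t.}\ \mathbf{X}\approx\mathbf{C}\mathbf{P},\ \ \mathbf{C}\mathbf{1}=\mathbf{1},
\]

i.e., the rows of P span the minimum-volume simplex that encloses the rows of X, while each row of C (the fractions for one sample) sums to one.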
Article
Materials performance is primarily influenced by chemical composition, making compositional analysis (CA) essential in materials science. Traditional quantitative mass spectrometry, which deconvolutes analyte spectra into reference spectra, struggles with reactive...
... We want to find the most suitable C and P best representing the observed spectra X. The long history of NMF developments tackling difficult questions [16], e.g., how to find a better solution [9] and how to ensure its uniqueness [10], can be found in the references. Besides such mathematical discussions, we herein focus on only one practical issue specific to pyrolysis-MS: the direct factorization of X into the reference spectra P and their fractions C would not be successful, since pyrolysis-MS does not observe the polymers themselves, but measures their thermally and/or ionically decomposed fragments. ...
Article
Full-text available
Accurate composition is determinable without using any prior knowledge or spectra of the system constituents via pyrolysis-mass spectrometry.
... where δ is a parameter that prevents the logdet from going to −∞ when W is rank deficient, and λ ≥ 0 balances the two terms. Note that the true volume spanned by the columns of W and the origin is equal to (1/r!)·sqrt(det(W^⊤W)), but minimizing logdet(W^⊤W) is equivalent in the exact case and makes the problem numerically easier to solve, because the function logdet(·) is concave and it is easier to design a "nice" majorizer for it [10]. ...
Preprint
Full-text available
Low-rank matrix approximation is a standard, yet powerful, embedding technique that can be used to tackle a broad range of problems, including the recovery of missing data. In this paper, we focus on the performance of nonnegative matrix factorization (NMF) with minimum-volume (MinVol) regularization on the task of nonnegative data imputation. The particular choice of the MinVol regularization is justified by its interesting identifiability property and by its link with the nuclear norm. We show experimentally that MinVol NMF is a relevant model for nonnegative data recovery, especially when the recovery of a unique embedding is desired. Additionally, we introduce a new version of MinVol NMF that exhibits some promising results.
... As a remedy, information about A and/or s is often used to establish identifiability. For instance, statistical independence among the [s]_i's is used in independent component analysis (ICA) [1], and the nonnegativity of A ∈ ℝ_+^{M×K} and S = [s_1, ..., s_N] ∈ ℝ_+^{K×N} is used in nonnegative matrix factorization (NMF) [3,4]; also see [5][6][7][32][33][34] for more conditions that can assist LMM identification. ...
Preprint
Full-text available
Unsupervised mixture learning (UML) aims at identifying linearly or nonlinearly mixed latent components in a blind manner. UML is known to be challenging: Even learning linear mixtures requires highly nontrivial analytical tools, e.g., independent component analysis or nonnegative matrix factorization. In this work, the post-nonlinear (PNL) mixture model -- where unknown element-wise nonlinear functions are imposed onto a linear mixture -- is revisited. The PNL model is widely employed in different fields ranging from brain signal classification, speech separation, remote sensing, to causal discovery. To identify and remove the unknown nonlinear functions, existing works often assume different properties on the latent components (e.g., statistical independence or probability-simplex structures). This work shows that under a carefully designed UML criterion, the existence of a nontrivial null space associated with the underlying mixing system suffices to guarantee identification/removal of the unknown nonlinearity. Compared to prior works, our finding largely relaxes the conditions of attaining PNL identifiability, and thus may benefit applications where no strong structural information on the latent components is known. A finite-sample analysis is offered to characterize the performance of the proposed approach under realistic settings. To implement the proposed learning criterion, a block coordinate descent algorithm is proposed. A series of numerical experiments corroborate our theoretical claims.
... The total variation (TV) of all estimated endmembers was considered in [25] as a geometrical penalty. To make the technique robust to noise and outliers, the log-determinant of the estimated endmembers was considered as the geometrical penalty in [26]. Because natural images have a sparse representation in transformed domains (e.g., the wavelet domain) [27], and it is easier to remove outliers in these domains [28], spectral unmixing was performed in the wavelet domain in [29]. ...
Article
Transformers have intrigued the vision research community with their state-of-the-art performance in natural language processing. With their superior performance, transformers have found their way into the field of hyperspectral image classification and achieved promising results. In this article, we harness the power of transformers to conquer the task of hyperspectral unmixing and propose a novel deep neural network-based unmixing model with transformers. A transformer network captures nonlocal feature dependencies by interactions between image patches, which are not employed in convolutional neural network (CNN) models, and thereby has the ability to enhance the quality of the endmember spectra and the abundance maps. The proposed model is a combination of a convolutional autoencoder and a transformer. The hyperspectral data is encoded by the convolutional encoder. The transformer captures long-range dependencies between the representations derived from the encoder. The data are reconstructed using a convolutional decoder. We applied the proposed unmixing model to three widely used unmixing datasets, that is, Samson, Apex, and Washington DC Mall, and compared it with the state-of-the-art in terms of root mean squared error and spectral angle distance. The source code for the proposed model will be made publicly available at https://github.com/preetam22n/DeepTrans-HSU.
... Extensive numerical experiments including diverse data (color images, color videos, and fluorescence images) and samplings (random and structural samplings) are also reported to validate the uniqueness and complementarity of different types of priors (global TT low-rank prior, nonlocal self-similar prior, local deep prior) for multi-dimensional image recovery. In the future, we will extend the proposed method to other image processing tasks, such as image denoising [57], [88], image deblurring [89], hyperspectral unmixing or fusion [90], [91], image classification [92], and target detection [93]. For visual data, one remaining challenge for our framework is to recover the large areas missing across all bands without side information (e.g., multi-temporal information). ...
Article
Completing missing entries in multidimensional visual data is a typical ill-posed problem that requires appropriate exploitation of prior information of the underlying data. Commonly used priors can be roughly categorized into three classes: global tensor low-rankness, local properties, and nonlocal self-similarity (NSS); most existing works utilize one or two of them to implement completion. Naturally, there arises an interesting question: can one concurrently make use of multiple priors in a unified way, such that they can collaborate with each other to achieve better performance? This work gives a positive answer by formulating a novel tensor completion framework which can simultaneously take advantage of the global-local-nonlocal priors. In the proposed framework, the tensor train (TT) rank is adopted to characterize the global correlation; meanwhile, two Plug-and-Play (PnP) denoisers, including a convolutional neural network (CNN) denoiser and the color block-matching and 3-D filtering (CBM3D) denoiser, are incorporated to preserve local details and exploit NSS, respectively. Then, we design a proximal alternating minimization algorithm to efficiently solve this model under the PnP framework. Under mild conditions, we establish the convergence guarantee of the proposed algorithm. Extensive experiments show that these priors organically benefit from each other to achieve state-of-the-art performance both quantitatively and qualitatively.
Preprint
Many contemporary signal processing, machine learning and wireless communication applications can be formulated as nonconvex nonsmooth optimization problems. Often there is a lack of efficient algorithms for these problems, especially when the optimization variables are nonlinearly coupled in some nonconvex constraints. In this work, we propose an algorithm named penalty dual decomposition (PDD) for these difficult problems and discuss its various applications. The PDD is a double-loop iterative algorithm. Its inner iterations are used to inexactly solve a nonconvex nonsmooth augmented Lagrangian problem via block-coordinate-descent-type methods, while its outer iteration updates the dual variables and/or a penalty parameter. In Part I of this work, we describe the PDD algorithm and rigorously establish its convergence to KKT solutions. In Part II we evaluate the performance of PDD by customizing it to three applications arising from signal processing and wireless communications.
Article
The sufficiently scattered condition (SSC) is a key condition in the study of identifiability of various matrix factorization problems, including nonnegative, minimum-volume, symmetric, simplex-structured, and polytopic matrix factorizations. The SSC allows one to guarantee that the computed matrix factorization is unique/identifiable, up to trivial ambiguities. However, this condition is NP-hard to check in general. In this paper, we show that it can however be checked in a reasonable amount of time in realistic scenarios, when the factorization rank is not too large. This is achieved by formulating the problem as a non-convex quadratic optimization problem over a bounded set. We use the global non-convex optimization software Gurobi, and showcase the usefulness of this code on real-world hyperspectral images.
Article
Hyperspectral imaging considers the measurement of spectral signatures in near and far field settings. In the far field setting, the interactions of material spectral signatures are typically modeled using linear mixing. In the near field setting, material signatures frequently interact in a nonlinear manner (e.g., intimate mixing). An important task in hyperspectral imaging is to estimate the distribution and spectral signatures of materials present in hyperspectral data, i.e., unmixing. Motivated by forensics, this work considers a specific unmixing task, namely, the problem of foreground material signature extraction in an intimate mixing setting where thin layers of foreground material are deposited on other (background) materials. The unmixing task presents a fundamental challenge of unique (identifiable) recovery of material signatures in this and other settings. We propose a novel model for this intimate mixing setting and explore a framework for the task of foreground material signature extraction with identifiability guarantees under this model. We identify solution criteria and data conditions under which a foreground material signature can be extracted up to scaling and elementwise-inverse variations with theoretical guarantees in a noiseless setting. We present algorithms based on two solution criteria (volume minimization and endpoint member identification) to recover foreground material signatures under these conditions. Numerical experiments on real and synthetic data illustrate the efficacy of the proposed algorithms.
Article
Classic and deep learning-based generalized canonical correlation analysis (GCCA) algorithms seek low-dimensional common representations of data entities from multiple “views” (e.g., audio and image) using linear transformations and neural networks, respectively. When the views are acquired and stored at different computing agents (e.g., organizations and edge devices) and data sharing is undesired due to privacy or communication cost considerations, federated learning-based GCCA is well motivated. In federated learning, the views are kept locally at the agents and only derived, limited information exchange with a central server is allowed. However, applying existing GCCA algorithms onto such a federated learning setting may incur prohibitively high communication overhead. This work puts forth a communication-efficient federated learning framework for both linear and deep GCCA under the maximum variance (MAX-VAR) formulation. The overhead issue is addressed by aggressively compressing (via quantization) the information exchanged between the computing agents and a central controller. Compared to the unquantized version, our empirical study shows that the proposed algorithm enjoys a substantial reduction of communication overheads with virtually no loss in accuracy and convergence speed. Rigorous convergence analyses are also presented, which is a nontrivial effort. Generic federated optimization results do not cover the special problem structure of GCCA, which is a manifold-constrained multi-block nonconvex eigen problem. Our result shows that the proposed algorithms for both linear and deep GCCA converge to critical points at a sublinear rate, even under heavy quantization and stochastic approximations. In addition, in the linear MAX-VAR case, the quantized algorithm approaches a global optimum at a geometric rate under reasonable conditions. Synthetic and real-data experiments are used to showcase the effectiveness of the proposed approach.
Article
Full-text available
The recent emergence of sequence engineering in synthetic copolymers has been innovating polymer materials, where short sequences, hereinafter called "codons" using an analogy from nucleotide triads, play key roles in expressing functions. However, the codon compositions cannot be experimentally determined owing to the lack of efficient sequencing methods, hindering the integration of experiments and theories. Herein, we propose a polymer sequencer based on mass spectrometry of pyrolyzed oligomeric fragments. Despite the random fragmentation along copolymer main-chains, the characteristic fragment patterns of the codons are identified and quantified via unsupervised learning of a spectral dataset of random copolymers. The codon complexities increase with their length and monomer component number. Our data-driven approach accommodates the increasing complexities by expanding the dataset; the codon compositions of binary triads, binary pentads and ternary triads are quantifiable with small datasets (N < 100). The sequencer allows describing copolymers with their codon compositions/distributions, facilitating sequence engineering toward innovative polymer materials.
Article
This paper studies a practical domain adaptive (DA) semantic segmentation problem where only pseudo-labeled target data is accessible through a black-box model. Due to the domain gap and label shift between two domains, pseudo-labeled target data contains mixed closed-set and open-set label noises. In this paper, we propose a simplex noise transition matrix (SimT) to model the mixed noise distributions in DA semantic segmentation, and leverage SimT to handle open-set label noise and enable novel target recognition. When handling open-set noises, we formulate the problem as estimation of SimT. By exploiting computational geometry analysis and properties of segmentation, we design four complementary regularizers, i.e., volume regularization, anchor guidance, convex guarantee, and semantic constraint, to approximate the true SimT. Specifically, volume regularization minimizes the volume of the simplex formed by rows of the non-square SimT, ensuring that the outputs of the model fit into the ground truth label distribution. To compensate for the lack of open-set knowledge, anchor guidance, convex guarantee, and semantic constraint are devised to enable the modeling of the open-set noise distribution. The estimated SimT is utilized to correct noise issues in pseudo labels and promote the generalization ability of the segmentation model on target domain data. In the task of novel target recognition, we first propose closed-to-open label correction (C2OLC) to explicitly derive the supervision signal for open-set classes by exploiting the estimated SimT, and then advance a semantic relation (SR) loss that harnesses the inter-class relation to facilitate open-set class sample recognition in the target domain. Extensive experimental results demonstrate that the proposed SimT can be flexibly plugged into existing DA methods to boost both closed-set and open-set class performance. The source code is available at https://github.com/CityU-AIM-Group/SimT
Article
Nonnegative matrix factorization (NMF) often relies on the separability condition for tractable algorithm design. Separability-based NMF is mainly handled by two types of approaches, namely, greedy pursuit and convex programming. A notable convex NMF formulation is the so-called self-dictionary multiple measurement vectors (SD-MMV), which can work without knowing the matrix rank a priori, and is arguably more resilient to error propagation relative to greedy pursuit. However, convex SD-MMV renders a large memory cost that scales quadratically with the problem size. This memory challenge has been around for a decade, and is a major obstacle for applying convex SD-MMV to big data analytics. This work proposes a memory-efficient algorithm for convex SD-MMV. Our algorithm capitalizes on the special update rules of a classic algorithm from the 1950s, namely, the Frank-Wolfe (FW) algorithm. It is shown that, under reasonable conditions, the FW algorithm solves the noisy SD-MMV problem with a memory cost that grows linearly with the amount of data. To handle noisier scenarios, a smoothed group sparsity regularizer is proposed to improve robustness while maintaining the low memory footprint with guarantees. The proposed approach presents the first linear memory complexity algorithmic framework for convex SD-MMV based NMF. The method is tested over a couple of unsupervised learning tasks, i.e., text mining and community detection, to showcase its effectiveness and memory efficiency.
Article
This study presents PRISM, a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data. The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning. PRISM uses a simple probabilistic model, namely, uniform simplex data distribution and additive Gaussian noise, and it carries out inference by maximum likelihood. The inference model is sound in the sense that the vertices are provably identifiable under some assumptions, and it suggests that PRISM can be effective in combating noise when the number of data points is large. PRISM has strong, but hidden, relationships with simplex volume minimization, a powerful geometric approach for the same problem. We study these fundamental aspects, and we also consider algorithmic schemes based on importance sampling and variational inference. In particular, the variational inference scheme is shown to resemble a matrix factorization problem with a special regularizer, which draws an interesting connection to the matrix factorization approach. Numerical results are provided to demonstrate the potential of PRISM.
Conference Paper
Full-text available
Principal Component Analysis (PCA) is the most widely used unsupervised dimensionality reduction approach. In recent research, several robust PCA algorithms were presented to enhance the robustness of PCA model. However, the existing robust PCA methods incorrectly center the data using the 2-norm distance to calculate the mean, which actually is not the optimal mean due to the 1-norm used in the objective functions. In this paper, we propose novel robust PCA objective functions with removing optimal mean automatically. Both theoretical analysis and empirical studies demonstrate our new methods can more effectively reduce data dimensionality than previous robust PCA methods.
Article
Full-text available
Flexible Cartesian robotic arms (CRAs) are typical multicoupling systems. Considering the elastic effects of bolted joints and the motion disturbances, this paper investigates the dynamics and stability of the flexible CRA. With the kinetic energy and potential energy of the comprising components, Hamilton's variational principle and the Duhamel integral are utilized to derive the dynamic equation and vibration differential equation. Based on the proposed elastic restraint model of the bolted joints, boundary conditions and mode equations of the flexible CRA are determined using the principle of virtual work. According to the mode frequencies and sensitivity analysis, it is revealed that the connecting stiffness of the bolted joints has significant influence, and the mode frequencies are more sensitive to the tensional stiffness. Moreover, describing the motion displacement of the driving base as a combination of an average motion displacement and a harmonic disturbance, the vibration responses of the system are studied. The result indicates that the motion disturbance has an obvious influence on the vibration responses, and the influence is enhanced under larger accelerating operations. The multiple scales method is introduced to analyze the parametric stability of the system, as well as the influences of the tensional stiffness and the end-effector on the stability.
Article
Full-text available
We consider factoring low-rank tensors in the presence of outlying slabs. This problem is important in practice, because data collected in many real-world applications, such as speech, fluorescence, and some social network data, fit this paradigm. Prior work tackles this problem by iteratively selecting a fixed number of slabs and fitting, a procedure which may not converge. We formulate this problem from a group-sparsity promoting point of view, and propose an alternating optimization framework to handle the corresponding ℓ_p (0 < p ≤ 1) minimization-based low-rank tensor factorization problem. The proposed algorithm features a similar per-iteration complexity as the plain trilinear alternating least squares (TALS) algorithm. Convergence of the proposed algorithm is also easy to analyze under the framework of alternating optimization and its variants. In addition, regularization and constraints can be easily incorporated to make use of a priori information on the latent loading factors. Simulations and real data experiments on blind speech separation, fluorescence data analysis, and social network mining are used to showcase the effectiveness of the proposed algorithm.
Article
Full-text available
This paper revisits blind source separation of instantaneously mixed quasi-stationary sources (BSS-QSS), motivated by the observation that in certain applications (e.g., speech) there exist time frames during which only one source is active, or locally dominant. Combined with nonnegativity of source powers, this endows the problem with a nice convex geometry that enables elegant and efficient BSS solutions. Local dominance is tantamount to the so-called pure pixel/separability assumption in hyperspectral unmixing/nonnegative matrix factorization, respectively. Building on this link, a very simple algorithm called successive projection algorithm (SPA) is considered for estimating the mixing system in closed form. To complement SPA in the specific BSS-QSS context, an algebraic preprocessing procedure is proposed to suppress short-term source cross-correlation interference. The proposed procedure is simple, effective, and supported by theoretical analysis. Solutions based on volume minimization (VolMin) are also considered. By theoretical analysis, it is shown that VolMin guarantees perfect mixing system identifiability under an assumption more relaxed than (exact) local dominance—which means wider applicability in practice. Exploiting the specific structure of BSS-QSS, a fast VolMin algorithm is proposed for the overdetermined case. Careful simulations using real speech sources showcase the simplicity, efficiency, and accuracy of the proposed algorithms.
Conference Paper
Full-text available
This paper considers the problem of separating the power spectra and mapping the locations of co-channel transmitters using compound measurements from multiple sensors. This kind of situational awareness is important in cognitive radio practice, for spatial spectrum interpolation, transmission opportunity mining, and interference avoidance. Using temporal auto- and cross-correlations of the sensor outputs, it is shown that the power spectra separation task can be cast as a tensor decomposition problem in the Fourier domain. In particular, a joint diagonalization or (symmetric) parallel factor analysis (PARAFAC) model emerges, with one loading matrix containing the sought power spectra - hence being nonnegative, and locally sparse. Exploiting the latter two properties, it is shown that a very simple algebraic algorithm can be used to speed up the factorization. Assuming a path loss model, it is then possible to identify the transmitter locations by focusing on exclusively used (e.g., carrier) frequencies. The proposed approaches offer identifiability guarantees, and simplicity of implementation. Simulations show that the proposed approaches are effective in separating the spectra and localizing the transmitters.
Article
Full-text available
This paper considers a recently emerged hyperspectral unmixing formulation based on sparse regression of a self-dictionary multiple measurement vector (SD-MMV) model, wherein the measured hyperspectral pixels are used as the dictionary. Operating under the pure pixel assumption, this SD-MMV formalism is special in enabling simultaneous identification of the endmember spectral signatures and the number of endmembers. Previous SD-MMV studies mainly focus on convex relaxations. In this study, we explore the alternative of greedy pursuit, which generally provides efficient and simple algorithms. In particular, we design a greedy SD-MMV algorithm using simultaneous orthogonal matching pursuit. Intriguingly, the proposed greedy algorithm is shown to be closely related to some existing pure pixel search algorithms, especially, the successive projection algorithm (SPA). Thus, a link between SD-MMV and pure pixel search is revealed. We then perform exact recovery analyses, and prove that the proposed greedy algorithm is robust to noise, including its identification of the (unknown) number of endmembers, under a sufficiently low noise level. The identification performance of the proposed greedy algorithm is demonstrated through both synthetic and real-data experiments.
Article
Full-text available
This paper considers regularized block multi-convex optimization, where the feasible set and objective function are generally non-convex but convex in each block of variables. We review some of its interesting examples and propose a generalized block coordinate descent method. Under certain conditions, we show that any limit point satisfies the Nash equilibrium conditions. Furthermore, we establish its global convergence and estimate its asymptotic convergence rate by assuming a property based on the Kurdyka-Lojasiewicz inequality. The proposed algorithms are adapted for factorizing nonnegative matrices and tensors, as well as completing them from their incomplete observations. The algorithms were tested on synthetic data, hyperspectral data, as well as image sets from the CBCL and ORL databases. Compared to the existing state-of-the-art algorithms, the proposed algorithms demonstrate superior performance in both speed and solution quality. The Matlab code of nonnegative matrix/tensor decomposition and completion, along with a few demos, are accessible from the authors' homepages.
Article
Full-text available
The composite L_q (0 < q < 1) minimization problem over a general polyhedron has received various applications in machine learning, wireless communications, image restoration, signal reconstruction, etc. This paper aims to provide a theoretical study on this problem. Firstly, we show that for any fixed 0 < q < 1, finding the global minimizer of the problem, even its unconstrained counterpart, is strongly NP-hard. Secondly, we derive Karush-Kuhn-Tucker (KKT) optimality conditions for local minimizers of the problem. Thirdly, we propose a smoothing sequential quadratic programming framework for solving this problem. The framework requires an (approximate) solution of a convex quadratic program at each iteration. Finally, we analyze the worst-case iteration complexity of the framework for returning an ε-KKT point; i.e., a feasible point that satisfies a perturbed version of the derived KKT optimality conditions. To the best of our knowledge, the proposed framework is the first one with a worst-case iteration complexity guarantee for solving composite L_q minimization over a general polyhedron.
Article
Full-text available
In blind hyperspectral unmixing (HU), the pure-pixel assumption is well-known to be powerful in enabling simple and effective blind HU solutions. However, the pure-pixel assumption is not always satisfied in an exact sense, especially for scenarios where pixels are all intimately mixed. In the no pure-pixel case, a good blind HU approach to consider is the minimum volume enclosing simplex (MVES). Empirical experience has suggested that MVES algorithms can perform well without pure pixels, although it was not totally clear why this is true from a theoretical viewpoint. This paper aims to address the latter issue. We develop an analysis framework wherein the perfect identifiability of MVES is studied under the noiseless case. We prove that MVES is indeed robust against lack of pure pixels, as long as the pixels do not get too heavily mixed and too asymmetrically spread. Also, our analysis reveals a surprising and counter-intuitive result, namely, that MVES becomes more robust against lack of pure pixels as the number of endmembers increases. The theoretical results are verified by numerical simulations.
Article
Full-text available
Blind hyperspectral unmixing (HU), also known as unsupervised HU, is one of the most prominent research topics in signal processing (SP) for hyperspectral remote sensing [1], [2]. Blind HU aims at identifying materials present in a captured scene, as well as their compositions, by using high spectral resolution of hyperspectral images. It is a blind source separation (BSS) problem from a SP viewpoint. Research on this topic started in the 1990s in geoscience and remote sensing [3]-[7], enabled by technological advances in hyperspectral sensing at the time. In recent years, blind HU has attracted much interest from other fields such as SP, machine learning, and optimization, and the subsequent cross-disciplinary research activities have made blind HU a vibrant topic. The resulting impact is not just on remote sensing - blind HU has provided a unique problem scenario that inspired researchers from different fields to devise novel blind SP methods. In fact, one may say that blind HU has established a new branch of BSS approaches not seen in classical BSS studies. In particular, the convex geometry concepts - discovered by early remote sensing researchers through empirical observations [3]-[7] and refined by later research - are elegant and very different from statistical independence-based BSS approaches established in the SP field. Moreover, the latest research on blind HU is rapidly adopting advanced techniques, such as those in sparse SP and optimization. The present development of blind HU seems to be converging to a point where the lines between remote sensing-originated ideas and advanced SP and optimization concepts are no longer clear, and insights from both sides would be used to establish better methods.
Article
Full-text available
Nonnegative matrix factorization (NMF) has become a widely used tool for the analysis of high-dimensional data as it automatically extracts sparse and meaningful features from a set of nonnegative data vectors. We first illustrate this property of NMF on three applications, in image processing, text mining and hyperspectral imaging; this is the why. Then we address the problem of solving NMF, which is NP-hard in general. We review some standard NMF algorithms, and also present a recent subclass of NMF problems, referred to as near-separable NMF, that can be solved efficiently (that is, in polynomial time), even in the presence of noise; this is the how. Finally, we briefly describe some problems in mathematics and computer science closely related to NMF via the nonnegative rank.
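As a concrete point of reference for the "standard NMF algorithms" mentioned in this survey abstract, here is a minimal sketch of a classical multiplicative-update baseline (Python/NumPy assumed; illustrative only, not any specific algorithm from the survey):

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-10):
    """Classical multiplicative-update baseline for X ~= W H with W, H >= 0.
    X must be elementwise nonnegative; eps guards against division by zero."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```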
Article
Full-text available
We interpret non-negative matrix factorization geometrically, as the problem of finding a simplicial cone which contains a cloud of data points and which is contained in the positive orthant. We show that under certain conditions, basically requiring that some of the data are spread across the faces of the positive orthant, there is a unique such simplicial cone. We give examples of synthetic image articulation databases which obey these conditions; these require separated support and factorial sampling. For such databases there is a generative model in terms of 'parts' and NMF correctly identifies the 'parts'. We show that our theoretical results are predictive of the performance of published NMF code, by running the published algorithms on one of our synthetic image articulation databases.
Article
Full-text available
In this paper, we study the nonnegative matrix factorization problem under the separability assumption (that is, there exists a cone spanned by a small subset of the columns of the input nonnegative data matrix containing all columns), which is equivalent to the hyperspectral unmixing problem under the linear mixing model and the pure-pixel assumption. We present a family of fast recursive algorithms, and prove they are robust under any small perturbations of the input data matrix. This family generalizes several existing hyperspectral unmixing algorithms and hence provides for the first time a theoretical justification of their better practical performance.
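A minimal sketch of one well-known member of this family of fast recursive algorithms, commonly presented as the successive projection algorithm (Python/NumPy assumed; names are illustrative):

```python
import numpy as np

def successive_projection(X, r):
    """Pick r columns of X whose conic hull (approximately) contains all
    columns, in the pure-pixel / separable-NMF sense: repeatedly select the
    column with largest norm, then project the residual onto its
    orthogonal complement."""
    R = np.array(X, dtype=float)
    selected = []
    for _ in range(r):
        j = int(np.argmax(np.sum(R * R, axis=0)))  # column with largest norm
        selected.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                    # (I - u u^T) R
    return selected
```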
Conference Paper
Full-text available
This paper studies the robust weighted-sum rate optimization problem in the presence of channel uncertainty over a K-user Gaussian Interference Channel (GIFC), where multiple antennas are present at all transmitters and receivers. Motivated by recent results on interference alignment that show the optimality of linear precoders and simple receivers in achieving the maximum degrees-of-freedom available in the GIFC, we consider linear transmit precoding and two simple decoding schemes: single-stream decoding and single-user decoding. The resulting precoder design problems are then posed as specific optimization problems. Unfortunately, due to the hardness of these problems, optimal solutions cannot be efficiently obtained. Instead of resorting to ad-hoc algorithms, we show that it is possible to design algorithms using a systematic approach. Towards this end, this paper develops new provably convergent iterative algorithms for precoder design through ingenious sub-problem formulations such that each of these sub-problems can be solved optimally. The sub-problems are solved in closed-form for certain cases and formulated as standard convex problems for the rest. To complement these contributions on achievable schemes, we generalize the genie-MAC outer bounding technique to incorporate channel uncertainty using notions of compound-MAC capacity and then obtain computable outer bounds using an alternating optimization approach. Thus, we introduce one of the first approaches to obtain tighter outer bounds on the capacity region of the GIFC in the presence of channel uncertainty.
Article
Full-text available
Effective unmixing of hyperspectral data cube under a noisy scenario has been a challenging research problem in remote sensing arena. A branch of existing hyperspectral unmixing algorithms is based on Craig's criterion, which states that the vertices of the minimum-volume simplex enclosing the hyperspectral data should yield high fidelity estimates of the endmember signatures associated with the data cloud. Recently, we have developed a minimum-volume enclosing simplex (MVES) algorithm based on Craig's criterion and validated that the MVES algorithm is very useful to unmix highly mixed hyperspectral data. However, the presence of noise in the observations expands the actual data cloud, and as a consequence, the endmember estimates obtained by applying Craig-criterion-based algorithms to the noisy data may no longer be in close proximity to the true endmember signatures. In this paper, we propose a robust MVES (RMVES) algorithm that accounts for the noise effects in the observations by employing chance constraints. These chance constraints in turn control the volume of the resulting simplex. Under the Gaussian noise assumption, the chance-constrained MVES problem can be formulated into a deterministic nonlinear program. The problem can then be conveniently handled by alternating optimization, in which each subproblem involved is handled by using sequential quadratic programming solvers. The proposed RMVES is compared with several existing benchmark algorithms, including its predecessor, the MVES algorithm. Monte Carlo simulations and real hyperspectral data experiments are presented to demonstrate the efficacy of the proposed RMVES algorithm.
Article
Full-text available
Hyperspectral unmixing aims at identifying the hidden spectral signatures (or endmembers) and their corresponding proportions (or abundances) from an observed hyperspectral scene. Many existing hyperspectral unmixing algorithms were developed under a commonly used assumption that pure pixels exist. However, the pure-pixel assumption may be seriously violated for highly mixed data. Based on intuitive grounds, Craig reported an unmixing criterion without requiring the pure-pixel assumption, which estimates the endmembers by vertices of a minimum-volume simplex enclosing all the observed pixels. In this paper, we incorporate convex analysis and Craig's criterion to develop a minimum-volume enclosing simplex (MVES) formulation for hyperspectral unmixing. A cyclic minimization algorithm for approximating the MVES problem is developed using linear programs (LPs), which can be practically implemented by readily available LP solvers. We also provide a non-heuristic guarantee of our MVES problem formulation, where the existence of pure pixels is proved to be a sufficient condition for MVES to perfectly identify the true endmembers. Some Monte Carlo simulations and real data experiments are presented to demonstrate the efficacy of the proposed MVES algorithm over several existing hyperspectral unmixing methods.
Book
Proximal Algorithms discusses proximal operators and proximal algorithms, and illustrates their applicability to standard and distributed convex optimization in general and many applications of recent interest in particular. Much like Newton's method is a standard tool for solving unconstrained smooth optimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but are especially well-suited to problems of substantial recent interest involving large or high-dimensional datasets. Proximal methods sit at a higher level of abstraction than classical algorithms like Newton's method: the base operation is evaluating the proximal operator of a function, which itself involves solving a small convex optimization problem. These subproblems, which generalize the problem of projecting a point onto a convex set, often admit closed-form solutions or can be solved very quickly with standard or simple specialized methods. Proximal Algorithms discusses different interpretations of proximal operators and algorithms, looks at their connections to many other topics in optimization and applied mathematics, surveys some popular algorithms, and provides a large number of examples of proximal operators that commonly arise in practice.
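The base operation described here, evaluating a proximal operator, often has a closed form; a minimal sketch of two common cases (Python/NumPy assumed; names are illustrative):

```python
import numpy as np

def prox_l1(v, t):
    """prox of t*||.||_1: elementwise soft-thresholding (closed form)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_box(v, lo=0.0, hi=1.0):
    """prox of the indicator of a box constraint = Euclidean projection."""
    return np.clip(v, lo, hi)
```

A proximal gradient method then alternates a gradient step on the smooth part of the objective with one of these prox evaluations on the nonsmooth part.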
Article
This article presents a powerful algorithmic framework for big data optimization, called the block successive upper-bound minimization (BSUM). The BSUM includes as special cases many well-known methods for analyzing massive data sets, such as the block coordinate descent (BCD) method, the convex-concave procedure (CCCP) method, the block coordinate proximal gradient (BCPG) method, the nonnegative matrix factorization (NMF) method, the expectation maximization (EM) method, etc. In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation, and the required communication overhead. Illustrative examples from networking, signal processing, and machine learning are presented to demonstrate the practical performance of the BSUM framework.
Article
This paper introduces a robust linear model to describe hyperspectral data arising from the mixture of several pure spectral signatures. This new model not only generalizes the commonly used linear mixing model but also allows for possible nonlinear effects to be handled, relying on mild assumptions regarding these nonlinearities. Based on this model, a nonlinear unmixing procedure is proposed. The standard nonnegativity and sum-to-one constraints inherent to spectral unmixing are coupled with a group-sparse constraint imposed on the nonlinearity component. The resulting objective function is minimized using a multiplicative algorithm. Simulation results obtained on synthetic and real data show that the proposed strategy competes with state-of-the-art linear and nonlinear unmixing methods.
Article
This monograph is about a class of optimization algorithms called proximal algorithms. Much like Newton's method is a standard tool for solving unconstrained smooth optimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but are especially well-suited to problems of substantial recent interest involving large or high-dimensional datasets. Proximal methods sit at a higher level of abstraction than classical algorithms like Newton's method: the base operation is evaluating the proximal operator of a function, which itself involves solving a small convex optimization problem. These subproblems, which generalize the problem of projecting a point onto a convex set, often admit closed-form solutions or can be solved very quickly with standard or simple specialized methods. Here, we discuss the many different interpretations of proximal operators and algorithms, describe their connections to many other topics in optimization and applied mathematics, survey some popular algorithms, and provide a large number of examples of proximal operators that commonly arise in practice.
Article
Nonnegative matrix factorization (NMF) is a useful tool in a broad range of applications, from signal separation to computer vision and machine learning. NMF is a hard (NP-hard) computational problem for which various approximate solutions have been developed over the years. Given the widespread interest in NMF and its applications, it is perhaps surprising that the pertinent Cramér-Rao lower bound (CRLB) on the accuracy of the nonnegative latent factor estimates has not been worked out in the literature. In hindsight, one reason may be that the required computations are more subtle than usual: the problem involves constraints and ambiguities that must be dealt with, and the Fisher information matrix is always singular. We provide a concise tutorial derivation of the CRLB for both symmetric NMF and asymmetric NMF, using the latest CRLB tools, which should be of broad interest for analogous derivations in related factor analysis problems. We illustrate the behavior of these bounds with respect to model parameters and put some of the best NMF algorithms to the test against one another and the CRLB. The results help illuminate what can be expected from the current state of art in NMF algorithms, and they are reassuring in that the gap to optimality is small in relatively sparse and low rank scenarios.
Article
Non-negative matrix factorization (NMF) has found numerous applications, due to its ability to provide interpretable decompositions. Perhaps surprisingly, existing results regarding its uniqueness properties are rather limited, and there is much room for improvement in terms of algorithms as well. Uniqueness aspects of NMF are revisited here from a geometrical point of view. Both symmetric and asymmetric NMF are considered, the former being tantamount to element-wise non-negative square-root factorization of positive semidefinite matrices. New uniqueness results are derived, e.g., it is shown that a sufficient condition for uniqueness is that the conic hull of the latent factors is a superset of a particular second-order cone. Checking this condition is shown to be NP-complete; yet this and other results offer insights on the role of latent sparsity in this context. On the computational side, a new algorithm for symmetric NMF is proposed, which is very different from existing ones. It alternates between Procrustes rotation and projection onto the non-negative orthant to find a non-negative matrix close to the span of the dominant subspace. Simulation results show promising performance with respect to the state of the art. Finally, the new algorithm is applied to a clustering problem for co-authorship data, yielding meaningful and interpretable results.
Article
This paper introduces a robust mixing model to describe hyperspectral data resulting from the mixture of several pure spectral signatures. This new model not only generalizes the commonly used linear mixing model, but also allows for possible nonlinear effects to be easily handled, relying on mild assumptions regarding these nonlinearities. The standard nonnegativity and sum-to-one constraints inherent to spectral unmixing are coupled with a group-sparse constraint imposed on the nonlinearity component. This results in a new form of robust nonnegative matrix factorization. The data fidelity term is expressed as a β-divergence, a continuous family of dissimilarity measures that takes the squared Euclidean distance and the generalized Kullback-Leibler divergence as special cases. The penalized objective is minimized with a block-coordinate descent that involves majorization-minimization updates. Simulation results obtained on synthetic and real data show that the proposed strategy competes with state-of-the-art linear and nonlinear unmixing methods.
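A minimal sketch of the β-divergence family mentioned in this abstract (Python/NumPy assumed; elementwise divergence summed over entries, with the conventional special cases):

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences d_beta(x | y).
    beta = 2 gives half the squared Euclidean distance, beta = 1 the
    generalized Kullback-Leibler divergence, beta = 0 Itakura-Saito.
    Entries are assumed strictly positive when beta <= 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 0:
        return np.sum(x / y - np.log(x / y) - 1.0)
    if beta == 1:
        return np.sum(x * np.log(x / y) - x + y)
    return np.sum((x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1))
                  / (beta * (beta - 1)))
```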
Article
The block coordinate descent (BCD) method is widely used for minimizing a continuous function f of several block variables. At each iteration of this method, a single block of variables is optimized, while the remaining variables are held fixed. To ensure the convergence of the BCD method, the subproblem to be optimized in each iteration needs to be solved exactly to its unique optimal solution. Unfortunately, these requirements are often too restrictive for many practical scenarios. In this paper, we study an alternative inexact BCD approach which updates the variable blocks by successively minimizing a sequence of approximations of f which are either locally tight upper bounds of f or strictly convex local approximations of f. We focus on characterizing the convergence properties for a fairly wide class of such methods, especially for the cases where the objective functions are either non-differentiable or nonconvex. Our results unify and extend the existing convergence results for many classical algorithms such as the BCD method, the difference of convex functions (DC) method, the expectation maximization (EM) algorithm, as well as the alternating proximal minimization algorithm.
Article
We provide an elementary proof of a simple, efficient algorithm for computing the Euclidean projection of a point onto the probability simplex. We also show an application in Laplacian K-modes clustering.
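The algorithm referred to here admits a very short implementation; a minimal sketch of the sort-based Euclidean projection onto the probability simplex (Python/NumPy assumed; names are illustrative):

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection of y onto {x : x >= 0, sum(x) = 1}."""
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                                  # sort descending
    css = np.cumsum(u)
    cand = u + (1.0 - css) / np.arange(1, y.size + 1)
    rho = np.nonzero(cand > 0)[0][-1]                     # last positive index
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(y + theta, 0.0)
```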
Article
In this paper we discuss algorithms for L_p-methods, i.e., minimizers of the L_p-norm of the residual vector. The statistical “goodness” of the different methods when applied to regression problems is compared in a Monte Carlo experiment.
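One standard way to compute such L_p-norm minimizers is iteratively reweighted least squares; a minimal sketch under that reading (Python/NumPy assumed; not necessarily one of the specific algorithms compared in the paper):

```python
import numpy as np

def irls_lp(A, b, p=1.2, n_iter=50, eps=1e-8):
    """Iteratively reweighted least squares for min_x ||A x - b||_p,
    1 <= p < 2: residual weights |r_i|^(p-2) downweight large residuals."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]      # ordinary LS start
    for _ in range(n_iter):
        r = A @ x - b
        w = (np.abs(r) + eps) ** (p - 2)
        AtW = A.T * w                              # same as A.T @ diag(w)
        x = np.linalg.solve(AtW @ A, AtW @ b)
    return x
```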
Article
When considering the problem of unmixing hyperspectral images, most of the literature in the geoscience and image processing areas relies on the widely used linear mixing model (LMM). However, the LMM may be not valid and other nonlinear models need to be considered, for instance, when there are multi-scattering effects or intimate interactions. Consequently, over the last few years, several significant contributions have been proposed to overcome the limitations inherent in the LMM. In this paper, we present an overview of recent advances in nonlinear unmixing modeling.
Chapter
Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.
Conference Paper
This paper presents a new linear hyperspectral unmixing method of the minimum volume class, termed simplex identification via split augmented Lagrangian (SISAL). Following Craig's seminal ideas, hyperspectral linear unmixing amounts to finding the minimum volume simplex containing the hyperspectral vectors. This is a nonconvex optimization problem with convex constraints. In the proposed approach, the positivity constraints, forcing the spectral vectors to belong to the convex hull of the endmember signatures, are replaced by soft constraints. The obtained problem is solved by a sequence of augmented Lagrangian optimizations. The resulting algorithm is very fast and able to solve problems far beyond the reach of the current state-of-the-art algorithms. The effectiveness of SISAL is illustrated with simulated data.
Conference Paper
This paper presents a new method of the minimum volume class for hyperspectral unmixing, termed minimum volume simplex analysis (MVSA). The underlying mixing model is linear; i.e., the mixed hyperspectral vectors are modeled by a linear mixture of the endmember signatures weighted by the corresponding abundance fractions. MVSA approaches hyperspectral unmixing by fitting a minimum volume simplex to the hyperspectral data, constraining the abundance fractions to belong to the probability simplex. The resulting optimization problem is solved by implementing a sequence of quadratically constrained subproblems. In a final step, the hard constraint on the abundance fractions is replaced with a hinge-type loss function to account for outliers and noise. We illustrate the state-of-the-art performance of the MVSA algorithm in unmixing simulated data sets. We are mainly concerned with the realistic scenario in which the pure pixel assumption (i.e., there exists at least one pure pixel per endmember) is not fulfilled. In these conditions, the MVSA yields much better performance than the pure-pixel-based algorithms.
Article
Nonlinear models have recently shown interesting properties for spectral unmixing. This paper studies a generalized bilinear model and a hierarchical Bayesian algorithm for unmixing hyperspectral images. The proposed model is a generalization not only of the accepted linear mixing model but also of a bilinear model that has been recently introduced in the literature. Appropriate priors are chosen for its parameters to satisfy the positivity and sum-to-one constraints for the abundances. The joint posterior distribution of the unknown parameter vector is then derived. Unfortunately, this posterior is too complex to obtain analytical expressions of the standard Bayesian estimators. As a consequence, a Metropolis-within-Gibbs algorithm is proposed, which allows samples distributed according to this posterior to be generated and to estimate the unknown model parameters. The performance of the resulting unmixing strategy is evaluated via simulations conducted on synthetic and real data.
Article
In this work we address the subspace clustering problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to cluster the samples into their respective subspaces and remove possible outliers as well. To this end, we propose a novel objective function named Low-Rank Representation (LRR), which seeks the lowest-rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: When the data is clean, we prove that LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outlier as well; for data corrupted by arbitrary sparse errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these further imply that LRR can perform robust subspace clustering and error correction, in an efficient and effective way.
Conference Paper
In this paper, we propose a new data clustering method called concept factorization that models each concept as a linear combination of the data points, and each data point as a linear combination of the concepts. With this model, the data clustering task is accomplished by computing the two sets of linear coefficients, and this linear coefficients computation is carried out by finding the non-negative solution that minimizes the reconstruction error of the data points. The cluster label of each data point can be easily derived from the obtained linear coefficients. This method differs from the method of clustering based on non-negative matrix factorization (NMF) in that it can be applied to data containing negative values and the method can be implemented in the kernel space. Our experimental results show that the proposed data clustering method and its variations perform best among 11 algorithms and their variations that we have evaluated on both the TDT2 and Reuters-21578 corpora. In addition to its good performance, the new method also has the merit of easy and reliable derivation of the clustering results.
Article
We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods, which can be viewed as an extension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for solving large-scale problems even with dense matrix data. However, such methods are also known to converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA which is shown to be faster than ISTA by several orders of magnitude.
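A minimal sketch of FISTA applied to an ℓ1-regularized least-squares problem (Python/NumPy assumed; step-size and iteration counts are illustrative):

```python
import numpy as np

def fista_lasso(A, b, lam, n_iter=200):
    """FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        z = y - grad / L
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)              # momentum step
        x, t = x_new, t_new
    return x
```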
Article
Previous studies have demonstrated that document clustering performance can be improved significantly in lower dimensional linear subspaces. Recently, matrix factorization-based techniques, such as Nonnegative Matrix Factorization (NMF) and Concept Factorization (CF), have yielded impressive results. However, both of them effectively see only the global euclidean geometry, whereas the local manifold geometry is not fully considered. In this paper, we propose a new approach to extract the document concepts which are consistent with the manifold geometry such that each concept corresponds to a connected component. Central to our approach is a graph model which captures the local geometry of the document submanifold. Thus, we call it Locally Consistent Concept Factorization (LCCF). By using the graph Laplacian to smooth the document-to-concept mapping, LCCF can extract concepts with respect to the intrinsic manifold structure and thus documents associated with the same concept can be well clustered. The experimental results on TDT2 and Reuters-21578 have shown that the proposed approach provides a better representation and achieves better clustering results in terms of accuracy and mutual information.
Article
Relative to a given convex body C, a j-simplex S in C is largest if it has maximum volume (j-measure) among all j-simplices contained in C, and S is stable (resp. rigid) if vol(S) ≥ vol(S′) (resp. vol(S) > vol(S′)) for each j-simplex S′ that is obtained from S by moving a single vertex of S to a new position in C. This paper contains a variety of qualitative results that are related to the problems of finding a largest, a stable, or a rigid j-simplex in a given n-dimensional convex body or convex polytope. In particular, the computational complexity of these problems is studied both for 𝒱-polytopes (presented as the convex hull of a finite set of points) and for ℋ-polytopes (presented as an intersection of finitely many half-spaces).
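For reference, the j-measure referred to here is the usual simplex volume, which can be computed from the vertices via a Gram determinant; a minimal sketch (Python/NumPy assumed; names are illustrative):

```python
import numpy as np
from math import factorial

def simplex_volume(V):
    """j-dimensional volume of the simplex whose vertices are the columns of
    V (shape m x (j+1)): vol = sqrt(det(E^T E)) / j!, where E collects the
    edge vectors from the first vertex."""
    E = V[:, 1:] - V[:, [0]]
    j = E.shape[1]
    gram = E.T @ E
    return np.sqrt(max(np.linalg.det(gram), 0.0)) / factorial(j)
```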
Book
Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.
Article
Nonnegative matrix factorization (NMF) with minimum-volume-constraint (MVC) is exploited in this paper. Our results show that MVC can actually improve the sparseness of the results of NMF. This sparseness is L(0)-norm oriented and can give desirable results even in very weak sparseness situations, thereby leading to the significantly enhanced ability of learning parts of NMF. The close relation between NMF, sparse NMF, and the MVC_NMF is discussed first. Then two algorithms are proposed to solve the MVC_NMF model. One is called quadratic programming_MVC_NMF (QP_MVC_NMF) which is based on quadratic programming and the other is called negative glow_MVC_NMF (NG_MVC_NMF) because it uses multiplicative updates incorporating natural gradient ingeniously. The QP_MVC_NMF algorithm is quite efficient for small-scale problems and the NG_MVC_NMF algorithm is more suitable for large-scale problems. Simulations show the efficiency and validity of the proposed methods in applications of blind source separation and human face images analysis.
Article
This paper is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the L1 norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
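A minimal sketch of the Principal Component Pursuit decomposition described in this abstract, using a standard inexact augmented-Lagrangian iteration with singular-value and elementwise soft-thresholding (Python/NumPy assumed; the default λ follows the usual 1/√max(m,n) weighting, while μ and the fixed iteration count are illustrative heuristics rather than the paper's prescription):

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Elementwise soft-thresholding: prox of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def principal_component_pursuit(X, lam=None, mu=None, n_iter=100):
    """Sketch of min ||L||_* + lam*||S||_1 subject to L + S = X."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))                    # common default weight
    if mu is None:
        mu = 0.25 * m * n / (np.abs(X).sum() + 1e-12)     # illustrative heuristic
    S = np.zeros_like(X)
    Y = np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)                 # low-rank update
        S = soft(X - L + Y / mu, lam / mu)                # sparse update
        Y += mu * (X - L - S)                             # dual ascent
    return L, S
```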