Article

A Flexible and Efficient Algorithmic Framework for Constrained Matrix and Tensor Factorization


Abstract

We propose a general algorithmic framework for constrained matrix and tensor factorization, which is widely used in machine learning. The new framework is a hybrid between alternating optimization (AO) and the alternating direction method of multipliers (ADMM): each matrix factor is updated in turn, using ADMM, hence the name AO-ADMM. This combination can naturally accommodate a great variety of constraints on the factor matrices, and almost all possible loss measures for the fitting. Computation caching and warm start strategies are used to ensure that each update is evaluated efficiently, while the outer AO framework exploits recent developments in block coordinate descent (BCD)-type methods which help ensure that every limit point is a stationary point, as well as faster and more robust convergence in practice. Three special cases are studied in detail: non-negative matrix/tensor factorization, constrained matrix/tensor completion, and dictionary learning. Extensive simulations and experiments with real data are used to showcase the effectiveness and broad applicability of the proposed framework.


... In (Huang, Sidiropoulos, & Liavas, 2016), the authors applied ADMM to the task of finding a CP decomposition that satisfies particular constraints such as non-negativity, sparsity, etc. They called this approach AO-ADMM, since one of its building blocks is Alternating Optimization (in particular, Alternating Least Squares), which is used to find the CP factors. ...
... Finding a decomposition that produces less error after quantization can be formulated as a constrained tensor factorization problem. A method called AO-ADMM (a hybrid of Alternating Optimization and the Alternating Direction Method of Multipliers) has been shown (Huang et al., 2016) to successfully solve problems that involve non-negativity, sparsity or simplex constraints. In our work we extend its field of application by introducing a constraint function that ensures low quantization error of the derived factors, and we construct a corresponding algorithm. ...
... For the matrix factorization X ≈ AB^T, G is the Gram matrix of the factor that is fixed at the current iteration of Alternating Least Squares, and K is the original matrix multiplied by the fixed factor (A^T A and X^T A from (13)). We adopt the expressions for r, s and ρ from (Huang et al., 2016). The task for the remaining factor is solved similarly, by minimizing ‖X − ÃB^T‖_F² over the set Q. ...
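The excerpt above describes the core AO-ADMM subproblem: with one factor fixed, each update needs only the cached Gram matrix G = A^T A and the cross-term K = X^T A. The following NumPy sketch of such an inner ADMM update is our own illustration, loosely following Huang et al. (2016), with nonnegativity standing in for the set Q; the function name, the trace-based penalty, and the residual scalings are simplifications, not the authors' code.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def admm_factor_update(G, K, Bt, U, n_iters=20, eps=1e-4):
    """One AO-ADMM subproblem: min_B 0.5*||X - A B^T||_F^2  s.t.  B in Q,
    with Q = {B >= 0} as an example. G = A^T A and K = X^T A are cached;
    Bt (split variable) and U (scaled dual) are warm-started across the
    outer AO iterations."""
    R = G.shape[0]
    rho = np.trace(G) / R                        # penalty heuristic: ||A||_F^2 / R
    L = cho_factor(G + rho * np.eye(R))          # factorize once, reuse below
    for _ in range(n_iters):
        # least-squares step: solve B (G + rho*I) = K + rho*(Bt - U)
        B = cho_solve(L, (K + rho * (Bt - U)).T).T
        Bt_prev = Bt
        Bt = np.maximum(B + U, 0.0)              # proximal step: project onto Q
        U = U + B - Bt                           # scaled dual ascent
        r = np.linalg.norm(B - Bt) / (np.linalg.norm(B) + 1e-12)        # primal residual
        s = np.linalg.norm(Bt - Bt_prev) / (np.linalg.norm(U) + 1e-12)  # dual residual
        if r < eps and s < eps:
            break
    return Bt, U
```

Warm-starting Bt and U from the previous outer iteration, together with the one-time Cholesky factorization, is what keeps the inner loop cheap in practice.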
Article
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to the memory and power consumption limitations of mobile or embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance tradeoff.
... In the ensuing subsection, the proposed solver is developed using alternating optimization with ADMM intermediate steps; see e.g. [42] and [43]. ...
... Following the steps in [43], an auxiliary variable Ā is introduced to account for the nonnegativity constraint, and the augmented Lagrangian of (4) is ...
... until a convergence criterion is met, namely whether the maximum number of iterations is exceeded, i.e., r = I_max,ADMM, or a prescribed ε-accuracy is met. In (6), [·]_+ denotes the element-wise projection of the input matrix onto the positive orthant, and its use enables the Ā^(r) update to be carried out at a very low cost. The penalty parameter is set to ρ = ‖H_A‖_F²/K, a value that is empirically shown to yield similar performance to that of the optimal value [43]. The final A^(r) iterate in the ADMM solver will be used to update A^(k). ...
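For concreteness, the two ingredients this excerpt highlights, the penalty heuristic and the element-wise projection, are both one-liners; a hedged sketch with our own names:

```python
import numpy as np

def rho_heuristic(H_A, K):
    # rho = ||H_A||_F^2 / K, empirically close to the optimal penalty [43]
    return np.linalg.norm(H_A, 'fro') ** 2 / K

def proj_positive_orthant(M):
    # [M]_+ : element-wise projection onto the positive orthant, so each
    # split-variable update costs only a single pass over the entries
    return np.maximum(M, 0.0)
```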
Preprint
Detection of overlapping communities in real-world networks is a generally challenging task. Upon recognizing that a network is in fact the union of its egonets, a novel network representation using multi-way data structures is advocated in this contribution. The introduced sparse tensor-based representation exhibits richer structure compared to its matrix counterpart, and thus enables a more robust approach to community detection. To leverage this structure, a constrained tensor approximation framework is introduced using PARAFAC decomposition. The arising constrained trilinear optimization is handled via alternating minimization, where intermediate subproblems are solved using the alternating direction method of multipliers (ADMM) to ensure convergence. The factors obtained provide soft community memberships, which can further be exploited for crisp, possibly overlapping community assignments. The framework is further broadened to include time-varying graphs, where the edge set as well as the underlying communities evolve through time. Performance of the proposed approach is assessed via tests on benchmark synthetic graphs as well as real-world networks. As corroborated by numerical tests, the proposed tensor-based representation captures multi-hop nodal connections, that is, connectivity patterns within single-hop neighbors, whose exploitation yields a more robust community identification in the presence of mixing as well as overlapping communities.
... Due to the inherent non-uniqueness and instability encountered in solving the inverse problem, it is an effective strategy to constrain the solution space by incorporating the prior information of signals. Low-rankness, as a prior, has been widely utilized for extracting the internal structure of signals in various real-world applications [18][19][20][21][22][23][24][25][26][27][28][29]. ...
... where C_k = Φ(X_k) (see Theorem 3). The model (31) serves as a general format representing both model (26) and model (27): it is equivalent to model (26) when T_k is an orthogonal transform, and to model (27) when T_k is a semi-orthogonal transform. To convert problem (31) into an unconstrained problem, we introduce the indicator functions as ...
Article
Full-text available
Recently, tensor singular value decomposition (t-SVD), based on the tensor-tensor product (t-product), has become a powerful tool for processing third-order tensor data. However, constrained by the fact that the basic element in the t-product is the fiber (i.e., vector), higher-order tensor data (i.e., order d > 3) are usually unfolded into third-order tensors to satisfy the classical t-product setting, which destroys the high-dimensional structure. By revisiting the basic element in the t-product, we suggest a generalized t-product called the element-based tensor-tensor product (elt-product) as an alternative to the classic t-product, where the basic element is a (d-2)th-order tensor instead of a vector. The benefit of the elt-product is that it can better preserve high-dimensional structures and can explore more complex interactions via higher-order convolution instead of the first-order convolution in the classic t-product. Starting from the elt-product, we develop a new tensor SVD and low-rank tensor metrics (e.g., rank and nuclear norm). Equipped with the suggested metrics, we present a tensor completion model for high-order tensor data and prove exact recovery guarantees. To tackle the resulting nonconvex optimization problem, we apply an alternating direction method of multipliers (ADMM) algorithm with a theoretical convergence guarantee. Extensive experimental results on simulated and real-world data (color videos, light-field images, light-field videos, and traffic data) demonstrate the superiority of the proposed model against state-of-the-art baseline models.
... This constraint allows better convergence, especially when the columns of the factors are collinear. The nonnegativity constraint can be imposed on the factors during optimization by projecting the values of each factor matrix onto R_+ [33]. This method of applying the nonnegativity constraint sometimes shows its limits in certain cases due to the change of scale. ...
... Nonnegativity can also be imposed by choosing an appropriate optimization method [14,17,35] or even by using an exponential change of variable [31]. In the literature, the projection method is the most commonly used, as in approaches [14,17,18,33,36,37]. In signal or data processing, the nonnegative CPD is usually used as a mathematical model to fit a data tensor T of size (I, J, K): ...
... This is also the case in fluorescence spectroscopy, where the data from the sensor is nonnegative [4,30,31,47]. The nonnegativity constraint can be imposed on the factors during optimization by projecting the values of each factor matrix onto R_+ [33]. This method of imposing nonnegativity sometimes shows its limits in certain cases because of the change of scale. ...
Chapter
Full-text available
The canonical polyadic decomposition (CPD) is now widely used in signal processing to decompose multi-way arrays. In many applications, it is important to add constraints to quickly converge on an optimal solution. In contrast to classical CPD, we then focus on online CPD. In this context, the number of relevant factors is usually unknown and can vary with time. We propose two algorithms to compute the online CPD based on sparse dictionary learning. We also introduce an application example in environmental sciences and evaluate the performances of the proposed approaches in this context on real data.
... ADMM in its simplest form is summarized in Algorithm 1. The subproblems for each constrained factor matrix (or coupled factor matrices) are transformed into ADMM form by introducing split variables Z which separate the factorization from the constraint, as first proposed for constrained CP decompositions in [48]. For instance, for uncoupled factor matrix A in model (4), the problem takes the form ...
... For others, efficient estimation algorithms exist, as shown in [50,51] and here¹. This use of proximal operators underpins the flexibility of AO-ADMM, as it allows constraints and regularizations to be applied in a mix-and-match, plug-and-play fashion [48]. ...
... Following the argument in [41,48], we choose an individual step-size parameter ρ^c_k for each k = 1, ..., K and set ...
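The mix-and-match flexibility referred to above comes from the fact that each constraint or regularizer enters the inner ADMM only through its proximal operator. A small illustrative catalogue, in our own notation (the simplex case follows the standard sorting-based projection):

```python
import numpy as np

def prox_nonneg(V, t):
    return np.maximum(V, 0.0)                          # indicator of {V >= 0}

def prox_l1(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)  # soft-thresholding

def prox_simplex(V, t):
    # project each column of V onto the probability simplex
    def proj(y):
        u = np.sort(y)[::-1]
        css = np.cumsum(u) - 1.0
        idx = np.nonzero(u - css / np.arange(1, y.size + 1) > 0)[0][-1]
        return np.maximum(y - css[idx] / (idx + 1.0), 0.0)
    return np.apply_along_axis(proj, 0, V)

# swapping the constraint on a factor amounts to swapping which prox is called
PROX = {"nonneg": prox_nonneg, "l1": prox_l1, "simplex": prox_simplex}
```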
Preprint
Data fusion models based on Coupled Matrix and Tensor Factorizations (CMTF) have been effective tools for joint analysis of data from multiple sources. While the vast majority of CMTF models are based on the strictly multilinear CANDECOMP/PARAFAC (CP) tensor model, recently the more flexible PARAFAC2 model has also been integrated into CMTF models. PARAFAC2 tensor models can handle irregular/ragged tensors and have been shown to be especially useful for modelling dynamic data with unaligned or irregular time profiles. However, existing PARAFAC2-based CMTF models have limitations in terms of the possible regularizations on the factors and/or the types of coupling between datasets. To address these limitations, in this paper we introduce a flexible algorithmic framework that fits PARAFAC2-based CMTF models using Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM). The proposed framework makes it possible to impose various constraints on all modes and linear couplings to other matrix-, CP- or PARAFAC2-models. Experiments on various simulated datasets and a real dataset demonstrate the utility and versatility of the proposed framework, as well as its benefits in terms of accuracy and efficiency in comparison with state-of-the-art methods.
... As the native structure of our data is a tensor, we consider the problem in terms of tensor decomposition [41], which is the natural framework for processing multimodal data in the signal and image processing community [47]-[49]. There are many types of decomposition, such as Tucker decomposition, block term decomposition (BTD), canonical polyadic decomposition (CPD), etc. [50]. ...
... In [24], the nonnegative CPD is computed using the projected compressed ALS (ProCo-ALS) algorithm, which is considerably fast [47] but not very flexible with additional constraints. Finally, in [29]-[31], an alternative algorithm is proposed based on the alternating optimization - alternating direction method of multipliers (AO-ADMM) [49] with compression and nonnegativity constraints, which is flexible and stable with large datasets, but has not yet addressed MultiHU-TD, which requires further modeling (i.e., sparsity, ASC). ...
... • We propose a methodological framework for dealing with MultiHU-TD based on the AO-ADMM framework of Huang et al. [49], and expand it to incorporate the ASC with joint nonnegativity and sparsity. The proposed AO-ADMM-ASC is a general algorithm that can be applied in other domains of BSS where convex combinations of sources apply. ...
Article
Full-text available
Hyperspectral unmixing makes it possible to represent mixed pixels as a set of pure materials weighted by their abundances. Spectral features alone are often insufficient, so it is common to rely on other features of the scene. Matrix models become insufficient when the hyperspectral image is represented as a high-order tensor with additional features in a multimodal, multi-feature framework. Tensor models such as the canonical polyadic decomposition allow for this kind of unmixing but lack a general framework and interpretability of the results. In this paper, we propose an interpretable methodological framework for low-rank multi-feature hyperspectral unmixing based on tensor decomposition (MultiHU-TD) that incorporates the abundance sum-to-one constraint in the alternating optimization ADMM (AO-ADMM) algorithm, and we provide in-depth mathematical, physical and graphical interpretations and connections with the extended linear mixing model. As additional features, we propose to incorporate mathematical morphology and to reframe a previous work on neighborhood patches within MultiHU-TD. Experiments on real hyperspectral images showcase the interpretability of the model and the analysis of the results.
... In [9], the authors applied ADMM to the task of finding a CP decomposition that satisfies particular constraints such as non-negativity, sparsity, etc. They called this approach AO-ADMM, since one of its building blocks is Alternating Optimization (in particular, Alternating Least Squares), which is used to find the CP factors. ...
... Finding a decomposition that produces less error after quantization can be formulated as a constrained tensor factorization problem. A method called AO-ADMM (a hybrid of Alternating Optimization and the Alternating Direction Method of Multipliers) has been shown [9] to successfully solve problems that involve non-negativity, sparsity or simplex constraints. In our work we extend its field of application by introducing a constraint function that ensures low quantization error of the derived factors, and we construct a corresponding algorithm. ...
... Overall, the ADMM procedure for the factor B is defined in Algorithm 2. For the matrix factorization X ≈ AB^T, G is the Gram matrix of the factor that is fixed at the current iteration of Alternating Least Squares, and K is the original matrix multiplied by the fixed factor (A^T A and X^T A from (13)). We adopt the expressions for r, s and ρ from [9]. The task for the remaining factor is solved similarly, by minimizing ‖X − ÃB^T‖_F² over the set Q. ...
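Here the feasible set Q is a quantization grid, so the proximal step of the inner ADMM reduces to element-wise rounding onto that grid. A minimal sketch under the assumption of a uniform symmetric grid (the paper's actual grid parameterization may differ):

```python
import numpy as np

def project_to_grid(B, step, n_levels=256):
    """Element-wise projection onto the uniform grid
    {-step*(n_levels/2), ..., -step, 0, step, ..., step*(n_levels/2 - 1)};
    a hypothetical stand-in for the constraint set Q."""
    q = np.clip(np.round(B / step), -(n_levels // 2), n_levels // 2 - 1)
    return q * step
```

Plugged in as the prox, the rest of the AO-ADMM machinery (Gram caching, residuals r and s, penalty ρ) is unchanged.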
Preprint
Full-text available
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to the memory and power consumption limitations of mobile or embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance tradeoff.
... Then Pock and Sabach [10] introduced an inertial variant of PALM (iPALM). Huang et al. [11] introduced a primal-dual algorithm, AO-ADMM, which is a hybrid between alternating optimization and ADMM. There are also some other first-order-type algorithms [12,13] and (quasi-)second-order methods [14,15]. ...
... In this paper, instead of using the vanilla stochastic gradient (11) directly, we employ variance-reduced stochastic gradient estimators such as SAGA [58] and SARAH [59]. SAGA is unbiased, while SARAH is biased, but both are computed based on (11). Despite the potential computational overhead and storage requirements associated with variance-reduced stochastic gradient methods, they offer the advantage of not requiring stepsize tuning, unlike their vanilla stochastic gradient counterparts. ...
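As a reminder of how such estimators work, here is a generic, textbook-style SAGA sketch (our own illustration, not the paper's implementation): the fresh component gradient is debiased by the stored past gradient of the same component plus the running average of all stored gradients.

```python
import numpy as np

def saga_estimator(grad_i, i, table):
    """Unbiased SAGA gradient estimate for component i.
    table[i] holds the last gradient computed for component i."""
    g = grad_i - table[i] + table.mean(axis=0)
    table[i] = grad_i            # refresh the stored gradient for component i
    return g
```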
Article
Full-text available
The majority of classic tensor CP decomposition models are designed for squared loss, utilizing Euclidean distance as a local proximal term. However, the Euclidean distance is unsuitable for the generalized loss function applicable to diverse types of real-world data, such as integer and binary data. Consequently, algorithms developed under the squared loss are not easily adaptable to handle these generalized losses, partially due to the absence of the gradient Lipschitz continuity. This paper explores generalized tensor CP decomposition, employing the Bregman distance as the proximal term and introducing an inertial accelerated block randomized stochastic mirror descent algorithm (iTableSMD). Within a broader multi-block variance reduction and inertial acceleration framework, we demonstrate the sublinear convergence rate for the subsequential sequence produced by the iTableSMD algorithm. We further show that iTableSMD requires at most O(ε^{-2}) iterations in expectation to attain an ε-stationary point and establish the global convergence of the sequence. Numerical experiments on real datasets demonstrate that our proposed algorithm is efficient and achieves better performance than the existing state-of-the-art methods.
... The algorithm iterates between the two steps (Equations 2 and 3) until convergence is reached. AO-ADMM: Recently, a hybrid algorithmic framework, AO-ADMM [19], was proposed for constrained CP factorization based on alternating optimization (AO) and the alternating direction method of multipliers (ADMM). Under this approach, each factor matrix is updated iteratively using ADMM while the other factors are fixed. ...
... We use the AO-ADMM approach [19] to compute H, V, and W, where S_k = diag(W(k, :)) and W ∈ R^{K×R}. Each factor matrix update is converted to a constrained matrix factorization problem by performing the mode-n matricization Y_(n) in Equation 7. As the updates for H, W, and V take similar forms, we illustrate the steps for updating W. Thus, the equivalent objective for W using the 3rd-mode matricization of Y (Y_(3) ∈ R^{K×RJ}) is: ...
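In matricized form, the W-subproblem has exactly the structure the earlier matrix excerpts exploit: a small Gram matrix (a Hadamard product of factor Grams) and an MTTKRP playing the role of K. A hedged NumPy sketch under the usual Kolda-Bader unfolding convention (variable names are ours):

```python
import numpy as np

def khatri_rao(A, B):
    # column-wise Kronecker product, (I*J) x R, with B's index varying fastest
    I, R = A.shape
    J = B.shape[0]
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def w_subproblem_pieces(Y3, H, V):
    """For Y_(3) ~ W * khatri_rao(V, H)^T: the R x R Gram matrix
    G = (H^T H) * (V^T V) and the MTTKRP K = Y_(3) khatri_rao(V, H).
    These feed the same cached-Cholesky ADMM update as the matrix case."""
    G = (H.T @ H) * (V.T @ V)     # Hadamard product of factor Grams
    K = Y3 @ khatri_rao(V, H)     # K x R
    return G, K
```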
Preprint
PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with varying numbers of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise, which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and non-negativity, need to be imposed for interpretable temporal modeling, and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and the alternating direction method of multipliers (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36 times faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Through a case study on temporal phenotyping of medically complex children, we demonstrate how the constraints imposed by COPA reveal concise phenotypes and meaningful temporal profiles of patients. The clinical interpretation of both the phenotypes and the temporal profiles was confirmed by a medical expert.
... Computation of ∂A/∂X is fairly straightforward using PyTorch. More details on combining gradients from implicit differentiation in Jax [4,32] with gradients from the PyTorch computation are given in Section A. Once we precompute the "concept bank" for the base classes using Equations 5 and 6, we fix Q and only compute the optimal coefficients P(x) for any input x using NNLS ...
... For the τ function, it corresponds to randomly choosing 10 cropped patches of size 18 × 18 on CIFAR-100 and miniImageNet, and a patch size of 64 × 64 for CUB-200. We did not use the scikit-learn implementation [55] of NMF; we leverage the work of [19,32], which uses a Jax [4,55] implementation of ADMM [5] via the Jaxopt library. We convert the Jax array to a tensor array so it can be combined with the tensor array obtained from PyTorch for the gradient. Training details: data augmentation strategies like random crop, horizontal flip, rotation, brightness variation, cutout, resizing, flipping and color jittering were all applied following recent works [56,68,80]. ...
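For readers who want a dependency-light stand-in for the Jax/Jaxopt ADMM solver mentioned above, the per-input coefficient problem is plain NNLS; a minimal SciPy sketch (names hypothetical):

```python
import numpy as np
from scipy.optimize import nnls

def concept_coefficients(Q, x):
    """Solve min_p ||Q p - x||_2 subject to p >= 0, for a fixed concept
    bank Q (d x R) and an input feature vector x (d,)."""
    p, _residual = nnls(Q, x)
    return p
```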
Preprint
Full-text available
Few Shot Class Incremental Learning (FSCIL), with few examples per class for each incremental session, is the realistic setting of continual learning, since obtaining a large number of annotated samples is neither feasible nor cost-effective. We present the framework MASIL as a step towards learning a maximally separable classifier. It addresses the common problems of forgetting old classes and over-fitting to novel classes by learning classifier weights that are maximally separable between classes, forming a simplex Equiangular Tight Frame. We propose the idea of concept factorization, explaining the collapsed features for base-session classes in terms of a concept basis, and use these to induce the classifier simplex for few-shot classes. We further add fine-tuning to reduce any errors that occur during factorization and train the classifier jointly on base and novel classes without retaining any base-class samples in memory. Experimental results on miniImageNet, CIFAR-100 and CUB-200 demonstrate that MASIL outperforms all the benchmarks.
... To be specific, the spatial smoothing MUSIC method incorporates the ideas of spatial smoothing and coprime subarray decomposition, while the coarray ESPRIT method applies spatial smoothing to the second-order coarray statistics for decorrelated DOA estimation. The RMSE of DOA estimation defined in (47) is used as the evaluation metric, where the real and imaginary parts of β are both randomly generated following the zero-mean unit-variance Gaussian distribution for each trial. ...
... represents the singular value matrix, where η_l denotes the l-th singular value of X, l ∈ {1, 2, ..., min(X_1, X_2)}, diag(·) forms a diagonal matrix from its arguments, and min(·) and max(·) respectively represent the minimum and maximum operators. Note that the convergence of (51) can be accelerated by adaptively increasing the penalty constant ρ [31,47]. In our simulations, we iteratively increase ρ by multiplying it with a positive constant u > 1, i.e., ρ^(n_A+1) = u·ρ^(n_A). ...
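The nuclear-norm subproblem referred to here is typically handled by singular value thresholding, and the adaptive-penalty trick is a one-line multiplicative update; a minimal sketch of both (our own, with arbitrary constants):

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the prox of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rho, u = 1e-2, 1.05            # penalty and its growth factor u > 1
for _ in range(50):
    # ... ADMM primal/dual updates would call svt(., 1.0 / rho) here ...
    rho *= u                   # rho^(n_A+1) = u * rho^(n_A)
```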
Article
Full-text available
Sparse array direction-of-arrival (DOA) estimation using tensor model has been developed to handle multi-dimensional sub-Nyquist sampled signals. Furthermore, augmented virtual arrays can be derived for Nyquist-matched coarray tensor processing. However, the partially augmentable sparse array corresponds to a discontinuous virtual array, whereas the existing methods can only utilize its continuous part. Conventional virtual linear array interpolation techniques complete coarray covariance matrices with dispersed missing elements, but fail to complete the coarray tensor with whole missing slices. In this paper, we propose a coarray tensor completion algorithm for two-dimensional DOA estimation, where the coarray tensor statistics can be entirely exploited. In particular, in order to impose an effective low-rank regularization on the slice-missing coarray tensor, we propose shift dimensional augmenting and coarray tensor reshaping approaches to reformulate a structured coarray tensor with sufficiently dispersed missing elements. Furthermore, the shape of the reformulated coarray tensor is optimized by maximizing the dispersion-to-percentage ratio of missing elements. As such, a coarray tensor nuclear norm minimization problem can be designed to optimize the completed coarray tensor corresponding to a filled virtual array, based on which the closed-form DOA estimation is achieved. Meanwhile, the global convergence of the coarray tensor completion is theoretically proved. Simulation results demonstrate the effectiveness of the proposed algorithm compared to other matrix-based and tensor-based methods.
... The work [23] established sequence convergence for general alternating minimization algorithms with an additional proximal term and a boundedness assumption on the iterates. When specialized to NMF, as pointed out in [29], with the aid of an additional proximal term as well as an additional constraint bounding the factors, the sequence convergence of ANLS and HALS can be established from [23], [30]. Although the convergence of these algorithms is observed without the proximal term and the boundedness constraint (which are indeed not used in practice), these are in general necessary to formally show the convergence of the algorithms. ...
... We emphasize that the specific structure within (3) enables Theorem 3 to dispense with both the assumption on the boundedness of the iterates {(U_k, V_k)}_{k≥0} and the requirement of an additional proximal term, which are usually required for convergence analysis though not necessary in practice [23], [32]. For example, the previous work [29] provides a convergence guarantee for the standard ANLS when used to solve the general nonsymmetric NMF (1) by adding an additional proximal term as well as an additional constraint to force the factors to be bounded. To establish convergence of the standard HALS for solving (1) [15], [22], one needs the assumption that every column in (U_k, V_k) stays away from zero through all iterations. ...
Preprint
Full-text available
The symmetric Nonnegative Matrix Factorization (NMF), a special but important class of the general NMF, has found numerous applications in data analysis such as various clustering tasks. Unfortunately, designing fast algorithms for the symmetric NMF is not as easy as for its nonsymmetric counterpart, since the latter admits the splitting property that allows state-of-the-art alternating-type algorithms. To overcome this issue, we first split the decision variable and transform the symmetric NMF to a penalized nonsymmetric one, paving the way for designing efficient alternating-type algorithms. We then show that solving the penalized nonsymmetric reformulation returns a solution to the original symmetric NMF. Moreover, we design a family of alternating-type algorithms and show that they all admit strong convergence guarantee: the generated sequence of iterates is convergent and converges at least sublinearly to a critical point of the original symmetric NMF. Finally, we conduct experiments on both synthetic data and real image clustering to support our theoretical results and demonstrate the performance of the alternating-type algorithms.
... We solve this optimization problem using an Alternating Optimization - Alternating Direction Method of Multipliers (AO-ADMM) based algorithm [21,20]. This algorithmic framework is highly flexible, allowing for the straightforward inclusion of various types of regularization in all modes to accommodate different application requirements. ...
Preprint
Multiway datasets are commonly analyzed using unsupervised matrix and tensor factorization methods to reveal underlying patterns. Frequently, such datasets include timestamps and could correspond to, for example, health-related measurements of subjects collected over time. The temporal dimension is inherently different from the other dimensions, requiring methods that account for its intrinsic properties. Linear Dynamical Systems (LDS) are specifically designed to capture sequential dependencies in the observed data. In this work, we bridge the gap between tensor factorizations and dynamical modeling by exploring the relationship between LDS, Coupled Matrix Factorizations (CMF) and the PARAFAC2 model. We propose a time-aware coupled factorization model called d(ynamical)CMF that constrains the temporal evolution of the latent factors to adhere to a specific LDS structure. Using synthetic datasets, we compare the performance of dCMF with PARAFAC2 and t(emporal)PARAFAC2, which incorporates temporal smoothness. Our results show that dCMF and PARAFAC2-based approaches perform similarly when capturing smoothly evolving patterns that adhere to the PARAFAC2 structure. However, dCMF outperforms alternatives when the patterns evolve smoothly but deviate from the PARAFAC2 structure. Furthermore, we demonstrate that the proposed dCMF method makes it possible to capture more complex dynamics when additional prior information about the temporal evolution is incorporated.
... Furthermore, ADMM has been applied to generalized versions of NMF where the objective function is the general beta-divergence [26]. A hybrid alternating optimization and ADMM method was proposed for NMF, as well as tensor factorization, under a variety of constraints and loss measures in [27]. However, despite the promising numerical results, none of the works discussed above has rigorous theoretical justification for SymNMF. ...
Preprint
Symmetric nonnegative matrix factorization (SymNMF) has important applications in data analytics problems such as document clustering, community detection and image segmentation. In this paper, we propose a novel nonconvex variable splitting method for solving SymNMF. The proposed algorithm is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the nonconvex SymNMF problem. Furthermore, it achieves a global sublinear convergence rate. We also show that the algorithm can be efficiently implemented in parallel. Further, sufficient conditions are provided which guarantee the global and local optimality of the obtained solutions. Extensive numerical results performed on both synthetic and real data sets suggest that the proposed algorithm converges quickly to a local minimum solution.
... Both Problems (16) and (17) are linearly constrained quadratic programs, and can be solved to optimality by many standard solvers. Here, we propose to employ the Alternating Direction Method of Multipliers (ADMM) to solve these two sub-problems because of its flexibility and effectiveness in handling large-scale tensor decomposition [30], [31]. Details of the ADMM algorithm for solving Problems (16)-(17) can be found in Appendix B. The whole procedure is listed in Algorithm 1. ...
Preprint
Estimating the joint probability mass function (PMF) of a set of random variables lies at the heart of statistical learning and signal processing. Without structural assumptions, such as modeling the variables as a Markov chain, tree, or other graphical model, joint PMF estimation is often considered mission impossible - the number of unknowns grows exponentially with the number of variables. But who gives us the structural model? Is there a generic, `non-parametric' way to control joint PMF complexity without relying on a priori structural assumptions regarding the underlying probability model? Is it possible to discover the operational structure without biasing the analysis up front? What if we only observe random subsets of the variables, can we still reliably estimate the joint PMF of all? This paper shows, perhaps surprisingly, that if the joint PMF of any three variables can be estimated, then the joint PMF of all the variables can be provably recovered under relatively mild conditions. The result is reminiscent of Kolmogorov's extension theorem - consistent specification of lower-dimensional distributions induces a unique probability measure for the entire process. The difference is that for processes of limited complexity (rank of the high-dimensional PMF) it is possible to obtain complete characterization from only three-dimensional distributions. In fact not all three-dimensional PMFs are needed; and under more stringent conditions even two-dimensional will do. Exploiting multilinear algebra, this paper proves that such higher-dimensional PMF completion can be guaranteed - several pertinent identifiability results are derived. It also provides a practical and efficient algorithm to carry out the recovery task. Judiciously designed simulations and real-data experiments on movie recommendation and data classification are presented to showcase the effectiveness of the approach.
... where X_(1) is a matrix unfolding of the tensor X. There are three matrix unfoldings of this three-way tensor that admit similar model expressions (because one can permute modes and A, B, C accordingly). Like NMF, PARAFAC is NP-hard in general, but there exist many algorithms offering good performance and flexibility to incorporate constraints, e.g., [31], [32]. Our work brings these factor analysis models together with a variety of clustering tools, ranging from K-means to K-subspace [21], [33] clustering, to devise novel joint factorization and latent clustering formulations and companion algorithms that outperform the prior art, including two-step and joint approaches such as RKM and FKM. ...
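For completeness, the matrix unfoldings referred to here are simple reshapes; a sketch of the mode-n matricization in the Kolda-Bader convention (Fortran-order reshape so that lower-numbered modes vary fastest):

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization X_(n): the mode-n fibers of T become the
    columns of the result (Kolda-Bader convention)."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

# e.g. for a three-way CP model, X_(1) is approximated by A (C kr B)^T,
# where "kr" is the Khatri-Rao product; permuting modes and A, B, C
# accordingly gives the analogous X_(2) and X_(3) expressions
```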
Preprint
Dimensionality reduction techniques play an essential role in data analytics, signal processing and machine learning. Dimensionality reduction is usually performed in a preprocessing stage that is separate from subsequent data analysis, such as clustering or classification. Finding reduced-dimension representations that are well-suited for the intended task is more appealing. This paper proposes a joint factor analysis and latent clustering framework, which aims at learning cluster-aware low-dimensional representations of matrix and tensor data. The proposed approach leverages matrix and tensor factorization models that produce essentially unique latent representations of the data to unravel latent cluster structure -- which is otherwise obscured because of the freedom to apply an oblique transformation in latent space. At the same time, latent cluster structure is used as prior information to enhance the performance of factorization. Specific contributions include several custom-built problem formulations, corresponding algorithms, and discussion of associated convergence properties. Besides extensive simulations, real-world datasets such as Reuters document data and MNIST image data are also employed to showcase the effectiveness of the proposed approaches.
... The problem (7) is then amenable to the alternating direction method of multipliers (ADMM), a successful approach for tensor decomposition with additional assumptions [34], [41]. We thus obtain λ̂, Q̂, and Q̂′ via ADMM, which yields our estimate Q̂, from which we derive R̂ and then P̂. ...
Preprint
Full-text available
This work presents a low-rank tensor model for multi-dimensional Markov chains. A common approach to simplify the dynamical behavior of a Markov chain is to impose low-rankness on the transition probability matrix. Inspired by the success of these matrix techniques, we present low-rank tensors for representing transition probabilities on multi-dimensional state spaces. Through tensor decomposition, we provide a connection between our method and classical probabilistic models. Moreover, our proposed model yields a parsimonious representation with fewer parameters than matrix-based approaches. Unlike these methods, which impose low-rankness uniformly across all states, our tensor method accounts for the multi-dimensionality of the state space. We also propose an optimization-based approach to estimate a Markov model as a low-rank tensor. Our optimization problem can be solved by the alternating direction method of multipliers (ADMM), which enjoys convergence to a stationary solution. We empirically demonstrate that our tensor model estimates Markov chains more efficiently than conventional techniques, requiring both fewer samples and parameters. We perform numerical simulations for both a synthetic low-rank Markov chain and a real-world example with New York City taxi data, showcasing the advantages of multi-dimensionality for modeling state spaces.
... In addressing infeasibility issues within AO and SCA algorithms, common strategies include feasibility checks, warm starts, and scaling methods [45]. Scaling adjusts constraints to handle poorly scaled optimization problems, while warm starts utilize previous feasible solutions for quicker convergence [46,47]. ...
Article
Full-text available
Reflecting intelligent surface (RIS) is one of the key enabling technologies for beyond-fifth-generation wireless networks to further improve the coverage, spectral efficiency and energy efficiency of wireless networks. Recently, a novel simultaneous transmission and reflection RIS (STAR-RIS) technology has been introduced, which enables more degrees of freedom in simultaneously serving users in the reflection and transmission regions of the RIS. This paper investigates a STAR-RIS-assisted two-user full-duplex communication system, wherein the multi-antenna base station (BS) and single-antenna users are subject to maximum power constraints. Firstly, weighted sum rates are computed for three well-known STAR-RIS protocols, namely energy splitting, mode selection, and time splitting. Then, for each of the cases, the weighted sum rate optimization problem is investigated to find optimal resource allocations, i.e. the base station's precoding and combining matrices, the coefficients of the STAR-RIS, and the power allocations. The optimization problem is transformed into multiple convex sub-problems using equivalent weighted minimum mean-square-error forms. Also, using the successive convex approximation method, the tunable parameters of the STAR-RIS are optimized. Although the original problem is non-convex, the proposed iterative alternating technique achieves acceptable sub-optimal performance. Finally, the performance of the system is analysed and compared to some baselines to highlight the superiority of the STAR-RIS compared to conventional RIS schemes.
... For the first experiment, we ran ZIPTF without consensus aggregation to evaluate the advantages of using the ZIP model alone. In this comparative analysis, we first compare our method with the traditional non-Bayesian tensor method, Non-Negative CP decomposition via Alternating Least Squares (NNCP-ALS), along with Sparse PARAFAC, which efficiently handles sparsity with L2 regularization [10,11,44]. Afterward, we transition to Bayesian methods, starting with Bayesian Tensor Factorization using a Truncated Gaussian Model (TGTF) [12]. ...
Article
Full-text available
In the past two decades, genomics has advanced significantly, with single-cell RNA-sequencing (scRNA-seq) marking a pivotal milestone. ScRNA-seq provides unparalleled insights into cellular diversity and has spurred diverse studies across multiple conditions and samples, resulting in an influx of complex multidimensional genomics data. This highlights the need for robust methodologies capable of handling the complexity and multidimensionality of such genomics data. Furthermore, single-cell data grapples with sparsity due to issues like low capture efficiency and dropout effects. Tensor factorizations (TF) have emerged as powerful tools to unravel the complex patterns from multi-dimensional genomics data. Classic TF methods, based on maximum likelihood estimation, struggle with zero-inflated count data, while the inherent stochasticity in TFs further complicates result interpretation and reproducibility. Our paper introduces Zero Inflated Poisson Tensor Factorization (ZIPTF), a novel method for high-dimensional zero-inflated count data factorization. We also present Consensus-ZIPTF (C-ZIPTF), merging ZIPTF with a consensus-based approach to address stochasticity. We evaluate our proposed methods on synthetic zero-inflated count data, simulated scRNA-seq data, and real multi-sample multi-condition scRNA-seq datasets. ZIPTF consistently outperforms baseline matrix and tensor factorization methods, displaying enhanced reconstruction accuracy for zero-inflated data. When dealing with high probabilities of excess zeros, ZIPTF achieves up to 2.4× better accuracy. Moreover, C-ZIPTF notably enhances the factorization's consistency. When tested on synthetic and real scRNA-seq data, ZIPTF and C-ZIPTF consistently uncover known and biologically meaningful gene expression programs. Access our data and code at: https://github.com/klarman-cell-observatory/scBTF and https://github.com/klarman-cell-observatory/scbtf_experiments.
... Bearing this concept in mind, the final bilinear model is obtained through the smart combination of multiple factorizations issued from the analysis of the complete submultisets forming the original incomplete structure. The multiple-factorization concept has also been used in other contexts, such as in Coupled Matrix and Tensor Factorization (CMTF) methods, another example where the pieces of information to be connected, i.e., matrices and tensors, do not have an equivalent data configuration [24,25]. ...
... The question of how to solve (1.3) has been studied extensively in different communities. These include, for instance, multiplicative updates [26], hierarchical alternating least-squares [51], alternating direction method of multipliers [18] related to non-negative matrix factorization, or more general interior-point methods [48] for quadratic programs; see, e.g., [5], [22,Section 5.6], and [10,Chapter 4] for overviews. Extending vanilla alternating non-negative strategies, further acceleration and extrapolation methods are developed in order to improve (empirical) convergence speed for alternating non-negative matrix and tensor factorization; see, e.g., [47,Section 3.4] as well as [29,31] for some recent works. ...
... The factorization then approximates the auxiliary tensor, while the auxiliary tensor is fitted to the data tensor at the known entries. This strategy was first proposed in the AO-ADMM framework [33] for constrained CP models. It can be considered a variation of the EM approach described above, where the missing entries are imputed in each inner ADMM iteration instead of after one full outer AO iteration as in Algorithm 3. The approach has been extended to PARAFAC2 models with missing entries in REPAIR [24], which addresses the additional problem of erroneous entries alongside missing data. ...
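The imputation step itself is a single masked assignment, repeated in every inner ADMM iteration with the current model reconstruction; a minimal sketch (our own names):

```python
import numpy as np

def impute(X, mask, X_hat):
    """Keep observed entries of X (mask == True) and fill the missing
    ones from the current model reconstruction X_hat."""
    return np.where(mask, X, X_hat)
```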
Preprint
Tensor factorizations have been widely used for the task of uncovering patterns in various domains. Often, the input is time-evolving, shifting the goal to tracking the evolution of underlying patterns instead. To adapt to this more complex setting, existing methods incorporate temporal regularization but they either have overly constrained structural requirements or lack uniqueness which is crucial for interpretation. In this paper, in order to capture the underlying evolving patterns, we introduce t(emporal)PARAFAC2 which utilizes temporal smoothness regularization on the evolving factors. We propose an algorithmic framework that employs Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM) to fit the model. Furthermore, we extend the algorithmic framework to the case of partially observed data. Our numerical experiments on both simulated and real datasets demonstrate the effectiveness of the temporal smoothness regularization, in particular, in the case of data with missing entries. We also provide an extensive comparison of different approaches for handling missing data within the proposed framework.
... This is an interesting but challenging modification of the reconstruction, which can be computationally expensive. Finally, another possible improvement would be to use an AO-ADMM solver (Huang et al., 2016; Roald et al., 2022), which is known to increase the stability of tensor decomposition (Becker et al., 2023). ...
Article
Full-text available
Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records. However, this task becomes significantly more difficult when the data follows complex temporal patterns. This paper introduces the notion of a temporal phenotype as an arrangement of features over time, and it proposes SWoTTeD (Sliding Window for Temporal Tensor Decomposition), a novel method to discover hidden temporal patterns. SWoTTeD integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes. We validate our proposal using both synthetic and real-world datasets, and we present an original use case using data from the Greater Paris University Hospital. The results show that SWoTTeD achieves reconstruction at least as accurate as recent state-of-the-art tensor decomposition models, and extracts temporal phenotypes that are meaningful for clinicians.
... The BCD-based NCP algorithm is favourable for decomposing large-scale tensor data, especially those contaminated by considerable noise. To efficiently solve the constrained tensor decomposition, numerous optimization methods have been developed, such as the alternating proximal gradient (APG) [5]-[7], alternating nonnegative quadratic programming (ANQP) [8], [9] and alternating optimization-based alternating direction method of multipliers (AO-ADMM) [10]. ...
Article
Nonnegative CANDECOMP/PARAFAC (NCP) tensor decomposition is a powerful tool for multiway signal processing. The optimization algorithm alternating direction method of multipliers (ADMM) has become increasingly popular for solving tensor decomposition problems in the block coordinate descent framework. However, the ADMM-based NCP suffers from rank deficiency and slow convergence for some large-scale and highly sparse tensor data. The proximal algorithm is preferred to enhance optimization algorithms and improve convergence properties. In this study, we propose a novel NCP algorithm using the alternating direction proximal method of multipliers (ADPMM) that consists of the proximal algorithm. The proposed NCP algorithm can guarantee convergence and overcome the rank deficiency. Moreover, we implement the proposed NCP using an inexact scheme that alternatively optimizes the subproblems. Each subproblem is optimized by a finite number of inner iterations yielding fast computation speed. Our NCP algorithm is a hybrid of alternating optimization and ADPMM and is named A²DPMM. The experimental results on synthetic and real-world tensors demonstrate the effectiveness and efficiency of our proposed algorithm.
... We solve the problem in (2) using the AO-ADMM framework, first introduced for matrix and tensor decompositions in [19] and extended for PARAFAC2 in [18]. Contrary to the standard Alternating Least Squares-based algorithm for PARAFAC2 [17], AO-ADMM makes it possible to impose proximable constraints on any factor, including those subject to the PARAFAC2 constraint (1). ...
Preprint
Time-evolving data sets can often be arranged as a higher-order tensor with one of the modes being the time mode. While tensor factorizations have been successfully used to capture the underlying patterns in such higher-order data sets, the temporal aspect is often ignored, allowing for the reordering of time points. In recent studies, temporal regularizers are incorporated in the time mode to tackle this issue. Nevertheless, existing approaches still do not allow underlying patterns to change in time (e.g., spatial changes in the brain, contextual changes in topics). In this paper, we propose temporal PARAFAC2 (tPARAFAC2): a PARAFAC2-based tensor factorization method with temporal regularization to extract gradually evolving patterns from temporal data. Through extensive experiments on synthetic data, we demonstrate that tPARAFAC2 can capture the underlying evolving patterns accurately performing better than PARAFAC2 and coupled matrix factorization with temporal smoothness regularization.
... Because tensor completion is an ill-posed problem, we consider the prior structure in the target tensor to narrow down the solution set. The completed values should be consistent with the properties of the analyzed data, and commonly used prior structures include smoothness [15], [53], [48], nonnegativity [17], [43], [39], sparsity [23], low-rankness [14], [24], etc. ...
Article
Full-text available
Tensor completion is the problem of filling in missing parts of multidimensional data using the values of the reference elements. Recently, the Multiway Delay-embedding Transform (MDT), which considers a low-dimensional space in a delay-embedded space with high expressive capability, has attracted attention as a tensor completion method. Although MDT has high completion performance, its computational cost is considerably high. Therefore, we propose a new model, called smooth convolutional tensor factorization (SCTF), for tensor completion based on a delay-embedded space. The proposed method has low computational complexity because of its concise rank-1 decomposition model in the delay-embedded space, and because it does not directly perform optimization in the delay-embedded space. In addition, a smoothness constraint term is imposed on the factor tensors as a prior data structure in the optimization to further improve the completion accuracy. In our experiments, we completed image data with clipped and random missing entries and confirmed that the proposed method achieves high completion accuracy without high computational cost.
... The framework accommodates linear couplings with (multiple) matrix- or CP-decompositions (Fig. 1 and 2), and a variety of possible constraints and regularizations on all modes. Our algorithmic approach builds on the AO-ADMM algorithm [19] for constrained PARAFAC2, which allows for any proximal constraint in any mode, and the flexible framework for CP-based CMTF [20,21]. Using numerical experiments, we demonstrate the flexibility and accuracy of the proposed approach with different constraints and linear couplings. ...
... By exploring the block proximal update with Nesterov-based acceleration [14] or block prox-linear update with extrapolation [13], fast algorithms were developed. Furthermore, a hybrid scheme [15] combining alternating direction method of multipliers (ADMM) and alternating optimization (AO), and a framework [16] integrating randomized BCD and stochastic proximal gradient (SPG), were proposed to enable the nonnegative factor matrix learning with high computational speed and low memory requirement. ...
Article
Full-text available
Recently, Bayesian modeling and variational inference (VI) have been leveraged to enable nonnegative factor matrix learning with automatic rank determination in tensor canonical polyadic decomposition (CPD), which has found various applications in big data analytics. However, since VI inherently performs block coordinate descent (BCD) steps over the functional space, it generally does not allow integration with modern large-scale optimization methods, making scalability a critical issue. In this paper, it is revealed that the expectations of the variables updated by the VI algorithm are equivalent to the block minimization steps of a deterministic optimization problem. This equivalence further enables the adoption of an inexact BCD method for devising a fast nonnegative factor matrix learning algorithm with automatic tensor rank determination. Numerical results using synthetic data and real-world applications show that the performance of the proposed algorithm is comparable with that of the VI-based algorithm, but with significantly reduced computation times.
Preprint
Full-text available
Despite the fundamental importance of clustering, to this day, much of the relevant research is still based on ambiguous foundations, leading to an unclear understanding of whether or how the various clustering methods are connected with each other. In this work, we provide an additional stepping stone towards resolving such ambiguities by presenting a general clustering framework that subsumes a series of seemingly disparate clustering methods, including various methods belonging to the wildly popular spectral clustering framework. In fact, the generality of the proposed framework additionally sheds light on the largely unexplored area of multi-view graphs in which each view may have differently clustered nodes. In turn, we propose GenClus: a method that is simultaneously an instance of this framework and a generalization of spectral clustering, while also being closely related to k-means. This results in a principled alternative to the few existing methods studying this special type of multi-view graphs. Then, we conduct in-depth experiments, which demonstrate that GenClus is more computationally efficient than existing methods, while also attaining similar or better clustering performance. Lastly, a qualitative real-world case study further demonstrates the ability of GenClus to produce meaningful clusterings.
Preprint
Many contemporary signal processing, machine learning and wireless communication applications can be formulated as nonconvex nonsmooth optimization problems. Often there is a lack of efficient algorithms for these problems, especially when the optimization variables are nonlinearly coupled in some nonconvex constraints. In this work, we propose an algorithm named penalty dual decomposition (PDD) for these difficult problems and discuss its various applications. The PDD is a double-loop iterative algorithm. Its inner iterations are used to inexactly solve a nonconvex nonsmooth augmented Lagrangian problem via block-coordinate-descent-type methods, while its outer iteration updates the dual variables and/or a penalty parameter. In Part I of this work, we describe the PDD algorithm and rigorously establish its convergence to KKT solutions. In Part II we evaluate the performance of PDD by customizing it to three applications arising from signal processing and wireless communications.
Article
Tensor ring (TR) decomposition demonstrates superior performance in handling high-order tensors. However, traditional TR-based decomposition algorithms face limitations in real-world applications due to large data sizes, missing entries, and outlier corruption. To address these challenges, we propose a scalable and robust TR decomposition algorithm for large-scale tensor data that effectively handles missing entries and gross corruptions. Our method introduces a novel auto-weighted scaled steepest descent approach that adaptively identifies outliers and completes missing entries during decomposition. Additionally, leveraging the tensor ring decomposition model, we develop a Fast Gram Matrix Computation (FGMC) technique and a Randomized Subtensor Sketching (RStS) strategy, significantly reducing storage and computational complexity. Experimental results demonstrate that the proposed method outperforms existing TR decomposition and tensor completion methods.
Article
The problem of nonconvex and nonsmooth optimization (NNO) has been extensively studied in the machine learning community, leading to the development of numerous fast and convergent numerical algorithms. Existing algorithms typically employ unified iteration schemes and require explicit solutions to subproblems to ensure convergence. However, these inflexible iteration schemes overlook task-specific details and may encounter difficulties in providing explicit solutions to subproblems. In contrast, there is evidence suggesting that practical applications can benefit from approximately solving subproblems; however, many existing works fail to establish the theoretical validity of such approximations. In this paper, the authors propose a hybrid inexact proximal alternating method (hiPAM), which addresses a general NNO problem with coupled terms while overcoming all of the aforementioned challenges. The proposed hiPAM algorithm offers a flexible yet highly efficient approach by seamlessly integrating any efficient method for approximate subproblem solving that caters to task specifics. The authors also devise a simple yet implementable stopping criterion that generates a Cauchy sequence and ultimately converges to a critical point of the original NNO problem. Numerical experiments using both simulated and real data demonstrate that hiPAM is an exceedingly efficient and robust approach to NNO problems.
Article
Tensor decomposition is an essential tool for multiway signal processing. At present, large-scale high-order tensor data require fast and efficient decomposing algorithms. In this paper, we propose accelerated regularized tensor decomposition algorithms using the alternating direction method of multipliers with multiple Nesterov’s extrapolations in the block coordinate descent framework. We implement the acceleration in three cases: only in the inner loop, only in the outer loop, and in both the inner and outer loops. Adaptive safeguard strategies are developed following the acceleration to guarantee monotonic convergence. Afterwards, we utilize the proposed algorithms to accelerate two types of conventional decomposition: nonnegative CANDECOMP/PARAFAC (NCP) and sparse CANDECOMP/PARAFAC (SCP). The experimental results on synthetic and real-world tensors demonstrate that the proposed algorithms achieve significant acceleration effects and outperform state-of-the-art algorithms. The accelerated algorithm with extrapolations in both the inner and outer loops has the fastest convergence speed and takes almost one-third of the running time of typical algorithms.
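For readers unfamiliar with the extrapolation step such accelerated BCD schemes rely on, the sketch below shows one common Nesterov-style momentum rule in Python; the momentum schedule and names are illustrative assumptions, not this paper's exact update.

    import numpy as np

    def extrapolated_factor(A, A_prev, t_prev):
        # One generic Nesterov-style extrapolation step for a factor
        # matrix inside a BCD loop (illustrative; the paper pairs this
        # with adaptive safeguards to keep the objective monotone).
        t = (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0  # momentum schedule
        w = (t_prev - 1.0) / t                               # extrapolation weight
        return A + w * (A - A_prev), t

A typical safeguard, in line with the abstract's "adaptive safeguard strategies", evaluates the objective after each extrapolated update and falls back to the plain update whenever the objective increases.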
Conference Paper
In this paper, we study three recently proposed probability mass function (PMF) estimation methods for flow cytometry data analysis. By modeling the PMFs as a mixture of simpler distributions, we can reformulate the PMF estimation problem as three different tensor-based approaches: a least squares coupled tensor factorization approach, a least squares partially coupled tensor factorization approach, and a Kullback-Leibler divergence (KLD)-based expectation-maximization (EM) approach. In the coupled methods, the full PMF is estimated from lower-order empirical marginal distributions, while the EM approach estimates the full PMF directly from the observed data. The three approaches are evaluated in the context of simulated and real data experiments.
Chapter
Complex-valued shift-invariant canonical polyadic decomposition (CPD) under a spatial phase sparsity constraint (pcsCPD) shows satisfying separation performance when decomposing three-way multi-subject fMRI data into shared spatial maps (SMs), shared time courses (TCs), time delays and subject-specific intensities. However, pcsCPD exploits an alternating least squares (ALS) updating rule, which converges slowly and requires data strictly conforming to the shift-invariant CPD model. As lower-rank approximation can relax the CPD model, we propose to improve pcsCPD with rank-R and rank-1 ALS to further relax the shift-invariant CPD model. The proposed method first updates the shared SMs and the aggregating mixing matrix, which contains the information of the shared TCs, time delays and subject-specific intensities, using rank-R ALS. The shared SMs are then updated by exploiting the phase sparsity constraint. We further update the shared TCs, time delays and subject-specific intensities of each component by rank-1 ALS on the matrix constructed from each column of the aggregating mixing matrix, at each iteration until convergence. Experimental results from simulated and real fMRI data demonstrate that the proposed method achieves better separation performance than pcsCPD and widely used tensor-based spatial independent component analysis, suggesting the efficacy of relaxing the shift-invariant CPD modelling of multi-subject fMRI data. Keywords: CPD, ALS, fMRI, shift-invariance, phase sparsity constraint
Chapter
In previous chapters, the probabilistic modeling and inference algorithms for the unconstrained CPD are discussed. In practical data analysis, one usually has additional prior structural information about the factor matrices, e.g., nonnegativity and orthogonality. Encoding this structural information into the probabilistic tensor modeling while still achieving tractable inference remains a critical challenge. In this chapter, we introduce the development of Bayesian tensor CPD with nonnegative factors, with an integrated feature of automatic tensor rank learning. We also connect the algorithm to inexact block coordinate descent (BCD) to obtain a fast algorithm.
Article
Coupled matrix factorization (CMF) models jointly decompose a collection of matrices with one shared mode. For interpretable decompositions, constraints are often needed, and variations of constrained CMF models have been used in various fields, including data mining, chemometrics and remote sensing. Although such models are broadly used, there is a lack of easy-to-use, documented, and open-source implementations for fitting CMFs with user-specified constraints on all modes. We address this need with MatCoupLy, a Python package that implements a state-of-the-art algorithm for CMF and PARAFAC2 that supports any proximable constraint on any mode. This paper outlines the functionality of MatCoupLy, including three examples demonstrating the flexibility and extendibility of the package.
Article
Full-text available
Hematopoiesis is a progressive process collectively controlled by an elaborate network of transcription factors (TFs). Among these TFs, GATA2 has been implicated as critical for regulating multiple steps of hematopoiesis in mouse models. However, whether a similar function of GATA2 is conserved in human hematopoiesis, especially during the early embryonic development stage, is largely unknown. To examine the role of GATA2 in a human context, we generated homozygous GATA2 knockout human embryonic stem cells (GATA2(-/-) hESCs) and analyzed their blood differentiation potential. Our results demonstrated that GATA2(-/-) hESCs displayed attenuated generation of CD34(+)CD43(+) hematopoietic progenitor cells (HPCs), due to the impairment of the endothelial to hematopoietic transition (EHT). Interestingly, GATA2(-/-) hESCs retained the potential to generate erythroblasts and macrophages, but never granulocytes. We further identified that SPI1 downregulation was partially responsible for the defects of GATA2(-/-) hESCs in the generation of CD34(+)CD43(+) HPCs and granulocytes. Furthermore, we found that GATA2(-/-) hESCs restored granulocyte potential in the presence of Notch signaling. Our findings revealed the essential roles of GATA2 in EHT and granulocyte development through regulating SPI1, and uncovered a role of Notch signaling in granulocyte generation during hematopoiesis modeled by human ESCs.
Article
Full-text available
We consider factoring low-rank tensors in the presence of outlying slabs. This problem is important in practice, because data collected in many real-world applications, such as speech, fluorescence, and some social network data, fit this paradigm. Prior work tackles this problem by iteratively selecting a fixed number of slabs and fitting, a procedure which may not converge. We formulate this problem from a group-sparsity promoting point of view, and propose an alternating optimization framework to handle the corresponding ℓp (0 < p ≤ 1) minimization-based low-rank tensor factorization problem. The proposed algorithm features a similar per-iteration complexity as the plain trilinear alternating least squares (TALS) algorithm. Convergence of the proposed algorithm is also easy to analyze under the framework of alternating optimization and its variants. In addition, regularization and constraints can be easily incorporated to make use of a priori information on the latent loading factors. Simulations and real data experiments on blind speech separation, fluorescence data analysis, and social network mining are used to showcase the effectiveness of the proposed algorithm.
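For intuition, group-sparsity-promoting ℓp fitting over slabs is commonly handled by iterative reweighting; the sketch below shows a generic IRLS weight rule (an assumption for illustration, not necessarily the authors' exact update) under which outlying slabs are downweighted in the next factor update.

    import numpy as np

    def slab_weights(residual_norms, p=0.5, eps=1e-9):
        # IRLS-style weights for minimizing sum_k (||r_k||^2 + eps)^(p/2):
        # slabs with large residuals (likely outliers) receive small
        # weights in the next weighted least squares factor update.
        return (residual_norms ** 2 + eps) ** (p / 2.0 - 1.0)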
Conference Paper
Full-text available
Low-rank tensor decomposition has many applications in signal processing and machine learning, and is becoming increasingly important for analyzing big data. A significant challenge is the computation of intermediate products which can be much larger than the final result of the computation, or even the original tensor. We propose a scheme that allows memory-efficient in-place updates of intermediate matrices. Motivated by recent advances in big tensor decomposition from multiple compressed replicas, we also consider the related problem of memory-efficient tensor compression. The resulting algorithms can be parallelized, and can exploit but do not require sparsity.
Conference Paper
Full-text available
Multi-dimensional arrays, or tensors, are increasingly found in fields such as signal processing and recommender systems. Real-world tensors can be enormous in size and often very sparse. There is a need for efficient, high-performance tools capable of processing the massive sparse tensors of today and the future. This paper introduces SPLATT, a C library with shared-memory parallelism for three-mode tensors. SPLATT contains algorithmic improvements over competing state-of-the-art tools for sparse tensor factorization. SPLATT has a fast, parallel method of multiplying a matricized tensor by a Khatri-Rao product, which is a key kernel in tensor factorization methods. SPLATT uses a novel data structure that exploits the sparsity patterns of tensors. This data structure has a small memory footprint similar to competing methods and allows for the computational improvements featured in our work. We also present a method of finding cache-friendly reorderings and utilizing them with a novel form of cache tiling. To our knowledge, this is the first work to investigate reordering and cache tiling in this context. SPLATT averages almost 30× speedup compared to our baseline when using 16 threads and reaches over 80× speedup on NELL-2.
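The matricized-tensor-times-Khatri-Rao-product (MTTKRP) kernel that SPLATT accelerates can be written naively for a coordinate-format sparse tensor as follows; this Python reference sketch conveys only the computation, not SPLATT's tiled and reordered C implementation.

    import numpy as np

    def mttkrp_coo(indices, values, factors, mode):
        # Naive MTTKRP for a sparse tensor in coordinate (COO) format.
        # indices: (nnz, N) integer array of nonzero coordinates
        # values:  (nnz,) array of nonzero values
        # factors: list of N factor matrices, factors[n] of shape (I_n, R)
        # Returns the (I_mode, R) result that ALS-type methods need.
        nnz, N = indices.shape
        R = factors[0].shape[1]
        out = np.zeros((factors[mode].shape[0], R))
        for e in range(nnz):
            row = values[e] * np.ones(R)
            for n in range(N):
                if n != mode:
                    row *= factors[n][indices[e, n]]
            out[indices[e, mode]] += row
        return out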
Article
Full-text available
This paper considers regularized block multi-convex optimization, where the feasible set and objective function are generally non-convex but convex in each block of variables. We review some of its interesting examples and propose a generalized block coordinate descent method. Under certain conditions, we show that any limit point satisfies the Nash equilibrium conditions. Furthermore, we establish its global convergence and estimate its asymptotic convergence rate by assuming a property based on the Kurdyka-Lojasiewicz inequality. The proposed algorithms are adapted for factorizing nonnegative matrices and tensors, as well as completing them from their incomplete observations. The algorithms were tested on synthetic data, hyperspectral data, as well as image sets from the CBCL and ORL databases. Compared to the existing state-of-the-art algorithms, the proposed algorithms demonstrate superior performance in both speed and solution quality. The Matlab code for nonnegative matrix/tensor decomposition and completion, along with a few demos, is accessible from the authors' homepages.
Conference Paper
Full-text available
In this paper we consider the dictionary learning problem for sparse representation. We first show that this problem is NP-hard and then propose an efficient dictionary learning scheme to solve several practical formulations of this problem. Unlike many existing algorithms in the literature, such as K-SVD, our proposed dictionary learning scheme is theoretically guaranteed to converge to the set of stationary points under certain mild assumptions. For the image denoising application, the performance and the efficiency of the proposed dictionary learning scheme are comparable to those of the K-SVD algorithm in simulation.
Article
Full-text available
We present a technique for significantly speeding up Alternating Least Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor factorization. By exploiting properties of the Khatri-Rao product, we show how to efficiently address a computationally challenging sub-step of both algorithms. Our algorithm, DFacTo, only requires two sparse matrix-vector products and is easy to parallelize. DFacTo is not only scalable but also on average 4 to 10 times faster than competing algorithms on a variety of datasets. For instance, DFacTo only takes 480 seconds on 4 machines to perform one iteration of the ALS algorithm and 1,143 seconds to perform one iteration of the GD algorithm on a 6.5 million x 2.5 million x 1.5 million dimensional tensor with 1.2 billion non-zero entries.
Article
Full-text available
Many data are modeled as tensors, or multi-dimensional arrays. Examples include the predicates (subject, verb, object) in knowledge bases, hyperlinks and anchor texts in Web graphs, sensor streams (time, location, and type), social networks over time, and DBLP conference-author-keyword relations. Tensor decomposition is an important data mining tool with various applications including clustering, trend detection, and anomaly detection. However, current tensor decomposition algorithms do not scale to large tensors with mode sizes in the billions and hundreds of millions of nonzeros: the largest tensors handled in the literature remain in the thousands per mode with hundreds of thousands of nonzeros. Consider a knowledge base tensor consisting of about 26 million noun-phrases. The intermediate data explosion problem, associated with naive implementations of tensor decomposition algorithms, would require the materialization and storage of a matrix whose largest dimension would be ≈7 x 10¹⁴; this amounts to ~10 Petabytes, or equivalently a few data centers' worth of storage, thereby rendering the tensor analysis of this knowledge base, in the naive way, practically impossible. In this paper, we propose GIGATENSOR, a scalable distributed algorithm for large-scale tensor decomposition. GIGATENSOR exploits the sparseness of real-world tensors, and avoids the intermediate data explosion problem by carefully redesigning the tensor decomposition algorithm. Extensive experiments show that our proposed GIGATENSOR solves problems 100 times bigger than existing methods can. Furthermore, we employ GIGATENSOR to analyze a very large real-world knowledge base tensor and present our astounding findings, which include the discovery of potential synonyms among millions of noun-phrases (e.g., the noun 'pollutant' and the noun-phrase 'greenhouse gases').
Article
Full-text available
We generalize Kruskal's fundamental result on the uniqueness of trilinear decomposition of three-way arrays to the case of multilinear decomposition of four- and higher-way arrays. The result is surprisingly general and simple and has several interesting ramifications.
Article
Full-text available
The alternating direction method of multipliers (ADMM) has emerged as a powerful technique for large-scale structured optimization. Despite many recent results on the convergence properties of ADMM, a quantitative characterization of the impact of the algorithm parameters on the convergence times of the method is still lacking. In this paper we find the optimal algorithm parameters that minimize the convergence factor of the ADMM iterates in the context of l2-regularized minimization and constrained quadratic programming. Numerical examples show that our parameter selection rules significantly outperform existing alternatives in the literature.
Article
Full-text available
CANDECOMP/PARAFAC (CP) has found numerous applications in a wide variety of areas such as chemometrics, telecommunications, data mining, neuroscience, and separated representations. For an order-N tensor, most CP algorithms can be computationally demanding due to the computation of gradients, which involve products between tensor unfoldings and Khatri-Rao products of all factor matrices except one. These products constitute the largest workload in most CP algorithms. In this paper, we propose a fast method to deal with this issue. The method also reduces the extra memory requirements of CP algorithms. As a result, we can accelerate the standard alternating CP algorithms 20-30 times for order-5 and order-6 tensors, and even higher ratios can be obtained for higher-order tensors (e.g., N ≥ 10). The proposed method is more efficient than the state-of-the-art ALS algorithm which operates on two modes at a time (ALSo2) in the Eigenvector PLS toolbox, especially for tensors with order N ≥ 5 and high rank. Index Terms: CANDECOMP/PARAFAC (CP), tensor factorization, canonical decomposition, gradient, ALS.
Article
Full-text available
Demixing problems in many areas such as hyperspectral imaging and differential optical absorption spectroscopy (DOAS) often require finding sparse nonnegative linear combinations of dictionary elements that match observed data. We show how aspects of these problems, such as misalignment of DOAS references and uncertainty in hyperspectral endmembers, can be modeled by expanding the dictionary with grouped elements and imposing a structured sparsity assumption that the combinations within each group should be sparse or even 1-sparse. If the dictionary is highly coherent, it is difficult to obtain good solutions using convex or greedy methods, such as non-negative least squares (NNLS) or orthogonal matching pursuit. We use penalties related to the Hoyer measure, which is the ratio of the ℓ1 and ℓ2 norms, as sparsity penalties to be added to the objective in NNLS-type models. For solving the resulting nonconvex models, we propose a scaled gradient projection algorithm that requires solving a sequence of strongly convex quadratic programs. We discuss its close connections to convex splitting methods and difference-of-convex programming. We also present promising numerical results for example DOAS analysis and hyperspectral demixing problems.
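The Hoyer measure mentioned here has a simple closed form; a minimal sketch (the epsilon guard is an illustrative addition for numerical safety):

    import numpy as np

    def hoyer_ratio(x, eps=1e-12):
        # The l1/l2 ratio used as a (nonconvex) sparsity penalty:
        # smallest for 1-sparse vectors, largest when all entries
        # have equal magnitude.
        x = np.abs(x)
        return x.sum() / (np.linalg.norm(x) + eps)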
Article
Full-text available
In this paper a modification of the standard algorithm for non-negativity-constrained linear least squares regression is proposed. The algorithm is specifically designed for use in multiway decomposition methods such as PARAFAC and N-mode principal component analysis. In those methods the typical situation is that there is a high ratio between the number of objects and the number of variables in the regression problems solved. Furthermore, very similar regression problems are solved many times during the iterative procedures used. The algorithm proposed is based on the de facto standard NNLS algorithm of Lawson and Hanson, but modified to take advantage of the special characteristics of iterative algorithms involving repeated use of non-negativity constraints. The principle behind the NNLS algorithm is described in detail, and a comparison is made between the standard algorithm and the new algorithm, called FNNLS (fast NNLS).
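The setting FNNLS targets, many NNLS problems sharing one tall design matrix, looks like this in Python. SciPy's nnls implements the Lawson-Hanson active-set method; the cross-product caching noted in the comment is what FNNLS exploits internally. Matrix shapes here are illustrative assumptions.

    import numpy as np
    from scipy.optimize import nnls  # Lawson-Hanson active-set NNLS

    rng = np.random.default_rng(0)
    A = rng.random((100, 5))    # tall design matrix: many objects, few variables
    B = rng.random((100, 50))   # many right-hand sides, as in PARAFAC updates
    # FNNLS precomputes A.T @ A and A.T @ B once and reuses them across
    # columns; the plain solver below redoes that work for every column.
    X = np.column_stack([nnls(A, B[:, j])[0] for j in range(B.shape[1])])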
Article
Full-text available
We introduce an efficient algorithm for computing a low-rank non-negative CANDECOMP/PARAFAC (NNCP) decomposition. In text mining, signal processing, and computer vision, among other areas, imposing non-negativity constraints on low-rank factors has been shown to be an effective technique providing physically meaningful interpretation. A principled methodology for computing NNCP is alternating nonnegative least squares, in which nonnegativity-constrained least squares (NNLS) problems are solved in each iteration. In this chapter, we propose to solve the NNLS problems using the block principal pivoting method. The block principal pivoting method overcomes some difficulties of the classical active set method for NNLS problems with many variables. We introduce techniques to accelerate the block principal pivoting method for multiple right-hand sides, which is typical in NNCP computation. Computational experiments show the state-of-the-art performance of the proposed method.
Conference Paper
Full-text available
In this paper we propose an algorithm to estimate missing values in tensors of visual data. The values can be missing due to problems in the acquisition process, or because the user manually identified unwanted outliers. Our algorithm works even with a small amount of samples and can propagate structure to fill larger missing regions. Our methodology is built on recent studies of matrix completion using the matrix trace norm. The contribution of our paper is to extend the matrix case to the tensor case by laying out the theoretical foundations and then building a working algorithm. First, we propose a definition for the tensor trace norm that generalizes the established definition of the matrix trace norm. Second, similar to matrix completion, tensor completion is formulated as a convex optimization problem. Unfortunately, the straightforward problem extension is significantly harder to solve than the matrix case because of the dependency among multiple constraints. To tackle this problem, we employ a relaxation technique to separate the dependent relationships and use the block coordinate descent (BCD) method to achieve a globally optimal solution. Our experiments show potential applications of our algorithm, and the quantitative evaluation indicates that our method is more accurate and robust than heuristic approaches.
Article
Full-text available
A multitude of algorithms have been developed to fit a trilinear PARAFAC model to a three-way array. Limits and advantages of some of the available methods (i.e. GRAM-DTLD, PARAFAC-ALS, ASD, SWATLD, PMF3 and dGN) are compared. The algorithms are explained in general terms together with two approaches to accelerate them: line search and compression. In order to compare the different methods, 720 sets of artificial data were generated with varying level and type of noise, collinearity of the factors and rank. Two PARAFAC models were fitted on each data set: the first having the correct number of factors F and the second with F+1 components (the objective being to assess the sensitivity of the different approaches to the over-factoring problem, i.e. when the number of extracted components exceeds the rank of the array). The algorithms have also been tested on two real data sets of fluorescence measurements, again by extracting both the right and an exceeding number of factors. The evaluations are based on: number of iterations necessary to reach convergence, time consumption, quality of the solution and amount of resources required for the calculations (primarily memory).
Article
Full-text available
This communication describes a free toolbox for MATLAB® for analysis of multiway data. The toolbox is called “The N-way Toolbox for MATLAB” and is available on the internet at http://www.models.kvl.dk/source/. This communication is by no means an attempt to summarize or review the extensive work done in multiway data analysis but is intended solely for informing the reader of the existence, functionality, and applicability of the N-way Toolbox for MATLAB.
Article
Full-text available
The non-negative matrix factorization (NMF) determines a lower-rank approximation of a matrix A ≈ WH, where a target rank k is given and nonnegativity is imposed on all components of the factors W and H. The NMF has attracted much attention for over a decade and has been successfully applied to numerous data analysis problems. In applications where the components of the data are necessarily non-negative, such as chemical concentrations in experimental results or pixels in digital images, the NMF provides a more relevant interpretation of the results, since it gives non-subtractive combinations of non-negative basis vectors. In this paper, we introduce an algorithm for the NMF based on alternating non-negativity-constrained least squares (NMF/ANLS) and the active-set-based fast algorithm for non-negativity-constrained least squares with multiple right-hand-side vectors, and discuss its convergence properties and a rigorous convergence criterion based on the Karush-Kuhn-Tucker (KKT) conditions. In addition, we also describe algorithms for sparse and regularized NMF. We show how we impose a sparsity constraint on one of the factors by ℓ1-norm minimization and discuss its convergence properties. Our algorithms are compared to other commonly used NMF algorithms in the literature on several test data sets in terms of their convergence behavior.
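A minimal sketch of the alternating scheme, using SciPy's active-set NNLS solver rather than the paper's fast block solver; the function name and the fixed iteration count (in place of the paper's KKT-based stopping criterion) are illustrative assumptions.

    import numpy as np
    from scipy.optimize import nnls

    def nmf_anls(X, k, iters=50, seed=0):
        # NMF X ~ W @ H by alternating NNLS: fix W, solve for H column
        # by column; fix H, solve for W row by row (each row of W is an
        # NNLS problem against H.T).
        rng = np.random.default_rng(seed)
        m, n = X.shape
        W = rng.random((m, k))
        H = np.zeros((k, n))
        for _ in range(iters):
            H = np.column_stack([nnls(W, X[:, j])[0] for j in range(n)])
            W = np.vstack([nnls(H.T, X[i, :])[0] for i in range(m)])
        return W, H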
Article
Full-text available
Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for ℓ1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
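As one concrete instance from the problems listed, scaled-form ADMM for the lasso takes only a few lines; this sketch caches a Cholesky factorization across iterations, in the same spirit as the caching strategies discussed elsewhere on this page. The fixed iteration count and default rho are illustrative assumptions.

    import numpy as np

    def admm_lasso(A, b, lam, rho=1.0, iters=200):
        # Scaled-form ADMM for: minimize 0.5*||Ax - b||^2 + lam*||z||_1
        # subject to x = z (a standard example from the ADMM literature).
        n = A.shape[1]
        x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
        L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))  # cached factorization
        Atb = A.T @ b
        for _ in range(iters):
            x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
            v = x + u
            z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-threshold
            u = u + x - z  # scaled dual update
        return z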
Article
Full-text available
Nonnegative matrix factorization (NMF) is a dimension reduction method that has been widely used for numerous applications, including text mining, computer vision, pattern discovery, and bioinformatics. A mathematical formulation for NMF appears as a nonconvex optimization problem, and various types of algorithms have been devised to solve the problem. The alternating nonnegative least squares (ANLS) framework is a block coordinate descent approach for solving NMF, which was recently shown to be theoretically sound and empirically efficient. In this paper, we present a novel algorithm for NMF based on the ANLS framework. Our new algorithm builds upon the block principal pivoting method for the nonnegativity-constrained least squares problem that overcomes a limitation of the active set method. We introduce ideas that efficiently extend the block principal pivoting method within the context of NMF computation. Our algorithm inherits the convergence property of the ANLS framework and can easily be extended to other constrained NMF formulations. Extensive computational comparisons using data sets that are from real life applications as well as those artificially generated show that the proposed algorithm provides state-of-the-art performance in terms of computational speed.
Conference Paper
We propose a general algorithmic framework for constrained matrix and tensor factorization, which is widely used in unsupervised learning. The new framework is a hybrid between alternating optimization (AO) and the alternating direction method of multipliers (ADMM): each matrix factor is updated in turn, using ADMM. This combination can naturally accommodate a great variety of constraints on the factor matrices, hence the term ‘universal’. Computation caching and warm start strategies are used to ensure that each update is evaluated efficiently, while the outer AO framework guarantees that the algorithm converges monotonically. Simulations on synthetic data show significantly improved performance relative to state-of-the-art algorithms.
Article
The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and bandpass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive field properties may be accounted for in terms of a strategy for producing a sparse distribution of output activity in response to natural images. Here, in addition to describing this work in a more expansive fashion, we examine the neurobiological implications of sparse coding. Of particular interest is the case when the code is overcomplete, i.e., when the number of code elements is greater than the effective dimensionality of the input space. Because the basis functions are non-orthogonal and not linearly independent of each other, sparsifying the code will recruit only those basis functions necessary for representing a given input, and so the input-output function will deviate from being purely linear. These deviations from linearity provide a potential explanation for the weak forms of non-linearity observed in the response properties of cortical simple cells, and they further make predictions about the expected interactions among units in response to naturalistic stimuli.
Article
Non-negative matrix factorization (NMF) is a recently developed technique for finding parts-based, linear representations of non-negative data. Although it has successfully been applied in several applications, it does not always result in parts-based representations. In this paper, we show how explicitly incorporating the notion of 'sparseness' improves the found decompositions. Additionally, we provide complete MATLAB code both for standard NMF and for our extension. Our hope is that this will further the application of these methods to solving novel data-analysis problems.
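The sparseness measure this paper builds on interpolates between 0 (all entries equal in magnitude) and 1 (a single nonzero entry); a short sketch of the definition:

    import numpy as np

    def sparseness(x):
        # Hoyer's sparseness measure based on the l1/l2 norm ratio:
        # returns 1 for a 1-sparse vector, 0 when all entries have
        # equal magnitude.
        n = x.size
        l1, l2 = np.abs(x).sum(), np.linalg.norm(x)
        return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)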
Article
In this paper, the term tensor refers simply to a multidimensional or N-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose storing sparse tensors using coordinate format and describe the computational efficiency of this scheme for various mathematical operations, including those typical to tensor decomposition algorithms. Second, we study factored tensors, which have the property that they can be assembled from more basic components. We consider two specific types: A Tucker tensor can be expressed as the product of a core tensor (which itself may be dense, sparse, or factored) and a matrix along each mode, and a Kruskal tensor can be expressed as the sum of rank-1 tensors. We are interested in the case where the storage of the components is less than the storage of the full tensor, and we demonstrate that many elementary operations can be computed using only the components. All of the efficiencies described in this paper are implemented in the Tensor Toolbox for MATLAB.
Article
We present structured data fusion (SDF) as a framework for the rapid prototyping of knowledge discovery in one or more possibly incomplete data sets. In SDF, each data set—stored as a dense, sparse, or incomplete tensor—is factorized with a matrix or tensor decomposition. Factorizations can be coupled, or fused, with each other by indicating which factors should be shared between data sets. At the same time, factors may be imposed to have any type of structure that can be constructed as an explicit function of some underlying variables. With the right choice of decomposition type and factor structure, even well-known matrix factorizations such as the eigenvalue decomposition, singular value decomposition and QR factorization can be computed with SDF. A domain specific language (DSL) for SDF is implemented as part of the software package Tensorlab, with which we offer a library of tensor decompositions and factor structures to choose from. The versatility of the SDF framework is demonstrated by means of four diverse applications, which are all solved entirely within Tensorlab’s DSL.
Article
In signal processing, tensor decompositions have gained in popularity over the last decade. In the meantime, the volume of data to be processed has drastically increased. This calls for novel methods to handle big data tensors. Since most of these huge data sets are issued from physical measurements, which are intrinsically real and nonnegative, being able to compress nonnegative tensors has become mandatory. Following recent works on HOSVD compression for big data, we detail solutions to decompose a nonnegative tensor into decomposable terms in a compressed domain.
Article
This monograph is about a class of optimization algorithms called proximal algorithms. Much like Newton's method is a standard tool for solving unconstrained smooth optimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but are especially well-suited to problems of substantial recent interest involving large or high-dimensional datasets. Proximal methods sit at a higher level of abstraction than classical algorithms like Newton's method: the base operation is evaluating the proximal operator of a function, which itself involves solving a small convex optimization problem. These subproblems, which generalize the problem of projecting a point onto a convex set, often admit closed-form solutions or can be solved very quickly with standard or simple specialized methods. Here, we discuss the many different interpretations of proximal operators and algorithms, describe their connections to many other topics in optimization and applied mathematics, survey some popular algorithms, and provide a large number of examples of proximal operators that commonly arise in practice.
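Two proximal operators with closed-form solutions, of the kind this monograph catalogs (illustrative Python, not code from the monograph itself):

    import numpy as np

    def prox_l1(v, t):
        # Proximal operator of t * ||.||_1: elementwise soft-thresholding.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def prox_nonneg(v):
        # Proximal operator of the indicator of the nonnegative orthant:
        # simply the Euclidean projection onto it.
        return np.maximum(v, 0.0)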
Conference Paper
Non-negative matrix factorization (NMF) is a popular method for learning interpretable features from non-negative data, such as counts or magnitudes. Different cost functions are used with NMF in different applications. We develop an algorithm, based on the alternating direction method of multipliers, that tackles NMF problems whose cost function is a beta-divergence, a broad class of divergence functions. We derive simple, closed-form updates for the most commonly used beta-divergences. We demonstrate experimentally that this algorithm has faster convergence and yields superior results to state-of-the-art algorithms for this problem.
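For reference, the beta-divergence family interpolates between several classical costs; a sketch of the elementwise definition (assuming strictly positive entries, so the logarithms are defined):

    import numpy as np

    def beta_divergence(x, y, beta):
        # Elementwise beta-divergence d_beta(x | y), assuming x, y > 0.
        # beta = 2 gives squared error, beta = 1 Kullback-Leibler,
        # beta = 0 Itakura-Saito.
        if beta == 1:
            return x * np.log(x / y) - x + y
        if beta == 0:
            return x / y - np.log(x / y) - 1.0
        return (x ** beta + (beta - 1) * y ** beta
                - beta * x * y ** (beta - 1)) / (beta * (beta - 1))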
Article
Tensor factorization has proven useful in a wide range of applications, from sensor array processing to communications, speech and audio signal processing, and machine learning. With few recent exceptions, all tensor factorization algorithms were originally developed for centralized, in-memory computation on a single machine; and the few that break away from this mold do not easily incorporate practically important constraints, such as nonnegativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration; and it naturally leads to distributed algorithms suitable for parallel implementation on regular high-performance computing (e.g., mesh) architectures. This opens the door for many emerging big data-enabled applications. The methodology is exemplified using nonnegativity as a baseline constraint, but the proposed framework can more-or-less readily incorporate many other types of constraints. Numerical experiments are very encouraging, indicating that the ADMoM-based nonnegative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches.
Article
A survey of the development of algorithms for enforcing nonnegativity constraints in scientific computation is given. Special emphasis is placed on such constraints in least squares computations in numerical linear algebra and in nonlinear optimization. Techniques involving nonnegative low-rank matrix and tensor factorizations are also emphasized. Details are provided for some important classical and modern applications in science and engineering. For completeness, this report also includes an effort towards a literature survey of the various algorithms and applications of nonnegativity constraints in numerical analysis.
Article
Higher-order tensors and their decompositions are abundantly present in domains such as signal processing (e.g., higher-order statistics [1] and sensor array processing [2]), scientific computing (e.g., discretized multivariate functions [3]-[6]), and quantum information theory (e.g., representation of quantum many-body states [7]). In many applications, the possibly huge tensors can be approximated well by compact multilinear models or decompositions. Tensor decompositions are more versatile tools than the linear models resulting from traditional matrix approaches. Compared to matrices, tensors have at least one extra dimension. The number of elements in a tensor increases exponentially with the number of dimensions, and so do the computational and memory requirements. The exponential dependency (and the problems that are caused by it) is called the curse of dimensionality. The curse limits the order of the tensors that can be handled. Even for a modest order, tensor problems are often large scale. Large tensors can be handled, and the curse can be alleviated or even removed by using a decomposition that represents the tensor instead of using the tensor itself. However, most decomposition algorithms require full tensors, which renders these algorithms infeasible for many data sets. If a tensor can be represented by a decomposition, this hypothesized structure can be exploited by using compressed sensing (CS) methods working on incomplete tensors, i.e., tensors with only a few known elements.
Article
Tensor decompositions are at the core of many blind source separation (BSS) algorithms, either explicitly or implicitly. In particular, the canonical polyadic (CP) tensor decomposition plays a central role in the identification of underdetermined mixtures. Despite some similarities, CP and singular value decomposition (SVD) are quite different. More generally, tensors and matrices enjoy different properties, as pointed out in this brief introduction.
Article
Three-way Candecomp/Parafac (CP) is a three-way generalization of principal component analysis (PCA) for matrices. Contrary to PCA, a CP decomposition is rotationally unique under mild conditions. However, a CP analysis may be hampered by the non-existence of a best-fitting CP decomposition with R ≥ 2 components. In this case, fitting CP to a three-way data array results in diverging CP components. Recently, it has been shown that this can be solved by fitting a decomposition with several interaction terms, using initial values obtained from the diverging CP decomposition. The new decomposition is called CPlimit, since it is the limit of the diverging CP decomposition. The practical merits of this procedure are demonstrated for a well-known three-way dataset of TV ratings. CPlimit finds main components with the same interpretation as Tucker models or as CP with imposed orthogonality. However, CPlimit has higher joint fit of the main components than Tucker models, contains only one small interaction term, and does not impose the unnatural constraint of orthogonality. The uniqueness properties of the CPlimit decomposition are discussed in detail.
Article
Non-negative matrix factorization (NMF) has found numerous applications, due to its ability to provide interpretable decompositions. Perhaps surprisingly, existing results regarding its uniqueness properties are rather limited, and there is much room for improvement in terms of algorithms as well. Uniqueness aspects of NMF are revisited here from a geometrical point of view. Both symmetric and asymmetric NMF are considered, the former being tantamount to element-wise non-negative square-root factorization of positive semidefinite matrices. New uniqueness results are derived; e.g., it is shown that a sufficient condition for uniqueness is that the conic hull of the latent factors is a superset of a particular second-order cone. Checking this condition is shown to be NP-complete; yet this and other results offer insights on the role of latent sparsity in this context. On the computational side, a new algorithm for symmetric NMF is proposed, which is very different from existing ones. It alternates between Procrustes rotation and projection onto the non-negative orthant to find a non-negative matrix close to the span of the dominant subspace. Simulation results show promising performance with respect to the state of the art. Finally, the new algorithm is applied to a clustering problem for co-authorship data, yielding meaningful and interpretable results.
Article
The block coordinate descent (BCD) method is widely used for minimizing a continuous function f of several block variables. At each iteration of this method, a single block of variables is optimized, while the remaining variables are held fixed. To ensure the convergence of the BCD method, the subproblem to be optimized in each iteration needs to be solved exactly to its unique optimal solution. Unfortunately, these requirements are often too restrictive for many practical scenarios. In this paper, we study an alternative inexact BCD approach which updates the variable blocks by successively minimizing a sequence of approximations of f which are either locally tight upper bounds of f or strictly convex local approximations of f. We focus on characterizing the convergence properties for a fairly wide class of such methods, especially for the cases where the objective functions are either non-differentiable or nonconvex. Our results unify and extend the existing convergence results for many classical algorithms such as the BCD method, the difference of convex functions (DC) method, the expectation maximization (EM) algorithm, as well as the alternating proximal minimization algorithm.
Article
The nonnegative tensor (matrix) factorization finds more and more applications in various disciplines, including machine learning, data mining, and blind source separation. In computation, the optimization problem involved is solved by alternately minimizing one factor while the others are fixed. To solve each subproblem efficiently, we first exploit a variable regularization term which keeps the subproblem far from ill-conditioned. Second, an augmented Lagrangian alternating direction method is employed to solve this convex and well-conditioned regularized subproblem, and two acceleration techniques are also implemented. Some preliminary numerical experiments are performed to show the improvements of the new method.
Article
This book is an introduction to the field of multi-way analysis for chemists and chemometricians. Its emphasis is on the ideas behind the method and its pratical applications. Sufficient mathematical background is given to provide a solid understanding of the ideas behind the method. There are currently no other books on the market which deal with this method from the viewpoint of its applications in chemistry. Applicable in many areas of chemistry. No comparable volume currently available. The field is becoming increasingly important.
Article
We describe efficient algorithms for projecting a vector onto the ℓ1-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors, k of whose elements are perturbed outside the ℓ1-ball, projecting in O(k log n) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with ℓ1 projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity.
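The simpler O(n log n) sort-based variant of this projection is short enough to state in full; the paper's first method replaces the full sort with randomized pivoting to reach O(n) expected time.

    import numpy as np

    def project_l1_ball(v, z=1.0):
        # Euclidean projection onto {w : ||w||_1 <= z} by sorting.
        if np.abs(v).sum() <= z:
            return v
        u = np.sort(np.abs(v))[::-1]           # sorted magnitudes, descending
        css = np.cumsum(u)
        idx = np.arange(1, len(u) + 1)
        rho = np.nonzero(u * idx > css - z)[0][-1]
        theta = (css[rho] - z) / (rho + 1.0)   # shrinkage threshold
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)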
Article
In this paper we consider sparsity on a tensor level, as given by the n-rank of a tensor. In an important sparse-vector approximation problem (compressed sensing) and the low-rank matrix recovery problem, using a convex relaxation technique proved to be a valuable solution strategy. Here, we will adapt these techniques to the tensor setting. We use the n-rank of a tensor as a sparsity measure and consider the low-n-rank tensor recovery problem, i.e. the problem of finding the tensor of the lowest n-rank that fulfills some linear constraints. We introduce a tractable convex relaxation of the n-rank and propose efficient algorithms to solve the low-n-rank tensor recovery problem numerically. The algorithms are based on the Douglas–Rachford splitting technique and its dual variant, the alternating direction method of multipliers.
Article
Algorithms for multivariate image analysis and other large-scale applications of multivariate curve resolution (MCR) typically employ constrained alternating least squares (ALS) procedures in their solution. The solution to a least squares problem under general linear equality and inequality constraints can be reduced to the solution of a non-negativity-constrained least squares (NNLS) problem. Thus the efficiency of the solution to any constrained least squares problem rests heavily on the underlying NNLS algorithm. We present a new NNLS solution algorithm that is appropriate to large-scale MCR and other ALS applications. Our new algorithm rearranges the calculations in the standard active set NNLS method on the basis of combinatorial reasoning. This rearrangement serves to substantially reduce the computational burden required for NNLS problems having large numbers of observation vectors.
Article
We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys m ≥ C n^(1.2)r log n for some positive numerical constant C, then with very high probability, most n×n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
Article
We study the convergence properties of a (block) coordinate descent method applied to minimize a nondifferentiable (nonconvex) function f(x 1, . . . , x N ) with certain separability and regularity properties. Assuming that f is continuous on a compact level set, the subsequence convergence of the iterates to a stationary point is shown when either f is pseudoconvex in every pair of coordinate blocks from among N-1 coordinate blocks or f has at most one minimum in each of N-2 coordinate blocks. If f is quasiconvex and hemivariate in every coordinate block, then the assumptions of continuity of f and compactness of the level set may be relaxed further. These results are applied to derive new (and old) convergence results for the proximal minimization algorithm, an algorithm of Arimoto and Blahut, and an algorithm of Han. They are applied also to a problem of blind source separation.
Article
We describe methods for learning dictionaries that are appropriate for the representation of given classes of signals and multisensor data. We further show that dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification when the learning includes a class separability criterion in the objective function. The benefits of dictionary learning clearly show that a proper understanding of causes underlying the sensed world is key to task-specific representation of relevant information in high-dimensional data sets.
Article
We give new convergence results for the block Gauss–Seidel method for problems where the feasible set is the Cartesian product of m closed convex sets, under the assumption that the sequence generated by the method has limit points. We show that the method is globally convergent for m=2 and that for m>2 convergence can be established both when the objective function f is componentwise strictly quasiconvex with respect to m−2 components and when f is pseudoconvex. Finally, we consider a proximal point modification of the method and we state convergence results without any convexity assumption on the objective function.
Article
In this paper we propose an algorithm to estimate missing values in tensors of visual data. Our methodology is built on recent studies of matrix completion using the matrix trace norm. The contribution of our paper is to extend the matrix case to the tensor case by proposing the first definition of the trace norm for tensors and then building a working algorithm. First, we propose a definition for the tensor trace norm that generalizes the established definition of the matrix trace norm. Second, similar to matrix completion, tensor completion is formulated as a convex optimization problem. We developed three algorithms: SiLRTC, FaLRTC, and HaLRTC. The SiLRTC algorithm is simple to implement; it employs a relaxation technique to separate the dependent relationships and uses the block coordinate descent (BCD) method to achieve a globally optimal solution. The FaLRTC algorithm utilizes a smoothing scheme to transform the original nonsmooth problem into a smooth one. The HaLRTC algorithm applies the alternating direction method of multipliers (ADMM) to our problem. Our experiments show potential applications of our algorithms, and the quantitative evaluation indicates that our methods are more accurate and robust than heuristic approaches.
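The workhorse inside such trace-norm-based completion methods is singular value thresholding applied to the tensor's unfoldings; a minimal sketch of that proximal step (illustrative, not the authors' full SiLRTC/FaLRTC/HaLRTC loops):

    import numpy as np

    def svt(M, tau):
        # Singular value thresholding: the proximal operator of the
        # trace (nuclear) norm, applied to one unfolding of the tensor.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt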