Preprint

A Deep-Unfolded Spatiotemporal RPCA Network For L+S Decomposition


Abstract

Low-rank and sparse decomposition-based methods find use in many applications involving background modeling, such as clutter suppression and object tracking. While Robust Principal Component Analysis (RPCA) has achieved great success in performing this task, it can take hundreds of iterations to converge, and its performance degrades in the presence of phenomena such as occlusion, jitter and fast motion. Recently proposed deep unfolded networks, on the other hand, have demonstrated better accuracy and improved convergence over both their iterative counterparts and other neural network architectures. In this work, we propose a novel deep unfolded spatiotemporal RPCA (DUST-RPCA) network, which explicitly takes advantage of the spatial and temporal continuity in the low-rank component. Our experimental results on the moving MNIST dataset indicate that DUST-RPCA gives better accuracy than existing state-of-the-art deep unfolded RPCA networks.
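To make the L+S structure concrete, the sketch below shows one layer of a generic unfolded RPCA network in numpy: a singular-value-thresholding update for the low-rank component and an entrywise soft-thresholding update for the sparse component, with per-layer thresholds standing in for learned parameters. The function names, dimensions and thresholds are illustrative assumptions, not the authors' DUST-RPCA implementation (which additionally encodes spatiotemporal continuity).

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Entrywise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def unfolded_layer(D, L, S, lam_L, lam_S):
    """One layer of a generic unfolded L+S network. D holds the data
    (each column a vectorized frame); lam_L and lam_S play the role of
    learnable per-layer thresholds."""
    L = svt(D - S, lam_L)   # low-rank update on the residual
    S = soft(D - L, lam_S)  # sparse update on the residual
    return L, S

# toy usage: 64-pixel frames, 40 time steps, 5 layers with fixed thresholds
D = np.random.randn(64, 40)
L, S = np.zeros_like(D), np.zeros_like(D)
for lam_L, lam_S in [(1.0, 0.3)] * 5:
    L, S = unfolded_layer(D, L, S, lam_L, lam_S)
```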

References
Article
Decision-making algorithms are used in a multitude of different applications. Conventional approaches to designing decision algorithms employ principled and simplified modelling, based on which one can determine decisions via tractable optimization. More recently, deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular. Model-based optimization and data-centric deep learning are often considered to be distinct disciplines. Here, we characterize them as edges of a continuous spectrum varying in specificity and parameterization, and provide a tutorial-style presentation of the methodologies lying in the middle ground of this spectrum, referred to as model-based deep learning. We accompany our presentation with running examples in super-resolution and stochastic control, and show how they are expressed using the provided characterization and specialized in each of the detailed methodologies. The gains of combining model-based optimization and deep learning are demonstrated using experimental results in various applications, ranging from biomedical imaging to digital communications.
Article
Robust PCA (RPCA) via decomposition into low-rank plus sparse matrices offers a powerful framework for a large variety of applications such as image processing, video processing and 3D computer vision. Indeed, these applications typically require detecting sparse outliers from observed imagery data that can be approximated by a low-rank matrix. Moreover, experiments show that RPCA with additional spatial and/or temporal constraints often outperforms the state-of-the-art algorithms in these applications. Thus, the aim of this paper is to survey the applications of RPCA in computer vision. In the first part of this paper, we review representative image processing applications as follows: (1) low-level imaging such as image recovery and denoising, image composition, image colorization, image alignment and rectification, multi-focus image and face recognition, (2) medical imaging like dynamic Magnetic Resonance Imaging (MRI) for acceleration of data acquisition, background suppression and learning of inter-frame motion fields, and (3) imaging for 3D computer vision with additional depth information like in Structure from Motion (SfM) and 3D motion recovery. In the second part, we present the applications of RPCA in video processing, which utilize additional spatial and temporal information compared to image processing. Specifically, we investigate video denoising and restoration, hyperspectral video and background/foreground separation. Finally, we provide perspectives on possible future research directions and algorithmic frameworks that are suitable for these applications.
Article
We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence, or predicting the future sequence. We experiment with two kinds of input sequences - patches of image pixels and high-level representations ("percepts") of video frames extracted using a pretrained convolutional net. We explore different design choices such as whether the decoder LSTMs should condition on the generated output. We analyze the outputs of the model qualitatively to see how well the model can extrapolate the learned video representation into the future and into the past. We try to visualize and interpret the learned features. We stress test the model by running it on longer time scales and on out-of-domain data. We further evaluate the representations by finetuning them for a supervised learning problem - human action recognition on the UCF-101 and HMDB-51 datasets. We show that the representations help improve classification accuracy, especially when there are only a few training examples. Even models pretrained on unrelated datasets (300 hours of YouTube videos) can help action recognition performance.
Article
In this paper, we formulate particle filter-based object tracking as an exclusive sparse learning problem that exploits contextual information. To achieve this goal, we propose the context-aware exclusive sparse tracker (CEST) to model particle appearances as linear combinations of dictionary templates that are updated dynamically. Learning the representation of each particle is formulated as an exclusive sparse representation problem, where the overall dictionary is composed of multiple {group} dictionaries that can contain contextual information. With context, CEST is less prone to tracker drift. Interestingly, we show that the popular L₁ tracker [1] is a special case of our CEST formulation. The proposed learning problem is efficiently solved using an accelerated proximal gradient method that yields a sequence of closed form updates. To make the tracker much faster, we reduce the number of learning problems to be solved by using the dual problem to quickly and systematically rank and prune particles in each frame. We test our CEST tracker on challenging benchmark sequences that involve heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that CEST consistently outperforms state-of-the-art trackers.
Conference Paper
Low-rank and sparse structures have been profoundly studied in matrix completion and compressed sensing. In this paper, we develop "Go Decomposition" (GoDec) to efficiently and robustly estimate the low-rank part L and the sparse part S of a matrix X = L + S + G with noise G. GoDec alternately assigns the low-rank approximation of X − S to L and the sparse approximation of X − L to S. The algorithm can be significantly accelerated by bilateral random projections (BRP). We also propose GoDec for matrix completion as an important variant. We prove that the objective value ‖X − L − S‖_F² converges to a local minimum, while L and S linearly converge to local optimums. Theoretically, we analyze the influence of L, S and G on the asymptotic/convergence speeds in order to discover the robustness of GoDec. Empirical studies suggest the efficiency, robustness and effectiveness of GoDec compared with representative matrix decomposition and completion tools, e.g., Robust PCA and OptSpace.
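As a rough illustration of the alternating scheme described above, the following numpy sketch uses a plain truncated SVD in place of the bilateral random projections; the function name, parameters and hard cardinality projection are assumptions made for readability, not the published GoDec code.

```python
import numpy as np

def godec(X, rank, card, n_iter=50):
    """GoDec-style alternation: project X - S onto rank-`rank` matrices and
    X - L onto matrices with `card` nonzero entries. A plain truncated SVD
    is used here in place of bilateral random projections."""
    L, S = np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)   # low-rank step
        L = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
        R = X - L                                               # sparse step
        thresh = np.sort(np.abs(R), axis=None)[-card]
        S = R * (np.abs(R) >= thresh)
    return L, S

L, S = godec(np.random.randn(30, 30), rank=2, card=40)  # toy usage
```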
Article
Clouds, together with their shadows, usually occlude ground-cover features in optical remote sensing images. This hinders the utilization of these images for a range of applications such as earth observation, land-cover classification and urban planning. In this work, we propose a deep unfolded and prior-aided robust principal component analysis (DUPA-RPCA) network for removing clouds and recovering ground-cover information in multi-temporal satellite images. We model these cloud-contaminated images as a sum of low-rank and sparse components and then unfold an iterative RPCA algorithm that has been designed for reweighted ℓ1 minimization. As a result, the activation function in DUPA-RPCA adapts to every input at each layer of the network. Our experimental results on both Landsat and Sentinel images indicate that our method gives better accuracy and efficiency when compared with existing state-of-the-art methods.
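The input-adaptive activation mentioned above can be illustrated with a reweighted soft-thresholding operator, the proximal step associated with one reweighted ℓ1 iteration. This is a hypothetical numpy sketch of that idea, not the published DUPA-RPCA layer.

```python
import numpy as np

def reweighted_soft(X, lam, eps=1e-3):
    """Soft thresholding with entrywise weights 1 / (|x| + eps), i.e. the
    proximal step of one reweighted-l1 iteration; the effective threshold
    lam / (|x| + eps) therefore adapts to the current input."""
    tau = lam / (np.abs(X) + eps)
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```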
Book
Connecting theory with practice, this systematic and rigorous introduction covers the fundamental principles, algorithms and applications of key mathematical models for high-dimensional data analysis. Comprehensive in its approach, it provides unified coverage of many different low-dimensional models and analytical techniques, including sparse and low-rank models, and both convex and non-convex formulations. Readers will learn how to develop efficient and scalable algorithms for solving real-world problems, supported by numerous examples and exercises throughout, and how to use the computational tools learnt in several application contexts. Applications presented include scientific imaging, communication, face recognition, 3D vision, and deep networks for classification. With code available online, this is an ideal textbook for senior and graduate students in computer science, data science, and electrical engineering, as well as for those taking courses on sparsity, low-dimensional structures, and high-dimensional data. Foreword by Emmanuel Candès.
Article
Deep neural networks provide unprecedented performance gains in many real-world problems in signal and image processing. Despite these gains, the future development and practical deployment of deep networks are hindered by their black-box nature, i.e., a lack of interpretability and the need for very large training sets. An emerging technique called algorithm unrolling, or unfolding, offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are widely used in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention, and it is rapidly growing in both theoretic investigations and practical applications. The increasing popularity of unrolled deep networks is due, in part, to their potential in developing efficient, high-performance (yet interpretable) network architectures from reasonably sized training sets.
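A minimal example of the unrolling idea, in the spirit of the original sparse-coding application (LISTA), is sketched below in numpy; the weight matrices We and S and the per-layer thresholds stand in for quantities that would normally be learned from data, and the dimensions are arbitrary.

```python
import numpy as np

def soft(x, theta):
    """Entrywise soft thresholding with threshold theta."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def lista_forward(y, We, S, thetas):
    """Forward pass of a LISTA-style unrolled sparse coder:
    x_{k+1} = soft(We @ y + S @ x_k, theta_k), where We, S and the
    per-layer thresholds theta_k would normally be learned from data."""
    x = np.zeros(S.shape[0])
    for theta in thetas:
        x = soft(We @ y + S @ x, theta)
    return x

# toy dimensions: 20 measurements, 50-dimensional sparse code, 3 layers
m, n = 20, 50
We, S = 0.1 * np.random.randn(n, m), 0.05 * np.random.randn(n, n)
x_hat = lista_forward(np.random.randn(m), We, S, thetas=[0.1, 0.1, 0.1])
```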
Book
Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets and the accompanying distributed solution methods are either necessary or at least highly desirable. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers argues that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas-Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for ℓ1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, it discusses applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. It also discusses general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
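For a concrete instance of the splitting pattern described in the monograph, here is a short numpy sketch of ADMM applied to the lasso; the variable names, fixed penalty rho and iteration count are illustrative choices rather than the book's reference implementation.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iter=100):
    """ADMM for the lasso, minimize 0.5*||Ax - b||^2 + lam*||x||_1:
    x-update is a ridge solve, z-update a soft threshold, u the scaled dual."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    P = np.linalg.inv(A.T @ A + rho * np.eye(n))  # factor cached once
    Atb = A.T @ b
    for _ in range(n_iter):
        x = P @ (Atb + rho * (z - u))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)
        u = u + x - z
    return z

x = admm_lasso(np.random.randn(40, 100), np.random.randn(40), lam=0.1)  # toy usage
```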
Article
Contrast enhanced ultrasound is a radiation-free imaging modality which uses encapsulated gas microbubbles for improved visualization of the vascular bed deep within the tissue. It has recently been used to enable imaging with unprecedented subwavelength spatial resolution by relying on super-resolution techniques. A typical preprocessing step in super-resolution ultrasound is to separate the microbubble signal from the cluttering tissue signal. This step has a crucial impact on the final image quality. Here, we propose a new approach to clutter removal based on robust principal component analysis (PCA) and deep learning. We begin by modeling the acquired contrast enhanced ultrasound signal as a combination of low rank and sparse components. This model is used in robust PCA and was previously suggested in the context of ultrasound Doppler processing and dynamic magnetic resonance imaging. We then illustrate that an iterative algorithm based on this model exhibits improved separation of microbubble signal from the tissue signal over commonly practiced methods. Next, we apply the concept of deep unfolding to suggest a deep network architecture tailored to our clutter filtering problem which exhibits improved convergence speed and accuracy with respect to its iterative counterpart. We compare the performance of the suggested deep network on both simulations and in-vivo rat brain scans, with a commonly practiced deep-network architecture and with the fast iterative shrinkage algorithm. We show that our architecture exhibits better image quality and contrast.
Article
In this paper, we propose a novel and robust tracking framework based on online discriminative and low-rank dictionary learning. The primary aim of this paper is to obtain compact and low-rank dictionaries that can provide good discriminative representations of both target and background. We accomplish this by exploiting the recovery ability of low-rank matrices. That is, if we assume that the data from the same class are linearly correlated, then the basis vectors learned from the training set of each class render the dictionary approximately low-rank. The proposed dictionary learning technique incorporates a reconstruction error that improves the reliability of classification. Also, a multiconstraint objective function is designed to enable active learning of a discriminative and robust dictionary. Further, an optimal solution is obtained by iteratively computing the dictionary and coefficients while simultaneously learning the classifier parameters. Finally, a simple yet effective likelihood function is implemented to estimate the optimal state of the target during tracking. Moreover, to make the dictionary adaptive to the variations of the target and background during tracking, an online update criterion is employed while learning the new dictionary. Experimental results on a publicly available benchmark dataset have demonstrated that the proposed tracking algorithm performs better than other state-of-the-art trackers.
Article
We propose an online estimated local dictionary based single-channel speech enhancement algorithm, which focuses on low-rank and sparse matrix decomposition. In this algorithm, a noisy speech spectral matrix is considered as the summation of low-rank background noise components and an activation of the online speech dictionary, on which both low-rank and sparsity constraints are imposed. This decomposition takes advantage of the high expressiveness of the locally estimated dictionary on speech components. The local dictionary can be obtained by estimating the speech presence probability with the Expectation-Maximization algorithm, in which a generalized Gamma prior for the speech magnitude spectrum is used. The evaluation results show that the proposed algorithm achieves significant improvements when compared to four other speech enhancement algorithms.
Article
An effective representation model, which aims to mine the most meaningful information in the data, plays an important role in visual tracking. Some recent particle-filter-based trackers achieve promising results by introducing the low-rank assumption into the representation model. However, their assumed low-rank structure of candidates limits the robustness when facing severe challenges such as abrupt motion. To avoid this limitation, we propose a temporal restricted reverse-low-rank learning algorithm for visual tracking with the following advantages: 1) the reverse-low-rank model jointly represents target and background templates via candidates, which exploits the low-rank structure among consecutive target observations and enforces the temporal consistency of the target at a global level; 2) the appearance consistency may be broken when the target undergoes sudden changes; to overcome this issue, we propose a local constraint via the ℓ₁,₂ mixed norm, which not only ensures the local consistency of the target appearance but also tolerates sudden changes between two adjacent frames; and 3) to alleviate the influence of unreasonable representation values due to outlier candidates, an adaptive weighted scheme is designed to improve the robustness of the tracker. Evaluations on 26 challenging video sequences show the effectiveness and favorable performance of the proposed algorithm against 12 state-of-the-art visual trackers.
Article
We consider algorithms and recovery guarantees for the analysis sparse model where the signal is sparse with respect to a highly coherent frame. We first consider the use of the monotone version of the fast iterative shrinkage-thresholding algorithm (MFISTA) to solve the analysis sparse recovery problem. Since the proximal operator in MFISTA does not have a closed-form solution for the analysis model, it cannot be applied directly. Instead, we examine two alternatives based on smoothing and decomposition transformations that relax the original sparse recovery problem, and then implement MFISTA on the relaxed formulation. We refer to these two methods as smoothing-based MFISTA and decomposition-based MFISTA. We analyze the convergence of both algorithms, and establish that smoothing-based MFISTA converges more rapidly when applied to general nonsmooth optimization problems. We then derive a performance bound on the reconstruction error using these algorithms. The bound proves that our methods can recover a sparse signal in terms of a redundant tight frame when the measurement matrix satisfies a properly adapted restricted isometry property. Extensive numerical examples demonstrate the performance of our algorithms and show that smoothing-based MFISTA converges faster than the decomposition-based alternative in real applications, such as CT image reconstruction.
Article
This paper introduces a novel algorithm to approximate the matrix with minimum nuclear norm among all matrices obeying a set of convex constraints. This problem may be understood as the convex relaxation of a rank minimization problem, and arises in many important applications such as the task of recovering a large matrix from a small subset of its entries (the famous Netflix problem). Off-the-shelf algorithms such as interior point methods are not directly amenable to large problems of this kind with over a million unknown entries. This paper develops a simple first-order and easy-to-implement algorithm that is extremely efficient at addressing problems in which the optimal solution has low rank. The algorithm is iterative and produces a sequence of matrices {X^k, Y^k}; at each step, it mainly performs a soft-thresholding operation on the singular values of the matrix Y^k. The soft-thresholding is applied to a sparse matrix, and the rank of the iterates {X^k} is empirically nondecreasing. Both these facts allow the algorithm to make use of very minimal storage space and keep the computational cost of each iteration low. On the theoretical side, we provide a convergence analysis showing that the sequence of iterates converges. On the practical side, we provide numerical examples in which 1,000 × 1,000 matrices are recovered in less than a minute on a modest desktop computer. We also demonstrate that our approach is amenable to very large scale problems by recovering matrices of rank about 10 with nearly a billion unknowns from just about 0.4% of their sampled entries. Our methods are connected with the recent literature on linearized Bregman iterations for ℓ1 minimization, and we develop a framework in which one can understand these algorithms in terms of well-known Lagrange multiplier algorithms. Keywords: nuclear norm minimization, matrix completion, singular value thresholding, Lagrange dual function.
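A compact numpy sketch of the resulting singular value thresholding iteration for matrix completion is given below; tau, delta and the iteration count are placeholder values rather than the tuned settings analyzed in the paper.

```python
import numpy as np

def svt_complete(M, mask, tau, delta, n_iter=200):
    """Singular value thresholding iteration for matrix completion:
    X_k = D_tau(Y_{k-1}),  Y_k = Y_{k-1} + delta * P_Omega(M - X_k),
    where D_tau soft-thresholds the singular values and P_Omega (the 0/1
    mask) keeps only the observed entries."""
    Y = np.zeros_like(M)
    X = Y
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
        Y = Y + delta * mask * (M - X)
    return X
```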
Article
We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods, which can be viewed as an extension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for solving large-scale problems even with dense matrix data. However, such methods are also known to converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA, which is shown to be faster than ISTA by several orders of magnitude.
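The following numpy sketch shows FISTA applied to ℓ1-regularized least squares, the prototypical setting for such shrinkage-thresholding methods; the step size is derived from the spectral norm of A, and the other choices are illustrative.

```python
import numpy as np

def fista(A, b, lam, n_iter=100):
    """FISTA for min 0.5*||Ax - b||^2 + lam*||x||_1: an ISTA step
    (gradient descent plus soft thresholding) taken at an extrapolated point."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(n_iter):
        g = y - (A.T @ (A @ y - b)) / L        # gradient step at extrapolated point
        x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum / extrapolation
        x, t = x_new, t_new
    return x
```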
Article
Low-rank representation (LRR) is an effective method for subspace clustering and has found wide applications in computer vision and machine learning. The existing LRR solver is based on the alternating direction method (ADM). It suffers from O(n^3) computation complexity due to the matrix-matrix multiplications and matrix inversions, even if partial SVD is used. Moreover, introducing auxiliary variables also slows down the convergence. Such a heavy computation load prevents LRR from large-scale applications. In this paper, we generalize ADM by linearizing the quadratic penalty term and allowing the penalty to change adaptively. We also propose a novel rule to update the penalty such that the convergence is fast. With our linearized ADM with adaptive penalty (LADMAP) method, it is unnecessary to introduce auxiliary variables and invert matrices. The matrix-matrix multiplications are further alleviated by using the skinny SVD representation technique. As a result, we arrive at an algorithm for LRR with complexity O(rn^2), where r is the rank of the representation matrix. Numerical experiments verify that for LRR our LADMAP method is much faster than state-of-the-art algorithms. Although we only present the results on LRR, LADMAP can actually be applied to solving more general convex programs.
Article
This paper is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the L1 norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
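A small numpy sketch of Principal Component Pursuit solved with an ADMM-style splitting is given below; the default weight lam = 1/sqrt(max(m, n)) follows the paper's analysis, while the penalty parameter, fixed iteration count and function name are common heuristics and assumptions rather than the authors' exact algorithm.

```python
import numpy as np

def pcp(M, lam=None, mu=None, n_iter=100):
    """Principal Component Pursuit via an ADMM-style splitting:
    minimize ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))       # weight from the theory
    mu = mu if mu is not None else 0.25 * m * n / np.sum(np.abs(M))  # common heuristic
    L = S = Y = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)  # L: SVT step
        L = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        R = M - L + Y / mu                                             # S: soft threshold
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y = Y + mu * (M - L - S)                                       # dual ascent
    return L, S
```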
Namrata Vaswani, Yuejie Chi, and Thierry Bouwmans. Rethinking PCA for modern data sets: Theory, algorithms, and applications [Scanning the Issue]. Proceedings of the IEEE, 106(8):1274-1276, 2018.
HanQin Cai, Jialin Liu, and Wotao Yin. Learned robust PCA: A scalable deep unfolding approach for high-dimensional outlier detection. Advances in Neural Information Processing Systems, 34:16977-16989, 2021.