Thesis

Approches bayésiennes non paramétriques et apprentissage de dictionnaire pour les problèmes inverses en traitement d'image


Abstract

Dictionary learning for sparse representation is well known in the context of solving inverse problems. Optimization methods and parametric approaches have been explored extensively. These methods suffer from certain limitations, in particular related to the choice of parameters. In general, the dictionary size must be fixed in advance, and knowledge of the noise level, and possibly of the sparsity level, is also required. The methodological contributions of this thesis concern the joint learning of the dictionary and of its parameters, in particular for inverse problems in image processing. We study and propose the IBP-DL (Indian Buffet Process for Dictionary Learning) method, using a Bayesian non-parametric approach. An introduction to Bayesian non-parametric approaches is presented. The Dirichlet process and its derivative, the Chinese restaurant process, as well as the Beta process and its derivative, the Indian buffet process, are described. The proposed dictionary learning model relies on an Indian Buffet prior that makes it possible to learn a dictionary of adaptive size. We detail the Monte Carlo method proposed for inference. The noise and sparsity levels are also sampled, so that no parameter tuning is needed in practice. Numerical experiments illustrate the performance of the approach on denoising, inpainting and compressed sensing problems. The results are compared with the state of the art. The source code in Matlab and C is made available.
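For orientation, the observation model behind IBP-DL can be summarized in a few lines. The sketch below (NumPy, with hypothetical variable names and shapes; it is not the thesis code, which is released in Matlab and C) shows a collection of vectorized patches modelled as a dictionary applied to binary activations times coefficients, plus Gaussian noise whose variance is one of the inferred quantities.

```python
import numpy as np

# Hedged sketch of the IBP-DL observation model (hypothetical shapes and names):
#   Y ≈ D (Z * W) + E
# Y : vectorized image patches (one per column)
# D : dictionary atoms (columns), whose number K is inferred via the IBP prior
# Z : binary matrix saying which atoms each patch uses
# W : coefficient values, E : Gaussian noise whose variance is also inferred

rng = np.random.default_rng(0)
P, N, K = 64, 1000, 30            # 8x8 patches, number of patches, current number of atoms

D = rng.normal(size=(P, K))
Z = rng.random((K, N)) < 0.1      # sparse binary activations
W = rng.normal(size=(K, N))
sigma = 0.05                      # noise standard deviation (sampled in the full model)

Y = D @ (Z * W) + sigma * rng.normal(size=(P, N))
Y_hat = D @ (Z * W)               # denoised patches, given the inferred variables
print(Y.shape, Y_hat.shape)
```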


... (2.4.9) This model for FA is very close to those referred to as the Beta-Bernoulli process in the literature (Zhou et al., 2011; Dang, 2016), used for model order selection. The sampling of the binary variables q is implemented through an acceptance-rejection step, which determines whether the model order is updated (the details of this step are given in Sec. ...
Thesis
Full-text available
With the emergence of MEMS and the overall decrease in the cost of sensors, multichannel acquisitions are becoming more widespread, particularly in the field of acoustic source identification. The quality of source localization and quantification can be degraded by the presence of ambient or electronic noise. In particular, in the case of in-flow measurements, the turbulent boundary layer that develops over the measuring system can induce pressure fluctuations that are much greater than those of acoustic sources. It then becomes necessary to process the acquisitions to extract each component of the measured field. For this purpose, it is proposed in this thesis to decompose the measured spectral matrix into the sum of a matrix associated with the acoustic contribution and a matrix for aerodynamic noise. This decomposition exploits the statistical properties of each pressure field. Assuming that the acoustic contribution is highly correlated across the sensors, the rank of the corresponding cross-spectral matrix is limited to the number of equivalent uncorrelated sources. Concerning the aerodynamic noise matrix, two statistical models are proposed. A first model assumes a field that is totally uncorrelated across the sensors, and a second is based on a pre-existing physical model. This separation problem is solved by a Bayesian optimization approach, which takes into account the uncertainties on each component of the model. The performance of this method is first evaluated on wind tunnel measurements and then on particularly noisy industrial measurements from microphones flush-mounted on the fuselage of a large aircraft in flight.
Article
Full-text available
Ill-posed inverse problems call for some prior model to define a suitable set of solutions. A wide family of approaches relies on the use of sparse representations. Dictionary learning precisely makes it possible to learn a redundant set of atoms to represent the data in a sparse manner. Various approaches have been proposed, mostly based on optimization methods. We propose a Bayesian non-parametric approach called IBP-DL that uses an Indian Buffet Process prior. This method yields an efficient dictionary with an adaptive number of atoms. Moreover, the noise and sparsity levels are also inferred so that no parameter tuning is needed. We elaborate on the IBP-DL model to propose a model for linear inverse problems such as inpainting and compressive sensing beyond basic denoising. We derive a collapsed and an accelerated Gibbs sampler and propose a marginal maximum a posteriori estimator of the dictionary. Several image processing experiments are presented and compared to other approaches for illustration.
Article
Full-text available
Solving inverse problems usually calls for adapted priors such as the definition of a well-chosen representation of possible solutions. One family of approaches relies on learning redundant dictionaries for sparse representation. In image processing, dictionary learning is applied to sets of patches. Many methods work with a dictionary with a number of atoms that is fixed in advance. Moreover, optimization methods often call for prior knowledge of the noise level to tune regularization parameters. We propose a Bayesian non parametric approach that is able to learn a dictionary of adapted size. The use of an Indian Buffet Process prior makes it possible to learn an adequate number of atoms. The noise level is also accurately estimated so that nearly no parameter tuning is needed. We illustrate the relevance of the resulting dictionaries on numerical experiments.
Article
Full-text available
Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the k-means and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that feature the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis.
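As a concrete illustration of the small-variance limit mentioned in this abstract, the snippet below sketches DP-means, the hard-clustering algorithm obtained from a DP Gaussian mixture when the component variance goes to zero. The penalty `lam`, the iteration count and the helper name are illustrative choices, not the paper's code.

```python
import numpy as np

def dp_means(X, lam, n_iter=20):
    """Hard-clustering limit of a DP Gaussian mixture (DP-means sketch).

    A point opens a new cluster whenever its squared distance to every
    existing centroid exceeds the penalty lam."""
    centroids = [X.mean(axis=0)]                  # start from a single global cluster
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        for i, x in enumerate(X):                 # assignment pass
            d2 = [float(np.sum((x - c) ** 2)) for c in centroids]
            k = int(np.argmin(d2))
            if d2[k] > lam:                       # too far from all clusters: open a new one
                centroids.append(x.copy())
                k = len(centroids) - 1
            labels[i] = k
        # update pass: recompute the mean of every cluster that has members
        centroids = [X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                     for k in range(len(centroids))]
    return np.array(centroids), labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5.0])
centers, labels = dp_means(X, lam=4.0)
print(len(centers), "clusters found")
```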
Article
Full-text available
Dictionary learning for sparse representation has recently attracted attention among the signal processing society in a variety of applications such as denoising, classification, and compression. The number of elements in a learned dictionary is crucial since it governs specificity and optimality of sparse representation. Sparsity level, number of dictionary elements, and representation error are three correlated factors in which setting each pair of them results in a specific value of the third factor. An arbitrary selection of the number of dictionary elements affects either the sparsity level or/and the representation error. Despite recent advancements in training dictionaries, the number of dictionary elements is still heuristically set. To avoid the representation’s suboptimality, a systematic approach to adapt the elements’ number based on input datasets is essential. Some existing methods try to address this requirement, such as enhanced K-SVD, sub-clustering K-SVD, and stage-wise K-SVD. However, it is not specified under which sparsity level and representation error criteria their learned dictionaries are size-optimized. We propose a new dictionary learning approach that automatically learns a dictionary with an efficient number of elements that provides both desired representation error and desired average sparsity level. In our proposed method, for any given representation error and average sparsity level, the number of elements in the learned dictionary varies based on content complexity of training datasets. The performance of the proposed method is demonstrated in image denoising. The proposed method is compared to the state of the art, and results confirm the superiority of the proposed approach.
Article
Full-text available
We describe a flexible nonparametric approach to latent variable modelling in which the number of latent variables is unbounded. This approach is based on a probability distribution over equivalence classes of binary matrices with a finite number of rows, corresponding to the data points, and an unbounded number of columns, corresponding to the latent variables. Each data point can be associated with a subset of the possible latent variables, which we refer to as the latent features of that data point. The binary variables in the matrix indicate which latent feature is possessed by which data point, and there is a potentially infinite array of features. We derive the distribution over unbounded binary matrices by taking the limit of a distribution over N × K binary matrices as K → ∞. We define a simple generative process for this distribution which we call the Indian buffet process (IBP; Griffiths and Ghahramani, 2005, 2006) by analogy to the Chinese restaurant process (Aldous, 1985; Pitman, 2002). The IBP has a single hyperparameter which controls both the number of features per object and the total number of features. We describe a two-parameter generalization of the IBP which has additional flexibility, independently controlling the number of features per object and the total number of features in the matrix. The use of this distribution as a prior in an infinite latent feature model is illustrated, and Markov chain Monte Carlo algorithms for inference are described.
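The culinary metaphor translates directly into a short sampler. The following sketch (function name and seed handling are illustrative) draws a binary feature matrix from the one-parameter IBP exactly as described: Poisson(alpha) dishes for the first customer, existing dishes taken with probability m_k/n, and Poisson(alpha/n) new dishes per customer.

```python
import numpy as np

def sample_ibp(N, alpha, seed=None):
    """Draw a binary feature matrix from the one-parameter Indian buffet process.

    Customer 1 takes Poisson(alpha) dishes; customer n takes each previously
    sampled dish k with probability m_k / n (m_k = how many earlier customers
    took it), then Poisson(alpha / n) brand-new dishes."""
    rng = np.random.default_rng(seed)
    first = rng.poisson(alpha)                    # dishes taken by the first customer
    rows = [np.ones(first, dtype=int)]
    counts = np.ones(first)
    for n in range(2, N + 1):
        old = (rng.random(len(counts)) < counts / n).astype(int)
        new = rng.poisson(alpha / n)              # new dishes for this customer
        counts = np.concatenate([counts + old, np.ones(new)])
        rows = [np.concatenate([r, np.zeros(new, dtype=int)]) for r in rows]
        rows.append(np.concatenate([old, np.ones(new, dtype=int)]))
    return np.vstack(rows)

Z = sample_ibp(N=10, alpha=2.0, seed=0)
print(Z.shape)   # rows = objects, columns = latent features (unbounded a priori)
```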
Conference Paper
Full-text available
Sparse representations using overcomplete dictionaries are used in a variety of fields such as pattern recognition and compression. However, the size of the dictionary is usually a tradeoff between approximation speed and accuracy. In this paper we propose a novel technique called the Enhanced K-SVD algorithm (EK-SVD), which finds a dictionary of optimized size for a given dataset, without compromising its approximation accuracy. EK-SVD improves the K-SVD dictionary learning algorithm by introducing an optimized dictionary size discovery feature to K-SVD. Optimizing strict sparsity and MSE constraints, it starts with a large number of dictionary elements and gradually prunes the under-utilized or similar-looking elements to produce a well-trained dictionary that has no redundant elements. Experimental results show the optimized dictionaries learned using EK-SVD give the same accuracy as dictionaries learned using the K-SVD algorithm while substantially reducing the dictionary size by 60%.
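The pruning idea, removing under-utilized or near-duplicate atoms, can be illustrated with a toy routine. This is a hedged sketch of the general principle, not the actual EK-SVD update; the thresholds `usage_tol` and `coherence_tol` and the helper name are illustrative assumptions.

```python
import numpy as np

def prune_dictionary(D, W, usage_tol=1e-3, coherence_tol=0.99):
    """Toy illustration of the pruning idea behind EK-SVD (not the exact algorithm):
    drop atoms that are hardly used in the sparse codes W, or that are almost
    collinear with an atom already kept."""
    usage = np.abs(W).sum(axis=1)                 # how much each atom contributes overall
    keep = []
    for k in np.argsort(-usage):                  # consider the most-used atoms first
        if usage[k] < usage_tol * usage.max():
            continue                              # under-utilized atom
        if any(abs(D[:, k] @ D[:, j]) > coherence_tol for j in keep):
            continue                              # near-duplicate of a kept atom
        keep.append(k)
    keep = sorted(keep)
    return D[:, keep], W[keep, :]

# usage example with unit-norm atoms D (P x K) and sparse codes W (K x N)
rng = np.random.default_rng(1)
D = rng.normal(size=(64, 128)); D /= np.linalg.norm(D, axis=0)
W = rng.normal(size=(128, 200)) * (rng.random((128, 200)) < 0.05)
D2, W2 = prune_dictionary(D, W)
print(D.shape[1], "->", D2.shape[1], "atoms")
```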
Article
Full-text available
This paper consists of two parts. The first part concerns approximation capabilities when using an overcomplete dictionary, a frame, for block coding. A frame design technique for use with vector selection algorithms, for example matching pursuits (MP), is presented. We call the technique the method of optimal directions (MOD). It is iterative and requires a training set of signal vectors. Experiments demonstrate that the approximation capabilities of the optimized frames are significantly better than those obtained using frames designed by ad hoc techniques or chosen in an ad hoc fashion. Experiments show typical reduction in mean squared error (MSE) by 30-80% for speech and electrocardiogram (ECG) signals. The second part concerns a complete compression scheme using a set of optimized frames, and evaluates both the use of fixed size and variable size frames. A signal compression scheme using frames optimized with the MOD technique is proposed. The technique, called multi-frame compression (MFC), uses several different frames, each optimized for a fixed number of selected frame vectors in each approximation. We apply the MOD and the MFC scheme to ECG signals. The coding results are compared with results obtained when using transform-based compression schemes like the discrete cosine transform (DCT) in combination with run-length and entropy coding. The experiments demonstrate improved rate-distortion performance by 2-4 dB for the MFC scheme when compared to the DCT at low bit-rates. They also show that variable sized frames in the compression scheme perform better than fixed sized frames.
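The MOD dictionary update itself is a single least-squares solve. A minimal sketch follows; the sparse-coding step (e.g. matching pursuit) is assumed to have already produced the coefficient matrix W, and the variable names are illustrative.

```python
import numpy as np

def mod_update(Y, W):
    """Method of Optimal Directions dictionary update: given training signals Y
    (columns) and their current sparse coefficients W, the least-squares optimal
    dictionary is D = Y W^T (W W^T)^{-1}; atoms are then renormalized."""
    D = Y @ W.T @ np.linalg.pinv(W @ W.T)   # pseudo-inverse guards against rank deficiency
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    return D / norms

# one MOD iteration alternates sparse coding (e.g. matching pursuit) with this update
rng = np.random.default_rng(0)
Y = rng.normal(size=(16, 500))
W = rng.normal(size=(32, 500)) * (rng.random((32, 500)) < 0.1)
D = mod_update(Y, W)
print(D.shape)
```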
Conference Paper
Full-text available
We often seek to identify co-occurring hidden features in a set of observations. The Indian Buffet Process (IBP) provides a nonparametric prior on the features present in each observation, but current inference techniques for the IBP often scale poorly. The collapsed Gibbs sampler for the IBP has a running time cubic in the number of observations, and the uncollapsed Gibbs sampler, while linear, is often slow to mix. We present a new linear-time collapsed Gibbs sampler for conjugate likelihood models and demonstrate its efficacy on large real-world datasets.
Article
Full-text available
Finding the sparsest solution to underdetermined systems of linear equations y = φx is NP-hard in general. We show here that for systems with "typical"/"random" φ, a good approximation to the sparsest solution is obtained by applying a fixed number of standard operations from linear algebra. Our proposal, Stagewise Orthogonal Matching Pursuit (StOMP), successively transforms the signal into a negligible residual. Starting with the initial residual r_0 = y, at the s-th stage it forms the "matched filter" φ^T r_{s-1}, identifies all coordinates with amplitudes exceeding a specially chosen threshold, solves a least-squares problem using the selected coordinates, and subtracts the least-squares fit, producing a new residual. After a fixed number of stages (e.g., 10), it stops. In contrast to Orthogonal Matching Pursuit (OMP), many coefficients can enter the model at each stage in StOMP while only one enters per stage in OMP; and StOMP takes a fixed number of stages (e.g., 10), while OMP can take many. We give both theoretical and empirical support for the large-system effectiveness of StOMP. We give numerical examples showing that StOMP rapidly and reliably finds sparse solutions in compressed sensing, decoding of error-correcting codes, and overcomplete representation.
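A compact sketch of the staged procedure described above is given below. The threshold rule, a multiple t of the formal residual noise level ‖r‖2/√n, follows the abstract, but the constant `t`, the stage count and the helper name are illustrative assumptions.

```python
import numpy as np

def stomp(Phi, y, n_stages=10, t=2.5):
    """Stagewise OMP sketch: at each stage, correlate the residual with all
    columns of Phi, keep every coordinate above a threshold proportional to
    the residual noise level, refit by least squares, and update the residual."""
    n, d = Phi.shape
    support = np.zeros(d, dtype=bool)
    r = y.copy()
    x = np.zeros(d)
    for _ in range(n_stages):
        c = Phi.T @ r                                  # matched filter
        thresh = t * np.linalg.norm(r) / np.sqrt(n)    # formal noise level of the residual
        new = np.abs(c) > thresh
        if not new.any():
            break
        support |= new
        x_s, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x = np.zeros(d)
        x[support] = x_s
        r = y - Phi @ x                                # new residual
    return x

rng = np.random.default_rng(0)
Phi = rng.normal(size=(128, 512)) / np.sqrt(128)
x_true = np.zeros(512); x_true[rng.choice(512, 10, replace=False)] = rng.normal(size=10)
x_hat = stomp(Phi, Phi @ x_true)
print(np.linalg.norm(x_hat - x_true))
```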
Article
Full-text available
Inducing a grammar from text has proven to be a notoriously challenging learning task despite decades of research. The primary reason for its difficulty is that in order to induce plausible grammars, the underlying model must be capable of representing the intricacies of language while also ensuring that it can be readily learned from data. The majority of existing work on grammar induction has favoured model simplicity (and thus learnability) over representational capacity by using context free grammars and first order dependency grammars, which are not sufficiently expressive to model many common linguistic constructions. We propose a novel compromise by inferring a probabilistic tree substitution grammar, a formalism which allows for arbitrarily large tree fragments and thereby better represents complex linguistic structures. To limit the model's complexity we employ a Bayesian non-parametric prior which biases the model towards a sparse grammar with shallow productions. We demonstrate the model's efficacy on supervised phrase-structure parsing, where we induce a latent segmentation of the training treebank, and on unsupervised dependency grammar induction. In both cases the model uncovers interesting latent linguistic structures while producing competitive results.
Thesis
The aim of this work is to develop new statistical methods for spatial (3D) and spatio-temporal (3D+t) image reconstruction in Positron Emission Tomography (PET). The goal is to propose efficient methods, able to reconstruct images in a context of low injected doses while preserving the quality of interpretation. We therefore address reconstruction as a spatial and spatio-temporal inverse problem (with point observations) in a Bayesian non-parametric framework. Bayesian modelling provides a framework for regularizing the ill-posed inverse problem through the introduction of so-called prior information. Moreover, it characterizes the quantities to be estimated by their posterior distribution, which gives access to the distribution of the uncertainty associated with the reconstruction. The non-parametric approach, for its part, gives the modelling great robustness and flexibility. Our methodology consists in considering the image as a probability density (for a reconstruction in k dimensions) and in seeking the solution among the set of probability densities. The high dimensionality of the data to be handled leads to estimators that have no explicit form. This requires the use of approximation techniques for inference. Most of these techniques are based on Markov chain Monte Carlo (MCMC) methods. In the Bayesian non-parametric approach, we face the major difficulty of randomly generating infinite-dimensional objects on a computer. We have therefore developed a new sampling method that combines good mixing properties with the possibility of being parallelized in order to process large volumes of data. The adopted approach allowed us to obtain 3D spatial reconstructions without requiring any voxelization of space, and 4D spatio-temporal reconstructions without any prior discretization in either space or time. Moreover, the error associated with the statistical estimation can be quantified through credible intervals.
Thesis
We propose two novel approaches for recommender systems and networks. In the first part, we first give an overview of recommender systems and concentrate on the low-rank approaches for matrix completion. Building on a probabilistic approach, we propose novel penalty functions on the singular values of the low-rank matrix. By exploiting a mixture model representation of this penalty, we show that a suitably chosen set of latent variables enables to derive an expectation-maximization algorithm to obtain a maximum a posteriori estimate of the completed low-rank matrix. The resulting algorithm is an iterative soft-thresholded algorithm which iteratively adapts the shrinkage coefficients associated to the singular values. The algorithm is simple to implement and can scale to large matrices. We provide numerical comparisons between our approach and recent alternatives showing the interest of the proposed approach for low-rank matrix completion. In the second part, we first introduce some background on Bayesian nonparametrics and in particular on completely random measures (CRMs) and their multivariate extension, the compound CRMs. We then propose a novel statistical model for sparse networks with overlapping community structure. The model is based on representing the graph as an exchangeable point process, and naturally generalizes existing probabilistic models with overlapping block-structure to the sparse regime. Our construction builds on vectors of CRMs, and has interpretable parameters, each node being assigned a vector representing its level of affiliation to some latent communities. We develop methods for simulating this class of random graphs, as well as to perform posterior inference. We show that the proposed approach can recover interpretable structure from two real-world networks and can handle graphs with thousands of nodes and tens of thousands of edges.
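For orientation, the snippet below shows a generic iterative soft-thresholded SVD for low-rank matrix completion with a fixed penalty `lam`. The thesis algorithm is related in spirit but adapts the shrinkage of each singular value through its latent-variable penalty representation, which this hedged sketch does not reproduce; all names are illustrative.

```python
import numpy as np

def soft_impute(X_obs, mask, lam, n_iter=100):
    """Generic iterative soft-thresholded SVD for matrix completion
    (fixed penalty lam; the thesis uses adaptive, data-driven shrinkage)."""
    Z = np.where(mask, X_obs, 0.0)
    for _ in range(n_iter):
        filled = np.where(mask, X_obs, Z)            # impute missing entries with current estimate
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                 # soft-threshold the singular values
        Z = (U * s) @ Vt
    return Z

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))   # rank-3 ground truth
mask = rng.random(M.shape) < 0.5                           # 50% of entries observed
M_hat = soft_impute(M, mask, lam=1.0)
print(np.linalg.norm((M_hat - M)[~mask]) / np.linalg.norm(M[~mask]))
```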
Conference Paper
Learning redundant dictionaries for sparse representation from sets of patches has proven its efficiency in solving inverse problems. In many methods, the size of the dictionary is fixed in advance. Moreover the optimization process often calls for the prior knowledge of the noise level to tune parameters. We propose a Bayesian non parametric approach which is able to learn a dictionary of adapted size: the adequate number of atoms is inferred thanks to an Indian Buffet Process prior. The noise level is also accurately estimated so that nearly no parameter tuning is needed. Numerical experiments illustrate the relevance of the resulting dictionaries.
Conference Paper
Ill-posed inverse problems call for adapted models to define relevant solutions. Dictionary learning for sparse representation is often an efficient approach. In many methods, the size of the dictionary is fixed in advance and the noise level as well as regularization parameters need some tuning. Indian Buffet Process dictionary learning (IBP-DL) is a Bayesian non parametric approach which makes it possible to learn a dictionary with an adapted number of atoms. The noise and sparsity levels are also inferred so that the proposed approach is really non parametric: no parameter tuning is needed. This work adapts IBP-DL to the problem of image inpainting by proposing an accelerated collapsed Gibbs sampler. Experimental results illustrate the relevance of this approach.
Conference Paper
We derive a stochastic EM algorithm for scalable dictionary learning with the beta-Bernoulli process, a Bayesian nonparametric prior that learns the dictionary size in addition to the sparse coding of each signal. The core EM algorithm provides a new way for doing inference in nonparametric dictionary learning models and has a close similarity to other sparse coding methods such as K-SVD. Our stochastic extension for handling large data sets is closely related to stochastic variational inference, with the stochastic update for one parameter exactly that found using SVI. We show our algorithm compares well with K-SVD and total variation minimization on a denoising problem using several images.
Article
Dirichlet process (DP) mixture models are the cornerstone of non-parametric Bayesian statistics, and the development of Monte-Carlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of non-parametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.
Article
We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks.
Article
The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
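The l1 decomposition principle can be made concrete with a small linear-programming formulation: splitting x into its positive and negative parts turns min ‖x‖1 subject to Φx = y into a standard LP. The sketch below uses SciPy's generic LP solver rather than the interior-point and conjugate-gradient machinery discussed in the paper; dimensions and names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Basis Pursuit as a linear program: minimize ||x||_1 subject to Phi x = y.
    Writing x = u - v with u, v >= 0 turns the l1 objective into a linear one."""
    n, d = Phi.shape
    c = np.ones(2 * d)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])            # equality constraint: Phi u - Phi v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:d], res.x[d:]
    return u - v

rng = np.random.default_rng(0)
Phi = rng.normal(size=(30, 80))
x_true = np.zeros(80); x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
x_hat = basis_pursuit(Phi, Phi @ x_true)
print(np.round(np.linalg.norm(x_hat - x_true), 4))
```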
Article
Despite having various attractive qualities such as high prediction accuracy and the ability to quantify uncertainty and avoid over-fitting, Bayesian Matrix Factorization has not been widely adopted because of the prohibitive cost of inference. In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. Our algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, can not only match the prediction accuracy of standard MCMC methods like Gibbs sampling, but at the same time is as fast and simple as stochastic gradient descent. In our experiments, we show that our algorithm can achieve the same level of prediction accuracy as Gibbs sampling an order of magnitude faster. We also show that our method reduces the prediction error as fast as distributed stochastic gradient descent, achieving a 4.1% improvement in RMSE for the Netflix dataset and a 1.8% improvement for the Yahoo music dataset.
Article
Bridge regression, a special family of penalized regressions with penalty function Σ_j |β_j|^γ, γ ≤ 1, is considered. A general approach to solve for the bridge estimator is developed. A new algorithm for the lasso (γ = 1) is obtained by studying the structure of the bridge estimators. The shrinkage parameter γ and the tuning parameter λ are selected via generalized cross-validation (GCV). A comparison between the bridge model (γ ≤ 1) and several other shrinkage models, namely ordinary least squares regression (λ = 0), the lasso (γ = 1) and ridge regression (γ = 2), is made through a simulation study. It is shown that bridge regression performs well compared to the lasso and ridge regression. These methods are demonstrated through an analysis of prostate cancer data. Some computational advantages and limitations are discussed.
Article
A rich and flexible class of random probability measures, which we call stick-breaking priors, can be constructed using a sequence of independent beta random variables. Examples of random measures that have this characterization include the Dirichlet process, its two-parameter extension, the two-parameter Poisson-Dirichlet process, finite dimensional Dirichlet priors, and beta two-parameter processes. The rich nature of stick-breaking priors offers Bayesians a useful class of priors for nonparametric problems, while the similar construction used in each prior can be exploited to develop a general computational procedure for fitting them. In this article we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Polya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling method currently employed for Dirichlet process computing. This method applies to stick-breaking priors with a known Polya urn characterization, that is, priors with an explicit and simple prediction rule. Our second method, the blocked Gibbs sampler, is based on an entirely different approach that works by directly sampling values from the posterior of the random measure. The blocked Gibbs sampler can be viewed as a more general approach because it works without requiring an explicit prediction rule. We find that the blocked Gibbs avoids some of the limitations seen with the Polya urn approach and should be simpler for nonexperts to use.
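The constructive definition is short enough to state in code. The sketch below draws a truncated stick-breaking representation of a Dirichlet process draw; the truncation level, the base-measure sampler and the helper names are illustrative, and the two-parameter extensions mentioned above would only change the Beta parameters.

```python
import numpy as np

def stick_breaking_dp(alpha, n_atoms, base_sampler, seed=None):
    """Truncated stick-breaking construction of a Dirichlet process draw:
    weights w_k = v_k * prod_{j<k} (1 - v_j) with v_k ~ Beta(1, alpha),
    atoms drawn i.i.d. from the base measure."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    weights = v * remaining                  # lengths of the broken sticks
    atoms = base_sampler(rng, n_atoms)       # atom locations from the base measure
    return weights, atoms

# Example: a DP with a standard normal base measure, truncated at 50 atoms
w, theta = stick_breaking_dp(alpha=2.0, n_atoms=50,
                             base_sampler=lambda rng, k: rng.normal(size=k), seed=0)
print(w.sum())   # close to 1 for a sufficiently long truncation
```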
Article
The problem of training a dictionary for sparse representations from a given dataset is receiving a lot of attention mainly due to its applications in the fields of coding, classification and pattern recognition. One of the open questions is how to choose the number of atoms in the dictionary: if the dictionary is too small then the representation errors are big and if the dictionary is too big then using it becomes computationally expensive. In this letter, we solve the problem of computing efficient dictionaries of reduced size by a new design method, called Stagewise K-SVD, which is an adaptation of the popular K-SVD algorithm. Since K-SVD performs very well in practice, we use K-SVD steps to gradually build dictionaries that fulfill an imposed error constraint. The conceptual simplicity of the method makes it easy to apply, while the numerical experiments highlight its efficiency for different overcomplete dictionaries.
Article
Mixture models are ubiquitous in applied science. In many real-world applications, the number of mixture components needs to be estimated from the data. A popular approach consists of using information criteria to perform model selection. Another approach, which has become very popular over the past few years, consists of using Dirichlet process mixture (DPM) models. Both approaches are computationally intensive. The use of information criteria requires computing the maximum likelihood parameter estimates for each candidate model, whereas DPM are usually trained using Markov chain Monte Carlo (MCMC) or variational Bayes (VB) methods. We propose here original batch and recursive expectation-maximization algorithms to estimate the parameters of DPM. The performance of our algorithms is demonstrated on several applications including image segmentation and image classification tasks. Our algorithms are computationally much more efficient than MCMC and VB and outperform VB on an example.
Article
This paper presents a method for modeling speech signal distributions based on Dirichlet Process Mixtures (DPM) and estimating noise sequences based on particle filtering. In real situations, the speech recognition rate degrades severely because of environmental noise, reflected waves and so on. To improve the speech recognition rate, a technique for estimating noise sequences is necessary. In this paper, the distribution of the clean speech is modeled using the DPM instead of the traditional model, the Gaussian Mixture Model (GMM). Speech signal sequences are generated according to the mean and covariance generated from the DPM. Then, noise signal sequences are estimated with a particle filter. The proposed method can improve the speech recognition rate significantly in the low-SNR region.
Article
Ever-increasing computational power, along with ever-more sophisticated statistical computing techniques, is making it possible to fit ever-more complex statistical models. Among the more computationally intensive methods, the Gibbs sampler is popular because of its simplicity and power to effectively generate samples from a high-dimensional probability distribution. Despite its simple implementation and description, however, the Gibbs sampler is criticized for its sometimes slow convergence, especially when it is used to fit highly structured complex models. Here we present partially collapsed Gibbs sampling strategies that improve the convergence by capitalizing on a set of functionally incompatible conditional distributions. Such incompatibility generally is avoided in the construction of a Gibbs sampler, because the resulting convergence properties are not well understood. We introduce three basic tools (marginalization, permutation, and trimming) that allow us to transform a Gibbs sampler into a partially collapsed Gibbs sampler with known stationary distribution and faster convergence.
Article
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRI) from highly undersampled k-space data. Our model uses the beta process as a nonparametric prior for dictionary learning, in which an image patch is a sparse combination of dictionary elements. The size of the dictionary and the patch-specific sparsity pattern are inferred from the data, in addition to all dictionary learning variables. Dictionary learning is performed as part of the image reconstruction process, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model. We derive a stochastic optimization algorithm based on Markov Chain Monte Carlo (MCMC) sampling for the Bayesian model, and use the alternating direction method of multipliers (ADMM) for efficiently performing total variation minimization. We present empirical results on several MR images, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
Article
In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
Conference Paper
We describe a recursive algorithm to compute representations of functions with respect to nonorthogonal and possibly overcomplete dictionaries of elementary building blocks, e.g. affine (wavelet) frames. We propose a modification to the matching pursuit algorithm of Mallat and Zhang (1992) that maintains full backward orthogonality of the residual (error) at every step and thereby leads to improved convergence. We refer to this modified algorithm as orthogonal matching pursuit (OMP). It is shown that all additional computation required for the OMP algorithm may be performed recursively.
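A minimal OMP implementation makes the backward-orthogonality point explicit: because the coefficients are refitted by least squares over the whole selected set at every step, the residual stays orthogonal to all chosen atoms. The following sketch assumes a column-normalized dictionary and uses a plain least-squares solve rather than the recursive updates of the paper; names are illustrative.

```python
import numpy as np

def omp(D, y, n_atoms):
    """Orthogonal Matching Pursuit sketch: greedily pick the atom most correlated
    with the residual, then re-solve the least-squares fit on the whole selected
    set so the residual stays orthogonal to every chosen atom."""
    support, r = [], y.copy()
    for _ in range(n_atoms):
        k = int(np.argmax(np.abs(D.T @ r)))     # best-matching atom
        if k not in support:
            support.append(k)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coeffs          # residual orthogonal to the span of the selection
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256)); D /= np.linalg.norm(D, axis=0)
y = D[:, [5, 100]] @ np.array([2.0, -1.0])
print(np.nonzero(omp(D, y, 2))[0])              # recovers atoms 5 and 100
```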
Article
Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test and also for image recovery. In the context of compressive sensing, significant improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal image expansions. The compressive-measurement projections are also optimized for the learned dictionary. Additionally, we consider simpler (incomplete) measurements, defined by measuring a subset of image pixels, uniformly selected at random. Spatial interrelationships within imagery are exploited through use of the Dirichlet and probit stick-breaking processes. Several example results are presented, with comparisons to other methods in the literature.
Article
In this paper, we present a multi-scale dictionary learning paradigm for sparse and redundant signal representations. The appeal of such a dictionary is obvious: in many cases data naturally comes at different scales. A multi-scale dictionary should be able to combine the advantages of generic multi-scale representations (such as Wavelets), with the power of learned dictionaries, in capturing the intrinsic characteristics of a family of signals. Using such a dictionary would allow representing the data in a more efficient, i.e., sparse, manner, allowing applications to take a more global look at the signal. In this paper, we aim to achieve this goal without incurring the costs of an explicit dictionary with large atoms. The K-SVD using Wavelets approach presented here applies dictionary learning in the analysis domain of a fixed multi-scale operator. This way, sub-dictionaries at different data scales, consisting of small atoms, are trained. These dictionaries can then be efficiently used in sparse coding for various image processing applications, potentially outperforming both single-scale trained dictionaries and multi-scale analytic ones. In this paper, we demonstrate this construction and discuss its potential through several experiments performed on fingerprint and coastal scenery images.
Article
Improving the modeling of natural images is important to go beyond the state-of-the-art for many image processing tasks such as compression, denoising, inverse problems, and texture synthesis. Natural images are composed of intricate patterns such as regular areas, edges, junctions, oriented oscillations, and textures. Processing efficiently such a wide range of regularities requires methods that are adaptive to the geometry of the image. This adaptivity can be achieved using sparse representations in a redundant dictionary. The geometric adaptivity is important to search for efficient representations in a structured dictionary. Another way to capture this geometry is through non-local interactions between patches in the image. The resulting non-local energies can be used to perform an adaptive image restoration. This paper reviews these emerging technics and shows the interplay between sparse and non-local regularizations.
Article
We describe methods for learning dictionaries that are appropriate for the representation of given classes of signals and multisensor data. We further show that dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification when the learning includes a class separability criterion in the objective function. The benefits of dictionary learning clearly show that a proper understanding of causes underlying the sensed world is key to task-specific representation of relevant information in high-dimensional data sets.
Article
We demonstrate a simple greedy algorithm that can reliably recover a vector v ∈ R^d from incomplete and inaccurate measurements x = Φv + e. Here, Φ is an N × d measurement matrix with N ≪ d, and e is an error vector. Our algorithm, Regularized Orthogonal Matching Pursuit (ROMP), seeks to provide the benefits of the two major approaches to sparse recovery. It combines the speed and ease of implementation of the greedy methods with the strong guarantees of the convex programming methods. For any measurement matrix Φ that satisfies a quantitative restricted isometry principle, ROMP recovers a signal v with O(n) nonzeros from its inaccurate measurements x in at most n iterations, where each iteration amounts to solving a least squares problem. The noise level of the recovery is proportional to √(log n) ∥e∥2. In particular, if the error term e vanishes the reconstruction is exact.
Conference Paper
Sparse signal representations from overcomplete dictionaries have been extensively investigated in recent research, leading to state-of-the-art results in signal, image and video restoration. One of the most important issues involved is selecting the proper size of the dictionary. However, the related guidelines are still not established. In this paper, we tackle this problem by proposing a so-called sub-clustering K-SVD algorithm. This approach incorporates the subtractive clustering method into K-SVD to retain the most important atom candidates. At the same time, the redundant atoms are removed to produce a well-trained dictionary. For a given dataset and approximation error bound, the proposed approach can deduce the optimized size of the dictionary, which is greatly compressed as compared with the one needed in the K-SVD algorithm.
Article
Image segmentation using Markov random fields involves parameter estimation in hidden Markov models for which the EM algorithm is widely used. In practice, difficulties arise due to the dependence structure in the models and approximations are required. Using ideas from the mean field approximation principle, we propose a class of EM-like algorithms in which the computation reduces to dealing with systems of independent variables. Within this class, the simulated field algorithm is a new stochastic algorithm which appears to be the most promising for its good performance and speed, on synthetic and real image experiments.
Article
A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
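As a concrete instance of the general incomplete-data scheme, the snippet below runs textbook EM on a two-component 1-D Gaussian mixture (a standard example, not taken from the paper): posterior responsibilities in the E-step, weighted maximum-likelihood updates in the M-step.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Textbook EM for a two-component 1-D Gaussian mixture, as a concrete
    instance of the general incomplete-data scheme."""
    mu = np.array([x.min(), x.max()], dtype=float)   # crude but separating initialization
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates of the parameters
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
print(em_gmm_1d(x))
```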
Conference Paper
The Indian buffet process (IBP) is an exchangeable distribution over binary matrices used in Bayesian nonparametric featural models. In this paper we propose a three-parameter generalization of the IBP exhibiting power-law behavior. We achieve this by generalizing the beta process (the de Finetti measure of the IBP) to the stable-beta process and deriving the IBP corresponding to it. We find interesting relationships between the stable-beta process and the Pitman-Yor process (another stochastic process used in Bayesian nonparametric models with interesting power-law properties). We derive a stick-breaking construction for the stable-beta process, and find that our power-law IBP is a good model for word occurrences in document corpora.
Conference Paper
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology. In this paper, we introduce a framework for developing generative models for language that produce power-law distributions. Our framework is based upon the idea of specifying language models in terms of two components: a generator, an underlying generative model for words which need not (and usually does not) produce a power-law distribution, and an adaptor, which transforms the stream of words produced by the generator into one whose frequencies obey a power-law distribution. This framework is extremely general: any generative model for language can be used as a generator, with the power-law distribution being produced as the result of making an appropriate choice for the adaptor. In our framework, estimation of the parameters of the generator will be affected by assumptions about the form of the adaptor. We show that use of a particular adaptor, the Pitman-Yor process (4, 5, 6), sheds light on a tension exhibited by formal approaches to natural language: whether explanations should be based upon the types of words that languages exhibit, or the frequencies with which tokens of those words occur. One place where this
Conference Paper
Many unsupervised learning problems can be expressed as a form of matrix factorization, reconstructing an observed data matrix as the product of two matrices of latent variables. A standard challenge in solving these problems is determining the dimensionality of the latent matrices. Nonparametric Bayesian matrix factorization is one way of dealing with this challenge, yielding a posterior distribution over possible factorizations of unbounded dimensionality. A drawback to this approach is that posterior estimation is typically done using Gibbs sampling, which can be slow for large problems and when conjugate priors cannot be used. As an alternative, we present a particle filter for posterior estimation in nonparametric Bayesian matrix factorization models. We illustrate this approach with two matrix factorization models and show favorable performance relative to Gibbs sampling.
Conference Paper
Sparse signal models have been the focus of much recent research, leading to (or improving upon) state-of-the-art results in signal, image, and video restoration. This article extends this line of research into a novel framework for local image discrimination tasks, proposing an energy formulation with both sparse reconstruction and class discrimination components, jointly optimized during dictionary learning. This approach improves over the state of the art in texture segmentation experiments using the Brodatz database, and it paves the way for a novel scene analysis and recognition framework based on simultaneously learning discriminative and reconstructive dictionaries. Preliminary results in this direction using examples from the Pascal VOC06 and Graz02 datasets are presented as well.
Conference Paper
Low-rank matrix approximation methods provide one of the simplest and most effective approaches to collaborative filtering. Such models are usually fitted to data by finding a MAP estimate of the model parameters, a procedure that can be performed efficiently even on very large datasets. However, unless the regularization parameters are tuned carefully, this approach is prone to overfitting because it finds a single point estimate of the parameters. In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. We show that Bayesian PMF models can be efficiently trained using Markov chain Monte Carlo methods by applying them to the Netflix dataset, which consists of over 100 million movie ratings. The resulting models achieve significantly higher prediction accuracy than PMF models trained using MAP estimation.
Article
The following problem is considered: given a matrix A in R^(m×n) (m rows and n columns), a vector b in R^m, and ε > 0, compute a vector x satisfying ∥Ax - b∥2 ≤ ε if such exists, such that x has the fewest number of non-zero entries over all such vectors. It is shown that the problem is NP-hard, but that the well-known greedy heuristic is good in that it computes a solution with at most [18 Opt(ε/2) ∥Ã+∥2^2 ln(∥b∥2/ε)] non-zero entries, where Opt(ε/2) is the optimum number of nonzero entries at error ε/2, Ã is the matrix obtained by normalizing each column of A with respect to the L2 norm, and Ã+ is its pseudo-inverse.