Article

A Bayesian hierarchical model for related densities using Polya trees

Authors:
  • Independent Researcher

Abstract

Bayesian hierarchical models are used to share information between related samples and obtain more accurate estimates of sample-level parameters, common structure, and variation between samples. When the parameter of interest is the distribution or density of a continuous variable, a hierarchical model for distributions is required. A number of such models have been described in the literature using extensions of the Dirichlet process and related processes, typically as a distribution on the parameters of a mixing kernel. We propose a new hierarchical model based on the Polya tree, which allows direct modeling of densities and enjoys some computational advantages over the Dirichlet process. The Polya tree also allows more flexible modeling of the variation between samples, providing more informed shrinkage and permitting posterior inference on the dispersion function, which quantifies the variation among sample densities. We also show how the model can be extended to cluster samples in situations where the observed samples are believed to have been drawn from several latent populations.
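As a concrete illustration of the hierarchical construction, here is a minimal generative sketch of a hierarchical Polya tree on [0, 1] (an illustrative parametrization with assumed defaults, not the paper's exact model): a common tree of branch probabilities is drawn first, and each sample's tree is then drawn centered on it, with a dispersion parameter nu controlling between-sample variation.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_branch_probs(depth, c=1.0, rho=4.0):
    # Common tree: level-j branch probabilities ~ Beta(a_j, a_j), a_j = c*rho**j.
    # Larger a_j at deeper levels favours smoother densities.
    return [rng.beta(c * rho**j, c * rho**j, size=2**j) for j in range(depth)]

def draw_sample_tree(common, nu=30.0):
    # Sample-level tree centred on the common tree: each branch probability
    # ~ Beta(nu*m, nu*(1-m)), so nu acts as a dispersion parameter
    # (small nu = sample densities vary widely around the shared structure).
    return [rng.beta(nu * m, nu * (1.0 - m)) for m in common]

def density_on_grid(branch_probs):
    # Piecewise-constant density over the 2**depth dyadic bins of [0, 1].
    mass = np.ones(1)
    for theta in branch_probs:
        mass = np.column_stack([mass * theta, mass * (1.0 - theta)]).ravel()
    return mass * len(mass)  # bin probabilities -> density heights

common = draw_branch_probs(depth=6)
samples = [density_on_grid(draw_sample_tree(common)) for _ in range(3)]
print(np.round(samples[0][:8], 3))
```

Small nu spreads the sample densities around the common structure; large nu shrinks them toward it, which is the mechanism behind the informed shrinkage described in the abstract.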


... The collection of beta concentration parameters {c(A) : A ∈ T} plays a critical role in regularizing the smoothness of f relative to the base density h. Previous works (Wong and Ma, 2010; Ma, 2017; Christensen and Ma, 2019) have shown that inference can be substantially enhanced under the PT by incorporating these concentration parameters into the modeling and learning them from the data. They introduce a technique that allows adaptive inference on these concentration parameters to characterize potentially spatially heterogeneous features of F while maintaining computational tractability, using latent variables with a first-order Markov model. ...
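For reference, one standard way to write the role of these concentration parameters (notation assumed here, not copied from the cited works) is in terms of the nested partition sets A_0 ⊃ A_1 ⊃ ⋯ containing a point x:

```latex
f(x) \;=\; h(x)\,\prod_{j \ge 1} \frac{\theta(A_j)}{H(A_j \mid A_{j-1})},
\qquad
\theta(A_j) \;\sim\; \mathrm{Beta}\!\Big( c(A_{j-1})\, H(A_j \mid A_{j-1}),\;
    c(A_{j-1})\, \big(1 - H(A_j \mid A_{j-1})\big) \Big).
```

Large c(A) pins the branch probability θ to the base measure's conditional, so f ≈ h locally; small c(A) permits rough local deviations. Placing a first-order Markov prior on c(A) along the tree is what yields spatially adaptive smoothing.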
Preprint
Full-text available
Tree-based models for probability distributions are usually specified using a predetermined, data-independent collection of candidate recursive partitions of the sample space. To characterize an unknown target density in detail over the entire sample space, candidate partitions must have the capacity to expand deeply into all areas of the sample space with potential non-zero sampling probability. Such an expansive system of partitions often incurs prohibitive computational costs and makes inference prone to overfitting, especially in regions with little probability mass. Existing models typically make a compromise and rely on relatively shallow trees. This hampers one of the most desirable features of trees, their ability to characterize local features, and results in reduced statistical efficiency. Traditional wisdom suggests that this compromise is inevitable to ensure coherent likelihood-based reasoning, as a data-dependent partition system that allows deeper expansion only in regions with more observations would induce double dipping of the data and thus lead to inconsistent inference. We propose a simple strategy to restore coherency while allowing the candidate partitions to be data-dependent, using Cox's partial likelihood. This strategy parametrizes the tree-based sampling model according to the allocation of probability mass based on the observed data, and yet under appropriate specification, the resulting inference remains valid. Our partial likelihood approach is broadly applicable to existing likelihood-based methods and in particular to Bayesian inference on tree-based models. We give examples in density estimation in which the partial likelihood is endowed with existing priors on tree-based models and compare with the standard, full-likelihood approach. The results show substantial gains in estimation accuracy and computational efficiency from using the partial likelihood.
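The classical device being repurposed here is Cox's partial likelihood from survival analysis; with event times t_i, event indicators δ_i, covariates x_i, and risk sets R(t_i), it reads

```latex
L(\beta) \;=\; \prod_{i:\,\delta_i = 1}
  \frac{\exp(x_i^{\top}\beta)}{\sum_{j \in R(t_i)} \exp(x_j^{\top}\beta)}.
```

Each factor conditions on the information that determines the comparison set, which is, roughly, the property the preprint carries over to data-dependent partition trees to avoid double dipping.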
... It is possible to include a dispersion parameter, scalar or infinite-dimensional, which influences how samples within the same population differ, and then to define the completely random measure µj on Y × Ω, where Ω is the space on which the dispersion parameter is defined. Christensen and Ma (2020) propose a special case where the dispersion parameter is given a hyperprior. ...
Preprint
Clustering is an important task in many areas of knowledge: medicine and epidemiology, genomics, environmental science, economics, visual sciences, among others. Methodologies for performing inference on the number of clusters have often been shown to be inconsistent, and introducing a dependence structure among the clusters implies additional difficulties in the estimation process. In a Bayesian setting, clustering is performed by considering the unknown partition as a random object and defining a prior distribution on it. This prior distribution may be induced by models on the observations, or defined directly for the partition. Several recent results, however, have shown the difficulty of consistently estimating the number of clusters, and therefore the partition. The problem of summarising the posterior distribution on the partition itself remains open, given the large dimension of the partition space. This work reviews the Bayesian approaches available in the literature for performing clustering, presenting the advantages and disadvantages of each in order to suggest future lines of research.
... Notably, Wong and Ma introduced in [173] a flexible alternative to standard PTs that they call Optional Pólya Trees (OPTs in the sequel), which have been successfully extended and applied to a number of settings in e.g. [114,83,113,110,42]. Yet, from the theoretical point of view, only posterior consistency was established in [173] and follow-up works. ...
Thesis
Modern data analysis provides scientists with statistical and machine learning algorithms with impressive performance. Given their extensive use to tackle problems of constantly growing complexity, there is a real need to understand the conditions under which algorithms are successful or bound to fail. An additional objective is to gain insights into the design of new algorithmic methods able to tackle more innovative and challenging tasks. A natural framework for developing a mathematical theory of these methods is nonparametric inference. This area of statistics is concerned with inference on unknown quantities of interest under minimal assumptions, involving an infinite-dimensional statistical modeling of a parameter on the data-generating mechanism. In this thesis, we consider both problems of function estimation and uncertainty quantification. The first class of algorithms we deal with are Bayesian tree-based methods. They are based on a 'divide-and-conquer' principle, partitioning a sample space to estimate the parameter locally. In regression, these methods include BCART and the renowned BART, the latter being an ensemble of trees, or a forest. In density estimation, the famous Pólya Tree prior exemplifies these methods and is the building block of a myriad of related constructions. We propose a new extension, DPA, that is a 'forest of PTs' and is shown to attain minimax contraction rates adaptively in Hellinger distance for arbitrary Hölder regularities. Adaptive rates in the stronger supremum norm are also obtained for the flexible Optional Pólya Tree (OPT) prior, a BCART-type prior, for regularities smaller than one. Gaussian processes are another popular class of priors studied in Bayesian nonparametrics and machine learning. Motivated by the ever-growing size of datasets, we propose a new horseshoe Gaussian process with the aim of adapting to and leveraging a data structure of smaller dimension. First, we derive minimax optimal contraction rates for its tempered posterior. Secondly, deep Gaussian processes are Bayesian counterparts to the famous deep neural networks. We prove that, as a building block in such a deep framework, the horseshoe Gaussian process also gives optimal adaptive rates under compositional structure assumptions on the parameter. As for uncertainty quantification (UQ), Bayesian methods are often praised for the principled solution they offer with the definition of credible sets. We prove that OPT credible sets are confidence sets with good coverage and size (in supremum norm) under qualitative self-similarity conditions. Moreover, we conduct a theoretical study of UQ in Wasserstein distances W_p, uncovering a new phenomenon. In dimensions smaller than 4, it is possible to construct confidence sets whose W_p-radii, p <= 2, adapt to any regularity (with no qualitative assumptions). This starkly contrasts with the usual L_p theory, where concessions always have to be made.
... Alternatively, it is possible to assume intermediate forms of sharing, which are between complete pooling and no sharing at all, by considering a hierarchical prior among the different trees. This form of sharing can be achieved by using a Hierarchical PT (Christensen & Ma, 2020) and defining the model by centering the PT in each group over a common PT prior. An alternative definition is to use a logistic PT (Jara & Hanson, 2011), which makes use of the logistic normal in place of the Beta distributions. ...
Article
Full-text available
Wildlife monitoring for open populations can be performed using a number of different survey methods. Each survey method gives rise to a type of data and, in the last five decades, a large number of associated statistical models have been developed for analysing these data. Although these models have been parameterised and fitted using different approaches, they have all been designed to either model the pattern with which individuals enter and/or exit the population, or to estimate the population size by accounting for the corresponding observation process, or both. However, existing approaches rely on a predefined model structure and complexity, either by assuming that parameters linked to the entry and exit pattern (EEP) are specific to sampling occasions, or by employing parametric curves to describe the EEP. Instead, we propose a novel Bayesian nonparametric framework for modelling EEPs based on the Polya Tree (PT) prior for densities. Our Bayesian non‐parametric approach avoids overfitting when inferring EEPs, while simultaneously allowing more flexibility than is possible using parametric curves. Finally, we introduce the replicate PT prior for defining classes of models for these data allowing us to impose constraints on the EEPs, when required. We demonstrate our new approach using capture‐recapture, count and ring‐recovery data for two different case studies.
... See also Soriano and Ma (2017) for related work. Interesting alternatives that extend the analysis to more than two populations can be found in Christensen and Ma (2020), Lijoi, Prünster, and Rebaudo (2022) and in Beraha, Guglielmi, and Quintana (2021). Another similar proposal is the one by Gutiérrez et al. (2019), whose model identifies differences between the distributions of cases and those of the control group. ...
Preprint
Full-text available
Hypertensive disorders of pregnancy occur in about 10% of pregnant women around the world. Though there is evidence that hypertension impacts maternal cardiac functions, the relation between hypertension and cardiac dysfunctions is only partially understood. The study of this relationship can be framed as a joint inferential problem on multiple populations, each corresponding to a different hypertensive disorder diagnosis, that combines multivariate information provided by a collection of cardiac function indexes. A Bayesian nonparametric approach seems particularly suited for this setup and we demonstrate it on a dataset consisting of transthoracic echocardiography results of a cohort of Indian pregnant women. We are able to perform model selection, provide density estimates of cardiac function indexes and a latent clustering of patients: these readily interpretable inferential outputs allow us to single out modified cardiac functions in hypertensive patients compared to healthy subjects and progressively increased alterations with the severity of the disorder. The analysis is based on a Bayesian nonparametric model that relies on a novel hierarchical structure, called the symmetric hierarchical Dirichlet process. This is suitably designed so that the mean parameters are identified and used for model selection across populations, a penalization for multiplicity is enforced, and the presence of unobserved relevant factors is investigated through a latent clustering of subjects. Posterior inference relies on a suitable Markov chain Monte Carlo algorithm and the model behaviour is also showcased on simulated data.
... Notably, Wong and Ma introduced in [39] a flexible alternative to standard PTs that they call Optional Pólya Trees (OPTs in the sequel), which have been successfully extended and applied to a number of settings in e.g. [31,25,30,28,14]. Yet, from the theoretical point of view, only posterior consistency was established in [39] and follow-up works. ...
Preprint
Full-text available
We consider statistical inference in the density estimation model using a tree-based Bayesian approach, with Optional Pólya trees as prior distribution. We derive near-optimal convergence rates for the corresponding posterior distributions with respect to the supremum norm. For broad classes of Hölder-smooth densities, we show that the method automatically adapts to the unknown Hölder regularity parameter. We consider the question of uncertainty quantification by providing mathematical guarantees for credible sets from the obtained posterior distributions, leading to near-optimal uncertainty quantification for the density function, as well as related functionals such as the cumulative distribution function. The results are illustrated through a brief simulation study.
... This is a very active topic, and a number of effective methods have been developed in this area. Besides using Dirichlet process mixtures, some researchers have focused on constructing hierarchical models based on the Pólya tree (Christensen, 2019). Another method separates the marginal and joint distributions by using a copula transform (Majdara, 2019). ...
... Alternatively, it is possible to assume intermediate forms of sharing, which are between complete pooling and no sharing at all, by considering a hierarchical prior among the different trees. This form of sharing can be achieved by using a Hierarchical PT (Christensen & Ma, 2020) and defining the model by centering the PT in each group over a common PT prior. An alternative definition is to use a logistic PT (Jara & Hanson, 2011), which makes use of the logistic normal in place of the Beta distributions. ...
Preprint
Wildlife monitoring for open populations can be performed using a number of different survey methods. Each survey method gives rise to a type of data and, in the last five decades, a large number of associated statistical models have been developed for analysing these data. Although these models have been parameterised and fitted using different approaches, they have all been designed to model the pattern with which individuals enter and exit the population and to estimate the population size. However, existing approaches rely on a predefined model structure and complexity, either by assuming that parameters are specific to sampling occasions, or by employing parametric curves. Instead, we propose a novel Bayesian nonparametric framework for modelling entry and exit patterns based on the Polya Tree (PT) prior for densities. Our Bayesian non-parametric approach avoids overfitting when inferring entry and exit patterns while simultaneously allowing more flexibility than is possible using parametric curves. We apply our new framework to capture-recapture, count and ring-recovery data and we introduce the replicated PT prior for defining classes of models for these data. Additionally, we define the Hierarchical Logistic PT prior for jointly modelling related data and we consider the Optional PT prior for modelling long time series of data. We demonstrate our new approach using five different case studies on birds, amphibians and insects.
Article
We develop a nonparametric Bayesian prior for a family of random probability measures by extending the Polya tree (PT) prior to a joint prior for a set of probability measures G_1, …, G_n, suitable for meta-analysis with event-time outcomes. In the application to meta-analysis, G_i is the event-time distribution specific to study i. The proposed model defines a regression on study-specific covariates by introducing increased correlation for any pair of studies with similar characteristics. The desired multivariate PT model is constructed by introducing a hierarchical prior on the conditional splitting probabilities in the PT construction for each of the G_i. The hierarchical prior replaces the independent beta priors for the splitting probabilities in the PT construction with a Gaussian process prior on the corresponding (logit) splitting probabilities across all studies. The Gaussian process is indexed by study-specific covariates, introducing the desired dependence with increased correlation for similar studies. The main feature of the proposed construction is (conditionally) conjugate posterior updating with commonly reported inference summaries for event-time data. The construction is motivated by a meta-analysis of cancer immunotherapy studies.
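A minimal prior-simulation sketch of this construction (the kernel and covariate choices below are illustrative assumptions, not the paper's specification):

```python
import numpy as np

rng = np.random.default_rng(1)

def se_kernel(z, length=1.0, var=1.0, jitter=1e-8):
    # Squared-exponential covariance over study-level covariates z of shape (n, d).
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2) + jitter * np.eye(len(z))

def draw_dependent_splits(z, depth, mu0=0.0):
    # For each node of a dyadic tree, draw one logit splitting probability per
    # study from a Gaussian process indexed by study covariates, then squash.
    # Studies with similar covariates get similar splitting probabilities and
    # hence similar event-time distributions.
    L = np.linalg.cholesky(se_kernel(z))
    splits = []
    for j in range(depth):
        eta = mu0 + L @ rng.standard_normal((len(z), 2**j))  # (studies, nodes)
        splits.append(1.0 / (1.0 + np.exp(-eta)))            # logistic transform
    return splits

z = np.array([[0.0], [0.1], [2.0]])   # toy covariates: studies 1 and 2 are similar
splits = draw_dependent_splits(z, depth=4)
print(np.round(splits[1], 2))         # one level of splits: rows = studies
```

In prior draws, the first two rows track each other closely while the third varies more independently, which is the correlation-by-similarity behavior the abstract describes.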
Article
The Pólya tree (PT) process is a general-purpose Bayesian nonparametric model that has found wide application in a range of inference problems. It has a simple analytic form and the posterior computation boils down to beta-binomial conjugate updates along a partition tree over the sample space. Recent development in PT models shows that performance of these models can be substantially improved by (i) allowing the partition tree to adapt to the structure of the underlying distributions and (ii) incorporating latent state variables that characterize local features of the underlying distributions. However, important limitations of the PT remain, including (i) the sensitivity in the posterior inference with respect to the choice of the partition tree, and (ii) the lack of scalability with respect to dimensionality of the sample space. We consider a modeling strategy for PT models that incorporates a flexible prior on the partition tree along with latent states with Markov dependency. We introduce a hybrid algorithm combining sequential Monte Carlo (SMC) and recursive message passing for posterior sampling that can scale up to 100 dimensions. While our description of the algorithm assumes a single computer environment, it has the potential to be implemented on distributed systems to further enhance the scalability. Moreover, we investigate the large sample properties of the tree structures and latent states under the posterior model. We carry out extensive numerical experiments in density estimation and two-group comparison, which show that flexible partitioning can substantially improve the performance of PT models in both inference tasks. We demonstrate an application to a mass cytometry dataset with 19 dimensions and over 200,000 observations. Supplementary Materials for this article are available online.
Article
The Bayesian approach to inference stands out for naturally allowing borrowing information across heterogeneous populations, with different samples possibly sharing the same distribution. A popular Bayesian nonparametric model for clustering probability distributions is the nested Dirichlet process, which however has the drawback of grouping distributions in a single cluster when ties are observed across samples. With the goal of achieving a flexible and effective clustering method for both samples and observations, we investigate a nonparametric prior that arises as the composition of two different discrete random structures and derive a closed-form expression for the induced distribution of the random partition, the fundamental tool regulating the clustering behavior of the model. On the one hand, this allows one to gain deeper insight into the theoretical properties of the model; on the other hand, it yields an MCMC algorithm for evaluating Bayesian inferences of interest. Moreover, we single out limitations of this algorithm when working with more than two populations and, consequently, devise an alternative, more efficient sampling scheme, which, as a by-product, allows testing homogeneity between different populations. Finally, we perform a comparison with the nested Dirichlet process and provide illustrative examples of both synthetic and real data.
Chapter
Many ecological sampling schemes do not allow for unique marking of individuals. Instead, only counts of individuals detected on each sampling occasion are available. In this paper, we propose a novel approach for modelling count data in an open population, where individuals can arrive and depart from the site during the sampling period. A Bayesian nonparametric prior, known as the Polya tree, is used for modelling the bivariate density of arrival and departure times. Thanks to this choice, we can easily incorporate prior information on the arrival and departure density while still allowing the model to flexibly adjust the posterior inference according to the observed data. Moreover, the model provides great scalability, as the complexity depends not on the population size but only on the number of sampling occasions, making it particularly suitable for datasets with high numbers of detections. We apply the new model to count data on newts collected by the Durrell Institute of Conservation and Ecology, University of Kent.
Article
Full-text available
This paper describes a new class of dependent random measures, which we call compound random measures, and the use of normalized versions of these random measures as priors in Bayesian nonparametric mixture models. Their tractability allows the properties of both compound random measures and normalized compound random measures to be derived. In particular, we show how compound random measures can be constructed with gamma, stable and generalized gamma process marginals. We also derive several forms of the Laplace exponent and characterize dependence through both the Lévy copula and the correlation function. A slice sampling algorithm for inference when a normalized compound random measure is used as the mixing measure in a nonparametric mixture model is described, and the algorithm is applied to a data example.
Article
Full-text available
A methodology for the simultaneous Bayesian non-parametric modelling of several distributions is developed. Our approach uses normalized random measures with independent increments and builds dependence through the superposition of shared processes. The properties of the prior are described and the modelling possibilities of this framework are explored in detail. Efficient slice sampling methods are developed for inference. Various posterior summaries are introduced which allow better understanding of the differences between distributions. The methods are illustrated on simulated data and examples from survival analysis and stochastic frontier analysis.
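A crude finite-dimensional sketch of the superposition idea (the real construction uses completely random measures; the atom counts and masses here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def gamma_crm_approx(n_atoms, total_mass):
    # Finite stand-in for a gamma completely random measure: uniform atom
    # locations, Gamma(total_mass/n_atoms, 1) jumps, so the summed mass is
    # exactly Gamma(total_mass, 1) in distribution.
    locs = rng.uniform(size=n_atoms)
    jumps = rng.gamma(total_mass / n_atoms, 1.0, size=n_atoms)
    return locs, jumps

# One shared component plus one idiosyncratic component per group:
shared_locs, shared_jumps = gamma_crm_approx(500, total_mass=5.0)
groups = []
for _ in range(2):
    own_locs, own_jumps = gamma_crm_approx(500, total_mass=2.0)
    locs = np.concatenate([shared_locs, own_locs])
    jumps = np.concatenate([shared_jumps, own_jumps])
    groups.append((locs, jumps / jumps.sum()))  # normalize -> probability measure

# Dependence between the two normalized measures comes only through the
# shared atoms; raising the shared total_mass strengthens the dependence.
```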
Chapter
Full-text available
Hierarchical modeling is a fundamental concept in Bayesian statistics. The basic idea is that parameters are endowed with distributions which may themselves introduce new parameters, and this construction recurses. In this review we discuss the role of hierarchical modeling in Bayesian nonparametrics, focusing on models in which the infinite-dimensional parameters are treated hierarchically. For example, we consider a model in which the base measure for a Dirichlet process is itself treated as a draw from another Dirichlet process. This yields a natural recursion that we refer to as a hierarchical Dirichlet process. We also discuss hierarchies based on the Pitman-Yor process and on completely random processes. We demonstrate the value of these hierarchical constructions in a wide range of practical applications, in problems in computational biology, computer vision and natural language processing.
Article
Full-text available
We propose a class of dependent processes in which density shape is regressed on one or more predictors through conditional tail-free probabilities by using transformed Gaussian processes. A particular linear version of the process is developed in detail. The resulting process is flexible and easy to fit using standard algorithms for generalized linear models. The method is applied to growth curve analysis, evolving univariate random effects distributions in generalized linear mixed models, and median survival modelling with censored data and covariate-dependent errors.
Article
Full-text available
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
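The construction can be sketched with truncated stick-breaking (a simplified illustration of the representation in Teh et al.; truncation level and hyperparameters are arbitrary choices here):

```python
import numpy as np

rng = np.random.default_rng(3)

def stick_break(v):
    # Turn stick-breaking fractions v_k into weights w_k = v_k * prod_{l<k}(1 - v_l).
    return v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])

def hdp_weights(gamma, alpha, n_groups, K=50):
    # Top-level weights beta ~ GEM(gamma) define the shared atoms; each group's
    # weights are a DP(alpha, beta) draw, via v_k ~ Beta(alpha*beta_k,
    # alpha*(1 - sum_{l<=k} beta_l)), so all groups reuse the same atom set.
    beta = stick_break(rng.beta(1.0, gamma, size=K))
    tail = 1.0 - np.cumsum(beta)
    groups = [stick_break(rng.beta(alpha * beta, np.maximum(alpha * tail, 1e-12)))
              for _ in range(n_groups)]
    return beta, groups

beta, groups = hdp_weights(gamma=5.0, alpha=10.0, n_groups=3)
print(np.round(groups[0][:5], 3), np.round(groups[1][:5], 3))
```

Because every group's mixture reuses the atoms of the shared top-level draw, mixture components are necessarily shared across groups, which is the property the abstract emphasizes.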
Article
Full-text available
Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
Article
Full-text available
In this article we describe Bayesian nonparametric procedures for two-sample hypothesis testing. Namely, given two sets of samples y^(1) drawn i.i.d. from F^(1) and y^(2) drawn i.i.d. from F^(2), with F^(1), F^(2) unknown, we wish to evaluate the evidence for the null hypothesis H_0: F^(1) = F^(2) versus the alternative. Our method is based upon a nonparametric Polya tree prior centered either subjectively or using an empirical procedure. We show that the Polya tree prior leads to an analytic expression for the marginal likelihood under the two hypotheses and hence an explicit measure of the probability of the null, Pr(H_0 | y^(1), y^(2)).
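The analytic marginal likelihood makes the test easy to sketch in code. Below is a minimal version (defaults such as the depth, the map of the data to [0, 1], and a_j = c·j² are assumptions in the spirit of the paper, not its exact recipe):

```python
import numpy as np
from scipy.special import betaln

def node_counts(x, depth):
    # Counts of observations in each dyadic bin of [0, 1], for levels 1..depth.
    x = np.asarray(x)
    return [np.bincount(np.clip((x * 2**j).astype(int), 0, 2**j - 1),
                        minlength=2**j) for j in range(1, depth + 1)]

def log_marginal(counts, c=1.0):
    # log P(data | Polya tree): each split contributes
    # log B(a_j + n_left, a_j + n_right) - log B(a_j, a_j), with a_j = c*j**2.
    lm = 0.0
    for j, cnt in enumerate(counts, start=1):
        a = c * j**2
        lm += np.sum(betaln(a + cnt[0::2], a + cnt[1::2]) - betaln(a, a))
    return lm

def log_bf_null(x, y, depth=8, c=1.0):
    # log Bayes factor of H0 (one common tree for the pooled data) against
    # H1 (independent trees); data assumed pre-mapped to [0, 1].
    cx, cy = node_counts(x, depth), node_counts(y, depth)
    pooled = [a + b for a, b in zip(cx, cy)]
    return log_marginal(pooled, c) - log_marginal(cx, c) - log_marginal(cy, c)

rng = np.random.default_rng(4)
x, y = rng.uniform(size=200), rng.beta(2, 5, size=200)
print(log_bf_null(x, y))   # strongly negative: evidence for a difference
```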
Article
Full-text available
Trees of Polya urns are used to generate sequences of exchangeable random variables. By a theorem of de Finetti each such sequence is a mixture of independent, identically distributed variables and the mixing measure can be viewed as a prior on distribution functions. The collection of these Polya tree priors forms a convenient conjugate family which was mentioned by Ferguson and includes the Dirichlet processes of Ferguson. Unlike Dirichlet processes, Polya tree priors can assign probability 1 to the class of continuous distributions. This property and a few others are investigated.
Article
Full-text available
Expression of the type II voltage-dependent sodium channel gene is restricted to neurons by a silencer element active in nonneuronal cells. We have cloned cDNA coding for a transcription factor (REST) that binds to this silencer element. Expression of a recombinant REST protein confers the ability to silence type II reporter genes in neuronal cell types lacking the native REST protein, whereas expression of a dominant negative form of REST in nonneuronal cells relieves silencing mediated by the native protein. REST transcripts in developing mouse embryos are detected ubiquitously outside of the nervous system. We propose that expression of the type II sodium channel gene in neurons reflects a default pathway that is blocked in nonneuronal cells by the presence of REST.
Article
Full-text available
We consider dependent nonparametric models for related random probability distributions. For example, the random distributions might be indexed by a categorical covariate indicating the treatment levels in a clinical trial and might represent random effects distributions under the respective treatment combinations. We propose a model that describes dependence across random distributions in an analysis of variance (ANOVA)-type fashion. We define a probability model in such a way that marginally each random measure follows a Dirichlet process (DP) and use the dependent Dirichlet process to define the desired dependence across the related random measures. The resulting probability model can alternatively be described as a mixture of ANOVA models with a DP prior on the unknown mixing measure. The main features of the proposed approach are ease of interpretation and computational simplicity. Because the model follows the standard ANOVA structure, interpretation and inference parallel conventions for ANOVA models. This includes the notion of main effects, interactions, contrasts, and the like. Of course, the analogies are limited to structure and interpretation. The actual objects of the inference are random distributions instead of the unknown normal means in standard ANOVA models. Besides interpretation and model structure, another important feature of the proposed approach is ease of posterior simulation. Because the model can be rewritten as a DP mixture of ANOVA models, it inherits all computational advantages of standard DP mixture models. This includes availability of efficient Gibbs sampling schemes for posterior simulation and ease of implementation of even high-dimensional applications. Complexity of implementing posterior simulation is, at least conceptually, dimension independent.
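A sketch of the ANOVA-type dependence for a single factor (the parametrization is assumed for illustration): shared stick-breaking weights tie the treatment levels together, while each atom gets a level-specific offset.

```python
import numpy as np

rng = np.random.default_rng(5)

def anova_ddp_atoms(K, levels, M=1.0, tau_m=3.0, tau_a=1.0):
    # Truncated DP weights, common to all treatment levels.
    v = rng.beta(1.0, M, size=K)
    w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    # Atom locations: overall effect m_k plus a main effect alpha_{d,k}
    # for treatment level d, in direct analogy with a one-way ANOVA.
    m = rng.normal(0.0, tau_m, size=K)
    alpha = rng.normal(0.0, tau_a, size=(levels, K))
    return w, m[None, :] + alpha       # weights, (levels, K) atom locations

w, atoms = anova_ddp_atoms(K=20, levels=2)
# G_d = sum_k w[k] * delta(atoms[d, k]); identical weights across d make the
# random measures dependent, while the offsets encode treatment effects.
```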
Article
Full-text available
Bayesian nonparametric inference is a relatively young area of research and it has recently undergone a strong development. Most of its success can be explained by the considerable degree of flexibility it ensures in statistical modelling, if compared to parametric alternatives, and by the emergence of new and efficient simulation techniques that make nonparametric models amenable to concrete use in a number of applied statistical problems. Since its introduction in 1973 by T.S. Ferguson, the Dirichlet process has emerged as a cornerstone in Bayesian nonparametrics. Nonetheless, in some cases of interest for statistical applications the Dirichlet process is not an adequate prior choice and alternative nonparametric models need to be devised. In this paper we provide a review of Bayesian nonparametric models that go beyond the Dirichlet process.
Article
Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
Article
We introduce a marginal version of the nested Dirichlet process to cluster distributions or histograms. We apply the model to cluster genes by patterns of gene–gene interaction. The proposed approach is based on the nested partition that is implied in the original construction of the nested Dirichlet process. It allows simulation-exact inference, as opposed to a truncated Dirichlet process approximation. More importantly, the construction highlights the nature of the nested Dirichlet process as a nested partition of experimental units. We apply the proposed model to inference on clustering genes related to DNA mismatch repair (DMR) by the distribution of gene–gene interactions with other genes. Gene–gene interactions are recorded as coefficients in an auto-logistic model for the co-expression of two genes, adjusting for copy number variation, methylation and protein activation. These coefficients are extracted from an online database, called Zodiac, computed based on The Cancer Genome Atlas (TCGA) data. We compare results with a variation of k-means clustering that is set up to cluster distributions, a truncated NDP and a hierarchical clustering method. The proposed inference shows favorable performance, under simulated conditions and also in the real data sets.
Article
The prediction of future outcomes of a random phenomenon is typically based on a certain number of "analogous" observations from the past. When observations are generated by multiple samples, a natural notion of analogy is partial exchangeability and the problem of prediction can be effectively addressed in a Bayesian nonparametric setting. Instead of confining ourselves to the prediction of a single future experimental outcome, as in most treatments of the subject, we aim at predicting features of an unobserved additional sample of any size. We first provide a structural property of prediction rules induced by partially exchangeable arrays, without assuming any specific nonparametric prior. Then we focus on a general class of hierarchical random probability measures and devise a simulation algorithm to forecast the outcome of m future observations, for any m ≥ 1. The theoretical result and the algorithm are illustrated by means of a real dataset, which also highlights the "borrowing strength" behavior across samples induced by the hierarchical specification.
Article
We introduce a hierarchical generalization to the Polya tree that incorporates locally adaptive shrinkage to data features of different scales, while maintaining analytical simplicity and computational efficiency. Inference under the new model proceeds using general recipes for conjugate hierarchical models, and can be completed extremely efficiently for data sets with large numbers of observations. We illustrate in density estimation that the achieved adaptive shrinkage results in proper smoothing and substantially improves inference. We evaluate the performance of the model through simulation under several schematic scenarios carefully designed to be representative of a variety of applications. We compare its performance to that of the Polya tree, the optional Polya tree, and the Dirichlet process mixture. We then apply our method to a flow cytometry data set with 455,472 observations to achieve fast estimation of a large number of univariate and multivariate densities, and investigate the computational properties of our method in that context. In addition, we establish theoretical guarantees for the model including absolute continuity, full nonparametricity, and posterior consistency. All proofs are given in the Supplementary Material (Ma, 2016).
Article
In a previous paper, the author and van Eeden considered, as prior distributions for the cumulative distribution function F of the bio-assay problem, processes whose sample functions are, with probability one, distribution functions. The example we considered there had the undesirable property that its mean, E(F), was singular with respect to Lebesgue measure. In fact, Dubins and Freedman have shown that a class of such processes, which includes the example we considered, has sample functions F which are, with probability one, singular.
Article
We propose a multi-resolution scanning approach to identifying two-sample differences. Windows of multiple scales are constructed through nested dyadic partitioning on the sample space and a hypothesis regarding the two-sample difference is defined on each window. Instead of testing the hypotheses on different windows independently, we adopt a joint graphical model, namely a Markov tree, on the null or alternative states of these hypotheses to incorporate spatial correlation across windows. The induced dependence allows borrowing strength across nearby and nested windows, which we show is critical for detecting high-resolution local differences. We evaluate the performance of the method through simulation and show that it substantially outperforms other state-of-the-art two-sample tests when the two-sample difference is local, involving only a small subset of the data. We then apply it to a flow cytometry data set from immunology, in which it successfully identifies highly local differences. In addition, we show how to control properly for multiple testing in a decision-theoretic approach as well as how to summarize and report the inferred two-sample difference. We also construct hierarchical extensions of the framework to incorporate adaptivity into the construction of the scanning windows to improve inference further.
Article
In this article, we propose a Bayesian nonparametric model for the analysis of multiple time series. We consider an autoregressive structure of order p for each of the series and borrow strength across the series by considering a common error population that is also evolving in time. The error populations (distributions) are assumed nonparametric, with law based on a series of dependent Polya trees with zero median. This dependence is of order q and is achieved via a dependent beta process that links the branching probabilities of the trees. We study the prior properties and show how to obtain posterior inference. The model is tested in a simulation study and is illustrated with the analysis of the economic activity index of the 32 states of Mexico.
Article
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Article
Nonparametric and nonlinear measures of statistical dependence between pairs of random variables have proved themselves important tools in modern data analysis, where the emergence of large data sets can support the relaxation of linearity assumptions implicit in traditional association scores such as correlation. Recent proposals based around estimating information theoretic measures such as Mutual Information (MI) have been particularly popular. Here we describe a Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the probability of dependence, using Polya tree priors on the space of probability measures. Our procedure can accommodate known uncertainty in the form of the underlying sampling distribution and provides an explicit posterior probability measure of both dependence and independence. Well known advantages of having an explicit probability measure include the easy comparison of evidence across different studies, the inclusion of prior information, and the integration of results within decision analysis.
Article
We discuss functional clustering procedures for nested designs, where multiple curves are collected for each subject in the study. We start by considering the application of standard functional clustering tools to this problem, which leads to groupings based on the average profile for each subject. After discussing some of the shortcomings of this approach, we present a mixture model based on a generalization of the nested Dirichlet process that clusters subjects based on the distribution of their curves. By using mixtures of generalized Dirichlet processes, the model induces a much more flexible prior on the partition structure than other popular model-based clustering methods, allowing for different rates of introduction of new clusters as the number of observations increases. The methods are illustrated using hormone profiles from multiple menstrual cycles collected for women in the Early Pregnancy Study.
Article
Polya tree priors are random probability distributions that are easily centered at standard parametric families, such as the normal. As such, they provide a convenient avenue toward creating a parametric/nonparametric test statistic "blend" for the classic problem of testing whether data distributions are the same across several subpopulations. Test statistics that are (empirical) Bayes factors constructed from independent Polya tree priors are proposed. The Polya tree centering distributions are Gaussian with parameters estimated from the data, and the p-values are obtained through permutation of the group membership indicators. Generalizations to censored and multivariate data are provided. The conceptually simple test statistic fares surprisingly well against competitors in simulations.
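The permutation step generalizes to any two-sample statistic, Bayes factors included; a self-contained sketch (with a simple stand-in statistic for the demo):

```python
import numpy as np

def permutation_pvalue(x, y, statistic, n_perm=999, seed=0):
    # Shuffle group labels, recompute the statistic, and report the fraction
    # of arrangements at least as extreme as the observed one (counting the
    # observed arrangement itself, the usual +1 correction).
    rng = np.random.default_rng(seed)
    pooled, n = np.concatenate([x, y]), len(x)
    obs, hits = statistic(x, y), 1
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        hits += statistic(perm[:n], perm[n:]) >= obs
    return hits / (n_perm + 1)

rng = np.random.default_rng(1)
x, y = rng.normal(0.0, 1.0, 100), rng.normal(0.5, 1.0, 100)
stat = lambda a, b: abs(a.mean() - b.mean())  # a Polya tree Bayes factor slots in here
print(permutation_pvalue(x, y, stat))
```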
Article
The Dirichlet process mixture model and more general mixtures based on discrete random probability measures have been shown to be flexible and accurate models for density estimation and clustering. The goal of this paper is to illustrate the use of normalized random measures as mixing measures in nonparametric hierarchical mixture models and point out how possible computational issues can be successfully addressed. To this end, we first provide a concise and accessible introduction to normalized random measures with independent increments. Then, we explain in detail a particular way of sampling from the posterior using the Ferguson-Klass representation. We develop a thorough comparative analysis for location-scale mixtures that considers a set of alternatives for the mixture kernel and for the nonparametric component. Simulation results indicate that normalized random measure mixtures potentially represent a valid default choice for density estimation problems. As a byproduct of this study an R package to fit these models was produced and is available in the Comprehensive R Archive Network (CRAN).
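The Ferguson-Klass representation mentioned here can be sampled directly for the gamma process; a compact sketch (the truncation level is an arbitrary choice):

```python
import numpy as np
from scipy.special import exp1      # exponential integral E1
from scipy.optimize import brentq

def ferguson_klass_gamma(a=1.0, n_jumps=25, seed=0):
    # The gamma-process Levy intensity a * s**-1 * exp(-s) has tail mass
    # N(x) = a * E1(x); the i-th largest jump solves N(J_i) = Gamma_i,
    # where Gamma_1 < Gamma_2 < ... are standard Poisson arrival times.
    rng = np.random.default_rng(seed)
    arrivals = np.cumsum(rng.exponential(size=n_jumps))
    return np.array([brentq(lambda s: a * exp1(s) - g, 1e-300, 60.0)
                     for g in arrivals])          # decreasing jump sizes

jumps = ferguson_klass_gamma()
weights = jumps / jumps.sum()  # normalizing a gamma CRM yields a Dirichlet process
print(np.round(weights[:5], 3))
```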
Article
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion.
Article
The comparison of two treatments generally falls into one of the following two categories: (a) we may have a number of replications for each of the two treatments, which are unpaired, or (b) we may have a number of paired comparisons leading to a series of differences, some of which may be positive and some negative. The appropriate methods for testing the significance of the differences of the means in these two cases are described in most of the textbooks on statistical methods.
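In modern terms, the two cases map onto the unpaired and paired t-tests; for instance, with SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=12)   # replicates under treatment A
b = rng.normal(11.0, 2.0, size=12)   # replicates under treatment B

print(stats.ttest_ind(a, b))  # case (a): unpaired replications
print(stats.ttest_rel(a, b))  # case (b): paired comparisons (tests the differences)
```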
Article
Wavelet-based statistical signal processing techniques such as denoising and detection typically model the wavelet coefficients as independent or jointly Gaussian. These models are unrealistic for many real-world signals. In this paper, we develop a new framework for statistical signal processing based on wavelet-domain hidden Markov models (HMMs) that concisely models the statistical dependencies and non-Gaussian statistics encountered in real-world signals. Wavelet-domain HMMs are designed with the intrinsic properties of the wavelet transform in mind and provide powerful, yet tractable, probabilistic signal models. Efficient expectation-maximization algorithms are developed for fitting the HMMs to observational signal data. The new framework is suitable for a wide range of applications, including signal estimation, detection, classification, prediction, and even synthesis. To demonstrate the utility of wavelet-domain HMMs, we develop novel algorithms for signal denoising and classification.
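A generative sketch of the hidden-Markov-tree idea (the state space, transition matrix, and variances below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)

def hidden_markov_tree(depth, trans, sigmas):
    # Each wavelet coefficient has a hidden binary state (small/large variance)
    # that depends on its parent's state; the coefficient itself is zero-mean
    # Gaussian with state-dependent scale. Persistence of 'large' states across
    # scales reproduces the clustering of big coefficients seen in real signals.
    states = [rng.integers(0, 2, size=1)]
    for _ in range(depth - 1):
        parents = np.repeat(states[-1], 2)            # two children per node
        states.append(np.array([rng.choice(2, p=trans[p]) for p in parents]))
    coeffs = [rng.normal(0.0, sigmas[s]) for s in states]
    return states, coeffs

trans = np.array([[0.9, 0.1],    # small parent -> mostly small children
                  [0.3, 0.7]])   # large parent -> mostly large children
states, coeffs = hidden_markov_tree(depth=5, trans=trans, sigmas=np.array([0.2, 2.0]))
print([np.round(c, 2) for c in coeffs])
```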
Article
This paper presents a Bayesian non-parametric approach to survival analysis based on arbitrarily right-censored data. The analysis is based on posterior predictive probabilities using a Polya tree prior distribution on the space of probability measures on [0, ∞). In particular we show that the estimate generalizes the classical Kaplan–Meier non-parametric estimator, which is obtained in the limiting case as the weight of prior information tends to zero.
Article
In multicenter studies, subjects in different centers may have different outcome distributions. This article is motivated by the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to be clustered. Starting with a stick-breaking representation of the Dirichlet process (DP), we replace the random atoms with random probability measures drawn from a DP. This results in a nested Dirichlet process (nDP) prior, which can be placed on the collection of distributions for the different centers, with centers drawn from the same DP component automatically clustered together. Theoretical properties are discussed, and an efficient MCMC algorithm is developed for computation. The methods are illustrated using a simulation study and an application to quality of care in US hospitals.
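A truncated stick-breaking sketch of the nDP (the truncation levels and base-measure choices here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

def stick(v):
    # Stick-breaking weights w_k = v_k * prod_{l<k}(1 - v_l).
    return v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])

def nested_dp(n_centers, alpha=1.0, beta=1.0, K=20, L=20):
    # Outer DP: weights over K candidate distributions. Each candidate is an
    # inner DP draw (its own weights and atoms). Centers assigned to the same
    # outer atom receive the *identical* distribution, i.e. they are clustered.
    outer_w = stick(rng.beta(1.0, alpha, size=K))
    inner = [(stick(rng.beta(1.0, beta, size=L)), rng.normal(0.0, 3.0, size=L))
             for _ in range(K)]
    labels = rng.choice(K, size=n_centers, p=outer_w / outer_w.sum())  # renormalized truncation
    return [inner[k] for k in labels], labels

dists, labels = nested_dp(n_centers=5)
print(labels)   # equal labels = centers sharing the same outcome distribution
```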
Conference Paper
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
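For context, the smoother recovered as an approximation to the hierarchical Pitman-Yor posterior is interpolated Kneser-Ney; with discount d, counts c(·), N_{1+}(u·) the number of distinct word types following context u, and π(u) the context with its earliest word dropped:

```latex
P(w \mid u) \;=\; \frac{\max\{c(uw) - d,\, 0\}}{c(u\,\cdot)}
  \;+\; \frac{d\, N_{1+}(u\,\cdot)}{c(u\,\cdot)}\; P\big(w \mid \pi(u)\big).
```

The discount d plays the role of the Pitman-Yor discount parameter, which is what produces the power-law behavior the abstract refers to.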
Article
Testing and characterizing the difference between two data samples is of fundamental interest in statistics. Existing methods such as the Kolmogorov-Smirnov and Cramer-von Mises tests do not scale well as the dimensionality increases and provide no easy way to characterize the difference should it exist. In this work, we propose a theoretical framework for inference that addresses these challenges in the form of a prior for Bayesian nonparametric analysis. The new prior is constructed based on a random-partition-and-assignment procedure similar to the one that defines the standard optional Pólya tree distribution, but has the ability to generate multiple random distributions jointly. These random probability distributions are allowed to "couple", that is, to have the same conditional distribution, on subsets of the sample space. We show that this "coupling optional Pólya tree" prior provides a convenient and effective way for both the testing of two-sample differences and the learning of the underlying structure of the difference. In addition, we discuss some practical issues in the computational implementation of this prior and provide several numerical examples to demonstrate how it works.
Article
We introduce an extension of the Pólya tree approach for constructing distributions on the space of probability measures. By using optional stopping and optional choice of splitting variables, the construction gives rise to random measures that are absolutely continuous with piecewise smooth densities on partitions that can adapt to fit the data. The resulting "optional Pólya tree" distribution has large support in total variation topology and yields posterior distributions that are also optional Pólya trees with computable parameter values.
Article
With the proliferation of spatially oriented time-to-event data, spatial modeling in the survival context has received increased recent attention. A traditional way to capture a spatial pattern is to introduce frailty terms in the linear predictor of a semiparametric model, such as proportional hazards or accelerated failure time. We propose a new methodology to capture the spatial pattern by assuming a prior based on a mixture of spatially dependent Polya trees for the baseline survival in the proportional hazards model. Thanks to modern Markov chain Monte Carlo (MCMC) methods, this approach remains computationally feasible in a fully hierarchical Bayesian framework. We compare the spatially dependent mixture of Polya trees (MPT) approach to the traditional spatial frailty approach, and illustrate the usefulness of this method with an analysis of Iowan breast cancer survival data from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute. Our method provides better goodness of fit over the traditional alternatives as measured by log pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and full sample score (FSS) statistics.
Article
Identification of active gene regulatory elements is a key to understanding transcriptional control governing biological processes such as cell-type specificity, differentiation, development, proliferation, and response to the environment. Mapping DNase I hypersensitive (HS) sites has historically been a valuable tool for identifying all different types of regulatory elements, including promoters, enhancers, silencers, insulators, and locus control regions. This method utilizes DNase I to selectively digest nucleosome-depleted DNA (presumably depleted by transcription factors), whereas DNA regions tightly wrapped in nucleosomes and higher-order structures are more resistant. The traditional low-throughput method for identifying DNase I HS sites uses Southern blots. Here, we describe the complete and improved protocol for DNase-seq, a high-throughput method that identifies DNase I HS sites across the whole genome by capturing DNase-digested fragments and sequencing them by high-throughput, next-generation sequencing. In a single experiment, DNase-seq can identify most active regulatory regions from potentially any cell type, from any species with a sequenced genome.
Article
The Cramér-von Mises ω² criterion for testing that a sample x_1, …, x_N has been drawn from a specified continuous distribution F(x) is

ω² = ∫_{-∞}^{∞} [F_N(x) − F(x)]² dF(x),   (1)

where F_N(x) is the empirical distribution function of the sample; that is, F_N(x) = k/N if exactly k observations are less than or equal to x (k = 0, 1, …, N). If there is a second sample y_1, …, y_M, a test of the hypothesis that the two samples come from the same (unspecified) continuous distribution can be based on the analogue of Nω², namely

T = [NM/(N + M)] ∫_{-∞}^{∞} [F_N(x) − G_M(x)]² dH_{N+M}(x),   (2)

where G_M(x) is the empirical distribution function of the second sample and H_{N+M}(x) is the empirical distribution function of the two samples together [that is, (N + M)H_{N+M}(x) = NF_N(x) + MG_M(x)]. The limiting distribution of Nω² as N → ∞ has been tabulated [2], and it has been shown ([3], [4a], and [7]) that T has the same limiting distribution as N → ∞, M → ∞, and N/M → λ, where λ is any finite positive constant. In this note we consider the distribution of T for small values of N and M and present tables to permit use of the criterion at some conventional significance levels for small values of N and M. The limiting distribution seems a surprisingly good approximation to the exact distribution for moderate sample sizes (corresponding to the same feature for Nω² [6]). The accuracy of approximation is better than in the case of the two-sample Kolmogorov-Smirnov statistic studied by Hodges [4].
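Formula (2) is straightforward to evaluate exactly, since H_{N+M} places mass 1/(N+M) on each pooled observation; a small sketch:

```python
import numpy as np

def cvm_two_sample(x, y):
    # T = [NM/(N+M)] * integral of (F_N - G_M)^2 dH_{N+M}, computed exactly
    # by summing over the pooled sample points (each carries mass 1/(N+M)).
    x, y = np.sort(x), np.sort(y)
    n, m = len(x), len(y)
    pooled = np.concatenate([x, y])
    Fn = np.searchsorted(x, pooled, side="right") / n   # F_N at pooled points
    Gm = np.searchsorted(y, pooled, side="right") / m   # G_M at pooled points
    return n * m / (n + m) ** 2 * np.sum((Fn - Gm) ** 2)

rng = np.random.default_rng(2)
print(cvm_two_sample(rng.normal(size=40), rng.normal(0.7, 1.0, size=40)))
```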
Article
Doob (1949) obtained a very general result on the consistency of Bayes' estimates. Loosely, if any consistent estimates are available, then the Bayes' estimates are consistent for almost all values of the parameter under the prior measure. If the parameter is thought of as being selected by nature through a random mechanism whose probability law is known, Doob's result is completely satisfactory. On the other hand, in some circumstances it is necessary to identify the exceptional null set. For example, if the parameter is thought of as fixed but unknown, and the prior measure is chosen as a convenient way to calculate estimates, it is important to know for which null set the method fails. In particular, it is desirable to choose the prior so that the null set is in fact empty. The problem is very delicate; considerable work [8], [9], [12] has been done on it recently, in quite general contexts and under severe regularity assumptions. It might therefore be of interest to discuss the simplest possible case, that of independent, identically distributed, discrete observations, in some detail. This will be done in Sections 3 and 4 when the observations take a finite set of possible values. Under this assumption, Section 3 shows that the posterior probability converges to point mass at the true parameter value for almost all sample sequences (for short, the posterior is consistent; see Definition 1) exactly for parameter values in the topological carrier of the prior. In Section 4, the asymptotic normality of the posterior is shown to follow from a local smoothness assumption about the prior. In both sections, results are obtained for priors which admit the possibility of an infinite number of states. The results of these sections are not entirely new; see pp. 333 ff. of [7], pp. 224 ff. of [10], [11]. They have not appeared in the literature, to the best of our knowledge, in a form as precise as Theorems 1, 3, 4. Theorem 2 is essentially the relevant special case of Theorem 7.4 of Schwartz (1961). In Sections 5 and 6, the case of a countable number of possible values is treated. We believe the results to be new. Here the general problem appears, because priors which assign positive mass near the true parameter value may lead to ridiculous estimates. The results of Section 3 (let alone 4) are false. In fact, Theorem 5 of Section 5 gives the following construction. Suppose that under the true parameter value the observations take an infinite number of values with positive probability. Then given any spurious (sub-)stochastic probability distribution, it is possible to find a prior assigning positive mass to any neighborhood of the true parameter value, but leading to a posterior probability which converges for almost all sample sequences to point mass at the spurious distribution. Indeed, there is a prior assigning positive mass to every open set of parameters, for which the posterior is consistent only at a set of parameters of the first category. To some extent, this happens because at any stage information about only a finite number of states is available, but on the basis of this evidence, conclusions must be drawn about all states. If the prior measure has a serious prejudice about the shape of the tails, disaster ensues. In Section 6, it is shown that a simple condition on the prior measure (which serves to limit this prejudice) ensures the consistency of the posterior.
Prior probabilities leading to posterior distributions consistent at all and asymptotically normal at essentially all (see Remark 3, Section 3) parameter values are constructed. Section 5 is independent of Sections 3 and 4; Section 6 is not. Section 6 overlaps to some extent with unpublished work of Kiefer and Wolfowitz; it has been extended in certain directions by Fabius (1963). The results of this paper were announced in [5]; some related work for continuous state space is described in [3]. It is a pleasure to thank two very helpful referees: whatever expository merit Section 5 has is due to them and to L. J. Savage.
Article
Polya tree distributions are defined. They are generalizations of Dirichlet processes that allow for the possibility of putting positive mass on the set of continuous distributions. Predictive and posterior distributions are explained. A canonical construction of a Polya tree is given so that the Polya tree has any desired predictive distribution. Choices of the Polya tree parameters are discussed. Mixtures of Polya trees are defined and examples are given.
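As a concrete illustration of the construction the abstract refers to, the following is a minimal sketch (the names and the truncation depth are ours) of drawing one random density from a finite-depth Polya tree on $[0, 1)$, using the common canonical choice of Beta parameters growing as $c j^2$ at level $j$, which is a standard way of putting mass on continuous distributions:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_polya_tree_density(depth=8, c=1.0):
        """Draw one random density from a finite-depth Polya tree on [0, 1).

        At level j, each interval's mass is split between its two dyadic
        children by V ~ Beta(c*j**2, c*j**2); leaf masses are products of
        the splits along the path from the root.
        """
        probs = np.array([1.0])                    # mass of each interval
        for j in range(1, depth + 1):
            a = c * j ** 2
            v = rng.beta(a, a, size=probs.size)    # share sent to left child
            probs = np.column_stack([probs * v, probs * (1 - v)]).ravel()
        return probs * 2 ** depth                  # mass / width = density

    heights = sample_polya_tree_density()          # 2**8 leaf heights

Each call returns the heights of a piecewise-constant density on the $2^{depth}$ dyadic leaf intervals; repeated calls illustrate the variability induced by the prior.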
Article
Methods of generating prior distributions on spaces of probability measures for use in Bayesian nonparametric inference are reviewed with special emphasis on the Dirichlet processes, the tailfree processes, and processes neutral to the right. Some applications are given.
Article
The Bayesian approach to statistical problems, though fruitful in many ways, has been rather unsuccessful in treating nonparametric problems. This is due primarily to the difficulty in finding workable prior distributions on the parameter space, which in nonparametric problems is taken to be a set of probability distributions on a given sample space. There are two desirable properties of a prior distribution for nonparametric problems. (I) The support of the prior distribution should be large--with respect to some suitable topology on the space of probability distributions on the sample space. (II) Posterior distributions given a sample of observations from the true probability distribution should be manageable analytically. These properties are antagonistic in the sense that one may be obtained at the expense of the other. This paper presents a class of prior distributions, called Dirichlet process priors, broad in the sense of (I), for which (II) is realized, and for which treatment of many nonparametric statistical problems may be carried out, yielding results that are comparable to the classical theory. In Section 2, we review the properties of the Dirichlet distribution needed for the description of the Dirichlet process given in Section 3. Briefly, this process may be described as follows. Let $\mathscr{X}$ be a space and $\mathscr{A}$ a $\sigma$-field of subsets, and let $\alpha$ be a finite non-null measure on $(\mathscr{X}, \mathscr{A})$. Then a stochastic process $P$ indexed by elements $A$ of $\mathscr{A}$ is said to be a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$ if for any measurable partition $(A_1, \cdots, A_k)$ of $\mathscr{X}$, the random vector $(P(A_1), \cdots, P(A_k))$ has a Dirichlet distribution with parameter $(\alpha(A_1), \cdots, \alpha(A_k))$. $P$ may be considered a random probability measure on $(\mathscr{X}, \mathscr{A})$. The main theorem states that if $P$ is a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$, and if $X_1, \cdots, X_n$ is a sample from $P$, then the posterior distribution of $P$ given $X_1, \cdots, X_n$ is also a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha + \sum^n_1 \delta_{x_i}$, where $\delta_x$ denotes the measure giving mass one to the point $x$. In Section 4, an alternative definition of the Dirichlet process is given. This definition exhibits a version of the Dirichlet process that gives probability one to the set of discrete probability measures on $(\mathscr{X}, \mathscr{A})$. This is in contrast to Dubins and Freedman [2], whose methods for choosing a distribution function on the interval [0, 1] lead with probability one to singular continuous distributions. Methods of choosing a distribution function on [0, 1] that with probability one is absolutely continuous have been described by Kraft [7]. The general method of choosing a distribution function on [0, 1], described in Section 2 of Kraft and van Eeden [10], can of course be used to define the Dirichlet process on [0, 1]. Special mention must be made of the papers of Freedman and Fabius. Freedman [5] defines a notion of tailfree for a distribution on the set of all probability measures on a countable space $\mathscr{X}$. For a tailfree prior, the posterior distribution given a sample from the true probability measure may be fairly easily computed.
Fabius [3] extends the notion of tailfree to the case where $\mathscr{X}$ is the unit interval [0, 1], but it is clear his extension may be made to cover quite general $\mathscr{X}$. With such an extension, the Dirichlet process would be a special case of a tailfree distribution for which the posterior distribution has a particularly simple form. There are disadvantages to the fact that $P$ chosen by a Dirichlet process is discrete with probability one. These appear mainly because in sampling from a $P$ chosen by a Dirichlet process, we expect eventually to see one observation exactly equal to another. For example, consider the goodness-of-fit problem of testing the hypothesis $H_0$ that a distribution on the interval [0, 1] is uniform. If on the alternative hypothesis we place a Dirichlet process prior with parameter $\alpha$ itself a uniform measure on [0, 1], and if we are given a sample of size $n \geqq 2$, the only nontrivial nonrandomized Bayes rule is to reject $H_0$ if and only if two or more of the observations are exactly equal. This is really a test of the hypothesis that a distribution is continuous against the hypothesis that it is discrete. Thus, there is still a need for a prior that chooses a continuous distribution with probability one and yet satisfies properties (I) and (II). Some applications in which the possible doubling up of the values of the observations plays no essential role are presented in Section 5. These include the estimation of a distribution function, of a mean, of quantiles, of a variance and of a covariance. A two-sample problem is considered in which the Mann-Whitney statistic, equivalent to the rank-sum statistic, appears naturally. A decision theoretic upper tolerance limit for a quantile is also treated. Finally, a hypothesis testing problem concerning a quantile is shown to yield the sign test. In each of these problems, useful ways of combining prior information with the statistical observations appear. Other applications exist. In his Ph.D. dissertation [1], Charles Antoniak finds a need to consider mixtures of Dirichlet processes. He treats several problems, including the estimation of a mixing distribution, bio-assay, empirical Bayes problems, and discrimination problems.
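The conjugate update in the main theorem translates directly into simulation: given the data, a new observation comes from the normalized measure $\alpha + \sum^n_1 \delta_{x_i}$. A minimal sketch (the names are ours), assuming a standard normal base measure with total mass alpha_mass:

    import numpy as np

    rng = np.random.default_rng(1)

    def dp_posterior_predictive(data, alpha_mass=1.0, size=1000):
        """Independent draws from the DP posterior predictive.

        Given x_1..x_n, the posterior is a Dirichlet process with parameter
        alpha + sum_i delta_{x_i}; a new observation therefore comes from
        the base measure (here standard normal, an assumption of this
        sketch) with probability alpha_mass/(alpha_mass + n), and otherwise
        repeats one of the observed values uniformly at random. Each draw
        uses the one-step-ahead predictive; a full Polya-urn path would
        append to `data` after every draw.
        """
        data = np.asarray(data, float)
        n = data.size
        fresh = rng.random(size) < alpha_mass / (alpha_mass + n)
        return np.where(fresh, rng.standard_normal(size), rng.choice(data, size))

    draws = dp_posterior_predictive([0.1, 0.2, 2.3], alpha_mass=0.5)

The repeated atoms in such draws are exactly the "doubling up" of observations that the abstract identifies as the drawback of the discrete Dirichlet process.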
Article
We consider the problem of determining the distribution of means of random probability measures which are obtained by normalizing increasing additive processes. A solution is found by resorting to a well-known inversion formula for characteristic functions due to Gurland. Moreover, expressions of the posterior distributions of those means, in the presence of exchangeable observations, are given. Finally, a section is devoted to the illustration of two examples of statistical relevance.
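The inversion formula referred to is usually stated as follows (our transcription in generic notation, not the paper's): if $\varphi$ is the characteristic function of a distribution function $F$, then \begin{equation*}\tfrac{1}{2}\lbrack F(x) + F(x^-)\rbrack = \frac{1}{2} - \frac{1}{\pi} \lim_{T \rightarrow \infty} \int^T_0 t^{-1} \operatorname{Im}\lbrack e^{-itx} \varphi(t)\rbrack\, dt,\end{equation*} so the distribution of a random mean $\int x\, dP(x)$ can be recovered once its characteristic function is available in computable form.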
Article
In recent years, Bayesian nonparametric inference, both theoretical and computational, has witnessed considerable advances. However, these advances have not received a full critical and comparative analysis of their scope, impact and limitations in statistical modelling; many aspects of the theory and methods remain a mystery to practitioners and many questions are still open. In this paper, we discuss and illustrate the rich modelling and analytic possibilities that are available to the statistician within the Bayesian nonparametric and/or semiparametric framework.
Article
We consider the problem of combining inference in related nonparametric Bayes models. Analogous to parametric hierarchical models, the hierarchical extension formalizes borrowing strength across the related submodels. In the nonparametric context, modelling is complicated by the fact that the random quantities over which we define the hierarchy are infinite dimensional. We discuss a formal definition of such a hierarchical model. The approach includes a regression at the level of the nonparametric model. For the special case of Dirichlet process mixtures, we develop a Markov chain Monte Carlo scheme to allow efficient implementation of full posterior inference in the given model.
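One common construction of this kind decomposes each submodel's random measure into a shared component and an idiosyncratic one (a sketch in generic notation; the paper's exact specification may differ): \begin{equation*}F_j = \epsilon F_0 + (1 - \epsilon) F_j^\ast, \qquad F_0 \sim \mathrm{DP}(M, G_0), \qquad F_j^\ast \overset{\mathrm{iid}}{\sim} \mathrm{DP}(M, G_0), \qquad j = 1, \ldots, J,\end{equation*} so that the common weight $\epsilon$ controls how strongly the $J$ related submodels borrow strength from one another.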
Article
Bayesian nonparametric inference is a relatively young area of research and it has recently undergone a strong development. Most of its success can be explained by the considerable degree of flexibility it ensures in statistical modelling, if compared to parametric alternatives, and by the emergence of new and efficient simulation techniques that make nonparametric models amenable to concrete use in a number of applied statistical problems. Since its introduction in 1973 by T.S. Ferguson, the Dirichlet process has emerged as a cornerstone in Bayesian nonparametrics. Nonetheless, in some cases of interest for statistical applications the Dirichlet process is not an adequate prior choice and alternative nonparametric models need to be devised. In this paper we provide a review of Bayesian nonparametric models that go beyond the Dirichlet process.
Article
Testing the fit of data to a parametric model can be done by embedding the parametric model in a nonparametric alternative and computing the Bayes factor of the parametric model to the nonparametric alternative. Doing so by specifying the nonparametric alternative via a Polya tree process is particularly attractive, from both theoretical and methodological perspectives. Among the benefits is a degree of computational simplicity that even allows for robustness analyses to be implemented. Default (nonsubjective) versions of this analysis are developed herein, in the sense that recommended choices are provided for the (many) features of the Polya tree process that need to be specified. Considerable discussion of these features is also provided to assist those who might be interested in subjective choices. A variety of examples involving location-scale models are studied. Finally, it is shown that the resulting procedure can be viewed as a conditional frequentist test, resulting in data-dependent reported error probabilities that have a real frequentist interpretation (as opposed to p values) in even small sample situations.
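In schematic form (generic notation, not the paper's), the test compares the marginal likelihood of the parametric family $\{f_\theta\}$ to that of a Polya tree alternative centered at it: \begin{equation*}B_{01} = \frac{\int \prod^n_{i=1} f_\theta(x_i)\, \pi(\theta)\, d\theta}{\int\int \prod^n_{i=1} f(x_i)\, d\Pi_\theta(f)\, \pi(\theta)\, d\theta},\end{equation*} where $\Pi_\theta$ denotes a Polya tree prior centered at $f_\theta$; small values of $B_{01}$ indicate lack of fit of the parametric model.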