
Bruce George Lindsay
- Ph.D. University of Washington
- Chair at Pennsylvania State University
About
Publications: 117
Reads: 18,835
Citations: 8,923
Introduction
Current institution
Additional affiliations
August 1996 - June 1997
January 1991 - June 1991
September 1979 - present
Publications (117)
Statistical distances, divergences, and similar quantities have a long history and play a fundamental role in statistics, machine learning and associated scientific disciplines. However, within the statistical literature, this extensive role has too often been played out behind the scenes, with other aspects of the statistical problems being viewe...
Beyond the expectation–maximization (EM) algorithm for vector parameters, the EM for an unknown distribution function is often used in mixture models, density estimation, and signal recovery problems. We prove the convergence of the EM in functional spaces and show the EM likelihoods in this space converge to the global maximum.
This paper addresses the problem of variance estimation for a general U-statistic. U-statistics form a class of unbiased estimators for those parameters of interest that can be written as E{φ(X_1, ..., X_k)}, where φ is a symmetric kernel function with k arguments. Although estimating the variance of a U-statistic is clearly of interest, asympto...
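As a minimal sketch of the definition quoted above, the following averages a symmetric kernel over all size-k subsets of a sample. It computes the U-statistic itself, not the variance estimator studied in the paper; the function names `u_statistic` and `variance_kernel` and the toy data are illustrative assumptions.

```python
from itertools import combinations


def u_statistic(sample, kernel, k):
    """Average a symmetric kernel over all size-k subsets of the sample."""
    total = 0.0
    count = 0
    for subset in combinations(sample, k):
        total += kernel(*subset)
        count += 1
    if count == 0:
        raise ValueError("sample must contain at least k observations")
    return total / count


def variance_kernel(x, y):
    """Kernel of order 2 whose expectation is the population variance."""
    return 0.5 * (x - y) ** 2


# Hypothetical data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
u_var = u_statistic(data, variance_kernel, k=2)
```

With this kernel the U-statistic coincides with the usual unbiased sample variance, which gives a quick sanity check on the implementation.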
This paper is concerned with hierarchical clustering of long binary sequence data. We propose two alternative improvements of the EM algorithm used in Chen and Lindsay (2006). One is the FixEM. It is just the regular EM but we no longer update the weights π_s used in the ancestral mixture models. The other is the ModalEM. In this we cluster data a...
In this article, we study the power properties of quadratic-distance-based goodness-of-fit tests. First, we introduce the concept of a root kernel and discuss the considerations that enter the selection of this kernel. We derive an easy to use normal approximation to the power of quadratic distance goodness-of-fit tests and base the construction of...
The nucleosome is the fundamental packing unit of DNA in eukaryotic cells, and its positioning plays a critical role in regulation of gene expression and chromosome functions. Using a recently developed chemical mapping method, nucleosomes can be potentially mapped with an unprecedented single-base-pair resolution. Existence of overlapping nucleosomes...
This chapter covers the major milestones of the early years of my career in statistics. It is really the story of the transitions I made, from early uncertainty about my choice of career and my level of talent, up to crossing the tenure line and realizing that I had not only been deemed a success, I had a passion for the subject. The focus will be...
The consistency of the maximum likelihood estimator (MLE) has been well studied in many papers such as Wald (1949), Kiefer and Wolfowitz (1956) and many more subsequent works. The purpose of this short note is to provide a new direction to understand the consistency of the MLE in discrete models. In addition, our work gives a very general and direc...
In some models, both parametric and not, maximum likelihood estimation fails to be consistent. We investigate why the maximum likelihood method breaks down with some examples and notice the paradox that, in those same models, maximum likelihood estimation would have been consistent if the data had been measured with error. With this motivation we d...
Hui & Lindsay (2010) proposed a new dimension reduction method for multivariate data. It was based on the so-called white noise matrices derived from the Fisher information matrix. Their theory and empirical studies demonstrated that this method can detect interesting features from high-dimensional data even with a moderate sample size. The theoret...
A local modal estimation procedure is proposed for the regression function in a non-parametric regression model. A distinguishing characteristic of the proposed procedure is that it introduces an additional tuning parameter that is automatically selected using the observed data in order to achieve both robustness and efficiency of the resulting est...
Kim & Lindsay (2011a) proposed a new sampling-based visualization methodology, modal simulation, designed to describe the boundaries of the confidence regions generated by an inference function such as the likelihood. Once the sample points on the boundaries of the targeted confidence sets are created in a single simulation run, one can use those s...
MixtureTree v1.0 is a Linux based program (written in C++) which implements an algorithm based on mixture models for reconstructing phylogeny from binary sequence data, such as single-nucleotide polymorphisms (SNPs). In addition to the mixture algorithm with three different optimization options, the program also implements a bootstrap procedure wit...
A new method for building a gene tree from Single Nucleotide Polymorphism (SNP) data was developed by Chen and Lindsay (Biometrika 93(4):843–860, 2006). Called the mixture tree, it was based on an ancestral mixture model. The sieve parameter in the model plays the role of time in the evolutionary tree of the sequences. By varying the sieve paramete...
We consider an improved density estimator which arises from treating the kernel density estimator as an element of the model that consists of all mixtures of the kernel, continuous or discrete. One can obtain the kernel density estimator with "likelihood-tuning" by using the uniform density as the starting value in an EM algorithm. The second tuni...
The composite likelihood method has been proposed and systematically discussed by Besag (1974), Lindsay (1988), and Cox and Reid (2004). This method has received increasing interest in both theoretical and applied aspects. Compared to the traditional likelihood method, the composite likelihood method may be less statistically efficient, but it can...
Projection pursuit is a technique for locating projections from high- to low-dimensional space that reveal interesting non-linear features of a data set, such as clustering and outliers. The two key components of projection pursuit are the chosen measure of interesting features (the projection index) and its algorithm. In this paper, a white noise...
A standard goal of model evaluation and selection is to find a model that approximates the truth well while at the same time is as parsimonious as possible. In this paper we emphasize the point of view that the models under consideration are almost always false, if viewed realistically, and so we should analyze model adequacy from that point of vie...
A typical problem for the parameter estimation in normal mixture models is an unbounded likelihood and the presence of many spurious local maxima. To resolve this problem, we apply the doubly smoothed maximum likelihood estimator (DS-MLE) proposed by Seo and Lindsay (in preparation). We discuss the computational issues of the DS-MLE and propose a...
We develop a consistent and highly efficient marginal model for missing at random data using an estimating function approach. Our approach differs from inverse weighted estimating equations (Robins, Rotnitzky, and Zhao 1995) and the imputation method (Paik 1997) in that our approach does not require estimating the probability of missing or imputing...
We introduce a semiparametric "tubular neighborhood" of a parametric model in the multinomial setting. It consists of all multinomial distributions lying in a distance-based neighborhood of the parametric model of interest. Fitting such a tubular model allows one to use a parametric model while treating it as an approximation to the true distribu...
A fundamental problem for Bayesian mixture model analysis is label switching, which occurs as a result of the nonidentifiability of the mixture components under symmetric priors. We propose two labeling methods to solve this problem. The first method, denoted by PM(ALG), is based on the posterior modes and an ascending algorithm generically denoted...
Further properties of the nonparametric maximum-likelihood estimator of a mixing distribution are obtained by exploiting the properties of totally positive kernels. Sufficient conditions for uniqueness of the estimator are given. This result is more general, and the proof is substantially simpler, than given previously. When the component density h...
In the generalized method of moments approach to longitudinal data analysis, unbiased estimating functions can be constructed to incorporate both the marginal mean and the correlation structure of the data. Increasing the number of parameters in the correlation structure corresponds to increasing the number of estimating functions. Thus, building a...
This work builds a unified framework for the study of quadratic form distance measures as they are used in assessing the goodness of fit of models. Many important procedures have this structure, but the theory for these methods is dispersed and incomplete. Central to the statistical analysis of these distances is the spectral decomposition of the k...
We propose a general class of risk measures which can be used for data-based evaluation of parametric models. The loss function is defined as the generalized quadratic distance between the true density and the model proposed. These distances are characterized by a simple quadratic form structure that is adaptable through the choice of a non-negativ...
Given observations originating from a mixture distribution f [x; Q(λ)] where the kernel f is known and the mixing distribution Q is unknown, we consider estimating a functional θ (Q) of Q. A natural estimator of such a functional can be obtained by substituting Q with its nonparametric maximum likelihood estimator (NPMLE), denoted here as Q̂. We demo...
Estimating the unknown number of classes in a population has numerous important applications. In a Poisson mixture model, the problem is reduced to estimating the odds that a class is undetected in a sample. The discontinuity of the odds prevents the existence of locally unbiased and informative estimators and restricts confidence intervals to be o...
A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorit...
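The mode-ascent idea described above can be sketched in one dimension. This is an illustrative sketch, not the authors' implementation: it assumes a Gaussian kernel density estimator with a single fixed bandwidth, in which case the EM-style update reduces to a weighted mean of the data (the mean-shift step). The names `ascend_to_mode`, `modal_clusters`, the tolerances, and the toy data are all assumptions.

```python
import math


def ascend_to_mode(x, data, bandwidth, tol=1e-10, max_iter=1000):
    """Climb from x to a local maximum of a Gaussian kernel density estimate."""
    for _ in range(max_iter):
        # E-step analogue: posterior weight of each kernel centre given x.
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
        # M-step analogue: move x to the weighted mean of the centres.
        new_x = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def modal_clusters(data, bandwidth, merge_tol=1e-3):
    """Group points whose ascent paths end at the same mode."""
    modes = []
    labels = []
    for x in data:
        m = ascend_to_mode(x, data, bandwidth)
        for idx, known in enumerate(modes):
            if abs(m - known) < merge_tol:
                labels.append(idx)
                break
        else:
            # No existing mode is close enough, so record a new one.
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


# Hypothetical data with two well-separated groups.
data = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
modes, labels = modal_clusters(data, bandwidth=0.5)
```

The bandwidth controls how many modes the density estimate has, so the number of clusters is a consequence of that choice rather than an input.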
We develop a new method for building a hierarchical tree from binary sequence data. It is based on an ancestral mixture model. The sieve parameter in the model plays the role of time in the evolutionary tree of the sequences. By varying the sieve parameter, one can create a hierarchical tree that estimates the population structure at each fixed bac...
In this article we propose a general class of risk measures which can be used for data based evaluation of parametric models. The loss function is defined as generalized quadratic distance between the true density and the proposed model. These distances are characterized by a simple quadratic form structure that is adaptable through the choice of a...
An important and yet difficult problem in fitting multivariate mixture models is determining the mixture complexity. We develop theory and a unified framework for finding the nonparametric maximum likelihood estimator of a multivariate mixing distribution and consequently estimating the mixture complexity. Multivariate mixtures provide a flexible a...
Genomic comparisons provide evidence for ancient genome-wide duplications in a diverse array of animals and plants. We developed a birth-death model to identify evidence for genome duplication in EST data, and applied a mixture model to estimate the age distribution of paralogous pairs identified in EST sets for species representing the basal-most...
Multivariate normal mixtures provide a flexible method of fitting high-dimensional data. It is shown that their topography, in the sense of their key features as a density, can be analyzed rigorously in lower dimensions by use of a ridgeline manifold that contains all critical points, as well as the ridges of the density. A plot of the elevations o...
We propose a class of penalized nonparametric maximum likelihood estimators (NPMLEs) for the species richness problem. We use a penalty term on the likelihood because likelihood estimators that lack it have an extreme instability problem. The estimators are constructed using a conditional likelihood that is simpler than the full likelihood. We show...
In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insights to sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed.
We propose a compound...
In May 2002 a workshop was held at the National Science Foundation to discuss the future challenges and opportunities for the statistics community. After the workshop the scientific committee produced an extensive report that described the general consensus of the community. This article is an abridgment of the full report.
This paper proposes several variants of disparity-based inference (Ann. Statist. 22 (1994) 1081–1114). We introduce these modifications and explain the motivation behind them. Several of these estimators and tests have attractive efficiency and robustness properties. An extensive numerical and graphical investigation is presented to substantiate th...
The class of density based minimum distance estimators provides attractive alternatives to the maximum likelihood estimator because several members of this class have nice robustness properties while being first-order efficient under the assumed model. A helpful computational technique, similar to the iteratively reweighted least squares used in robu...
A general expository description is given of the use of quadratic score test statistics as inference functions. This methodology allows one to do efficient estimation and testing in a semiparametric model defined by a set of mean-zero estimating functions. The inference function is related to a quadratic minimum distance problem. The asymptotic chi...
To construct an optimal estimating function by weighting a set of score functions, we must either know or estimate consistently the covariance matrix for the individual scores. In problems with high dimensional correlated data the estimated covariance matrix could be unreliable. The smallest eigenvalues of the covariance matrix will be the most imp...
The paper considers a rectangular array asymptotic embedding for multistratum data sets, in which both the number of strata and the number of within-stratum replications increase, and at the same rate. It is shown that under this embedding the maximum likelihood estimator is consistent but not efficient owing to a non-zero mean in its asymptotic no...
Suppose a random sample of individuals is drawn from a population with unknown number of disjoint classes. The population is said to be homogeneous if all the classes have the same proportions and heterogeneous otherwise. Although there is a vast literature towards estimation of the number of classes, and the performance of estimators is related to...
The minimum disparity estimators of Lindsay (Ann. Statist. 22 (1994) 1081–1114) combine full asymptotic efficiency and attractive robustness properties and hence are useful practical tools. The residual adjustment function (RAF) introduced and used by Lindsay in this context helps to graphically interpret the robustness of the estimators, but this...
In a Bernoulli census, the inclusion probabilities often vary with individuals. The dispersion score test is applied to test homogeneity of inclusion probabilities. Three graphic diagnostics, the log ratio plot, residual plot and gradient plot, are used to detect the existence of heterogeneity. Confidence bands are available for all three plots....
This paper first develops the ideas of Aitken δ² method to accelerate the rate of convergence of an error sequence (value of the objective function at each step) obtained by training a neural network with a sigmoidal activation function via the backpropagation algorithm. The Aitken method is exact when the error sequence is exactly geome...
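For readers unfamiliar with the Aitken δ² method mentioned above, here is a minimal sketch of the standard transform applied to a generic sequence. It does not reproduce the paper's neural-network setting; the function name `aitken_delta2` and the example sequence are assumptions made for illustration.

```python
def aitken_delta2(seq):
    """Apply Aitken's delta-squared transform to each consecutive triple."""
    accelerated = []
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        denom = c - 2.0 * b + a  # second difference of the triple
        if denom == 0.0:
            # Second difference vanishes; fall back to the latest term.
            accelerated.append(c)
        else:
            accelerated.append(a - (b - a) ** 2 / denom)
    return accelerated


# Hypothetical error sequence converging geometrically to `limit`.
limit, ratio = 1.0, 0.5
errors = [limit + 3.0 * ratio ** n for n in range(6)]
accelerated = aitken_delta2(errors)
```

Because the example sequence is exactly geometric about its limit, every accelerated value recovers the limit up to rounding, which is the exactness property the abstract refers to.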
This research focuses on a general class of maximum likelihood problems in which it is desired to maximise a nonparametric mixture likelihood with finitely many known component densities over the set of unknown weight parameters. Convergence of the conventional EM algorithm for this problem is extremely slow when the component densities are poorly...
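The conventional EM iteration for this problem (known component densities, unknown weights) can be sketched as follows. This shows only the baseline algorithm whose slow convergence the abstract mentions, not the improvement the paper develops. The helper names, the two normal components, and the data are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution, used here as a known component."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(density_matrix, weights):
    """Mixture log-likelihood; density_matrix[i][j] is f_j(x_i)."""
    return sum(
        math.log(sum(w * f for w, f in zip(weights, row)))
        for row in density_matrix
    )


def em_mixture_weights(density_matrix, n_iter=200):
    """Plain EM for mixing weights with fixed, known component densities."""
    n = len(density_matrix)
    m = len(density_matrix[0])
    weights = [1.0 / m] * m
    for _ in range(n_iter):
        new = [0.0] * m
        for row in density_matrix:
            mix = sum(w * f for w, f in zip(weights, row))
            for j in range(m):
                # Posterior probability that observation i came from component j.
                new[j] += weights[j] * row[j] / mix
        weights = [v / n for v in new]
    return weights


# Hypothetical sample and component means.
data = [-0.3, 0.1, 0.4, 3.8, 4.2]
means = [0.0, 4.0]
density_matrix = [[normal_pdf(x, m) for m in means] for x in data]
weights_1 = em_mixture_weights(density_matrix, n_iter=1)
weights = em_mixture_weights(density_matrix, n_iter=200)
```

Each iteration cannot decrease the log-likelihood, so comparing `log_likelihood` after 1 and after 200 iterations is a simple check of the implementation.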
Generalised estimating equations enable one to estimate regression parameters consistently in longitudinal data analysis even when the correlation structure is misspecified. However, under such misspecification, the estimator of the regression parameter can be inefficient. In this paper we introduce a method of quadratic inference functions that do...
How much information does a finite collection of moments carry about the underlying distribution? We revive an old bound, give a new, simple formula for its calculation, and demonstrate that although very little can be said about the central part of the distribution, the tail is much more sharply defined.
There are a number of cases where the moments of a distribution are easily obtained, but theoretical distributions are not available in closed form. This paper shows how to use moment methods to approximate a theoretical univariate distribution with mixtures of known distributions. The methods are illustrated with gamma mixtures. It is shown that f...
We discuss a method of weighting likelihood equations with the aim of obtaining fully efficient and robust estimators. We discuss the case of continuous probability models using unimodal weighting functions. These weighting functions downweight observations that are inconsistent with the assumed model. At the true model, therefore, the proposed est...
The realized error of an estimate is determined not only by the efficiency of the estimator, but also by chance. For example, suppose that we have observed a bivariate normal vector whose expectation is known to be on a circle. Then, intuitively, the longer that vector happens to be, the more accurately its angle is likely to be estimated. Yet this...
Using Weyl's formula for the volume of the tube about a manifold in the unit sphere, we show that the distribution of the squared length of the projection of the normal variate to any smooth convex cone is a mixture of chi-squared distributions and we give the explicit formulas for the weights. We also give the application of our work to circular c...
We discuss a method of weighting the likelihood equations with the aim of obtaining fully efficient and robust estimators. We discuss the case of discrete probability models using several weighting functions. If the weight functions generate increasing residual adjustment functions then the method provides a link between the maximum likelihood scor...
Consider finite mixture models of the form $g(x; Q) = \int f(x; \theta) dQ(\theta)$, where f is a parametric density and Q is a discrete probability measure. An important and difficult statistical problem concerns the determination of the number of support points (usually known as components) of Q from a sample of observations from g. For an import...
A difficulty associated with current hypothesis tests for generalized estimating equations is that, if the working correlation assumption is incorrect, then consistent estimates of the true covariance matrices of the observations within clusters are needed, either in the calculation of the test statistics, or in the evaluation of percentage points...
Rudas, Clogg, and Lindsay (1994) proposed a mixture index-of-fit approach for evaluating goodness of fit in the analysis of contingency tables. Clogg, Rudas, and Xi (1995) applied this approach to the analysis of models for mobility tables. The maximum likelihood estimate of the mixture index of fit π* can be obtained by the expectation and maximi...
Methods are devised for estimating the parameters of a prospective logistic model in a case-control study with dichotomous response D which depends on a covariate X. For a portion of the sample, both the gold standard X and a surrogate covariate W are available; however, for the greater portion of the data only the surrogate covariate W is availab...
This paper extends the projected score methods of C. G. Small and D. L. McLeish [ibid. 76, No. 4, 693-703 (1989; Zbl 0681.62008)]. It is shown that the conditional score function may be approximated, with arbitrarily small stochastic error, in terms of a natural basis for the space of centred likelihood ratios. The utility of using this basis is es...
A simple method for approximate conditional inference is described. The methodology is applied to natural exponential family models where it is shown to provide accurate approximations to fully conditional estimates. The approximation technique can be applied much more generally than in this particular class of models. The technique depends only on...
The literature on semiparametric mixture models has flourished over the last decade, both in applied and theoretical journals. In this paper, we review examples of important areas of application, and summarize some of the recent developments in maximum likelihood theory, including inference for the mixing distribution and the structural parameters....
A framework based on mixture methods is proposed for evaluating goodness of fit in the analysis of contingency tables. For a given model H applied to a contingency table P, we consider the two-point mixture P = (1 - π)Π1 + πΠ2, with π the mixing proportion (0 ≤ π ≤ 1) and Π1 and Π2 the tables of probabilities for each latent class or component. In...
It is shown how and why the influence curve poorly measures the robustness properties of minimum Hellinger distance estimation. Rather, for this and related forms of estimation, there is another function, the residual adjustment function, that carries the relevant information about the trade-off between efficiency and robustness. It is demonstrated...
We here consider testing the hypothesis of homogeneity against the alternative of a two-component mixture of densities. The paper focuses on the asymptotic null distribution of 2 log λ_n, where λ_n is the likelihood ratio statistic. The main result, obtained by simulation, is that its limiting distribution appears pivotal (in the sense of constant per...
A general class of minimum distance estimators for continuous models called minimum disparity estimators are introduced. The conventional technique is to minimize a distance between a kernel density estimator and the model density. A new approach is introduced here in which the model and the data are smoothed with the same kernel. This makes the me...
We develop two procedures based on the moment estimators that can be used to test for the number of component distributions in a mixture of normal distributions with equal variances. One is a computationally fast moment version of the likelihood ratio procedure. We apply these procedures to the classic problem of testing for the presence of two nor...
We consider the vexing computational problem of estimating the parameters of a mixture of two normal distributions with equal variances using maximum likelihood estimation. As a partial solution we reconsider the strongly consistent moment estimators as starting values for the likelihood maximization algorithms. Using a technique based on the dete...
A framework based on mixture methods is proposed for evaluating goodness of fit in the analysis of contingency tables. For a given model H applied to a contingency table P, we consider the two-point mixture P = (1 - π)Π1 + πΠ2, with π the mixing proportion (0 ⩽ π ⩽ 1) and Π1 and Π2 the tables of probabilities for each latent class or component. In...
A longstanding difficulty in multivariate statistics is identifying and evaluating nonnormal data structures in high dimensions with high statistical efficiency and low search effort. Here the possibilities of using sample moments to identify mixtures of multivariate normals are investigated. A particular system of moment equations is devised and t...
A sample is commonly modeled by a mixture distribution if the observations follow a common distribution, but the parameter of interest differs between observations. For example, we observe the lengths but not the ages of a sample of fish. It may be reasonable to assume that length is normally distributed about an unknown mean that depends on the age...
This paper presents various algorithmic approaches for computing the maximum likelihood estimator of the mixing distribution of a one-parameter family of densities and provides a unifying computer-oriented concept for the statistical analysis of unobserved heterogeneity (i.e., observations stemming from different subpopulations) in a univariate sam...
A competing risk type of model called the mixed hazards model is studied. This model differs from the usual competing risks in that the cause of failure can not be identified. Questions related to the identifiability of the model and to the uniqueness and support size of the maximum likelihood estimate of the mixing distribution are considered. A d...
The Rasch model for item analysis is an important member of the class of exponential response models in which the number of nuisance parameters increases with the number of subjects, leading to the failure of the usual likelihood methodology. Both conditional-likelihood methods and mixture-model techniques have been used to circumvent these problem...
The fitness of plants or animals within a population is largely determined by the number of offspring they produce. In natural populations lacking familial structure either one or both parents are often unknown. To circumvent this problem, parents, and offspring can be genotyped for a set of genetic markers. Likelihood models are proposed to estima...
An investigation is carried out into the behavior of the determinants of certain moment matrices, for which the $(i, j)$ entry is the $(i + j)$th moment of a distribution $F$. The determinant can be represented as the expected value of a $U$-statistic type kernel. The structure of the kernel illustrates how the determinant carries information about t...
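The moment matrix described above is easy to build for a discrete distribution. The sketch below indexes rows and columns from 0 to p, so the matrix is (p + 1) by (p + 1), and uses exact rational arithmetic. It illustrates one standard property only (the determinant vanishes when the distribution has at most p support points) and is not taken from the paper. All function names and the example distribution are assumptions.

```python
from fractions import Fraction


def moment(support, probs, order):
    """Exact moment of a discrete distribution about the origin."""
    return sum(p * Fraction(x) ** order for x, p in zip(support, probs))


def moment_matrix(support, probs, p):
    """Matrix whose (i, j) entry is the (i + j)th moment, for i, j = 0..p."""
    return [
        [moment(support, probs, i + j) for j in range(p + 1)]
        for i in range(p + 1)
    ]


def determinant(matrix):
    """Exact determinant by Gaussian elimination on Fractions."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


# Hypothetical distribution with three support points.
support = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
det_2 = determinant(moment_matrix(support, probs, 2))  # 3 x 3 matrix
det_3 = determinant(moment_matrix(support, probs, 3))  # 4 x 4 matrix
```

Here `det_2` is positive while `det_3` is zero, because a distribution on three points cannot support a nonsingular 4 by 4 moment matrix.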
The use of moment matrices and their determinants is shown to elucidate the structure of mixture estimation as carried out using the method of moments. The setting is the estimation of a discrete finite support point mixing distribution. In the important class of quadratic variance exponential families it is shown for any sample there is an intege...
It is desirable that a numerical maximization algorithm monotonically increase its objective function for the sake of its stability of convergence. It is here shown how one can adjust the Newton-Raphson procedure to attain monotonicity by the use of simple bounds on the curvature of the objective function. The fundamental tool in the analysis is th...
It is shown that there is an analog to the score function (the derivative of the log-likelihood with respect to the parameter), here called the difference score, which plays an important theoretical role in integer parameter models. A unified treatment of integer parameter models can be obtained by recognizing that many commonly used models have a...
For an arbitrary one parameter exponential family density it is shown how to construct a mixing distribution (prior) on the parameter in such a way that the resulting mixture distribution is a two (or more) parameter exponential family. Reweighted infinitely divisible distributions are shown to be the parametric mixing distributions for which this...
In a probability model for possible errors in the inspection of a finite lot containing defective items, it is shown how maximum likelihood estimation is affected by whether the sampling is carried out with or without replacement. These changes are examined by developing a method for simultaneous maximization over an integer and a scalar parameter....
Empirical partially Bayes methods are considered as a means of improving efficiency in a class of problems in which the number of nuisance parameters increases to infinity. In the method used, the parameter of interest is estimated in an asymptotically unbiased way while James-Stein shrinkage is applied to the nuisance parameter estimates. When the...
Geometric analysis of the mixture likelihood set for univariate exponential family densities yields results which tie the number and location of support points for the nonparametric maximum likelihood estimator of the mixing distribution to sign changes in certain integrated polynomials. One corollary is a very general uniqueness theorem for the es...
The conditional score function is found to be generally fully informative concerning a parameter of interest when the conditioning statistic $S$ is sufficient for the nuisance parameter and has an exponential family distribution. Information is here measured by assuming the nuisance parameter to have been generated by an unknown mixing distribution...
In this paper certain fundamental properties of the maximum likelihood estimator of a mixing distribution are shown to be geometric properties of the likelihood set. The existence, support size, likelihood equations, and uniqueness of the estimator are revealed to be directly related to the properties of the convex hull of the likelihood set and th...