Journal of the Royal Statistical Society. Series B: Methodological

Print ISSN: 0035-9246
To examine global and local influence, and the relations between them, for regression models, we study the surface formed under perturbation by a variable of interest, such as the maximum likelihood estimate of a parameter, by evaluating the second derivative or curvature of the surface. (The corresponding slope of this surface is related to the curvature of the likelihood displacement surface of Cook.) Examples show that this approach, with the aid of plots, is helpful not only for discovering influential cases, including those hidden in an individual global sense, but also for understanding the nature of influence.
Population models are proposed for dependence between two angular measurements and for dependence between an angular and a linear observation. Applied first to the latter situation, the method of canonical correlations leads to new population and sample measures of dependence. An example relating wind direction to the level of a pollutant is given. Next, applied to pairs of angular measurements, the method yields previously proposed sample measures in some special cases and a new sample measure in general.
Asymptotic normality of the posterior distribution of a parameter in a stochastic process is shown to hold under conditions which do little more than ensure consistency of a maximum likelihood estimator. Much more stringent conditions are required to ensure asymptotic normality of the MLE. This contrast, which has implications of considerable significance, does not emerge in the classical context of independent and identically distributed observations. Keywords: Parameter estimation; Stochastic processes; Bayesian methods; Maximum likelihood estimator; Asymptotic normality.
An adaptation of least squares cross-validation is proposed for bandwidth-choice in the kernel estimation of the derivatives of a probability density. The practicality of the method is demonstrated by an example and a simulation study. Theoretical justification is provided by an asymptotic optimality result.
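The cross-validation criterion referred to above can be sketched for the simplest case, the density itself rather than its derivatives; the Gaussian kernel, sample size and bandwidth grid below are illustrative choices, not those of the paper.

```python
import numpy as np

def lscv_score(x, h):
    """Least-squares cross-validation score for a Gaussian kernel density
    estimate with bandwidth h:
        LSCV(h) = integral of fhat^2 - (2/n) * sum_i fhat_{-i}(x_i),
    evaluated in closed form for the Gaussian kernel."""
    n = len(x)
    d = x[:, None] - x[None, :]                     # pairwise differences
    # integral of fhat^2: convolution of two N(0, h^2) kernels is N(0, 2h^2)
    term1 = np.exp(-d**2 / (4 * h**2)).sum() / (n**2 * 2 * h * np.sqrt(np.pi))
    # leave-one-out term: kernel matrix without the diagonal (d_ii = 0)
    k = np.exp(-d**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))
    loo = (k.sum() - np.trace(k)) / (n * (n - 1))
    return term1 - 2 * loo

rng = np.random.default_rng(0)
x = rng.normal(size=200)
grid = np.linspace(0.1, 1.5, 30)
scores = [lscv_score(x, h) for h in grid]
h_cv = grid[int(np.argmin(scores))]                 # chosen bandwidth
```

The bandwidth minimizing the score is then used in the kernel estimate; the derivative version replaces the squared-error criterion for the density by one for the derivative.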
We consider bivariate survival models in which the dependence between two survival times is by way of stochastically related unobserved components. We analyse the role of the mixing or heterogeneity distribution in such models and examine how this distribution affects the correlation between the two survival times. We derive explicit expressions for the correlation between the survival times and examine some properties of this correlation. Furthermore we examine to what extent the choice of this heterogeneity distribution is important for the hazards in the sample and their time paths. The time path of the hazard among the survivors is determined by the moments of the distribution describing the heterogeneity among the survivors. We consider particular published mixing distributions. Simulations show that it may be hazardous to estimate bivariate survival models in which the mixing distribution is parameterized as univariate, in that a univariate random variable may not be able to account both for the mutual dependence of the survival times and for the change in the composition of the sample over time owing to unobserved heterogeneity.
Statistical scientists have recently focused sharp attention on properties of iterated chaotic maps, with a view to employing such processes to model naturally occurring phenomena. In the present paper we treat the logistic map, which has earlier been studied in the context of modelling biological systems. We derive theory describing properties of the 'invariant' or 'stationary' distribution under logistic maps and apply those results, in conjunction with numerical work, to develop further properties of invariant distributions and Lyapunov exponents. We describe the role that poles play in determining properties of the densities of iterated distributions and show how poles arise from iterated mappings of the centre of the interval to which the map is applied. Particular attention is paid to the shape of the invariant distribution in the tails or in the neighbourhood of a pole of its density. A new technique is developed for this application; it enables us to combine 'parametric' information, available from the structure of the map, with 'nonparametric' information obtainable from numerical experiments.
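The numerical side of such a study can be sketched in a few lines: iterating the fully chaotic logistic map x ↦ 4x(1−x) and averaging log|f'(x)| along the orbit estimates the Lyapunov exponent, which for this map is known to equal log 2. The starting point and iteration counts below are arbitrary illustrative choices.

```python
import math

# Iterate the chaotic logistic map x -> 4x(1 - x); |f'(x)| = |4 - 8x|.
# The Lyapunov exponent, the orbit average of log|f'(x)|, equals log 2
# for this map, and the invariant density is 1 / (pi * sqrt(x(1 - x))),
# with poles at both endpoints.
x = 0.123456          # arbitrary starting point in (0, 1)
burn, n = 1000, 100_000
acc = 0.0
for i in range(burn + n):
    if i >= burn:
        acc += math.log(abs(4 - 8 * x))
    x = 4 * x * (1 - x)
lyap = acc / n        # close to log 2 = 0.6931...
```

A histogram of the orbit, collected in the same loop, would show the characteristic U-shaped invariant density with its poles at 0 and 1.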
We study parameter estimation in linear Gaussian covariance models, which are $p$-dimensional Gaussian models with linear constraints on the covariance matrix. Maximum likelihood estimation for this class of models leads to a non-convex optimization problem which typically has many local optima. We prove that the log-likelihood function is concave over a large region of the cone of positive definite matrices. Using recent results on the asymptotic distribution of extreme eigenvalues of the Wishart distribution, we provide sufficient conditions for any hill climbing method to converge to the global optimum. The proofs of these results utilize large-sample asymptotic theory under the scheme $n/p \to \gamma > 1$. Remarkably, our numerical simulations indicate that our results remain valid for $\min\{n,p\}$ as small as 2. An important consequence of this analysis is that for sample sizes $n \simeq 14 p$, maximum likelihood estimation for linear Gaussian covariance models behaves as if it were a convex optimization problem.
In multivariate regression, when the regressors are orthogonal, the estimates of the coefficients may be regarded, in the normal case, as an independent normal random sample with estimable variance. After consideration of order statistics more generally in this context, significance is determined by the absolute magnitude of the largest member of the sample. The method is applied to time series data analysed by Fisher and Yates (1957): while these authors identified, by their essentially ex ante approach, the first and second orthogonal polynomials as significant, ex post only the second is identifiable.
This note is concerned with the derivation of the distribution of a random variable X in terms of the distribution of Y given X, where X, Y are discrete random variables with finite support.
We describe semiparametric estimation and inference in a logistic regression model with measurement error in the predictors. The particular measurement error model consists of a primary data set in which only the response Y and a fallible surrogate W of the true predictor X are observed, plus a smaller validation data set for which (Y, X, W) are observed. Except for the underlying assumption of a logistic model in the true predictor, no parametric distributional assumption is made about the true predictor or its surrogate. We develop a semiparametric estimate of the logistic regression parameter which is asymptotically normally distributed and computationally feasible. The estimate relies on kernel regression techniques. For scalar predictors, by a detailed analysis of the mean-squared error of the parameter estimate, we obtain a representation for an optimal bandwidth.
In this paper it is established that the lognormal distribution is not determined by its moments. Some brief comments are made on the set of distributions having the same moments as a lognormal distribution.
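The classical construction behind this result (due to Heyde) multiplies the standard lognormal density by 1 + ε·sin(2π ln x); every integer moment is unchanged because the perturbation contributes e^(k²/2)·e^(−2π²)·sin(2πk) = 0 to the kth moment. A numerical check of that vanishing integral, on an illustrative log-scale grid:

```python
import numpy as np

# Perturbing the standard lognormal density f(x) by the factor
# 1 + eps*sin(2*pi*ln x), |eps| <= 1, yields a genuinely different
# density with exactly the same moments: with y = ln x, the k-th
# moment of the perturbation is e^{k^2/2} e^{-2 pi^2} sin(2 pi k) = 0
# for every integer k.
y = np.linspace(-12.0, 12.0, 400001)            # integrate on the log scale
phi = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)    # standard normal density

def perturbation_moment(k):
    """Integral of x^k f(x) sin(2 pi ln x) dx, rewritten in terms of
    y = ln x and computed by the trapezoid rule."""
    f = np.exp(k * y) * phi * np.sin(2 * np.pi * y)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(y)) / 2)

extras = [perturbation_moment(k) for k in range(5)]   # all numerically 0
```

Since the extra term vanishes for every integer k, the perturbed and unperturbed densities share all moments while being different distributions.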
The method of least squares cross-validation for choosing the bandwidth of a kernel density estimator has been the object of considerable research, through both theoretical analysis and simulation studies. The method involves the minimization of a certain function of the bandwidth. One of the less attractive features of this method, which has been observed in simulation studies but has not previously been understood theoretically, is that rather often the cross-validation function has multiple local minima. The theoretical results of this paper provide an explanation and quantification of this empirical observation, through modelling the cross-validation function as a Gaussian stochastic process. Asymptotic analysis reveals that the degree of wiggliness of the cross-validation function depends on the underlying density through a fairly simple functional, but dependence on the kernel function is much more complicated. A simulation study explores the extent to which the asymptotic analysis describes the actual situation. Our techniques may also be used to obtain other related results, e.g. to show that spurious local minima of the cross-validation function are more likely to occur at too small values of the bandwidth, rather than at too large values.
Matching university places to students is not as clear cut or as straightforward as it ought to be. By investigating the matching algorithm used by the German central clearinghouse for university admissions in medicine and related subjects, we show that a procedure designed to give an advantage to students with excellent school grades actually harms them. The reason is that the three-step process employed by the clearinghouse is a complicated mechanism in which many students fail to grasp the strategic aspects involved. The mechanism is based on quotas and consists of three procedures that are administered sequentially, one for each quota. Using the complete data set of the central clearinghouse, we show that the matching can be improved for around 20% of the excellent students while making a relatively small percentage of all other students worse off.
Fisher-Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class. In this paper, we fit Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered. Low dimensional views are an important by-product of LDA, and our new techniques inherit this feature. We are able to control the within-class spread of the subclass centers relative to the between-class spread. Our technique for fitting these models permits a natural blend with nonparametric versions of LDA.
For a general regression model with n independent observations we consider the variance of the estimate of a quantity of interest under two scenarios. One scenario is where all the parameters are estimated from the data; the other scenario is where a subset of the parameters are assumed known at their true values and the remaining parameters are estimated. We focus on quantities of interest which are defined on the scale of the response variable. We show that, under certain conditions, the ratio of weighted sums, across the design points, of the variance of the quantity of interest under the two scenarios is given by q/p, where q and p are the numbers of free parameters in the two scenarios. Thus, in this average sense, the inflation in variance associated with adding parameters, also interpreted as the cost of adding parameters to a model, is directly proportional to the number of parameters. We study models involving power transformations, non-linear models and exponential family models. Key Words: Box-Cox t...
The aim of this paper is to quantify the amount of information that an experimenter can expect to learn about the parameter θ through experimentation. Section 2 proposes an expected utility, U(e), quantifying the expected amount of information. The main application of U(e) is to the comparison of experiments (DeGroot, 1962), as it induces an order on experiments, in that e1 ≤ e2 if U(e1) ≤ U(e2). In particular, it applies to spaces of design measures, response surfaces and the choice of sample size. Only the last will be considered here. The application to non-linear design problems (Chaloner and Larntz, 1989) and model choice problems are discussed in Polson (1988). The motivation for the expected utility U(e) is given by a suitably normalized form of Shannon information gain (Polson, 1988). The Shannon information gain was first proposed in a statistical setting as a measure of information of an experiment by Lindley (1956) and has been applied in many contexts, e.g. the design of linear models (Stone, 1959; Smith and Verdinelli, 1980) and optimal allocation problems (Brooks, 1980, 1987). For independent and identically distributed observations Bernardo (1979) uses asymptotic Shannon information gain for interpreting the Jeffreys prior as a reference prior. 1. Address for correspondence: Graduate School of Business, University of Chicago, Chicago, IL 60637, USA
In this paper I discuss a Bayesian approach to solving this problem that has long been available in principle but is only now becoming routinely feasible, by virtue of recent computational advances, and examine its implementation in examples that involve forecasting the price of oil and estimating the chance of catastrophic failure of the U.S. Space Shuttle.
Considerable effort has been directed recently to develop asymptotically minimax methods in problems of recovering infinite-dimensional objects (curves, densities, spectral densities, images) from noisy data. A rich and complex body of work has evolved, with nearly or exactly minimax estimators being obtained for a variety of interesting problems. Unfortunately, the results have often not been translated into practice, for a variety of reasons: sometimes similarity to known methods, sometimes computational intractability, and sometimes lack of spatial adaptivity. We discuss a method for curve estimation based on n noisy data; one translates the empirical wavelet coefficients towards the origin by an amount √(2 log n)·σ/√n. The method is different from methods in common use today, is computationally practical, and is spatially adaptive; thus it avoids a number of previous objections to minimax estimators. At the same time, the method is nearly minimax for a wide variety of ...
This paper describes an approach to Bayesian sensitivity analysis that uses an influence statistic and an outlier statistic to assess the sensitivity of a model to perturbations. The basic outlier statistic is a Bayes factor, while the influence statistic depends strongly on the purpose of the analysis. The task of influence analysis is aided by having an interpretable influence statistic. Two alternative divergences, an L1 distance and a χ² divergence, are proposed and shown to be interpretable. The Bayes factor and the proposed influence measures are shown to be summaries of the posterior of a perturbation function. Keywords: Bayes factor; Censoring; Conditional predictive ordinate; Diagnostics; Influence analysis.
This paper provides a Bayesian analysis of such a model. The main contribution of our paper is that different features of the data--such as the spectral density of the stationary term, the regression parameters, unknown frequencies and missing observations--are combined in a hierarchical Bayesian framework and estimated simultaneously. A Bayesian test to detect the presence of deterministic components in the data is also constructed. Applications of our methods to simulated and real data suggest that they perform well. We place a smoothness prior, similar to that in Wahba (1980), on the logarithm of the spectral density. To make the estimation of the spectral density computationally tractable, Whittle's (1957) approximation to the Gaussian likelihood is used. This results in a nonparametric regression problem with the logarithm of the periodogram as the dependent variable, the logarithm of the spectral density as the unknown regression curve, and observation errors having log chi-squared distributions. By approximating the logarithm of a chi-squared distribution as a mixture of normals, the approximate log likelihood together with the prior for the spectral density can be expressed as a state space model with errors that are mixtures of normals. The computation is carried out efficiently by Markov chain Monte Carlo using the sampling approach in Carter and Kohn (1994). To make the paper easier to read the full model is introduced in a number of steps. Section 2 shows how to estimate the spectral density of a stationary process in the absence of deterministic components. Section 3 extends the estimation to the signal plus noise model with missing observations. Section 4 shows by example how the results in Sections 2 and 3 can be combined to analyze data and studies emp...
Influence of the prior distribution N(ξ, κ⁻¹) for μ on the posterior distribution of k. Acidity data: mixture model with Poisson prior P(10) for k, random and default parameter values.
Predictive densities for the 3 data sets, unconditionally (full line), and conditional on various values of k (dotted lines); the curves displayed are for k = 2 to 6, except for the Galaxy data, where they are for k = 3 to 6. In each case note that it is only the smallest k shown that
Enzyme data set: posterior densities of weights and means for the second and third component, default prior model, conditioning on k = 3 (full line), and conditioning also on w₃ ≤ 0.17 (dotted line) and on w₃ > 0.17 (broken line). (The last two have areas proportional to posterior probability.)
Posterior distributions of k: comparison of sensitivity to hyperparameters between fixed and random models: (a) α = 2, with the scale hyperparameter varying between R/5 and R/20; (b) α = 2, g = 0.2, with √(g/h) varying between R/5 and R/20.
Comparison of mixing of variable-k and fixed-k samplers. Left panels: traces of θ₂ against sweep number. Right panels: (upper) posterior density estimates at the end of the runs; (lower) sequences of estimates of p(θ₂ < 0 | y, k = 3) obtained as the runs proceed; solid lines refer to the variable-k sampler.
This article is a contribution to the methodology of fully Bayesian mixture modelling. We stress the word "fully" in two senses. First, we model the number of components and the mixture component parameters jointly and base inference about these quantities on their posterior probabilities. This is in contrast to most previous Bayesian treatments of mixture estimation, which consider models for different numbers of components separately, and use significance tests or other non-Bayesian criteria to infer the number of components. Secondly, we aim to present posterior distributions of our objects of inference (model parameters and predictive densities), and not just "best estimates". There are three key ideas in our treatment. First, we demonstrate that novel MCMC methods, the "reversible jump" samplers introduced by Green (1994, 1995), can be used to sample mixture representations with an unknown and hence varying number of components. We believe these methods are preferable on grounds of convenience,
Monte Carlo maximum likelihood for normalized families of distributions (Geyer and Thompson, 1992) can be used for an extremely broad class of models. Given any family {h_θ : θ ∈ Θ} of nonnegative integrable functions, maximum likelihood estimates in the family obtained by normalizing the functions to integrate to one can be approximated by Monte Carlo, the only regularity conditions being a compactification of the parameter space such that the evaluation maps θ ↦ h_θ(x) remain continuous. Then with probability one the Monte Carlo approximant to the log likelihood hypoconverges to the exact log likelihood, its maximizer converges to the exact maximum likelihood estimate, approximations to profile likelihoods hypoconverge to the exact profile, and level sets of the approximate likelihood (support regions) converge to the exact sets (in Painlevé-Kuratowski set convergence). The same results hold when there are missing data (Thompson and Guo, 1991, Gelfand and Carlin, 19...
In this paper, we produced the curves β₀(E) and
This paper investigates conditions under which the Gibbs sampler (Gelfand and Smith, 1990; Tanner and Wong, 1987; Geman and Geman, 1984) converges at a geometric rate. The main results appear in Sections 2 and 3, where geometric convergence results are established, with respect to total variation and supremum norms under fairly natural conditions on the underlying distribution. For ease of exposition, we shall concentrate on the two most commonly encountered situations, where the state space is finite or continuous. All our results will establish uniform convergence, a strong form of geometric convergence, under appropriate regularity conditions. Uniform convergence is a useful property in its own right but also happens to be a sufficient condition for certain ergodic central limit theorems. Such results are important for estimation in Markov chain simulation but will not be considered in detail here.
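A minimal illustration of a geometrically ergodic Gibbs sampler, assuming a bivariate normal target with correlation ρ (a standard textbook example, not one from the paper): the full conditionals are normal and the sampler mixes rapidly.

```python
import math
import random

# Gibbs sampler for a bivariate normal with correlation rho: the full
# conditionals are x | y ~ N(rho*y, 1 - rho^2) and y | x ~ N(rho*x, 1 - rho^2).
random.seed(1)
rho = 0.6
s = math.sqrt(1 - rho**2)
x = y = 0.0
xs, ys = [], []
for i in range(20000):
    x = random.gauss(rho * y, s)     # draw from p(x | y)
    y = random.gauss(rho * x, s)     # draw from p(y | x)
    if i >= 1000:                    # discard burn-in
        xs.append(x)
        ys.append(y)

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
vx = sum((a - mx)**2 for a in xs) / n
vy = sum((b - my)**2 for b in ys) / n
corr = cov / math.sqrt(vx * vy)      # sample correlation, near rho
```

For this chain the dependence of each update on the previous state is linear with coefficient ρ², which is exactly the kind of contraction that yields a geometric convergence rate.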
Wavelet threshold estimators for data with stationary correlated noise are constructed by the following prescription. First, form the discrete wavelet transform of the data points. Next, apply a level-dependent soft threshold to the individual coefficients, allowing the thresholds to depend on the level in the wavelet transform. Finally, transform back to obtain the estimate in the original domain. The threshold used at level j is s_j·√(2 log n), where s_j is the standard deviation of the coefficients at that level, and n is the overall sample size. The minimax properties of the estimators are investigated by considering a general problem in multivariate normal decision theory, concerned with the estimation of the mean vector of a general multivariate normal distribution subject to squared error loss. An ideal risk is obtained by the use of an 'oracle' that provides the optimum diagonal projection estimate. This 'benchmark' risk can be considered in its own right as a measure of the s...
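The three-step prescription can be sketched with a hand-rolled Haar transform; the MAD-based estimate of s_j, the test signal, and the choice to leave coarse levels unthresholded are illustrative assumptions, not details from the paper.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar DWT of a length-2^J signal: returns the list of
    detail coefficient arrays (finest level first) and the final scaling
    coefficient."""
    details = []
    a = np.asarray(x, dtype=float)
    while len(a) > 1:
        details.append((a[0::2] - a[1::2]) / np.sqrt(2))
        a = (a[0::2] + a[1::2]) / np.sqrt(2)
    return details, a

def haar_idwt(details, a):
    """Exact inverse of haar_dwt."""
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def soft(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

rng = np.random.default_rng(2)
n = 1024
tt = np.linspace(0, 1, n)
signal = np.sin(4 * np.pi * tt)
y = signal + 0.3 * rng.normal(size=n)

# Level-dependent threshold s_j * sqrt(2 log n), with s_j estimated here by
# the robust MAD of the level-j coefficients; coarse levels (fewer than 32
# coefficients) are left untouched, a common practical convention.
details, approx = haar_dwt(y)
thr = np.sqrt(2 * np.log(n))
details = [soft(d, thr * np.median(np.abs(d)) / 0.6745) if len(d) >= 32 else d
           for d in details]
estimate = haar_idwt(details, approx)
```

Because the transform is orthonormal, level-by-level scaling of the thresholds is exactly what accommodates noise whose variance differs across scales.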
The implications of parameter orthogonality for the robustness of survival regression models are considered. The question of which of the proportional hazards or the accelerated life families of models would be more appropriate for analysis is usually ignored, and the proportional hazards family is applied, particularly in medicine, for reasons of convenience. Accelerated life models have conventionally been used in reliability applications. We propose a one-parameter family mixture survival model which includes both the accelerated life and proportional hazards models. By orthogonalizing relative to the mixture parameter, we are able to show that for small effects of the covariates, the regression parameters under the alternative families agree to within a constant. This recovers a known misspecification result. We use notions of parameter orthogonality to explore robustness to other types of misspecification including misspecified baseline hazards. The results hold in the presence o...
We consider the problem of selecting one model from a large class of plausible models. A predictive Bayesian viewpoint is advocated to avoid the specification of prior probabilities for the candidate models and the detailed interpretation of the parameters in each model. Using criteria derived from a certain predictive density and a prior specification that emphasizes the observables, we implement the proposed methodology for three common problems arising in normal linear models: variable subset selection, selection of a transformation of predictor variables and estimation of a parametric variance function. Interpretation of the relative magnitudes of the criterion values for various models is facilitated by a calibration of the criteria. Relationships between the proposed criteria and other well-known criteria are examined.
This article concentrates on the estimation of functions and images from noisy data using wavelet shrinkage. A modified form of twofold cross-validation is introduced to choose a threshold for wavelet shrinkage estimators operating on data sets of length a power of two. The cross-validation algorithm is then extended to data sets of any length and to multi-dimensional data sets. The algorithms are compared to established threshold choosers using simulation. An application to a real data set arising from anaesthesia is presented. Keywords: adaptive estimation; nonparametric regression; spatial adaptation; smoothing parameter; threshold; anaesthetics. Journal of the Royal Statistical Society, Series B (1996), 58, 463-479. © Royal Statistical Society.
This paper addresses the null distribution of the likelihood ratio statistic for threshold autoregression with normally distributed noise. The problem is non-standard because the threshold parameter is a nuisance parameter which is absent under the null hypothesis. We reduce the problem to the first-passage probability associated with a Gaussian process which, in some special cases, turns out to be a Brownian bridge. It is also shown that, in some specific cases, the asymptotic null distribution of the test statistic depends only on the `degrees of freedom' and not on the exact null joint distribution of the time series.
It is suggested to discriminate between different state space models for a given time series by means of a Bayesian approach which chooses the model that minimizes the expected loss. Practical implementation of this procedure requires a fully Bayesian analysis for both the state vector and the unknown hyperparameters, which is carried out by Markov chain Monte Carlo methods. Application to some non-standard situations, such as testing hypotheses on the boundary of the parameter space, discriminating non-nested models and discrimination of more than two models, is discussed in detail. (author's abstract)
Given a random sample from a distribution with density function that depends on an unknown parameter θ = (θ1, ..., θp), we are concerned with the problem of setting confidence intervals for a particular component of it, say θ1, treating the remaining components as a nuisance parameter. Adopting an objective Bayesian approach, we show that the Bayes intervals with a certain conditional prior density on the parameter of interest, θ1, are confidence intervals as well, having nearly the correct frequency of coverage. The frequentist performance of the proposed intervals is tested in a simulation study for gamma mean and shape parameters.
We prove that, under appropriate conditions, in a noisy environment an embedded deterministic dynamical system which admits a compact attractor can give rise to an ergodic stochastic system. This observation justifies the stochastic set-up in the study of deterministic chaos. We also clarify a folklore concerning polynomial autoregression.
Due to the lack of development in the probabilistic and statistical aspects of clustering research, clustering procedures are often regarded as heuristics generating artificial clusters from a given set of sample data. In this paper, a clustering procedure that is useful for drawing statistical inference about the underlying population from a random sample is developed. It is based on the uniformly consistent kth nearest neighbour density estimate, and is applicable to both case-by-variable data matrices and case-by-case dissimilarity matrices. The proposed clustering procedure is shown to be asymptotically consistent for high-density clusters in several dimensions, and its small-sample behaviour is illustrated by an empirical example.
A relation is developed between Spearman's coefficient of rank correlation r<sub>s</sub> and the inversions in the two rankings. This leads to an expression for the mean value of r<sub>s</sub> in samples from a finite population, and to the improvement of Daniels' inequality relating r<sub>s</sub> and Kendall's coefficient t.
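Daniels' inequality referred to above states that −1 ≤ 3t − 2r_s ≤ 1 for any pair of rankings without ties. A small check, computing both coefficients from scratch on illustrative data:

```python
from itertools import combinations

def ranks(v):
    """Rank each value of v from 1 (smallest) to n (largest); no ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman's r_s via the classical 1 - 6*sum(d^2)/(n(n^2-1)) formula."""
    n = len(x)
    d2 = sum((a - b)**2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n**2 - 1))

def kendall(x, y):
    """Kendall's t as (concordant - discordant) / (n choose 2), no ties."""
    n = len(x)
    s = sum(1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
            for i, j in combinations(range(n), 2))
    return 2 * s / (n * (n - 1))

x = [3, 1, 4, 1.5, 9, 2.6, 5.3]
y = [2, 0, 8, 1.0, 7, 3.0, 6.0]
rs, t = spearman(x, y), kendall(x, y)
daniels = 3 * t - 2 * rs      # Daniels' bound: always in [-1, 1]
```

For these data rs = 6/7 and t = 5/7, so 3t − 2rs = 3/7, comfortably inside Daniels' bounds.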
Methods for studying the stability over time of regression relationships are considered. Recursive residuals, defined to be uncorrelated with zero means and constant variance, are introduced and tests based on the cusum and cusum of squares of recursive residuals are developed. Further techniques based on moving regressions, in which the regression model is fitted from a segment of data which is moved along the series, and on regression models whose coefficients are polynomials in time are studied. The Quandt log-likelihood ratio statistic is considered. Emphasis is placed on the use of graphical methods. The techniques proposed have been embodied in a comprehensive computer program, TIMVAR. Use of the techniques is illustrated by applying them to three sets of data.
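The recursive residuals can be sketched directly from their definition: w_r = (y_r − x_r'b_{r−1}) / √(1 + x_r'(X'_{r−1}X_{r−1})⁻¹x_r), where b_{r−1} is the least-squares fit to the first r−1 observations; under a stable relationship with normal errors they are iid N(0, σ²), and their cumulative sum gives the cusum path. The simulated design below is an illustrative assumption, not one of the paper's data sets.

```python
import numpy as np

def recursive_residuals(X, y):
    """Standardized recursive residuals: uncorrelated, zero-mean and of
    constant variance when the regression relationship is stable."""
    n, k = X.shape
    w = []
    for r in range(k, n):
        Xr, yr = X[:r], y[:r]
        G = np.linalg.inv(Xr.T @ Xr)          # (X'X)^{-1} on the first r obs
        b = G @ Xr.T @ yr                     # least-squares fit so far
        xr = X[r]
        w.append((y[r] - xr @ b) / np.sqrt(1 + xr @ G @ xr))
    return np.array(w)

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
y = 1.0 + 0.5 * X[:, 1] + rng.normal(size=n)  # stable relationship over time
w = recursive_residuals(X, y)
cusum = np.cumsum(w) / w.std(ddof=1)          # cusum path for the stability test
```

A structural break in the coefficients would show up as a systematic drift in the cusum path beyond its significance boundaries.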
Let F<sub>n</sub>(x) be the sample distribution function derived from a sample of independent uniform (0, 1) variables. The paper is mainly concerned with the orthogonal representation of the Cramér-von Mises statistic W<sup>2</sup><sub>n</sub> in the form Σ<sup>∞</sup><sub>j=1</sub> (jπ)<sup>-2</sup> z<sup>2</sup><sub>nj</sub> where the z<sub>nj</sub> are the principal components of $\sqrt n\{F_n(x) - x\}$ . It is shown that the z<sub>nj</sub> are identically distributed for each n and their significance points are tabulated. Their use for testing goodness of fit is discussed and their asymptotic powers are compared with those of W<sup>2</sup><sub>n</sub>, Anderson and Darling's statistic A<sup>2</sup><sub>n</sub> and Watson's U<sup>2</sup><sub>n</sub> against shifts of mean and variance in a normal distribution. The asymptotic significance points of the residual statistic W<sup>2</sup><sub>n</sub> - Σ<sup>p</sup><sub>j=1</sub> (jπ)<sup>-2</sup> z<sup>2</sup><sub>nj</sub> are also given for various p. It is shown that the components analogous to z<sub>nj</sub> for A<sup>2</sup><sub>n</sub> are the Legendre polynomial components introduced by Neyman as the basis for his "smooth" test of goodness of fit. The relationship of the components to a Fourier series analysis of F<sub>n</sub>(x) - x is discussed. An alternative set of components derived from Pyke's modification of the sample distribution function is considered. Tests based on the components z<sub>nj</sub> are applied to data on coal-mining disasters.
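For reference, the statistic itself (though not its component decomposition) has the standard computational form W²_n = 1/(12n) + Σᵢ ((2i−1)/(2n) − u₍ᵢ₎)², sketched below on illustrative uniform data.

```python
import random

def cramer_von_mises(u):
    """W^2_n for a sample u from Uniform(0,1), via the computational
    formula W^2_n = 1/(12n) + sum_i ((2i-1)/(2n) - u_(i))^2."""
    n = len(u)
    s = sorted(u)
    return 1 / (12 * n) + sum(((2 * i - 1) / (2 * n) - s[i - 1])**2
                              for i in range(1, n + 1))

random.seed(4)
u = [random.random() for _ in range(500)]
w2 = cramer_von_mises(u)    # small under H0 (the limit distribution has mean 1/6)
```

A sample placed exactly at the points (2i−1)/(2n) attains the minimum possible value 1/(12n), which is a convenient sanity check on any implementation.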
The paper introduces a class of linear growth curve models based on combinations of exponential dispersion models. The models have three components, namely a latent Markov growth process with stationary and independent increments, a noise component and a component describing variation between subjects. The models allow a wide range of continuous, discrete and mixed distributions. Estimation for the models is based on an estimating equation derived from an EM-like algorithm combined with Kalman smoothing.
Thesis (M.A. in Statistics)--University of California, June 1967. Bibliography: l. 25.
We consider the problem of testing a given open‐loop system for time‐dependence. The test is based on just a single realization of the input and a single realization of the output. It makes use of the concept of the “evolutionary cross‐spectra” of a non‐stationary vector process and rests essentially on testing the “uniformity” of a set of vectors, whose components consist of the “evolutionary gain‐spectra” and “evolutionary phase‐spectra”. Using a logarithmic transformation on the evolutionary gain‐spectra, we show that the mechanics of the test are formally equivalent to a two‐factor multivariate analysis of variance (MANOVA) procedure. Numerical illustrations of the proposed test procedure are included.
This paper addresses the issue of constructing large sample G-optimal designs when the variability of the response varies across a compact design space. A useful characterization theorem is presented along with a computer algorithm for generating (heteroscedastic) G-optimal designs. To facilitate comparisons between D- and G-optimal designs, C. L. Atwood’s inequality [Ann. Math. Stat. 40, 1570-1602 (1969; Zbl 0182.519)] for comparing D- and G-efficiencies in homoscedastic models is generalized to heteroscedastic models. Some robustness properties of these designs are presented.
There are many practical problems where the observed data are not drawn directly from the density g of real interest, but rather from another distribution derived from g by the application of an integral operator. The estimation of g then entails both statistical and numerical difficulties. A natural statistical approach is by maximum likelihood, conveniently implemented using the EM algorithm, but this provides unsatisfactory reconstructions of g. In this paper, we modify the maximum likelihood-EM approach by introducing a simple smoothing step at each EM iteration. In our experience, this algorithm converges in relatively few iterations to good estimates of g that do not depend on the choice of starting configuration. Some theoretical background is given that relates this smoothed EM algorithm to a maximum penalized likelihood approach. Two applications are considered in detail. The first is the classical stereology problem of determining particle size distributions from data collected on a plane section through a composite medium. The second concerns the recovery of the structure of a section of the human body from external observations obtained by positron emission tomography; for this problem, we also suggest several technical improvements on existing methodology.
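The smoothed EM (EMS) iteration described above can be specialized to a toy one-dimensional deconvolution with Poisson counts; the blurring kernel, grid and three-point smoother below are illustrative assumptions, not the paper's stereology or tomography settings.

```python
import numpy as np

# EMS sketch: a true density g on a grid is observed only through a
# blurring operator A; each multiplicative EM step for the Poisson model
# is followed by a simple three-point smoothing pass (the "S" step that
# distinguishes EMS from plain EM).
rng = np.random.default_rng(5)
m = 60
grid = np.linspace(0, 1, m)
g_true = np.exp(-0.5 * ((grid - 0.4) / 0.08)**2)
g_true /= g_true.sum()

# A[i, j] = P(observed in bin i | true bin j): Gaussian blur, columns sum to 1
A = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.05)**2)
A /= A.sum(axis=0, keepdims=True)

y = rng.poisson(5000 * A @ g_true).astype(float)   # blurred, noisy counts

g = np.full(m, 1.0 / m)                            # flat starting configuration
for _ in range(200):
    fitted = A @ g
    g = g * (A.T @ (y / np.maximum(fitted, 1e-12)))     # E/M step
    g = np.convolve(g, [0.25, 0.5, 0.25], mode="same")  # smoothing step
    g /= g.sum()
g_ems = g                                          # reconstructed density
```

Without the smoothing step the same iteration is the classical EM for this indirect-observation model, which tends to produce the spiky reconstructions the paper criticizes; the S-step stabilizes the estimate.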
We consider the estimation of the coefficients in a general linear regression model in which some of the explanatory variables are lagged values of the dependent variable. For discussing optimum properties the concept of best unbiased linear estimating equations is developed. It is shown that when the errors are normally distributed the method of least squares leads to optimum estimates. The properties of the least-squares estimates are shown to be the same asymptotically as those of the least-squares coefficients of ordinary regression models containing no lagged variables, whether or not the errors are normally distributed. Finally, a method of estimation is proposed for a different model which has no lagged dependent variables but in which the errors have an autoregressive structure. The method is shown to be efficient in large samples.
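The simplest member of this model class, with one lagged dependent variable and one exogenous regressor, can be fitted exactly as the abstract describes: ordinary least squares treating the lagged value as just another column. This sketch assumes the model y_t = a·y_{t-1} + b·x_t + c + e_t; the function name is illustrative.

```python
import numpy as np

def lagged_least_squares(y, x):
    """Least-squares fit of y_t = a*y_{t-1} + b*x_t + c + e_t.
    The abstract's point is that this estimator behaves asymptotically
    like least squares in a regression with no lagged variables."""
    Z = np.column_stack([y[:-1], x[1:], np.ones(len(y) - 1)])
    coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    return coef  # estimates of (a, b, c)
```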
This paper describes a method for choosing a natural conjugate prior distribution for a normal linear sampling model. A person using the method to quantify his/her opinions performs specified elicitation tasks. The hyperparameters of the conjugate distribution are estimated from the elicited values. The method is designed to require elicitation tasks that people can perform competently and introduces a type of task not previously reported. A property of the method is that the assessed variance matrices are certain to be positive definite. The method is sufficiently simple to implement with an interactive computer program on a microcomputer.
It has been asserted in the literature that the low pass filtering of time series data may lead to erroneous results when calculating attractor dimensions. Here we prove that finite order, non-recursive filters do not have this effect. In fact, a generic, finite order, non-recursive filter leaves invariant all the quantities that can be estimated by using embedding techniques such as the method of delays.
[Table captions: values of Cook's D for multiple cases and F values for the international phone calls data (the notation {A|B} means that set B is deleted from the data); the two largest eigenvalues, their eigenvectors, and the univariate Cook's D.]
This paper presents a new method to identify influential subsets in linear regression problems. The procedure uses the eigenstructure of an influence matrix, defined as the matrix of uncentred covariances of the effects on the whole data set of deleting each observation, normalized so that its diagonal contains the univariate Cook's statistics. It is shown that points in an influential subset appear with large weight in at least one of the eigenvectors linked to the largest eigenvalues of this influence matrix. The method is illustrated with several well-known examples from the literature, and in all of them it succeeds in identifying the relevant influential subsets.
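A hedged sketch of the construction: column i of the matrix T below is the change in the whole fitted vector when observation i is deleted, and the cross-product matrix T'T, scaled by p·s², has the univariate Cook's statistics on its diagonal. The flagging rule (inspect the eigenvectors of the k largest eigenvalues) follows the abstract; the weight threshold used here is an illustrative choice, not the paper's.

```python
import numpy as np

def influence_matrix(X, y):
    """Influence matrix whose diagonal equals the univariate Cook's D."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    e = y - H @ y                           # residuals
    h = np.diag(H)
    s2 = (e @ e) / (n - p)
    T = H * (e / (1.0 - h))                 # column i = deletion effect of obs i
    return (T.T @ T) / (p * s2)

def flag_influential(M, k=2, thresh=0.4):
    """Indices with large absolute weight in the eigenvectors of the
    k largest eigenvalues (thresh is an illustrative cutoff)."""
    _, V = np.linalg.eigh(M)
    lead = V[:, ::-1][:, :k]
    return np.unique(np.where(np.abs(lead) > thresh)[0])
```

A single gross outlier shows up as a dominant component of the leading eigenvector; jointly influential subsets, which masking can hide from single-case diagnostics, surface together in the same eigenvector.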
Statistical methods are used increasingly in theoretical chemistry. Applications range from the use of stochastic relaxation techniques in determining minimum energy molecular configurations to dynamical Monte Carlo simulations of molecular motion in liquids. This paper focuses on diffusion-controlled reactions in radiation chemistry. Here the interest is in describing the evolution of isolated clusters, containing a few chemically active particles, resulting from the passage of ionising radiation through a liquid. The subsequent chemistry is determined by the rate at which the particles can encounter each other, pairwise, in the course of random motion. Classically the trajectories of the particles are described as sample paths of a continuous stochastic process, which in many cases can be assumed to be a diffusion. We have devised an approximate theory based on the assumption that pair distances evolve independently. The effect of this geometric distortion has been studied for small systems by simulation. The extension of this work to a wider class of realistic chemical processes poses many problems. Analytical progress is difficult and methods of simulating large spatial systems are not well developed. Problems which remain to be solved relate to the random geometry of the cluster as its constituent particles diffuse, the development of good approximations to first passage time distributions for diffusions with inhomogeneous drift and methods for the analysis of random motion in a liquid on the basis of detailed non-Markovian models of molecular movement.
The problem considered is that of determining the parameters in a lagged regression of one stationary time series on another when the lag is unknown and is not an integral multiple of the time interval between observations. The residual in the regression is also taken to be a stationary time series. The method of estimation is based on the maximization with respect to the lag of a form of autocorrelation between the two series. However, the autocorrelation is defined via the Fourier transformed data which enables non‐integral lags to be considered and an optimal weighting of frequencies to be introduced. The estimation procedure's validity depends upon the satisfaction of an identification (aliasing) condition. Limit theorems are proved for the estimates and the method is extended to multivariate regressions. The method is applied to some economic and to some oceanographic data.
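The core idea of estimating a possibly non-integral lag through the Fourier-transformed data can be sketched as a grid search: for each candidate lag, rotate the cross-spectrum by the corresponding phase and sum. This sketch uses uniform frequency weighting, whereas the paper derives an optimal weighting; the function name and grid are illustrative.

```python
import numpy as np

def estimate_lag(x, y, taus):
    """Estimate a (possibly non-integral) lag by maximising, over a grid
    of candidate lags, a frequency-domain form of cross-correlation
    between the two series (uniform frequency weighting)."""
    n = len(x)
    X = np.fft.rfft(x - x.mean())
    Y = np.fft.rfft(y - y.mean())
    freqs = 2.0 * np.pi * np.fft.rfftfreq(n)
    # Shifting y back by tau in time multiplies Y by exp(+i*freq*tau),
    # so the real part of the rotated cross-spectrum peaks at the true lag.
    scores = [np.real(np.sum(np.conj(X) * Y * np.exp(1j * freqs * t)))
              for t in taus]
    return float(taus[int(np.argmax(scores))])
```

Because the criterion is defined over a continuum of candidate lags, nothing restricts the estimate to integer multiples of the sampling interval, which is the point of working with the transformed data.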