Statistical Science

Published by Institute of Mathematical Statistics
ISSN: 0883-4237
Publications
Article
Diverse analysis approaches have been proposed to distinguish data missing due to death from nonresponse, and to summarize trajectories of longitudinal data truncated by death. We demonstrate how these analysis approaches arise from factorizations of the distribution of longitudinal data and survival information. Models are illustrated using cognitive functioning data for older adults. Unconditional models assume either that deaths do not occur or that deaths are independent of the longitudinal response, or they average the unconditional longitudinal response over the survival distribution. Unconditional models, such as random effects models fit to unbalanced data, may implicitly impute data beyond the time of death. Fully conditional models stratify the longitudinal response trajectory by time of death. Fully conditional models are effective for describing individual trajectories, in terms of either aging (age, or years from baseline) or dying (years from death). Causal models (principal stratification) as currently applied are fully conditional models, since group differences at one timepoint are described for a cohort that will survive past a later timepoint. Partly conditional models summarize the longitudinal response in the dynamic cohort of survivors. Partly conditional models are serial cross-sectional snapshots of the response, reflecting the average response in survivors at a given timepoint rather than individual trajectories. Joint models of survival and longitudinal response describe the evolving health status of the entire cohort. Researchers using longitudinal data should consider which method of accommodating deaths is consistent with research aims, and use analysis methods accordingly.
 
Article
An early phase clinical trial is the first step in evaluating the effects in humans of a potential new anti-disease agent or combination of agents. Usually called "phase I" or "phase I/II" trials, these experiments typically have the nominal scientific goal of determining an acceptable dose, most often based on adverse event probabilities. This arose from a tradition of phase I trials to evaluate cytotoxic agents for treating cancer, although some methods may be applied in other medical settings, such as treatment of stroke or immunological diseases. Most modern statistical designs for early phase trials include model-based, outcome-adaptive decision rules that choose doses for successive patient cohorts based on data from previous patients in the trial. Such designs have seen limited use in clinical practice, however, due to their complexity, the requirement of intensive, computer-based data monitoring, and the medical community's resistance to change. Still, many actual applications of model-based outcome-adaptive designs have been remarkably successful in terms of both patient benefit and scientific outcome. In this paper, I will review several Bayesian early phase trial designs that were tailored to accommodate specific complexities of the treatment regime and patient outcomes in particular clinical settings.
 
Article
In 1951 Robbins and Monro published the seminal paper on stochastic approximation and made a specific reference to its application to the "estimation of a quantal using response, non-response data". Since the 1990s, statistical methodology for dose-finding studies has grown into an active area of research. The dose-finding problem is at its core a percentile estimation problem and is in line with what the Robbins-Monro method sets out to solve. In this light, it is quite surprising that the dose-finding literature has developed rather independently of the older stochastic approximation literature. The fact that stochastic approximation has seldom been used in actual clinical studies stands in stark contrast with its constant application in engineering and finance. In this article, I explore similarities and differences between the dose-finding and the stochastic approximation literatures. This review also sheds light on the present and future relevance of stochastic approximation to dose-finding clinical trials. Such connections will in turn steer dose-finding methodology on a rigorous course and extend its ability to handle increasingly complex clinical situations.
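To make the connection concrete, here is a minimal sketch of the classical Robbins-Monro recursion applied to the dose-finding (percentile estimation) problem: the dose is nudged down after a toxicity and up after a non-toxicity, with step sizes a_n = c/n. The dose-toxicity curve, target level and gain constant below are illustrative assumptions, not values from any cited trial.

```python
import numpy as np

rng = np.random.default_rng(0)

def tox_prob(dose):
    # Hypothetical dose-toxicity curve (logistic); purely illustrative.
    return 1.0 / (1.0 + np.exp(-(dose - 2.0)))

target = 0.3      # target toxicity probability (the percentile being estimated)
dose = 0.0        # starting dose
c = 5.0           # gain constant for the step sizes a_n = c / n

for n in range(1, 2001):
    y = rng.binomial(1, tox_prob(dose))   # binary response at the current dose
    dose -= (c / n) * (y - target)        # Robbins-Monro update

# The recursion converges to the dose at which tox_prob(dose) equals the target.
print(round(dose, 3), round(tox_prob(dose), 3))
```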
 
Article
Although prospective logistic regression is the standard method of analysis for case-control data, it has been recently noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg equilibrium (HWE), gene-gene and gene-environment independence. In this article, we review these modern methods and contrast them with the more classical approaches through two types of applications: (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights into existing methods by construction of various score-tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data.
 
Article
Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies, using part of the sample for an initial discovery of ``promising'' SNPs at a less stringent significance level and the remainder in a joint analysis of just these SNPs using custom genotyping. Typical cost savings of about 50% are possible with this design while maintaining comparable levels of overall type I error and power, by using about half the sample for stage I and carrying about 0.1% of SNPs forward to the second stage; the optimal design depends primarily upon the ratio of costs per genotype for stages I and II. However, with the rapidly declining costs of the commercial panels, the generally low observed odds ratios of current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects using a standard high-density panel. Concern is sometimes raised about the absence of a ``replication'' panel in this approach, as required by some high-profile journals, but it must be appreciated that the two-stage design is not a discovery/replication design but simply a more efficient design for discovery using a joint analysis of the data from both stages. Once a subset of highly significant associations has been discovered, a truly independent ``exact replication'' study of the same promising SNPs is needed in a similar population using similar methods.
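As a back-of-the-envelope illustration of the cost trade-off described above, the sketch below compares a one-stage design with a two-stage design that genotypes half the sample in stage I and carries 0.1% of SNPs to stage II; all sample sizes and per-genotype costs are hypothetical.

```python
# Hypothetical cost comparison of a one-stage versus a two-stage GWAS design.
# Sample size, SNP count and per-genotype costs are illustrative only.
n_total    = 10_000      # total subjects available
m_snps     = 500_000     # SNPs on the commercial panel
pi_sample  = 0.5         # fraction of subjects genotyped in stage I
pi_markers = 0.001       # fraction of SNPs carried forward to stage II
c1 = 0.001               # assumed cost per genotype, stage I (commercial panel)
c2 = 0.001               # assumed cost per genotype, stage II (custom genotyping)

one_stage = n_total * m_snps * c1
two_stage = (pi_sample * n_total * m_snps * c1 +
             (1 - pi_sample) * n_total * pi_markers * m_snps * c2)

print(f"one-stage cost : {one_stage:,.0f}")
print(f"two-stage cost : {two_stage:,.0f}")
print(f"savings        : {1 - two_stage / one_stage:.1%}")  # close to 50% when c2 is comparable to c1
```

When the stage II per-genotype cost rises relative to stage I, the savings shrink, which is the cost-ratio dependence the abstract mentions.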
 
Article
Replication helps ensure that a genotype-phenotype association observed in a genome-wide association (GWA) study represents a credible association and is not a chance finding or an artifact due to uncontrolled biases. We discuss prerequisites for exact replication; issues of heterogeneity; advantages and disadvantages of different methods of data synthesis across multiple studies; frequentist vs. Bayesian inferences for replication; and challenges that arise from multi-team collaborations. While consistent replication can greatly improve the credibility of a genotype-phenotype association, it may not eliminate spurious associations due to biases shared by many studies. Conversely, lack of replication in well-powered follow-up studies usually invalidates the initially proposed association, although occasionally it may point to differences in linkage disequilibrium or effect modifiers across studies.
 
Article
Genome-wide association studies, in which as many as a million single nucleotide polymorphisms (SNPs) are measured on several thousand samples, are quickly becoming a common type of study for identifying genetic factors associated with many phenotypes. There is a strong assumption that interactions between SNPs or genes and interactions between genes and environmental factors substantially contribute to the genetic risk of a disease. Identification of such interactions could potentially lead to increased understanding about disease mechanisms; drug × gene interactions could have profound applications for personalized medicine; strong interaction effects could be beneficial for risk prediction models. In this paper we provide an overview of different approaches to model interactions, emphasizing approaches that make specific use of the structure of genetic data, and those that make specific modeling assumptions that may (or may not) be reasonable to make. We conclude that to identify interactions it is often necessary to do some selection of SNPs, for example, based on prior hypotheses or marginal significance, but that to identify SNPs that are marginally associated with a disease it may also be useful to consider larger numbers of interactions.
 
Article
Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When unmeasured confounding introduces spatial structure into the residuals, regression models with spatial random effects and closely-related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias depends on the spatial scales of the covariate and the residual: one can reduce bias by fitting a spatial model only when there is variation in the covariate at a scale smaller than the scale of the unmeasured confounding. I also discuss how the scales of the residual and the covariate affect efficiency and uncertainty estimation when the residuals are independent of the covariate. In an application on the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect.
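The following one-dimensional simulation sketches the main point under stated assumptions: a smooth unmeasured confounder biases the naive regression, and adjusting for a smooth spatial trend recovers the coefficient because the covariate also varies at a finer scale than the confounding. The data-generating choices (sinusoidal confounder, polynomial trend adjustment) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
s = np.linspace(0.0, 1.0, n)                        # 1-D "spatial" locations

confounder = np.sin(2 * np.pi * s)                  # smooth, large-scale unmeasured confounder
x = 0.5 * confounder + rng.normal(scale=0.5, size=n)  # covariate: large-scale part + fine-scale variation
beta = 1.0
y = beta * x + 2.0 * confounder + rng.normal(scale=0.5, size=n)

# Naive regression of y on x: biased because the confounder is correlated with x.
X_naive = np.column_stack([np.ones(n), x])
beta_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0][1]

# Adjusting for a smooth spatial trend (low-order polynomial in s) absorbs the
# large-scale confounding; beta is then identified from the fine-scale variation in x.
trend = np.column_stack([s**k for k in range(1, 6)])
X_adj = np.column_stack([np.ones(n), x, trend])
beta_adj = np.linalg.lstsq(X_adj, y, rcond=None)[0][1]

print(f"true beta = {beta}, naive = {beta_naive:.2f}, spatially adjusted = {beta_adj:.2f}")
```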
 
Article
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labelled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative "big picture" depiction.
 
Article
This paper considers conducting inference about the effect of a treatment (or exposure) on an outcome of interest. In the ideal setting where treatment is assigned randomly, under certain assumptions the treatment effect is identifiable from the observable data and inference is straightforward. However, in other settings such as observational studies or randomized trials with noncompliance, the treatment effect is no longer identifiable without relying on untestable assumptions. Nonetheless, the observable data often do provide some information about the effect of treatment, that is, the parameter of interest is partially identifiable. Two approaches are often employed in this setting: (i) bounds are derived for the treatment effect under minimal assumptions, or (ii) additional untestable assumptions are invoked that render the treatment effect identifiable and then sensitivity analysis is conducted to assess how inference about the treatment effect changes as the untestable assumptions are varied. Approaches (i) and (ii) are considered in various settings, including assessing principal strata effects, direct and indirect effects and effects of time-varying exposures. Methods for drawing formal inference about partially identified parameters are also discussed.
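As one classical instance of approach (i), a bounded outcome yields worst-case bounds on the average treatment effect without any untestable assumptions. For Y in [0, 1] with treatment indicator T, these textbook bounds (not necessarily the specific bounds developed in the article) are

```latex
% Worst-case bounds for a bounded outcome Y in [0,1] with treatment indicator T:
E[Y(1)] \in \bigl[\, E[Y \mid T=1]\,P(T=1),\; E[Y \mid T=1]\,P(T=1) + P(T=0) \,\bigr],
\qquad
E[Y(0)] \in \bigl[\, E[Y \mid T=0]\,P(T=0),\; E[Y \mid T=0]\,P(T=0) + P(T=1) \,\bigr],
```

so the identification region for the average treatment effect always has width one; this is why approaches (i) and (ii) supplement such bounds with additional assumptions or sensitivity analyses.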
 
Article
When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970s, work on matching methods has examined how to best choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine, and political science. However, until now the literature and related advice have been scattered across disciplines. Researchers who are interested in using matching methods, or in developing methods related to matching, do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.
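For intuition, here is a minimal sketch of one of the simplest matching methods, 1:1 nearest-neighbor matching with replacement on the covariates, applied to a toy simulated observational data set; it is not meant to represent the full range of methods the review covers.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy observational data: treatment assignment depends on the covariates,
# so the raw treated/control comparison is confounded.
n = 1000
X = rng.normal(size=(n, 2))
p_treat = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] + 0.5 * X[:, 1])))
T = rng.binomial(1, p_treat)
Y = 2.0 * T + X[:, 0] + X[:, 1] + rng.normal(size=n)   # true treatment effect = 2

treated, control = X[T == 1], X[T == 0]
y_t, y_c = Y[T == 1], Y[T == 0]

# 1:1 nearest-neighbor matching with replacement, Euclidean distance on covariates.
dists = np.linalg.norm(treated[:, None, :] - control[None, :, :], axis=2)
match_idx = dists.argmin(axis=1)

naive = y_t.mean() - y_c.mean()
matched = (y_t - y_c[match_idx]).mean()
print(f"naive difference: {naive:.2f}, matched estimate: {matched:.2f} (truth 2.0)")
```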
 
Figure: Joint distribution of the log likelihood ratio test statistics (lod scores) given the observed data and given the complete data. With n_0 = 800 observed out of n = 1000, the ratio of the complete-data log LR to the observed-data log LR is expected to be n/n_0 = 1.25; each contour plot marks the line y = 1.25x (dotted) together with reference lines y = rx (gray, varying r) for the empirical ratio.
Figure: Estimated standard deviation of R_{I_y} - 1 plotted against the observed number of successes x_0, for sample sizes n = 100 and 1000, together with density curves of x_0 under different true p.
Article
Many practical studies rely on hypothesis testing procedures applied to data sets with missing information. An important part of the analysis is to determine the impact of the missing data on the performance of the test, and this can be done by properly quantifying the relative (to complete data) amount of available information. The problem is directly motivated by applications to studies, such as linkage analyses and haplotype-based association projects, designed to identify genetic contributions to complex diseases. In the genetic studies the relative information measures are needed for the experimental design, technology comparison, interpretation of the data, and for understanding the behavior of some of the inference tools. The central difficulties in constructing such information measures arise from the multiple, and sometimes conflicting, aims in practice. For large samples, we show that a satisfactory, likelihood-based general solution exists by using appropriate forms of the relative Kullback--Leibler information, and that the proposed measures are computationally inexpensive given the maximized likelihoods with the observed data. Two measures are introduced, under the null and alternative hypotheses respectively. We exemplify the measures on data from mapping studies of inflammatory bowel disease and diabetes. For small-sample problems, which appear rather frequently in practice and sometimes in disguised forms (e.g., measuring individual contributions to a large study), the robust Bayesian approach holds great promise, though the choice of a general-purpose "default prior" is a very challenging problem.
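The figure caption above illustrates the large-sample heuristic that, with n_0 of n planned observations available and the same estimated effect, the complete-data log likelihood ratio is roughly n/n_0 times the observed-data one. The sketch below verifies this scaling for a binomial test with the proportion held fixed; it is a toy calculation, not the paper's Kullback--Leibler-based measures.

```python
import numpy as np

def binom_loglr(x, n, p0):
    """Log likelihood ratio for H0: p = p0 versus the MLE p_hat = x / n."""
    p_hat = x / n
    ll = lambda p: x * np.log(p) + (n - x) * np.log(1 - p)
    return ll(p_hat) - ll(p0)

p0, p_hat = 0.5, 0.55
n0, n = 800, 1000                 # observed and complete sample sizes, as in the figure

lr_obs = binom_loglr(p_hat * n0, n0, p0)
lr_full = binom_loglr(p_hat * n, n, p0)
print(lr_full / lr_obs)           # exactly n / n0 = 1.25 when p_hat is held fixed
```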
 
Article
Indirect evidence is crucial for successful statistical practice. Sometimes, however, it is better used informally. Future efforts should be directed toward understanding better the connection between statistical methods and scientific problems.
 
Figure: Efficiency comparison of GP MCMC methods.
Figure: Cox GP model with large p, simulated data (n = 100, p = 1,000): average survivor function curve on the validation set (dashed line) compared with the Kaplan–Meier empirical estimate (solid line).
Figure: Ozone data results.
Figure: Boston housing data, GBM covariate analysis: the left-hand chart gives variable importances, normalized to sum to 100; the right-hand plot shows the partial association of x13 with the response.
Article
This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data. Our specific interest is in the analysis of data sets with predictors that have an a priori unknown form of possibly nonlinear associations with the response. The modeling approach we describe incorporates Gaussian processes in a generalized linear model framework to obtain a class of nonparametric regression models where the covariance matrix depends on the predictors. We consider, in particular, continuous, categorical and count responses. We also look into models that account for survival outcomes. We explore alternative covariance formulations for the Gaussian process prior and demonstrate the flexibility of the construction. Next, we focus on the important problem of selecting variables from the set of possible predictors and describe a general framework that employs mixture priors. We compare alternative MCMC strategies for posterior inference and achieve a computationally efficient and practical approach. We demonstrate performance on simulated and benchmark data sets.
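For readers unfamiliar with the building block, the sketch below shows plain Gaussian process regression with a squared-exponential covariance and fixed, illustrative hyperparameters; the article's generalized construction (exponential dispersion responses, survival outcomes, variable-selection priors) is not reproduced here.

```python
import numpy as np

def sq_exp_kernel(a, b, variance=1.0, lengthscale=0.3):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)   # unknown nonlinear signal
x_new = np.linspace(0, 1, 100)

noise_var = 0.2**2
K = sq_exp_kernel(x, x) + noise_var * np.eye(x.size)
K_star = sq_exp_kernel(x_new, x)

# Posterior mean and variance of the latent function at the new inputs.
alpha = np.linalg.solve(K, y)
post_mean = K_star @ alpha
post_var = np.diag(sq_exp_kernel(x_new, x_new) - K_star @ np.linalg.solve(K, K_star.T))
print(post_mean[:5].round(2), post_var[:5].round(3))
```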
 
Article
During the last twenty years there have been considerable methodological developments in the design and analysis of Phase 1, Phase 2 and Phase 1/2 dose-finding studies. Many of these developments are related to the continual reassessment method (CRM), first introduced by O'Quigley, Pepe and Fisher (1990). CRM models have proven themselves to be of practical use and, in this discussion, we investigate the basic approach, some connections to other methods, some generalizations, as well as further applications of the model. We obtain some new results which can provide guidance in practice.
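A minimal sketch of one CRM cycle follows, assuming the common one-parameter power ("empiric") model p_d = skeleton_d^exp(a) with a normal prior on a and posterior computation on a grid; the skeleton, prior standard deviation and toy outcomes are illustrative choices, not recommendations.

```python
import numpy as np

skeleton = np.array([0.05, 0.10, 0.20, 0.35, 0.50])  # working prior guesses of toxicity by dose
target = 0.25                                         # target toxicity probability
prior_sd = 1.34                                       # illustrative prior SD for a ~ N(0, sd^2)

# Toy accumulated data: dose levels given (0-indexed) and binary toxicity outcomes.
doses_given = np.array([0, 0, 1, 1, 2, 2])
tox         = np.array([0, 0, 0, 1, 0, 1])

# Posterior over a on a grid, with p_d(a) = skeleton_d ** exp(a).
a_grid = np.linspace(-4, 4, 2001)
p = skeleton[doses_given][:, None] ** np.exp(a_grid)[None, :]
loglik = (tox[:, None] * np.log(p) + (1 - tox)[:, None] * np.log(1 - p)).sum(axis=0)
log_post = loglik - 0.5 * (a_grid / prior_sd) ** 2
post = np.exp(log_post - log_post.max())
post /= post.sum()                                    # discrete posterior weights on the grid

# Posterior mean toxicity at each dose; recommend the dose closest to the target.
post_tox = np.array([(skeleton[d] ** np.exp(a_grid) * post).sum() for d in range(len(skeleton))])
next_dose = int(np.abs(post_tox - target).argmin())
print(post_tox.round(3), "next dose level:", next_dose)
```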
 
Article
We review the class of species sampling models (SSM). In particular, we investigate the relation between the exchangeable partition probability function (EPPF) and the predictive probability function (PPF). It is straightforward to define a PPF from an EPPF, but the converse is not necessarily true. In this paper we introduce the notion of putative PPFs and show novel conditions for a putative PPF to define an EPPF. We show that all possible PPFs in a certain class have to define (unnormalized) probabilities for cluster membership that are linear in cluster size. We give a new necessary and sufficient condition for arbitrary putative PPFs to define an EPPF. Finally, we show posterior inference for a large class of SSMs with a PPF that is not linear in cluster size and discuss a numerical method to derive its PPF.
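The best-known PPF that is linear in cluster size is the Chinese restaurant process (Dirichlet process) predictive rule, sketched below with an illustrative concentration parameter.

```python
import numpy as np

rng = np.random.default_rng(4)

def crp_sample(n, alpha=1.0):
    """Sample a random partition of n items from the Chinese restaurant process,
    whose PPF assigns the next item to cluster j with probability proportional to
    the cluster size n_j, and to a new cluster with probability proportional to alpha."""
    assignments = [0]
    sizes = [1]
    for _ in range(1, n):
        probs = np.array(sizes + [alpha], dtype=float)
        probs /= probs.sum()
        j = rng.choice(len(probs), p=probs)
        if j == len(sizes):
            sizes.append(1)        # open a new cluster
        else:
            sizes[j] += 1
        assignments.append(j)
    return assignments, sizes

labels, sizes = crp_sample(50, alpha=2.0)
print("number of clusters:", len(sizes), "sizes:", sizes)
```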
 
Article
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome-wide association studies. We also highlight some issues that require further study.
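At the computational core of many group LASSO algorithms is a group-wise soft-thresholding (proximal) step, sketched below; the grouping and penalty level are illustrative, and the usual group-size weighting of the penalty is omitted for simplicity.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the group LASSO penalty lam * sum_g ||beta_g||_2:
    each group's coefficient block is shrunk toward zero and set exactly to zero
    when its norm falls at or below lam."""
    out = beta.copy()
    for g in groups:
        norm = np.linalg.norm(beta[g])
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * beta[g]
    return out

beta = np.array([0.1, -0.2, 3.0, -2.0, 0.05])
groups = [[0, 1], [2, 3], [4]]       # hypothetical grouping of the coefficients
print(group_soft_threshold(beta, groups, lam=0.5))
# The first and third groups are zeroed out entirely; the second is shrunk as a block.
```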
 
Figure: Approximation of a face image by rank-49 NNMF (coefficients × basis images = approximate image).
Article
This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100-fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on board.
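As a concrete example of the kind of algorithm that maps well to a GPU, the sketch below runs the standard multiplicative (MM-type) updates for nonnegative matrix factorization under squared error, on a CPU with random data standing in for the face images; a rank-49 factorization echoes the figure caption above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in data matrix (e.g., vectorized images in the columns); random nonnegative entries here.
V = rng.random((1024, 200))
r = 49
W = rng.random((V.shape[0], r))
H = rng.random((r, V.shape[1]))
eps = 1e-10

for _ in range(200):
    # Multiplicative updates for || V - W H ||_F^2; each entrywise update touches
    # limited data, which is what makes this style of algorithm GPU-friendly.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```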
 
Article
This review article provides an overview of recent work in the modelling and analysis of recurrent events arising in engineering, reliability, public health, biomedical, and other areas. Recurrent event modelling possesses unique facets making it different and more difficult to handle than single event settings. For instance, the impact of an increasing number of event occurrences needs to be taken into account, the effects of covariates should be considered, potential association among the inter-event times within a unit cannot be ignored, and the effects of performed interventions after each event occurrence need to be factored in. A recent general class of models for recurrent events which simultaneously accommodates these aspects is described. Statistical inference methods for this class of models are presented and illustrated through applications to real data sets. Some existing open research problems are described.
 
Article
The pretest-posttest study is commonplace in numerous applications. Typically, subjects are randomized to two treatments, and response is measured at baseline, prior to intervention with the randomized treatment (pretest), and at a prespecified follow-up time (posttest). Interest focuses on the effect of treatments on the change between mean baseline and follow-up response. Missing posttest response for some subjects is routine, and disregarding missing cases can lead to invalid inference. Despite the popularity of this design, a consensus on an appropriate analysis when no data are missing, let alone for taking into account missing follow-up, does not exist. Under a semiparametric perspective on the pretest-posttest model, in which limited distributional assumptions on pretest or posttest response are made, we show how the theory of Robins, Rotnitzky and Zhao may be used to characterize a class of consistent treatment effect estimators and to identify the efficient estimator in the class. We then describe how the theoretical results translate into practice. The development not only shows how a unified framework for inference in this setting emerges from the Robins, Rotnitzky and Zhao theory, but also provides a review and demonstration of the key aspects of this theory in a familiar context. The results are also relevant to the problem of comparing two treatment means with adjustment for baseline covariates.
 
Article
Randomized clinical trials can present a scientific/ethical dilemma for clinical investigators. Statisticians have tended to focus on only one side of this dilemma, emphasizing the statistical and scientific advantages of randomized trials. Here we look at the other side, examining the personal care principle on which the physician-patient relationship is based and observing how that principle can make it difficult or impossible for a physician to participate in a randomized clinical study. We urge that the view that randomized clinical trials are the only scientifically valid means of resolving controversies about therapies is mistaken, and we suggest that a faulty statistical principle is partly to blame for this misconception. We conclude that statisticians should be more sensitive to the physician's responsibility to the individual patient and should, besides promoting randomized trials when they are ethically and practically feasible, work to improve the planning, execution, and analysis of nonrandomized clinical studies.
 
Article
Identifying the risk factors for mental illnesses is of significant public health importance. Diagnosis, stigma associated with mental illnesses, comorbidity, and complex etiologies, among other factors, make it very challenging to study mental disorders. Genetic studies of mental illnesses date back at least a century, beginning with descriptive studies based on Mendelian laws of inheritance. A variety of study designs including twin studies, family studies, linkage analysis, and more recently, genomewide association studies have been employed to study the genetics of mental illnesses, or complex diseases in general. In this paper, I will present the challenges and methods from a statistical perspective and focus on genetic association studies.
 
Article
Genetic investigations often involve the testing of vast numbers of related hypotheses simultaneously. To control the overall error rate, a substantial penalty is required, making it difficult to detect signals of moderate strength. To improve the power in this setting, a number of authors have considered using weighted p-values, with the motivation often based upon the scientific plausibility of the hypotheses. We review this literature, derive optimal weights and show that the power is remarkably robust to misspecification of these weights. We consider two methods for choosing weights in practice. The first, external weighting, is based on prior information. The second, estimated weighting, uses the data to choose weights.
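A minimal sketch of the weighted Bonferroni rule that motivates weighted p-values: hypothesis i is rejected when p_i <= w_i * alpha / m, with nonnegative weights averaging to one so that the familywise error rate remains at most alpha. The p-values and weights below are illustrative.

```python
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject H_i when p_i <= w_i * alpha / m, with nonnegative weights that
    average to one, so the thresholds sum to alpha and the familywise error
    rate is controlled at level alpha by the union bound."""
    pvals, weights = np.asarray(pvals, float), np.asarray(weights, float)
    m = len(pvals)
    weights = weights * m / weights.sum()      # renormalize to mean one
    return pvals <= weights * alpha / m

pvals = [1e-6, 0.004, 0.03, 0.2]               # illustrative p-values
weights = [4.0, 2.0, 1.0, 0.5]                 # larger weight = more plausible a priori
print(weighted_bonferroni(pvals, weights))
```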
 
Article
Familiar statistical tests and estimates are obtained by the direct observation of cases of interest: a clinical trial of a new drug, for instance, will compare the drug's effects on a relevant set of patients and controls. Sometimes, though, indirect evidence may be temptingly available, perhaps the results of previous trials on closely related drugs. Very roughly speaking, the difference between direct and indirect statistical evidence marks the boundary between frequentist and Bayesian thinking. Twentieth-century statistical practice focused heavily on direct evidence, on the grounds of superior objectivity. Now, however, new scientific devices such as microarrays routinely produce enormous data sets involving thousands of related situations, where indirect evidence seems too important to ignore. Empirical Bayes methodology offers an attractive direct/indirect compromise. There is already some evidence of a shift toward a less rigid standard of statistical objectivity that allows better use of indirect evidence. This article is basically the text of a recent talk featuring some examples from current practice, with a little bit of futuristic speculation.
 
Article
Causal inference with interference is a rapidly growing area. The literature has begun to relax the "no-interference" assumption that the treatment received by one individual does not affect the outcomes of other individuals. In this paper we briefly review the literature on causal inference in the presence of interference when treatments have been randomized. We then consider settings in which causal effects in the presence of interference are not identified, either because randomization alone does not suffice for identification, or because treatment is not randomized and there may be unmeasured confounders of the treatment-outcome relationship. We develop sensitivity analysis techniques for these settings. We describe several sensitivity analysis techniques for the infectiousness effect which, in a vaccine trial, captures the effect of one person's vaccination on protecting a second person from infection even if the first is infected. We also develop two sensitivity analysis techniques for causal effects in the presence of unmeasured confounding which generalize analogous techniques when interference is absent. These two techniques for unmeasured confounding are compared and contrasted.
 
Article
Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single-end and paired-end RNA-Seq data as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator of the model is provided. Further, it is shown that using paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at a fixed sequencing depth. Simulation studies are also given.
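A stripped-down version of the underlying estimation problem is the multinomial mixture below: each read (class) is compatible with a subset of isoforms, and EM apportions reads among them. The compatibility pattern, counts and lengths are hypothetical, and the article's full model (paired ends, positional sampling bias) is not reproduced.

```python
import numpy as np

# Rows: read classes; columns: isoforms. A[i, k] = 1 if reads in class i are
# compatible with isoform k (a hypothetical pattern for three isoforms).
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
counts = np.array([120.0, 300.0, 200.0, 80.0])   # observed reads in each class
lengths = np.array([1500.0, 2000.0, 1000.0])     # effective isoform lengths

# theta_k = probability that a random read originates from isoform k.
theta = np.full(3, 1.0 / 3.0)
for _ in range(500):
    # E-step: fractionally assign each read class to its compatible isoforms.
    rates = A * theta
    resp = rates / rates.sum(axis=1, keepdims=True)
    # M-step: re-estimate read-origin probabilities from expected counts.
    theta = (resp * counts[:, None]).sum(axis=0) / counts.sum()

# Convert read-origin proportions to relative transcript abundances by length.
abundance = (theta / lengths) / (theta / lengths).sum()
print(theta.round(3), abundance.round(3))
```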
 
Article
Probabilistic and statistical models for the occurrence of a recurrent event over time are described. These models have applicability in the reliability, engineering, biomedical and other areas where a series of events occurs for an experimental unit as time progresses. Nonparametric inference methods, in particular, the estimation of a relevant distribution function, are described.
 
Figure: Scatterplots of relative RSSE versus the condition number of the matrix X^T W X.
Article
Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three classes, which are not mutually exclusive: best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
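For orientation, a minimal rejection-ABC sketch for a toy normal-mean problem follows, with a two-dimensional summary statistic and an illustrative tolerance; the dimension-reduction methods reviewed in the article operate on much richer candidate summary vectors than this.

```python
import numpy as np

rng = np.random.default_rng(6)

# "Observed" data from an unknown normal mean (known sd = 1).
obs = rng.normal(loc=2.0, scale=1.0, size=50)
s_obs = np.array([obs.mean(), obs.std()])        # low-dimensional summary statistics

n_sim, eps = 100_000, 0.1
theta = rng.uniform(-10, 10, size=n_sim)         # draws from a flat prior on the mean
sims = rng.normal(loc=theta[:, None], scale=1.0, size=(n_sim, 50))
s_sim = np.column_stack([sims.mean(axis=1), sims.std(axis=1)])

# Keep parameter draws whose simulated summaries fall within eps of the observed ones.
dist = np.linalg.norm(s_sim - s_obs, axis=1)
posterior_sample = theta[dist < eps]
print(posterior_sample.size, posterior_sample.mean().round(2))
```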
 
Article
In 1922 R. A. Fisher introduced the method of maximum likelihood, having first presented the numerical procedure in 1912. This paper considers Fisher's changing justifications for the method, the concepts he developed around it (including likelihood, sufficiency, efficiency and information) and the approaches he discarded (including inverse probability).
 
Article
William Kruskal (Bill) was a distinguished statistician who spent virtually his entire professional career at the University of Chicago, and who had a lasting impact on the Institute of Mathematical Statistics and on the field of statistics more broadly, as well as on many who came in contact with him. Bill passed away last April following an extended illness, and on May 19, 2005, the University of Chicago held a memorial service at which several of Bill's colleagues and collaborators spoke along with members of his family and other friends. This biography and the accompanying commentaries derive in part from brief presentations on that occasion, along with recollections and input from several others. Bill was known personally to most of an older generation of statisticians as an editor and as an intellectual and professional leader. In 1994, Statistical Science published an interview by Sandy Zabell (Vol. 9, 285--303) in which Bill looked back on selected events in his professional life. One of the purposes of the present biography and accompanying commentaries is to reintroduce him to old friends and to introduce him for the first time to new generations of statisticians who never had an opportunity to interact with him and to fall under his influence.
 
Article
Howard Raiffa earned his bachelor's degree in mathematics, his master's degree in statistics and his Ph.D. in mathematics at the University of Michigan. Since 1957, Raiffa has been a member of the faculty at Harvard University, where he is now the Frank P. Ramsey Chair in Managerial Economics (Emeritus) in the Graduate School of Business Administration and the Kennedy School of Government. A pioneer in the creation of the field known as decision analysis, his research interests span statistical decision theory, game theory, behavioral decision theory, risk analysis and negotiation analysis. Raiffa has supervised more than 90 doctoral dissertations and written 11 books. His new book is Negotiation Analysis: The Science and Art of Collaborative Decision Making. Another book, Smart Choices, co-authored with his former doctoral students John Hammond and Ralph Keeney, was the CPR (formerly known as the Center for Public Resources) Institute for Dispute Resolution Book of the Year in 1998. Raiffa helped to create the International Institute for Applied Systems Analysis and he later became its first Director, serving in that capacity from 1972 to 1975. His many honors and awards include the Distinguished Contribution Award from the Society of Risk Analysis; the Frank P. Ramsey Medal for outstanding contributions to the field of decision analysis from the Operations Research Society of America; and the Melamed Prize from the University of Chicago Business School for The Art and Science of Negotiation. He earned a Gold Medal from the International Association for Conflict Management and a Lifetime Achievement Award from the CPR Institute for Dispute Resolution. He holds honorary doctor's degrees from Carnegie Mellon University, the University of Michigan, Northwestern University, Ben Gurion University of the Negev and Harvard University. The latter was awarded in 2002.
 
Article
Discussion of ``The William Kruskal Legacy: 1919--2005'' by Stephen E. Fienberg, Stephen M. Stigler and Judith M. Tanur [arXiv:0710.5063]
 
Article
It was known from Metropolis et al. [J. Chem. Phys. 21 (1953) 1087--1092] that one can sample from a distribution by performing Monte Carlo simulation from a Markov chain whose equilibrium distribution is equal to the target distribution. However, it took several decades before the statistical community embraced Markov chain Monte Carlo (MCMC) as a general computational tool in Bayesian inference. The usual reasons that are advanced to explain why statisticians were slow to catch on to the method include lack of computing power and unfamiliarity with the early dynamic Monte Carlo papers in the statistical physics literature. We argue that there was a deeper reason, namely, that the structure of problems in statistical mechanics differs from that of problems in the standard statistical literature. To make the methods usable in standard Bayesian problems, one had to exploit the power that comes from the introduction of judiciously chosen auxiliary variables and collective moves. This paper examines the development in the critical period 1980--1990, when the ideas of Markov chain simulation from the statistical physics literature and the latent variable formulation in maximum likelihood computation (i.e., the EM algorithm) came together to spark the widespread application of MCMC methods in Bayesian computation.
 
Article
We present a simulation-based study in which the results of two major exit polls conducted during the recall referendum that took place in Venezuela on August 15, 2004, are compared to the official results of the Venezuelan National Electoral Council "Consejo Nacional Electoral" (CNE). The two exit polls considered here were conducted independently by Súmate, a nongovernmental organization, and Primero Justicia, a political party. We find significant discrepancies between the exit poll data and the official CNE results in about 60% of the voting centers that were sampled in these polls. We show that discrepancies between exit polls and official results are not due to a biased selection of the voting centers or to problems related to the size of the samples taken at each center. We found discrepancies in all the states where the polls were conducted. We do not have enough information on the exit poll data to determine whether the observed discrepancies are the consequence of systematic biases in the selection of the people interviewed by the pollsters around the country. Neither do we have information to study the possibility of a high number of false respondents or nonrespondents. We have limited data suggesting that the discrepancies are not due to a drastic change in the voting patterns that occurred after the exit polls were conducted. We notice that the two exit polls were done independently and had few centers in common, yet their overall results were very similar.
 
Article
Statistical comparisons of electoral variables are made between groups of electronic voting machines and voting centers classified by types of transmissions according to the volume of traffic in incoming and outgoing data of machines from and toward the National Electoral Council (CNE) totalizing servers. One unexpectedly finds two types of behavior in wire telephony data transmissions and only one type where cellular telephony is employed, contravening any reasonable electoral norm. Differentiation in data transmissions arises when comparing the number of incoming and outgoing data bytes per machine against the total number of votes per machine reported officially by the CNE. The respective distributions of electoral variables for each type of transmission show that the groups so classified do not correspond to random sets of the electoral universe. In particular, the distributions for the NO percentage of votes per machine differ statistically across groups. The presidential elections of 1998, 2000 and the 2004 Presidential Recall Referendum (2004 PRR) are compared according to the type of transmission in the 2004 PRR. Statistically, the difference between the empirical distributions of the 2004 PRR NO results and the 2000 Chavez vote results by voting centers is not significant.
 
Article
A referendum to recall President Hugo Chávez was held in Venezuela in August of 2004. In the referendum, voters were to vote YES if they wished to recall the President and NO if they wanted him to continue in office. The official results were 59% NO and 41% YES. Even though the election was monitored by various international groups including the Organization of American States and the Carter Center (both of which declared that the referendum had been conducted in a free and transparent manner), the outcome of the election was questioned by other groups both inside and outside of Venezuela. The collection of manuscripts that comprise this issue of Statistical Science discusses the general topic of election forensics but also focuses on different statistical approaches to explore, post-election, whether irregularities in the voting, vote transmission or vote counting processes could be detected in the 2004 presidential recall referendum. In this introduction to the Venezuela issue, we discuss the more recent literature on post-election auditing, describe the institutional context for the 2004 Venezuelan referendum, and briefly introduce each of the five contributions.
 
Article
On August 15th, 2004, Venezuelans had the opportunity to vote in a Presidential Recall Referendum to decide whether or not President Hugo Chávez should be removed from office. The process was largely computerized using a touch-screen system. In general the ballots were not manually counted. The significance of the high linear correlation (0.99) between the number of signatures requesting the recall and the number of opposition votes in computerized centers is analyzed. The same-day audit was found to be not only ineffective but a source of suspicion. Official results were compared with the 1998 presidential election and other electoral events, and distortions were found.
 
Article
Jerzy Neyman's life history and some of his contributions to applied statistics are reviewed. In a 1960 article he wrote: ``Currently in the period of dynamic indeterminism in science, there is hardly a serious piece of research which, if treated realistically, does not involve operations on stochastic processes. The time has arrived for the theory of stochastic processes to become an item of usual equipment of every applied statistician.'' The emphasis in this article is on stochastic processes and on stochastic process data analysis. A number of data sets and corresponding substantive questions are addressed. The data sets concern sardine depletion, blowfly dynamics, weather modification, elk movement and seal journeying. Three of the examples are from Neyman's work and four from the author's joint work with collaborators.
 
Article
Rejoinder to ``The 2005 Neyman Lecture: Dynamic Indeterminism in Science'' [arXiv:0808.0620]
 
Article
Comment on ``The 2005 Neyman Lecture: Dynamic Indeterminism in Science'' [arXiv:0808.0620]
 
Article
Comment on ``The 2005 Neyman Lecture: Dynamic Indeterminism in Science'' [arXiv:0808.0620]
 
Article
We review basic modeling approaches for failure and maintenance data from repairable systems. In particular we consider imperfect repair models, defined in terms of virtual age processes, and the trend-renewal process which extends the nonhomogeneous Poisson process and the renewal process. In the case where several systems of the same kind are observed, we show how observed covariates and unobserved heterogeneity can be included in the models. We also consider various approaches to trend testing. Modern reliability data bases usually contain information on the type of failure, the type of maintenance and so forth in addition to the failure times themselves. Basing our work on recent literature we present a framework where the observed events are modeled as marked point processes, with marks labeling the types of events. Throughout the paper the emphasis is more on modeling than on statistical inference.
 
Article
Response to "Discussion of "Search for the Wreckage of Air France Flight AF 447" by by Lawrence D. Stone, Colleen M. Keller, Thomas M. Kratzke, Johan P. Strumpfer [arXiv:1405.4720]" by A. H. Welsh [arXiv:1405.4991].
 
Article
In the early morning hours of June 1, 2009, during a flight from Rio de Janeiro to Paris, Air France Flight AF 447 disappeared during stormy weather over a remote part of the Atlantic carrying 228 passengers and crew to their deaths. After two years of unsuccessful search, the authors were asked by the French Bureau d'Enquêtes et d'Analyses pour la sécurité de l'aviation to develop a probability distribution for the location of the wreckage that accounted for all information about the crash location as well as for previous search efforts. We used a Bayesian procedure developed for search planning to produce the posterior target location distribution. This distribution was used to guide the search in the third year, and the wreckage was found within one week of undersea search. In this paper we discuss why Bayesian analysis is ideally suited to solving this problem, review previous non-Bayesian efforts, and describe the methodology used to produce the posterior probability distribution for the location of the wreck.
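The core Bayesian search-planning update is simple: after an unsuccessful search, each cell's prior probability is multiplied by its probability of non-detection and the result is renormalized. The grid, priors and detection probabilities below are illustrative, not the AF 447 values.

```python
import numpy as np

# Prior probability that the target lies in each grid cell (illustrative 4-cell grid).
prior = np.array([0.40, 0.30, 0.20, 0.10])

# Probability of detecting the target if it is in the cell and the cell is searched.
detect = np.array([0.90, 0.80, 0.00, 0.00])   # only the first two cells were searched

# Posterior after an unsuccessful search: weight each cell by its non-detection probability.
posterior = prior * (1.0 - detect)
posterior /= posterior.sum()
print(posterior.round(3))   # mass shifts toward the unsearched (and poorly searched) cells
```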
 
Article
Donald (Don) Arthur Berry, born May 26, 1940 in Southbridge, Massachusetts, earned his A.B. degree in mathematics from Dartmouth College and his M.A. and Ph.D. in statistics from Yale University. He served first on the faculty at the University of Minnesota and subsequently held endowed chair positions at Duke University and The University of Texas M.D. Anderson Cancer Center. At the time of the interview he served as Head of the Division of Quantitative Sciences, and Chairman and Professor of the Department of Biostatistics at UT M.D. Anderson Cancer Center.
 
Article
November 27, 2004, marked the 250th anniversary of the death of Abraham De Moivre, best known in statistical circles for his famous large-sample approximation to the binomial distribution, whose generalization is now referred to as the Central Limit Theorem. De Moivre was one of the great pioneers of classical probability theory. He also made seminal contributions in analytic geometry, complex analysis and the theory of annuities. The first biography of De Moivre, on which almost all subsequent ones have since relied, was written in French by Matthew Maty. It was published in 1755 in the Journal britannique. The authors provide here, for the first time, a complete translation into English of Maty's biography of De Moivre. New material, much of it taken from modern sources, is given in footnotes, along with numerous annotations designed to provide additional clarity to Maty's biography for contemporary readers.
 
Article
Engineers in the manufacturing industries have used accelerated test (AT) experiments for many decades. The purpose of AT experiments is to acquire reliability information quickly. Test units of a material, component, subsystem or entire systems are subjected to higher-than-usual levels of one or more accelerating variables such as temperature or stress. Then the AT results are used to predict life of the units at use conditions. The extrapolation is typically justified (correctly or incorrectly) on the basis of physically motivated models or a combination of empirical model fitting with a sufficient amount of previous experience in testing similar units. The need to extrapolate in both time and the accelerating variables generally necessitates the use of fully parametric models. Statisticians have made important contributions in the development of appropriate stochastic models for AT data [typically a distribution for the response and regression relationships between the parameters of this distribution and the accelerating variable(s)], statistical methods for AT planning (choice of accelerating variable levels and allocation of available test units to those levels) and methods of estimation of suitable reliability metrics. This paper provides a review of many of the AT models that have been used successfully in this area.
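One widely used AT model is the Arrhenius temperature-acceleration relationship, sketched below with illustrative activation energy and temperatures; the review covers many other acceleration models and the associated life distributions.

```python
import numpy as np

K_B = 8.617e-5   # Boltzmann constant in eV/K

def arrhenius_af(temp_use_c, temp_stress_c, activation_energy_ev):
    """Arrhenius acceleration factor: how many times faster the failure mechanism
    proceeds at the stress temperature than at the use temperature."""
    t_use = temp_use_c + 273.15
    t_stress = temp_stress_c + 273.15
    return np.exp(activation_energy_ev / K_B * (1.0 / t_use - 1.0 / t_stress))

# Illustrative values: 0.7 eV activation energy, 45C use versus 105C test temperature.
af = arrhenius_af(45.0, 105.0, 0.7)
print(f"acceleration factor: {af:.1f}")   # one hour at 105C stands in for af hours at 45C
```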
 
Figure: For the example in Section 2, the third-order Bayesian survivor function, the third-order frequentist p-value and the first-order SLR p-value.
Article
Recent likelihood theory produces $p$-values that have remarkable accuracy and wide applicability. The calculations use familiar tools such as maximum likelihood values (MLEs), observed information and parameter rescaling. The usual evaluation of such $p$-values is by simulations, and such simulations do verify that the global distribution of the $p$-values is uniform(0, 1), to high accuracy in repeated sampling. The derivation of the $p$-values, however, asserts a stronger statement, that they have a uniform(0, 1) distribution conditionally, given identified precision information provided by the data. We take a simple regression example that involves exact precision information and use large sample techniques to extract highly accurate information as to the statistical position of the data point with respect to the parameter: specifically, we examine various $p$-values and Bayesian posterior survivor $s$-values for validity. With observed data we numerically evaluate the various $p$-values and $s$-values, and we also record the related general formulas. We then assess the numerical values for accuracy using Markov chain Monte Carlo (McMC) methods. We also propose some third-order likelihood-based procedures for obtaining means and variances of Bayesian posterior distributions, again followed by McMC assessment. Finally we propose some adaptive McMC methods to improve the simulation acceptance rates. All these methods are based on asymptotic analysis that derives from the effect of additional data. And the methods use simple calculations based on familiar maximizing values and related informations. The example illustrates the general formulas and the ease of calculations, while the McMC assessments demonstrate the numerical validity of the $p$-values as percentage position of a data point. The example, however, is very simple and transparent, and thus gives little indication that in a wide generality of models the formulas do accurately separate information for almost any parameter of interest, and then do give accurate $p$-value determinations from that information. As illustration an enigmatic problem in the literature is discussed and simulations are recorded; various examples in the literature are cited.
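For orientation, a standard route to such third-order $p$-values in this literature is through the likelihood root and its modification, shown schematically below; the adjustment quantity $q(\psi)$ is built from observed information and parameter rescaling, and its exact form depends on the model, so this is a schematic rather than the article's specific derivation.

```latex
r(\psi) = \operatorname{sign}(\hat\psi - \psi)\,
          \sqrt{2\,\bigl\{\ell(\hat\theta) - \ell(\hat\theta_\psi)\bigr\}},
\qquad
r^{*}(\psi) = r(\psi) + \frac{1}{r(\psi)}\,\log\frac{q(\psi)}{r(\psi)},
\qquad
p(\psi) \approx 1 - \Phi\bigl\{r^{*}(\psi)\bigr\},
```

where $\ell$ is the log-likelihood, $\hat\theta$ the overall MLE and $\hat\theta_\psi$ the constrained MLE for a scalar interest parameter $\psi$.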
 
Article
Discussion of "Instrumental Variables: An Econometrician's Perspective" by Guido W. Imbens [arXiv:1410.0163].
 
Top-cited authors
Rob Tibshirani
  • Stanford University
Donald B. Rubin
  • Harvard University
Trevor Hastie
  • Stanford University
William J. Welch
  • University of British Columbia - Vancouver
Henry P Wynn
  • The London School of Economics and Political Science