A discussion of the 1980 U.S. census is presented. The authors suggest that the taking of a national census is not just a statistical exercise, but an exercise involving ethics, epistemology, law, and politics. They contend that conducting a national census can be defined as an ill-structured problem in which the various complexities imposed by multidisciplinarity cannot be separated. "The 1980 census is discussed as an ill-structured problem, and a method for treating such problems is presented, within which statistical information is only one component."
A basic change concerning the racial classification of persons of Spanish origin used in the 1980 U.S. census is examined for its impact on white and nonwhite population counts, particularly in urban areas. "Arrest rates by race for central city Phoenix together with 1980 census data by race and ethnicity for Phoenix and 11 other central cities are used to illustrate the substantive effect of changes in the white and 'other race' counts produced by this change in procedure." The authors consider "remedies for the problems faced by those using published census data..., and one possibility for creating comparable rates is presented. Closely related complications created by the failure of the Office of Management and Budget to arrive at a single, logical statistical standard for the classification of U.S. residents by race and ethnicity are also identified."
The aim of this paper is to address research issues that may be missing from statistics classes yet are important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing, in a systematic way, the many challenges that can possibly arise in the course of a study. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and promote optimal conduct, quality control, analysis, and interpretation of a study.
Assuming a binary outcome, logistic regression is the most common approach to estimating a crude or adjusted odds ratio corresponding to a continuous predictor. We revisit a method termed the discriminant function approach, which leads to closed-form estimators and corresponding standard errors. In its most appealing application, we show that the approach suggests a multiple linear regression of the continuous predictor of interest on the outcome and other covariates, in place of the traditional logistic regression model. If standard diagnostics support the assumptions (including normality of errors) accompanying this linear regression model, the resulting estimator has demonstrable advantages over the usual maximum likelihood estimator via logistic regression. These include improvements in terms of bias and efficiency based on a minimum variance unbiased estimator of the log odds ratio, as well as the availability of an estimate when logistic regression fails to converge due to a separation of data points. Use of the discriminant function approach as described here for multivariable analysis requires less stringent assumptions than those for which it was historically criticized, and is worth considering when the adjusted odds ratio associated with a particular continuous predictor is of primary interest. Simulation and case studies illustrate these points.
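A minimal sketch of the single-predictor case on hypothetical simulated data: when x | y is normal with common variance, the classical discriminant-function estimator of the log odds ratio is the slope from regressing x on y divided by the residual variance. (The data, names, and constants below are illustrative, not from the article.)

```python
import numpy as np

# Hypothetical data: x | y normal with common variance sigma^2 implies a
# logistic model for y | x with log odds ratio (mu1 - mu0) / sigma^2.
rng = np.random.default_rng(0)
n, beta, sigma2 = 2000, 0.8, 1.0          # beta = true log OR per unit x
y = rng.binomial(1, 0.4, n)
x = rng.normal(0.5 + beta * sigma2 * y, np.sqrt(sigma2))

# Linear regression of x on y; slope and residual variance give the estimator
X = np.column_stack([np.ones(n), y])
coef, *_ = np.linalg.lstsq(X, x, rcond=None)
resid = x - X @ coef
s2 = resid @ resid / (n - 2)
log_or_df = coef[1] / s2                   # discriminant-function estimate
```

In the multivariable case described in the abstract, y and the other covariates would all enter the linear model for x, with the coefficient on y playing the same role.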
"Uncertainty in statistics and demographic projections for aging and other policy purposes comes from four sources: differences in definitions, sampling error, nonsampling error, and scientific uncertainty. Some of these uncertainties can be reduced by proper planning and coordination, but most often decisions have to be made in the face of some remaining uncertainty. Although decision makers have a tendency to ignore uncertainty, doing so does not lead to good policy-making. Techniques for estimating and reporting on uncertainty include sampling theory, assessment of experts' subjective distributions, sensitivity analysis, and multiple independent estimates." The primary geographical focus is on the United States.
We propose two innovations in statistical sampling for controls to enable better design of population-based case-control studies. The main innovation leads to novel solutions, without using weights, of the difficult and long-standing problem of selecting a control from persons in a household. Another advance concerns the drawing (at the outset) of the households themselves and involves random-digit dialing with atypical use of list-assisted sampling. A common element throughout is that one capitalizes on flexibility (not broadly available in usual survey settings) in choosing the frame, which specifies the population of persons from which both cases and controls come.
"Two simple current life table estimators of conditional probabilities of death result from making either a uniform or exponential distributional assumption of time at death in the age interval. Each is compared with Chiang's estimator based on the concept of fraction of the last age interval of life. Graphical and numerical results are presented to assess the magnitude and direction of differences between estimators when the true value of Chiang's fraction takes on specific values."
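As a sketch (these are the standard formulas, stated here from general life-table theory rather than taken from the article): with age-specific death rate m over an interval of width n, the uniform assumption gives q = nm/(1 + ½nm), the exponential assumption gives q = 1 − exp(−nm), and Chiang's form with fraction a is q = nm/(1 + (1 − a)nm), so the uniform case coincides with Chiang's at a = ½.

```python
import math

def q_uniform(m, n=5.0):
    # Deaths uniform over the interval (average fraction lived = 1/2)
    return n * m / (1.0 + 0.5 * n * m)

def q_exponential(m, n=5.0):
    # Constant hazard m over the interval
    return 1.0 - math.exp(-n * m)

def q_chiang(m, a, n=5.0):
    # Chiang's estimator; a = fraction of the last age interval of life
    return n * m / (1.0 + (1.0 - a) * n * m)

m = 0.02   # illustrative death rate for a 5-year interval
```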
Statistical experiments, more commonly referred to as Monte Carlo or simulation studies, are used to study the behavior of statistical methods and measures under controlled situations. Whereas recent computing and methodological advances have permitted increased efficiency in the simulation process, known as variance reduction, such experiments remain limited by their finite nature and hence are subject to uncertainty; when a simulation is run more than once, different results are obtained. However, virtually no emphasis has been placed on reporting the uncertainty, referred to here as Monte Carlo error, associated with simulation results in the published literature, or on justifying the number of replications used. These deserve broader consideration. Here we present a series of simple and practical methods for estimating Monte Carlo error as well as determining the number of replications required to achieve a desired level of accuracy. The issues and methods are demonstrated with two simple examples, one evaluating operating characteristics of the maximum likelihood estimator for the parameters in logistic regression and the other in the context of using the bootstrap to obtain 95% confidence intervals. The results suggest that in many settings, Monte Carlo error may be more substantial than traditionally thought.
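A toy version of the idea: estimate the coverage of a nominal 95% interval by simulation, attach a Monte Carlo standard error to that estimate, and back out how many replications would be needed for a target accuracy. (The design and target below are illustrative, not the paper's examples.)

```python
import numpy as np

rng = np.random.default_rng(1)
R = 1000                        # number of simulation replications
n = 20                          # per-replication sample size
covered = np.empty(R)
for r in range(R):
    x = rng.normal(0.0, 1.0, n)
    half = 1.96 * x.std(ddof=1) / np.sqrt(n)   # z-based interval (on purpose)
    covered[r] = (x.mean() - half) <= 0.0 <= (x.mean() + half)

p_hat = covered.mean()                          # estimated coverage
mc_se = np.sqrt(p_hat * (1 - p_hat) / R)        # Monte Carlo standard error
# Replications needed to bring the MC standard error below 0.005:
R_needed = int(np.ceil(p_hat * (1 - p_hat) / 0.005**2))
```

With R = 1000 the Monte Carlo error here is on the order of the effect being studied, which is precisely the paper's point.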
Cox proportional hazards (PH) models are commonly used in medical research to investigate the associations between covariates and time to event outcomes. It is frequently noted that with less than ten events per covariate, these models produce spurious results, and therefore, should not be used. Statistical literature contains asymptotic power formulae for the Cox model which can be used to determine the number of events needed to detect an association. Here we investigate via simulations the performance of these formulae in small sample settings for Cox models with 1- or 2-covariates. Our simulations indicate that, when the number of events is small, the power estimate based on the asymptotic formulae is often inflated. The discrepancy between the asymptotic and empirical power is larger for the dichotomous covariate especially in cases where allocation of sample size to its levels is unequal. When more than one covariate is included in the same model, the discrepancy between the asymptotic power and the empirical power is even larger, especially when a high positive correlation exists between the two covariates.
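For a single binary covariate, the asymptotic formula in question is Schoenfeld's event count d = (z_{1−α/2} + z_{power})² / (p(1−p)(log HR)²), where p is the allocation proportion. A sketch, with the abstract's caution in mind that in small-event settings this formula tends to overstate power:

```python
import math
from scipy.stats import norm

def events_needed(log_hr, p=0.5, alpha=0.05, power=0.8):
    """Schoenfeld's asymptotic event count for a Cox model with one
    binary covariate; p = proportion allocated to one level."""
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return (za + zb) ** 2 / (p * (1 - p) * log_hr ** 2)

d_equal = events_needed(math.log(2.0))            # HR = 2, 1:1 allocation
d_unequal = events_needed(math.log(2.0), p=0.1)   # unequal allocation
```

The unequal-allocation case demands more events, consistent with the abstract's observation that the asymptotic/empirical discrepancy is worst when allocation is unbalanced.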
Several aspects of the disparity in birth ratio of males over females are discussed, including variations among different races, variations by order of birth, by age of the parent, and in multiple births. Avenues of statistical exploration are suggested in an attempt to indicate certain peculiarities in nature. The Negro population in the United States has a sex ratio of 102 males to 100 females, as opposed to 105:100 for whites, a highly significant difference. Inferences from these statistics are suggested for study of the sex ratios of mixed unions. The group classified as Mulatto shows a lower sex ratio, and further analysis of this is suggested, including examination of slave records. For the white population, the sex ratio declines from 106.2 to 102.9 between first-order and seventh-order births, a highly significant difference. Nonwhite determinations, however, were more irregular. Data limitations on sex ratio by age of parent prevented conclusive results. Multiple births among whites show a decline from 105.3 for single live births to 103.2 for twins and 86.1 for all other plural deliveries. Among nonwhites these ratios are 102.3, 99.7, and 102.6, respectively. Further information should be developed using the multiple facts relating to the sex ratio at birth.
We consider the problem of estimating the correlation in bivariate normal data when the means and variances are assumed known, with emphasis on the small sample case. We consider eight different estimators, several of them considered here for the first time in the literature. In a simulation study, we found that Bayesian estimators using the uniform and arc-sine priors outperformed several empirical and exact or approximate maximum likelihood estimators in small samples. The arc-sine prior did better for large values of the correlation. For testing whether the correlation is zero, we found that Bayesian hypothesis tests outperformed significance tests based on the empirical and exact or approximate maximum likelihood estimators considered in small samples, but that all tests performed similarly for sample size 50. These results lead us to suggest using the posterior mean with the arc-sine prior to estimate the correlation in small samples when the variances are assumed known.
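A sketch of the posterior-mean estimator under the arc-sine prior, computed on a grid (the data are hypothetical; with known zero means and unit variances, the likelihood depends on the data only through the sums of squares and cross-products):

```python
import numpy as np

rng = np.random.default_rng(2)
rho_true, n = 0.6, 10
cov = [[1.0, rho_true], [rho_true, 1.0]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
x, y = xy[:, 0], xy[:, 1]
sxx, syy, sxy = (x * x).sum(), (y * y).sum(), (x * y).sum()

rho = np.linspace(-0.999, 0.999, 4001)
loglik = (-n / 2) * np.log(1 - rho**2) \
         - (sxx - 2 * rho * sxy + syy) / (2 * (1 - rho**2))
log_prior = -0.5 * np.log(1 - rho**2)      # arc-sine prior, up to a constant
log_post = loglik + log_prior
w = np.exp(log_post - log_post.max())
w /= w.sum()                                # normalized grid weights
rho_hat = (rho * w).sum()                   # posterior mean
```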
The power of a test, the probability of rejecting the null hypothesis in favor of an alternative, may be computed using estimates of one or more distributional parameters. Statisticians frequently fix mean values and calculate power or sample size using a variance estimate from an existing study. Hence computed power becomes a random variable for a fixed sample size. Likewise, the sample size necessary to achieve a fixed power varies randomly. Standard statistical practice requires reporting uncertainty associated with such point estimates. Previous authors studied an asymptotically unbiased method of obtaining confidence intervals for noncentrality and power of the general linear univariate model in this setting. We provide exact confidence intervals for noncentrality, power, and sample size. Such confidence intervals, particularly one-sided intervals, help in planning a future study and in evaluating existing studies.
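A sketch of the idea for a planned two-group comparison with hypothetical numbers: because ν s²/σ² ~ χ²_ν, an exact confidence interval for σ² maps through the monotone power function into an exact interval for power. (The specific design and pilot values are illustrative, not the authors' examples.)

```python
from scipy.stats import chi2, f, ncf

s2, nu = 1.4, 19            # hypothetical pilot variance estimate and its df
n, delta = 25, 0.5          # planned per-group size and fixed mean difference
alpha_ci, alpha_test = 0.05, 0.05
df1, df2 = 1, 2 * n - 2

def power(sigma2):
    lam = n * delta**2 / (2 * sigma2)      # noncentrality of the F test
    crit = f.ppf(1 - alpha_test, df1, df2)
    return ncf.sf(crit, df1, df2, lam)

# Exact CI for sigma^2 from the chi-square pivot, then map through power():
lo_s2 = nu * s2 / chi2.ppf(1 - alpha_ci / 2, nu)
hi_s2 = nu * s2 / chi2.ppf(alpha_ci / 2, nu)
ci_power = (power(hi_s2), power(lo_s2))    # power decreases in sigma^2
```

The one-sided versions the authors emphasize follow by using a one-sided chi-square bound instead of the two-sided pivot.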
Population data, particularly those related to national census and survey information, serve the program and planning needs of a broad range of both public and private activities. In recent years, demand for such demographic information has increased particularly rapidly at the state and local levels of government. In response, many states have established identifiable governmental units responsible for providing a range of census-related services and data to both public and private agencies. These units also maintain gubernatorially approved liaison activities with the U.S. Bureau of the Census. Such state government demographic centers vary considerably among states with respect to organizational location, resources, and program scope, according to the results of a recent survey of state governments in the South. This article presents the results of this survey, which is the first systematic description of these state centers; the article discusses the role of state demographic centers in the context of a larger national system of census data use.
Effective component relabeling in Bayesian analyses of mixture models is critical to the routine use of mixtures in classification with analysis based on Markov chain Monte Carlo methods. The classification-based relabeling approach here is computationally attractive and statistically effective, and scales well with sample size and number of mixture components, enabling routine analyses of increasingly large data sets. Building on the best of existing methods, practical relabeling aims to match data-to-component classification indicators in MCMC iterates with those of a defined reference mixture distribution. The method performs as well as or better than existing methods in small-dimensional problems, while being practically superior in problems with larger data sets because the approach is scalable. We describe examples and computational benchmarks, and provide supporting code with an efficient computational implementation of the algorithm that will be of use to others in practical applications of mixture models.
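The core matching step can be sketched with an assignment solver: permute the labels of one MCMC iterate so that its classification indicators agree as much as possible with a reference classification. (This illustrates label matching generically, not the authors' full algorithm.)

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel(z_iter, z_ref, k):
    """Permute component labels in z_iter to maximize agreement with the
    reference classification z_ref (Hungarian algorithm on match counts)."""
    cost = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            # negative agreement count: minimizing cost maximizes matches
            cost[a, b] = -np.sum((z_iter == a) & (z_ref == b))
    rows, cols = linear_sum_assignment(cost)
    perm = dict(zip(rows, cols))
    return np.array([perm[z] for z in z_iter])

z_ref = np.array([0, 0, 1, 1, 2, 2, 2])
z_iter = np.array([2, 2, 0, 0, 1, 1, 1])   # same partition, switched labels
```

Here `relabel(z_iter, z_ref, 3)` recovers the reference labeling exactly, since the two classifications induce the same partition.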
Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data. Estimating curves using different sets of basis functions corresponds to different linear transformations of the data. k-means clustering is not invariant to linear transformations of the data. The optimal linear transformation for clustering will stretch the distribution so that the primary direction of variability aligns with actual differences in the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L2 metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example where depressed individuals are treated with an antidepressant is used for illustration.
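The key identity can be checked numerically: for curves f = Bc in a (non-orthogonal) basis, the L2 distance between two curves equals the quadratic form (c1 − c2)ᵀM(c1 − c2) in the Gram matrix M of the basis, so k-means under the L2 metric is ordinary k-means on the transformed coefficients Lᵀc, where M = LLᵀ. A sketch with a monomial basis on [0, 1] (all values illustrative):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
B = np.vstack([np.ones_like(t), t, t**2]).T     # basis 1, t, t^2
c1 = np.array([1.0, -2.0, 0.5])
c2 = np.array([0.2, 1.0, -0.3])

# L2 distance between the curves, by trapezoidal numerical integration
g = (B @ c1 - B @ c2) ** 2
dt = t[1] - t[0]
d2_num = dt * (g[:-1] + g[1:]).sum() / 2

# Gram matrix of the monomial basis on [0, 1], in closed form
M = np.array([[1, 1/2, 1/3],
              [1/2, 1/3, 1/4],
              [1/3, 1/4, 1/5]])
diff = c1 - c2
d2_quad = diff @ M @ diff

# Same distance after transforming coefficients by L^T, where M = L L^T
L = np.linalg.cholesky(M)
d2_chol = np.sum((L.T @ diff) ** 2)
```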
It is an obvious fact that the power of a test statistic is dependent upon the significance (alpha) level at which the test is performed. It is perhaps a less obvious fact that the relative performance of two statistics in terms of power is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to relative power comparisons at very low levels, such as the level α = 5 × 10⁻⁸, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is in fact quite wrong, especially with respect to comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. Thus we recommend that statisticians exercise caution when interpreting the results of power comparison studies which use alpha levels that will not be used in practice.
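The point can be illustrated with noncentral chi-square power: the extra noncentrality a 2-df test needs, relative to a 1-df test, to reach 80% power is proportionally much smaller at α = 5 × 10⁻⁸ than at α = 0.05. (A sketch of the phenomenon, not the note's own computation.)

```python
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def ncp_for_power(df, alpha, power=0.8):
    """Noncentrality needed for a chi-square test to reach `power`."""
    crit = chi2.ppf(1 - alpha, df)
    return brentq(lambda lam: ncx2.sf(crit, df, lam) - power, 1e-6, 200.0)

ratios = {}
for alpha in (0.05, 5e-8):
    # relative noncentrality cost of the extra degree of freedom
    ratios[alpha] = ncp_for_power(2, alpha) / ncp_for_power(1, alpha)
```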
This paper shows that, when variables with missing values are linearly related to observed variables, the normal-distribution-based pseudo MLEs are still consistent. The population distribution may be unknown while the missing data process can follow an arbitrary missing at random mechanism. Enough details are provided for the bivariate case so that readers having taken a course in statistics/probability can fully understand the development. Sufficient conditions for the consistency of the MLEs in higher dimensions are also stated, while the details are omitted.
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
The traditional statistical approach to the evaluation of diagnostic tests, prediction models and molecular markers is to assess their accuracy, using metrics such as sensitivity, specificity and the receiver-operating-characteristic curve. However, there is no obvious association between accuracy and clinical value: it is unclear, for example, just how accurate a test needs to be in order for it to be considered "accurate enough" to warrant its use in patient care. Decision analysis aims to assess the clinical value of a test by assigning weights to each possible consequence. These methods have been historically considered unattractive to the practicing biostatistician because additional data from the literature, or subjective assessments from individual patients or clinicians, are needed in order to assign weights appropriately. Decision analytic methods are available that can reduce these additional requirements. These methods can provide insight into the consequences of using a test, model or marker in clinical practice.
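One decision-analytic method with modest additional requirements is a net-benefit calculation, which weights false positives by the odds of the chosen risk threshold. The sketch below uses simulated data and illustrates the general style of such methods rather than any specific proposal in the article:

```python
import numpy as np

def net_benefit(y, risk, pt):
    """Net benefit of treating patients with predicted risk >= threshold pt;
    false positives are weighted by the odds pt / (1 - pt)."""
    treat = risk >= pt
    n = len(y)
    tp = np.sum(treat & (y == 1)) / n
    fp = np.sum(treat & (y == 0)) / n
    return tp - fp * pt / (1 - pt)

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
risk = 1 / (1 + np.exp(-(x - 1)))     # a well-calibrated marker-based risk
y = rng.binomial(1, risk)

nb_model = net_benefit(y, risk, 0.2)
nb_all = net_benefit(y, np.ones(n), 0.2)   # "treat everyone" strategy
```

Comparing `nb_model` against `nb_all` (and against treating no one, net benefit 0) shows whether using the marker to guide treatment has clinical value at that threshold.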
"The purpose of this article is to show that if many characteristics affect the mortality of individuals, there are intrinsic limits to the ability of demographers to answer two elementary questions:" whether the force of mortality in the last year was more or less severe in one country relative to that in a second, and whether an individual's chance of survival would have been greater in one or the other of the two countries. The author notes that the conclusions are applicable to all demographic crude rates. "The possibility of encountering Simpson's paradox suggests that since sex is only one of many possible stratifying variables that appear to affect mortality, the use of mortality tables distinguished by sex and by no other variables is, in the absence of information about the importance of other variables, demographically arbitrary."
The randomized discontinuation trial (RDT) design is an enrichment-type design that has been used in a variety of diseases to evaluate the efficacy of new treatments. The RDT design seeks to select a more homogeneous group of patients, consisting of those who are more likely to show a treatment benefit if one exists. In oncology, the RDT design has been applied to evaluate the effects of cytostatic agents, that is, drugs that act primarily by slowing tumor growth rather than shrinking tumors. In the RDT design, all patients receive treatment during an initial, open-label run-in period of duration T. Patients with objective response (substantial tumor shrinkage) remain on therapy while those with early progressive disease are removed from the trial. Patients with stable disease (SD) are then randomized either to continue active treatment or to switch to placebo. The main analysis compares outcomes, for example, progression-free survival (PFS), between the two randomized arms. As a secondary objective, investigators may seek to estimate PFS for all treated patients, measured from the time of entry into the study, by combining information from the run-in and post run-in periods. For t ⩽ T, PFS is estimated by the observed proportion of patients who are progression-free among all patients enrolled. For t > T, the estimate can be expressed as Ŝ(t) = p̂_R Ŝ_R(t) + p̂_SD Ŝ_SD(t), where p̂_R is the estimated probability of response during the run-in period, p̂_SD is the estimated probability of SD, and Ŝ_R(t) and Ŝ_SD(t) are the Kaplan–Meier estimates of subsequent PFS in the responders and in the patients with SD randomized to continue treatment, respectively. In this article, we derive the variance of Ŝ(t), enabling the construction of confidence intervals for both S(t) and the median survival time. Simulation results indicate that the method provides accurate coverage rates. An interesting aspect of the design is that outcomes during the run-in phase have a negative multinomial distribution, something not frequently encountered in practice.
"As a cohort of people, animals, or machines ages, the individuals at highest risk tend to die or exit first. This differential selection can produce patterns of mortality for the population as a whole that are surprisingly different from the patterns for subpopulations or individuals. Naive acceptance of observed population patterns may lead to erroneous policy recommendations if an intervention depends on the response of individuals. Furthermore, because patterns at the individual level may be simpler than composite population patterns, both theoretical and empirical research may be unnecessarily complicated by failure to recognize the effects of heterogeneity."
Equivalence testing is growing in use in scientific research outside of its traditional role in the drug approval process. Largely due to its ease of use and its recommendation in United States Food and Drug Administration guidance, the most common statistical method for testing equivalence is the two one-sided tests procedure (TOST). Like classical point-null hypothesis testing, TOST is subject to multiplicity concerns as more comparisons are made. In this manuscript, a condition that bounds the family-wise error rate using TOST is given. This condition then leads to a simple solution for controlling the family-wise error rate. Specifically, we demonstrate that if all pair-wise comparisons of k independent groups are being evaluated for equivalence, then simply dividing the nominal Type I error rate by (k - 1) is sufficient to maintain the family-wise error rate at the desired value or less. The resulting rule is much less conservative than the equally simple Bonferroni correction. An example of equivalence testing in a non-drug-development setting is given.
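A sketch of the rule for k = 4 groups: run all six pairwise Welch-type TOSTs, each at level α/(k − 1). The data are simulated under exact equivalence and the margin delta is illustrative:

```python
import numpy as np
from scipy import stats

def tost_p(x, y, delta):
    """TOST p-value for equivalence of means within +/- delta (Welch)."""
    vx, vy = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
    se = np.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    d = x.mean() - y.mean()
    p_lo = stats.t.sf((d + delta) / se, df)    # H0: d <= -delta
    p_hi = stats.t.cdf((d - delta) / se, df)   # H0: d >= +delta
    return max(p_lo, p_hi)

rng = np.random.default_rng(5)
k, n, delta = 4, 200, 0.8
groups = [rng.normal(0.0, 1.0, n) for _ in range(k)]
alpha_each = 0.05 / (k - 1)                    # the paper's scaling rule
equiv = [tost_p(groups[i], groups[j], delta) < alpha_each
         for i in range(k) for j in range(i + 1, k)]
```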
Several statistical packages are capable of estimating generalized linear mixed models, and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, these studies focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
We demonstrate the algebraic equivalence of two unbiased variance estimators for the sample grand mean in a random sample of subjects from an infinite population where subjects provide repeated observations following a homoscedastic random effects model.
"The World Fertility Survey carried out cross-sectional probability surveys of fertility in more than 40 developing countries between 1972 and 1984. Statistical issues in regression analysis of the data are reviewed, including treatment of interactions, the selection of regressor variables, and appropriate linear models for rate variables. Similar issues arise in many other applications of regression to observational data."
At present, there are many software procedures available enabling statisticians to fit linear mixed models (LMMs) to continuous dependent variables in clustered or longitudinal data sets. LMMs are flexible tools for analyzing relationships among variables in these types of data sets, in that a variety of covariance structures can be used depending on the subject matter under study. The explicit random effects in LMMs allow analysts to make inferences about the variability between clusters or subjects in larger hypothetical populations, and examine cluster- or subject-level variables that explain portions of this variability. These models can also be used to analyze longitudinal or clustered data sets with data that are missing at random (MAR), and can accommodate time-varying covariates in longitudinal data sets. While the software procedures currently available have many features in common, more specific analytic aspects of fitting LMMs (e.g., crossed random effects, appropriate hypothesis testing for variance components, diagnostics, incorporating sampling weights) may only be available in selected software procedures. With this article, we aim to perform a comprehensive and up-to-date comparison of the current capabilities of software procedures for fitting LMMs, and provide statisticians with a guide for selecting a software procedure appropriate for their analytic goals.
Some governments rely on centralized, official sets of population forecasts for planning capital facilities. But the nature of population forecasting, as well as the milieu of government forecasting in general, can lead to the creation of extrapolative forecasts not well suited to long-range planning. This report discusses these matters, and suggests that custom-made forecasts and the use of forecast guidelines and a review process stressing forecast assumption justification may be a more realistic basis for planning individual facilities than general-purpose, official forecasts.
Plausibility of high variability in treatment effects across individuals has been recognized as an important consideration in clinical studies. Surprisingly, little attention has been given to evaluating this variability in design of clinical trials or analyses of resulting data. High variation in a treatment's efficacy or safety across individuals (referred to herein as treatment heterogeneity) may have important consequences because the optimal treatment choice for an individual may be different from that suggested by a study of average effects. We call this an individual qualitative interaction (IQI), borrowing terminology from earlier work, in which a qualitative interaction (QI) is present when the optimal treatment varies across "groups" of individuals. At least three techniques have been proposed to investigate treatment heterogeneity: techniques to detect a QI, use of measures such as the density overlap of two outcome variables under different treatments, and use of cross-over designs to observe "individual effects." We elucidate the underlying connections among them, their limitations, and some assumptions they may require. We do so under a potential outcomes framework that can add insights to results from usual data analyses and to study design features that improve the capability to more directly assess treatment heterogeneity.
While marginal models, random-effects models, and conditional models are routinely considered to be the three main modeling families for continuous and discrete repeated measures with linear and generalized linear mean structures, respectively, it is less common to consider non-linear models, let alone frame them within the above taxonomy. In the latter situation, indeed, when considered at all, the focus is often exclusively on random-effects models. In this paper, we consider all three families, exemplify their great flexibility and relative ease of use, and apply them to a simple but illustrative set of data on tree circumference growth of orange trees.
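As a sketch of the non-linear mean structure involved, the growth of a single tree can be described by a three-parameter logistic curve. The circumference-by-age values below are illustrative of a single orange tree; the marginal, random-effects, and conditional model machinery of the paper is not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, asym, xmid, scal):
    """Three-parameter logistic growth curve."""
    return asym / (1.0 + np.exp(-(t - xmid) / scal))

# Illustrative single-tree data: age in days, trunk circumference in mm
age = np.array([118.0, 484.0, 664.0, 1004.0, 1231.0, 1372.0, 1582.0])
circ = np.array([30.0, 58.0, 87.0, 115.0, 120.0, 142.0, 145.0])

popt, pcov = curve_fit(logistic, age, circ, p0=[150.0, 700.0, 350.0])
resid = circ - logistic(age, *popt)
rmse = np.sqrt(np.mean(resid**2))
```

In the random-effects family, the asymptote (and possibly the other parameters) would vary across trees; the marginal and conditional families modify how that between-tree variation enters the model.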
Matching is a powerful statistical tool in design and analysis. Conventional two-group, or bipartite, matching has been widely used in practice. However, its utility is limited to simpler designs. In contrast, nonbipartite matching is not limited to the two-group case, handling multiparty matching situations. It can be used to find the set of matches that minimize the sum of distances based on a given distance matrix. It brings greater flexibility to the matching design, such as multigroup comparisons. Thanks to improvements in computing power and freely available algorithms to solve nonbipartite problems, the cost in terms of computation time and complexity is low. This article reviews the optimal nonbipartite matching algorithm and its statistical applications, including observational studies with complex designs and an exact distribution-free test comparing two multivariate distributions. We also introduce an R package that performs optimal nonbipartite matching. We present an easily accessible web application to make nonbipartite matching freely available to general researchers.
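For intuition, the optimization itself can be stated in a few lines: given a symmetric distance matrix on an even number of units, find the pairing that minimizes the total within-pair distance. The exhaustive sketch below is only feasible for small n; the article's algorithms and R package scale far further.

```python
def min_weight_matching(dist):
    """Exhaustive optimal nonbipartite matching (small n only)."""
    best = [float("inf"), None]

    def rec(rem, cost, pairs):
        if not rem:
            if cost < best[0]:
                best[0], best[1] = cost, pairs
            return
        i = rem[0]                               # pair the first remaining unit
        for j in rem[1:]:
            c = cost + dist[i][j]
            if c < best[0]:                      # prune dominated branches
                rest = [u for u in rem if u not in (i, j)]
                rec(rest, c, pairs + [(i, j)])

    rec(list(range(len(dist))), 0.0, [])
    return best

# Hypothetical symmetric distance matrix for six units
dist = [[0, 2, 9, 7, 8, 6],
        [2, 0, 8, 5, 9, 7],
        [9, 8, 0, 3, 4, 6],
        [7, 5, 3, 0, 6, 5],
        [8, 9, 4, 6, 0, 1],
        [6, 7, 6, 5, 1, 0]]
cost, pairs = min_weight_matching(dist)
```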
The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data.
P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-a-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels .05, .01, and .001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.
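The scale of the variability is easy to see in simulation: replicate a two-sample t-test under a fixed alternative and examine the spread of log10 p across replicates. (A sketch of the phenomenon, not the paper's study design.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, effect, reps = 30, 0.8, 2000
log10_p = np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(effect, 1.0, n)
    log10_p[r] = np.log10(stats.ttest_ind(x, y).pvalue)

spread = log10_p.std(ddof=1)     # replicate-to-replicate variability
```

A spread well above half an order of magnitude means replicate p-values routinely differ by factors of ten or more, even with the effect and design held fixed.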
WinBUGS, a software package that uses Markov chain Monte Carlo (MCMC) methods to fit Bayesian statistical models, has facilitated Bayesian analysis in a wide variety of applications areas. This review shows the steps required to fit a Bayesian model with WinBUGS, and discusses the package's strengths and weaknesses. WinBUGS is highly recommended for both simple and complex Bayesian analyses, with the caveat that users require knowledge of both Bayesian methods and issues in MCMC.
In addition to SPSS Base software, SPSS Inc. sells a number of add-on packages, including a package called Missing Value Analysis (MVA). In version 12.0, MVA offers four general methods for analyzing data with missing values. Unfortunately, none of these methods is wholly satisfactory when values are missing at random. The first two methods, listwise and pairwise deletion, are well known to be biased. The third method, regression imputation, uses a regression model to impute missing values, but the regression parameters are biased because they are derived using pairwise deletion. The final method, expectation maximization (EM), produces asymptotically unbiased estimates, but EM's implementation in MVA is limited to point estimates (without standard errors) of means, variances, and covariances. MVA can also impute values using the EM algorithm, but values are imputed without residual variation, so analyses that use the imputed values can be biased.
This article recalls and comments on the varied contributions, many of enduring value, made by Samuel Wilks to the theory and practice of statistics. Multivariate analysis, order statistics, and statistical inference have especially benefited from his research. His unflagging work on behalf of the profession as teacher, writer, editor, advisor, and committee member raised the visibility of the field of statistics and increased recognition of its wide relevance.
Maurice George Kendall was an extraordinarily versatile and influential leader in statistics. He distinguished himself as a prolific and effective expositor of statistical theory, and also made important contributions to nonparametric statistics, time series, symmetric functions, and the history of statistics. An innovative organizer, he launched a Dictionary of Statistical Terms and a Bibliography of the Statistical Literature, and served as Director of the World Fertility Survey.