Technical Report

Why Is Calculating Small-Sample Measurement Uncertainty Based on the t-Distribution a Fallacy? Three Paradoxes and Their Resolution

Abstract

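Student (William Sealy Gosset) introduced the famous Student's t-distribution in 1908. For more than 100 years, the t-distribution has been accepted by the academic community as the theoretical basis for small-sample measurement uncertainty analysis. In 2006, however, while applying the t-distribution to the uncertainty analysis of ADCP (acoustic Doppler current profiler) river discharge measurements, the author encountered a puzzling paradox, and later found two further paradoxes related to applications of the t-distribution in the literature. These three paradoxes led the author to suspect that calculating small-sample measurement uncertainty based on the t-distribution may be a fallacy: the t-distribution is mathematically correct, but statistical inference based on it may be wrong (the "t-distribution fallacy" for short). In 2015 the author found that the root cause of the t-distribution fallacy is "t-transformation distortion": statistical inference is carried out in a sample space that the t-transformation has distorted, which is methodologically wrong. Drawing on the error theory and point estimation theory of classical probability theory, the author proposes a definition of uncertainty based on a probabilistic error limit, together with a unified theory of measurement error and uncertainty. Under this new definition, the unbiased estimation method is recommended for calculating Type A uncertainty in practice. With the unbiased estimation method, all three paradoxes are readily resolved.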
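To make the contrast concrete, the following Python sketch compares two ways of computing a 95 % expanded Type A uncertainty of the mean from a small sample. It is an illustration under stated assumptions, not the author's exact procedure: the abstract does not give the formula for the unbiased estimation method, so the sketch assumes it means dividing the sample standard deviation s by the standard bias factor c4(n) for normal samples and then applying the normal coverage factor z. The function names and the readings are hypothetical.

```python
import math
from statistics import NormalDist, stdev

# Two-sided 95 % Student's t quantiles for selected degrees of freedom
# (standard tabulated values, hard-coded to keep the sketch stdlib-only).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}


def c4(n: int) -> float:
    """Bias factor such that E[s] = c4(n) * sigma for normal samples of size n."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def u_t_based(sample) -> float:
    """Conventional expanded uncertainty: t quantile times s / sqrt(n)."""
    n = len(sample)
    return T_975[n - 1] * stdev(sample) / math.sqrt(n)


def u_unbiased(sample, coverage: float = 0.95) -> float:
    """ASSUMED form of the unbiased method: z * (s / c4(n)) / sqrt(n)."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return z * (stdev(sample) / c4(n)) / math.sqrt(n)


readings = [10.1, 9.8, 10.3, 9.9]  # hypothetical measurements
print("t-based :", u_t_based(readings))
print("unbiased:", u_unbiased(readings))
```

For n = 4 the t-based coverage factor is 3.182, whereas the assumed unbiased form gives an effective factor of about 1.96 / c4(4), so the t-based interval is noticeably wider for the same data.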
Article
Miller and Ulrich (in press) critique our claim (Hoekstra, Morey, Rouder, & Wagenmakers, 2014), based on a survey given to researchers and students, of widespread misunderstanding of confidence intervals (CIs). They suggest that survey respondents may have interpreted the statements in the survey that we deemed incorrect in an idiosyncratic, but correct, way, thus calling into question the conclusion that the results indicate that respondents could not properly interpret CIs. Their alternative interpretations, while correct, cannot be deemed acceptable renderings of the questions in the survey due to the well-known reference class problem. Moreover, there is no support in the data for their contention that participants may have had their alternative interpretations in mind. Finally, their alternative interpretations are merely trivial restatements of the definition of a confidence interval, and have no implications for the location of a parameter.
Article
Some inconsistencies of the current version of the 'Guide to the expression of uncertainty in measurement' are discussed, and suggestions for making this document consistent are offered. The paper is written taking into account the terminology of the third version of the 'International vocabulary of metrology'.
Article
Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of confidence intervals is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95 %) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences. For this reason, we caution against relying upon confidence interval theory to justify interval estimates, and suggest that other theories of interval estimation should be used instead. Electronic supplementary material: the online version of this article (doi:10.3758/s13423-015-0947-8) contains supplementary material, which is available to authorized users.
Article
Controversies are common in medicine. Some arise when the conclusions of research publications directly contradict each other, creating uncertainty for frontline clinicians. In this paper, we review how researchers can look at very similar data yet reach completely different conclusions based purely on an over-reliance on statistical significance and an unclear understanding of confidence intervals. Dogmatic adherence to statistical significance thresholds can lead authors to write dichotomized, absolute conclusions while ignoring the broader interpretations of very consistent findings. We describe three examples of controversy around the potential benefit of a medication, a comparison between new medications, and a medication with a potential harm. The examples include the highest levels of evidence, both meta-analyses and randomized controlled trials. We will show how in each case the confidence intervals and point estimates were very similar. The only identifiable differences to account for the contrasting conclusions arise from the serendipitous finding of confidence intervals that either marginally cross or just fail to cross the line of statistical significance. These opposing conclusions are false disagreements that create unnecessary clinical uncertainty. We provide helpful recommendations for approaching conflicting conclusions when they are associated with remarkably similar results.
Article
Supplement 1 (S1) to the GUM (Guide to the Expression of Uncertainty in Measurement) describes a general numerical approach, known as the Monte Carlo method (MCM), for estimating measurement uncertainty, based on the principle of propagation of distributions. The MCM applies to a measurement model that has a single output quantity, where the input quantities are characterized by specified probability density functions (PDFs). When an input quantity has a Type A uncertainty that is estimated from a limited number of observations (a sample), GUM-S1 recommends using the scaled and shifted t-distribution conditioned on the sample mean and sample standard deviation as the PDF of the input quantity. This paper reveals that the scaled and shifted t-distribution is inappropriate for MCM because of the so-called t-transformation distortion. This paper proposes an alternative PDF: a normal distribution conditioned on the sample mean and sample standard deviation. A calculation example is presented to demonstrate the inappropriateness of the scaled and shifted t-distribution and the appropriateness of the proposed PDF for MCM. Two real-world examples are presented to compare the MCM based on the scaled and shifted t-distribution and the MCM based on the proposed PDF with several existing analytical methods.
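The two candidate input PDFs can be compared by direct sampling. The Python sketch below draws Monte Carlo values from a scaled and shifted t-distribution, taken here as the sample mean plus s/sqrt(n) times a t variate with n - 1 degrees of freedom, and from a normal distribution. The abstract does not state the scale of the proposed normal PDF, so the sketch assumes it is also s/sqrt(n); the function names and the sample statistics are hypothetical.

```python
import math
import random
from statistics import stdev


def draw_t(df: int, rng: random.Random) -> float:
    """One Student's t variate: standard normal over sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def mcm_draws(xbar, s, n, m, rng, pdf="t"):
    """Draw m Monte Carlo values for one input quantity with Type A uncertainty."""
    scale = s / math.sqrt(n)  # ASSUMED scale for both PDFs
    if pdf == "t":
        return [xbar + scale * draw_t(n - 1, rng) for _ in range(m)]
    return [rng.gauss(xbar, scale) for _ in range(m)]


rng = random.Random(0)
xbar, s, n = 10.0, 0.2, 6  # hypothetical sample mean, standard deviation, size
t_draws = mcm_draws(xbar, s, n, 20000, rng, pdf="t")
normal_draws = mcm_draws(xbar, s, n, 20000, rng, pdf="normal")

print("std of t-based draws     :", stdev(t_draws))
print("std of normal-based draws:", stdev(normal_draws))
```

In theory the standard deviation of the scaled and shifted t-distribution exceeds s/sqrt(n) by the factor sqrt((n - 1)/(n - 3)) for n > 3, so the t-based draws spread more widely than the normal-based ones for small n.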
Article
The Guide to the Expression of Uncertainty in Measurement (GUM) includes formulas that produce an estimate of a scalar output quantity that is a function of several input quantities, and an approximate evaluation of the associated standard uncertainty. This contribution presents approximate, Bayesian counterparts of those formulas for the case where the output quantity is a parameter of the joint probability distribution of the input quantities, also taking into account any information about the value of the output quantity available prior to measurement expressed in the form of a probability distribution on the set of possible values for the measurand. The approximate Bayesian estimates and uncertainty evaluations that we present have a long history and illustrious pedigree, and provide sufficiently accurate approximations in many applications, yet are very easy to implement in practice. Differently from exact Bayesian estimates, which involve either (analytical or numerical) integrations, or Markov Chain Monte Carlo sampling, the approximations that we describe involve only numerical optimization and simple algebra. Therefore, they make Bayesian methods widely accessible to metrologists. We illustrate the application of the proposed techniques in several instances of measurement: isotopic ratio of silver in a commercial silver nitrate; odds of cryptosporidiosis in AIDS patients; height of a manometer column; mass fraction of chromium in a reference material; and potential-difference in a Zener voltage standard.
Article
The Guide to the Expression of Uncertainty in Measurement (GUM) of 1995 is known to be flawed in aspects of its statistical theory and is now undergoing revision. This article considers three controversies faced in the development of such a document, namely (i) the acceptance of the existence of ‘true values’, (ii) the association of variances with systematic influences and (iii) the representation of fixed but unknown quantities by probability distributions, which is a step that separates Bayesian statistics from frequentist, i.e. classical, statistics. Particular attention is paid to Recommendation INC-1 of 1980, which is said to be the foundation of the GUM. This advocates that variances be associated with systematic effects in a manner according with frequentist statistics. However, the revision of the GUM is being carried out along Bayesian lines. A number of difficulties are identified with this approach.
Article
The 'Guide to the Expression of Uncertainty in Measurement' (GUM) has been in use for more than 20 years, serving its purposes worldwide at all levels of metrology, from scientific to industrial and commercial applications. However, the GUM presents some inconsistencies, both internally and with respect to its two later Supplements. For this reason, the Joint Committee for Guides in Metrology, which is responsible for these documents, has decided that a major revision of the GUM is needed. This will be done by following the principles of Bayesian statistics, a concise summary of which is presented in this article. Those principles should be useful in physics and engineering laboratory courses that teach the fundamentals of data analysis and measurement uncertainty evaluation.
Article
Since the 1980s, we have seen a gradual shift in the uncertainty analyses recommended in the metrological literature, principally Metrologia, and in the BIPM's guidance documents: the Guide to the Expression of Uncertainty in Measurement (GUM) and its two supplements. The shift has seen the BIPM's recommendations change from a purely classical or frequentist analysis to a purely Bayesian analysis. Despite this drift, most metrologists continue to use the predominantly frequentist approach of the GUM and wonder what the differences are, why there are such bitter disputes about the two approaches, and whether they should change. The primary purpose of this note is to inform metrologists of the differences between the frequentist and Bayesian approaches and the consequences of those differences. It is often claimed that a Bayesian approach is philosophically consistent and is able to tackle problems beyond the reach of classical statistics. However, while the philosophical consistency of Bayesian analyses may be more aesthetically pleasing, the value to science of any statistical analysis is in the long-term success rates, and on this point classical methods perform well and Bayesian analyses can perform poorly. Thus an important secondary purpose of this note is to highlight some of the weaknesses of the Bayesian approach. We argue that moving away from well-established, easily taught frequentist methods that perform well, to the computationally expensive and numerically inferior Bayesian analyses recommended by the GUM supplements, is ill-advised. Further, we recommend that whatever methods are adopted, the metrology community should insist on proven long-term numerical performance.