Article

Interval estimation, point estimation, and null hypothesis significance testing calibrated by an estimated posterior probability of the null hypothesis


Abstract

Much of the blame for failed attempts to replicate reports of scientific findings has been placed on ubiquitous and persistent misinterpretations of the p value. An increasingly popular solution is to transform a two-sided p value to a lower bound on a Bayes factor. Another solution is to interpret a one-sided p value as an approximate posterior probability. Combining the two solutions results in confidence intervals that are calibrated by an estimate of the posterior probability that the null hypothesis is true. The combination also provides a point estimate that is covered by the calibrated confidence interval at every level of confidence. Finally, the combination of solutions generates a two-sided p value that is calibrated by the estimate of the posterior probability of the null hypothesis. In the special case of a 50% prior probability of the null hypothesis and a simple lower bound on the Bayes factor, the calibrated two-sided p value is about (1 – abs(2.7 p ln p)) p + 2 abs(2.7 p ln p) for small p. The calibrations of confidence intervals, point estimates, and p values are proposed in an empirical Bayes framework without requiring multiple comparisons.
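The small-p approximation quoted in the abstract can be evaluated directly. The sketch below is only an illustration of that formula under the stated special case (50% prior probability of the null hypothesis and the simple lower bound on the Bayes factor). The function name and the restriction to 0 < p < 1/e are assumptions made here, not taken from the article; reading abs(2.7 p ln p) as the approximate posterior probability of the null is likewise an interpretation.

```python
import math


def calibrated_two_sided_p(p: float) -> float:
    """Approximate calibrated two-sided p value for small p.

    Implements (1 - abs(2.7 p ln p)) p + 2 abs(2.7 p ln p), as quoted in the
    abstract. The name and the domain check are illustrative assumptions.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the approximation is intended for small p (0 < p < 1/e)")
    # abs(2.7 p ln p): read here as the approximate posterior probability of the null
    ph0 = abs(2.7 * p * math.log(p))
    return (1.0 - ph0) * p + 2.0 * ph0


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005, 0.001):
        print(f"p = {p}: calibrated p approx {calibrated_two_sided_p(p):.3f}")
```

Because the formula is stated only for small p, values near the upper end of the allowed range should be read with caution.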


... Theorem 1. For any utility function satisfying equation (16), the Bacon actions from the prior probableness distribution and the likelihood function on that are specified by equations (10) and (11) for all φ ∈ are identical to the Bayes actions from p(•), where is assumed to be a finite set such that p(φ) > 0 and p(φ | y) > 0 for all φ ∈ . Those actions are given by a(π | y) = a(p | y) = arg sup φ∈ p(φ) f(y | φ) u(φ) = arg sup φ∈ p(φ | y) u(φ) (17) in the notation of Sections 2.2-2.3, ...
... For any utility function satisfying equation (18), the Bacon actions from the prior probableness distribution and the likelihood function on that are specified by equations (10) and (11) for all φ ∈ are identical to the limiting Bayes actions from p(•), which is assumed to satisfy p(φ) > 0 and p(φ | y) > 0 for all φ ∈ . Those actions are given by equation (17) with a_0(p | y) in place of a(p | y). ...
... Suppose P_y, a p-value based on the sample y, is available but f(y | H_1) / f(y | H_0) is not. Then f(y | H_1) / f(y | H_0) can be estimated by an upper bound that depends on P_y when P_y is sufficiently small [10,9]. Many options for such upper bounds are reviewed by Held and Ott [29]. ...
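As an illustration of such a bound, the sketch below uses the widely cited calibration 1 / (-e p ln p), valid for p < 1/e and usually attributed to Sellke, Bayarri, and Berger. That this is the bound the excerpt intends is an assumption, since references [10,9] are not shown here; the function name is hypothetical.

```python
import math


def bayes_factor_upper_bound(p: float) -> float:
    """Upper bound on f(y | H_1) / f(y | H_0) given only a p-value.

    Uses 1 / (-e * p * ln p), valid for 0 < p < 1/e. Assumed here to be one of
    the bounds the excerpt has in mind; the excerpt does not name it.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("bound requires 0 < p < 1/e")
    return 1.0 / (-math.e * p * math.log(p))


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(f"p = {p}: evidence for H_1 over H_0 is at most {bayes_factor_upper_bound(p):.2f} to 1")
```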
Article
A Bayesian model has two parts. The first part is a family of sampling distributions that could have generated the data. The second part of a Bayesian model is a prior distribution over the sampling distributions. Both the diagnostics used to check the model and the process of updating a failed model are widely thought to violate the standard foundations of Bayesianism. That is largely because models are checked before specifying the space of all candidate replacement models, which textbook presentations of Bayesian model averaging would require. However, that is not required under a broad class of utility functions that apply when approximate model truth is an important consideration, perhaps among other important considerations. From that class, a simple criterion for model checking emerges and suggests a coherent approach to updating Bayesian models found inadequate. The criterion only requires the specification of the prior distribution up to ratios of prior densities of the models considered until the time of the check. That criterion, while justified by Bayesian decision theory, may also be derived under possibility theory from a decision-theoretic framework that generalizes the likelihood interpretation of possibility functions.
... These tests included family-wise error control using Holm's correction (Holm), False Discovery Rate control using the Benjamini-Hochberg correction (BH), a permutation test using the maximum statistic across the tests (Perm_max), and a permutation test for each individual test (Perm). We employed two p-value calibration methods, the first proposed by Sellke and colleagues [Sellke et al., 2001] (pcalSBB, pcal function from the pcal R package) and the second proposed by Bickel [Bickel, 2021b] (pcalBickel, implemented as pcalBickel = ...
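For readers unfamiliar with the two corrections named in this excerpt, the following is a minimal pure-Python sketch of Holm and Benjamini-Hochberg adjusted p-values. It is not the cited authors' code, and the function names are illustrative.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (family-wise error control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # descending p
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        candidate = min(1.0, pvalues[i] * m / rank)
        running_min = min(running_min, candidate)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print("Holm:", holm_adjust(raw))
    print("BH:  ", bh_adjust(raw))
```

A hypothesis is rejected at level alpha when its adjusted p-value is at most alpha; Holm is the more conservative of the two.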
Preprint
Inferring linear relationships lies at the heart of many empirical investigations. A measure of linear dependence should correctly evaluate the strength of the relationship as well as qualify whether it is meaningful for the population. Pearson's correlation coefficient (PCC), the de facto measure for bivariate relationships, is known to be lacking in both regards. The estimated strength r may be wrong due to limited sample size and nonnormality of the data. In the context of statistical significance testing, erroneous interpretation of a p-value as a posterior probability leads to Type I errors, a general issue with significance testing that extends to PCC. Such errors are exacerbated when testing multiple hypotheses simultaneously. To tackle these issues, we propose a machine-learning-based predictive data calibration method which essentially conditions the data samples on the expected linear relationship. Calculating PCC using calibrated data yields a calibrated p-value that can be interpreted as a posterior probability together with a calibrated r estimate, a desired outcome not provided by other methods. Furthermore, the ensuing independent interpretation of each test might eliminate the need for multiple testing correction. We provide empirical evidence favouring the proposed method using several simulations and application to real-world data.
Article
An increasingly popular approach to statistical inference is to focus on the estimation of effect size. Yet this approach is implicitly based on the assumption that there is an effect while ignoring the null hypothesis that the effect is absent. We demonstrate how this common null-hypothesis neglect may result in effect size estimates that are overly optimistic. As an alternative to the current approach, a spike-and-slab model explicitly incorporates the plausibility of the null hypothesis into the estimation process. We illustrate the implications of this approach and provide an empirical example.
Article
The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: that P-values exhibit extreme variation across samples (and thus are “unreliable”), confound effect size with sample size, are sensitive to sample size, and depend on investigator sampling intentions. These properties are often criticized from a likelihood or Bayesian framework, yet they are exactly the properties P-values should exhibit when they are constructed and interpreted correctly within their originating framework. Other common criticisms are that P-values force users to focus on irrelevant hypotheses and overstate evidence against those hypotheses. These problems are not, however, properties of P-values but are faults of researchers who focus on null hypotheses and overstate evidence based on misperceptions that p = 0.05 represents enough evidence to reject hypotheses. Those problems are easily seen without use of Bayesian concepts by translating the observed P-value p into the Shannon information (S-value or surprisal) –log2(p).
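The S-value transformation mentioned at the end of this abstract is a one-line computation. The sketch below shows it in Python; the function name is chosen here for illustration.

```python
import math


def s_value(p: float) -> float:
    """Shannon information (surprisal) of a P-value, in bits: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


if __name__ == "__main__":
    for p in (0.5, 0.05, 0.005):
        print(f"p = {p}: S = {s_value(p):.2f} bits")
```

Read this way, p = 0.05 carries only slightly more than 4 bits of information against the test hypothesis, roughly as surprising as four heads in four tosses of a fair coin.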
Article
In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration--often scant--given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.
Article
We propose to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.
Article
Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so, and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting.
Article
By representing fair betting odds according to one or more pairs of confidence set estimators, dual parameter distributions called confidence posteriors secure the coherence of actions without any prior distribution. This theory reduces to the maximization of expected utility when the pair of posteriors is induced by an exact or approximate confidence set estimator or when a reduction rule is applied to the pair. Unlike the p-value, the confidence posterior probability of an interval hypothesis is suitable as an estimator of the indicator of hypothesis truth since it converges to 1 if the hypothesis is true or to 0 otherwise.
Article
Hypothesis tests are conducted not only to determine whether a null hypothesis (H0) is true but also to determine the direction or sign of an effect. A simple estimate of the posterior probability of a sign error is PSE = (1 - PH0) p/2 + PH0, depending only on a two-sided p value and PH0, an estimate of the posterior probability of H0. A convenient option for PH0 is the posterior probability derived from estimating the Bayes factor to be its e p ln(1/p) lower bound. In that case, PSE depends only on p and an estimate of the prior probability of H0. PSE provides a continuum between significance testing and traditional Bayesian testing. The former effectively assumes the prior probability of H0 is 0, as some statisticians argue. In that case, PSE is equal to a one-sided p value. (In that sense, PSE is a calibrated p value.) In traditional Bayesian testing, on the other hand, the prior probability of H0 is at least 50%, which usually brings PSE close to PH0.
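The PSE formula in this abstract can be sketched as follows. PH0 is obtained by converting the e p ln(1/p) lower bound on the Bayes factor into a posterior probability through the prior odds of H0. The function names, the default 50% prior, and the restriction to 0 < p < 1/e are assumptions made for illustration.

```python
import math


def posterior_prob_null(p: float, prior_h0: float = 0.5) -> float:
    """Estimate PH0 by taking the Bayes factor (H0 vs. H1) to be e p ln(1/p)."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound requires 0 < p < 1/e")
    if not 0.0 <= prior_h0 < 1.0:
        raise ValueError("prior_h0 must lie in [0, 1)")
    bayes_factor = math.e * p * math.log(1.0 / p)
    posterior_odds = bayes_factor * prior_h0 / (1.0 - prior_h0)
    return posterior_odds / (1.0 + posterior_odds)


def prob_sign_error(p: float, prior_h0: float = 0.5) -> float:
    """PSE = (1 - PH0) p/2 + PH0, with p a two-sided p value."""
    ph0 = posterior_prob_null(p, prior_h0)
    return (1.0 - ph0) * p / 2.0 + ph0


if __name__ == "__main__":
    for prior in (0.0, 0.1, 0.5):
        print(f"prior P(H0) = {prior}: PSE = {prob_sign_error(0.05, prior):.4f}")
```

With a prior probability of 0 for H0, PH0 vanishes and PSE reduces to p/2, the one-sided p value, which matches the continuum described in the abstract.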
Article
In Bayesian statistics, if the distribution of the data is unknown, then each plausible distribution of the data is indexed by a parameter value, and the prior distribution of the parameter is specified. To the extent that more complicated data distributions tend to require more coincidences for their construction than simpler data distributions, default prior distributions should be transformed to assign additional prior probability or probability density to the parameter values that refer to simpler data distributions. The proposed transformation of the prior distribution relies on the entropy of each data distribution as the relevant measure of complexity. The transformation is derived from a few first principles and extended to stochastic processes.
Article
The widely claimed replicability crisis in science may lead to revised standards of significance. The customary frequentist confidence intervals, calibrated through hypothetical repetitions of the experiment that is supposed to have produced the data at hand, rely on a feeble concept of replicability. In particular, contradictory conclusions may be reached when a substantial enlargement of the study is undertaken. To redefine statistical confidence in such a way that inferential conclusions are non‐contradictory, with large enough probability, under enlargements of the sample, we give a new reading of a proposal dating back to the 60s, namely, Robbins' confidence sequences. Directly bounding the probability of reaching, in the future, conclusions that contradict the current ones, Robbins' confidence sequences ensure a clear‐cut form of replicability when inference is performed on accumulating data. Their main frequentist property is easy to understand and to prove. We show that Robbins' confidence sequences may be justified under various views of inference: they are likelihood‐based, can incorporate prior information and obey the strong likelihood principle. They are easy to compute, even when inference is on a parameter of interest, especially using a closed form approximation from normal asymptotic theory.
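To make the idea of a confidence sequence concrete, the sketch below computes the half-width of a textbook normal-mixture confidence sequence for a normal mean with known standard deviation. This is an illustrative construction and not necessarily the closed-form approximation used in the article; the function names and the mixing parameter rho are assumptions. The interval at sample size n is the sample mean plus or minus (sigma / n) sqrt((n + rho)(2 ln(1/alpha) + ln((n + rho)/rho))), and it covers the true mean simultaneously for all n with probability at least 1 - alpha.

```python
import math


def confidence_sequence_halfwidth(n: int, sigma: float = 1.0,
                                  alpha: float = 0.05, rho: float = 1.0) -> float:
    """Half-width at sample size n of a normal-mixture confidence sequence.

    rho is the precision of the normal mixing distribution (a tuning choice).
    """
    if n < 1 or sigma <= 0.0 or rho <= 0.0 or not 0.0 < alpha < 1.0:
        raise ValueError("invalid arguments")
    boundary = (n + rho) * (2.0 * math.log(1.0 / alpha) + math.log((n + rho) / rho))
    return sigma * math.sqrt(boundary) / n


def fixed_n_halfwidth(n: int, sigma: float = 1.0, z: float = 1.959964) -> float:
    """Half-width of the usual fixed-sample 95% interval, for comparison."""
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(confidence_sequence_halfwidth(n), 4), round(fixed_n_halfwidth(n), 4))
```

The confidence sequence is wider than the fixed-sample interval at every n; that extra width is the price of protection against contradictory conclusions as data accumulate.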
Book
Statisticians have met the need to test hundreds or thousands of genomics hypotheses simultaneously with novel empirical Bayes methods that combine advantages of traditional Bayesian and frequentist statistics. Techniques for estimating the local false discovery rate assign probabilities of differential gene expression, genetic association, etc. without requiring subjective prior distributions. This book brings these methods to scientists while keeping the mathematics at an elementary level. Readers will learn the fundamental concepts behind local false discovery rates, preparing them to analyze their own genomics data and to critically evaluate published genomics research.
Key Features:
* dice games and exercises, including one using interactive software, for teaching the concepts in the classroom
* examples focusing on gene expression and on genetic association data and briefly covering metabolomics data and proteomics data
* gradual introduction to the mathematical equations needed
* how to choose between different methods of multiple hypothesis testing
* how to convert the output of genomics hypothesis testing software to estimates of local false discovery rates
* guidance through the minefield of current criticisms of p values
* material on non-Bayesian prior p values and posterior p values not previously published
More: https://davidbickel.com/genomics/
Article
Confidence sets, p values, maximum likelihood estimates, and other results of non-Bayesian statistical methods may be adjusted to favor sampling distributions that are simple compared to others in the parametric family. The adjustments are derived from a prior likelihood function previously used to adjust posterior distributions.
Article
Occam's razor suggests assigning more prior probability to a hypothesis corresponding to a simpler distribution of data than to a hypothesis with a more complex distribution of data, other things equal. An idealization of Occam's razor in terms of the entropy of the data distributions tends to favor the null hypothesis over the alternative hypothesis. As a result, lower p values are needed to attain the same level of evidence. A recently debated argument for lowering the significance level to 0.005 as the p value threshold for a new discovery and to 0.05 for a suggestive result would then support further lowering them to 0.001 and 0.01, respectively.
Article
Frequentist methods, without the coherence guarantees of fully Bayesian methods, are known to yield self-contradictory inferences in certain settings. The framework introduced in this paper provides a simple adjustment to p values and confidence sets to ensure the mutual consistency of all inferences without sacrificing frequentist validity. Based on a definition of the compatibility of a composite hypothesis with the observed data given any parameter restriction and on the requirement of self-consistency, the adjustment leads to the possibility and necessity measures of possibility theory rather than to the posterior probability distributions of Bayesian and fiducial inference.
Book
A Sound Basis for the Theory of Statistical Inference. Measuring Statistical Evidence Using Relative Belief provides an overview of recent work on developing a theory of statistical inference based on measuring statistical evidence. It shows that being explicit about how to measure statistical evidence allows you to answer the basic question of when a statistical analysis is correct. The book attempts to establish a gold standard for how a statistical analysis should proceed. It first introduces basic features of the overall approach, such as the roles of subjectivity, objectivity, infinity, and utility in statistical analyses. It next discusses the meaning of probability and the various positions taken on probability. The author then focuses on the definition of statistical evidence and how it should be measured. He presents a method for measuring statistical evidence and develops a theory of inference based on this method. He also discusses how statisticians should choose the ingredients for a statistical problem and how these choices are to be checked for their relevance in an application.
Article
When competing interests seek to influence a decision maker, a scientist must report a posterior probability or a Bayes factor among those consistent with the evidence. The disinterested scientist seeks to report the value that is least controversial in the sense that it is best protected from being discredited by one of the competing interests. If the loss function of the decision maker is not known but can be assumed to satisfy two invariance conditions, then the least controversial value is a weighted generalized mean of the upper and lower bounds of the interval.
Article
The p-value quantifies the discrepancy between the data and a null hypothesis of interest, usually the assumption of no difference or no effect. A Bayesian approach allows the calibration of p-values by transforming them to direct measures of the evidence against the null hypothesis, so-called Bayes factors. We review the available literature in this area and consider two-sided significance tests for a point null hypothesis in more detail. We distinguish simple from local alternative hypotheses and contrast traditional Bayes factors based on the data with Bayes factors based on p-values or test statistics. A well-known finding is that the minimum Bayes factor, the smallest possible Bayes factor within a certain class of alternative hypotheses, provides less evidence against the null hypothesis than the corresponding p-value might suggest. It is less known that the relationship between p-values and minimum Bayes factors also depends on the sample size and on the dimension of the parameter of interest. We illustrate the transformation of p-values to minimum Bayes factors with two examples from clinical research.
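The contrast between simple and local alternatives can be illustrated with two commonly quoted minimum Bayes factors for a two-sided test: exp(-z^2/2) over all simple alternatives in a normal test, and -e p ln p (for p < 1/e) for a class of local alternatives. Treating these two as representative of the review is an assumption, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def min_bf_simple(p: float) -> float:
    """Minimum Bayes factor over simple alternatives: exp(-z^2 / 2)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    z = NormalDist().inv_cdf(1.0 - p / 2.0)  # z statistic matching a two-sided p
    return math.exp(-0.5 * z * z)


def min_bf_local(p: float) -> float:
    """Minimum Bayes factor -e p ln p, valid for 0 < p < 1/e."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("bound requires 0 < p < 1/e")
    return -math.e * p * math.log(p)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(f"p = {p}: simple {min_bf_simple(p):.4f}, local {min_bf_local(p):.4f}")
```

Both bounds exceed the p-value itself, which is the sense in which a minimum Bayes factor provides less evidence against the null hypothesis than the p-value might suggest.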
Article
Minimum Bayes factors are commonly used to transform two-sided p-values to lower bounds on the posterior probability of the null hypothesis. Several proposals exist in the literature, but none of them depends on the sample size. However, the evidence of a p-value against a point null hypothesis is known to depend on the sample size. In this article, we consider p-values in the linear model and propose new minimum Bayes factors that depend on sample size and converge to existing bounds as the sample size goes to infinity. It turns out that the maximal evidence of an exact two-sided p-value increases with decreasing sample size. The effect of adjusting minimum Bayes factors for sample size is shown in two applications.
Article
Empirical Bayes estimates of the local false discovery rate can reflect uncertainty about the estimated prior by supplementing their Bayesian posterior probabilities with confidence levels as posterior probabilities. This use of coherent fiducial inference with hierarchical models generates set estimators that propagate uncertainty to varying degrees. Some of the set estimates approach estimates from plug-in empirical Bayes methods for high numbers of comparisons and can come close to the usual confidence sets given a sufficiently low number of comparisons.
Book
This highly acclaimed text, now available in paperback, provides a thorough account of key concepts and theoretical results, with particular emphasis on viewing statistical inference as a special case of decision theory. Information-theoretic concepts play a central role in the development of the theory, which provides, in particular, a detailed discussion of the problem of specification of so-called prior ignorance. The work is written from the authors' committed Bayesian perspective, but an overview of non-Bayesian theories is also provided, and each chapter contains a wide-ranging critical re-examination of controversial issues. The level of mathematics used is such that most material is accessible to readers with knowledge of advanced calculus. In particular, no knowledge of abstract measure theory is assumed, and the emphasis throughout is on statistical concepts rather than rigorous mathematics. The book will be an ideal source for all students and researchers in statistics, mathematics, decision analysis, economic and business studies, and all branches of science and engineering, who wish to further their understanding of Bayesian statistics.
Article
Introductory statistical inference texts and courses treat the point estimation, hypothesis testing, and interval estimation problems separately, with primary emphasis on large-sample approximations. Here, I present an alternative approach to teaching this course, built around p-values, emphasizing provably valid inference for all sample sizes. Details about computation and marginalization are also provided, with several illustrative examples, along with a course outline. Supplementary materials for this article are available online.
Article
This paper develops new methodology, together with related theories, for combining information from independent studies through confidence distributions. A formal definition of a confidence distribution and its asymptotic counterpart (i.e., asymptotic confidence distribution) are given and illustrated in the context of combining information. Two general combination methods are developed: the first along the lines of combining p-values, with some notable differences in regard to optimality of Bahadur type efficiency; the second by multiplying and normalizing confidence densities. The latter approach is inspired by the common approach of multiplying likelihood functions for combining parametric information. The paper also develops adaptive combining methods, with supporting asymptotic theory which should be of practical interest. The key point of the adaptive development is that the methods attempt to combine only the correct information, downweighting or excluding studies containing little or wrong information about the true parameter of interest. The combination methodologies are illustrated in simulated and real data examples with a variety of applications.
Article
The complete final product of Bayesian inference is the posterior distribution of the quantity of interest. Important inference summaries include point estimation, region estimation and precise hypotheses testing. Those summaries may appropriately be described as the solution to specific decision problems which depend on the particular loss function chosen. The use of a continuous loss function leads to an integrated set of solutions where the same prior distribution may be used throughout. Objective Bayesian methods are those which use a prior distribution which only depends on the assumed model and the quantity of interest. As a consequence, objective Bayesian methods produce results which only depend on the assumed model and the data obtained. The combined use of intrinsic discrepancy, an invariant information-based loss function, and appropriately defined reference priors, provides an integrated objective Bayesian solution to both estimation and hypothesis testing problems. The ideas are illustrated with a large collection of non-trivial examples.
Article
Foreword; Preface.
Part I. Principles and Elementary Applications: 1. Plausible reasoning; 2. The quantitative rules; 3. Elementary sampling theory; 4. Elementary hypothesis testing; 5. Queer uses for probability theory; 6. Elementary parameter estimation; 7. The central, Gaussian or normal distribution; 8. Sufficiency, ancillarity, and all that; 9. Repetitive experiments, probability and frequency; 10. Physics of 'random experiments'.
Part II. Advanced Applications: 11. Discrete prior probabilities, the entropy principle; 12. Ignorance priors and transformation groups; 13. Decision theory: historical background; 14. Simple applications of decision theory; 15. Paradoxes of probability theory; 16. Orthodox methods: historical background; 17. Principles and pathology of orthodox statistics; 18. The Ap distribution and rule of succession; 19. Physical measurements; 20. Model comparison; 21. Outliers and robustness; 22. Introduction to communication theory.
References; Appendix A. Other approaches to probability theory; Appendix B. Mathematical formalities and style; Appendix C. Convolutions and cumulants.
Article
The controversy concerning the fundamental principles of statistics still remains unresolved. It is suggested that one key to resolving the conflict lies in recognizing that inferential probability derived from observational data is inherently noncoherent, in the sense that their inferential implications cannot be represented by a single probability distribution on the parameter space (except in the Objective Bayesian case). More precisely, for a parameter space R1, the class of all functions of the parameter comprise equivalence classes of invertibly related functions, and to each such class a logically distinct inferential probability distribution pertains. (There is an additional cross‐coherence requirement for simultaneous inference.) The non‐coherence of these distributions flows from the nonequivalence of the relevant components of the data for each. Noncoherence is mathematically inherent in confidence and fiducial theory, and provides a basis for reconciling the Fisherian and Neyman–Pearsonian viewpoints. A unified theory of confidence‐based inferential probability is presented, and the fundamental incompatibility of this with Subjective Bayesian theory is discussed.
Article
x is a one‐dimensional random variable whose distribution depends on a single parameter θ. It is the purpose of this note to establish two results: (i) The necessary and sufficient condition for the fiducial distribution of θ, given x, to be a Bayes' distribution is that there exist transformations of x to u, and of θ to τ, such that τ is a location parameter for u. The condition will be referred to as (A). This extends some results of Grundy's (1956). (ii) If, for a random sample of any size from the distribution for x, there exists a single sufficient statistic for θ then the fiducial argument is inconsistent unless condition (A) obtains: And when it does, the fiducial argument is equivalent to a Bayesian argument with uniform prior distribution for τ. The note concludes with an investigation of (A) in the case of the exponential family.
Article
Theories of Statistical Inference: Example; Statistical models; The likelihood function; Theories; Nonmodel-based repeated sampling; Conclusion.
The Integrated Bayes/Likelihood Approach: Introduction; Probability; Prior ignorance; The importance of parametrization; The simple/simple hypothesis testing problem; The simple/composite hypothesis testing problem; Posterior likelihood approach; Bayes factors; The comparison of unrelated models; Example-GHQ score and psychiatric diagnosis.
t-Tests and Normal Variance Tests: One-sample t-test; Two samples: equal variances; The two-sample test; Two samples: different variances; The normal model variance; Variance heterogeneity test.
Unified Analysis of Finite Populations: Sample selection indicators; The Bayesian bootstrap; Sampling without replacement; Regression models; More general regression models; The multinomial model for multiple populations; Complex sample designs; A complex example; Discussion.
Regression and Analysis of Variance: Multiple regression; Nonnested models.
Binomial and Multinomial Data: Single binomial samples; Single multinomial samples; Two-way tables for correlated proportions; Multiple binomial samples; Two-way tables for categorical responses-no fixed margins; Two-way tables for categorical responses-one fixed margin; Multinomial "nonparametric" analysis.
Goodness of Fit and Model Diagnostics: Frequentist model diagnostics; Bayesian model diagnostics; The posterior predictive distribution; Multinomial deviance computation; Model comparison through posterior deviances; Examples; Simulation study; Discussion.
Complex Models: The data augmentation algorithm; Two-level variance component models; Test for a zero variance component; Finite mixtures.
References; Author Index; Subject Index.
Article
Bayesian epistemology aims to answer the following question: How strongly should an agent believe the various propositions expressible in her language? Subjective Bayesians hold that it is largely (though not entirely) up to the agent as to which degrees of belief to adopt. Objective Bayesians, on the other hand, maintain that appropriate degrees of belief are largely (though not entirely) determined by the agent's evidence. This book states and defends a version of objective Bayesian epistemology. According to this version, objective Bayesianism is characterized by three norms: (i) Probability: degrees of belief should be probabilities; (ii) Calibration: they should be calibrated with evidence; and (iii) Equivocation: they should otherwise equivocate between basic outcomes. Objective Bayesianism has been challenged on a number of different fronts: for example, it has been accused of being poorly motivated, of failing to handle qualitative evidence, of yielding counter-intuitive degrees of belief after updating, of suffering from a failure to learn from experience, of being computationally intractable, of being susceptible to paradox, of being language dependent, and of not being objective enough. The book argues that these criticisms can be met and that objective Bayesianism is a promising theory with an exciting agenda for further research.
Article
In this book, an integrated introduction to statistical inference is provided from a frequentist likelihood-based viewpoint. Classical results are presented together with recent developments, largely built upon ideas due to R.A. Fisher. The term “neo-Fisherian” highlights this. After a unified review of background material (statistical models, likelihood, data and model reduction, first-order asymptotics) and inference in the presence of nuisance parameters (including pseudo-likelihoods), a self-contained introduction is given to exponential families, exponential dispersion models, generalized linear models, and group families. Finally, basic results of higher-order asymptotics are introduced (index notation, asymptotic expansions for statistics and distributions, and major applications to likelihood inference). The emphasis is more on general concepts and methods than on regularity conditions. Many examples are given for specific statistical models. Each chapter is supplemented with problems and bibliographic notes. This volume can serve as a textbook in intermediate-level undergraduate and postgraduate courses in statistical inference.
Article
We live in a new age for statistical inference, where modern scientific technology such as microarrays and fMRI machines routinely produce thousands and sometimes millions of parallel data sets, each with its own estimation or testing problem. Doing thousands of problems at once is more than repeated application of classical methods. Taking an empirical Bayes approach, Bradley Efron, inventor of the bootstrap, shows how information accrues across problems in a way that combines Bayesian and frequentist ideas. Estimation, testing, and prediction blend in this framework, producing opportunities for new methodologies of increased power. New difficulties also arise, easily leading to flawed inferences. This book takes a careful look at both the promise and pitfalls of large-scale statistical inference, with particular attention to false discovery rates, the most successful of the new statistical techniques. Emphasis is on the inferential ideas underlying technical developments, illustrated using a large number of real examples.
Article
A review is provided of the concept of confidence distributions. Material covered includes fundamentals, extensions, applications of confidence distributions, and available computer software. We expect that this review could serve as a source of reference and encourage further research with respect to confidence distributions.
Article
In frequentist inference, it is common to use a single point (a point estimate) or an interval (a confidence interval) to estimate a parameter of interest. A very simple question arises: can one also use, for the same purpose and from the same frequentist perspective, a probability distribution, in the way Bayesians use a posterior distribution? The answer is affirmative, and confidence distributions emerge as a natural choice in this context. The concept of a confidence distribution has a long history, long associated, wrongly, with theories of fiducial inference, which has hampered its development from the frequentist perspective. Confidence distributions have recently attracted renewed interest, and several results have highlighted their considerable potential as an inferential tool. This article presents a modern definition of the concept and examines its recent developments. It covers inference methods, optimality issues, and applications. In light of these new developments, the concept of a confidence distribution encompasses and unifies a wide range of special cases, from regular parametric examples (fiducial distributions), resampling distributions, p-values, and normalized likelihood functions to Bayesian priors and posteriors. The discussion is conducted entirely from a frequentist point of view and emphasizes applications in which frequentist solutions are nonexistent or difficult to apply. Although we also draw attention to the similarities and differences among the frequentist, fiducial, and Bayesian approaches, our intention is not to reopen a philosophical debate that has lasted nearly two hundred years. On the contrary, we hope to help bridge the gap between the different points of view.
Article
For the one-sided hypothesis testing problem it is shown that it is possible to reconcile Bayesian evidence against H0, expressed in terms of the posterior probability that H0 is true, with frequentist evidence against H0, expressed in terms of the p value. In fact, for many classes of prior distributions it is shown that the infimum of the Bayesian posterior probability of H0 is equal to the p value; in other cases the infimum is less than the p value. The results are in contrast to recent work of Berger and Sellke (1987) in the two-sided (point null) case, where it was found that the p value is much smaller than the Bayesian infimum. Some comments on the point null problem are also given.
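The agreement between the one-sided p value and a posterior probability of H0 can be checked numerically in a textbook special case. The sketch below is an illustration only: the normal model with known standard deviation, the improper flat prior, and the function names are assumptions introduced here, not details taken from the abstract above. For a single observation x from N(θ, σ²) and H0: θ ≤ 0, the flat-prior posterior is N(x, σ²), so the posterior probability of H0 coincides with the one-sided p value.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def one_sided_p_value(x: float, sigma: float = 1.0) -> float:
    """p value for H0: theta <= 0 against theta > 0, i.e. P(X >= x | theta = 0)."""
    return 1.0 - normal_cdf(x / sigma)


def posterior_prob_null_flat_prior(x: float, sigma: float = 1.0) -> float:
    """P(theta <= 0 | x) under an improper flat prior (assumed here).

    With a flat prior, theta | x ~ N(x, sigma^2), so the posterior
    probability of the null is Phi((0 - x) / sigma).
    """
    return normal_cdf((0.0 - x) / sigma)


if __name__ == "__main__":
    # Hypothetical observed values, chosen only for illustration.
    for x in (0.5, 1.645, 2.5):
        p = one_sided_p_value(x)
        post = posterior_prob_null_flat_prior(x)
        print(f"x = {x}: p value = {p:.6f}, posterior P(H0 | x) = {post:.6f}")
```

The two quantities match up to floating-point rounding for any x in this special case; for other priors they can differ.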
Article
We describe a range of routine statistical problems in which marginal posterior distributions derived from improper prior measures are found to have an unBayesian property—one that could not occur if proper prior measures were employed. This paradoxical possibility is shown to have several facets that can be successfully analysed in the framework of a general group structure. The results cast a shadow on the uncritical use of improper prior measures. A separate examination of a particular application of Fraser's structural theory shows that it is intrinsically paradoxical under marginalization.
Article
Let X be a random variable which for simplicity we shall assume to have discrete values x and which has a probability distribution depending in a known way on an unknown real parameter λ, $$p(x \mid \lambda) = \Pr[X = x \mid \Lambda = \lambda],$$ (1) Λ itself being a random variable with a priori distribution function $$G(\lambda) = \Pr[\Lambda \leq \lambda].$$ (2)
Article
Subjectivism has become the dominant philosophical foundation for Bayesian inference. Yet in practice, most Bayesian analyses are performed with so-called “noninformative” priors, that is, priors constructed by some formal rule. We review the plethora of techniques for constructing such priors and discuss some of the practical and philosophical issues that arise when they are used. We give special emphasis to Jeffreys's rules and discuss the evolution of his viewpoint about the interpretation of priors, away from unique representation of ignorance toward the notion that they should be chosen by convention. We conclude that the problems raised by the research on priors chosen by formal rules are serious and may not be dismissed lightly: When sample sizes are small (relative to the number of parameters being estimated), it is dangerous to put faith in any “default” solution; but when asymptotics take over, Jeffreys's rules and their variants remain reasonable choices. We also provide an annotated bibliography.
Article
This is the first volume of a two-volume work on Probability and Induction. Because the writer holds that probability logic is identical with inductive logic, this work is devoted to philosophical problems concerning the nature of probability and inductive reasoning. The author rejects a statistical frequency basis for probability in favor of a logical relation between two statements or propositions. Probability "is the degree of confirmation of a hypothesis (or conclusion) on the basis of some given evidence (or premises)." Furthermore, all principles and theorems of inductive logic are analytic, and the entire system is to be constructed by means of symbolic logic and semantic methods. This means that the author confines himself to the formalistic procedures of word and symbol systems. The resulting sentence or language structures are presumed to separate off logic from all subjectivist or psychological elements. Despite the abstractionism, the claim is made that if an inductive probability system of logic can be constructed it will have its practical application in mathematical statistics, and in various sciences. 16-page bibliography.