Project

Calibrated significance testing and parameter estimation

Goal: Calibrate null hypothesis significance testing and effect size estimation.


Project log

David R. Bickel
added a research item
A Bayesian model may be relied on to the extent of its adequacy by minimizing the posterior expected loss raised to the power of a discounting exponent. The resulting action is minimax under broad conditions when the sample size is held fixed and the discounting exponent is infinite. On the other hand, for any finite discounting exponent, the action is Bayes when the sample size is sufficiently large. Thus, the action is Bayes when there is enough reliable information in the posterior distribution, is minimax when the posterior distribution is completely unreliable, and is a continuous blend of the two extremes otherwise.
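A minimal numerical sketch of that blend, assuming the discounting exponent applies inside the posterior expectation (so that minimizing E[loss^k] recovers the Bayes action at k = 1 and approaches the minimax action as k grows); the posterior draws, squared-error loss, and grid of candidate actions are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=2.0, size=10_000)  # hypothetical posterior draws

def discounted_risk(action, k):
    """Posterior expectation of loss**k (squared-error loss is assumed)."""
    loss = (theta - action) ** 2
    return np.mean(loss ** k)

actions = np.linspace(-5.0, 5.0, 1001)
for k in (1, 2, 16):
    best = actions[np.argmin([discounted_risk(a, k) for a in actions])]
    print(f"k = {k}: chosen action {best:.2f}")
# k = 1 gives the Bayes action (the posterior mean under squared error);
# increasing k weights the largest posterior losses more heavily,
# pushing the choice toward a minimax-like action.
```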
David R. Bickel
added a research item
Null hypothesis significance testing is generalized by controlling the Type I error rate conditional on the existence of a non-empty confidence interval. The control of that conditional error rate results in corrected p-values called c-values. A further generalization from point null hypotheses to composite hypotheses generates C-values. The framework has implications for the following areas of application. First, for bounded parameter spaces, C-values of unspecified catch-all hypotheses provide conditions under which the entire statistical model would be rejected. Second, the C-value of a point estimate or confidence interval from a previous study determines whether the conclusion of the study is replicated, discredited, or neither replicated nor discredited by a new study. Third, c-values of a finite number of hypotheses, theories, or other models facilitate both incorporating previous information into frequentist hypothesis testing and the comparison of scientific models such as those of molecular evolution. In all cases, the corrections of p-values are simple enough to be performed on a handheld device. https://doi.org/10.5281/zenodo.5123388
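The abstract does not spell out the correction, but one simple sketch consistent with controlling the Type I error rate conditional on a non-empty confidence interval divides the p value by the null probability of that condition; q_nonempty is a hypothetical user-supplied input, and the paper's exact definition (see the DOI above) may differ:

```python
def c_value(p, q_nonempty):
    """Hedged sketch of a conditional correction: under H0,
    P(reject | non-empty interval) <= P(reject) / P(non-empty interval),
    so dividing p by q_nonempty bounds the conditional error rate.
    q_nonempty is a hypothetical null probability of a non-empty
    confidence interval; it is not taken from the paper."""
    return min(1.0, p / q_nonempty)

# e.g. c_value(0.03, 0.8) == 0.0375
```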
David R. Bickel
added a research item
Much of the blame for failed attempts to replicate reports of scientific findings has been placed on ubiquitous and persistent misinterpretations of the p value. An increasingly popular solution is to transform a two-sided p value to a lower bound on a Bayes factor. Another solution is to interpret a one-sided p value as an approximate posterior probability. Combining the two solutions results in confidence intervals that are calibrated by an estimate of the posterior probability that the null hypothesis is true. The combination also provides a point estimate that is covered by the calibrated confidence interval at every level of confidence. Finally, the combination of solutions generates a two-sided p value that is calibrated by the estimate of the posterior probability of the null hypothesis. In the special case of a 50% prior probability of the null hypothesis and a simple lower bound on the Bayes factor, the calibrated two-sided p value is about (1 – abs(2.7 p ln p)) p + 2 abs(2.7 p ln p) for small p. The calibrations of confidence intervals, point estimates, and p values are proposed in an empirical Bayes framework without requiring multiple comparisons.
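A minimal sketch of the stated small-p approximation (2.7 approximates e, and the formula is meant for small p, where p ln p is negative); the function name is mine:

```python
from math import log

def calibrated_two_sided_p(p):
    """Approximate calibrated two-sided p value for small p:
    (1 - |2.7 p ln p|) p + 2 |2.7 p ln p|, as stated in the abstract."""
    b = abs(2.7 * p * log(p))        # |2.7 p ln p|
    return (1.0 - b) * p + 2.0 * b

# e.g. calibrated_two_sided_p(0.005) ~ 0.148, a much weaker level of
# evidence than the raw p = 0.005 suggests.
```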
David R. Bickel
added a research item
Hypothesis tests are conducted not only to determine whether a null hypothesis (H0) is true but also to determine the direction or sign of an effect. A simple estimate of the posterior probability of a sign error is PSE = (1 - PH0) p/2 + PH0, depending only on a two-sided p value and PH0, an estimate of the posterior probability of H0. A convenient option for PH0 is the posterior probability derived from estimating the Bayes factor to be its e p ln(1/p) lower bound. In that case, PSE depends only on p and an estimate of the prior probability of H0. PSE provides a continuum between significance testing and traditional Bayesian testing. The former effectively assumes the prior probability of H0 is 0, as some statisticians argue. In that case, PSE is equal to a one-sided p value. (In that sense, PSE is a calibrated p value.) In traditional Bayesian testing, on the other hand, the prior probability of H0 is at least 50%, which usually brings PSE close to PH0.
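A short sketch of PSE and the stated option for PH0, assuming e p ln(1/p) is used as the lower bound on the Bayes factor in favor of H0 (valid for p < 1/e) and converted to a posterior probability through the prior odds; the function names are mine:

```python
from math import e, log

def ph0_from_bound(p, prior_h0=0.5):
    """Estimate P(H0 | data) with the Bayes factor for H0 set to its
    e * p * ln(1/p) lower bound (requires p < 1/e)."""
    bf = e * p * log(1.0 / p)
    post_odds = (prior_h0 / (1.0 - prior_h0)) * bf
    return post_odds / (1.0 + post_odds)

def pse(p, ph0):
    """Posterior probability of a sign error: PSE = (1 - PH0) p/2 + PH0."""
    return (1.0 - ph0) * p / 2.0 + ph0

# As prior_h0 -> 0, PSE -> p/2 (the one-sided p value);
# at prior_h0 = 0.5, PSE is typically close to PH0 for small p.
```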
David R. Bickel
added an update
Here's a simple explanation of "Null hypothesis significance testing defended and calibrated by Bayesian model checking":
 
David R. Bickel
added a research item
In Bayesian statistics, if the distribution of the data is unknown, then each plausible distribution of the data is indexed by a parameter value, and the prior distribution of the parameter is specified. To the extent that more complicated data distributions tend to require more coincidences for their construction than simpler data distributions, default prior distributions should be transformed to assign additional prior probability or probability density to the parameter values that refer to simpler data distributions. The proposed transformation of the prior distribution relies on the entropy of each data distribution as the relevant measure of complexity. The transformation is derived from a few first principles and extended to stochastic processes.
David R. Bickel
added 3 research items
Occam's razor suggests assigning more prior probability to a hypothesis corresponding to a simpler distribution of data than to a hypothesis with a more complex distribution of data, other things equal. An idealization of Occam's razor in terms of the entropy of the data distributions tends to favor the null hypothesis over the alternative hypothesis. As a result, lower p values are needed to attain the same level of evidence. A recently debated argument for lowering the significance level to 0.005 as the p value threshold for a new discovery and to 0.05 for a suggestive result would then support further lowering them to 0.001 and 0.01, respectively.
Significance testing is often criticized because p values can be low even though posterior probabilities of the null hypothesis are not low according to some Bayesian models. Those models, however, would assign low prior probabilities to the observation that the p value is sufficiently low. That conflict between the models and the data may indicate that the models need revision. Indeed, if the p value is sufficiently small while the posterior probability according to a model is insufficiently small, then the model will fail a model check. That result leads to a way to calibrate a p value by transforming it into an upper bound on the posterior probability of the null hypothesis (conditional on rejection) for any model that would pass the check. The calibration may be calculated from a prior probability of the null hypothesis and the stringency of the check without more detailed modeling. An upper bound, as opposed to a lower bound, can justify concluding that the null hypothesis has a low posterior probability.
Concepts from multiple testing can improve tests of single hypotheses. The proposed definition of the calibrated p value is an estimate of the local false sign rate, the posterior probability that the direction of the estimated effect is incorrect. Interpreting one-sided p values as estimates of conditional posterior probabilities, that calibrated p value is (1 - LFDR) p/2 + LFDR, where p is a two-sided p value and LFDR is an estimate of the local false discovery rate, the posterior probability that a point null hypothesis is true given p. A simple option for LFDR is the posterior probability derived from estimating the Bayes factor to be its e p ln(1/p) lower bound. The calibration provides a continuum between significance testing and traditional Bayesian testing. The former effectively assumes the prior probability of the null hypothesis is 0, as some statisticians argue is the case. Then the calibrated p value is equal to p/2, a one-sided p value, since LFDR = 0. In traditional Bayesian testing, the prior probability of the null hypothesis is at least 50%, which usually results in LFDR >> p. At that end of the continuum, the calibrated p value is close to LFDR.
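The same calibration in the LFDR notation, as a self-contained sketch of the continuum's two endpoints (function names are mine; the LFDR option again uses the e p ln(1/p) lower bound, valid for p < 1/e):

```python
from math import e, log

def lfdr_estimate(p, prior_h0):
    """LFDR option from the abstract: Bayes factor for H0 estimated by
    its e * p * ln(1/p) lower bound, combined with the prior odds."""
    odds = (prior_h0 / (1.0 - prior_h0)) * e * p * log(1.0 / p)
    return odds / (1.0 + odds)

def calibrated_p(p, lfdr):
    return (1.0 - lfdr) * p / 2.0 + lfdr   # (1 - LFDR) p/2 + LFDR

p = 0.01
for prior in (1e-9, 0.5):                  # endpoints of the continuum
    print(prior, calibrated_p(p, lfdr_estimate(p, prior)))
# prior ~ 0:   calibrated p ~ p/2 = 0.005 (significance-testing end)
# prior = 0.5: calibrated p ~ LFDR ~ 0.11 (traditional Bayesian end)
```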
David R. Bickel
added a project goal
Calibrate null hypothesis significance testing and effect size estimation.