
... However, if the effect of the intervention varies across participants, then the variation of the outcome in the intervention arm will be different to that in the control. 9,13,14 A difference in variance would then require further study to identify the effect modifiers, using individual participant data. ...

Background:
Randomized controlled trials (RCTs) with continuous outcomes usually examine only mean differences in response between trial arms. If the intervention has heterogeneous effects, then outcome variances will also differ between arms. However, the power of an individual trial to assess heterogeneity is lower than its power to detect a main effect of the same size.
Methods:
We describe several methods for assessing differences in variance in trial arms and apply them to a single trial with individual patient data and to meta-analyses using summary data. Where individual data are available, we use regression-based methods to examine the effects of covariates on variation. We present an additional method to meta-analyze differences in variances with summary data.
Results:
In the single trial there was agreement between methods, and the difference in variance was largely due to differences in prevalence of depression at baseline. In two meta-analyses, most individual trials did not show strong evidence of a difference in variance between arms, with wide confidence intervals. However, both meta-analyses showed evidence of greater variance in the control arm, and in one example this was perhaps because mean outcome in the control arm was higher.
Conclusions:
Using meta-analysis, we overcame the low power of individual trials to examine differences in variance. Evidence of differences in variance should be followed up to identify potential effect modifiers and to explore other possible causes, such as varying compliance.
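The summary-data approach this abstract describes can be sketched minimally as follows. The trial summaries are hypothetical, and the sketch uses the standard bias-corrected log variability ratio (lnVR) with fixed-effect inverse-variance pooling, not necessarily the authors' exact implementation.

```python
import math

def ln_var_ratio(sd_t, n_t, sd_c, n_c):
    """Log variability ratio (lnVR) for one trial from summary data,
    with its approximate sampling variance (bias-corrected form)."""
    lnvr = math.log(sd_t / sd_c) + 1 / (2 * (n_t - 1)) - 1 / (2 * (n_c - 1))
    var = 1 / (2 * (n_t - 1)) + 1 / (2 * (n_c - 1))
    return lnvr, var

def pool_fixed(effects):
    """Inverse-variance (fixed-effect) pooling of (estimate, variance) pairs."""
    weights = [1 / v for _, v in effects]
    est = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, se

# hypothetical summary data: (SD and n in intervention arm, SD and n in control arm)
trials = [(5.1, 120, 4.8, 118), (6.0, 80, 6.3, 82), (4.4, 200, 4.1, 199)]
pooled, se = pool_fixed([ln_var_ratio(*t) for t in trials])
ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # pooled lnVR with 95% CI
```

A pooled lnVR meaningfully above zero would suggest greater variance in the intervention arm across trials, which is the signal of effect heterogeneity the abstract proposes to follow up.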

... Simply labeling patients as "responders" and "non-responders" based on crossing an arbitrary threshold on a continuous outcome scale is not a valid way to investigate variation in individual treatment effect [22]. In order to assess treatment effect heterogeneity from the data of parallel group trials, the comparison of variances between the active and the control condition has been proposed [22,23]. Here, an increase in variance in the active group might be a signal of a variation in the individual treatment effect [22]. ...

Background: The average treatment effect of antidepressants in major depression was found to be about 2 points on the 17-item Hamilton Depression Rating Scale, which lies below clinical relevance. Here, we searched for evidence of a relevant treatment effect heterogeneity that could justify the usage of antidepressants despite their low average treatment effect.
Methods: Bayesian meta-analysis of 169 randomized, controlled trials including 58,687 patients. We considered the effect sizes log variability ratio (lnVR) and log coefficient of variation ratio (lnCVR) to analyze the difference in variability of active and placebo response. We used Bayesian random-effects meta-analyses (REMA) for lnVR and lnCVR and fitted a random-effects meta-regression (REMR) model to estimate the treatment effect variability between antidepressants and placebo.
Results: The variability ratio was found to be very close to 1 in the best-fitting models (REMR: 95% HPD = [0.98, 1.02]; lnVR REMA: 95% HPD = [1.00, 1.02]), whereas the CVR REMA showed a reduced variability (95% HPD = [0.80, 0.84]). The Widely Applicable Information Criterion (WAIC) showed that the lnVR REMR and the lnVR REMA outperform the lnCVR REMA. The between-study variance τ² under the REMA was found to be negligible (95% HPD = [0.00, 0.00]).
Conclusions: The published data on antidepressants for the treatment of major depression is compatible with a near-constant treatment effect. Although it is impossible to rule out a substantial treatment effect heterogeneity, its existence seems rather unlikely. Since the average treatment effect of antidepressants falls short of clinical relevance, the current prescribing practice should be re-evaluated.
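The two effect sizes compared in this abstract can be illustrated with a minimal sketch. The numbers are illustrative only, and the small-sample bias corrections and the Bayesian REMA/REMR machinery used in the paper are omitted.

```python
import math

def ln_vr(sd_a, sd_p):
    """Log variability ratio: compares absolute SDs of active vs placebo."""
    return math.log(sd_a / sd_p)

def ln_cvr(mean_a, sd_a, mean_p, sd_p):
    """Log coefficient-of-variation ratio: compares relative variability
    (SD/mean), removing mean-variance coupling on scales where spread
    tends to grow with the mean."""
    return math.log((sd_a / mean_a) / (sd_p / mean_p))

# illustrative values: equal SDs but a lower mean score in the active arm
# give lnVR = 0 yet a nonzero lnCVR
vr = ln_vr(5.0, 5.0)                   # 0.0
cvr = ln_cvr(10.0, 5.0, 12.0, 5.0)     # log(1.2) > 0
```

This is why the two effect sizes can point in different directions, as in the abstract, where the variability ratio sat near 1 while the CVR indicated reduced relative variability.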

... Some important recent papers are Fraser [41], Hampel [51], and Hannig et al. [53]. We also mention the following excellent books: Fisher and Bennett [37], Fisher [36], Fisher and Bennett [38], Garthwaite and Jones [47] and Lehmann [78]. For excellent debates about fiducial inference and generalized fiducial inference, and debates with Bayes inference, we refer the readers to Pederson [107], Zabell [150] and Hannig [52] . ...

A review is provided of the concept of confidence distributions. Topics covered include: fundamentals, extensions, applications of confidence distributions, and available computer software. We expect that this review can serve as a source of reference and encourage further research on confidence distributions.

... In his 1956 paper, he describes a two-parameter twisted normal example that demonstrates that the fiducial construction is not unique for two-parameter models. In the published volume of Fisher's correspondence (Bennett 1990 ) the correspondence between Fisher and Tukey makes interesting reading. In 1955, Tukey sent Fisher a draft of his paper, exploring through examples the matter of non-uniqueness of fiducial probability. ...

John Wilder Tukey was a scientific generalist, a chemist by undergraduate training, a topologist by graduate training, an environmentalist by his work on Federal Government panels, a consultant to US corporations, a data analyst who revolutionized signal processing in the 1960s, and a statistician who initiated grand programmes whose effects on statistical practice are as much cultural as they are specific. He had a prodigious knowledge of the physical sciences, legendary calculating skills, an unusually sharp and creative mind, and enormous energy. He invented neologisms at every opportunity, among which the best known are ‘bit’ for binary digit, and ‘software’ by contrast with hardware, both products of his long association with Bell Telephone Labs. Among his legacies are the fast Fourier transform, one degree of freedom for non-additivity, statistical allowances for multiple comparisons, various contributions to exploratory data analysis and graphical presentation of data, and the jackknife as a general method for variance estimation. He popularized spectrum analysis as a way of studying stationary time series, he promoted exploratory data analysis at a time when the subject was not academically respectable, and he initiated a crusade for robust or outlier-resistant methods in statistical computation. He was for many years a scientific adviser to Presidents Eisenhower, Kennedy, Johnson and Nixon. A 1965 report he wrote was the impetus leading to Congressional action that created the Environmental Protection Agency, and a later 1976 report (4)* confirmed the threat of halocarbons to stratospheric ozone. His work for the State Department on the Nuclear Weapons Test Ban Treaty led him to develop data-analytic tools to distinguish explosions from earthquakes. Evidence of his influence can be seen across wide areas of science and technology, from oceanography to seismology, from topology to sampling and statistical graphics.
Among many honours, he was awarded the S.S. Wilks award of the American Statistical Association in 1965, the Medal of Honor of the Institute of Electrical and Electronics Engineers in 1982, and the US National Medal of Science in 1973.

... Fisher provided a seemingly endless list of caveats and conditions on its application: the observable quantities must be continuous, the statistic involved must be complete and sufficient, the pivotal function must be monotonic. He wrote to Barnard about this in 1962 (Bennett, 1990). Later authors have argued that Fisher imposed far too many restrictions and that one can renounce some of them, or at least replace them with something more convenient. For instance, he asserted that Fisher imposed the condition that a proper fiducial distribution must depend on the data only through sufficient statistics and parameters, without explaining beyond reasonable doubt why only this kind of dependence remains valid as a post-data prescription. ...

This paper describes the general principles and methods of fiducial inference. A brief survey of its competing inferential theories, as well as a comparison with them, is also provided. Arguments in favour of applying the fiducial method to the parameters of discrete random variables are given, and, as an application, the fiducial distribution associated with binomial proportions is shown to be of the beta family.
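The beta-family connection for a binomial proportion can be illustrated with a short sketch. This shows the closely related beta-quantile (Clopper-Pearson) construction, whose limits are quantiles of Beta(x, n-x+1) and Beta(x+1, n-x); it is not necessarily the paper's exact fiducial recipe.

```python
from scipy.stats import beta

def beta_interval(x, n, level=0.95):
    """Equal-tailed interval for a binomial proportion built from beta
    quantiles (the Clopper-Pearson construction)."""
    a = (1 - level) / 2
    lo = beta.ppf(a, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - a, x + 1, n - x) if x < n else 1.0
    return lo, hi

lo, hi = beta_interval(7, 20)  # brackets the observed proportion 0.35
```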

... Further, the interpretation of the word "probability" itself differs in the Bayesian and frequentist philosophies. This difference characterized the debate between Harold Jeffreys, a Bayesian, and Sir Ronald A. Fisher, the founder of frequentist statistics (see reviews in Howie, 2002; Bennett, 1990). In Jeffreys' formulation of Bayesian inference, probabilities are epistemic—that is, representing a degree of confidence or belief that a hypothesis is true (Jeffreys, 1961). ...

The objective Bayesian approach relies on the construction of prior distributions that reflect ignorance. When topologies are considered equally probable a priori, clades cannot be. Shifting justifications have been offered for the use of uniform topological priors in Bayesian inference. These include: (i) topological priors do not inappropriately influence Bayesian inference when they are uniform; (ii) although clade priors are not uniform, their undesirable influence is negated by the likelihood function, even when data sets are small; and (iii) the influence of nonuniform clade priors is an appropriate reflection of knowledge. The first two justifications have been addressed previously: the first is false, and the second was found to be questionable. The third and most recent justification is inconsistent with the first two, and with the objective Bayesian philosophy itself. Thus, there has been no coherent justification for the use of nonflat clade priors in Bayesian phylogenetics. We discuss several solutions: (i) Bayesian inference can be abandoned in favour of other methods of phylogenetic inference; (ii) the objective Bayesian philosophy can be abandoned in favour of a subjective interpretation; (iii) the topology with the greatest posterior probability, which is also the tree of greatest marginal likelihood, can be accepted as optimal, with clade support estimated using other means; or (iv) a Bayes factor, which accounts for differences in priors among competing hypotheses, can be used to assess the weight of evidence in support of clades. ©The Willi Hennig Society 2009

... In accordance with Eq. (3), pure maximum likelihood estimation ignores the existence of any prior information. In the past, this triggered some controversy between the frequentist and Bayesian theories of probability [61,42]. To a degree, this controversy has been resolved in geosciences by using regularization approaches that include, to a certain extent, prior information. ...

Parameter identification is one of the key elements in the construction of models in geosciences. However, inherent difficulties such as the instability of ill-posed problems or the presence of multiple local optima may impede the execution of this task.
Regularization methods and Bayesian formulations, such as the maximum a posteriori estimation approach, have been used to overcome those complications. Nevertheless, in some instances, a more in-depth analysis of the inverse problem is advisable before obtaining estimates of the optimal parameters. The Markov Chain Monte Carlo (MCMC) methods used in Bayesian inference have been applied in the last 10 years in several fields of geosciences, such as hydrology, geophysics and reservoir engineering.
In the present paper, a compilation of basic tools for inference and a case study illustrating the practical application of them are given. Firstly, an introduction to the Bayesian approach to the inverse problem is provided together with the most common sampling algorithms with MCMC chains. Secondly, a series of estimators for quantities of interest, such as the marginal densities or the normalization constant of the posterior distribution of the parameters, are reviewed. Those reduce the computational cost significantly, using only the time needed to obtain a sample of the posterior probability density function. The use of the information theory principles for the experimental design and for the ill-posedness diagnosis is also introduced. Finally, a case study based on a highly instrumented well test found in the literature is presented. The results obtained are compared with the ones computed by the maximum likelihood estimation approach.
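The MCMC machinery the paper reviews can be illustrated by a minimal random-walk Metropolis sampler on a toy one-parameter inverse problem. All data values and tuning settings below are hypothetical, chosen only to make the sketch self-contained.

```python
import math
import random

def metropolis(log_post, x0, n_steps=5000, step=0.5, seed=1):
    """Random-walk Metropolis: a minimal MCMC chain targeting an
    unnormalized log posterior density."""
    random.seed(seed)
    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(n_steps):
        prop = x + random.gauss(0.0, step)
        lp_prop = log_post(prop)
        # accept with probability min(1, posterior ratio)
        if math.log(random.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

# toy inverse problem: infer one parameter from noisy "observations"
data = [2.1, 1.9, 2.3, 2.0]
def log_post(theta):                      # Gaussian likelihood, flat prior
    return -sum((d - theta) ** 2 for d in data) / (2 * 0.2 ** 2)

chain = metropolis(log_post, x0=0.0)
post_mean = sum(chain[1000:]) / len(chain[1000:])  # discard burn-in
```

Samples from the chain (after burn-in) can then feed the estimators the paper reviews, such as marginal densities or posterior summaries.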

The problem posed by exact confidence intervals (CIs) which can be either all‐inclusive or empty for a nonnegligible set of sample points is known to have no solution within CI theory. Confidence belts causing improper CIs can be modified by using margins of error from the renewed theory of errors initiated by J. W. Tukey—briefly described in the article—for which an extended Fraser's frequency interpretation is given. This approach is consistent with Kolmogorov's axiomatization of probability, in which a probability and an error measure obey the same axioms, although the connotation of the two words is different. An algorithm capable of producing a margin of error for any parameter derived from the five parameters of the bivariate normal distribution is provided. Margins of error correcting Fieller's CIs for a ratio of means are obtained, as are margins of error replacing Jolicoeur's CIs for the slope of the major axis. Margins of error using Dempster's conditioning that can correct optimal, but improper, CIs for the noncentrality parameter of a noncentral chi‐square distribution are also given.
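The improper-interval pathology this abstract addresses can be seen in the standard Fieller construction for a ratio of means, sketched below for independent arms with a normal (z) approximation. This illustrates the problem the paper's margins of error are designed to fix; it is not the paper's own algorithm, and the inputs are hypothetical.

```python
import math

def fieller_ci(mx, vx, my, vy, z=1.96):
    """Fieller interval for the ratio of means mx/my (independent arms);
    vx, vy are the variances of the mean estimates. Returns (lo, hi)
    when the confidence set is a bounded interval; returns None in the
    degenerate cases where the set is unbounded or the whole line -
    exactly the improper-CI behaviour discussed in the abstract."""
    a = my ** 2 - z ** 2 * vy          # quadratic coefficients in theta
    b = -2 * mx * my
    c = mx ** 2 - z ** 2 * vx
    disc = b * b - 4 * a * c
    if a <= 0 or disc < 0:
        return None
    r = math.sqrt(disc)
    return ((-b - r) / (2 * a), (-b + r) / (2 * a))
```

For a well-estimated denominator, e.g. `fieller_ci(10.0, 1.0, 5.0, 0.25)`, the interval brackets the observed ratio 2; when the denominator mean is indistinguishable from zero, the construction degenerates and `None` is returned.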

Randomised controlled trials (RCTs) with continuous outcomes usually only examine mean differences in response between trial arms. If the intervention has heterogeneous effects (e.g. the effect of the intervention differs by individual characteristics), then outcome variances will also differ between arms. However, power of an individual trial to assess heterogeneity is lower than the power to detect the same size of main effect. The aim of this work was to describe and implement methods for examining heterogeneity of effects of interventions, in trials with individual patient data (IPD) and also in meta-analyses using summary data. Several methods for assessing differences in variance were applied using IPD from a single trial, and summary data from two meta-analyses.
In the single trial there was agreement between methods, and the difference in variance was largely due to differences in depression at baseline. In two meta-analyses, most individual trials did not show strong evidence of a difference in variance between arms, with wide confidence intervals. However, both meta-analyses showed evidence of greater variance in the control arm, and in one example this was perhaps because mean outcome in the control arm was higher.
Low power of individual trials to examine differences in variance can be overcome using meta-analysis. Evidence of differences in variance should be followed up to identify potential effect modifiers and to explore other possible causes, such as varying compliance.

Square lattice designs are often used in trials of new varieties of various agricultural crops. However, there are no square lattice designs for 36 varieties in blocks of size six for four or more replicates. Here, we use three different approaches to construct designs for up to eight replicates. All the designs perform well in terms of giving a low average variance of variety contrasts.
Supplementary materials accompanying this paper appear online.

Observed response to regular exercise training differs widely between individuals, even in tightly controlled research settings. However, the respective contributions of random error and true interindividual differences, as well as the relative frequency of non-responders, are disputed. Specific challenges of analyses on the individual level, as well as a striking heterogeneity in definitions, may partly explain these inconsistent results. Repeated testing during the training phase specifically addresses the requirements of analyses on the individual level. Here we report a first implementation of this innovative design amendment in a head-to-head comparison of existing analytical approaches. To allow for comparative implementation of approaches, we conducted a controlled endurance training trial (one year of walking/jogging 3 days/week for 45 min at 60% heart rate reserve) in healthy, untrained subjects (n=36, age 46±8; BMI 24.7±2.7; VO2max 36.6±5.4). In the training group, additional VO2max tests were conducted after 3, 6 and 9 months. Duration of the control condition was 6 months due to ethical constraints. General efficacy of the training intervention was verified by a significant increase in VO2max in the training group (p<0.001 vs. control). Individual training response of relevant magnitude (>0.2 × baseline variability in VO2max) could be demonstrated by several approaches. Regarding the classification of individuals, only 11 out of 20 subjects were consistently classified, demonstrating remarkable disagreement between approaches. These results support relevant interindividual variability in training efficacy and stress the limitations of a responder classification. Moreover, this proof of concept underlines the need for tailored methodological approaches to well-defined problems.
Key words: Variance components, interaction, classification, personalized medicine

We develop a unified theory of designs for controlled experiments that balance baseline covariates a priori (before treatment and before randomization) using the framework of minimax variance and a new method called kernel allocation. We show that any notion of a priori balance must go hand in hand with a notion of structure, since with no structure on the dependence of outcomes on baseline covariates complete randomization (no special covariate balance) is always minimax optimal. Restricting the structure of dependence, either parametrically or non-parametrically, gives rise to certain covariate imbalance metrics and optimal designs. This recovers many popular imbalance metrics and designs previously developed ad hoc, including randomized block designs, pairwise-matched allocation and rerandomization. We develop a new design method called kernel allocation based on the optimal design when structure is expressed by using kernels, which can be parametric or non-parametric. Relying on modern optimization methods, kernel allocation, which ensures nearly perfect covariate balance without biasing estimates under model misspecification, offers sizable advantages in precision and power as demonstrated in a range of real and synthetic examples. We provide strong theoretical guarantees on variance, consistency and rates of convergence and develop special algorithms for design and hypothesis testing. © 2017 The Royal Statistical Society and Blackwell Publishing Ltd.
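Rerandomization, one of the designs the framework above recovers, can be sketched minimally. The covariate and acceptance threshold are hypothetical, and a simple mean-difference imbalance metric stands in for the kernel-based metrics developed in the paper.

```python
import random

def imbalance(x, assign):
    """Absolute difference in covariate means between arms: a deliberately
    simple 1-D imbalance metric."""
    t = [v for v, a in zip(x, assign) if a]
    c = [v for v, a in zip(x, assign) if not a]
    return abs(sum(t) / len(t) - sum(c) / len(c))

def rerandomize(x, threshold, seed=0):
    """Redraw equal-sized treatment assignments until the covariate
    imbalance falls below the acceptance threshold."""
    rng = random.Random(seed)
    n = len(x)
    base = [True] * (n // 2) + [False] * (n - n // 2)
    while True:
        assign = base[:]
        rng.shuffle(assign)
        if imbalance(x, assign) < threshold:
            return assign

ages = list(range(20, 60, 2))           # hypothetical baseline covariate, n = 20
assign = rerandomize(ages, threshold=1.0)
```

Tightening the threshold buys covariate balance at the cost of more redraws; kernel allocation, as described above, instead optimizes balance directly under an assumed structure.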

I pick up a very few points of minor disagreement with Stefan Wellek's comprehensive review of P-values in this journal. I conclude that P-values have a limited function in statistical inference but can nevertheless have their uses.

Statistics acquires its full philosophical depth only when the general problem of induction is considered. It can even be regarded as the most fully worked-out attempt, both theoretically grounded and practically successful, to solve that problem. Tukey's formulation already hints that statistics offers not one solution but a whole spectrum of special solutions. Just as there is no philosopher's stone, there is no single principle of induction. Rather, there is a whole range of approaches and various classes of arguments for justifying generalizations.

We know a lot about the phenomena involved in the use of our techniques. Some of what we know has been learned deductively, using assumptions and mathematics. We do learn from practice, as well as from deduction and from experimental sampling. We can practice a science. We need not hide behind a mysterious shield of false-to-fact deduction!

Reichenbach (1951: 117) measured the progress of philosophy by the fact that one learns, more and more, which questions are better left unasked. Referring to this, Tukey (1961: 148) says: “[…] statistics has grown, and must continue to grow, by learning what questions not to fear.” Tukey (1986b: 289) adds: “[…] the development of statistics can be portrayed as learning more and more things about which certainty should not be sought.” Tukey (1986c: 588f) summarizes: “[…] the history of statistics has involved - indeed, very nearly consisted of - successive enforced retreats from certainty. Each step of that retreat has brought further gains. It is fair to say that statistics has made its greatest progress by having to move away from certainty, to move in a direction some would feel to be backward […] Each of these steps has built on the past, most have led to a weaker and weaker form of certainty.”

Some statisticians and economists might find it surprising to learn that statistics and economics share common roots going back to ‘Political Arithmetic’ in the mid-17th century. The primary objective of this article is to revisit the common roots and trace the parallel development of both disciplines up to and including the 20th century, and to attempt to signpost certain methodological lessons that were missed along the way to the detriment of both disciplines. The emphasis is primarily on methodological developments, with less attention paid to institutional developments.

We now take up Kempthorne's call (p. 247) and extend the lower part of the research cycle into a complete philosophy of science. The handling of data thus becomes the heart of quantitative empirical science, which is often burdened with considerable uncertainty, while the cycle as a whole (Section 5.2) can be understood as the “wheel of knowledge” of the empirical sciences. Dempster (1990: 262) puts it as follows: “Statisticians participate directly and indirectly in scientific developments in many fields where statistical methodology is applied, so are well placed to develop a philosophy of science, including statistical science, that accords with the realities of practice.”

‘To work for that Galtonian renascence has been the writer’s main aim in life’ wrote Karl Pearson in April 1914, and for us to explore the extent to which Pearson was successful in transmitting and elaborating his Galtonian statistical inheritance it is natural to start with the work from whose preface this quotation is taken, The Life, Letters and Labours of Francis Galton, published in three volumes (but four parts) by Cambridge University Press between 1914 and 1930 (Pearson, 1914–30). That renascence was being produced in innumerable branches of science, wrote Pearson, by the ramifications of Galton’s methods, and ‘will be as epoch-making in the near future as the Darwinian theory of evolution was in biology from 1860 to 1880’. Pearson, having in his preface just taken a swipe at William Bateson and the Mendelian school, added for good measure ‘… and which has encountered and will encounter no less bigoted opposition from both the learned and the lay’. He was not far off the mark, but it was to be Pearson’s own statistical work rather than Galton’s methods that encountered opposition.

Throughout the nineteenth century, the most commonly used statistical procedure was estimation by means of least squares. In 1894, Karl Pearson broke new ground by proposing an alternative approach: the method of moments. Of this method, Fisher, in his fundamental paper of 1922 [18] (discussed in Sect. 1.5), wrote that it is “without question of great practical utility.” On the other hand, he points out that it requires the population moments involved to be finite, and “that it has not been shown, except in the case of a normal curve, that the best values will be obtained by the method of moments.” And he asserts that “a more adequate criterion is required.” For this purpose, he proposes the method of maximum likelihood.

As stated in the Preface, it is the aim of this book to trace the creation of classical statistics, and to show that it was principally the work of two men, Fisher and Neyman. Since the main story is somewhat lost in the details, let us now review their contributions to hypothesis testing, estimation, design, and the philosophy of statistics.

R. A. Fisher transformed the statistics of his day from a modest collection of useful ad hoc techniques into a powerful and systematic body of theoretical concepts and practical methods. This achievement was all the more impressive because at the same time he pursued a dual career as a biologist, laying down, together with Sewall Wright and J. B. S. Haldane, the foundations of modern theoretical population genetics.

Better known by his pseudonym, “Student,” Gosset’s name is associated with the discovery of the t-distribution and its use. He had a profound effect on the practice of statistics in industry and agriculture.

This article brings attention to some historical developments that gave rise to the Bayes factor for testing a point null hypothesis against a composite alternative. In line with current thinking, we find that the conceptual innovation - to assign prior mass to a general law - is due to a series of three articles by Dorothy Wrinch and Sir Harold Jeffreys (1919, 1921, 1923). However, our historical investigation also suggests that in 1932 J.B.S. Haldane made an important contribution to the development of the Bayes factor by proposing the use of a mixture prior comprising a point mass and a continuous probability density. Jeffreys was aware of Haldane's work and it may have inspired him to pursue a more concrete statistical implementation for his conceptual ideas. It thus appears that Haldane may have played a much bigger role in the statistical development of the Bayes factor than has hitherto been assumed.

Various sources of variation in observed response in clinical trials and clinical practice are considered, and ways in which the corresponding components of variation might be estimated are discussed. Although the issues have been generally well-covered in the statistical literature, they seem to be poorly understood in the medical literature and even the statistical literature occasionally shows some confusion. To increase understanding and communication, some simple graphical approaches to illustrating issues are proposed. It is also suggested that reducing variation in medical practice might make as big a contribution to improving health outcome as personalising its delivery according to the patient. It is concluded that the common belief that there is a strong personal element in response to treatment is not based on sound statistical evidence. Copyright © 2015 John Wiley & Sons, Ltd.

In 1930 R.A. Fisher put forward the fiducial argument. This paper discusses the argument and its origins in Fisher's earlier work. It also emphasises the contribution of Mordecai Ezekiel to the 1930 publication.

We summarize and extend the concept of Gaussian, or normal, distributions into multivariate statistics over many dimensions. We demonstrate how multivariate statistics can be applied to probability distributions. Through assumptions in the linearization of the inverse problem, we show that the best-fit inverse model parameters are normally distributed, with mean values and associated variance and covariance values that obey Gaussian statistics. Variance and covariance values describe how the model parameters interact with each other. By changing one value in the model parameter vector, other parameters are changed through the covariance that links them. We apply Gaussian statistics over many dimensions to query our models for statistically meaningful questions that can only be answered by taking the integral of the multivariate distribution over the multidimensional space that contains the model parameter values. We illustrate this with an example of aquifer detection, using resistivity limits, for an electromagnetic transect adjacent to the Gascoyne River near Carnarvon, Western Australia.
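The kind of integral over a multidimensional parameter region described here can be approximated by Monte Carlo sampling from the multivariate normal. The mean vector, covariance matrix and thresholds below are illustrative only, not values from the paper's aquifer example.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical 2-parameter posterior from a linearized inversion:
# mean vector and covariance matrix (off-diagonals couple the parameters)
mean = np.array([2.0, -1.0])
cov = np.array([[0.25, 0.10],
                [0.10, 0.50]])

# estimate P(theta_1 > 1.5 and theta_2 > -1.5) by sampling: the integral
# of the multivariate density over that rectangular region
samples = rng.multivariate_normal(mean, cov, size=100_000)
p = np.mean((samples[:, 0] > 1.5) & (samples[:, 1] > -1.5))
```

Because the covariance links the parameters, this joint probability differs from the product of the two marginal probabilities, which is exactly why the question must be answered with a multivariate integral rather than parameter by parameter.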

Drug development is not the only industrial-scientific enterprise subject to government regulations. In some fields of ecology and environmental sciences, the application of statistical methods is also regulated by ordinance. Over the past 20 years, ecologists and environmental scientists have argued against an unthinking application of null hypothesis significance tests. More recently, Canadian ecologists have suggested a new approach to significance testing, taking account of the costs of both type I and type II errors. In this paper, we investigate the implications of this for testing in drug development and demonstrate that its adoption leads directly to the likelihood principle and Bayesian approaches. Copyright © 2015 John Wiley & Sons, Ltd.

One hundred years ago, an author under the pseudonym of “Student” published a paper which was to become famous. It was entitled “The probable error of a mean”. But what we now know as Student's t-test attracted little attention. It took another statistician of genius, R. A. Fisher, to amend, publicise and make it ubiquitous. But both Student's and Fisher's published versions were based upon faulty data. Stephen Senn reminds us of the third dedicated researcher and the quarter of a century delay before the story behind Student's t-test emerged.

Ronald Fisher believed that "The theory of inverse probability is founded upon an error,
and must be wholly rejected." This note describes how Fisher divided responsibility for the
error between Bayes and Laplace. Bayes he admired for formulating the problem, producing a
solution and then withholding it; Laplace he blamed for promulgating the theory and
for distorting the concept of probability to accommodate the theory. At the end of his
life Fisher added a refinement: in the Essay Bayes had anticipated one of Fisher's
own fiducial arguments.

The p-value serves a valuable purpose in the evaluation and interpretation of research findings, and most computer programs report the results of statistical tests as probability values. It enables researchers to set their own level of significance and to reject or accept the null hypothesis according to their own criterion rather than a fixed level of significance. Many statisticians may, in some situations, disagree on its appropriate use and on its interpretation as a measure of evidence. Despite advances in applied statistics, many researchers still rely on a set of tables (i.e. z, t, F, and chi-square) that do not give upper-tail probabilities directly, so the tabulated values must be converted to obtain the required p-values. In recent years statisticians have prepared tables for z, t, F, and chi-square that give the upper-tail probabilities directly. In many situations, evaluating exact p-values for t and F tests is difficult without a computer package; direct methods may therefore be used to evaluate the exact p-values. In this article, a description of the p-value is presented along with a review of the literature, and the tables that provide upper-tail probabilities are indicated. Numerical examples are provided, and comments and suggestions are made.
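
The "direct method" for the z case takes only a few lines of standard-library Python; the function name below is my own, not from the article:

```python
from math import erfc, sqrt

def upper_tail_z(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard normal variate,
    computed directly rather than read (and converted) from a table."""
    return 0.5 * erfc(z / sqrt(2.0))

# A familiar check: the conventional two-sided 5% critical value.
print(upper_tail_z(1.96))  # about 0.025
```

Exact t and F tail probabilities additionally require the incomplete beta function, which is why, as the abstract notes, they are hard to evaluate without a computer package.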

The preceding two chapters describe the two stages of the founding of the field of hypothesis testing. The first stage consisted
of Fisher’s development of a cohesive methodology of basic tests under the assumption of normality. It was followed by the
Neyman-Pearson theory of optimal tests, which as its major application provided a justification of the tests Fisher had proposed
on intuitive grounds. What, one wonders, was Fisher's reaction to this new perspective?

We saw in Sect. 1.3 that Student in 1908a brought a new point of view to statistical inference by determining the small-sample
(exact) distribution of what he called z, now called t, under the assumption of normality. Student found the correct form of this distribution but was not able to prove it.

A perspective on statistical inference is proposed that is broad enough to encompass modern Bayesian and traditional Fisherian
thinking, and interprets frequentist theory in a way that gives appropriate weights to both science and mathematics, and to
both objective and subjective elements. The aim is to inject new thinking into a field held back by a longstanding lack of
consensus.

A survey of 252 prospective, comparative studies reported in five frequently cited biomedical journals revealed that experimental groups were constructed by randomization in 96% of cases and by random sampling in only 4%. The median group sizes ranged from 4 to 12. In the randomized studies in which measurements were made on a continuous scale, comparisons of location were made by t or F tests in 84% of cases, and by nonparametric rank-order tests in the remainder. Because randomization rather than random sampling is the norm in biomedical research and because group sizes are usually small, exact permutation or randomization tests for differences in location should be preferred to t or F tests.
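
An exact permutation test of the kind the survey recommends is easy to carry out at these group sizes. The sketch below uses invented data for two groups of four and enumerates all C(8, 4) = 70 reassignments of the pooled values:

```python
from itertools import combinations

# Invented measurements for two small groups (n = 4 each), matching the
# survey's typical group sizes; these numbers are illustrative only.
a = [5.0, 6.0, 7.0, 8.0]
b = [1.0, 2.0, 3.0, 4.0]

pooled = a + b
observed = sum(a) / len(a) - sum(b) / len(b)  # observed difference in means

# Enumerate every way of relabelling 4 of the 8 pooled values as group "a",
# counting relabellings at least as extreme as the observed difference.
count = 0
total = 0
for idx in combinations(range(len(pooled)), len(a)):
    sel = [pooled[i] for i in idx]
    rest = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = sum(sel) / len(sel) - sum(rest) / len(rest)
    if abs(diff) >= abs(observed):
        count += 1
    total += 1

p = count / total  # exact two-sided permutation p-value
print(f"exact permutation p-value: {p:.4f}")
```

With larger groups, one samples the relabellings at random (a randomization test) instead of enumerating them all.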

Histories of econometrics describe what econometricians did with the ideas of statisticians—correlation, regression, maximum likelihood, testing statistical hypotheses, etc. The transfer of these ideas was made possible by institutions, practices, and personal contacts between statisticians and econometricians. This essay complements the usual history of econometrics with an account of these transactions from the statisticians' side. I concentrate on four statisticians and their interactions with econometricians: Karl Pearson, whose influence was exercised in the years before the First World War; Ronald Fisher, who began to be influential in the late 1920s; Jerzy Neyman, who appeared in the late 1930s; and Abraham Wald in the early 1940s.

A sketch of the history of Fisher’s fiducial argument is accompanied by a version, called “pivotal inference” which, it seems, may give a consistent revised version of the ideas involved.

The graphic portrayal of quantitative information has deep roots. These roots reach into histories of thematic cartography, statistical graphics, and data visualization, which are intertwined with each other. They also connect with the rise of statistical thinking up through the 19th century, and developments in technology into the 20th century. From above ground, we can see the current fruit; we must look below to see its pedigree and germination. There certainly have been many new things in the world of visualization; but unless you know its history, everything might seem novel.

It is argued that statistical testing has been overvalued because it is perceived as an optimal, objective, algorithmic method of testing hypotheses. Researchers need to be made aware of the subjective nature of statistical inference in order not to make too much of it. Examples are given of aspects of data that are ignored in the computation of p values but are relevant to their interpretation. These include the fit of the chance distribution, the presence of influential points, the possibilities for post hoc selectivity, the presence of expected and unexpected trends in the data, and the amount of sampling variability that is present. Researchers should be taught that although probabilistic reasoning is a deductive process, making inferences from data is not. There is always potentially relevant information available over and above that which has been taken account of by any p value. Indeed, it is noted that a probability can never characterize all the uncertainty regarding an event because of problems to do with self-reference. Theories of statistical inference need to be weakened to take these ideas into account. Statistical tests assess the implausibility of supposing that the sign of a difference can be explained away as due to sampling error and they work only to the extent that this implausibility can be inferred from the size of the test statistic. The process of inferring plausibility is a subjective one and is inconsistent with the Neyman-Pearson Theory of statistical testing and Fisher's notion of fiducial statistical tests. Probabilities need to be assessed in the context of what they have and have not taken into account and this weakens Bayesian statistical inferences as well as classical ones.

Ronald A. Fisher's 1921 article on mathematical statistics (submitted and read in 1921; published in 1922) was arguably the most influential article on that subject in the twentieth century, yet up to that time Fisher was primarily occupied with other pursuits. A number of previously published documents are examined in a new light to argue that the origin of that work owes a considerable (and unacknowledged) debt to a challenge issued in 1916 by Karl Pearson.

The early history of the Gram-Charlier series is discussed from three points of view: (1) a generalization of Laplace's central limit theorem, (2) a least squares approximation to a continuous function by means of Chebyshev-Hermite polynomials, (3) a generalization of Gauss's normal distribution to a system of skew distributions. Thiele defined the cumulants in terms of the moments, first by a recursion formula and later by an expansion of the logarithm of the moment generating function. He devised a differential operator which adjusts any cumulant to a desired value. His little known 1899 paper in Danish on the properties of the cumulants is translated into English in the Appendix.

The group theorist William Burnside devoted much of the last decade of his life to probability and statistics. The work led to contact with Ronald Fisher who was on his way to becoming the leading statistician of the age and with Karl Pearson, the man Fisher supplanted. Burnside corresponded with Fisher for nearly three years until their correspondence ended abruptly. This paper examines Burnside's interactions with the statisticians and looks more generally at his work in probability and statistics.

In 1930 R. A. Fisher put forward the fiducial argument. This paper discusses the argument and its origins in Fisher's earlier work. It also emphasises the contribution of Mordecai Ezekiel to the 1930 publication.

The paper discusses the scope and influence of eugenics in defining the scientific programme of statistics and the impact of the evolution of biology on social scientists. It argues that eugenics was instrumental in providing a bridge between sciences, and therefore created both the impulse and the institutions necessary for the birth of modern statistics in its applications first to biology and then to the social sciences. Looking at the question from the point of view of the history of statistics and the social sciences, and mostly concentrating on evidence from the British debates, the paper discusses how these disciplines became emancipated from eugenics precisely because of the inspiration of biology. It also relates how social scientists were fascinated and perplexed by the innovations taking place in statistical theory and practice.
