
The Role of Meta-Analysis in the Significance Test Controversy

Hogrefe Publishing
European Psychologist

Abstract

The critique against significance testing has been increasingly acknowledged in recent years. This paper focuses on the relation between meta-analysis and this controversy. A contradiction in the literature can be seen in that significance testing has been blamed for the poor accumulation of knowledge in psychology, while at the same time meta-analytic reviews have claimed the opposite. Although a majority of meta-analytic experts argue against significance testing, this critique cannot account for the success of meta-analysis. Rather, it may be that meta-analysis has facilitated the recognition of the significance test critique. Taking the significance testing critique seriously has important implications for meta-analysis in that its research base (e.g., studies) is viewed as unreliable. Although the significance test controversy may lead to further fragmentation of psychology, it is not clear that this will negatively affect the practice of meta-analysis.
Gerhard Andersson
Department of Psychology, Uppsala University, Sweden
Keywords: Meta-analysis, significance test debate, methodological critique, accumulation of research.
Introduction
Meta-analysis consists of a host of techniques used for quantitatively summarizing findings from a large body of empirical research. Since the advent of meta-analysis, many of its proponents have argued that it is a way to avoid the problems associated with significance testing (Hedges & Olkin, 1985; Hunter & Schmidt, 1990; Rosenthal, 1991). These problems are commonly described under the heading of the “significance test controversy,” and have been covered in books (Morrison & Henkel, 1970; Chow, 1996), in several psychology journals recently (e.g., Chow, 1998; Kirk, 1996; Schmidt, 1996; Shrout, 1997), and in the past (Bakan, 1966; Carver, 1978; Lykken, 1968; Rozeboom, 1960). Although perhaps not familiar to all psychologists, the practice of significance testing is due to a hybridization of the contributions of J. Neyman, E.S. Pearson, and R.A. Fisher (Chow, 1996; Gigerenzer, 1993; Goodman, 1993). More specifically, according to Chow (1996), significance testing (based on the null hypothesis procedure) has two sources: a statistical decision (based on the p value) and an inferential procedure (based on conditional syllogisms). The p value stands for the probability of obtaining data as extreme or more extreme given that the null hypothesis is true: p(data | null hypothesis). Although there are different strands in the critique against significance testing and the use of the null hypothesis procedure, there are several issues on which a majority of opponents agree. The criticism against significance testing is often directed against the
Gerhard Andersson was educated at Uppsala University, Sweden (MSc and PhD in Clinical Psychology). He is now Associate Professor of Psychology at Uppsala University. He also holds a position as clinical psychologist at the Department of Audiology, Uppsala University Hospital. His current research interests are health psychology, methodology, and clinical psychology. He has published papers on hearing impairment, dizziness, tinnitus, and optimism. He is also the Editor-in-Chief of the Scandinavian Journal of Behaviour Therapy.
Correspondence concerning this article should be addressed to Gerhard Andersson, Department of Psychology, Uppsala University, Box 1225, SE-751 42 Uppsala, Sweden (tel. +46 18 471-2116, fax +46 18 471-2123, E-mail Gerhard.Andersson@psyk.uu.se).
European Psychologist, Vol. 4, No. 2, June 1999, pp. 75–82
© 1999 Hogrefe & Huber Publishers
Article
Effect size (ES) is a necessary complement to statistical hypothesis testing; however, researchers rarely report ES in their papers. This work provides a conceptual review of the ES estimates for the difference between two means, taking into account the most important algorithms and their interpretation. We also provide a guide to the freely available and easy-to-use ViSta statistical software to compute ES. We hope this paper contributes to the diffusion of ES methods and encourages their use among researchers in psychology.
... Continuing with the same example, we can state that the distribution of the experimental group's scores betters the distribution of the control group's scores by 82%, because that is the area under the normal curve that corresponds to a z score = .92. Another important advantage of this ES measure is that it provides a common measuring stick to compare the relative importance of interventions and programs across different research studies, e.g. in meta‐analytical studies (Anderson, 1999). ...
... Cohen's d (1988, 1994) is one of the most widely used measures in specialized publications to calculate ES, and in meta‐analytical studies (Anderson, 1999; Hunter & Schmidt, 2004). To calculate it, see Equation 1. Cohen's d can also be calculated from t‐test results (Thalheimer & Cook, 2002). ...
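The calculations these excerpts refer to can be sketched in a few lines. The block below is a minimal illustration, not the code of the cited papers or of ViSta: the function names and the two small samples are hypothetical, the pooled standard deviation uses n - 1 denominators (a common textbook convention; some sources define d with n instead), and the percentage interpretation assumes normally distributed scores with equal variances.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(treated, control):
    """Standardized mean difference with a pooled SD (n - 1 denominators, an assumed convention)."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


def d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to d under the same pooled-SD convention."""
    return t * sqrt(1 / n1 + 1 / n2)


def proportion_below(d):
    """Share of the control distribution below the treated-group mean (normality assumed)."""
    return NormalDist().cdf(d)


# Hypothetical scores, for illustration only.
treated = [12, 15, 14, 16, 13]
control = [10, 12, 11, 13, 9]

print(cohens_d(treated, control))         # d computed from raw scores
print(d_from_t(3.0, 5, 5))                # the same d recovered from t = 3.0
print(round(proportion_below(0.92), 2))   # 0.82, the 82% quoted in the excerpt
```

With equal group sizes and these conventions, the two routes to d agree, which is why a reported t value and the group sizes are often enough to recover an effect size for a meta-analysis.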
Article
Effect size measures are recognized as a necessary complement to statistical hypothesis testing because they provide important information that such tests alone cannot offer. In this paper we: a) briefly review the importance of effect size measures, b) describe some calculation algorithms for the case of the difference between two means, and c) provide a new and easy-to-use computer program to perform these calculations within ViSta, “The Visual Statistics System”. A worked example is also provided to illustrate some practical issues concerning the interpretation and limits of effect size computation. The audience for this paper includes novice researchers as well as ViSta users interested in applying effect size measures.
... To this end, various formal techniques have been developed that make it possible to quantify the effect size (ES) for several statistical tests common in psychological research, such as the t test, the correlational analysis r, and the analysis of variance, among others (Cohen, 1988). These ES estimation techniques are of practical interest in psychology, not only as a necessary complement to hypothesis tests, but also because they offer a common metric on which to integrate research results in meta-analytic studies (Anderson, 1999; Macbeth, cited by Kohan & Razumiejczyk, in press). This interest has led the American Psychological Association (APA) to encourage their use among researchers in psychology (Thompson, 1998), and has led journals to request, increasingly, not only statistics but also their ES (Hunter & Schmidt, 2004). ...
... Cohen's d, then, uses this device to calculate the ES. The standard deviation in Cohen's d is, as with Hedges' g presented in Equations 5 and 6, a measure that combines the standard deviations of the two groups, although d does not use the n - 1 device. Cohen's d (1988) is one of the most widely used measures in specialized publications for calculating the ES and in meta-analytic studies (Anderson, 1999; Hunter & Schmidt, 2004). Its computation is presented in Equation 9. ...
Article
Effect size (ES) estimation is currently regarded as a necessary complement to hypothesis testing; nevertheless, its use is still not widespread among researchers in psychology. This paper offers a theoretical review of ES estimates for the case of the difference between two means, considering the most important algorithms and their interpretation. In addition, a new program for calculating ES within the ViSta system is presented and described. The program is simple to use and freely available. It is hoped that this work will help disseminate these procedures and encourage their use among researchers in psychology.
... Nonetheless, perhaps an even greater contributor to the now faster pace of change is the wide acceptance of meta-analysis (e.g., Andersson, 2003). The critique of NHST and the practice of meta-analysis should be seen as distinct developments (Andersson, 1999). As noted above, it is still the case that p levels are completely determined by the size of the effect in combination with the number of observations (N), holding measurement reliability and control over confounding variables constant. ...
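The dependence of p on effect size and N noted in this excerpt can be made concrete with a rough sketch. The function below is a hypothetical illustration rather than anything taken from the cited paper: it assumes two independent groups of equal size and uses a normal (z) approximation to the t test, so the values are only approximate for small samples.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sided_p(d, n_per_group):
    """Approximate two-sided p for a standardized mean difference d.

    Assumes two independent groups of equal size and a normal
    approximation to the t distribution (illustrative only).
    """
    z = abs(d) * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))


# The same small effect (d = 0.2) at increasing sample sizes.
for n in (20, 100, 500):
    print(n, round(approx_two_sided_p(0.2, n), 4))
```

Holding d fixed at 0.2, the approximate p value is far above .05 with 20 participants per group and far below it with 500, even though the effect itself has not changed.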
Article
The practice of statistical inference in psychological research is critically reviewed. Particular emphasis is put on the fast pace of change from the sole reliance on null hypothesis significance testing (NHST) to the inclusion of effect size estimates, confidence intervals, and an interest in the Bayesian approach. We conclude that these developments are helpful for psychologists seeking to extract a maximum of useful information from statistical research data, and that seven decades of criticism against NHST is finally having an effect.
Preprint
Objective: To compare the effectiveness of the Pilates Method versus the Back School in specialized care for people with nonspecific chronic low back pain, assessing improvement in disability with the Roland Morris questionnaire and in perceived pain with the visual analog scale (VAS). Method: Single-blind randomized controlled trial comparing the effects of the Pilates Method with Back School exercises in patients with low back pain; two groups of 48 patients; 3-month treatment period. Results: The Pilates Group (GP) recorded significant improvements in all of the variables studied, compared to those of the Back School Group (GEE). On the Roland Morris questionnaire, the difference was 0.41 points [GP (mean difference [MD] pretreatment-posttreatment = 2.08; 95% confidence interval [CI] = 1.21 to 2.95; p = 0.001) vs GEE (MD = 1.66; 95% CI = 0.90 to 2.43; p = 0.001)]. On the VAS, the difference was 0.40 points [GP (MD pretreatment-posttreatment = 1.82; 95% CI = 1.24 to 2.40; p = 0.001) vs GEE (MD = 1.42; 95% CI = 0.82 to 2.04; p = 0.001)]. Conclusions: The treatment of nonspecific low back pain with therapeutic Pilates is more effective than the therapeutic treatment of the Back School, in terms of both functional disability and intensity of pain. Trial registration: This trial is registered at http://www.ensaiosclinicos.gov.br/rg/RBR-5nk2tr/ with the ID number RBR-5nk2tr.
Article
The more two treatments' outcome distributions overlap, the more ambiguity there is about which would be better for some clients. Effect size and t-statistics ignore this ambiguity by indicating nothing about the contrasted treatments' outcome ranges, although the wider these are the smaller are these statistics and the more other influences than these given treatments matter for outcomes. Treatment contrast data analysis logically requires valid measurement of all the influences on outcomes. Each influence, measured or not, is somehow sampled in every treatment contrast, and the nature of this sampling affects the contrast's two outcome distributions. Sampling also affects replications of a treatment contrast, which requires sampling that produces the same statistically expected outcome distributions for each replicate as a logical prerequisite of proper meta-analysis. Because scientific human psychology is most fundamentally about individual persons and cases, rather than aggregations of persons or cases, contrasted treatments' outcome distributions ought eventually be disaggregated to whatever input dimension gradation configurations collapse their ranges to zero through jointly taking account of every influence on outcomes. Only then are the data about individual persons or cases and so relevant to psychotherapy theory.
Article
This meta-analysis tested the Dodo bird conjecture, which states that when psychotherapies intended to be therapeutic are compared, the true differences among all such treatments are 0. Based on comparisons between treatments culled from 6 journals, it was found that the effect sizes were homogeneously distributed about 0, as was expected under the Dodo bird conjecture, and that under the most liberal assumptions, the upper bound of the true effect was about .20. Moreover, the effect sizes (a) were not related positively to publication date, indicating that improving research methods were not detecting effects, and (b) were not related to the similarity of the treatments, indicating that more dissimilar treatments did not produce larger effects, as would be expected if the Dodo bird conjecture was false. The evidence from these analyses supports the conjecture that the efficacy of bona fide treatments are roughly equivalent.
Article
After 3 decades of intensive research, there is still confusion about the nature and reliability of relations between psychological factors and coronary heart disease (CHD). A meta-analysis, or quantitative review, was performed to integrate and organize the results of studies that investigated certain personality variables in relation to CHD. The personality variables included were anger, hostility, aggression, depression, extroversion, anxiety, Type A, and the major components of Type A. The meta-analytic framework helps focus attention on issues needing clarification. The results indicate that modest but reliable associations exist between some of the personality variables and CHD. The strongest associations were found for Type A and, surprisingly, for depression, but anger/hostility/aggression and anxiety also related reliably to CHD. The Structured Interview diagnosis of Type A was shown to be clearly superior to the Jenkins Activity Survey as a predictor of CHD. The Type A-CHD relation was smaller in prospective than in cross-sectional studies and smaller in recent than in less recent studies. This review also revealed that information about the interrelations of personality predictors of CHD is sorely needed. The picture of coronary-proneness revealed by this review is not one of a hurried, impatient workaholic but instead is one of a person with one or more negative emotions. We suggest that the concept of the coronary-prone personality and its associated research be broadened to encompass psychological attributes in addition to those associated with Type A behavior and narrowed to eliminate those components that the accumulated evidence shows to be unimportant.
Article
Results of 375 controlled evaluations of psychotherapy and counseling were coded and integrated statistically. The findings provide convincing evidence of the efficacy of psychotherapy. On the average, the typical therapy client is better off than 75% of untreated individuals. Few important differences in effectiveness could be established among many quite different types of psychotherapy. More generally, virtually no difference in effectiveness was observed between the class of all behavioral therapies (e.g., systematic desensitization and behavior modification) and the nonbehavioral therapies (e.g., Rogerian, psychodynamic, rational-emotive, and transactional analysis).
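A percentile statement such as "better off than 75% of untreated individuals" can be translated back into a standardized mean difference. The sketch below is a hypothetical illustration, not the authors' computation: it assumes normally distributed outcomes with equal variance in treated and untreated groups, and the function name is invented for this example.

```python
from statistics import NormalDist


def d_from_percentile(proportion_below):
    """Standardized mean difference implied by the share of untreated
    individuals scoring below the average treated client (normality assumed)."""
    return NormalDist().inv_cdf(proportion_below)


# "Better off than 75% of untreated individuals"
print(round(d_from_percentile(0.75), 2))
```

Under these assumptions, the 75th percentile corresponds to a difference of roughly two thirds of a standard deviation between the average treated and the average untreated individual.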