Article
PDF available

Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers


Abstract

Data analysis methods in psychology still emphasize statistical significance testing, despite numerous articles demonstrating its severe deficiencies. It is now possible to use meta-analysis to show that reliance on significance testing retards the development of cumulative knowledge. But reform of teaching and practice will also require that researchers learn that the benefits that they believe flow from use of significance testing are illusory. Teachers must revamp their courses to bring students to understand that (a) reliance on significance testing retards the growth of cumulative research knowledge; (b) benefits widely believed to flow from significance testing do not in fact exist; and (c) significance testing methods must be replaced with point estimates and confidence intervals in individual studies and with meta-analyses in the integration of multiple studies. This reform is essential to the future progress of cumulative knowledge in psychological research. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
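To make the abstract's recommendation concrete, here is a minimal Python sketch of reporting a point estimate with a 95% confidence interval rather than a bare significance verdict; the data, group sizes, and use of Welch's standard error are invented for illustration and are not from the article.

```python
# Illustrative only: report a point estimate and 95% CI for a mean
# difference (Welch), instead of a bare reject/fail-to-reject verdict.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=103.0, scale=15.0, size=40)  # invented scores
control = rng.normal(loc=100.0, scale=15.0, size=40)

va = treatment.var(ddof=1) / len(treatment)
vb = control.var(ddof=1) / len(control)
diff = treatment.mean() - control.mean()   # point estimate
se = np.sqrt(va + vb)                      # Welch standard error
df = (va + vb) ** 2 / (va ** 2 / (len(treatment) - 1) + vb ** 2 / (len(control) - 1))
half_width = stats.t.ppf(0.975, df) * se
print(f"diff = {diff:.2f}, 95% CI [{diff - half_width:.2f}, {diff + half_width:.2f}]")
```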
... A p-value of 0.412 (p > 0.05) was found, implying that the variances of the two groups were not significantly different. This provides evidence of equality of variances, one of the prerequisites for carrying out the independent-samples t-test as stated by Schmidt (1996). ...
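The workflow in this excerpt, checking equality of variances before running the independent-samples t-test, might look like the following sketch; the scores and the 0.05 threshold are illustrative assumptions.

```python
# Sketch of the excerpt's workflow: Levene's test for equality of
# variances, then the independent-samples t-test (invented scores).
from scipy import stats

group_a = [12, 15, 14, 10, 13, 17, 16, 11]
group_b = [14, 18, 16, 12, 15, 19, 17, 13]

lev_stat, lev_p = stats.levene(group_a, group_b)
equal_var = lev_p > 0.05          # variances not significantly different
t_stat, p_val = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}, t = {t_stat:.2f}, p = {p_val:.3f}")
```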
... In practice, slight deviations from ideal normality such as those found in this study may not invalidate the results of parametric tests, such as the t-test, when sample sizes are reasonably large (Schmidt, 1996). However, as an extra precaution, some researchers (e.g., Pallant, 2020) recommend applying nonparametric tests in post-test analyses so as not to introduce potential biases due to non-normality of the data. ...
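A minimal sketch of the nonparametric fallback the excerpt mentions, using the Mann-Whitney U test in place of the t-test; the post-test scores are invented.

```python
# Nonparametric fallback sketched above: Mann-Whitney U in place of the
# t-test when post-test scores depart from normality (invented data).
from scipy import stats

post_experimental = [78, 85, 90, 72, 88, 95, 81]
post_control = [70, 75, 80, 68, 77, 73, 74]

u_stat, p_val = stats.mannwhitneyu(post_experimental, post_control,
                                   alternative="two-sided")
print(f"U = {u_stat}, p = {p_val:.3f}")
```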
... Other studies recommend that pre-existing differences be addressed, especially in experimental studies within an educational context, where the potential for individual differences is high (Ghasemi & Zahediasl, 2012; Schmidt, 1996). This finding conforms to current literature indicating that demonstrating an intervention effect against a nonsignificant baseline difference is what minimizes bias in interpreting that effect (Kline, 2016). ...
Article
Full-text available
This study investigates the impact of traditional game-based teaching on geometry students' creativity, with an emphasis on the concepts of similarity and parallelism. Creative thinking is important in education, but it is typically overshadowed by memorization. Traditional games can connect the curriculum to students' cultures and boost engagement and creativity. A quasi-experimental design with a pre-test and post-test control group included 40 students: 24 assigned to the experimental group, which received instruction through traditional games, and 16 to the control group, which received conventional instruction. Students' fluency, flexibility, and originality were tested, and N-Gain scores rated their improvement. The findings showed that students in the experimental group outperformed those in the control group on all measures of creative thinking. The largest gains were in fluency and flexibility, with a smaller increase in originality. This indicates that playing traditional games helps students generate ideas and diversify their approaches but that other activities are needed to develop originality. The intervention also demonstrated the effectiveness of culturally relevant pedagogy, connecting students' cultural backgrounds with geometric ideas and making geometry more appealing. To deepen understanding of how traditional games affect students' creative thinking, further research should use a larger population, a longer intervention, and more games to better establish long-term effects and originality improvements. The originality of this study lies in showing that cultural assets can be embedded into classroom activities, and in demonstrating the value of traditional games such as Gobak Sodor for promoting students' creative and mathematical thinking as well as their cultural identity in a holistic way.
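The abstract rates improvement with N-Gain scores; below is a sketch of the Hake normalized-gain formula commonly used for this purpose. The maximum score of 100 is an assumption for illustration, not taken from the paper.

```python
# Hake's normalized gain (N-Gain); the maximum score of 100 is an
# assumption for illustration, not taken from the paper.
def n_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    return (post - pre) / (max_score - pre)

print(round(n_gain(pre=55, post=80), 2))   # 0.56, a "medium" gain
```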
... The adaptation's construct, discriminant, and test-retest validities were shown to be comparable to those of the original gold-standard RAVLT. Vermeent et al. (2022) incorporated the English RAVLT (Schmidt, 1996) into the Philips IntelliSpace Cognition platform for use on an iPad tablet and showed its statistical equivalence to the pen-and-paper version. In the present study, we developed a RAVLT World mobile application incorporating the new Russian version along with other language versions under development by our team (Ukrainian and Uzbek). ...
... In both the pen-and-paper and digital forms, the testing procedure followed Schmidt (1996) and included a learning phase (Trials 1-5), interference (Trial 6), immediate recall after interference (Trial 7), delayed recall (Trial 8), and visual recognition (Trial 9). In Trials 1 to 8, the participant did not have access to the record sheet or the tablet. ...
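For illustration, a sketch of how the trial layout above might be scored; the variable names and raw scores are invented, though the total-learning sum over Trials 1-5 is a conventional RAVLT summary score.

```python
# Invented example of scoring the trial layout described above; the
# total-learning sum over Trials 1-5 is a conventional RAVLT summary.
words_recalled = {1: 6, 2: 8, 3: 10, 4: 11, 5: 12,  # learning (list A)
                  6: 5,                              # interference (list B)
                  7: 10,                             # immediate recall
                  8: 9}                              # delayed recall

total_learning = sum(words_recalled[t] for t in range(1, 6))
learning_gain = words_recalled[5] - words_recalled[1]
print(total_learning, learning_gain)   # 47 6
```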
... Data were collected by authors of the present paper or by student volunteers, all trained in Linguistics or Psychology. All were given oral and written instructions on conducting the test following the procedure described by Schmidt (1996). A total of 188 participants were administered the test once, only to establish norms and estimate the effect of demographic variables. ...
... Therefore, it is easier to understand Neyman-Pearson's procedure if we peg the effect size to beta and call it the expected minimum effect size (MES; Figure 3). This helps us better conceptualize how Neyman-Pearson's procedure works (Schmidt, 1996): the minimum effect size effectively represents the part of the main hypothesis that is not going to be rejected by the test (i.e., MES captures values of no research interest which you want to leave under H_M; Cortina and Dunlap, 1997; Hagen, 1997; Macdonald, 2002). (Worry not, as there is no need to perform any further calculations: the population effect size is the one to use, for example, for estimating research power.) ...
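A sketch of the Neyman-Pearson logic the excerpt describes: fix alpha and beta, peg the minimum effect size of interest, and solve for the sample size. The use of statsmodels and the specific values are assumptions for illustration, not taken from the preprint.

```python
# Neyman-Pearson sketch: fix alpha and power (1 - beta), peg the minimum
# effect size of interest, and solve for the per-group sample size.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,   # minimum effect size (Cohen's d), an assumed value
    alpha=0.05,        # Type I error rate
    power=0.80,        # 1 - beta
)
print(round(n_per_group))   # about 64 per group
```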
... NHST is, in reality, an amalgamation of Fisher's and Neyman-Pearson's theories, offered as a seamless approach to testing (Gigerenzer, 2004; Macdonald, 2002). It is not a clearly defined amalgamation either and, depending on the author describing it or on the researcher using it, it may veer more towards Fisher's approach (e.g., American Psychological Association, 2010; Krueger, 2001; Nunnally, 1960; Wilkinson and the Task Force on Statistical Inference, 1999) or towards Neyman-Pearson's approach (e.g., Cohen, 1988; Cortina and Dunlap, 1997; Frick, 1996; Kline, 2004; Nickerson, 2000; Rosnow and Rosenthal, 1989; Schmidt, 1996; Wainer, 1999). ...
Preprint
Despite frequent calls for the overhaul of null hypothesis significance testing (NHST), this controversial procedure remains ubiquitous in behavioral, social, and biomedical teaching and research. Little change seems possible once the procedure becomes well ingrained in the minds and current practice of researchers; thus, the optimal opportunity for change is at the time the procedure is taught, be this at the undergraduate or postgraduate level. This paper presents a tutorial for the teaching of data testing procedures, often referred to as hypothesis testing theories. The first procedure introduced is the approach to data testing followed by Fisher (tests of significance); the second is the approach followed by Neyman and Pearson (tests of acceptance); the final procedure is the incongruent combination of the previous two theories into the current approach (NHST). For those researchers sticking with the latter, two compromise solutions on how to improve NHST conclude the tutorial.
... The greatest potential benefit of intensive designs such as multiple baseline designs is the cumulative knowledge provided by multiple replications. As Schmidt (1996) has argued, because limited information is provided by any single study, it may best be viewed as a data point for a future meta-analysis. In contrast to cluster-randomized trials that require participation of many schools ...
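The "each study is a data point for a future meta-analysis" idea can be illustrated with a minimal fixed-effect, inverse-variance pooling sketch; all effect sizes and variances below are invented.

```python
# Fixed-effect, inverse-variance pooling of several studies' effect
# sizes; all numbers are invented for illustration.
import numpy as np

effects = np.array([0.30, 0.45, 0.22, 0.50])    # per-study effect sizes
variances = np.array([0.04, 0.09, 0.02, 0.06])  # per-study sampling variances

weights = 1.0 / variances
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"pooled effect = {pooled:.2f} (SE = {pooled_se:.2f})")
```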
Article
Full-text available
Objective: Youth violence represents a significant public health problem that has serious and often lasting consequences for its victims and perpetrators. School-based interventions have considerable potential for playing a central role in prevention efforts. Although a variety of school-based interventions have shown some degree of success, further work is needed to improve their effectiveness. This article discusses several challenges that have impeded efforts to develop and evaluate school-based violence prevention programs and how these challenges might be addressed. Method: This article draws upon theoretical and empirical research on youth violence to highlight critical issues that have limited efforts to develop effective school-based youth violence prevention programs. Results: Applying a prevention science framework indicates that interventions are unlikely to be effective unless they target risk and protective factors most relevant to a specific population. Research on factors associated with youth violence suggests the need for comprehensive interventions that address risk and protective factors across multiple social–ecological domains. Subgroup differences in the effects of school-based violence prevention programs indicate the need for a better understanding of differences in patterns of risk and protective factors both across and within populations. Conclusions: Improving the effectiveness of school-based youth violence prevention programs will require an iterative process to identify subgroups of individuals across and within populations that differ in their patterns of risk and protective factors, to refine logic models to guide the development of interventions tailored to specific populations, and to evaluate interventions using designs that examine variability in intervention effects across subgroups.
... In our study we perform artifactual corrections of the correlations, which is crucial for clarifying the associations between the variables. This ultimately contributes to a more comprehensive body of quantitative data that can be used in future meta-analyses on the topic, in order to estimate the true magnitude of the effect sizes and to test whether the observed variability is real or due to artifacts [72,83]. ...
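The artifactual corrections the excerpt refers to include the classic Spearman correction for attenuation; a minimal sketch with invented values follows.

```python
# Spearman's correction for attenuation (invented values): divide the
# observed r by the square root of the product of the reliabilities.
r_observed = 0.30
rxx, ryy = 0.80, 0.70   # reliabilities of the two measures

r_corrected = r_observed / (rxx * ryy) ** 0.5
print(f"{r_corrected:.2f}")   # 0.40
```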
Article
Full-text available
The main goal of the current study is to broaden the knowledge on the association between personality, subjective well-being (SWB) and technostress in an academic context. This research specifically examines the prevalence of technostress in a European university sample. It also explores the relationship between technostress and its dimensions with the Big Five model of personality and with SWB and its affective and cognitive components. Finally, the combined predictive validity of the Big Five and SWB on technostress is tested. The sample was composed of 346 undergraduate students. Correlational and multiple regression analyses were carried out. Results show that fatigue and anxiety are the most frequently experienced dimensions of technostress. Emotional stability, openness to experience, and SWB are negatively and significantly correlated to technostress. Multiple regression analyses show that the Big Five factors and SWB account for technostress variance, the main predictor being the affective component of SWB. These results contribute to a more comprehensive understanding of technostress and suggest that personality traits and SWB are important factors in its prediction. The theoretical and practical implications will be discussed.
... Scholars have long argued for the importance of effect sizes to quantify and advance knowledge in a field beyond prediction and significance testing (Lipsey & Wilson, 2001; Schmidt, 1996; Sink & Mvududu, 2010). To provide a direct estimate of vocational interest fit and test career theories in the present work, we take a novel approach to derive effect sizes of fit. ...
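A sketch of the profile-correlation fit index the study derives (the abstract below reports an average of .20 between person and job interests); the RIASEC-style interest scores here are invented.

```python
# Profile correlation as a fit index between a person's interest profile
# and a job's profile; the RIASEC-style scores here are invented.
import numpy as np

person = np.array([4.2, 3.1, 2.5, 3.8, 2.9, 3.3])  # R, I, A, S, E, C
job = np.array([3.9, 2.8, 2.2, 4.1, 3.5, 3.0])

fit = np.corrcoef(person, job)[0, 1]
print(f"{fit:.2f}")
```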
Article
Full-text available
Although research and policy efforts have attempted to "even the hiring playing field" and advance equal opportunities, systemic employment patterns based on gender and ethnicity remain prevalent. An unexplored avenue of diversity, equity, and inclusion efforts is the degree to which all people can obtain jobs that fit their interests. The present study used a large, diverse sample of over 250,000 American employees to estimate the average vocational interest fit that people have with their jobs and differences in fit across race/ethnicity, gender, and education. Overall, employees showed moderate positive vocational interest fit with their jobs, with an average profile correlation of .20 between person and job interests. There were small gender differences in vocational interest fit favoring men, especially White and Hispanic men, with minimal differences across other race/ethnicity groups. However, the largest group differences emerged for education, as employees with higher educational attainment showed greater vocational interest fit, particularly among women. Further intersectional analyses added greater nuance to these results, including how various groups achieve vocational interest fit across different types of jobs. Altogether, this work
Article
Cybersecurity is the foundation for preserving confidentiality and integrity in the modern digital age. It is crucial for the security of individuals, organizations, and society. This paper is based on these premises, exploring the impact of demographic factors on user perceptions and behaviors regarding cybersecurity in digital banking. The study draws on socio-technical systems theory, which examines the relationships between social and technical elements within technology usage. The research was conducted through an online survey distributed via posts on social networks such as Facebook, LinkedIn, and Twitter and by emailing the survey to target groups currently engaged in secondary or higher education or already employed. The study involved 212 respondents divided into six age groups. The sample (n=212) was achieved with 100% response quality and a standard deviation of 0%. The research aimed to understand how demographic characteristics, particularly age, influence interactions with digital banking technology and cybersecurity practices. Using multiple regression analysis, two hypotheses were tested. Hypothesis 1, that older users (aged 45 and above) demonstrate a higher level of caution in online payments compared to younger users (aged up to 44), was rejected; Hypothesis 2, that a higher level of education positively influences users' understanding of security when making online payments with bank cards, was confirmed. The results indicate that education is a significant predictor of a sense of security, with users having higher levels of education reporting a greater sense of security during online payments. Age and employment status did not prove to be statistically significant factors in explaining users' sense of security within this sample. However, age showed a negative effect, suggesting that older users feel less secure.
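A hypothetical sketch of the multiple regression described in this abstract, regressing sense of security on age, education, and employment status; the mini-dataset and the coding of the variables are invented.

```python
# Hypothetical version of the regression described above: sense of
# security on age, education, and employment (mini-sample is invented).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "security": [4, 3, 5, 2, 4, 3, 5, 4],   # self-reported sense of security
    "age": [22, 35, 48, 53, 29, 41, 60, 25],
    "education": [3, 2, 4, 1, 3, 2, 4, 3],  # ordinal education level
    "employed": [0, 1, 1, 1, 0, 1, 1, 0],
})
model = smf.ols("security ~ age + education + employed", data=df).fit()
print(model.params)
```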
Article
Full-text available
As research in psychology becomes more sophisticated and more oriented toward the development and testing of theory, it becomes more important to eliminate biases in data caused by measurement error. Both failure to correct for biases induced by measurement error and improper corrections can lead to erroneous conclusions that retard progress toward cumulative knowledge. Corrections for attenuation due to measurement error are common in the literature today and are becoming more common, yet errors are frequently made in this process. Technical psychometric presentations of abstract measurement theory principles have proved inadequte in improving the practices of working researchers. As an alternative, this article uses realistic research scenarios (cases) to illustrate and explain appropriate and inappropriate instances of correction for measurement error in commonly occurring research situations.
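A short sketch of the correction for attenuation that the article's cases revolve around, distinguishing correction for unreliability in one measure versus both; the reliabilities and the observed correlation are invented.

```python
# Single vs. double correction for attenuation, the kind of choice the
# article's cases turn on; all values are invented.
def disattenuate(r: float, rxx: float, ryy: float = 1.0) -> float:
    """Correct r for unreliability in x (and in y, if ryy < 1)."""
    return r / (rxx * ryy) ** 0.5

r_obs = 0.25
print(round(disattenuate(r_obs, rxx=0.75), 2))            # predictor only: 0.29
print(round(disattenuate(r_obs, rxx=0.75, ryy=0.80), 2))  # both measures: 0.32
```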
Chapter
It is widely felt that psychologists and other behavioural scientists tend to over-emphasize the role of tests of significance in experimental work to the neglect of problems of estimation such as are discussed in chapter 4. To some extent the criticism is justified, but over-emphasis is largely a reflection of the nature of much of the data in question. Measurements of psychological phenomena, as noted earlier, tend to be imprecise and it is frequently the case that variation is considerable not only from subject to subject but for a single subject on different occasions. In view of this variation it is difficult to assess at a glance the performance of subjects on different occasions and in different experimental situations, whereas a test of significance, which takes the amount of variability in the data into account, provides a useful quantitative appraisal of the results. Perhaps the most important warning to the amateur is not to take the word ‘significance’ too literally. A difference between the mean scores of two samples of patients, which is ‘significant’ in the probability or statistical sense, may still not be of sufficient magnitude to be of any great importance in a clinical sense. In this chapter some simple tests for comparing pairs of means will be described and the distinction between statistical and clinical significance will be emphasized.
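The chapter's distinction between statistical and clinical significance can be made concrete with a small simulation: with a large enough sample, a clinically trivial one-point difference is statistically significant. All parameters below are invented.

```python
# With a large n, a clinically trivial one-point difference is easily
# "significant"; Cohen's d shows how small it is (invented parameters).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(100.0, 15.0, 5000)
b = rng.normal(101.0, 15.0, 5000)   # true difference: one point

t_stat, p_val = stats.ttest_ind(a, b)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd
print(f"p = {p_val:.4f}, d = {d:.2f}")   # p is tiny, yet d is ~0.07
```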