In this article, we accomplish two things. First, we show that despite empirical psychologists' nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.
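One of the forms of flexibility described above, deciding whether to collect more data after looking at the results, can be illustrated with a minimal simulation sketch. This is not the authors' simulation code. It assumes two groups drawn from the same normal distribution (so any significant result is a false positive), a z-test with known unit variance instead of a t-test, and hypothetical settings of 20 observations per group with 10 more added if the first test is not significant. All function and parameter names are illustrative.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming known unit variance."""
    z = (fmean(a) - fmean(b)) / (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=5000, n_start=20, n_extra=10, alpha=0.05, seed=1):
    """Return (fixed_rate, flexible_rate) of false positives under a true null.

    fixed_rate:    test once at n_start observations per group.
    flexible_rate: test at n_start; if not significant, add n_extra
                   observations per group and test again.
    """
    rng = random.Random(seed)
    fixed_hits = 0
    flexible_hits = 0
    for _ in range(n_sims):
        # Both groups come from the same distribution: there is no true effect.
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        first_look = z_test_p(a, b) < alpha
        fixed_hits += first_look
        second_look = False
        if not first_look:
            # The flexible researcher collects more data and tests again.
            a += [rng.gauss(0, 1) for _ in range(n_extra)]
            b += [rng.gauss(0, 1) for _ in range(n_extra)]
            second_look = z_test_p(a, b) < alpha
        flexible_hits += first_look or second_look
    return fixed_hits / n_sims, flexible_hits / n_sims


if __name__ == "__main__":
    fixed, flexible = simulate()
    print(f"single test: {fixed:.3f}   test, add data, test again: {flexible:.3f}")
```

Because the flexible procedure rejects whenever the first test rejects and gets a second chance otherwise, its false-positive rate can only be equal to or higher than the nominal one. Combining several such choices would be expected to compound the inflation.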
All content in this area was uploaded by Joseph P Simmons
... Consistent with recent proposals [82,83], we pre-registered the study and report how we determined our sample size, all data exclusions, all manipulations and all measures in the study [see 84]. In addition, following open science initiatives [e.g., 85], the de-identified data set, stimuli and analysis code associated with this study are freely available online [86]. ...
... In doing so, this will help to overcome the vast challenges and barriers that are related to long-term HRI studied in natural, ecologically valid settings. [82,83], we preregistered the study and report how we determined our sample size, all data exclusions, all manipulations and all measures in the study [see 84]. In addition, following open science initiatives [e.g., 85], the deidentified data set, stimuli and analysis code associated with this study are freely available online [86]. ...
While interactions with social robots are novel and exciting for many people, one concern is the extent to which people’s behavioural and emotional engagement might be sustained across time, since during initial interactions with a robot, its novelty is especially salient. This challenge is particularly noteworthy when considering interactions designed to support people’s well-being, with limited evidence (or empirical exploration) of social robots’ capacity to support people’s emotional health over time. Accordingly, our aim here was to examine how long-term repeated interactions with a social robot affect people’s self-disclosure behaviour toward the robot, their perceptions of the robot, and how such sustained interactions influence factors related to well-being. We conducted a mediated long-term online experiment with participants conversing with the social robot Pepper 10 times over 5 weeks. We found that people self-disclose increasingly more to a social robot over time, and report the robot to be more social and competent over time. Participants’ moods also improved after talking to the robot, and across sessions, they found the robot’s responses increasingly comforting as well as reported feeling less lonely. Finally, our results emphasize that when the discussion frame was supposedly more emotional (in this case, framing questions in the context of the COVID-19 pandemic), participants reported feeling lonelier and more stressed. These results set the stage for situating social robots as conversational partners and provide crucial evidence for their potential inclusion in interventions supporting people’s emotional health through encouraging self-disclosure.
... USD) gift card. At the time of the study, we used a rule of at least 50 participants per cell to determine sample size (Simmons, Nelson, & Simonsohn, 2011). Data collection stopped at the end of the semester considering that the sample size requirement had been reached. ...
To understand the persistent social class achievement gap, researchers have investigated how educational settings affect lower versus higher socioeconomic status (SES) students’ performance. We move beyond the question of actual performance to study its assessment by evaluators. We hypothesized that even in the absence of performance differences, assessment’s function of selection (i.e., compare, rank, and track students) leads evaluators to create an SES achievement gap. In 2 experiments (N = 196; N = 259), participants had to assess a test supposedly produced by a high- or a low-SES student, and used assessment for selection (i.e., normative grading) or learning (i.e., formative comments). Results showed that evaluators using assessment for selection found more mistakes if the test was attributed to a low- rather than a high-SES student, a difference reduced in the assessment for learning condition. The third and fourth experiments (N = 374; N = 306) directly manipulated the function of assessment to investigate whether the production of the social class achievement gap was facilitated by the function of selection to a greater extent than the educational function. Results of Experiment 3 supported this hypothesis. The effect did not reach significance for Experiment 4, but an internal meta-analysis confirmed that assessment used for selection led evaluators to create an SES achievement gap more than assessment used for learning, thereby contributing to the reproduction of social inequalities.
... This medium effect size would suggest a sample size of 128 for the current study. Simmons, Nelson, and Simonsohn (2011) suggested at least 50 participants in each condition to reach the power of .80, which would suggest that 100 participants is sufficient for the present study. ...
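The sample-size arithmetic mentioned in this excerpt can be approximated with a short sketch. It assumes that "medium effect size" means Cohen's d = 0.5, a two-sided test at alpha = .05, two independent groups of equal size, and a normal approximation to the t-test. None of these details are stated in the excerpt, and the function names are illustrative. Under the normal approximation the result is 63 per group, slightly below the 64 per group (128 in total) that an exact t-based calculation gives.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of the same test with n observations per group."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n / 2) ** 0.5  # expected value of the z statistic
    upper_tail = 1 - _STD_NORMAL.cdf(z_alpha - shift)
    lower_tail = _STD_NORMAL.cdf(-z_alpha - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    print(n_per_group(0.5))                 # 63 per group under this approximation
    print(round(approx_power(0.5, 50), 2))  # power with 50 per group, assuming d = 0.5
```

Under these assumptions, 50 participants per group yields an approximate power of about .70 for d = 0.5, so whether a per-cell rule of thumb reaches a power target depends on the effect size assumed.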
In the current study, female participants’ responses to a control threat were measured by an author-generated scale on attitudes toward traditional gender roles for women and Benevolent Sexism Scale (Glick & Fiske, 1996). In a community sample (but not in a student sample), participants whose personal control was threatened were more accepting toward benevolent sexism when compared with those whose control was not threatened. Participants in the control-threat condition also tended to express more traditional gender attitudes for women. In both community and student samples, those with stronger system-justification beliefs also tended to endorse more traditional gender roles and benevolent sexism; they also tended to be less gender-equality oriented, more politically conservative, and more religious. The effects of control threat in the community sample were not mediated by gender-specific system justification or moderated by gender identification. Based on the Compensatory Control Model (CCM; Kay et al., 2009), it is possible that benevolent sexism and traditional gender roles are perceived as a source of compensating control, which is in line with the protective and caring tone implied by benevolent sexism (Glick & Fiske, 1996; 2001). The results suggest that control threat may lead women to accept the status quo and internalize gender inequality, rather than defending gender egalitarianism.
... The recent replication crisis has put low statistical power and replicability of scientific research into focus (Open Science Collaboration, 2015; Button et al., 2013). Starting from the observation that most published research results might be wrong (Ioannidis, 2005; Simmons et al., 2011), there have been several developments to improve the replicability of scientific studies (Shrout and Rodgers, 2018). One of these is registered reports, in which research projects are reviewed and conditionally accepted based on sound methodology rather than on the statistical significance of the result. ...
A common challenge in designing empirical studies is determining an appropriate sample size. When more complex models are used, estimates of power can only be obtained using Monte Carlo simulations. In this tutorial, we introduce the R package mlpwr to perform simulation-based power analysis based on surrogate modeling. Surrogate modeling is a powerful tool in guiding the search for study design parameters that imply a desired power or meet a cost threshold (e.g., in terms of monetary cost). mlpwr can be used to search for the optimal allocation when there are multiple design parameters, e.g., when balancing the number of participants and the number of groups in multilevel modeling. At the same time, the approach can take into account the cost of each design parameter, and aims to find a cost-efficient design. We introduce the basic functionality of the package, which can be applied to a wide range of statistical models and study designs. Additionally, we provide two examples based on empirical studies for illustration: one for sample size planning when using an item response theory model, and one for assigning the number of participants and the number of countries for a study using multilevel modeling.
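The general idea of simulation-based power analysis can be sketched in a few lines: generate many datasets under an assumed effect, run the planned test on each, and take the share of significant results as the power estimate. The sketch below is a generic illustration in Python and does not use or reproduce the mlpwr package or its surrogate-modeling search. The design (two groups, known unit variance, z-test) and all names are assumptions chosen to keep the example self-contained.

```python
import random
from statistics import NormalDist, fmean


def simulated_power(effect_size, n_per_group, alpha=0.05, n_sims=3000, seed=11):
    """Monte Carlo power estimate for a two-group z-test with unit variance."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(n_sims):
        # Simulate one study under the assumed effect size.
        treatment = [rng.gauss(effect_size, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treatment) - fmean(control)) / standard_error
        rejections += abs(z) > critical_z
    return rejections / n_sims


if __name__ == "__main__":
    # Scan a few candidate sample sizes for an assumed effect of d = 0.5.
    for n in (30, 50, 64, 80):
        print(n, round(simulated_power(0.5, n), 3))
```

For a design this simple an analytic formula exists. The simulation approach becomes useful when the data-generating model or the test is too complex for one, which is the situation the tutorial addresses.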
... In regards to indirect impacts, the attribute of SC exhibited notable and adverse indirect connections with PR BT , mediated by the cognitive strategy of reappraisal and the experience of negative affect. Following the suggestions put forth by Simmons et al, 52 path analysis was conducted without incorporating any covariates into the model, resulting in insignificant disparities in the findings. ...
Purpose
The present research aims to investigate the potential correlations between self-compassion and bedtime procrastination, a significant behavior related to sleep. In this research, we put forward the hypothesis that a reduction in negative affect and the implementation of adaptive emotion regulation strategies can elucidate the established connections between self-compassion and a decreased tendency for bedtime procrastination.
Methods
Two cross-sectional online surveys (Survey I: n=241 and Survey II: n=546) were carried out via a convenience sampling method. Prior to their inclusion, all participants underwent a thorough assessment to confirm no evidence of clinical insomnia. The study participants in both surveys were asked to complete various psychometric assessments, including self-compassion, positive and negative affect, and bedtime procrastination; the participants in Survey II additionally completed a cognitive reappraisal assessment.
Results
In Survey I, a multiple mediation analysis was conducted to examine the mediating effects of self-compassion on reducing bedtime procrastination through a reduction in negative affect. The results supported the hypothesized relationships, indicating that self-compassion had the expected mediated effects by mitigating negative affective states. However, contrary to expectations, higher positive affect did not mediate the relationship between self-compassion and reduced bedtime procrastination. The findings of Survey II were confirmed through the utilization of path analysis. Moreover, this analysis provided additional evidence to suggest that the mechanism of cognitive reappraisal could account for the observed decrease in negative affect associated with self-compassion. The present study found a notable and sustained impact of self-compassion on reducing instances of delaying bedtime activities.
Conclusion
The present research contributes novel empirical evidence suggesting a negative association between self-compassion and the propensity to engage in bedtime procrastination. This relationship can be attributed partly to the implementation of an adaptive emotion regulation mechanism that effectively alleviates negative affect.
... Following the recommendations of Simmons et al. (2011), we also wanted to analyze if the relationship between pathological narcissistic grandiosity and involvement in feminist activism would still be found when the following, potentially influential covariates were statistically controlled for. This approach was chosen to avoid the reporting of false-positive results for our pre-registered main hypothesis which requires the reporting of alternative analyses excluding and including relevant covariates. ...
According to the dark-ego-vehicle principle (DEVP), individuals with so-called dark personalities (e.g., individuals with high narcissistic traits) are attracted to political and social activism not for the achievement of prosocial goals but to repurpose the activism to satisfy their specific ego-focused needs. In this pre-registered study, we aimed at replicating and extending previous empirical evidence for the DEVP by examining the associations of pathological narcissism with involvement in feminist activism. A diverse US sample (N = 458) completed online measures of the Pathological Narcissism Inventory and several covariates (i.e., altruism, self-identification as a feminist, and age). Paralleling previous research, higher pathological narcissistic grandiosity was found to be statistically significantly related to greater involvement in feminist activism. Unexpectedly, gender did not moderate this relationship. Also, higher pathological narcissism was related to stronger self-identification as a feminist; however, pathological narcissistic grandiosity explained some variance in the involvement in feminist activism over and above feminist self-identification. In exploratory secondary analyses, we found that higher pathological narcissism was associated with specific feminist conversational interaction behaviors (e.g., correcting other’s non-feminist language). The limitations (e.g., the relevance of other dark personality traits beyond narcissism) and the theoretical implications for the DEVP are discussed. Overall, the findings of the present study are further support for the DEVP.
Objectivity in scientific research has been a frequently discussed issue in the scientific community, given that interpretivist scholars have resisted the crucial role of the positivist paradigm, which dominates in the social sciences as well. This paper seeks to critically consider the main criterion (or principle) of scientific knowledge – objectivity – from the standpoint of social science research. The conducted analysis shows that objectivity is not only the key tenet of quantitative research but is equally important in qualitative studies, which are used in numerous disciplines. The main objective of this paper is to conceptualize this leading scientific principle, which may enhance the methodological quality of science (for example, through lack of bias, replicability, and reproducibility), in order to avoid various threats to objective research.
This Element will overview research using models to understand scientific practice. Models are useful for reasoning about groups and processes that are complicated and distributed across time and space, i.e., those that are difficult to study using empirical methods alone. Science fits this picture. For this reason, it is no surprise that researchers have turned to models over the last few decades to study various features of science. The different sections of the Element are mostly organized around different modeling approaches. The models described in this Element sometimes yield take-aways that are straightforward, and at other times more nuanced. The Element ultimately argues that while these models are epistemically useful, the best way to employ most of them to understand and improve science is in combination with empirical methods and other sorts of theorizing.
People tend to approach agreeable propositions with a bias toward confirmation and disagreeable propositions with a bias toward disconfirmation. Because the appropriate strategy for solving the four-card Wason selection task is to seek disconfirmation, the authors predicted that people motivated to reject a task rule should be more likely to solve the task than those without such motivation. In two studies, participants who considered a Wason task rule that implied their own early death (Study 1) or the validity of a threatening stereotype (Study 2) vastly outperformed participants who considered nonthreatening or agreeable rules. Discussion focuses on how a skeptical mindset may help people avoid confirmation bias both in the context of the Wason task and in everyday reasoning.
Some effects diminish when tests are repeated. Jonathan Schooler says being open about findings that don't make the scientific record could reveal why.
Does psi exist? D. J. Bem (2011) conducted 9 studies with over 1,000 participants in an attempt to demonstrate that future events retroactively affect people's responses. Here we discuss several limitations of Bem's experiments on psi; in particular, we show that the data analysis was partly exploratory and that one-sided p values may overstate the statistical evidence against the null hypothesis. We reanalyze Bem's data with a default Bayesian t test and show that the evidence for psi is weak to nonexistent. We argue that in order to convince a skeptical audience of a controversial claim, one needs to conduct strictly confirmatory studies and analyze the results with statistical tests that are conservative rather than liberal. We conclude that Bem's p values do not indicate evidence in favor of precognition; instead, they indicate that experimental psychologists need to change the way they conduct their experiments and analyze their data.
It is proposed that motivation may affect reasoning through reliance on a biased set of cognitive processes--that is, strategies for accessing, constructing, and evaluating beliefs. The motivation to be accurate enhances use of those beliefs and strategies that are considered most appropriate, whereas the motivation to arrive at particular conclusions enhances use of those that are considered most likely to yield the desired conclusion. There is considerable evidence that people are more likely to arrive at conclusions that they want to arrive at, but their ability to do so is constrained by their ability to construct seemingly reasonable justifications for these conclusions. These ideas can account for a wide variety of research concerned with motivated reasoning.
Perhaps the most striking aspect of gambling behavior is that people continue to gamble despite persistent failure. One reason for this persistence may be that gamblers evaluate outcomes in a biased manner. Specifically, gamblers may tend to accept wins at face value but explain away or discount losses. Experiment 1 tested this hypothesis by recording subjects' explanations of the outcomes of their bets on professional football games. The results supported the hypothesis: Subjects spent more time explaining their losses than their wins. A content analysis of these explanations revealed that subjects tended to discount their losses but "bolster" their wins. Finally, subjects remembered their losses better during a recall test 3 weeks later. Experiments 2 and 3 extended this analysis by demonstrating that a manipulation of the salience or existence of a critical "fluke" play in a sporting event had a greater impact on the subsequent expectations of those who had bet on the losing team than of those who had bet on the winning team. Both the implications and the possible mechanisms underlying these biases are discussed.
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
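The dependence on power and on the ratio of true to no relationships can be made concrete with a small sketch of the positive predictive value (PPV), the probability that a claimed finding is true. The expressions follow from counting true and false positives: with pre-study odds R, power 1 − β, and significance level α, true positives are proportional to (1 − β)R and false positives to α. The bias term u, read here as the share of otherwise non-significant analyses that end up reported as significant, is one common formulation and is included as an assumption. Function and parameter names are illustrative.

```python
def ppv(prior_odds, power, alpha=0.05, bias=0.0):
    """Probability that a statistically significant finding is true.

    prior_odds: ratio R of true to no relationships among those tested.
    power:      1 - beta, probability of detecting a true relationship.
    alpha:      type I error rate of a single test.
    bias:       assumed share u of analyses that would not have been
                significant but are reported as significant anyway.
    """
    beta = 1 - power
    true_positives = power * prior_odds + bias * beta * prior_odds
    false_positives = alpha + bias * (1 - alpha)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Well-powered test of a plausible hypothesis (R = 1), no bias.
    print(round(ppv(prior_odds=1.0, power=0.8), 3))
    # Underpowered test of an unlikely hypothesis (R = 0.1), no bias.
    print(round(ppv(prior_odds=0.1, power=0.2), 3))
    # The same underpowered setting with some analytic flexibility.
    print(round(ppv(prior_odds=0.1, power=0.2, bias=0.1), 3))
```

Under these assumptions, PPV falls below one half whenever false positives outnumber true positives, which happens readily when power is low, prior odds are small, or bias is present.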
Abstract
Do causal attributions serve the need to protect and/or enhance self‐esteem? In a recent review, Miller and Ross (1975) proposed that there is evidence for a self‐serving effect in the attribution of success but not in the attribution of failure, and that this effect reflects biases in information‐processing rather than self‐esteem maintenance. The present review indicated that self‐serving effects for both success and failure are obtained in most but not all experimental paradigms. Processes which may suppress or even reverse the self‐serving effect were discussed. Most important, the examination of research in which self‐serving effects are obtained suggested that these attributions are better understood in motivational than in information‐processing terms.
Cases of clear scientific misconduct have received significant media attention recently, but less flagrantly questionable research practices may be more prevalent and, ultimately, more damaging to the academic enterprise. Using an anonymous elicitation format supplemented by incentives for honest reporting, we surveyed over 2,000 psychologists about their involvement in questionable research practices. The impact of truth-telling incentives on self-admissions of questionable research practices was positive, and this impact was greater for practices that respondents judged to be less defensible. Combining three different estimation methods, we found that the percentage of respondents who have engaged in questionable practices was surprisingly high. This finding suggests that some questionable practices may constitute the prevailing research norm.
Summary
In clinical trials with sequential patient entry, fixed sample size designs are unjustified on ethical grounds and sequential designs are often impracticable. One solution is a group sequential design dividing patient entry into a number of equal-sized groups so that the decision to stop the trial or continue is based on repeated significance tests of the accumulated data after each group is evaluated. Exact results are obtained for a trial with two treatments and a normal response with known variance. The design problem of determining the required size and number of groups is also considered. Simulation shows that these normal results may be adapted to other types of response data. An example shows that group sequential designs can sometimes be statistically superior to standard sequential designs.
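The effect of repeated significance tests on accumulating data can be sketched by simulation for the setting the summary describes: two treatments, a normal response with known variance, and equal-sized groups. The sketch assumes five looks, 10 patients per arm per group, no true treatment difference, and two critical values: the unadjusted 1.96 and 2.413, the constant boundary commonly tabulated for five looks at an overall two-sided level of .05. These numbers and all names are assumptions for illustration and are not taken from the summary.

```python
import random


def max_abs_z_per_trial(n_looks=5, per_arm_per_group=10, n_sims=4000, seed=7):
    """Simulate null trials and return the largest |z| seen across looks in each."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        n = 0  # patients per arm accumulated so far
        largest = 0.0
        for _look in range(n_looks):
            for _patient in range(per_arm_per_group):
                sum_a += rng.gauss(0, 1)
                sum_b += rng.gauss(0, 1)
            n += per_arm_per_group
            # z statistic on all accumulated data, known unit variance.
            z = (sum_a / n - sum_b / n) / (2 / n) ** 0.5
            largest = max(largest, abs(z))
        maxima.append(largest)
    return maxima


def rejection_rate(maxima, critical_z):
    """Share of trials that cross the boundary at any look."""
    return sum(m > critical_z for m in maxima) / len(maxima)


if __name__ == "__main__":
    maxima = max_abs_z_per_trial()
    print("unadjusted 1.96 at every look:", rejection_rate(maxima, 1.96))
    print("constant boundary 2.413:      ", rejection_rate(maxima, 2.413))
```

A trial stops at the first look that crosses the boundary, so a trial rejects exactly when its largest |z| exceeds the critical value. This lets both boundaries be evaluated on the same simulated trials.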
When the Dartmouth football team played Princeton in 1951, much controversy was generated over what actually took place during the game. Basically, there was disagreement between the two schools as to what had happened during the game. A questionnaire designed to get reactions to the game and to learn something of the climate of opinion was administered at each school, and the same motion picture of the game was shown to a sample of undergraduates at each school, followed by another questionnaire. Results indicate that the "game" was actually many different games and that each version of the events that transpired was just as "real" to a particular person as other versions were to other people.