Replicability of Trait-Outcome Associations 1
Soto, C. J. (2019). How replicable are links between personality traits and consequential life
outcomes? The Life Outcomes Of Personality Replication Project. Psychological
Science, 30, 711-727.
How Replicable Are Links Between Personality Traits and Consequential Life Outcomes?
The Life Outcomes Of Personality Replication Project
Christopher J. Soto
Colby College
Corresponding Author
Christopher J. Soto, Department of Psychology, Colby College, 5550 Mayflower Hill,
Waterville, ME 04901. E-mail: christopher.soto@colby.edu.
Abstract
The Big Five personality traits have been linked with dozens of life outcomes. However,
metascientific research has raised questions about the replicability of behavioral science. The
Life Outcomes Of Personality Replication (LOOPR) Project was therefore conducted to estimate
the replicability of the personality-outcome literature. Specifically, we conducted preregistered,
high-powered (median N = 1,504) replications of 78 previously published trait-outcome
associations. Overall, 87% of the replication attempts were statistically significant in the
expected direction. The replication effects were typically 77% as strong as the corresponding
original effects, which represents a significant decline in effect size. The replicability of
individual effects was predicted by the effect size and design of the original study, as well as the
sample size and statistical power of the replication. These results indicate that the personality-
outcome literature provides a reasonably accurate map of trait-outcome associations, but also
stands to benefit from efforts to improve replicability.
Keywords: Big Five; life outcomes; metascience; personality traits; replication
How Replicable Are Links Between Personality Traits and Consequential Life Outcomes?
The Life Outcomes Of Personality Replication Project
Do personality characteristics reliably predict consequential life outcomes? A sizable
research literature has identified links between the Big Five personality traits and dozens of
outcomes (Ozer & Benet-Martinez, 2006; Roberts, Kuncel, Shiner, Caspi, & Goldberg, 2007).
Based on this personality-outcome literature, economists, educators, and policymakers have
proposed initiatives to promote well-being through positive personality development
(Chernyshenko, Kankaraš, & Drasgow, 2018; Kautz, Heckman, Diris, ter Weel, & Borghans,
2014; OECD, 2015; Primi, Santos, John, & De Fruyt, 2016). However, recent metascientific
research has raised questions about the replicability of behavioral science (Button et al., 2013;
Camerer et al., 2016; Cova et al., in press; Open Science Collaboration, 2015; Simmons, Nelson,
& Simonsohn, 2011; Vul, Harris, Winkielman, & Pashler, 2009). We therefore conducted the
Life Outcomes Of Personality Replication (LOOPR) Project, an effort to estimate the
replicability of the personality-outcome literature. Specifically, we attempted preregistered, high-
powered replications of 78 previously published associations between the Big Five traits and a
diverse set of consequential life outcomes.
Personality Traits and Consequential Life Outcomes
A personality trait is a characteristic pattern of thinking, feeling, or behaving that tends to
be consistent over time and across relevant situations (Allport, 1961). The world’s languages
include thousands of adjectives for describing personality, many of which can be organized in
terms of the Big Five trait dimensions: Extraversion (e.g., sociable, assertive, energetic vs. quiet,
reserved), Agreeableness (compassionate, respectful, trusting vs. rude, suspicious),
Conscientiousness (orderly, hard-working, responsible vs. disorganized, unreliable), Negative
Emotionality (or Neuroticism; worrying, pessimistic, temperamental vs. calm, stable), and Open-
Mindedness (or Openness to Experience; intellectual, artistic, imaginative vs. incurious,
uncreative) (De Raad, Perugini, Hrebícková, & Szarota, 1998; Goldberg, 1993; John, Naumann,
& Soto, 2008).
The Big Five constitute the most widely used framework for conceptualizing and
measuring personality traits (Almlund, Duckworth, Heckman, & Kautz, 2011; John et al., 2008).
This scientific consensus reflects their usefulness for organizing personality-descriptive
language, as well as a substantial research literature linking them with life outcomes. The most
comprehensive literature review conducted to date summarized associations between the Big
Five and dozens of individual, interpersonal, and social institutional outcomes (Ozer & Benet-
Martinez, 2006). For example, high Extraversion has been linked with social status and
leadership capacity, Agreeableness with volunteerism and relationship satisfaction,
Conscientiousness with job performance and health, Negative Emotionality with relationship
conflict and psychopathology, and Open-Mindedness with spirituality and political liberalism.
The Replicability of Behavioral Science
Drawing on both conceptual and empirical evidence, recent metascientific research (i.e.,
the scientific study of science itself) has raised questions about the replicability of behavioral
science: the likelihood that independent researchers conducting similar studies will obtain similar
results. Conceptually, this work has focused on researcher degrees of freedom, statistical power,
and publication bias. Researcher degrees of freedom represent undisclosed flexibility in the
design, analysis, and reporting of a scientific study (Simmons et al., 2011). Statistical power is
the probability of obtaining a statistically significant result, when the effect being tested truly
exists in the population (Cohen, 1988). Publication bias occurs when journals selectively publish
studies with statistically significant results, thereby producing a literature that under-represents
null results (Sterling, Rosenbaum, & Weinkam, 1995). Multiple observers have expressed
concern that much behavioral science is characterized by many researcher degrees of freedom,
modest statistical power, and strong publication bias, leading to the publication of numerous
false-positive results: statistical flukes that are unlikely to replicate (Fraley & Vazire, 2014;
Franco, Malhotra, & Simonovits, 2014; Rossi, 1990; Simmons et al., 2011; Sterling et al., 1995;
Tversky & Kahneman, 1971).
Recently, large-scale replication projects have begun to empirically test these concerns.
For example, the Reproducibility Project: Psychology (RP:P) attempted to replicate 100 studies
published in high-impact psychology journals. Despite high statistical power, the RP:P observed
a replication success rate of only 36% (when success was defined as a statistically significant
result in the expected direction), and found that the replication effects were only half as strong as
the original effects, on average (Open Science Collaboration, 2015). Similar projects in
economics and experimental philosophy have also obtained replicability estimates considerably
lower than would be expected in the absence of published false positives, although results have
varied somewhat across projects (Camerer et al., 2016; Cova et al., in press). These findings
reinforce concerns about the replicability of behavioral science, and suggest that replicability
may vary both between and within disciplines. For example, replicability appears to be higher for
original studies that (a) examined main effects rather than interactions, (b) reported intuitive
rather than surprising results, and (c) obtained a greater effect size, sample size, and strength of
evidence (Camerer et al., 2016; Cova et al., in press; Open Science Collaboration, 2015).
The LOOPR Project
In sum, previous research suggests that the Big Five personality traits relate to many
consequential life outcomes, but also raises questions about the replicability of behavioral
science. We therefore conducted the LOOPR Project to estimate the replicability of the
personality-outcome literature. Specifically, we attempted to replicate 78 previously published
trait-outcome associations, and then used the replication results to test two descriptive
hypotheses. First, we hypothesized that trait-outcome associations would be less than perfectly
replicable, due to the likely presence of published false positives and biased reporting of effect
sizes. Second, we hypothesized that the replicability of the personality-outcome literature may be
greater than the estimates obtained by previous large-scale replication projects in psychology,
due to normative practices in personality research such as using relatively large samples to
examine the main effects of personality traits. We also conducted exploratory analyses to search
for predictors of replicability, tentatively hypothesizing that original studies with greater effect
size, sample size, and strength of evidence, as well as replication attempts with greater sample
size and statistical power, may yield greater replicability.
Method
The LOOPR Project was conducted in six phases, which are briefly described below. An
extended description is available in the Supplemental Online Material. Additional materials,
including coded lists of the selected trait-outcome associations, original sources, and measures,
as well as the final survey materials, preregistration protocol and revisions, data, and analysis
code are available at https://osf.io/d3xb7. This research was approved by the Colby Institutional
Review Board.
The first phase of the project was to select a set of trait-outcome associations for
replication. We selected these from a published review of the personality-outcome literature
(Ozer & Benet-Martinez, 2006), whose Table 1 summarizes 86 associations between the Big
Five traits and 49 life outcomes. The author and a research assistant examined the summary
table, main text, and citations of this review to identify the empirical evidence supporting each
trait-outcome association. We then selected 78 associations, spanning all of the Big Five and 48
life outcomes, that could be feasibly replicated. These 78 hypothesized trait-outcome
associations served as the LOOPR Project’s primary units of analysis for estimating
replicability.1
The second phase was to code the empirical sources supporting each association, so that
our replication attempts could follow the original studies as closely as was feasible. We therefore
coded information about the sample, measures, analytic method, and results of one empirical
study or meta-analysis for each of the 78 trait-outcome associations, which resulted in the coding
of 38 original sources. Some sources assessed multiple traits, outcomes, sub-outcomes, or
subsamples; when results differed across these components, we coded each one separately.
Supplemental Appendix A lists citations for the 38 original sources. Detailed coding of the
original studies, including information about their samples, measures, and design, is available at
https://osf.io/mc3z7.
The third phase was to develop a survey procedure for assessing the Big Five traits and
48 selected life outcomes. We assessed personality using a brief, consensus measure of the Big
Five: the Big Five Inventory–2 (BFI-2; Soto & John, 2017). This 60-item questionnaire uses
short phrases to assess the prototypical facets of each Big Five trait domain. The 48 target life
outcomes were assessed using a battery of measures selected to follow the original studies as
closely as was feasible. For most outcomes, this involved administering the same outcome
measure used in the original study, or a subset of the original measures. For some outcomes, it
involved adapting interview items to a questionnaire format, or constructing items based on the
information available in the original source. To conserve assessment time, lengthy outcome
measures were abbreviated to approximately six items per outcome, sampling equally across
subscales or content domains to preserve content validity. After developing this assessment
battery, we used the Qualtrics platform to construct two online surveys; each survey included the
BFI-2 and approximately half of the outcome measures. Supplemental Table S1 lists the outcome
measures used in the original studies and replications, and Supplemental Appendix B lists
citations for these measures. Detailed coding of the original and replication outcome measures is
available at https://osf.io/mc3z7, and the final LOOPR surveys can be viewed at
https://osf.io/9nzxa (Survey 1) and https://osf.io/vdb6w (Survey 2).

1 Previous large-scale replication projects have typically treated the individual study as the
primary unit of analysis. Because personality-outcome studies often examine multiple trait-
outcome associations, we selected the individual association as the most appropriate unit of
analysis for estimating replicability.
The fourth phase was data collection. We used the Qualtrics Online Sample service to
administer our surveys to four samples of adults (ages 18 and older; used to replicate studies that
analyzed adult community samples) and young adults (ages 18-25; used to replicate studies that
analyzed student or young-adult samples). This yielded samples of 1,559 adults and 1,550 young
adults who completed Survey 1, and samples of 1,512 adults and 1,505 young adults who
completed Survey 2. Quota sampling was used to ensure that each sample would be
approximately representative of the United States population in terms of sex, race, and ethnicity,
and that the adult samples would also be representative in terms of age, educational attainment,
and household income. Participants were compensated approximately $3 per 25-minute survey.
A minimum sample size of 1,500 participants per sample was selected to maximize statistical
power within our budgetary constraints; this sample size provides power of 97.3% to detect a
small true correlation (.10), and greater than 99.9% power to detect a medium-sized (.30) or
large (.50) correlation, using two-tailed tests and a .05 significance level (Cohen, 1988).
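The power figures above can be reproduced with the standard Fisher r-to-z approximation for a correlation test; a minimal sketch (the function name is illustrative, not from the project's analysis code):

```python
import math
from statistics import NormalDist

def power_for_correlation(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a true correlation r with n
    participants, using the Fisher r-to-z approximation (cf. Cohen, 1988)."""
    z_effect = math.atanh(r)                  # Fisher z of the true correlation
    se = 1 / math.sqrt(n - 3)                 # standard error of z
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    nc = z_effect / se                        # expected value of the test statistic
    # power = P(|Z| > z_crit) when Z is centered at nc
    return NormalDist().cdf(nc - z_crit) + NormalDist().cdf(-nc - z_crit)

print(round(power_for_correlation(0.10, 1500), 3))  # 0.973, as reported above
```

With N = 1,500 this yields 97.3% power for r = .10, and effectively 100% for r = .30 or r = .50, matching the values in the text.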
The fifth phase was preregistration. We registered our hypotheses, design, materials, and
planned analyses on the Open Science Framework, an online platform for sharing scientific
projects (see https://osf.io/d3xb7). The preregistration protocol was submitted during data
collection, and prior to data analysis, thereby minimizing the influence of researcher degrees of
freedom.
The final phase was data analysis. Descriptive statistics for all personality and outcome
variables are presented in Supplemental Table S2. We conducted two key sets of planned
analyses and one set of exploratory analyses. The first set attempted to replicate each of the 78
hypothesized trait-outcome associations. The second set aggregated the results of these 78
replication attempts to estimate the overall replicability of the personality-outcome literature. We
examined replicability in terms of both statistical significance and effect size, using Pearson’s r
(or standardized regression coefficients when the original results could not be converted to r) as
our common effect size metric, and using Fisher’s r-to-z transformation to aggregate effects. The
final set of analyses searched for predictors of replicability by correlating indicators of
replication success with characteristics of the original study and replication attempt.
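Averaging effects via Fisher's r-to-z transformation, as described above, means transforming each correlation to the z metric, averaging, and back-transforming; a sketch assuming simple unweighted averaging (the project's exact aggregation may weight differently):

```python
import math

def mean_correlation(rs):
    """Average correlations in the Fisher z metric, then back-transform to r."""
    zs = [math.atanh(r) for r in rs]      # r-to-z: z = atanh(r)
    return math.tanh(sum(zs) / len(zs))   # z-to-r: r = tanh(mean z)

print(round(mean_correlation([0.20, 0.40]), 3))  # 0.303
```

Because the z transformation is nonlinear, the back-transformed mean differs slightly from the arithmetic mean of the correlations.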
Results
Testing the Hypothesized Trait-Outcome Associations
Did the trait-outcome associations replicate? Our first set of planned analyses attempted
to replicate each of the 78 hypothesized associations. For each association, we conducted a
preregistered analysis specified to parallel the original study. For outcomes that included
multiple sub-outcomes or subsamples, we conducted a separate analysis for each component,
then aggregated these results (e.g., effect size, number of statistically significant results) to the
outcome level. For analyses involving outcome measures that had been abbreviated to conserve
assessment time, we computed the observed trait-outcome associations, and also estimated the
associations that would be expected if the outcome measure had not been abbreviated.
Specifically, we used the Spearman-Brown prediction formula and Spearman disattenuation
formula to estimate the trait-outcome associations that would be expected if our outcome
measure had used the same number of items or indicators as the original study (Lord & Novick,
1968). These corrected associations address the possibility that some failures to replicate could
simply reflect the attenuated reliability and validity of the abbreviated measures.
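The two corrections can be sketched as follows. This is a simplified illustration of one plausible implementation (the paper's formulas follow Lord & Novick, 1968); the lengthening factor k and the reliabilities used here are hypothetical:

```python
import math

def spearman_brown(rel_short, k):
    """Spearman-Brown prediction: reliability of a measure lengthened k-fold."""
    return k * rel_short / (1 + (k - 1) * rel_short)

def correct_for_abbreviation(r_obs, rel_short, k):
    """Estimate the trait-outcome correlation expected from the full-length
    outcome measure: project the short form's reliability back to full length,
    then rescale the observed correlation by the ratio of reliability indices."""
    rel_full = spearman_brown(rel_short, k)
    return r_obs * math.sqrt(rel_full / rel_short)

# e.g., a 6-item short form (alpha = .70) abbreviated from an 18-item original (k = 3)
print(round(correct_for_abbreviation(0.30, 0.70, 3), 3))  # 0.335
```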
Table 1 presents the basic results of these analyses, including the number of significance
tests conducted for each hypothesized association, the (mean) sample size, the proportion of tests
that were statistically significant (i.e., two-tailed p-value < .05) in the hypothesized direction, the
(mean) original effect size, the (mean) replication effect size, and the ratio of the replication
effect size to the original effect size. To check the robustness of these results to variations in
sample size, Table 2 presents the replication success rates that would be expected using different
sample sizes: the sample size used in the original study, a sample size 2.5 times as large as the
original study (as recommended by Simonsohn, 2015), and a sample size with 80% power to
detect the original effect size (a heuristic that is often used to plan follow-up studies). More
detailed information about all of these analyses, including complete results by sub-outcome and
subsample, is available at https://osf.io/mc3z7.
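The sample size providing 80% power to detect an original effect can be approximated by inverting the power calculation; a sketch under the same Fisher r-to-z approximation (the function name is illustrative):

```python
import math
from statistics import NormalDist

def n_for_80_power(r, power=0.80, alpha=0.05):
    """Approximate sample size giving the stated power to detect a true
    correlation r with a two-tailed test (Fisher r-to-z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    z_effect = math.atanh(r)
    return math.ceil(((z_alpha + z_beta) / z_effect) ** 2 + 3)

print(n_for_80_power(0.30))  # 85 participants for a medium-sized original effect
```

Small original effects demand much larger follow-up samples: 80% power to detect r = .10 requires roughly 783 participants under this approximation.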
The results shown in Tables 1 and 2 indicate that many of the 78 replication attempts
obtained statistically significant support for the hypothesized associations, with effect sizes
comparable to the original results. However, these tables also suggest substantial variability in
the results of the replication attempts, in terms of both statistical significance and effect size.
Table 1
Summary of the Hypothesized Trait-Outcome Associations and Replication Results

| Outcome | Expected trait associations | Number of tests | Replication sample size | Original sample size | Replication success rate | Replication effect size | Original effect size | Effect size ratio |
|---|---|---|---|---|---|---|---|---|
| Individual outcomes | | | | | | | | |
| Subjective well-being | E+ | 4 | 1,559 | 9,131 | 100/100 | .37/.39 | .18 | 2.18/2.31 |
| | N– | 4 | 1,559 | 7,869 | 100/100 | .52/.54 | .22 | 2.64/2.78 |
| Religious beliefs and behavior | A+ | 2 | 1,550 | 595 | 100/100 | .18/.19 | .28 | 0.63/0.69 |
| | C+ | 2 | 1,550 | 595 | 100/100 | .14/.15 | .24 | 0.60/0.65 |
| Existential/phenomenological concerns | O+ | 2 | 1,550 | 595 | 100/100 | .18/.20 | .35 | 0.50/0.56 |
| Existential well-being | E+ | 1 | 1,550 | 595 | 100/100 | .35/.37 | .32 | 1.12/1.18 |
| | N– | 1 | 1,550 | 595 | 100/100 | .60/.63 | .66 | 0.87/0.93 |
| Gratitude | E+ | 1 | 1,559 | 1,228 | 100/100 | .37/.37 | .32 | 1.17/1.17 |
| | A+ | 1 | 1,559 | 1,228 | 100/100 | .54/.54 | .41 | 1.39/1.39 |
| Forgiveness | A+ | 1 | 1,550 | 140 | 100/100 | .48/.57 | .58 | 0.79/0.97 |
| Inspiration | E+ | 1 | 1,514 | 152 | 100/100 | .39/.39 | .20 | 2.04/2.04 |
| | O+ | 1 | 1,514 | 152 | 100/100 | .35/.35 | .43 | 0.80/0.80 |
| Humor | A+ | 1 | 1,550 | 169 | 100/100 | .16/.16 | — | — |
| | N– | 1 | 1,550 | 169 | 100/100 | .13/.13 | — | — |
| Heart disease | A– | 1 | 1,235 | 1,108 | 0/0 | .04/.04 | .15 | 0.24/0.24 |
| Risky behavior | C– | 15 | 1,336 | 826 | 72/72 | .08/.08 | .26 | 0.31/0.31 |
| Coping | E+ | 2 | 1,505 | 672 | 50/100 | .17/.19 | .16 | 1.04/1.19 |
| | N– | 2 | 1,505 | 672 | 100/100 | .21/.24 | .16 | 1.32/1.50 |
| Resilience | E+ | 1 | 1,505 | 138 | 100/100 | .18/.18 | .19 | 0.96/0.96 |
| Substance abuse | C– | 1 | 1,505 | 468 | 100/100 | .06/.06 | .25 | 0.25/0.25 |
| | O+ | 1 | 1,505 | 468 | 0/0 | .02/.02 | .18 | 0.12/0.12 |
| Anxiety | N+ | 1 | 1,505 | 468 | 100/100 | .31/.31 | .34 | 0.90/0.90 |
| Depression | E– | 1 | 1,505 | 468 | 100/100 | .13/.13 | .42 | 0.28/0.28 |
| | N+ | 1 | 1,505 | 468 | 100/100 | .31/.31 | .46 | 0.64/0.64 |
| Personality disorders | E(+/–) | 4 | 1,505 | 194 | 75/75 | .30/.41 | .43 | 0.66/0.93 |
| | A– | 3 | 1,505 | 194 | 100/100 | .42/.58 | .44 | 0.95/1.40 |
| | C(+/–) | 5 | 1,505 | 194 | 100/100 | .30/.42 | .41 | 0.71/1.03 |
| | N(+/–) | 4 | 1,505 | 194 | 100/100 | .31/.41 | .38 | 0.82/1.11 |
| Identity achievement | C+ | 1 | 1,550 | 198 | 100/100 | .23/.25 | .30 | 0.75/0.83 |
| Identity foreclosure | O– | 1 | 1,550 | 198 | 100/100 | .33/.35 | .50 | 0.63/0.66 |
| Identity integration/consolidation | N– | 1 | 804 | 111 | 100/100 | .47/.57 | .22 | 2.31/2.86 |
| | O+ | 1 | 804 | 111 | 100/100 | .21/.25 | .27 | 0.77/0.92 |
| Ethnic culture identification (for minorities) | C+ | 1 | 181 | 164 | 100/100 | .18/.18 | .20 | 0.91/0.91 |
| Majority culture identification (for minorities) | E+ | 1 | 181 | 164 | 0/0 | .10/.10 | .35 | 0.28/0.28 |
| | O+ | 1 | 181 | 164 | 0/0 | .12/.12 | .28 | 0.41/0.41 |
| Interpersonal outcomes | | | | | | | | |
| Family satisfaction | C+ | 2 | 1,466 | 980 | 0/0 | -.07/-.08 | .11 | -0.69/-0.77 |
| | N– | 1 | 1,489 | 980 | 100/100 | .17/.19 | .10 | 1.74/1.89 |
| Peers’ acceptance and friendship | E+ | 1 | 1,549 | 418 | 100/100 | .35/.35 | .41 | 0.84/0.84 |
| Dating variety | E+ | 1 | 1,284 | 418 | 100/100 | .12/.12 | .17 | 0.73/0.73 |
| Attractiveness | E+ | 1 | 1,550 | 418 | 100/100 | .33/.33 | .24 | 1.39/1.39 |
| Peer status | E+ | 2 | 775 | 37 | 100/100 | .39/.39 | .41 | 0.93/0.93 |
| Peer status (men) | N– | 1 | 749 | 42 | 100/100 | .31/.31 | .43 | 0.69/0.69 |
| Romantic satisfaction | E+ | 2 | 795 | 210 | 100/100 | .15/.18 | .28 | 0.53/0.63 |
| | N– | 2 | 795 | 210 | 100/100 | .20/.23 | .32 | 0.62/0.73 |
| Romantic satisfaction (dating couples) | A+ | 1 | 757 | 272 | 100/100 | .18/.22 | .35 | 0.51/0.63 |
| | C+ | 1 | 757 | 272 | 100/100 | .16/.19 | .35 | 0.44/0.53 |
| Romantic conflict | N+ | 1 | 1,154 | 712 | 0/0 | .01/.01 | .32 | 0.02/0.02 |
| Romantic abuse | N+ | 1 | 1,154 | 712 | 100/100 | .09/.09 | .25 | 0.35/0.37 |
| Romantic dissolution | N+ | 1 | 1,098 | n.r. | 100/100 | .10/.10 | .21 | 0.45/0.45 |
| Social institutional outcomes | | | | | | | | |
| Investigative occupational interests | O+ | 1 | 1,503 | 725 | 100/100 | .15/.16 | .25 | 0.58/0.63 |
| Artistic occupational interests | O+ | 1 | 1,503 | 725 | 100/100 | .41/.43 | .30 | 1.39/1.51 |
| Social occupational interests | E+ | 1 | 1,503 | 725 | 100/100 | .15/.17 | .16 | 0.96/1.05 |
| | A+ | 1 | 1,503 | 725 | 100/100 | .08/.09 | .11 | 0.77/0.84 |
| Enterprising occupational interests | E+ | 1 | 1,503 | 725 | 100/100 | .18/.20 | .16 | 1.14/1.23 |
| Occupational performance | C– | 3 | 829 | 2,058 | 33/33 | .03/.03 | .11 | 0.31/0.31 |
| Occupational satisfaction | E+ | 1 | 747 | 12,023 | 100/100 | .19/.21 | .18 | 1.09/1.17 |
| | N– | 1 | 747 | 13,500 | 100/100 | .17/.18 | .23 | 0.72/0.77 |
| Occupational commitment | E+ | 1 | 748 | 492 | 100/100 | .32/.32 | .17 | 1.96/1.96 |
| | N– | 1 | 748 | 713 | 100/100 | .26/.26 | .19 | 1.38/1.38 |
| Extrinsic success | A– | 1 | 481 | 194 | 100/100 | .15/.15 | .24 | 0.63/0.63 |
| | C+ | 1 | 481 | 194 | 0/0 | -.07/-.07 | .50 | -0.13/-0.13 |
| | N– | 1 | 481 | 194 | 100/100 | .10/.10 | .34 | 0.28/0.28 |
| Intrinsic success | C+ | 1 | 512 | 194 | 100/100 | .24/.25 | .20 | 1.22/1.25 |
| | N– | 1 | 512 | 194 | 100/100 | .31/.32 | .26 | 1.20/1.24 |
| Job attainment | A+ | 1 | 838 | 859 | 0/0 | -.02/-.02 | .19 | -0.09/-0.09 |
| Occupational involvement | E+ | 1 | 944 | 859 | 100/100 | .17/.17 | .18 | 0.93/0.95 |
| Financial security | N– | 1 | 944 | 859 | 100/100 | .33/.33 | .22 | 1.52/1.52 |
| Right-wing authoritarianism | O– | 1 | 1,549 | 424 | 100/100 | .29/.32 | .35 | 0.80/0.92 |
| Conservatism | C+ | 1 | 1,559 | 93 | 100/100 | .14/.18 | .24 | 0.56/0.75 |
| | O– | 1 | 1,550 | 1,648 | 100/100 | .17/.25 | .34 | 0.49/0.74 |
| Volunteerism | E+ | 1 | 1,504 | 796 | 100/100 | .20/.20 | .14 | 1.41/1.41 |
| | A+ | 1 | 1,504 | 796 | 100/100 | .17/.17 | .23 | 0.74/0.74 |
| Leadership | E+ | 1 | 747 | 169 | 100/100 | .45/.47 | .22 | 2.16/2.28 |
| | A+ | 1 | 747 | 169 | 100/100 | .27/.28 | .27 | 1.00/1.05 |
| Antisocial behavior | C– | 1 | 1,550 | 187 | 100/100 | .26/.29 | .28 | 0.92/1.04 |
| | N+ | 1 | 1,550 | 187 | 100/100 | .06/.07 | .28 | 0.20/0.23 |
| Criminal behavior | A– | 1 | 1,550 | 197 | 100/100 | .23/.23 | .20 | 1.14/1.17 |
| | C– | 1 | 1,550 | 197 | 100/100 | .18/.19 | .31 | 0.58/0.59 |

Note. E = Extraversion. A = Agreeableness. C = Conscientiousness. N = Negative Emotionality. O = Open-Mindedness. + = Hypothesized positive association. – = Hypothesized negative association. n.r. = Not reported. For replication success rate, replication effect size, and effect size ratio, values left of the forward slash represent the observed trait-outcome associations, and values right of the slash represent the corrected associations. All effect sizes are in the correlation metric or standardized regression coefficient metric, and oriented so that positive values represent effects in the hypothesized direction. For outcomes that include multiple sub-outcomes or subsamples, results are aggregated within each outcome. Mean effect sizes and effect size ratios were computed using Fisher’s r-to-z transformation.
Table 2
Obtained and Expected Replication Success Rates for Varying Sample Sizes

Each of the four rightmost columns reports the replication success rate at the indicated sample size.

| Outcome | Expected trait associations | Number of tests | Replication sample size | Original sample size | Original sample size × 2.5 | Sample size with 80% power |
|---|---|---|---|---|---|---|
| Individual outcomes | | | | | | |
| Subjective well-being | E+ | 4 | 100/100 | 100/100 | 100/100 | 100/100 |
| | N– | 4 | 100/100 | 100/100 | 100/100 | 100/100 |
| Religious beliefs and behavior | A+ | 2 | 100/100 | 100/100 | 100/100 | 50/50 |
| | C+ | 2 | 100/100 | 100/100 | 100/100 | 50/50 |
| Existential/phenomenological concerns | O+ | 2 | 100/100 | 100/100 | 100/100 | 0/0 |
| Existential well-being | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | N– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Gratitude | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | A+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Forgiveness | A+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Inspiration | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | O+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Humor | A+ | 1 | 100/100 | 100/100 | 100/100 | n.a. |
| | N– | 1 | 100/100 | 0/0 | 100/100 | n.a. |
| Heart disease | A– | 1 | 0/0 | 0/0 | 0/0 | 0/0 |
| Risky behavior | C– | 15 | 72/72 | 61/61 | 89/89 | 33/33 |
| Coping | E+ | 2 | 50/100 | 50/50 | 50/100 | 50/50 |
| | N– | 2 | 100/100 | 100/100 | 100/100 | 100/100 |
| Resilience | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Substance abuse | C– | 1 | 100/100 | 0/0 | 100/100 | 0/0 |
| | O+ | 1 | 0/0 | 0/0 | 0/0 | 0/0 |
| Anxiety | N+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Depression | E– | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| | N+ | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| Personality disorders | E(+/–) | 4 | 75/75 | 75/75 | 75/75 | 25/75 |
| | A– | 3 | 100/100 | 100/100 | 100/100 | 100/100 |
| | C(+/–) | 5 | 100/100 | 100/100 | 100/100 | 60/80 |
| | N(+/–) | 4 | 100/100 | 100/100 | 100/100 | 50/100 |
| Identity achievement | C+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Identity foreclosure | O– | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| Identity integration/consolidation | N– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | O+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Ethnic culture identification (for minorities) | C+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Majority culture identification (for minorities) | E+ | 1 | 0/0 | 0/0 | 100/100 | 0/0 |
| | O+ | 1 | 0/0 | 0/0 | 100/100 | 0/0 |
| Interpersonal outcomes | | | | | | |
| Family satisfaction | C+ | 2 | 0/0 | 0/0 | 0/0 | 0/0 |
| | N– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Peers’ acceptance and friendship | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Dating variety | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Attractiveness | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Peer status | E+ | 2 | 100/100 | 100/100 | 100/100 | 100/100 |
| Peer status (men) | N– | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| Romantic satisfaction | E+ | 2 | 100/100 | 100/100 | 100/100 | 50/50 |
| | N– | 2 | 100/100 | 50/50 | 100/100 | 50/50 |
| Romantic satisfaction (dating couples) | A+ | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| | C+ | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| Romantic conflict | N+ | 1 | 0/0 | 0/0 | 0/0 | 0/0 |
| Romantic abuse | N+ | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| Romantic dissolution | N+ | 1 | 100/100 | n.a. | n.a. | 0/0 |
| Social institutional outcomes | | | | | | |
| Investigative occupational interests | O+ | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| Artistic occupational interests | O+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Social occupational interests | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | A+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Enterprising occupational interests | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Occupational performance | C– | 3 | 33/33 | 33/33 | 67/67 | 33/33 |
| Occupational satisfaction | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | N– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Occupational commitment | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | N– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Extrinsic success | A– | 1 | 100/100 | 100/100 | 100/100 | 0/0 |
| | C+ | 1 | 0/0 | 0/0 | 0/0 | 0/0 |
| | N– | 1 | 100/100 | 0/0 | 100/100 | 0/0 |
| Intrinsic success | C+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | N– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Job attainment | A+ | 1 | 0/0 | 0/0 | 0/0 | 0/0 |
| Occupational involvement | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Financial security | N– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Right-wing authoritarianism | O– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Conservatism | C+ | 1 | 100/100 | 0/0 | 100/100 | 0/100 |
| | O– | 1 | 100/100 | 100/100 | 100/100 | 0/100 |
| Volunteerism | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | A+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Leadership | E+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | A+ | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| Antisocial behavior | C– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | N+ | 1 | 100/100 | 0/0 | 0/0 | 0/0 |
| Criminal behavior | A– | 1 | 100/100 | 100/100 | 100/100 | 100/100 |
| | C– | 1 | 100/100 | 100/100 | 100/100 | 0/0 |

Note. Replication sample size = Sample size obtained in the replication study (cf. Table 1). Original sample size = Sample size obtained in the original study. Original sample size × 2.5 = Sample size 2.5 times as large as the original study (cf. Simonsohn, 2015). Sample size with 80% power = Sample size required to provide 80% statistical power to detect the original effect size. E = Extraversion. A = Agreeableness. C = Conscientiousness. N = Negative Emotionality. O = Open-Mindedness. + = Hypothesized positive association. – = Hypothesized negative association. n.a. = Not applicable, because required information was not available from the original study. For replication success rates, values left of the forward slash represent the observed trait-outcome associations, and values right of the slash represent the corrected associations. For outcomes that include multiple sub-outcomes or subsamples, results are aggregated within each outcome.
Testing Overall Replicability
How replicable is the personality-outcome literature, overall? Our second set of planned
analyses addressed this question by aggregating the results of the 78 replication attempts
summarized in Table 1. These analyses compared the results of the LOOPR Project with two
benchmarks: (a) the results that would be expected if all of the original findings represented true
effects (i.e., if the personality-outcome literature did not include any false positive results), and
(b) the results of a previous large-scale replication project conducted to estimate the overall
replicability of psychological science, the RP:P (Open Science Collaboration, 2015).2
We began by examining the rate of successful replication, defined simply as the
proportion of replication attempts that yielded statistically significant results in the hypothesized
direction. The results of this analysis are presented in Figure 1. Across the 76 trait-outcome
associations with an original effect size available for power analysis, the present research
obtained successful replication rates of 87.2% (66.3 successes, 95% CI [79.7%, 94.7%]) in tests
of the observed associations, and 87.9% (66.8 successes, 95% CI [80.6%, 95.2%]) after partially
correcting for the unreliability of abbreviated outcome measures. These success rates were
significantly lower than the rate of 99.3% (75.5 successes, 95% CI [97.4%, 100.0%]) expected
from power analyses of the original effect sizes and replication sample sizes (for observed
associations, χ2(1) = 8.79, p = .003; for corrected associations, χ2(1) = 8.23, p = .004). However,
they were significantly higher than the success rate of 36.1% (35 successes in 97 attempts, 95%
CI [26.5%, 45.6%]) obtained in the RP:P (for observed associations, χ2(1) = 45.96, p < .001; for
corrected associations, χ2(1) = 47.25, p < .001). These significant differences from the RP:P also
held for the complete set of 78 trait-outcome associations, with success rates of 87.6% (68.3
successes, 95% CI [80.2%, 94.9%]) for the observed associations, and 88.2% (68.8 successes,
95% CI [81.1%, 95.4%]) for the corrected associations (for observed associations, χ2(1) = 47.39,
p < .001; for corrected associations, χ2(1) = 48.69, p < .001).
2 Because some of our replication attempts were dependent (due to a shared Big Five trait) rather
than independent, or aggregated results across multiple sub-outcomes or subsamples, the
p-values for these analyses should be considered approximate rather than exact.
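To make the power-based expectation concrete: treating each original effect size as the true correlation, the power of each replication test can be approximated via the Fisher z transform, and the mean power across effects gives the expected success rate. The following is an illustrative sketch under those assumptions, with made-up values rather than the project's data or analysis code:

```python
# Sketch: expected replication success rate if every original effect size were
# the true effect. Power for a two-tailed test of H0: rho = 0 is approximated
# via the Fisher z transform. Illustrative only; not the LOOPR analysis script.
import math
from statistics import NormalDist

def power_for_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a true correlation r with sample size n."""
    z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher z of the assumed true effect
    se = 1 / math.sqrt(n - 3)               # standard error of z
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of a significant result in the expected direction
    return 1 - NormalDist().cdf(z_crit - abs(z) / se)

# (original r, replication N) pairs; made-up values for illustration
effects = [(0.29, 1504), (0.10, 1504), (0.45, 1504)]
expected = sum(power_for_correlation(r, n) for r, n in effects) / len(effects)
```

With replication samples this large, even modest true correlations yield power near 1, which is why the power-based expected success rate in the text approaches 99%.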
The results presented in Table 2 indicate that these findings were also fairly robust to
variations in sample size. Specifically, the expected replication success rates would be 80.9%
(60.7 successes in 75 attempts, 95% CI [72.0%, 89.8%]) when using the same sample size as the
original study,3 89.1% (66.8 successes in 75 attempts, 95% CI [82.0%, 96.1%]) when using a
sample size 2.5 times as large as the original study, and 59.9% (45.5 successes in 76 attempts,
95% CI [48.9%, 70.9%]) when using a sample size that provides 80% statistical power to detect
the original effect. After partially correcting for unreliability, these expected success rates were
80.9% (60.7 successes, 95% CI [72.0%, 89.8%]), 89.7% (67.3 successes, 95% CI [82.9%,
96.6%]), and 64.1% (48.7 successes, 95% CI [53.3%, 74.9%]), respectively. All of these success
rates were significantly lower than would be expected from power analyses (all χ2(1) ≥ 4.77, p ≤
.029), but significantly higher than those obtained in the RP:P (all χ2(1) ≥ 9.71, p ≤ .002).
3 The original sample size was not available for one outcome.
Next, we examined the frequency with which the replication attempts obtained a trait-
outcome association weaker than the corresponding original effect, or not in the expected
direction. Across the 76 trait-outcome associations with an original effect size available for
comparison, the observed replication effect was weaker than the original effect 71.1% of the time
(54 cases, 95% CI [60.9%, 81.2%]); after partially correcting for the unreliability of abbreviated
outcome measures, the rate was 63.2% (48 cases, 95% CI [52.3%, 74.0%]). Binomial tests
indicated that both of these rates were significantly higher than the 50% rate that would be
expected if all of the original effect sizes represented true effects (for observed associations, p <
.001; for corrected associations, p = .029). However, Fisher’s exact tests indicated that the rate of
weaker replication effects obtained in the present research was lower than the corresponding rate
of 82.8% (82 of 99 cases, 95% CI [75.4%, 90.3%]) obtained in the RP:P, a difference that was
marginal for the observed associations (p = .070) and significant for the corrected associations
(p = .005).
Focusing on cases where the observed replication effect was substantially weaker than
the original effect (i.e., the z-transformed replication effect was at least 0.10 less than the
transformed original effect; Cohen, 1988), or not in the expected direction, yielded a similar
pattern of results. In the present research, the observed replication effect was substantially
weaker than the original effect 42.1% of the time (32 of 76 cases, 95% CI [31.0%, 53.2%]); after
correcting for unreliability, the rate was 30.3% (23 of 76 cases, 95% CI [19.9%, 40.6%]).
Fisher’s exact tests indicated that both of these rates were significantly lower than the
corresponding rate of 69.1% (67 of 97 cases, 95% CI [59.9%, 78.3%]) obtained in the RP:P (for
observed associations, p = .001; for corrected associations, p < .001).
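The "substantially weaker" criterion can be made concrete with a short sketch. This is illustrative code, not the project's analysis script, and it assumes effect sizes have been signed so that the hypothesized direction is positive:

```python
# Sketch of the 'substantially weaker' criterion based on Cohen's q.
# Assumes effects are coded so the hypothesized direction is positive.
import math

def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def substantially_weaker(r_original: float, r_replication: float,
                         threshold: float = 0.10) -> bool:
    """True if the replication effect is not in the expected direction, or if
    Cohen's q = z(replication) - z(original) falls below -threshold."""
    if r_replication <= 0:  # not in the hypothesized (positive) direction
        return True
    q = fisher_z(r_replication) - fisher_z(r_original)
    return q < -threshold
```

For example, an original r of .30 replicated at .15 counts as substantially weaker (q ≈ -.16), while a replication at .27 does not (q ≈ -.03).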
Finally, we tested whether the mean and median of the z-transformed replication effect
sizes differed from the transformed original effect sizes, and whether the median effect size ratio
(i.e., the ratio of the replication effect size to the original effect size) differed between the present
research and the RP:P. Paired-samples t-tests indicated that the mean original effect size of 0.29
(95% CI [.26, .32]) was significantly stronger than both the mean observed replication effect of
0.23 (95% CI [.20, .27], t(75) = 3.46, p = .001) and the mean corrected replication effect of 0.26
(95% CI [.22, .29], t(75) = 2.06, p = .043). Similarly, Wilcoxon signed-rank tests indicated that
the median original effect of 0.27 (95% CI [.23, .31]) was significantly stronger than the median
observed replication effect of 0.19 (95% CI [.17, .26], z = 3.59, p < .001) and the median
corrected replication effect of 0.22 (95% CI [.18, .27], z = 2.40, p = .016). However, Mann-
Whitney U tests indicated that the median effect size ratios of 0.77 (95% CI [.63, .92]) for
observed trait-outcome associations and 0.87 (95% CI [.73, .97]) for corrected associations
obtained in the present research were both significantly greater than the corresponding median
ratio of 0.43 (95% CI [.28, .62]) obtained in the RP:P (for observed effects, z = 4.22, p < .001;
for corrected effects, z = 4.86, p < .001). The results of this analysis, presented in Figure 2,
indicate that the replication effects obtained in the LOOPR Project were typically about 80% as
large as the corresponding original effects.
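The effect size ratio is likewise computed on z-transformed correlations. A minimal sketch with made-up values (not the LOOPR data):

```python
# Sketch: ratio of z-transformed replication to original effect sizes,
# summarized by the median. Values are illustrative only.
import math
from statistics import median

def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

# (original r, replication r) pairs; made-up values for illustration
pairs = [(0.27, 0.19), (0.35, 0.30), (0.20, 0.18)]
ratios = [fisher_z(rep) / fisher_z(orig) for orig, rep in pairs]
median_ratio = median(ratios)
```

A median ratio below 1 indicates that replication effects are typically weaker than the original effects, as reported above.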
Taken together, these results support our hypothesis that the personality-outcome
literature is less replicable than would be expected if it did not include any false positive results,
but more replicable than the broader set of psychology studies examined by the RP:P. This
conclusion held whether replicability was assessed in terms of statistical significance or effect
size.
Figure 1. Replication success rates obtained in the LOOPR Project, compared with the rate
expected from power analyses of the original effect size and replication sample size, and the rate
obtained in the Reproducibility Project: Psychology. A successful replication was defined as a
statistically significant effect (i.e., two-tailed p-value < .05) in the hypothesized direction.
Corrected associations were partially disattenuated to correct for the unreliability of abbreviated
outcome measures. Error bars represent 95% confidence limits.
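The partial disattenuation described in the caption can be sketched as the classical correction for attenuation applied to the outcome side only. This is an assumed form for illustration; the project's exact correction may differ in detail:

```python
# Sketch of disattenuating an observed correlation for unreliability of the
# outcome measure only (assumed form, not the paper's exact procedure).
import math

def disattenuate_outcome(r_observed: float, outcome_reliability: float) -> float:
    """Correct an observed trait-outcome correlation for unreliability of the
    outcome measure: r / sqrt(reliability). Assumes 0 < reliability <= 1."""
    return r_observed / math.sqrt(outcome_reliability)
```

For example, an observed r of .19 with an outcome reliability of .70 disattenuates to about .23.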
Figure 2. Median effect size ratios obtained in the LOOPR Project, compared with the ratio
expected if all original effect sizes represented true effects, and the median ratio obtained in the
Reproducibility Project: Psychology. Effect size ratios were computed as the ratio of the z-
transformed replication effect size to the transformed original effect size. Corrected associations
were partially disattenuated to correct for the unreliability of abbreviated outcome measures.
Error bars represent 95% confidence limits.
Predictors of Replicability
What factors might influence the replicability of a trait-outcome association? Our final,
exploratory set of analyses searched for predictors of replicability. Specifically, we computed
Spearman’s rank correlations (ρ) across the set of 78 hypothesized trait-outcome associations to
correlate three characteristics of the original studies (effect size, sample size, and obtained p-
value4), two characteristics of the replication attempts (sample size and statistical power to detect
the original effect), and three aspects of similarity between the original study and the replication
attempt (whether the outcome was measured using the same indicators, the same data source, and
the same assessment timeline), with five indicators of replicability (statistical significance of the
replication effect, replication effect size, whether the replication effect was stronger than the
original effect, whether the replication effect was not substantially weaker than the original
effect, and ratio of the replication effect size to the original effect size).
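As a sketch of this analysis, a Spearman rank correlation between one study characteristic and one replicability indicator might be computed as follows (toy values, no tie handling; not the project's code):

```python
# Sketch of the predictor analysis: Spearman's rho via the rank-difference
# formula. Illustrative only; assumes no tied values.
def spearman_rho(x: list, y: list) -> float:
    """Spearman's rank correlation between two equal-length lists."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Toy example: original effect sizes vs. replication effect sizes
original = [0.10, 0.25, 0.31, 0.18, 0.40, 0.22]
replication = [0.08, 0.20, 0.28, 0.15, 0.30, 0.21]
rho = spearman_rho(original, replication)
```

In the actual analyses, each of the eight predictors was correlated with each of the five replicability indicators across the 78 hypothesized associations.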
These correlations, presented in Table 3, suggest three noteworthy patterns. First, the
original effect size positively predicted the replication effect size (for observed effects, ρ(74) =
.34, 95% CI [.12, .53], p = .002; for corrected effects, ρ(74) = .39, 95% CI [.18, .58], p < .001).
The original effect size also negatively predicted the likelihood that the replication effect would
be stronger than the original effect (for observed effects, ρ(74) = -.40, 95% CI [-.58, -.18], p <
.001; for corrected effects, ρ(74) = -.30, 95% CI [-.49, -.07], p = .009), as well as the likelihood
that the replication effect would not be substantially weaker than the original effect (for observed
effects, ρ(74) = -.40, 95% CI [-.58, -.18], p < .001; for corrected effects, ρ(74) = -.22, 95% CI
[-.43, .01], p = .053). This pattern, illustrated in Figure 3, indicates that strong original effects
were more likely to yield strong replication effects, but also provided more room for the
replication effect to be weaker than the original effect.
4 Because many original studies did not report exact p-values, we estimated these from the
reported effect size and degrees of freedom.
The second noteworthy pattern was that the likelihood of successful replication (i.e., a
statistically significant effect in the hypothesized direction) was positively predicted by the
statistical power (for observed effects, ρ(74) = .37, 95% CI [.15, .55], p = .001; for corrected
effects, ρ(74) = .33, 95% CI [.11, .52], p = .003) and sample size (for observed effects, ρ(76) =
.25, 95% CI [.03, .45], p = .026; for corrected effects, ρ(76) = .27, 95% CI [.05, .47], p = .017) of
the replication attempt. This pattern likely reflects the influence of sample size on statistical
significance, especially when attempting to detect small effects.
The final pattern was that the replication effect size and the effect size ratio were both
positively predicted by whether the original study and the replication measured the target
outcome using the same items or indicators, as well as the same data source and format (i.e., a
self-report questionnaire) (all ρ ≥ .19, all p ≤ .107; see Table 3 for 95% CIs). This pattern,
although weaker and less consistent than the previous two, indicates that replications using
assessment methods more similar to the original studies tended to obtain trait-outcome
associations that were somewhat stronger and more comparable to the original effects.
Taken together, the results presented in Table 3 and Figure 3 suggest that the predictors
of replicability vary depending on how replicability is indexed: original effect size was the best
predictor of replication effect size, whereas replication power and sample size were the best
predictors of statistical significance. However, the conclusions that can be drawn from these
results should be tempered by the limited variability of some predictors (e.g., replication sample
size and statistical power were generally quite high) and some replicability indicators (e.g.,
relatively few replication effects were not statistically significant).
Table 3
Predictors of Replicability across the 78 Hypothesized Trait-Outcome Associations

                            Replication success      Replication effect size  Replication effect       Replication effect not   Effect size ratio
                                                                              stronger                 substantially weaker
                            Observed    Corrected    Observed    Corrected    Observed    Corrected    Observed    Corrected    Observed    Corrected
Original study characteristics
  Effect size               .12         .07          .34**       .39***       -.40***     -.30**       -.40***     -.22         -.26*       -.22
                            [-.11, .33] [-.15, .29]  [.12, .53]  [.18, .58]   [-.58,-.18] [-.49,-.07]  [-.58,-.18] [-.43, .01]  [-.46,-.03] [-.43, .01]
  Sample size               -.13        -.12         -.17        -.18         .26*        .14          .10         .05          .05         .04
                            [-.35, .10] [-.33, .11]  [-.38, .06] [-.39, .05]  [.03, .46]  [-.09, .36]  [-.13, .32] [-.18, .28]  [-.18, .27] [-.19, .26]
  P-value                   .09         .08          .02         -.01         .07         -.02         .14         .09          .16         .13
                            [-.14, .31] [-.15, .30]  [-.21, .25] [-.24, .22]  [-.16, .29] [-.25, .21]  [-.10, .35] [-.14, .31]  [-.07, .37] [-.10, .35]
Replication characteristics
  Sample size               .25*        .27*         .29*        .31**        .07         .09          .02         .20          .14         .17
                            [.03, .45]  [.05, .47]   [.06, .48]  [.08, .50]   [-.16, .29] [-.14, .31]  [-.21, .24] [-.03, .41]  [-.09, .35] [-.06, .39]
  Statistical power         .37**       .33**        .27*        .30**        -.21        -.09         -.12        .03          -.05        -.03
                            [.15, .55]  [.11, .52]   [.05, .47]  [.08, .50]   [-.41, .02] [-.31, .14]  [-.34, .11] [-.20, .25]  [-.28, .17] [-.26, .20]
Similarity of original study and replication
  Outcome indicators        -.04        -.05         .24*        .19          .14         .08          .23*        .17          .24*        .19
                            [-.26, .19] [-.27, .18]  [.01, .44]  [-.03, .40]  [-.09, .36] [-.14, .30]  [.00, .43]  [-.06, .38]  [.02, .45]  [-.04, .40]
  Outcome data source       .16         .19          .25*        .29*         .09         .11          .06         .27*         .19         .23*
                            [-.07, .37] [-.04, .40]  [.03, .45]  [.06, .48]   [-.13, .31] [-.12, .33]  [-.17, .28] [.05, .47]   [-.04, .40] [.00, .44]
  Assessment timeline       .03         .04          -.04        .00          -.16        -.06         -.20        -.07         -.19        -.16
                            [-.20, .25] [-.18, .26]  [-.26, .18] [-.23, .22]  [-.37, .07] [-.28, .17]  [-.41, .03] [-.29, .16]  [-.40, .03] [-.38, .07]

Note. *p < .05. **p < .01. ***p < .001. N = 73 to 78. Values are Spearman’s rank correlations; values in brackets are 95% confidence limits. Replication success =
Replication effect was statistically significant in the hypothesized direction. Replication effect stronger = Replication effect was in the hypothesized direction and
stronger than the original effect. Replication effect not substantially weaker = Replication effect was in the hypothesized direction and not substantially weaker
than the corresponding original effect (i.e., Cohen’s q > -.10). Effect size ratio = Ratio of the z-transformed replication effect to the transformed original effect.
Observed = Analyses of observed trait-outcome associations. Corrected = Analyses of trait-outcome associations partially corrected for the unreliability of
abbreviated outcome measures. Outcome indicators = Whether the original study and replication used the same items or indicators to measure the outcome (1 =
Both used the same indicators; 0.5 = Replication used a subset of the original indicators; 0 = Replication used different indicators). Outcome data source =
Whether the original study and replication used the same data source and format to measure the outcome (1 = Both used self-report questionnaire data; 0.5 =
Original study used either self-report or questionnaire data; 0 = Original study used neither self-report nor questionnaire data). Assessment timeline = Whether the
original study and replication used the same timeline to assess the trait and outcome (1 = Both used concurrent assessment of the trait and outcome; 0.5 = Original
study aggregated results from concurrent and non-concurrent assessments; 0 = Original study did not assess the trait and outcome concurrently).
Figure 3. Scatterplot of the z-transformed original and (observed) replication effect sizes, by
success of the replication attempt. Successful replication = Replication effect was statistically
significant in the hypothesized direction. Unsuccessful replication = Replication effect was not
statistically significant or not in the hypothesized direction. Partial replication = Replication was
successful for some sub-outcomes or subsamples, but not for others. The solid, diagonal line
represents replication effect sizes equal to the original effect sizes. The dashed, horizontal line
represents a replication effect size of 0, and points below this line represent replication effects
that were not in the hypothesized direction.
Discussion
The LOOPR Project was conducted to estimate the replicability of the personality-
outcome literature by attempting preregistered, high-powered replications of 78 previously
published trait-outcome associations. When replicability was defined in terms of statistical
significance, we successfully replicated 87% of the hypothesized effects, or 88% after partially
correcting for the unreliability of abbreviated outcome measures. A replication effect was
typically 77% as strong as the corresponding original effect, or 87% after correcting for
unreliability. Moreover, the statistical significance of a replication attempt was best predicted by
the sample size and statistical power of the replication, whereas the strength of a replication
effect was best predicted by the original effect size.
These results can be interpreted either optimistically or pessimistically. An optimistic
interpretation is that replicability estimates of 77% to 88% (across statistical significance and
effect size criteria) are fairly high. These findings suggest that the extant personality-outcome
literature provides a reasonably accurate map of how the Big Five traits relate with consequential
life outcomes (Ozer & Benet-Martinez, 2006). In contrast, a pessimistic interpretation is that our
replicability estimates are lower than would be expected if all the originally published findings
were unbiased estimates of true effects. This suggests that the personality-outcome literature
includes some false-positive results, and that reported effect sizes may be inflated by researcher
degrees of freedom and publication bias. Thus personality psychology, like other areas of
behavioral science, stands to benefit from efforts to improve replicability by constraining
researcher degrees of freedom, increasing statistical power, and reducing publication bias. Taken
together, these interpretations leave us cautiously optimistic about the current state and future
prospects of the personality-outcome literature (cf. Nelson, Simmons, & Simonsohn, 2018).
Compared with previous large-scale replication projects in the behavioral sciences, the
LOOPR Project obtained relatively high replicability estimates. Why was this? When evaluating
replicability in terms of statistical significance, one likely contributor to our high success rates
was the large sample size (median N = 1,504) and correspondingly high statistical power
(median > 99.9%) of the replication attempts. When evaluating replicability in terms of relative
effect size, we speculate that the relatively high estimates obtained here may reflect
methodological norms in personality-outcome research, which typically examines the main
effects of traits using samples of several hundred participants and standardized measures (Fraley
& Vazire, 2014; Open Science Collaboration, 2015; Simmons et al., 2011). However, we note
that comparisons between replication projects should be tempered by the fact that different
projects have used different approaches to select the original studies and design the replication
attempts. Additional research is clearly needed to further investigate variation in replicability
across scientific disciplines and research literatures.
The present findings also have implications for understanding why replication attempts in
the behavioral sciences might generally succeed or fail. Failures to replicate are sometimes
attributed to unmeasured moderators: subtle differences between the original study and the
replication attempt that cause an effect to be observed in the former but not the latter (e.g.,
Stroebe & Strack, 2014). In the LOOPR Project, there were unavoidable differences between the
original studies and the replication attempts in terms of historical context (original studies
conducted from the 1980s to 2000s vs. replication in 2017), local context (many original research
sites vs. national American samples), sampling method (mostly student or community samples
vs. survey panels), administration method (mostly in-person surveys or interviews vs. online
surveys), and personality measures (many original measures vs. the BFI-2). The relatively high
replicability estimates obtained despite these differences converge with previous results
suggesting that unmeasured moderators are not generally powerful enough to explain many
failures to replicate (Ebersole et al., 2016; Klein et al., 2014).
Strengths, Limitations, and Future Directions
The LOOPR Project had a number of important strengths, including its broad sample of
life outcomes, representative samples, preregistered design, and high statistical power. However,
it also had some noteworthy limitations that suggest promising directions for future research.
Most notably, all of the present data come from cross-sectional, self-report surveys completed by
online research panels, whereas some of the original studies used longitudinal designs or other
data sources (e.g., interviews, informant-reports, community samples). Indeed, our analyses of
replicability predictors indicated that replication effect sizes tended to be somewhat stronger
when the original study had also used a self-report survey to measure the target outcome. Thus,
the present research is only a first step toward establishing the replicability of these trait-outcome
associations, and future research using longitudinal designs, as well as alternative sampling and
assessment methods, is clearly needed.
A broader issue is that large-scale replication projects can be conducted using different
approaches (Tackett, McShane, Bockenholt, & Gelman, 2017). Any particular approach will
have advantages and disadvantages, and the choice of an optimal approach will depend on the
goals of a particular project. The main goal of the LOOPR Project was to estimate the overall
replicability of the personality-outcome literature. We therefore adopted a many-studies
approach that attempted to replicate a large number of original effects, with one replication
attempt per effect and relatively brief outcome measures (cf. Camerer et al., 2016; Cova et al., in
press; Open Science Collaboration, 2015). An alternative approach would be to replicate a
smaller number of effects, with lengthier measures or multiple replication attempts per effect
(i.e., a many-labs approach; cf. Ebersole et al., 2016; Hagger et al., 2016; Klein et al., 2014).
Such an approach would be less well suited for estimating the overall replicability of a literature,
but better suited for achieving other goals. For example, future research can complement the
LOOPR Project by testing individual trait-outcome associations more robustly, and by directly
investigating factors—such as location, sampling method, mode of administration, measures, and
analytic method—that might moderate these associations.
Conclusion
The results of the LOOPR Project provide grounds for cautious optimism about the
personality-outcome literature. Optimism, because we successfully replicated most of the
hypothesized trait-outcome associations, with many replication effect sizes comparable to the
original effects. Caution, because these replicability estimates were lower than would be
expected in the absence of published false positives. We therefore conclude that the extant
literature provides a reasonably accurate map of how the Big Five personality traits relate with
consequential life outcomes, but that personality psychology still stands to gain from ongoing
efforts to improve the replicability of behavioral science.
Author Contributions
Christopher J. Soto designed the study, collected and analyzed the data, and drafted and
revised the manuscript.
Acknowledgements
The author thanks Alison Russell and Samantha Rizzo for their assistance with this
research.
Declaration of Conflicting Interests
Christopher J. Soto is a copyright holder for the Big Five Inventory–2 (BFI-2), which was
used in the present research. The BFI-2 is freely available for research use at
http://www.colby.edu/psych/personality-lab.
Funding
This research was supported by a faculty research grant from Colby College to
Christopher J. Soto.
Open Practices
Supporting materials, including the preregistration protocol and revisions, list of original
effects selected for replication, materials, data, analysis code, and detailed results are publicly
available at https://osf.io/d3xb7.
References
Allport, G. W. (1961). Pattern and growth in personality. Oxford, England: Holt, Reinhart &
Winston.
Almlund, M., Duckworth, A. L., Heckman, J., & Kautz, T. (2011). Personality psychology and
economics. In E. A. Hanushek, S. Machin, & L. Woessmann (Eds.), Handbook of the
economics of education (Vol. 4, pp. 1-181). Amsterdam, Netherlands: Elsevier.
Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò,
M. R. (2013). Power failure: Why small sample size undermines the reliability of
neuroscience. Nature Reviews Neuroscience, 14, 365-376.
Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., . . . Chan, T.
(2016). Evaluating replicability of laboratory experiments in economics. Science, 351,
1433-1436.
Chernyshenko, O. S., Kankaraš, M., & Drasgow, F. (2018). Social and emotional skills for
student success and well-being. Paris, France: OECD Publishing.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Cova, F., Strickland, B., Abatista, A., Allard, A., Andow, J., Attie, M., . . . Zhou, X. (in press).
Estimating the reproducibility of experimental philosophy. Review of Philosophy and
Psychology.
De Raad, B., Perugini, M., Hrebícková, M., & Szarota, P. (1998). Lingua franca of personality:
Taxonomies and structures based on the psycholexical approach. Journal of Cross-
Cultural Psychology, 29, 212-232.
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., .
. . Boucher, L. (2016). Many Labs 3: Evaluating participant pool quality across the
academic semester via replication. Journal of Experimental Social Psychology, 67, 68-82.
Fraley, R. C., & Vazire, S. (2014). The N-pact factor: Evaluating the quality of empirical
journals with respect to sample size and statistical power. PLoS ONE, 9, e109019.
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences:
Unlocking the file drawer. Science, 345, 1502-1505.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist,
48, 26-34.
Hagger, M. S., Chatzisarantis, N. L., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., . . .
Zwienenberg, M. (2016). A multilab preregistered replication of the ego-depletion effect.
Perspectives on Psychological Science, 11, 546-573.
John, O. P., Naumann, L. P., & Soto, C. J. (2008). Paradigm shift to the integrative Big Five trait
taxonomy: History, measurement, and conceptual issues. In O. P. John, R. W. Robins, & L.
A. Pervin (Eds.), Handbook of personality: Theory and research (3rd ed., pp. 114-158). New
York, NY: Guilford.
Kautz, T., Heckman, J. J., Diris, R., ter Weel, B., & Borghans, L. (2014). Fostering and
measuring skills: Improving cognitive and non-cognitive skills to promote lifetime
success. NBER Working Paper 20749.
Klein, R. A., Ratliff, K. A., Vianello, M., Adams Jr, R. B., Bahník, Š., Bernstein, M. J., . . .
Brumbaugh, C. C. (2014). Investigating variation in replicability: A “many labs”
replication project. Social Psychology, 45, 142-152.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Oxford, England:
Addison-Wesley.
Nelson, L. D., Simmons, J., & Simonsohn, U. (2018). Psychology's renaissance. Annual Review
of Psychology, 69, 511-534.
OECD. (2015). Skills for social progress: The power of social and emotional skills. Paris,
France: OECD Publishing.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science.
Science, 349, aac4716.
Ozer, D. J., & Benet-Martinez, V. (2006). Personality and the prediction of consequential
outcomes. Annual Review of Psychology, 57, 401-421.
Primi, R., Santos, D., John, O. P., & De Fruyt, F. (2016). Development of an inventory assessing
social and emotional skills in Brazilian youth. European Journal of Psychological
Assessment, 32, 5-16.
Roberts, B. W., Kuncel, N. R., Shiner, R., Caspi, A., & Goldberg, L. R. (2007). The power of
personality: The comparative validity of personality traits, socioeconomic status, and
cognitive ability for predicting important life outcomes. Perspectives on Psychological
Science, 2, 313-345.
Rossi, J. S. (1990). Statistical power of psychological research: What have we gained in 20
years? Journal of Consulting and Clinical Psychology, 58, 646-656.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed
flexibility in data collection and analysis allows presenting anything as significant.
Psychological Science, 22, 1359-1366.
Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results.
Psychological Science, 26, 559-569.
Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2): Developing and
assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and
predictive power. Journal of Personality and Social Psychology, 113, 117-143.
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The
effect of the outcome of statistical tests on the decision to publish and vice versa. The
American Statistician, 49, 108-112.
Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication.
Perspectives on Psychological Science, 9, 59-71.
Tackett, J. L., McShane, B. B., Bockenholt, U., & Gelman, A. (2017). Large scale replication
projects in contemporary psychological research. Unpublished manuscript.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological
Bulletin, 76, 105-110.
Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI
studies of emotion, personality, and social cognition. Perspectives on Psychological
Science, 4, 274-290.