All content in this area was uploaded by Christopher J Soto on Oct 17, 2017
THE BFI-2-S AND BFI-2-XS 1
Soto, C. J., & John, O. P. (2017). Short and extra-short forms of the Big Five Inventory–2: The
BFI-2-S and BFI-2-XS. Journal of Research in Personality, 68, 69-81.
Short and Extra-Short Forms of the Big Five Inventory–2: The BFI-2-S and BFI-2-XS
Christopher J. Soto
Colby College
Oliver P. John
University of California, Berkeley
Author Note
Christopher J. Soto, Department of Psychology, Colby College; Oliver P. John,
Department of Psychology, University of California, Berkeley.
This research was supported by faculty research grants from Colby College to
Christopher J. Soto, and from the University of California, Berkeley to Oliver P. John. The
authors thank Seth Butler, Emma Heilbronner, Sara Heilbronner, Caroline Minott, Natalia Van
Doren, and Carylanne Wolfington for providing access to some of the college and university
sample data. For downloadable versions of the BFI-2 and its short forms, visit the Colby
Personality Lab website (http://www.colby.edu/psych/personality-lab/). To initiate a translation
of the BFI-2 or its short forms, please contact the authors.
Correspondence concerning this article should be addressed to Christopher J. Soto,
Department of Psychology, Colby College, 5550 Mayflower Hill, Waterville, ME 04901. Email:
christopher.soto@colby.edu.
Abstract
The Big Five Inventory–2 (BFI-2) uses 60 items to hierarchically assess the Big Five personality
domains and 15 more-specific facet traits. The present research develops two abbreviated forms
of the BFI-2—the 30-item BFI-2-S and the 15-item BFI-2-XS—and then examines their
measurement properties. At the level of the Big Five domains, we find that the BFI-2-S and BFI-
2-XS retain much of the full measure’s reliability and validity. At the facet level, we find that the
BFI-2-S may be useful for examining facet traits in reasonably large samples, whereas the BFI-
2-XS should not be used to assess facets. Finally, we discuss some key tradeoffs to consider
when deciding whether to administer an abbreviated form instead of the full BFI-2.
Keywords: Big Five; five-factor model; facets; personality measurement; short measures
The BFI-2-S and BFI-2-XS: Short and Extra-Short Forms of the Big Five Inventory–2
Individual differences in people’s characteristic patterns of thinking, feeling, and
behaving can be organized in terms of the Big Five personality trait domains (Goldberg, 1993;
John, Naumann, & Soto, 2008; McCrae & Costa, 2008). Moreover, these five broad domains can
be conceptualized hierarchically, with each domain subsuming several more-specific facet traits
(DeYoung, Quilty, & Peterson, 2007; McCrae & Costa, 2010; Roberts, Chernyshenko, Stark, &
Goldberg, 2005). The Big Five Inventory–2 (BFI-2; Soto & John, in press) is a 60-item
questionnaire that operationalizes this hierarchical conceptualization of personality structure by
assessing the Big Five domains and 15 facets: Extraversion (with facets of Sociability,
Assertiveness, and Energy Level), Agreeableness (Compassion, Respectfulness, and Trust),
Conscientiousness (Organization, Productiveness, and Responsibility), Negative Emotionality
(Anxiety, Depression, and Emotional Volatility), and Open-Mindedness (Intellectual Curiosity,
Aesthetic Sensitivity, and Creative Imagination). The present research was conducted to (a)
develop a 30-item short form (the BFI-2-S) and a 15-item extra-short form (the BFI-2-XS) of the
BFI-2, (b) examine the extent to which these short forms retain the reliability and validity of the
full BFI-2, and (c) test whether the BFI-2 short forms should only be used to assess personality at
the level of the Big Five domains, or whether they are also appropriate for examining facet-level
traits.
The Big Five Inventory–2 and the Need for Short Forms
The BFI-2 has some important psychometric strengths. First, it has a conceptually
coherent and empirically robust hierarchical structure, with three facets nested within each Big
Five domain (Soto & John, in press). This hierarchical measurement model helps address the
bandwidth-fidelity tradeoff: the phenomenon that broadly defined traits tend to predict a wider
range of criteria, whereas narrowly defined traits tend to predict closely aligned criteria more
accurately (Cronbach & Gleser, 1957; John, Hampson, & Goldberg, 1991). By balancing
descriptive breadth at the domain level with specificity at the facet level, the BFI-2’s hierarchical
structure enhances its power to accurately predict a wide range of external criteria (Soto & John,
in press).
Second, the BFI-2 effectively minimizes the influence of acquiescent responding: the
tendency of some individuals to consistently agree (yea-saying) or disagree (nay-saying) with
items regardless of their content (Jackson & Messick, 1958). Uncontrolled individual differences
in acquiescence can bias the results of analyses conducted at both the scale and item levels; for
example, they can distort a measure’s factor structure (Rammstedt & Farmer, 2013; Soto, John,
Gosling, & Potter, 2008) and associations with external criteria (Danner, Aichholzer, &
Rammstedt, 2015). By including an equal number of true-keyed and false-keyed items on each
domain and facet scale, the BFI-2 automatically controls individual differences in acquiescence
at the scale level. This balanced item content also allows researchers to easily control
acquiescence in item-level analyses, either by estimating latent variable models that include an
acquiescence method factor (e.g., Aichholzer, 2014; Soto & John, in press), or through simple
within-person centering: subtracting an individual’s overall mean response across the full set of
60 BFI-2 items from each of their individual item responses. (However, note that within-person
centering can sometimes introduce other psychometric problems; Baron, 1996.)
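As a minimal sketch of the within-person centering described above (not the authors' code), the following NumPy snippet subtracts each respondent's mean across all administered items from each of that respondent's item responses. The array `ratings`, the function name `within_person_center`, and the 1 to 5 response coding are illustrative assumptions; with real BFI-2 data the array would have 60 columns of raw responses.

```python
import numpy as np

def within_person_center(responses):
    """Subtract each respondent's mean across all items from each item response."""
    responses = np.asarray(responses, dtype=float)
    # One mean per respondent (row), kept as a column so it broadcasts across items.
    person_means = responses.mean(axis=1, keepdims=True)
    return responses - person_means

# Hypothetical raw responses: 3 respondents x 6 items, coded 1-5 (assumed coding).
ratings = np.array([
    [5, 4, 5, 4, 5, 4],  # high overall level of agreement
    [2, 1, 2, 1, 2, 1],  # same relative pattern, low overall level of agreement
    [1, 5, 2, 4, 3, 3],
])
centered = within_person_center(ratings)
```

In this toy example the first two respondents show the same pattern of relative agreement but differ in their overall level of agreement, so their centered responses coincide.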
Third, the BFI-2 is easy to understand. Its items are short phrases that elaborate on a trait-
descriptive adjective (e.g., persistent) by adding a synonym, definition, or context (e.g., “Is
persistent, works until the task is finished.”). These phrased items retain the simplicity and
brevity of adjective ratings, while addressing the limitation that individual trait adjectives often
have ambiguous or multiple meanings (Goldberg & Kilkowski, 1985). Finally, the BFI-2 is
efficient. Its 60 items can be completed in approximately 5 to 10 minutes, whereas many
broadband personality measures include hundreds of items and can take an hour or more to
administer.
The full BFI-2’s reasonably short completion time makes it appropriate for many basic
and applied research contexts. However, there are some circumstances in which administering
the full set of 60 items may not be feasible, and an even shorter measure is needed. For example,
some large-scale surveys—such as the British Household Panel Survey (Taylor, Brice, Buck, &
Prentice-Lane, 2010), the German Socio-Economic Panel study (Wagner, Frick, & Schupp,
2007), and the Household Income and Labour Dynamics in Australia survey (Summerfield et al.,
2015)—are designed to measure many dozens of personal and environmental characteristics as
efficiently as possible. When assessing each participant, such surveys may only be able to devote
a minute or two to assessing personality traits. Another circumstance concerns within-subjects
designs that ask participants to complete the same personality measure multiple times. For
example, a single participant may be asked to rate their own personality in several different
contexts (Wood & Roberts, 2006), or to rate several other participants in a round-robin design
(Srivastava, Guglielmo, & Beer, 2010). In such situations, very brief measures may be needed to
prevent participant fatigue, frustration, and careless responding. Finally, some laboratory studies
may wish to briefly assess personality traits while still reserving as much time as possible for
experimental manipulations and direct behavioral observation.
A Bottom-Up Strategy for Developing the BFI-2 Short Forms
Given their possible value and most likely uses, we developed the BFI-2-S and BFI-2-XS
with two key goals in mind. First, we wanted the short forms to coherently assess each Big Five
domain and clearly differentiate between the domains, thereby retaining the BFI-2’s clear Big
Five structure. Second, we also wanted the short forms to adequately represent each domain’s
considerable bandwidth—rather than narrowing the range of personality content assessed—in
order to maintain the BFI-2’s descriptive and predictive breadth. To help balance these two
goals, we used a bottom-up approach to scale construction organized around the 15 BFI-2 facets.
Specifically, we constructed the BFI-2-XS by selecting a single item to represent each facet, and
then constructed the BFI-2-S by adding a second item per facet.
Because the BFI-2 facets have a clear Big Five structure (Soto & John, in press), we
expected that this strategy would provide the short forms with a similarly robust domain-level
structure. And because same-domain BFI-2 facets can be meaningfully distinguished from each
other (Soto & John, in press), we also expected that this strategy would preserve a suitably broad
range of content within each domain. Furthermore, selecting an item set that equally represents
each BFI-2 facet within the Big Five domains raises the possibility that the short forms, like the
full measure, might prove useful for assessing personality traits hierarchically. While validating
the BFI-2-S and BFI-2-XS, we therefore investigated whether these short forms should only be
used to assess the Big Five domains, or whether they are also appropriate for examining facet-
level traits.
Despite its strengths, we expected that our bottom-up approach to constructing the BFI-2
short forms would also have some drawbacks. Perhaps most notably, compared with alternative
strategies focused on maximizing internal consistency within each Big Five domain (e.g., by
selecting items with especially high content overlap, high inter-item correlations, or high
domain-level factor loadings), we expected that representing each BFI-2 facet equally might
result in relatively low internal consistency reliability for some of the six-item BFI-2-S and
(especially) three-item BFI-2-XS domain scales. However, reviews of the psychometric literature
have noted that content breadth is generally more important than internal consistency for
enhancing the validity of brief measures (Smith, McCarthy, & Anderson, 2000; see also John &
Soto, 2007; Stanton, Sinar, Balzer, & Smith, 2002). Thus, prioritizing content validity over
internal consistency should help the BFI-2 short forms retain as much of the full measure’s
validity as possible (cf. Gosling, Rentfrow, & Swann, 2003; Rammstedt & John, 2007).
Overview of the Present Research
In sum, the present research was conducted to develop two short forms of the BFI-2—the
30-item BFI-2-S and the 15-item BFI-2-XS—and to address two key research questions. First, to
what extent do the BFI-2 short forms retain the reliability and validity of the full measure?
Second, is it appropriate to use the BFI-2 short forms as hierarchical personality measures? In
other words, should the BFI-2-S and BFI-2-XS only be used to assess personality at the level of
the Big Five domains, or are they also appropriate for examining facet-level traits?
Study 1
Study 1 had two main goals. The first was to select items for the BFI-2-S and BFI-2-XS,
using a joint rational-empirical approach to scale construction. The second was to examine the
two short forms’ basic measurement properties, using data from three samples. Ideally, the short
forms would converge strongly with the full BFI-2 domain scales, demonstrate adequate
reliability, and retain a clear Big Five structure.
Method
Participants and procedure. Study 1 analyzed data from three item selection samples:
an Internet sample, a university sample, and a college sample. As described below, data from
some of these participants were previously analyzed to validate the full BFI-2 (Soto & John, in
press, Study 3). However, none of the present data overlapped with those used to select items for
the full measure (Soto & John, in press, Study 2).
Internet selection sample. Participants in this sample were 1,000 adult visitors to a
personality test website (50% male, 50% female) who completed the BFI-2 online in exchange
for automatically generated feedback. Their median age was 24 years old, with 77% between
ages 18 and 35. Regarding ethnicity, 66% described their ethnic identification as
White/Caucasian, 7% as Asian/Asian-American, 7% as Hispanic/Latino, 6% as Black/African-
American, 1% as Native American/American Indian, 4% as another ethnicity, and 5% as
multiple ethnicities, with 5% not reporting ethnicity. Most participants (82%) were residents of
the United States, with smaller numbers residing in the United Kingdom (9%), Canada (7%), and
Australia or New Zealand (3%). These participants previously constituted the Internet sample in
Study 3 of Soto and John (in press).
University selection sample. Participants in this sample were 784 students (68% female,
32% male) enrolled in introductory psychology and business courses at the University of
California, Berkeley, who completed the BFI-2 online in exchange for partial fulfillment of a
course requirement. Their median age was 21 years old, with 95% between ages 18 and 25; 46%
described their primary ethnic identification as Asian/Asian-American, 35% as White/Caucasian,
13% as Hispanic/Latino, 2% as Black/African-American, and 1% as another ethnicity, with 3%
not reporting ethnicity. A subsample of 137 participants also completed the BFI-2 a second time,
with an average retest interval of approximately two months; 110 of these participants were also
part of the student sample analyzed in Study 3 of Soto and John (in press).
College selection sample. Participants in this sample were 318 students at Colby College
(63% female, 29% male, 1% another gender, 7% did not report gender) who completed the BFI-
2 online in exchange for a chance to win one of several gift cards. Their median age was 20 years
old, with 99% between ages 18 and 22; 70% described their ethnic identification as
White/Caucasian, 9% as Asian/Asian-American, 2% as Hispanic/Latino, 2% as Black/African-
American, 1% as another ethnicity, and 9% as multiple ethnicities, with 7% not reporting
ethnicity. A subsample of 121 participants also completed the BFI-2 a second time, with an
average retest interval of approximately three months.
The Big Five Inventory–2. The BFI-2 (Soto & John, in press) is a hierarchical measure
of the Big Five personality domains and 15 more-specific facet traits. Its 60 items are short,
descriptive phrases with the common item stem “I am someone who...” (e.g., “Is outgoing,
sociable,” “Tends to be quiet”). Respondents rate each item using a 5-point scale ranging from
disagree strongly to agree strongly. Soto and John (in press) provided evidence for the
reliability, structure, and validity of the BFI-2 domain and facet scales. Alpha reliabilities of the
12-item domain scales averaged .86 in each of the three present samples, with a total range of .81
to .90 across samples. Alphas of the four-item facet scales averaged .75 in each sample, with a
total range of .59 to .86.
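To illustrate how a balanced scale of this kind is typically scored, the sketch below reverses false-keyed items on a 5-point scale and averages the keyed responses. It assumes responses are coded 1 (disagree strongly) to 5 (agree strongly) and treats "Tends to be quiet" as false-keyed relative to "Is outgoing, sociable"; the function `score_scale`, the column positions, and the data are hypothetical and do not reproduce the official BFI-2 scoring key.

```python
import numpy as np

SCALE_MIN, SCALE_MAX = 1, 5  # assumed numeric coding of the 5-point scale

def score_scale(responses, columns, false_keyed):
    """Average the chosen items after reversing the false-keyed ones."""
    responses = np.asarray(responses, dtype=float)
    keyed = []
    for col in columns:
        item = responses[:, col]
        if col in false_keyed:
            item = SCALE_MIN + SCALE_MAX - item  # 1<->5, 2<->4, 3 stays 3
        keyed.append(item)
    return np.mean(keyed, axis=0)

# Hypothetical two-item example:
#   column 0 = "Is outgoing, sociable" (true-keyed)
#   column 1 = "Tends to be quiet"     (assumed false-keyed)
responses = np.array([
    [5, 1],  # consistently extraverted answers
    [3, 3],  # neutral on both items
    [5, 5],  # agrees with both items (acquiescent pattern)
])
scores = score_scale(responses, columns=[0, 1], false_keyed={1})
print(scores)  # [5. 3. 3.]
```

The third respondent agrees strongly with both items, and the balanced keying pulls the resulting score back to the scale midpoint, which is the sense in which balanced scales control acquiescence at the scale level.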
Results and Discussion
Developing the BFI-2-XS. The first goal of Study 1 was to develop short forms that
would accurately represent the content and structure of the full BFI-2. With this goal in mind, we
constructed the 15-item BFI-2-XS by selecting a single item from each of the 15 BFI-2 facet
scales, using a combination of empirical and rational criteria. The first criterion was each
individual item’s correlation with its total facet scale. The second criterion was each item’s
standardized loading on its facet factor in a bifactor confirmatory factor analysis (CFA) model of
the facet scale. This CFA model allowed each item to load on a substantive facet factor and an
acquiescence method factor; to ensure that the latter factor would represent acquiescence, all
items (including both true-keyed and false-keyed items) were constrained to load equally on it,
and the acquiescence factor was not allowed to correlate with the facet factor (cf. Soto & John, in
press, Figure 2). The third criterion was an item response theory (IRT) estimate of the total
information that each item provided about its facet trait, aggregated across the range of three
standard deviations below to three standard deviations above the trait’s mean level. The fourth
criterion was the two authors’ conceptual judgments regarding the extent to which each item’s
content represents the overall meaning of its facet scale. The fifth criterion was each item’s retest
reliability in the university and college samples. The sixth criterion was each item’s pattern of
loadings in an exploratory factor analysis (EFA) and a principal components analysis (PCA) of
the preliminary BFI-2-XS that extracted and varimax-rotated five dimensions. The seventh and
final criterion was to ensure that the BFI-2-XS included both true-keyed and false-keyed items
within each Big Five domain. Together, these seven criteria were designed to identify a set of 15
items that (a) represent the content and overall meaning of each BFI-2 domain (criteria 1, 2, 3,
and 4), (b) are rated reliably (criterion 5), (c) have a clear Big Five structure (criterion 6), and (d)
minimize the influence of acquiescent responding (criterion 7).
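The first selection criterion, each item's correlation with its total facet scale, can be sketched as follows. The data are simulated and the loadings are arbitrary; `item_total_correlations` is a hypothetical helper, and the sketch assumes the items have already been reverse-keyed where necessary. It covers only criterion 1, not the CFA, IRT, retest, or judgment-based criteria.

```python
import numpy as np

def item_total_correlations(keyed_items):
    """Correlate each item with the mean of all items on its scale (item included)."""
    keyed_items = np.asarray(keyed_items, dtype=float)
    total = keyed_items.mean(axis=1)
    return np.array([
        np.corrcoef(keyed_items[:, j], total)[0, 1]
        for j in range(keyed_items.shape[1])
    ])

# Simulated four-item facet scale (illustrative values only, not study data).
rng = np.random.default_rng(0)
n = 5000
trait = rng.normal(size=n)
loadings = np.array([0.9, 0.6, 0.5, 0.3])  # arbitrary illustrative loadings
noise = rng.normal(size=(n, loadings.size))
items = trait[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

r = item_total_correlations(items)
best_item = int(np.argmax(r))  # candidate favored by this single criterion
```

In practice this index would be only one input among several, since the selection described above also weighed reliability, structure, keying balance, and item content.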
For seven of the 15 facets, these criteria consistently suggested one particular item for
selection. For the remaining facets, they identified two (for Energy Level, Compassion,
Organization, Productiveness, Depression, and Intellectual Curiosity) or three (for
Respectfulness and Creative Imagination) viable items. In these cases, we finalized our selection
through discussion of each item’s empirical and conceptual strengths and weaknesses. For
example, the Organization items “Tends to be disorganized” and “Keeps things neat and tidy”
had similar empirical properties. After discussion, we selected the former item due to its broader
psychological meaning, as compared with the latter item’s focus on physical neatness. The
complete set of 15 items selected for the BFI-2-XS is provided in the Appendix.
Developing the BFI-2-S. We constructed the 30-item BFI-2-S by supplementing the 15-item
BFI-2-XS with one additional item from each of the BFI-2 facet scales.¹ As with the BFI-2-XS,
we selected these supplemental items using a set of empirical and rational criteria. Our first
criterion was the correlation of each two-item facet composite (the BFI-2-XS item plus the
supplemental item) with its total facet scale. The second criterion was the total information
provided by each two-item composite about its facet trait, as estimated from an IRT analysis of
the complete facet scale. The third criterion was each two-item composite’s retest reliability in
the university and college samples. The fourth criterion was the inter-item correlation within
each two-item composite; ideally, this correlation would be strong enough to indicate consistent
responding but not so strong as to indicate excessive redundancy. The fifth criterion was the two
authors’ conceptual judgments of breadth vs. redundancy in item content (with a preference for
greater breadth). The sixth criterion was each item’s pattern of loadings in an EFA and a PCA of
the preliminary BFI-2-S that extracted and varimax-rotated five dimensions. The seventh and
final criterion was to ensure that the BFI-2-S included one true-keyed item and one false-keyed
item for each facet. These criteria were designed to select a set of 30 items that (a) represent the
content and meaning of each BFI-2 domain (criteria 1 and 2), (b) are reported reliably (criterion
3), (c) avoid excessive redundancy between items (criteria 4 and 5), (d) have a clear Big Five
structure (criterion 6), and (e) minimize the influence of acquiescence (criterion 7).
¹ An alternative strategy would have been to select item pairs for the BFI-2-S independently,
without requiring that they include the BFI-2-XS items. Preliminary analyses suggested that this
would have produced no more than trivial differences in our final item selections. We therefore
decided to maximize comparability between the two short forms by nesting the BFI-2-XS items
within the BFI-2-S.
For 10 of the 15 facets, these criteria consistently suggested one particular item for
selection. For the remaining facets (Assertiveness, Energy Level, Depression, Emotional
Volatility, and Creative Imagination), they identified two viable items. We resolved these cases
through discussion. For example, the Energy Level items “Is less active than other people” and
“Rarely feels excited or eager” were both viable supplements to the BFI-2-XS item “Is full of
energy.” After discussion, we selected the former item due to its higher retest reliability and
weaker secondary loading on Agreeableness. The complete set of 30 items selected for the BFI-
2-S is provided in the Appendix.
A preliminary examination of the short forms’ measurement properties. The second
goal of Study 1 was to briefly examine the BFI-2 short forms’ basic measurement properties. To
do this, we examined the part-whole correlations (i.e., the correlation of each short-form scale
with the corresponding full BFI-2 scale; cf. Rammstedt & John, 2007), reliabilities, and
multidimensional structure of the BFI-2-S and BFI-2-XS in the three selection samples. Part-
whole correlations of the six-item BFI-2-S domain scales with the full, 12-item BFI-2 domain
scales averaged either .95 or .96 in each sample (total range across the three samples = .93 to
.97), and correlations of the three-item BFI-2-XS domain scales with the full BFI-2 domain
scales averaged .90 in each sample (total range = .85 to .93). These results suggest that our
empirical and rational item selection criteria successfully produced short forms that adequately
represent the content and overall meaning of the BFI-2 domains, with the BFI-2-S providing (as
expected) better representation than the BFI-2-XS.
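A part-whole correlation of the kind reported above can be sketched as the correlation between a score computed from a subset of items and the score computed from the full item set. The example below uses simulated, already-keyed item scores with arbitrary parameters; the function name `part_whole_correlation` and the chosen column subsets are hypothetical and do not correspond to the actual BFI-2-S or BFI-2-XS items.

```python
import numpy as np

def part_whole_correlation(items, short_columns):
    """Correlate a short-form score (subset of items) with the full-scale score."""
    items = np.asarray(items, dtype=float)
    full_score = items.mean(axis=1)
    short_score = items[:, short_columns].mean(axis=1)
    return float(np.corrcoef(short_score, full_score)[0, 1])

# Simulated 12-item domain scale (illustrative values only, not study data):
# each keyed item reflects a common trait plus independent noise.
rng = np.random.default_rng(1)
n_people, n_items = 2000, 12
trait = rng.normal(size=n_people)
items = 0.6 * trait[:, None] + 0.8 * rng.normal(size=(n_people, n_items))

r_six = part_whole_correlation(items, [0, 2, 4, 6, 8, 10])  # six-item subset
r_three = part_whole_correlation(items, [0, 4, 8])           # three-item subset
```

Because the short-form items are contained in the full scale, part of any such correlation reflects the shared items themselves, which is worth keeping in mind when interpreting its size.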
Alpha reliabilities of the BFI-2-S domain scales averaged .77 or .78 in each sample (total
range = .73 to .83); these scales’ retest reliabilities averaged .76 in the university sample (range =
.69 to .83) and .83 in the college sample (range = .77 to .88). Alphas of the BFI-2-XS domain
scales averaged between .61 and .63 in each sample (total range = .51 to .72); these scales’ retest
reliabilities averaged .70 in the university sample (range = .60 to .80) and .76 in the college
sample (range = .71 to .80). These results suggest that both short forms are adequately reliable,
with the BFI-2-S providing greater reliability than the BFI-2-XS. They also suggest that the BFI-
2-XS scales tend to have greater retest reliability than alpha reliability, a pattern typical for brief
scales that prioritize content validity over internal consistency (cf. Gosling et al., 2003).
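For readers who want to compute alpha reliabilities of this kind, the following is a minimal sketch of Cronbach's alpha from its standard formula, applied to simulated data. The simulation parameters are arbitrary assumptions and the results are not the study's values; the example only illustrates why alpha tends to fall as a scale is shortened from 12 to 6 to 3 items.

```python
import numpy as np

def cronbach_alpha(keyed_items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the sum)."""
    keyed_items = np.asarray(keyed_items, dtype=float)
    k = keyed_items.shape[1]
    item_variances = keyed_items.var(axis=0, ddof=1)
    total_variance = keyed_items.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_variances.sum() / total_variance))

# Simulated keyed items sharing one common trait (illustrative values only).
rng = np.random.default_rng(2)
n_people = 5000
trait = rng.normal(size=n_people)
items = 0.6 * trait[:, None] + 0.8 * rng.normal(size=(n_people, 12))

alpha_12 = cronbach_alpha(items)          # full-length scale
alpha_6 = cronbach_alpha(items[:, :6])    # half-length scale
alpha_3 = cronbach_alpha(items[:, :3])    # quarter-length scale
```

Retest reliability, by contrast, would be estimated as the correlation between scale scores from two administrations, so it does not depend on inter-item correlations in the same way.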
To examine the multidimensional structure of the BFI-2-S and BFI-2-XS, we submitted
each short form’s items to a random intercept EFA that extracted and varimax-rotated five
factors, using maximum likelihood estimation. Random intercept EFA (Aichholzer, 2014) is a
procedure for examining the multidimensional structure of an item set while using a method
factor to model individual differences in acquiescent responding, thereby minimizing the
negative structural effects of acquiescence variance (Rammstedt & Farmer, 2013; Soto et al.,
2008). For both short forms, each item had its strongest loading on the expected factor in all
three samples. For the BFI-2-S, the absolute primary loadings averaged .59 or .60 in each
sample, whereas the absolute secondary loadings averaged only .09 or .10. For the BFI-2-XS, the
absolute primary loadings averaged .59 to .61 in each sample, whereas the absolute secondary
loadings averaged only .10. For both short forms, all congruence coefficients comparing pairs of
corresponding factors across samples were at least .96. These results suggest that both the BFI-2-
S and BFI-2-XS have a clear Big Five structure.
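The two summaries reported above can be sketched in a few lines. This is not the random intercept EFA itself; it assumes a rotated loading matrix is already available. The example loadings are made up, "secondary loadings" are taken to mean all loadings other than the one on the expected factor, and the congruence index is assumed to be Tucker's coefficient; the text does not specify either detail.

```python
import numpy as np

def primary_secondary_summary(loadings, expected_factor):
    """Mean absolute loading on the expected factor vs. on all other factors."""
    abs_loadings = np.abs(np.asarray(loadings, dtype=float))
    n_items = abs_loadings.shape[0]
    is_primary = np.zeros(abs_loadings.shape, dtype=bool)
    is_primary[np.arange(n_items), expected_factor] = True
    return float(abs_loadings[is_primary].mean()), float(abs_loadings[~is_primary].mean())

def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two columns of factor loadings."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

# Hypothetical rotated loadings for 4 items on 2 factors in two samples.
sample_a = np.array([[0.60, 0.10], [0.70, -0.10], [0.00, 0.50], [0.10, 0.60]])
sample_b = np.array([[0.55, 0.12], [0.72, -0.05], [0.05, 0.48], [0.08, 0.63]])
expected = np.array([0, 0, 1, 1])  # factor each item is expected to load on

primary, secondary = primary_secondary_summary(sample_a, expected)
congruence_f1 = tucker_congruence(sample_a[:, 0], sample_b[:, 0])
```

Congruence is computed one factor at a time, pairing the corresponding columns from the two samples, and is insensitive to a uniform rescaling of either column.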
Study 2
Study 1 selected items for the 15-item BFI-2-XS and the 30-item BFI-2-S, and provided
preliminary evidence regarding these short forms’ part-whole convergence, reliability, and
multidimensional structure in three samples. However, the measurement properties observed in
Study 1 were potentially biased by the fact that we selected items for the short forms based partly
on exploratory analyses of these same samples. Therefore, one major goal of Study 2 was to
more thoroughly examine the reliability and validity of the BFI-2-S and BFI-2-XS using data
from two independent validation samples. Our second major goal was to test whether the BFI-2
short forms should only be used to assess personality at the level of the broad Big Five domains,
or whether they are also appropriate for examining more-specific, facet-level traits within each
domain.
Method
Participants and procedure. Study 2 analyzed data from two validation samples: an
Internet sample and a university sample. As described below, some of the present data were
previously analyzed to validate the full BFI-2 (Soto & John, in press, Study 3). However, none of
these data were used to select items for the full measure (Soto & John, in press, Study 2) or the
short forms (Study 1, above). Therefore, the present study can provide unbiased estimates of the
BFI-2 short forms’ measurement properties.
Internet validation sample. Participants in this sample were 2,000 adults (50% male,
50% female) recruited and assessed using the same procedure as the Internet selection sample
analyzed in Study 1. The median age of participants in the Internet validation sample was 24
years old, with 77% between ages 18 and 35. Regarding ethnicity, 64% described themselves as
White/Caucasian, 8% as Black/African-American, 7% as Asian/Asian-American, 7% as
Hispanic/Latino, 1% as Native American/American Indian, 4% as another ethnicity, and 4% as
multiple ethnicities, with 5% not reporting ethnicity. Participants were residents of the United
States (80%), the United Kingdom (8%), Canada (7%), or Australia or New Zealand (5%).
University validation sample. Participants in this sample were 423 students (66% female,
31% male, 3% did not report gender) enrolled in introductory psychology courses at the
University of California, Berkeley, who completed the BFI-2 online in exchange for partial
fulfillment of a course requirement. Their median age was 21 years old, with 90% between ages
18 and 25. Regarding ethnicity, 50% described themselves as Asian/Asian-American, 24% as
White/Caucasian, 12% as Hispanic/Latino, 2% as Black/African-American, and 8% as another
ethnicity, with 4% not reporting ethnicity. Most of these participants (360) were also members of
the student sample analyzed in Study 3 of Soto and John (in press).
Measures. All participants in both validation samples completed the full BFI-2 item set,
which was then used to score the BFI-2-S and BFI-2-XS. Some members of the university
sample were also assessed using a set of self-reported and peer-reported criteria, described
below. (For further details about these measures, see Soto & John, in press.)
Behavioral self-reports. Approximately two weeks after completing the BFI-2, 392
members of the university validation sample described their behavior during the previous six
months using a set of 80 items (Bardi & Schwartz, 2003). Each item was rated on a 5-point
frequency scale ranging from never to all the time. Following the recommendation of the original
authors (Bardi & Schwartz, 2003), each respondent’s complete set of ratings was within-person
centered to remove the substantial individual differences in acquiescent responding typically
observed for this measure. The centered items were then aggregated into scales corresponding
with the 10 values of the Schwartz values circumplex: conformity, tradition, benevolence, power,
universalism, hedonism, security, stimulation, achievement, and self-direction. As in previous
research, after within-person centering, the alpha reliabilities of these scales varied considerably,
averaging .48 (range = .27 to .72) (cf. Bardi & Schwartz, 2003; Pozzebon & Ashton, 2009).
Psychological Well-Being Scales. Approximately two weeks before completing the BFI-
2, 185 members of the university validation sample completed the Psychological Well-Being
Scales (Ryff, 1989), which assess six aspects of well-being: positive relations with others,
purpose in life, environmental mastery, self-acceptance, autonomy, and personal growth. These
scales include a total of 84 items that respondents rate on a 5-point agreement scale. In this
sample, the scales’ alpha reliabilities averaged .86 (range = .84 to .90).
Peer-reports. Approximately two months after completing the BFI-2, 230 members of
the university validation sample were rated by a knowledgeable peer. Most peers were friends
(62%) or romantic partners (28%). Each peer rated the target participant on the full BFI-2, and
on a set of additional items assessing four criteria: social connectedness (4 items), likability (2
items), stress resistance (5 items), and positive affect (2 items). All items were rated on a 5-point
agreement scale. Alpha reliabilities for the BFI-2 peer-reports averaged .86 for the domain scales
(range = .84 to .90) and .75 for the facet scales (range = .63 to .84). Alphas for the additional
peer-reported criteria were .75 for social connectedness, .76 for likability, .79 for stress
resistance, and .66 for positive affect.
Results and Discussion
Domain-level measurement properties of the BFI-2 short forms. Table 1 presents
part-whole correlations, alpha reliabilities, and self-peer agreement correlations for the BFI-2,
BFI-2-S, and BFI-2-XS domain scales. The leftmost part of this table shows that, in each
validation sample, part-whole correlations for the BFI-2-S averaged .95 (total range across the
two samples = .94 to .97), and correlations for the BFI-2-XS averaged .89 or .90 (total range =
.86 to .92). Closely replicating the preliminary findings from Study 1, these results indicate that
the BFI-2-S and BFI-2-XS capture approximately 91% and 80%, respectively, of the total
variance in the full BFI-2 domain scales.
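The percentages above follow from squaring the part-whole correlations, since a squared correlation gives the proportion of variance that two scores share. A quick check using the rounded averages reported in the text is sketched below; taking .895 as the midpoint of ".89 or .90" is an assumption made only for illustration.

```python
# Rounded average part-whole correlations taken from the text.
part_whole = {
    "BFI-2-S": 0.95,
    "BFI-2-XS": 0.895,  # assumed midpoint of the reported ".89 or .90"
}

# Proportion of full-scale variance shared with each short form (r squared).
shared_variance = {form: r ** 2 for form, r in part_whole.items()}

for form, share in shared_variance.items():
    print(f"{form}: {share:.1%} of full-scale variance")
```

With these rounded inputs the BFI-2-S value comes out near 90% rather than 91%; the small gap presumably reflects unrounded correlations, which the text does not provide.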
The middle part of Table 1 shows that, in each sample, alpha reliabilities of the full BFI-2
domain scales averaged .85 or .86 (total range = .82 to .90), alphas of the BFI-2-S domains
averaged .77 or .78 (total range = .73 to .84), and alphas of the BFI-2-XS domains averaged .59
or .62 (total range = .49 to .73). Again replicating the preliminary findings from Study 1, these
results indicate a moderate decrease in internal consistency from the full BFI-2 to the BFI-2-S
(which retained approximately 91% of the full measure’s internal consistency), and a more
substantial decrease to the BFI-2-XS (which retained approximately 71% of the full measure’s
internal consistency). As noted above, these decreases were an expected consequence of our
strategy for developing the BFI-2 short forms, which prioritized content validity over internal
consistency (cf. Gosling et al., 2003; Smith et al., 2000; Stanton et al., 2002).
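The retention figures can be approximately reproduced by dividing each short form's average alpha by the full measure's average alpha. The sketch below uses midpoints of the rounded per-sample averages reported above, which is an assumption; the authors' exact computation is not described in the text.

```python
def retained_fraction(short_alpha, full_alpha):
    """Share of the full measure's alpha that the short form retains."""
    return short_alpha / full_alpha

# Midpoints of the rounded per-sample averages reported in the text (assumed inputs).
full_alpha = (0.85 + 0.86) / 2   # full BFI-2 domain scales
s_alpha = (0.77 + 0.78) / 2      # BFI-2-S domain scales
xs_alpha = (0.59 + 0.62) / 2     # BFI-2-XS domain scales

s_retained = retained_fraction(s_alpha, full_alpha)
xs_retained = retained_fraction(xs_alpha, full_alpha)
print(f"BFI-2-S retains about {s_retained:.0%}; BFI-2-XS retains about {xs_retained:.0%}")
```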
Finally, the rightmost part of Table 1 shows that, in the university sample, self-peer
agreement correlations averaged .55 for the full BFI-2 domain scales (range = .47 to .69), .53 for
the BFI-2-S domains (range = .46 to .66), and .50 for the BFI-2-XS domains (range = .42 to .59).
These results indicate that the BFI-2-S and BFI-2-XS retain approximately 93% and 82%,
respectively, of the full BFI-2 domain scales’ self-peer agreement.
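The retention percentages discussed above follow from simple ratios. The sketch below recomputes them from the two-decimal values printed in Table 1, assuming the definitions given in the table note (mean squared part-whole correlation, ratio of mean alphas, ratio of mean squared self-peer correlations). The variable and function names are illustrative and are not part of the original analyses; because the inputs are rounded, the recomputed values can differ from the published ones by a point or two.

```python
# Values copied from Table 1 (two decimals, as published).
# Order of domains: E, A, C, N, O.
part_whole_s = [.95, .94, .96, .97, .95]    # BFI-2-S, Internet sample
part_whole_xs = [.90, .86, .89, .92, .88]   # BFI-2-XS, Internet sample
mean_alpha = {"BFI-2": .86, "BFI-2-S": .78, "BFI-2-XS": .62}  # Internet sample means
self_peer = {                                # university sample
    "BFI-2": [.69, .47, .53, .62, .47],
    "BFI-2-S": [.66, .49, .47, .59, .46],
    "BFI-2-XS": [.59, .48, .45, .56, .42],
}


def mean_square(values):
    """Mean of the squared coefficients (hypothetical helper)."""
    return sum(v * v for v in values) / len(values)


# Proportion of full-scale variance captured by each short form.
variance_s = mean_square(part_whole_s)
variance_xs = mean_square(part_whole_xs)

# Proportion of internal consistency retained (ratio of mean alphas).
alpha_ratio_s = mean_alpha["BFI-2-S"] / mean_alpha["BFI-2"]
alpha_ratio_xs = mean_alpha["BFI-2-XS"] / mean_alpha["BFI-2"]

# Proportion of self-peer agreement retained (ratio of mean squared correlations).
agreement_s = mean_square(self_peer["BFI-2-S"]) / mean_square(self_peer["BFI-2"])
agreement_xs = mean_square(self_peer["BFI-2-XS"]) / mean_square(self_peer["BFI-2"])

for label, value in [
    ("variance, BFI-2-S", variance_s),
    ("variance, BFI-2-XS", variance_xs),
    ("alpha, BFI-2-S", alpha_ratio_s),
    ("alpha, BFI-2-XS", alpha_ratio_xs),
    ("self-peer, BFI-2-S", agreement_s),
    ("self-peer, BFI-2-XS", agreement_xs),
]:
    print(f"{label}: {value:.2f}")
```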
Domain-level structure of the BFI-2 short forms. To examine the domain-level
structure of the BFI-2-S and BFI-2-XS, we submitted each short form’s items to a random
intercept EFA that extracted and varimax-rotated five factors. The factor loadings from these
analyses are presented in Tables 2 and 3, respectively. As these tables show, for both short forms
each item had its strongest loading on the expected factor in both validation samples. For the
BFI-2-S, in each sample the absolute primary loadings averaged .58 or .59 (total range = .39 to
.81), whereas the absolute secondary loadings averaged only .09 or .10 (total range = .00 to .35).
For the BFI-2-XS, in each sample the absolute primary loadings averaged .57 or .59 (total range
= .40 to .76), whereas the absolute secondary loadings averaged only .09 or .11 (total range = .00
to .30). Supporting the utility of modeling acquiescence variance, loadings on the acquiescence
method factor were generally small but not trivial, averaging .14 or .15 in each sample for both
short forms (total range = .12 to .24). When we examined replication of the factor loadings
across the two validation samples, the congruence coefficients between corresponding factors
were all at least .97 (M = .98) for the BFI-2-S, and at least .95 (M = .97) for the BFI-2-XS.2
Replicating the preliminary findings from Study 1, these results indicate that both BFI-2 short
forms have a clear Big Five structure.
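For readers unfamiliar with congruence coefficients, the sketch below shows the computation, assuming the coefficient meant here is Tucker's congruence coefficient (the text does not name it explicitly). As an illustration only, it uses the six BFI-2-S Extraversion items' loadings on the Extraversion factor from Table 2; the coefficients reported above are based on each factor's full set of loadings, so this subset will not reproduce them exactly. The function name is hypothetical.

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two vectors of factor loadings."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Extraversion-factor loadings of the six Extraversion items (Table 2).
extraversion_internet = [.69, -.68, .61, -.50, .57, -.39]
extraversion_university = [.76, -.72, .59, -.43, .64, -.44]

phi = congruence(extraversion_internet, extraversion_university)
print(f"congruence = {phi:.3f}")
```

Unlike a Pearson correlation, the coefficient does not center the loadings, so it is sensitive to agreement in the loadings' absolute size and sign rather than only their relative ordering.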
Domain-level external validity of the BFI-2 short forms. To examine the BFI-2 short
forms’ external validity, we first computed correlations of the BFI-2, BFI-2-S, and BFI-2-XS
domain scales with the set of 20 self-reported and peer-reported criteria assessed in the university
validation sample; we then regressed each criterion on each set of domain scales. The resulting
correlations and standardized regression coefficients are presented in Table 4. As this table
shows, the pattern of criterion associations was very similar across the three BFI-2 forms.
Column-vector correlations comparing the pattern of domain-criterion correlations for the full
BFI-2 against the BFI-2-S were at least .99 for each of the 20 criteria, and the corresponding
column-vector correlations comparing the patterns of standardized regression coefficients were
all at least .92 (M = .99). When comparing the BFI-2 with the BFI-2-XS, column-vector
correlations for the pattern of domain-criterion correlations were all at least .97 (M = .99), and
those for the pattern of regression coefficients were all at least .88 (M = .97).
2 Much previous personality research has used PCAs rather than EFAs to examine
multidimensional structure. Other research has used latent variable models that do not impose
orthogonality constraints between the Big Five. Therefore, in addition to the analyses reported in
Tables 2 and 3, we also conducted PCAs with varimax rotation and random intercept EFAs with
direct oblimin rotation (gamma = 0). The results of these analyses are presented in the Online
Supplementary Material, Tables S1-S4. Overall, their results were very similar to those reported
in Tables 2 and 3. In each analysis, all items had their strongest loading on the expected
dimension, and all congruence coefficients with the corresponding factors reported in Tables 2
and 3 were at least .97. Thus, the present results were quite robust across alternative methods.
Beyond these similar patterns of criterion associations, how well do the BFI-2 short
forms retain the full domain scales’ overall level of predictive power? To address this question,
we compared the total proportion of variance in each criterion variable explained by the BFI-2,
BFI-2-S, and BFI-2-XS domain scales, respectively; these proportions are presented in Table 5.
As this table shows, the BFI-2-S retained approximately 93% (mean R² = .236), and the BFI-2-
XS approximately 84% (mean R² = .214), of the full BFI-2’s predictive power (mean R² = .254).
Inspection of the individual criteria revealed that this pattern of decreasing predictive power with
decreasing scale length generalized across the behavioral, psychological, and peer-reported
criteria. Taken together, these results indicate that the BFI-2 short forms capture the full
measure’s domain-level pattern of substantive associations with external criteria quite well, but
that using a short form instead of the full measure entails a loss of overall predictive power.
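The two comparisons used in this section can be sketched briefly. A column-vector correlation is taken here to be the Pearson correlation between two forms' columns of associations with the same criterion, and retained predictive power is the ratio of mean R² values. In the sketch below the five criterion correlations are hypothetical numbers chosen only for illustration, whereas the mean R² values are those reported above; the function and variable names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL correlations of one criterion with the five domain scales
# (E, A, C, N, O) as measured by the full form and by a short form.
full_form = [.30, .10, .25, -.20, .05]
short_form = [.27, .08, .24, -.18, .06]
column_vector_r = pearson(full_form, short_form)

# Mean R-squared values reported in the text for the domain scales.
mean_r2 = {"BFI-2": .254, "BFI-2-S": .236, "BFI-2-XS": .214}
retained_s = mean_r2["BFI-2-S"] / mean_r2["BFI-2"]
retained_xs = mean_r2["BFI-2-XS"] / mean_r2["BFI-2"]

print(f"column-vector r = {column_vector_r:.3f}")
print(f"retained: BFI-2-S {retained_s:.0%}, BFI-2-XS {retained_xs:.0%}")
```

The two statistics answer different questions: a short form can reproduce the pattern of associations almost perfectly (high column-vector correlation) while still explaining less criterion variance overall.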
Facet-level external validity of the BFI-2 short forms. The results presented thus far
converge in indicating that the BFI-2-S and BFI-2-XS efficiently measure the Big Five domains.
They also indicate that developing these short forms using a strategy that prioritized content
breadth over internal consistency helped them retain much of the full BFI-2’s domain-level
validity. Are the short forms also appropriate for examining facet-level traits? To address this
question, we repeated the correlation and regression analyses of external validity, just described,
at the facet level.
As with the domain-level analyses, the pattern of facet-criterion correlations was similar
across the three BFI-2 forms. Column-vector correlations comparing the pattern of criterion
correlations for the four-item BFI-2 facet scales against the two-item BFI-2-S scales were at least
.95 (M = .98) for each of the 20 criteria, and those comparing the BFI-2 facet scales against the
single BFI-2-XS items were all at least .86 (M = .95). The pattern of facet-level standardized
regression coefficients was also reasonably similar between the full BFI-2 and the BFI-2-S for
the 10 self-reported behavioral criteria. Column-vector correlations comparing these measures’
regression coefficients were at least .78 (M = .90) for each of these 10 criteria. Moreover,
averaged across these criteria, the BFI-2-S facets retained approximately 89% (mean R² = .215)
of the full BFI-2’s predictive power (mean R² = .241).
However, the pattern of facet-level standardized regression coefficients was much less
consistent between the BFI-2 and the BFI-2-S for the well-being and peer-reported criteria.
Column-vector correlations comparing these measures’ regression coefficients averaged only .64
(minimum = -.15) across these 10 criteria. The facet-level regression coefficients were also
rather inconsistent between the full BFI-2 and the BFI-2-XS for the complete set of 20 self-
reported and peer-reported criteria, with column-vector correlations averaging only .75
(minimum = .39).
Taken together, this pattern of results suggests three conclusions. First, the BFI-2-S may
be useful for examining facet-level associations in reasonably large samples. The self-reported
behavioral criteria were assessed in a sample of approximately 400 participants, and the BFI-2
and BFI-2-S produced similar patterns of facet-criterion associations in this sample, although the
short form had less overall predictive power than the full measure. Second, the BFI-2-S should
not be used to examine facet-level associations in smaller samples. The well-being and peer-
reported criteria were each assessed in samples of approximately 200 participants, and in these
samples the BFI-2-S showed substantial discrepancies from the full BFI-2’s pattern of facet-level
associations.3 Finally, the present results suggest that the BFI-2-XS should be used to assess
personality only at the domain level. Across the full set of external criteria, there were substantial
discrepancies between the facet-level associations observed for the full BFI-2 vs. the BFI-2-XS.
Facet-level measurement properties of the BFI-2-S. Because the correlation and
regression analyses just described suggest that the BFI-2-S may be useful for assessing facet
traits in reasonably large samples, our final set of analyses examined this short form’s facet-level
measurement properties. Table 6 presents part-whole correlations, alpha reliabilities, and self-
peer agreement correlations for the BFI-2 and BFI-2-S facet scales. The leftmost part of this
table shows that, in each validation sample, part-whole correlations for the BFI-2-S averaged .91
(total range = .86 to .95), indicating that this short form captures approximately 83% of the total
variance in the full BFI-2 facet scales.
The middle part of Table 6 shows that, in each sample, alpha reliabilities of the full BFI-2
facet scales averaged .74 or .75 (total range = .59 to .84), whereas alphas of the BFI-2-S facets
averaged .60 or .61 (total range = .39 to .79). As with the BFI-2-XS domain scales, the relatively
low internal consistency of the BFI-2-S facet scales reflects their extreme brevity (2 items per
scale) and our decision to maintain content validity by prioritizing breadth over internal
consistency (cf. Gosling et al., 2003; Smith et al., 2000; Stanton et al., 2002).
Finally, the rightmost part of Table 6 shows that, in the university sample, self-peer
agreement correlations averaged .49 for the full BFI-2 facet scales (range = .30 to .72), and .45
for the BFI-2-S facets (range = .23 to .69). These results indicate that the BFI-2-S retains
approximately 85% of the full BFI-2 facet scales’ self-peer agreement.4
3 This conclusion was further supported by analyses of the BFI-2 and BFI-2-S facet scales’
associations with the self-reported behavioral criteria in the subsamples of participants who were
also assessed on the well-being (N = 177) and peer (N = 225) criteria. Associations between the
facet scales and the behavioral criteria replicated somewhat less clearly between the full BFI-2
and the BFI-2-S in these smaller subsamples than in the full sample of 392 participants who
completed the self-reported behavioral measure.
General Discussion
The present research pursued three main goals. First, we used a joint rational-empirical
approach to develop two short forms of the Big Five Inventory–2: the 30-item BFI-2-S and the
15-item BFI-2-XS. Second, we examined how well these short forms retain the reliability and
validity of the full BFI-2. At the level of the Big Five domains, our analyses of multiple
indicators converge in showing that the BFI-2-S retains about 90%, and the BFI-2-XS about
80%, of the BFI-2 domain scales’ reliability, self-peer agreement, and external validity (see
Tables 1 and 5). Finally, we examined whether the BFI-2 short forms are appropriate for
hierarchically assessing more-specific facet traits within the broad Big Five domains. Here our
results suggest that the BFI-2-S may be useful for examining facet-level traits in reasonably large
samples (approximately 400 or more observations), in which case we estimate that its facet scales retain
approximately 85% of the full BFI-2 facet scales’ reliability and validity (see Table 6). In
contrast, our results indicate that the BFI-2-XS is so brief that it should only be used to assess
personality at the level of the Big Five domains, not at the facet level.
When (and When Not) to Use the BFI-2 Short Forms
When designing a study and deciding whether to administer one of the BFI-2 short forms
instead of the full measure, researchers should consider a few key points. First, the short forms
do offer some savings in assessment time over the full BFI-2. Based on experiences with the
original BFI and the BFI-2, we estimate that most participants can complete the full BFI-2 in
four to ten minutes, the BFI-2-S in three to five minutes, and the BFI-2-XS in two or three
minutes. These modest time savings may be important for certain studies in which minimizing
assessment time and respondent fatigue are vital concerns. Some examples include large-scale
surveys designed to efficiently assess many variables beyond personality traits, within-subjects
designs that require each participant to complete the same personality measure multiple times,
and laboratory studies that must reserve considerable time for experimental manipulations and
behavioral observations. In these highly constrained contexts, administering the full BFI-2 may
not be feasible, but administering one of the BFI-2 short forms would clearly be better than not
measuring personality at all.
4 Our recommendation against using the BFI-2-XS to assess facet traits was further supported by
analyses of this short form’s facet-level measurement properties. For the single BFI-2-XS items,
part-whole correlations averaged only .79 in each validation sample, and self-peer agreement
correlations averaged only .38 in the university validation sample. These results indicate a
substantial loss of reliability and self-peer agreement from the BFI-2 and BFI-2-S facet scales to
the single BFI-2-XS items.
However, it is important to recognize that the short forms’ gains in efficiency come at a
cost to reliability and validity. We have attempted to reduce such costs by drawing on a
combination of rational and empirical criteria to carefully select items for the short forms. Even
so, at the level of the Big Five domains the present results indicate that the BFI-2-S provides
approximately 10%, and the BFI-2-XS approximately 20%, less reliability and validity than the
full BFI-2 domain scales. We should therefore expect to observe systematically smaller
associations between personality traits and external criteria when using a short form rather than
the full BFI-2 (John & Soto, 2007). An important but easily overlooked implication of such
shrinkage is that studies administering abbreviated measures will need to recruit larger samples
of participants to maintain adequate statistical power. For example, imagine that we expect a
zero-order correlation of .20 between a BFI-2 domain scale and an external criterion (a typical
effect size in psychological research; Richard, Bond, & Stokes-Zoota, 2003). For 80% power to
detect this effect, we would need to recruit approximately 190 participants if administering the
full BFI-2, 220 participants if administering the BFI-2-S, or 240 participants if administering the
BFI-2-XS.
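The sample-size figures in this example can be approximated with a standard power calculation. The sketch below is one way to do so and is not necessarily the procedure used for the figures above: it assumes a two-tailed test at alpha = .05, the Fisher z approximation for the sampling distribution of a correlation, and that a short form retaining a proportion p of the full measure's valid variance shrinks the expected correlation by a factor of the square root of p. The function name and the dictionary of retained proportions are illustrative.

```python
import math
from statistics import NormalDist


def required_n(r, power=0.80, alpha=0.05):
    """Approximate N needed to detect correlation r (two-tailed, Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


true_r = 0.20  # expected correlation when using the full BFI-2
# Approximate proportions of reliability and validity retained (from the text).
retained = {"BFI-2": 1.00, "BFI-2-S": 0.90, "BFI-2-XS": 0.80}

sample_sizes = {
    form: required_n(true_r * math.sqrt(proportion))
    for form, proportion in retained.items()
}

for form, n in sample_sizes.items():
    print(f"{form}: expected r = {true_r * math.sqrt(retained[form]):.3f}, N = {n}")
```

Under these assumptions the computed sample sizes fall close to the rounded figures given in the text, and the expected correlation for the BFI-2-XS is roughly 10% smaller than for the full measure.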
The decreased reliability of the BFI-2 short forms also has important implications for
testing more complex effects. One common example is research assessing whether a newly
proposed psychological construct can provide incremental validity beyond established constructs
(e.g., the Big Five) treated as control variables. Specifically, assessing the established control
variables with a high degree of reliability helps prevent spurious evidence of incremental validity
(Westfall & Yarkoni, 2016). Similar reliability concerns apply to mediation analyses, in which
regressions or partial correlations are used to test whether the effect of a predictor variable on an
outcome can be accounted for by a mediator (Baron & Kenny, 1986). We therefore recommend
that researchers administer the full BFI-2 when evaluating the incremental validity of constructs
beyond the Big Five domains and facets, or when testing mediation relationships.
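The incremental-validity concern can be illustrated with a small worked example. The sketch below assumes a deliberately simple model that is not taken from the original analyses: a new construct X and an outcome Y are related only because both depend on a latent trait T, so the true partial correlation between X and Y controlling for T is zero. When T is instead controlled through a scale with reliability below 1, the standard partial-correlation formula yields a nonzero, spurious value. The correlations of .50 and the use of the mean domain-level alphas as reliabilities are illustrative assumptions.

```python
import math


def spurious_partial_r(a, b, reliability):
    """Partial correlation of X and Y controlling for a fallible measure of T.

    Assumes corr(X, T) = a, corr(Y, T) = b, and that X and Y are related
    only through T, so the partial correlation given T itself is zero.
    """
    r_xy = a * b
    r_xc = a * math.sqrt(reliability)  # X with the observed control scale
    r_yc = b * math.sqrt(reliability)  # Y with the observed control scale
    return (r_xy - r_xc * r_yc) / math.sqrt((1 - r_xc ** 2) * (1 - r_yc ** 2))


# Reliabilities: a perfect measure, then the mean domain alphas from Table 1
# (Internet sample), used here only as illustrative values.
reliabilities = {"perfect": 1.00, "BFI-2": .86, "BFI-2-S": .78, "BFI-2-XS": .62}
spurious = {
    label: spurious_partial_r(0.5, 0.5, rel) for label, rel in reliabilities.items()
}

for label, value in spurious.items():
    print(f"{label}: spurious partial r = {value:.3f}")
```

In this model the spurious partial correlation grows as the control scale becomes less reliable, which is the pattern that motivates the recommendation above.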
Finally, researchers should consider whether they wish to examine personality traits
hierarchically, by distinguishing between more-specific facet traits within each broad Big Five
domain. Such hierarchical assessment combines the benefits of high bandwidth at the domain
level with high fidelity at the facet level, and can therefore substantially enhance a measure’s
capacity to accurately predict a wide range of external criteria (Ashton, Jackson, Paunonen,
Helmes, & Rothstein, 1995; Paunonen & Ashton, 2001; Soto & John, in press). All three forms
of the BFI-2 are appropriate for domain-level personality assessment, but only the full BFI-2 and
the BFI-2-S (in reasonably large samples) are also appropriate for examining facet-level traits.
The BFI-2-XS’ lack of hierarchical assessment entails a further cost to validity: the BFI-2-XS
domain scales provide only about two-thirds as much predictive power as the BFI-2 facet scales.
Thus, for example, if the BFI-2 facets explain 50% of the variance in a particular criterion, then
we would only expect the BFI-2-XS domains to explain about 33% of the criterion variance.
We therefore advise researchers to carefully weigh the relative values of efficiency,
reliability, validity, bandwidth, fidelity, and statistical power when deciding which form of the
BFI-2 to administer in a particular study. For example, would saving four minutes of assessment
time by administering the BFI-2-XS instead of the full BFI-2 be worth increasing a study’s
planned sample size by 25% to maintain statistical power, decreasing the size of the observed
effects by 10%, and losing the capacity to examine facet-level traits? For many studies, the BFI-
2 short forms’ modest time savings will clearly be outweighed by the full measure’s superior
measurement properties; we therefore strongly recommend administering the full BFI-2 in most
research contexts. However, in studies where assessment time and respondent fatigue are core
concerns that cannot feasibly be mitigated by other design decisions, the BFI-2 short forms’
small gains in efficiency may be crucially important; in such cases, the short forms are useful
alternatives to the full measure.
Limitations and Future Directions
The present research had a number of important strengths, including its joint rational-
empirical approach to scale construction, its use of multiple independent samples to directly
replicate key results, and its assessment of both self-reported and peer-reported validity criteria.
However, it also had some limitations that highlight important directions for future research.
First, the retest reliabilities for the BFI-2-S and BFI-2-XS observed in Study 1 may be positively
biased because they were computed in the same samples that we used to select items for these
short forms. Second, we administered the short forms embedded within the full BFI-2, rather
than administering each form separately (cf. Rammstedt & John, 2007; Smith et al., 2000).
Additional research is therefore needed to further examine the short-term retest reliability and
long-term stability of the BFI-2-S and BFI-2-XS, and to test whether the short forms’
measurement properties are affected by separate vs. embedded administration.
Third, our current recommendation that the BFI-2-S may be useful for examining facet-
level traits in reasonably large samples (approximately 400 or more observations), and that the
BFI-2-XS is not appropriate for examining facet traits, should be considered provisional. Our
recommendation is based on analyses of 20 criterion variables assessed in three partially
overlapping subsamples. Additional research using larger samples and broader sets of criteria
could provide more definitive conclusions. For example, future studies can test whether the BFI-
2-XS may be useful for examining facet traits in very large samples. The gains in statistical
power and precision that come from samples numbering in the thousands rather than hundreds of
observations may adequately compensate for the limited reliability entailed by using only a
single item to assess each facet (cf. Robins, Trzesniewski, Tracy, Gosling, & Potter, 2002).
Finally, additional research is needed to compare the BFI-2 short forms with other brief
Big Five measures, including the Ten Item Personality Inventory (TIPI; Gosling et al., 2003) and
the Mini-IPIP (Donnellan, Oswald, Baird, & Lucas, 2006). These brief measures differ in terms
of their length (ranging from 10 to 30 items), item format (adjective pairs for the TIPI vs. short
phrases for the Mini-IPIP and BFI-2), and scale development strategy (prioritizing simple factor
structure for the Mini-IPIP vs. content breadth for the TIPI and BFI-2). Comparative research
can therefore examine how these differences affect the measures’ reliability and validity.
Conclusion
The BFI-2 is a 60-item questionnaire that hierarchically assesses the Big Five personality
domains and 15 more-specific facet traits. The present research developed and validated a 30-
item short form (the BFI-2-S) and a 15-item extra-short form (the BFI-2-XS) of the BFI-2. These
abbreviated forms offer savings in assessment time over the full BFI-2. They also retain most of
the full measure’s reliability and validity, especially at the domain level. The BFI-2-S and BFI-2-
XS should therefore prove useful for assessing personality traits in research contexts where, due
to pressing concerns about assessment time or respondent fatigue, administering the full BFI-2
would not be feasible. For most studies, however, we recommend administering the full measure
due to its greater reliability and validity.
References
Aichholzer, J. (2014). Random intercept EFA of personality scales. Journal of Research in
Personality, 53, 1-4.
Ashton, M. C., Jackson, D. N., Paunonen, S. V., Helmes, E., & Rothstein, M. G. (1995). The
criterion validity of broad factor scales versus specific facet scales. Journal of Research
in Personality, 29, 432-442.
Bardi, A., & Schwartz, S. H. (2003). Values and behavior: Strength and structure of relations.
Personality and Social Psychology Bulletin, 29, 1207-1220.
Baron, H. (1996). Strengths and limitations of ipsative measurement. Journal of Occupational
and Organizational Psychology, 69, 49-56.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social
psychological research: Conceptual, strategic, and statistical considerations. Journal of
Personality and Social Psychology, 51, 1173-1182.
Cronbach, L. J., & Gleser, G. C. (1957). Psychological tests and personnel decisions. Urbana,
IL: University of Illinois Press.
Danner, D., Aichholzer, J., & Rammstedt, B. (2015). Acquiescence in personality questionnaires:
Relevance, domain specificity, and stability. Journal of Research in Personality, 57, 119-
130.
DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects
of the Big Five. Journal of Personality and Social Psychology, 93, 880-896.
Donnellan, M. B., Oswald, F. L., Baird, B. M., & Lucas, R. E. (2006). The Mini-IPIP scales:
Tiny-yet-effective measures of the Big Five factors of personality. Psychological
Assessment, 18, 192-203.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist,
48, 26-34.
Goldberg, L. R., & Kilkowski, J. M. (1985). The prediction of semantic consistency in self-
descriptions: Characteristics of persons and of terms that affect the consistency of
responses to synonym and antonym pairs. Journal of Personality and Social Psychology,
48, 82-98.
Gosling, S. D., Rentfrow, P. J., & Swann, W. B. (2003). A very brief measure of the Big-Five
personality domains. Journal of Research in Personality, 37, 504-528.
Jackson, D. N., & Messick, S. (1958). Content and style in personality assessment.
Psychological Bulletin, 55, 243-252.
John, O. P., Hampson, S. E., & Goldberg, L. R. (1991). Is there a basic level of personality
description? Journal of Personality and Social Psychology, 60, 348-361.
John, O. P., Naumann, L. P., & Soto, C. J. (2008). Paradigm shift to the integrative Big-Five trait
taxonomy: History, measurement, and conceptual issues. In O. P. John, R. W. Robins, &
L. A. Pervin (Eds.), Handbook of personality: Theory and research (3rd ed., pp. 114-
158). New York, NY: Guilford.
John, O. P., & Soto, C. J. (2007). The importance of being valid. In R. W. Robins, R. C. Fraley,
& R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp.
461-494). New York, NY: Guilford.
McCrae, R. R., & Costa, P. T. (2008). The Five-Factor Theory of personality. In O. P. John, R.
W. Robins, & L. A. Pervin (Eds.), Handbook of personality: Theory and research (3rd
ed., pp. 159-181). New York, NY: Guilford.
McCrae, R. R., & Costa, P. T. (2010). NEO Inventories professional manual. Lutz, FL:
Psychological Assessment Resources.
Paunonen, S. V., & Ashton, M. C. (2001). Big Five factors and facets and the prediction of
behavior. Journal of Personality and Social Psychology, 81, 524-539.
Pozzebon, J. A., & Ashton, M. C. (2009). Personality and values as predictors of self- and peer-
reported behavior. Journal of Individual Differences, 30, 122-129.
Rammstedt, B., & Farmer, R. F. (2013). The impact of acquiescence on the evaluation of
personality structure. Psychological Assessment, 25, 1137-1145.
Rammstedt, B., & John, O. P. (2007). Measuring personality in one minute or less: A 10-item
short version of the Big Five Inventory in English and German. Journal of Research in
Personality, 41, 203-212.
Richard, F. D., Bond, C. F., Jr., & Stokes-Zoota, J. J. (2003). One hundred years of social
psychology quantitatively described. Review of General Psychology, 7, 331-363.
Roberts, B. W., Chernyshenko, O. S., Stark, S., & Goldberg, L. R. (2005). The structure of
conscientiousness: An empirical investigation based on seven major personality
questionnaires. Personnel Psychology, 58, 103-139.
Robins, R. W., Trzesniewski, K. H., Tracy, J. L., Gosling, S. D., & Potter, J. (2002). Global self-
esteem across the life span. Psychology and Aging, 17, 423-434.
Ryff, C. D. (1989). Happiness is everything, or is it? Explorations on the meaning of
psychological well-being. Journal of Personality and Social Psychology, 57, 1069-1081.
Smith, G. T., McCarthy, D. M., & Anderson, K. G. (2000). On the sins of short-form
development. Psychological Assessment, 12, 102-111.
Soto, C. J., & John, O. P. (in press). The next Big Five Inventory (BFI-2): Developing and
assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and
predictive power. Journal of Personality and Social Psychology.
Soto, C. J., John, O. P., Gosling, S. D., & Potter, J. (2008). The developmental psychometrics of
Big Five self-reports: Acquiescence, factor structure, coherence, and differentiation from
ages ten to twenty. Journal of Personality and Social Psychology, 94, 718-737.
Srivastava, S., Guglielmo, S., & Beer, J. S. (2010). Perceiving others’ personalities: Examining
the dimensionality, assumed similarity to the self, and stability of perceiver
effects. Journal of Personality and Social Psychology, 98, 520-534.
Stanton, J. M., Sinar, E. F., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for
reducing the length of self-report scales. Personnel Psychology, 55, 167-194.
Summerfield, M., Freidin, S., Hahn, M., Li, N., Macalalad, N., Mundy, L., Watson, N., Wilkins,
R., & Wooden, M. (2015). HILDA user manual: Release 14. Melbourne, Australia:
Melbourne Institute of Applied Economic and Social Research, University of Melbourne.
Taylor, M. F., Brice, J., Buck, N., & Prentice-Lane, E. (2010). British Household Panel Survey user
manual: Volume A. Introduction, technical report and appendices. Colchester, England:
University of Essex.
Wagner, G. G., Frick, J. R., & Schupp, J. (2007). The German Socio-Economic Panel Study
(SOEP): Scope, evolution and enhancements. Schmollers Jahrbuch, 127, 139-169.
Westfall, J., & Yarkoni, T. (2016). Statistically controlling for confounding constructs is harder
than you think. PLoS ONE, 11, e0152719.
Wood, D., & Roberts, B. W. (2006). Cross-sectional and longitudinal tests of the Personality and
Role Identity Structural Model (PRISM). Journal of Personality, 74, 779-809.
Table 1
Summary and Comparison of Measurement Properties for the BFI-2, BFI-2-S, and BFI-2-XS Domain Scales (Study 2)
                        Part-whole correlations     Alpha reliabilities                       Self-peer correlations
                        BFI-2-S       BFI-2-XS      BFI-2         BFI-2-S       BFI-2-XS      BFI-2    BFI-2-S  BFI-2-XS
Domain scale            Inter. Univ.  Inter. Univ.  Inter. Univ.  Inter. Univ.  Inter. Univ.  Univ.    Univ.    Univ.
Extraversion            .95    .96    .90    .92    .86    .87    .77    .78    .63    .66    .69      .66      .59
Agreeableness           .94    .94    .86    .87    .82    .83    .75    .75    .55    .49    .47      .49      .48
Conscientiousness       .96    .95    .89    .88    .88    .85    .78    .73    .61    .55    .53      .47      .45
Negative Emotionality   .97    .97    .92    .92    .90    .86    .84    .82    .73    .69    .62      .59      .56
Open-Mindedness         .95    .95    .88    .90    .84    .84    .74    .77    .57    .58    .47      .46      .42
Mean coefficient        .95    .95    .89    .90    .86    .85    .78    .77    .62    .59    .55      .53      .50
Proportion of BFI-2
domain scale variance
retained                .91    .91    .79    .80    —      —      .90    .91    .72    .70    —        .93      .82
Note. Inter. = Internet validation sample (N = 2,000). Univ. = University validation sample (N = 423; for self-peer correlations, N =
230). All correlations are statistically significant at p < .001. Proportion of BFI-2 domain scale variance retained equals the mean
squared part-whole correlation, the ratio of the mean alpha reliability for a short form compared to the full BFI-2, or the ratio of the
mean squared self-peer correlation for a short form compared to the full BFI-2.
Table 2
Loadings of the BFI-2-S Items in a Random Intercept EFA with Varimax Rotation (Study 2)
                                                       Extra-        Agree-        Conscien-     Negative      Open-
Domain, facet, and item text                           version       ableness      tiousness     Emotionality  Mindedness
Extraversion
  Sociability
    Is outgoing, sociable.                             .69/ .76      .21/ .10      .04/ .02     -.10/-.08      .02/ .10
    Tends to be quiet.                                -.68/-.72     -.01/ .04      .08/ .07      .02/-.02      .03/ .01
  Assertiveness
    Is dominant, acts as a leader.                     .61/ .59     -.19/-.16      .20/ .12     -.08/-.08      .17/ .11
    Prefers to have others take charge.               -.50/-.43      .10/ .14     -.18/-.17      .14/ .15     -.20/-.21
  Energy Level
    Is full of energy.                                 .57/ .64      .18/ .14      .15/ .06     -.24/-.14      .03/-.09
    Is less active than other people.                 -.39/-.44     -.09/-.04     -.23/-.23      .24/ .22     -.14/-.16
Agreeableness
  Compassion
    Is compassionate, has a soft heart.                .12/-.03      .56/ .54      .02/ .13      .17/ .14      .13/ .23
    Can be cold and uncaring.                         -.11/-.10     -.66/-.71     -.09/-.05      .10/ .05     -.02/-.07
  Respectfulness
    Is respectful, treats others with respect.         .06/-.10      .52/ .50      .16/ .23     -.12/-.14      .11/ .17
    Is sometimes rude to others.                       .11/ .15     -.65/-.56     -.12/-.05      .21/ .28     -.06/-.08
  Trust
    Assumes the best about people.                     .13/ .19      .51/ .56     -.01/ .02     -.13/-.20      .01/-.10
    Tends to find fault with others.                  -.04/-.06     -.52/-.57     -.05/ .01      .28/ .22      .00/ .01
Conscientiousness
  Organization
    Keeps things neat and tidy.                        .08/ .04      .04/ .03      .72/ .69     -.04/-.01     -.07/ .00
    Tends to be disorganized.                         -.04/-.01     -.05/-.06     -.81/-.80      .09/ .12      .04/ .01
  Productiveness
    Is persistent, works until the task is finished.   .12/ .11      .08/ .03      .52/ .43     -.15/-.12      .10/ .10
    Has difficulty getting started on tasks.          -.16/-.11     -.04/-.04     -.51/-.44      .21/ .22     -.03/ .08
  Responsibility
    Is reliable, can always be counted on.             .12/ .07      .25/ .26      .43/ .40     -.19/-.18      .06/ .20
    Can be somewhat careless.                          .02/-.09     -.25/-.19     -.49/-.43      .14/ .10     -.05/-.14
Negative Emotionality
  Anxiety
    Worries a lot.                                    -.14/-.19     -.03/-.07     -.03/-.04      .65/ .60     -.02/-.07
    Is relaxed, handles stress well.                   .12/ .11      .09/ .00      .07/ .07     -.69/-.65      .05/ .02
  Depression
    Tends to feel depressed, blue.                    -.26/-.28     -.16/-.23     -.18/-.22      .61/ .51      .02/ .05
    Feels secure, comfortable with self.               .31/ .35      .08/ .18      .21/ .19     -.52/-.47      .06/-.01
  Emotional Volatility
    Is temperamental, gets emotional easily.           .08/ .05     -.06/-.12     -.08/-.07      .70/ .70     -.05/-.06
    Is emotionally stable, not easily upset.           .04/ .04      .08/ .07      .07/ .07     -.78/-.81      .02/-.03
Open-Mindedness
  Aesthetic Sensitivity
    Is fascinated by art, music, or literature.       -.01/-.09      .06/ .09     -.05/ .01      .07/ .11      .53/ .61
    Has few artistic interests.                        .00/ .04     -.01/-.10      .05/ .00      .00/-.05     -.57/-.65
  Intellectual Curiosity
    Is complex, a deep thinker.                        .04/ .04     -.04/-.02      .04/ .15      .13/ .05      .48/ .46
    Has little interest in abstract ideas.             .00/-.03     -.06/-.05      .09/-.03      .11/ .04     -.58/-.62
  Creative Imagination
    Is original, comes up with new ideas.              .22/ .27      .09/ .02      .07/ .00     -.11/-.16      .58/ .51
    Has little creativity.                            -.08/-.15     -.08/-.04      .01/ .00      .07/ .08     -.71/-.72
Note. BFI-2 items copyright 2015 by Oliver P. John and Christopher J. Soto. Reprinted with permission. EFA =
Exploratory factor analysis. Random intercept EFA constrains each item to load on an acquiescence method
factor, in addition to the substantive factors. Loadings left of the forward slash are from the Internet validation
sample (N = 2,000). Loadings right of the forward slash are from the university validation sample (N = 423).
The average loading on the acquiescence method factor was .14 (range = .13 to .23) in the Internet sample and
.15 (range = .12 to .24) in the university sample. Absolute loadings ≥ .30 are bolded.
Table 3
Loadings of the BFI-2-XS Items in a Random Intercept EFA with Varimax Rotation (Study 2)
                                                                                                    Negative      Open-
Domain and item text                                Extraversion  Agreeableness  Conscientiousness  Emotionality  Mindedness
Extraversion
  Tends to be quiet.                                -.66/-.76     -.01/ .01       .07/ .07           .08/ .02      .02/-.03
  Is dominant, acts as a leader.                     .59/ .58     -.14/-.15       .22/ .14          -.07/-.09      .15/ .12
  Is full of energy.                                 .55/ .59      .20/ .17       .17/ .09          -.28/-.19      .02/-.08
Agreeableness
  Is compassionate, has a soft heart.                .11/-.06      .58/ .49       .04/ .20           .13/ .16      .10/ .22
  Is sometimes rude to others.                       .16/ .19     -.58/-.49      -.13/-.02           .23/ .30     -.08/-.08
  Assumes the best about people.                     .09/ .17      .51/ .55       .00/ .04          -.16/-.24      .04/-.07
Conscientiousness
  Tends to be disorganized.                         -.03/ .00     -.04/-.05      -.73/-.70           .10/ .11      .04/-.01
  Has difficulty getting started on tasks.          -.15/-.08     -.03/-.04      -.54/-.45           .20/ .24      .01/ .10
  Is reliable, can always be counted on.             .11/ .01      .22/ .21       .45/ .40          -.18/-.20      .03/ .25
Negative Emotionality
  Worries a lot.                                    -.10/-.13      .00/-.07      -.03/-.03           .66/ .59     -.05/-.05
  Tends to feel depressed, blue.                    -.21/-.23     -.10/-.19      -.19/-.26           .67/ .55      .01/ .05
  Is emotionally stable, not easily upset.           .01/ .02      .04/ .04       .10/ .11          -.71/-.74      .05/ .02
Open-Mindedness
  Is fascinated by art, music, or literature.        .01/-.04     -.05/ .00       .09/-.02           .08/ .04     -.64/-.72
  Has little interest in abstract ideas.             .02/-.08      .09/ .09      -.03/ .01           .09/ .14      .53/ .55
  Is original, comes up with new ideas.              .22/ .24      .10/ .08       .10/-.04          -.11/-.17      .50/ .46
Note. BFI-2 items copyright 2015 by Oliver P. John and Christopher J. Soto. Reprinted with permission. EFA =
Exploratory factor analysis. Random intercept EFA constrains each item to load on an acquiescence method
factor, in addition to the substantive factors. Loadings left of the forward slash are from the Internet validation
sample (N = 2,000). Loadings right of the forward slash are from the university validation sample (N = 423).
The average loading on the acquiescence method factor was .14 (range = .12 to .18) in the Internet sample and
.15 (range = .12 to .21) in the university sample. Absolute loadings ≥ .30 are bolded.
Table 4
Associations of the BFI-2, BFI-2-S, and BFI-2-XS Domain Scales with Self-Reported and Peer-Reported Criteria (Study 2)
Correlations
                                                                             Negative        Open-
Criterion                 Extraversion    Agreeableness   Conscientiousness  Emotionality    Mindedness
Self-reported behavioral criteria
  Conformity              -.34/-.33/-.32   .23/ .21/ .19   .13/ .13/ .15      .05/ .06/ .06  -.15/-.16/-.14
  Tradition               -.23/-.21/-.20   .05/ .04/ .06   .00/-.01/ .03     -.11/-.10/-.09  -.16/-.12/-.12
  Benevolence              .08/ .06/ .01   .51/ .46/ .41   .20/ .19/ .21     -.11/-.13/-.09   .16/ .12/ .11
  Power                    .33/ .33/ .39  -.51/-.45/-.44  -.12/-.11/-.13      .05/ .06/ .04  -.20/-.17/-.16
  Universalism            -.01/-.02/-.01   .14/ .15/ .18  -.05/-.04/-.06      .02/ .00/-.01   .14/ .13/ .12
  Hedonism                -.06/-.08/-.05  -.21/-.20/-.23  -.38/-.36/-.34      .17/ .16/ .14  -.07/-.06/-.05
  Stimulation              .23/ .22/ .21  -.12/-.10/-.11  -.23/-.22/-.23     -.02/-.02/-.02   .15/ .13/ .11
  Security                -.22/-.20/-.20   .04/ .03/ .03   .21/ .19/ .17      .00/ .01/-.01  -.15/-.13/-.12
  Achievement              .20/ .19/ .17  -.09/-.06/-.04   .19/ .19/ .16      .01/ .02/ .02  -.02/-.01/-.02
  Self-Direction           .05/ .06/ .00  -.05/-.08/-.07   .04/ .02/-.02     -.05/-.07/-.02   .45/ .42/ .42
Self-reported well-being criteria
  Positive relations       .41/ .40/ .32   .45/ .46/ .35   .30/ .26/ .27     -.39/-.39/-.40   .26/ .24/ .21
  Environmental mastery    .44/ .45/ .34   .32/ .29/ .23   .60/ .58/ .60     -.59/-.55/-.54   .15/ .13/ .13
  Purpose in life          .42/ .41/ .32   .30/ .29/ .21   .53/ .51/ .51     -.36/-.37/-.36   .25/ .25/ .21
  Self-acceptance          .45/ .48/ .36   .34/ .33/ .23   .53/ .50/ .47     -.57/-.58/-.57   .22/ .21/ .19
  Autonomy                 .28/ .26/ .21   .20/ .16/ .11   .29/ .27/ .26     -.31/-.33/-.30   .40/ .37/ .33
  Personal growth          .31/ .31/ .20   .37/ .32/ .23   .33/ .31/ .30     -.22/-.22/-.23   .44/ .38/ .36
Peer-reported criteria
  Social connectedness     .27/ .23/ .23   .33/ .33/ .29   .21/ .18/ .17     -.27/-.27/-.27   .05/ .06/ .07
  Likability               .18/ .16/ .14   .26/ .24/ .22   .11/ .07/ .09     -.10/-.11/-.09   .09/ .11/ .12
  Stress resistance        .23/ .24/ .19   .19/ .16/ .13   .20/ .17/ .18     -.51/-.49/-.45   .00/-.01/ .03
  Positive affect          .37/ .36/ .34   .14/ .16/ .13   .24/ .21/ .21     -.43/-.43/-.41  -.03/-.01/-.02

Standardized regression coefficients
                                                                             Negative        Open-
Criterion                 Extraversion    Agreeableness   Conscientiousness  Emotionality    Mindedness
Self-reported behavioral criteria
  Conformity              -.35/-.34/-.31   .26/ .22/ .19   .20/ .20/ .18      .10/ .08/ .09  -.15/-.15/-.14
  Tradition               -.27/-.25/-.23   .01/ .02/ .03   .02/ .00/ .00     -.18/-.18/-.14  -.12/-.09/-.11
  Benevolence              .05/ .02/ .00   .52/ .44/ .39   .06/ .09/ .15      .11/ .05/ .06   .05/ .06/ .06
  Power                    .40/ .41/ .41  -.49/-.43/-.41  -.07/-.07/-.07     -.04/ .03/ .00  -.17/-.17/-.15
  Universalism             .00/-.02/-.01   .17/ .16/ .20  -.11/-.08/-.10      .04/ .02/ .01   .13/ .13/ .10
  Hedonism                 .04/ .01/-.02  -.10/-.11/-.16  -.36/-.33/-.30      .01/ .03/-.01  -.01/-.01/-.02
  Stimulation              .26/ .25/ .21  -.10/-.08/-.08  -.33/-.29/-.26     -.09/-.05/-.07   .17/ .13/ .11
  Security                -.27/-.23/-.21   .00/ .00/ .01   .31/ .27/ .19      .02/ .02/ .00  -.14/-.13/-.11
  Achievement              .19/ .20/ .18  -.11/-.08/-.06   .22/ .20/ .19      .11/ .12/ .10  -.07/-.05/-.03
  Self-Direction          -.07/-.04/-.05  -.18/-.16/-.13   .00/-.02/-.02     -.11/-.12/-.06   .49/ .45/ .43
Self-reported well-being criteria
  Positive relations       .33/ .32/ .26   .35/ .37/ .25   .00/-.02/ .06     -.14/-.16/-.24   .09/ .12/ .13
  Environmental mastery    .22/ .24/ .23   .01/ .05/ .00   .38/ .38/ .45     -.37/-.34/-.31   .01/ .00/ .04
  Purpose in life          .26/ .26/ .23   .08/ .10/ .05   .37/ .34/ .40     -.09/-.14/-.14   .10/ .12/ .13
  Self-acceptance          .25/ .28/ .24   .05/ .10/ .03   .27/ .25/ .27     -.36/-.37/-.39   .07/ .09/ .10
  Autonomy                 .11/ .10/ .12  -.02/-.01/-.02   .12/ .11/ .14     -.22/-.24/-.21   .34/ .32/ .29
  Personal growth          .18/ .22/ .14   .24/ .23/ .12   .13/ .12/ .19      .01/-.01/-.07   .32/ .29/ .31
Peer-reported criteria
  Social connectedness     .22/ .17/ .18   .27/ .28/ .23   .04/ .04/ .06     -.08/-.11/-.13  -.03/ .02/ .04
  Likability               .17/ .14/ .12   .26/ .23/ .20   .01/-.01/ .04      .06/ .01/ .01   .02/ .08/ .09
  Stress resistance        .07/ .08/ .08   .00/ .00/-.01  -.02/-.02/ .03     -.50/-.47/-.42   .00/-.02/ .03
  Positive affect          .26/ .24/ .26  -.01/ .03/ .02   .05/ .04/ .07     -.33/-.32/-.31  -.07/-.04/-.03
Note. Positive relations = Positive relations with others. For behavioral criteria, N = 392 and absolute correlations ≥ .10 are statistically significant at
p < .05. For well-being criteria, N = 185 and absolute correlations ≥ .15 are statistically significant at p < .05. For peer-reported criteria, N = 230 and
absolute correlations ≥ .13 are statistically significant at p < .05. Absolute coefficients ≥ .20 are bolded.
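The significance thresholds stated in this note can be approximated from the sample sizes alone. The sketch below is illustrative rather than the authors' procedure: it assumes a two-tailed test at alpha = .05, replaces the critical t value with the standard normal quantile (a reasonable approximation at these sample sizes), and rounds up to two decimals to match the table's precision. The function name `min_significant_r` is hypothetical.

```python
import math
from statistics import NormalDist


def min_significant_r(n, alpha=0.05):
    """Smallest two-decimal |r| that reaches significance with n observations.

    Assumption: two-tailed test, with the critical t value approximated
    by the standard normal quantile (adequate for n in the hundreds).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = .05
    df = n - 2
    r_crit = z / math.sqrt(z ** 2 + df)        # inverts t = r * sqrt(df / (1 - r^2))
    return math.ceil(r_crit * 100) / 100       # round up to two decimals


for label, n in [("behavioral", 392), ("well-being", 185), ("peer-reported", 230)]:
    print(label, n, min_significant_r(n))
```

Under these assumptions the three sample sizes yield .10, .15, and .13, in line with the thresholds given in the note.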
Table 5
Proportion of Variance in Self-Reported and Peer-Reported Criteria Explained by the BFI-2,
BFI-2-S, and BFI-2-XS Domain Scales (Study 2)
Criterion                                           BFI-2    BFI-2-S  BFI-2-XS
Self-reported behavioral criteria
  Conformity                                        .23      .21      .19
  Tradition                                         .10      .08      .08
  Benevolence                                       .28      .22      .19
  Power                                             .42      .37      .37
  Universalism                                      .05      .04      .05
  Hedonism                                          .16      .15      .14
  Stimulation                                       .17      .15      .13
  Security                                          .15      .11      .09
  Achievement                                       .09      .08      .06
  Self-Direction                                    .24      .21      .19
  Mean                                              .19      .16      .15
Self-reported well-being criteria
  Positive relations                                .37      .38      .31
  Environmental mastery                             .55      .53      .52
  Purpose in life                                   .39      .39      .36
  Self-acceptance                                   .50      .52      .46
  Autonomy                                          .27      .25      .22
  Personal growth                                   .33      .29      .24
  Mean                                              .40      .39      .35
Peer-reported criteria
  Social connectedness                              .18      .17      .16
  Likability                                        .10      .08      .07
  Stress resistance                                 .26      .24      .21
  Positive affect                                   .25      .24      .23
  Mean                                              .20      .18      .17
Grand mean                                          .25      .24      .21
Proportion of BFI-2 domain scale validity retained  —        .93      .84
Note. For behavioral criteria, N = 392 and R² ≥ .03 are statistically significant at p < .05. For
well-being criteria, N = 185 and R² ≥ .06 are statistically significant at p < .05. For peer-reported
criteria, N = 230 and R² ≥ .05 are statistically significant at p < .05. Proportion of BFI-2 domain
scale validity retained equals the ratio of mean criterion variance explained by a short form
compared to the full BFI-2.
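The ratio described in this note can be illustrated with the per-criterion values from Table 5. The following is a minimal sketch, not the authors' code: it works from the rounded two-decimal values printed in the table (the original computation presumably used unrounded values), and the names `R2_TABLE` and `validity_retained` are hypothetical.

```python
from statistics import mean

# Proportion of variance explained, transcribed from Table 5.
# Each tuple is (BFI-2, BFI-2-S, BFI-2-XS).
R2_TABLE = {
    "Conformity": (.23, .21, .19),
    "Tradition": (.10, .08, .08),
    "Benevolence": (.28, .22, .19),
    "Power": (.42, .37, .37),
    "Universalism": (.05, .04, .05),
    "Hedonism": (.16, .15, .14),
    "Stimulation": (.17, .15, .13),
    "Security": (.15, .11, .09),
    "Achievement": (.09, .08, .06),
    "Self-Direction": (.24, .21, .19),
    "Positive relations": (.37, .38, .31),
    "Environmental mastery": (.55, .53, .52),
    "Purpose in life": (.39, .39, .36),
    "Self-acceptance": (.50, .52, .46),
    "Autonomy": (.27, .25, .22),
    "Personal growth": (.33, .29, .24),
    "Social connectedness": (.18, .17, .16),
    "Likability": (.10, .08, .07),
    "Stress resistance": (.26, .24, .21),
    "Positive affect": (.25, .24, .23),
}


def validity_retained(table, form_index):
    """Mean variance explained by a short form divided by that of the full BFI-2.

    form_index: 1 for the BFI-2-S, 2 for the BFI-2-XS.
    """
    full = mean(row[0] for row in table.values())
    short = mean(row[form_index] for row in table.values())
    return short / full


print(round(validity_retained(R2_TABLE, 1), 2))  # BFI-2-S
print(round(validity_retained(R2_TABLE, 2), 2))  # BFI-2-XS
```

Averaging over all 20 criteria, the rounded table values reproduce the reported proportions of .93 and .84.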
Table 6
Summary and Comparison of Measurement Properties for the BFI-2 and BFI-2-S Facet Scales
(Study 2)
                          Part-whole                                      Self-peer
                          correlations    Alpha reliabilities             correlations
                          BFI-2-S         BFI-2           BFI-2-S         BFI-2   BFI-2-S
Facet scale               Inter.  Univ.   Inter.  Univ.   Inter.  Univ.   Univ.   Univ.
Extraversion
  Sociability             .95     .95     .83     .84     .70     .71     .72     .69
  Assertiveness           .91     .92     .76     .75     .72     .67     .57     .50
  Energy Level            .87     .89     .70     .69     .60     .49     .49     .50
Agreeableness
  Compassion              .87     .86     .59     .65     .48     .55     .38     .41
  Respectfulness          .91     .89     .70     .69     .48     .48     .37     .37
  Trust                   .90     .90     .68     .71     .53     .60     .48     .44
Conscientiousness
  Organization            .94     .94     .83     .82     .79     .75     .56     .52
  Productiveness          .92     .88     .77     .70     .58     .45     .51     .44
  Responsibility          .90     .89     .71     .62     .47     .39     .32     .23
Negative Emotionality
  Anxiety                 .92     .91     .77     .77     .65     .67     .56     .48
  Depression              .94     .94     .81     .80     .67     .66     .50     .50
  Emotional Volatility    .94     .94     .83     .83     .75     .76     .56     .50
Open-Mindedness
  Intellectual Curiosity  .89     .91     .66     .73     .42     .57     .52     .45
  Aesthetic Sensitivity   .90     .93     .73     .82     .54     .69     .48     .38
  Creative Imagination    .92     .91     .75     .77     .64     .65     .30     .31
Mean coefficient          .91     .91     .74     .75     .60     .61     .49     .45
Proportion of BFI-2 facet
scale variance retained   .83     .83     —       —       .81     .81     —       .85
Note. Inter. = Internet validation sample (N = 2,000). Univ. = University validation sample (N =
423; for self-peer correlations, N = 230 and all are statistically significant at p < .001). Proportion
of BFI-2 facet scale variance retained equals the mean squared part-whole correlation, the ratio
of the mean alpha reliability for the BFI-2-S compared to the full BFI-2, or the ratio of the mean
squared self-peer correlation for the BFI-2-S compared to the full BFI-2.
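The three definitions in this note can be checked against the facet-level values in Table 6. The sketch below is an illustration under stated assumptions, not the authors' code: it uses the rounded values printed in the table, covers only the Internet-sample columns for part-whole correlations and alphas, and the variable and function names are hypothetical.

```python
from statistics import mean

# Values transcribed from Table 6, in the table's facet order
# (Sociability ... Creative Imagination).
part_whole_inter = [.95, .91, .87, .87, .91, .90, .94, .92, .90,
                    .92, .94, .94, .89, .90, .92]
alpha_bfi2_inter = [.83, .76, .70, .59, .70, .68, .83, .77, .71,
                    .77, .81, .83, .66, .73, .75]
alpha_bfi2s_inter = [.70, .72, .60, .48, .48, .53, .79, .58, .47,
                     .65, .67, .75, .42, .54, .64]
self_peer_bfi2 = [.72, .57, .49, .38, .37, .48, .56, .51, .32,
                  .56, .50, .56, .52, .48, .30]
self_peer_bfi2s = [.69, .50, .50, .41, .37, .44, .52, .44, .23,
                   .48, .50, .50, .45, .38, .31]


def mean_squared(values):
    """Mean of the squared coefficients."""
    return mean(v * v for v in values)


# Part-whole: mean squared correlation between short and full facet scales.
retained_part_whole = mean_squared(part_whole_inter)

# Alpha: ratio of mean alpha for the BFI-2-S to mean alpha for the BFI-2.
retained_alpha = mean(alpha_bfi2s_inter) / mean(alpha_bfi2_inter)

# Self-peer: ratio of mean squared self-peer correlations.
retained_self_peer = mean_squared(self_peer_bfi2s) / mean_squared(self_peer_bfi2)

print(round(retained_part_whole, 2),
      round(retained_alpha, 2),
      round(retained_self_peer, 2))
```

With the rounded inputs, the three quantities come to .83, .81, and .85, matching the bottom row of the table.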
Appendix
The BFI-2 Short Forms and Scoring Information
Here are a number of characteristics that may or may not apply to you. For example, do you agree that you are
someone who likes to spend time with others? Please write a number next to each statement to indicate the extent
to which you agree or disagree with that statement.
1 = Disagree strongly
2 = Disagree a little
3 = Neutral; no opinion
4 = Agree a little
5 = Agree strongly
I am someone who...
1. Tends to be quiet.
2. Is compassionate, has a soft heart.
3. Tends to be disorganized.
4. Worries a lot.
5. Is fascinated by art, music, or literature.
6. Is dominant, acts as a leader.
7. Is sometimes rude to others.
8. Has difficulty getting started on tasks.
9. Tends to feel depressed, blue.
10. Has little interest in abstract ideas.
11. Is full of energy.
12. Assumes the best about people.
13. Is reliable, can always be counted on.
14. Is emotionally stable, not easily upset.
15. Is original, comes up with new ideas.
16. Is outgoing, sociable.
17. Can be cold and uncaring.
18. Keeps things neat and tidy.
19. Is relaxed, handles stress well.
20. Has few artistic interests.
21. Prefers to have others take charge.
22. Is respectful, treats others with respect.
23. Is persistent, works until the task is finished.
24. Feels secure, comfortable with self.
25. Is complex, a deep thinker.
26. Is less active than other people.
27. Tends to find fault with others.
28. Can be somewhat careless.
29. Is temperamental, gets emotional easily.
30. Has little creativity.
Please check: Did you write a number in front of each statement?
Note. BFI-2 items copyright 2015 by Oliver P. John and Christopher J. Soto. Reprinted with permission.
Scoring the BFI-2-S Domain and Facet Scales
The BFI-2-S includes items 1 to 30 shown on the above administration form. Item numbers for scoring
the BFI-2-S domain and facet scales are listed below; false-keyed items are denoted by “R.” Due to the limited
reliability of the two-item facet scales, we recommend using them only in samples with approximately 400 or
more observations. For downloadable versions and more information about the BFI-2, visit the Colby Personality
Lab website (http://www.colby.edu/psych/personality-lab/).
Extraversion: 1R, 6, 11, 16, 21R, 26R
Sociability: 1R, 16
Assertiveness: 6, 21R
Energy Level: 11, 26R
Agreeableness: 2, 7R, 12, 17R, 22, 27R
Compassion: 2, 17R
Respectfulness: 7R, 22
Trust: 12, 27R
Conscientiousness: 3R, 8R, 13, 18, 23, 28R
Organization: 3R, 18
Productiveness: 8R, 23
Responsibility: 13, 28R
Negative Emotionality: 4, 9, 14R, 19R, 24R, 29
Anxiety: 4, 19R
Depression: 9, 24R
Emotional Volatility: 14R, 29
Open-Mindedness: 5, 10R, 15, 20R, 25, 30R
Aesthetic Sensitivity: 5, 20R
Intellectual Curiosity: 10R, 25
Creative Imagination: 15, 30R
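The scoring key above can be applied programmatically. The following is a minimal sketch rather than official scoring code. It assumes responses on the 1 to 5 rating scale, reverses false-keyed items as 6 minus the rating, and scores each scale as the mean of its keyed items; the reversal formula and the use of means are assumptions not stated in this appendix, and the names `BFI2S_KEY` and `score_bfi2s` are hypothetical.

```python
from statistics import mean

# Scoring key transcribed from the appendix; "R" marks a false-keyed item.
BFI2S_KEY = {
    "Extraversion": ["1R", "6", "11", "16", "21R", "26R"],
    "Sociability": ["1R", "16"],
    "Assertiveness": ["6", "21R"],
    "Energy Level": ["11", "26R"],
    "Agreeableness": ["2", "7R", "12", "17R", "22", "27R"],
    "Compassion": ["2", "17R"],
    "Respectfulness": ["7R", "22"],
    "Trust": ["12", "27R"],
    "Conscientiousness": ["3R", "8R", "13", "18", "23", "28R"],
    "Organization": ["3R", "18"],
    "Productiveness": ["8R", "23"],
    "Responsibility": ["13", "28R"],
    "Negative Emotionality": ["4", "9", "14R", "19R", "24R", "29"],
    "Anxiety": ["4", "19R"],
    "Depression": ["9", "24R"],
    "Emotional Volatility": ["14R", "29"],
    "Open-Mindedness": ["5", "10R", "15", "20R", "25", "30R"],
    "Aesthetic Sensitivity": ["5", "20R"],
    "Intellectual Curiosity": ["10R", "25"],
    "Creative Imagination": ["15", "30R"],
}


def score_bfi2s(responses):
    """Return domain and facet scores for one respondent.

    responses: dict mapping item number (1 to 30) to a rating from 1 to 5.
    Assumption: false-keyed items are reversed as 6 - rating, and each
    scale score is the mean of its keyed items.
    """
    scores = {}
    for scale, items in BFI2S_KEY.items():
        keyed = []
        for item in items:
            rating = responses[int(item.rstrip("R"))]
            keyed.append(6 - rating if item.endswith("R") else rating)
        scores[scale] = mean(keyed)
    return scores


# Usage: a respondent who answers 1 to every item except item 6.
example = {number: 1 for number in range(1, 31)}
example[6] = 5
print(score_bfi2s(example)["Assertiveness"])
```

In the usage example, item 6 is rated 5 and the false-keyed item 21 is rated 1, which reverses to 5, so the Assertiveness score is 5.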
Administering and Scoring the BFI-2-XS Domain Scales
The BFI-2-XS is the first half of the BFI-2-S. Therefore, the BFI-2-XS includes items 1 to 15 shown on
the above BFI-2-S administration form, and the BFI-2-XS domain scales are scored from the first three items
listed for each domain scale on the above BFI-2-S scoring key. Due to the limited reliability of single items, we
do not recommend using the BFI-2-XS to examine facet-level traits.
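Because the BFI-2-XS key is defined as the first three items listed for each domain, it can be derived directly from the BFI-2-S domain key. The sketch below illustrates this under the same assumptions as any simple scoring routine for these forms (ratings from 1 to 5, false-keyed items reversed as 6 minus the rating, scale scores taken as item means); those conventions and the names used are assumptions, not part of the appendix.

```python
from statistics import mean

# BFI-2-S domain keys transcribed from the appendix; "R" marks a false-keyed item.
BFI2S_DOMAIN_KEY = {
    "Extraversion": ["1R", "6", "11", "16", "21R", "26R"],
    "Agreeableness": ["2", "7R", "12", "17R", "22", "27R"],
    "Conscientiousness": ["3R", "8R", "13", "18", "23", "28R"],
    "Negative Emotionality": ["4", "9", "14R", "19R", "24R", "29"],
    "Open-Mindedness": ["5", "10R", "15", "20R", "25", "30R"],
}

# The BFI-2-XS domain scales use the first three items listed for each domain.
BFI2XS_KEY = {domain: items[:3] for domain, items in BFI2S_DOMAIN_KEY.items()}


def score_bfi2xs(responses):
    """Return the five BFI-2-XS domain scores for one respondent.

    responses: dict mapping item number (1 to 15) to a rating from 1 to 5.
    Assumption: false-keyed items are reversed as 6 - rating, and each
    domain score is the mean of its three keyed items.
    """
    scores = {}
    for domain, items in BFI2XS_KEY.items():
        keyed = []
        for item in items:
            rating = responses[int(item.rstrip("R"))]
            keyed.append(6 - rating if item.endswith("R") else rating)
        scores[domain] = mean(keyed)
    return scores


print(BFI2XS_KEY["Negative Emotionality"])
```

Deriving the short key by slicing keeps the two forms consistent: every BFI-2-XS item number falls between 1 and 15, so the same response data can be scored with either key.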