Construct Validation Using Multitrait-Multimethod-Twin
Data: The Case of a General Factor of Personality
RAINER RIEMANN* and CHRISTIAN KANDLER
Department of Psychology, University of Bielefeld, Bielefeld, Germany
Abstract
We describe a behavioural genetic extension of the classic multitrait-multimethod study
design that allows estimating genetic and environmental influences on method effects in
twin studies (MTMM-T). Genetic effects and effects of the environment shared by siblings
are interpreted as indicators of convergent validity. In an application of the MTMM study
design, we used self- and peer report data to examine the higher-order structure of the
NEO-PI-R. Structural equation modelling did not support a general factor of personality in
multimethod data. The higher-order factor Stability turns out to be, at most, a weak trait
factor. Genetic effects on method factors indicate that especially self-reports but also peer
reports show convergent validity between twins but not between methods. Copyright ©
2010 John Wiley & Sons, Ltd.
Key words: personality traits; construct validity; behavioural genetics; twin; general
factor of personality
INTRODUCTION
When Campbell and Fiske (1959) introduced their model to validate psychological
measures, the focus was on the relative influence of two sources of variance: trait variance
and method variance. Psychological tests were conceptualized as trait-method units
emphasizing that any psychological measurement confounds trait variance with method
variance. Only by varying methods and traits systematically can both sources of variance
be disentangled. The present paper focuses on the two most frequently used methods of
personality measurement – self- and peer reports – in order to show that behavioural
genetic designs to collect personality data provide an important extension of the classical
multitrait-multimethod (MTMM) analysis. We demonstrate the usefulness of this model in
a study of a general factor of personality (GFP) in self- and peer reports on the Revised
NEO Personality Inventory (NEO-PI-R; Costa & McCrae, 1992) using a German twin
sample.
European Journal of Personality
Eur. J. Pers. 24: 258–277 (2010)
Published online in Wiley InterScience
(www.interscience.wiley.com) DOI: 10.1002/per.760
*Correspondence to: Rainer Riemann, Department of Psychology, Bielefeld University, Universitätsstr. 25,
D-33615 Bielefeld, Germany. E-mail: rainer.riemann@uni-bielefeld.de
Received 28 October 2009
Revised 20 January 2010
Accepted 21 January 2010
In MTMM analyses, method variance refers to all effects a procedure of data collection
has on the covariation of traits measured by this procedure. Although there are some basic
categories for methods of personality measurement (like self-report, observer rating and
objective test), the definition of method and as a consequence the degree to which method
effects are controlled for depends on the aims of a particular study. Method effects can be
studied within these categories, for example by contrasting positively and negatively scored
items of self-report questionnaires, or between categories, using data from multiple sources.
In addition, if we consider a broader sample of personality traits, a single mode of data
collection (e.g. observer ratings) may be linked to more than one method factor. For
example observers’ knowledge of target persons’ physical characteristics may distort
ratings on a number of activity or extraversion related traits, whereas knowledge about
targets’ mental capacities may differentially affect ability related trait judgments. Thus, the
pattern of correlations among traits measured by observer ratings may give rise to two
separate method factors.
Thus, estimates of method effects in MTMM analyses obviously depend on the specific
combination of methods chosen for a particular study (Eid, Lischetzke, Nussbeck, &
Trierweiler, 2003). For Campbell and Fiske (1959) method variance has been almost
inevitably invoked by irrelevant features of the measurement procedure. However, they
also emphasize that the distinction between trait and method is ‘relative to the test
constructor’s intent. What is an unwanted response set for one tester may be a trait for
another...’ (Campbell & Fiske, 1959, p. 85). While, for example much of the covariance
among explicit self-reported personality traits may reflect method variance in comparison
to implicit personality measures and/or physiological data (Egloff, Wilhelm, Neubauer,
Mauss, & Gross, 2002; Grumm & von Collani, 2007; Schultheiss & Brunstein, 2001) there
is a longstanding tradition in personality research to study self-concepts.
The scientific study of personality requires the use of objective measures that capture the
nature of a person independent of the particular scientist who applies the measure.
Objectivity is not an issue for most current measures of personality – like standardized tests
– as long as the attribute that is measured is defined operationally. As soon as we interpret
objectively registered responses as indicators of hypothetical or latent constructs the
objectivity of these inferences is a controversial matter. To ensure the objectivity of such
inferences is the central concern of construct validation procedures (Cronbach & Meehl,
1955) and the demonstration of convergent validity of independent measures is the central
means to achieve it. Campbell and Fiske’s (1959) emphasis on discriminant validity, in
addition, helps to maintain an organized system of psychological concepts, by requiring
that hetero-method measures of the same constructs correlate more strongly than mono-method
measures of different constructs. This has important consequences for personality research
that go beyond the mere validation of personality measures.
We doubt that there will ever be a general agreement upon ‘master measures’ of
personality constructs (like, e.g. a set of gene loci) that serve as points of reference for the
calibration of other more economic measures. Thus, personality theory must rely on that
part of the measured variance that is shared by methods, which reflects consensually
validated variance of a personality construct (i.e. trait variance, Campbell & Fiske, 1959,
also denoted as universe variance in generalizability theory by Cronbach, Gleser, Nanda, &
Rajaratnam, 1972, or simply target variance, Hoyt, 2000). It is not sufficient to demonstrate
convergent and discriminant validity of a personality measure and then to proceed, as if
there is no method variance. For example, if molecular genetic studies find an association
between a genetic polymorphism and a self-report measure of some trait – given the
average effect size of these associations – there is no way to decide whether the association
is with the method or with the trait.
Hofstee (1994) argues against the use of self-reports in personality research, because
they are inherently subjective. Acknowledging that individuals may have a privileged
access to information about their motives, preferences or secrets, he concludes: ‘If we
would emphasize subjective aspects, we would withdraw personality from scientific study.
In a scientific context, personality is by definition a public phenomenon’ (Hofstee, 1994, p.
155). Hofstee advocates the use of mean peer report measures, since judgment errors
cannot be averaged out in self-report measures.
In sum, the classical MTMM design in combination with modern structural equation
modelling allows one to quantify the relative influence of method effects and traits on
personality measures as well as to test specific models of the structure of latent personality
constructs adjusted for method distortions. Combining the MTMM analysis with
genetically informative data, on the one hand, offers insight into the sources of personality
traits that are not confounded with method effects and, on the other hand, allows a
decomposition of method effects into genetic and environmental effects, which hints at the
nature of method variance. We limit our discussion to self- and peer report measures of
personality, because the meaning of method and trait variance cannot be analysed without
taking specific method–trait combinations into account, although a behavioural genetic
extension of the MTMM analyses can meaningfully be applied to many method–trait
combinations (e.g. the measurement of intelligence by standard IQ-tests and elementary
cognitive tasks; Neubauer, Spinath, Riemann, Borkenau, & Angleitner, 2000).
We name our extension of the MTMM study design multitrait-multimethod-twin model
(MTMM-T). As in the classic MTMM design there are at least two methods (raters) for
each trait. In addition, measures are collected in a genetically informative study design
using mono- and dizygotic twins reared together (see Figure 1). The variance of observed
variables ($T_1$ for trait 1 and $T_2$ for trait 2, measured by two methods $M_1$ and $M_2$ for each twin
sibling $X$ and for the co-twin $Y$) is decomposed into a trait component ($T_1$/$T_2$), a method
component ($M_1$/$M_2$) and a random error component ($e$). Thus, if we focus on the data for a
single sibling, we have the classical MTMM analysis. The genetically informative study
design enables us to decompose both the method and trait factors into biometric
components, for example into an additive genetic component (A), an environmental
component shared by twins (C) and an environmental component not shared by
twins (E), as in Figure 1. The combination of MTMM analysis and biometric modelling adds
substantially to the complexity of the structural equation model, but basically we add four
(Figure 1; two trait and two method factors) univariate behavioural genetic analyses to the
MTMM model.
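The decomposition just described can be written compactly. The following is only a sketch in illustrative notation: the loadings $\lambda$ are symbols introduced here (they do not appear in Figure 1), all factors are assumed to be standardized, and trait, method and error components are assumed to be mutually uncorrelated. Note that $e_{ij}$ denotes random measurement error, whereas $e^2$ denotes the nonshared environmental share of a latent factor.

```latex
\begin{align}
x_{ij} &= \lambda^{T}_{ij}\, T_i + \lambda^{M}_{ij}\, M_j + e_{ij}, \\
\operatorname{Var}(x_{ij}) &= \bigl(\lambda^{T}_{ij}\bigr)^2 + \bigl(\lambda^{M}_{ij}\bigr)^2 + \operatorname{Var}(e_{ij}), \\
\operatorname{Var}(T_i) &= a_{T_i}^2 + c_{T_i}^2 + e_{T_i}^2 = 1, \qquad
\operatorname{Var}(M_j) = a_{M_j}^2 + c_{M_j}^2 + e_{M_j}^2 = 1 .
\end{align}
```

Here $x_{ij}$ is the observed score for trait $i$ measured by method $j$ in one twin; the same equations hold for the co-twin.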
Genetic and shared environmental effects on methods and traits contribute to the
covariation of measures between twin siblings, whereas E has no effect on the twin
covariation. E is a residual effect, which is estimated from the comparison of correlations
within individuals (twin sibling) with correlations within twin pairs. As is true for the
classical MTMM analysis, methods affect correlations between different traits measured
by the same method, and trait variance is the source for correlations of traits across
methods. Thus, A and C method effects increase hetero-trait mono-method correlations
between the two siblings of a twin pair, and A as well as C trait effects contribute to
mono-trait hetero-method correlations between twins. If there is method variance that has neither
a genetic nor a shared environmental component, it must by definition be due to nonshared
environmental influences.
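The expected cross-twin correlations implied by this reasoning can be sketched in a few lines of Python. This is a minimal illustration, not the model fitted in this paper: it assumes standardized variables, uncorrelated trait factors, and the usual additive genetic relatedness of 1.0 for MZ and 0.5 for DZ twins. All function names and numerical values are hypothetical.

```python
def genetic_relatedness(zygosity):
    """Correlation of additive genetic factors between co-twins."""
    return {"MZ": 1.0, "DZ": 0.5}[zygosity]


def cross_twin_factor_corr(a2, c2, zygosity):
    """Expected co-twin correlation of a standardized latent factor (ACE).

    Nonshared environment (E) does not appear: it adds nothing to twin covariation.
    """
    return genetic_relatedness(zygosity) * a2 + c2


def mono_trait_hetero_method(load_t_m1, load_t_m2, trait_a2, trait_c2, zygosity):
    """Same trait, different methods, across twins: driven by A and C on the trait."""
    return load_t_m1 * load_t_m2 * cross_twin_factor_corr(trait_a2, trait_c2, zygosity)


def hetero_trait_mono_method(load_m_t1, load_m_t2, method_a2, method_c2, zygosity):
    """Different traits, same method, across twins: driven by A and C on the method."""
    return load_m_t1 * load_m_t2 * cross_twin_factor_corr(method_a2, method_c2, zygosity)


# Hypothetical standardized loadings and variance shares, for illustration only.
mz_trait = mono_trait_hetero_method(0.7, 0.7, trait_a2=0.6, trait_c2=0.0, zygosity="MZ")
dz_trait = mono_trait_hetero_method(0.7, 0.7, trait_a2=0.6, trait_c2=0.0, zygosity="DZ")
mz_method = hetero_trait_mono_method(0.5, 0.5, method_a2=0.5, method_c2=0.0, zygosity="MZ")
print(round(mz_trait, 3), round(dz_trait, 3), round(mz_method, 3))
```

With purely additive genetic effects on the trait, the DZ value is half the MZ value; a method factor with only nonshared environmental variance would yield zero for the hetero-trait mono-method cross-twin correlation.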
As can be seen from Figure 1, univariate or mono-trait mono-method behavioural
genetic analyses confound trait and method effects. As a rule of thumb, the estimates of
genetic and environmental effects from mono-method studies equal the averaged
environmental and genetic effects on method and trait factors in multimethod studies,
weighted by the relative strength of method and trait factors. In general, estimates from
mono-method studies may be grossly misleading, if method and trait factors differ in their
aetiology. However, the few existing studies using self- and peer report personality
measures show convergent estimates (e.g. Riemann, Angleitner & Strelau, 1997), while
some but not all observational studies of personality (see Borkenau, Riemann, Spinath, &
Angleitner, 2000, for a review) indicate effects of the shared environment that are not found
in self- and peer report studies.
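The rule of thumb stated above can be made concrete with a small sketch. It is one possible formalization, not a formula taken from the literature cited here: the weights are the (hypothetical) proportions of observed variance due to the trait and the method factor, and random error is ignored, although in a univariate twin analysis it would be absorbed by the nonshared environmental estimate.

```python
def mono_method_estimate(trait_share, method_share, trait_effect, method_effect):
    """Weighted average of an effect (e.g. heritability) on trait and method factors.

    trait_share, method_share: proportions of observed variance attributable to
    the trait factor and the method factor (hypothetical inputs).
    trait_effect, method_effect: the effect of interest on each latent factor.
    """
    systematic = trait_share + method_share
    if systematic <= 0:
        raise ValueError("trait and method shares must sum to a positive value")
    return (trait_share * trait_effect + method_share * method_effect) / systematic


# Same aetiology for trait and method: the mono-method estimate is unbiased.
same = mono_method_estimate(0.4, 0.4, trait_effect=0.5, method_effect=0.5)
# Different aetiology: the mono-method estimate falls between the two values.
different = mono_method_estimate(0.4, 0.4, trait_effect=0.6, method_effect=0.2)
print(round(same, 3), round(different, 3))
```

The second call illustrates why mono-method estimates can mislead: neither the trait value nor the method value is recovered, only their weighted blend.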
Figure 1. The minimal multimethod-multitrait-twin (MTMM-T) model. Two methods ($M_1$ and $M_2$) are shown for every trait ($T_1$ and $T_2$) × twin ($X$ and $Y$) combination. T = convergent valid trait factor; M = method factor; e = measurement error; X = observed variable of one twin sibling; Y = observed variable of co-twin; a = 1.0 for MZ twins and 0.5 for DZ twins; D = 1.0 for all twins; A = additive genetic factors; C = shared environmental factors; E = nonshared environmental factors.
The decomposition of trait variance (controlled for method effects) into genetic and
environmental components is important from a behavioural genetic perspective. But there
are two caveats for the interpretation of these components. First, depending on the
convergent validity of the specific methods used, trait variance common to different
measures may explain only a small fraction of the observed variance. If we measure
personality with reliable instruments via self- and peer reports (averaged across two peers
who know the target person very well) roughly one-third of the variance is shared by both
measures since the correlation between these measures usually is in the range of .50–.60.
Second, from the MTMM-T analysis we have little information about the aetiology of
correspondence among measures. To study traits independent of method effects does not
necessarily imply that we tap on the core of traits more directly. As we have argued before,
convergence of methods is a central requirement for the scientific study of personality
concepts and there are costs associated with scientific rigor. Interesting and theoretically
important facets of personality captured by only one method require additional validation
processes. It may, for example well be that the convergence in self- and peer ratings
actually captures those aspects of personality that are most under our conscious control
reflecting those aspects of personality we choose to present (Johnson, 24th August 2009).
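As a worked check of the one-third figure given earlier in this paragraph, and assuming that shared variance is read as the squared correlation between the two measures:

```latex
\[
r = .50 \;\Rightarrow\; r^2 = .25, \qquad r = .60 \;\Rightarrow\; r^2 = .36 ,
\]
```

so the shared proportion lies between one quarter and slightly more than one third of the observed variance.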
Correlations between self-report measures from independent individuals who share their
genetic makeup and were reared in the same family (monozygotic twins) indicate a lower
limit of the convergent validity of these measures. The convergent validity is
underestimated to the degree that environmental experiences not shared by the twins
have an effect on the measure. In the Jena Twin Study of Social Attitudes (Stößel, Kämpfe,
& Riemann, 2006), for example, convergence between twins’ self-reports for the highly
reliable domain scores of the NEO-PI-R (Costa & McCrae, 1992) was .61, .58, .60, .52 and .57
(intra-class correlations, N = 226) for Neuroticism, Extraversion, Openness, Agreeableness
and Conscientiousness, while the corresponding correlations between self- and
averaged peer reports were on average slightly lower (r = .50, .62, .58, .44 and .53;
averaged across two twin siblings, N = 394). Self-report method variance, controlled for
trait and random error effects (i.e. the self-report method factor), may on the one hand
reflect all kinds of self-rater biases (see Paulhus & John, 1998, for a review), like individual
differences in self presentation or stereotypes shared by twins. On the other hand, it may
capture important aspects of the twins’ personality that are not accessible to the peer
judgment (Kraemer, Measelle, Ablow, Essex, Boyce, & Kupfer, 2003). The MTMM-T
allows decomposing this variance into genetic and environmental components.
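For readers who wish to compute twin correlations of the kind reported above, the following sketch shows a one-way random-effects intra-class correlation for pairs. The text does not state which intra-class formula was used, so this choice is an assumption; the function name `twin_icc` is hypothetical.

```python
def twin_icc(pairs):
    """One-way random-effects intra-class correlation for twin pairs.

    pairs: list of (score_twin_1, score_twin_2) tuples; twin order is arbitrary.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    grand_mean = sum(x + y for x, y in pairs) / (2 * n)
    pair_means = [(x + y) / 2 for x, y in pairs]
    # Between-pair mean square: two members per pair, n - 1 degrees of freedom.
    ms_between = 2 * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    # Within-pair mean square: n degrees of freedom (one per pair).
    ms_within = sum((x - m) ** 2 + (y - m) ** 2
                    for (x, y), m in zip(pairs, pair_means)) / n
    denominator = ms_between + ms_within
    if denominator == 0:
        raise ValueError("scores have no variance")
    return (ms_between - ms_within) / denominator
```

Unlike a Pearson correlation, this coefficient does not depend on which sibling is arbitrarily labelled twin 1 or twin 2.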
Idiosyncratic judgment processes as well as specific environmental influences on the
use of self-report inventories (e.g. mediated by verbal comprehension), on self-concepts,
and/or on judgmental biases are reflected in the nonshared environmental component of the
self-report method factor. Genetic and shared environmental effects on the self-report
method factor may reflect (validated) deviations from the trait score that are only accessible
to self-report measures on the one hand or rater bias (shared by twins) that has a genetic or
environmental basis on the other hand. From the variance decomposition alone, little can
be said about the nature of these effects since they may reflect diverse mechanisms like
influences of cognitive abilities on the measure, shared forms of self-presentation, self-
perception or mere response style variance.
The interpretation of systematic effects on the peer report method factor is easier, if
independent peers provide the measures. In this case, rater biases should not contribute to the
correlation between twin siblings. The most reasonable interpretation of biases which
contribute to genetic and shared environmental effects on peer report method factors is that
stereotypes shared within the population from which the raters are drawn (Letzring, Wells, &
Funder, 2006) and again (validated) deviations from the trait score are the source for these
effects. However, as quality and quantity of personality-relevant information should reduce
stereotypes (Funder, Kolar, & Blackman, 1995; Letzring et al., 2006), well-informed
acquaintances should not provide personality assessments affected by stereotypes.
In sum, we have outlined here a behavioural genetic extension of the MTMM analysis.
Both mono-trait as well as hetero-trait twin correlations (cross-correlations) offer insight
into the sources of method variance. This is especially important for methods based on self-
reports, because correlations between self-reports provided by monozygotic twin siblings
are estimates of the lower bound of their convergent validity. Fitting structural equation
models derived from the basic MTMM-T model to empirical data will usually require
additional specifications of correlations among traits or their higher-order structure as well
as specifications of the method factor structure. Our model shares the basic idea to use
behavioural genetic data for the study of fundamental measurement problems with Bartels,
Boomsma, Hudziak, van Beijsterveldt and van den Oord’s (2007) model of rater (dis-)agreement.
It should be noted, however, that the model elaborated by Bartels et al. requires
ratings of both twin siblings provided by the same observer (e.g. mother and father rate
both twin children) while our model requires methods to be independent (i.e. peer reports).
McCrae et al. (2008, Study 3) used a reduced version of the MTMM-T model to examine
the higher-order structure of the NEO-PI-R analysing the covariation among the NEO-PI-R
domain scores in a twin sample. In addition to five trait factors (Neuroticism, Extraversion,
Openness, Agreeableness and Conscientiousness), two higher-order trait factors – Digman’s
(1997) α, or socialization, and β, or personal growth – were considered. The α-factor was
defined by low Neuroticism (N) and high Agreeableness (A) and Conscientiousness (C); β
was defined by Extraversion (E) and Openness (O). Based on previous research (Biesanz &
West, 2004; Paulhus & John, 1998) two correlated method factors were included in the model,
designated A-bias and B-bias. Comparisons between nested models indicated that both
method factors as well as higher-order trait factors were important to account for the
covariation among scale scores. Additional analyses revealed genetic and nonshared
environmental influences on the self-report method factors. However, McCrae et al. did not
report a full decomposition of trait and method factors into genetic and environmental sources
of variance.
In a study of the structure of the five personality domains measured with the NEO-PI-R
Kandler, Riemann, Spinath and Angleitner (in press) applied the full MTMM-T model. In
each of the five analyses (one for each domain) the variance and covariance of the six facets
was decomposed into six trait factors (one for each facet) and two method factors (self- vs.
peer report). To account for the correlations among the facets a higher-order trait-factor
was included in the model. On average, self- and peer reports – measured at the facet level
of the Five Factor model – shared 19.4% of their variance (r = .44 on average). Genetic
effects on the domain factors, controlled for facet-specific, method-specific and error
variance, explained on average about 63% of the variance, nonshared environmental effects
the remaining 37%. The self-report method factors were moderately (51%) influenced by
genetic effects whereas environmental influences shared by twins were negligible on
average (5%). For the peer report method factors substantially different sources were
found. Genetic influences were small (18%), shared environmental effects were again
negligible (1%) and the nonshared environment explained the remaining variance. On the
one hand, genetic effects on the self-report method factor may reflect genetic effects on
response distortions (e.g. response styles, self-enhancement). On the other hand, self-
reports may be partially based on information that is not accessible to peers or weighted
less in peers’ personality judgments (like motives, mental states). Genetic effects on
averaged peer reports should reflect real individual differences. They may result from more
accurate comparisons of the target person with other persons that are not accessible to self-
raters. Again, differences in weighting the same information may play a role. Finally, it
cannot be ruled out that stereotypes shared by observers (e.g. inferences based on physical
characteristics; Kenny, 1994) contribute to a genetic component in peer reports.
An application of the multitrait-multimethod-twin model
In recent years, there has been a renewed interest in a general factor of the Big Five (Musek,
2007). However, the practical utility and empirical substance of a postulated GFP is
strongly questionable (Ashton, Lee, Goldberg, & de Vries, 2009; Bäckström, Björklund, &
Larsson, 2009; Biesanz & West, 2004; DeYoung, 2006). We used self- and peer reports on
the NEO-PI-R in a study of twins reared together to illustrate the MTMM-T model and
extended the higher-order factor MTMM-T analyses of McCrae et al. (2008) to hierarchical
analyses comparing different models with different levels of trait generality (Big Five and
Big Two) and a general factor at the highest level (Big One). In addition, we established a full
variance decomposition into genetic and environmental effects on each trait level as well as
on A- and B-bias method factors and correlations between them.
The focus of our analyses is on the hierarchical structure of personality factors. We did
not take into account that correlations among the NEO-PI-R domain scales may be due to
an unbalanced representation of facet scales in personality inventories that measure
blends of orthogonal factors (Ashton et al., 2009). However, applying a model formally
analogous to the blended variables model suggested by Ashton et al., we were able to test
whether the higher order structure resulted from isolated correlations among domain
scores that were assigned to different second order factors (e.g. between Neuroticism and
Extraversion).
METHOD
For the present analyses we used the same data as in Kandler et al. (in press). Thus, we
summarize the data collection just briefly here.
Participants
We combined data from the third wave of the Bielefeld Longitudinal Study of Adult Twins
(BiLSAT; Spinath, Angleitner, Borkenau, Riemann, & Wolf, 2002) and the sample of the
Jena Twin Study of Social Attitudes (JeTSSA; Stößel et al., 2006). In contrast to a previous
analysis, using a combined sample of BiLSAT and JeTSSA (McCrae et al., 2008), we
included data from unmatched twin pairs (UM), when data were available for one twin
sibling without zygosity diagnosis (N = 81). The resulting sample consisted of 1615
individuals from 919 twin pairs (433 MZ, 263 DZ and 223 UM), who were between 17 and
82 years old (M = 36.3, SD = 13.1); 1243 (77%) were females. Twins were instructed to
ask acquaintances, who knew them but not their twin sibling very well, to provide the peer
judgments. Thus, there were different peers for each twin sibling. Most peers were friends
and spouses. For 92.6% of the twins at least one peer report was available and for 86.1%
two peer reports were available.
Measures
Twins completed the self-report and peers the peer report version of the German NEO
Personality Inventory Revised (NEO-PI-R; Ostendorf & Angleitner, 2004), which measures
the personality domains Neuroticism, Extraversion, Openness, Agreeableness and
Conscientiousness (Cronbach’s α ranges between .87 for Agreeableness and .92 for
Neuroticism, Ostendorf and Angleitner, German normative sample). Peer reports were
averaged for each target.
Since age and sex effects can bias twin correlations, self- and averaged peer reports of
the domain scores were adjusted for sex and linear as well as quadratic age effects using a
regression procedure, and the regression residuals were used in subsequent analyses. We
estimated variance–covariance matrices for each group (MZ, DZ and UM) using an
Expectation Maximization (EM) algorithm (Little & Rubin, 2002) for handling missing
values.
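The adjustment step can be illustrated with an ordinary least squares sketch in plain Python. This is not the procedure as implemented for this study, whose software is not described for this step; the function names are hypothetical, sex is assumed to be coded 0/1, and age is centred before squaring purely for numerical stability (which leaves the residuals unchanged).

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def residualize(scores, sex, age):
    """Return residuals of scores after regressing on sex, age and age squared."""
    mean_age = sum(age) / len(age)
    # Design matrix columns: intercept, sex, centred age, centred age squared.
    design = [[1.0, float(s), a - mean_age, (a - mean_age) ** 2]
              for s, a in zip(sex, age)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, scores)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)  # normal equations: X'X beta = X'y
    return [y - sum(b * v for b, v in zip(beta, row))
            for row, y in zip(design, scores)]
```

The residuals are uncorrelated with sex and with linear and quadratic age by construction, so age and sex can no longer inflate twin similarity in the adjusted scores.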
Structural equation modelling
As outlined before, we decomposed the variance of the observed variables X (phenotype of
one twin sibling) and Y (phenotype of the other) into trait, method and random error
components. In this application, however, the basic MTMM-T model (Figure 1) was
extended with respect to both the method and the trait structure. Figure 2 shows the full
extended phenotypic model without overlaying the genetically informative structure. Two
correlated method factors were included for each mode of data collection (self- vs. peer-
report): A-bias and B-bias (McCrae et al., 2008). A-bias is the tendency to distort self-
descriptions in a way to enhance approval by others and is also called moralistic bias
(Paulhus & John, 1998). It is related to negative valence (Tellegen, Grove, & Waller, 1991)
and has loadings on Neuroticism, Agreeableness and Conscientiousness. B-bias – called
egoistic bias by Paulhus and John – designates people’s tendency to describe themselves in
glowing terms, is related to the need for power (Paulhus & John), and has loadings on
Extraversion and Openness.
Although these method factors are primarily discussed as distortions of self-descriptions
(Paulhus & John, 1998), informant-specific effects noted by Biesanz and West (2004)
suggest that they may operate in observers, too. Although there is some discussion about the
degree to which A-bias and B-bias might reflect true personality variance, in our model they are
entered as method (bias) factors. Self-report measures (manifest variables, rectangles in
Figure 2, with indices of SR) of reflected Neuroticism scores (Emotional Stability, ES),
Agreeableness (A) and Conscientiousness (C) load on self-report A-bias (A-Bias$_{SR}$); the
respective peer report measures (manifest variables, rectangles in Figure 2, with indices of
PR) load on peer report A-bias (A-Bias$_{PR}$). The measures of Extraversion (E) and
Openness (O) have loadings on self-report and peer report B-bias (B-Bias$_{SR}$ and B-Bias$_{PR}$).
A- and B-biases are allowed to correlate within methods ($\sigma_{SR}$ and $\sigma_{PR}$) but not between
methods. Thus, these method factors affect correlations among self-report or peer report
measures, but not between self-report and peer report measures. To the degree that they
have genetic or shared environmental sources, they contribute to the mono-method (self- or
peer) correlation between twins but not to hetero-method correlations.
Eight hierarchically organized trait factors were considered in our model (see the right
side of Figure 2). Five trait factors are at the lowest level of the trait hierarchy representing
the domains of the NEO-PI-R (ES, E, O, A and C). Self- and peer report measures have
loadings on the corresponding trait factor. At the next level there are two trait factors:
Stability and Plasticity (DeYoung, 2006). The E and O trait factors have loadings on
Plasticity, while ES, A and C trait factors load on Stability. At the highest level of the trait
hierarchy we included a GFP (Musek, 2007), which results from a correlation between
Stability and Plasticity.
Previous research on the higher-order structure of personality has not considered small
but significant correlations between Emotional Stability (Neuroticism) and Extraversion
(e.g. DeYoung, Peterson, & Higgins, 2002; Digman, 1997). However, the correlation has
often been found across different measures of the Big Five, different methods and several
languages (Bäckström et al., 2009; Costa & McCrae, 1992; Graziano & Ward, 1992; John,
Goldberg, & Angleitner, 1984; Ostendorf & Angleitner, 2004; Yik & Bond, 1993). Since this
correlation is not considered by the two-factor structure of Plasticity and Stability, it may
artificially lead to a significant GFP solution, although all other correlations which are
assumed to be zero in the two-factor solution (between E and A, E and C, O and ES, O and
A, O and C) may in fact be zero. We took the correlation between E and ES explicitly into
account by allowing for secondary loadings (formally similar to blended facet scales in
Ashton et al., 2009) from measures of E (SR$_E$ and PR$_E$) on the ES-factor (dashed arrows in
Figure 2). We denote these models as blended factor models.
Figure 2. The full phenotypic Big 5 + 2 + 1* model. SR, self-report; PR, peer report; E, Extraversion; O, Openness; ES, Emotional Stability; A, Agreeableness; C, Conscientiousness; GFP, general factor of personality; empty arrows, random measurement error; $\sigma_{SR}$, covariance between self-report-specific A- and B-bias; $\sigma_{PR}$, covariance between peer-report-specific A- and B-bias; dashed arrows, blended variable loadings from measures of E on the ES-factor; further description in the text.
The model presented in Figure 2 reflects the full phenotypic model which was named
Big 5 + 2 + 1* model (the asterisk stands for blended factor models allowing for secondary
loadings from measures of E on the ES-factor). Reduced model modifications are nested in
the full model: a model allowing only for the two-factor solution (Big 5 + 2*), and a model
without any higher-order structure (Big 5*). A model allowing only for the GFP (Big
5 + 1*) with direct paths to the Big Five was also tested. These four models were also
analysed without the blended factor loadings. Thus, eight models were compared whereby
the random error and systematic method components were included in all models. The
models were fitted to the EM variance–covariance matrices via maximum likelihood using
the statistical software package AMOS 17.0 (Arbuckle, 2007). Nested models were
compared by using the $\chi^2$-difference test. Non-nested models were descriptively compared
by the comparative fit index (CFI). The overall model fit was evaluated by the root-mean-
square error of approximation (RMSEA) in conjunction with its 90% confidence interval.
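The fit criteria named above can be sketched as follows. These are the common textbook single-group formulas and are given under that assumption only; multi-group software may scale the RMSEA differently, and neither the p-value of the difference test nor the 90% confidence interval is computed here. The function names are hypothetical.

```python
import math


def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Return (delta chi-square, delta df) for a restricted model nested in a fuller one.

    The difference is referred to a chi-square distribution with delta df.
    """
    return chi2_restricted - chi2_full, df_restricted - df_full


def rmsea(chi2, df, n):
    """Point estimate of the RMSEA for a single group of size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to a baseline (independence) model."""
    model_excess = max(chi2_model - df_model, 0.0)
    baseline_excess = max(chi2_baseline - df_baseline, model_excess)
    if baseline_excess == 0:
        return 1.0
    return 1.0 - model_excess / baseline_excess
```

A restricted model is retained when the chi-square difference is not significant for the given difference in degrees of freedom; the CFI is used descriptively when models are not nested.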
Not shown in Figure 2, each trait and each method factor was decomposed into an
additive genetic component ($a^2$), a genetic dominance component ($d^2$) and a nonshared
environmental component ($e^2$). Consequently, each model modification concerning one
phenotypic factor involves three degrees of freedom. Since previous research on genetic
and environmental effects on NEO-PI-R domain scores (see Bouchard & Loehlin, 2001,
for a review), and – more importantly – an inspection of the twin correlations do not
indicate systematic effects of the environment shared by siblings, we did not consider
shared environmental effects in our models. To the degree that assortative mating of twins’
parents, gene–environment interaction and gene–environment correlation affect the
phenotypes observed in our study, parameter estimates will be distorted (see Neale &
Maes, 2004, for more details).
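The decomposition described above can be illustrated with a minimal sketch. It assumes the standard expectations of the classical twin design (MZ twins share all additive and dominance effects, DZ twins share half of the additive and a quarter of the dominance effects, and there is no shared environment, as in our models). The function name and the example values are hypothetical and are not estimates from this study.

```python
def implied_twin_correlations(a2, d2, e2):
    """Return (r_MZ, r_DZ) implied by standardized ADE variance components.

    a2: additive genetic variance, d2: dominance variance,
    e2: nonshared environmental variance; the three must sum to 1.
    """
    if abs(a2 + d2 + e2 - 1.0) > 1e-9:
        raise ValueError("standardized components must sum to 1")
    r_mz = a2 + d2                # MZ twins share all genetic effects
    r_dz = 0.5 * a2 + 0.25 * d2   # DZ twins share 1/2 of A and 1/4 of D
    return r_mz, r_dz


# Hypothetical components, chosen only for illustration
r_mz, r_dz = implied_twin_correlations(a2=0.40, d2=0.10, e2=0.50)
print(f"implied r_MZ = {r_mz:.3f}, implied r_DZ = {r_dz:.3f}")
```

Under these assumptions, an MZ correlation of more than twice the DZ correlation points to dominance effects, which is why a² and d² can be separated in twin data.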
RESULTS
In order to provide an impression of trait and method effects, the classic MTMM
correlation matrix is presented in the Appendix. Since this matrix is quite complex, we will
only sketch the main trends here and then focus on SEM results. Since we want to explore
the higher-order structure of the NEO-PI-R domain scores, the inspection of the Appendix
focuses on hetero-trait correlations. Two points are quite obvious: First, as expected,
phenotypic mono-method hetero-trait correlations within twin siblings are substantially
higher than the corresponding hetero-method correlations, indicating substantial method
effects. Averaging the hetero-trait correlations within each block (mono- versus
hetero-method, same twin sibling versus cross-twin) shows that hetero-method correlations calculated
within twin siblings are of about the same size as mono-method correlations across twin
siblings. Second, there is hardly any evidence for a substantial heritable GFP. A heritable
GFP that generalizes across methods of measurement would result in substantial
correlations of phenotypic trait scores between self- and peer reports across monozygotic
twin siblings (Table 1). However, only four of the 10 averaged hetero-trait correlations among the
NEO-PI-R domain scores are greater than or equal to .10, while the mono-trait correlations
are larger than .25. This pattern severely questions the validity of a GFP.
SEM fit statistics are presented in Table 2. All phenotypic models showed an acceptable
to good overall model fit (upper limits of the RMSEA confidence intervals < 0.08; Browne & Cudeck, 1993).
Starting from the simplest model (Big 5), a model allowing for a GFP (Big 5+1) and a
model allowing for Plasticity and Stability (Big 5+2) on the second order each fitted the data
significantly better than the reduced model (Δχ² = 61.85, Δd.f. = 3, p < .05 and
Δχ² = 103.20, Δd.f. = 6, p < .05, respectively). A model allowing for both a GFP and the Big Two
(Big 5+2+1) fitted the data significantly better than the Big 5+2 model (Δχ² = 14.85,
Δd.f. = 3, p < .05).
All these models were compared with their corresponding blended factor models
(marked by an asterisk in Table 2). Consistently, the blended factor models fitted the data
significantly better (see Ashton et al., 2009, for similar results referring to the facet level).
Within the blended factor models, the model comparisons showed a similar pattern as within
the models without secondary loadings except for one comparison: The full model (Big
5+2+1*) did not fit the data significantly better than the Big 5+2* model (Δχ² = 0.03,
Δd.f. = 3, p > .05). The Big 5+2* model also achieved the best overall model fit (the
smallest RMSEA and the highest CFI). As supposed, the evidence for a GFP rested
exclusively upon a significant correlation between ES and E when method effects were
controlled for.
As factor loadings are useful to illustrate the correlation of lower-order variables with
higher-order variables, standardized factor loadings from five of the eight model variants
are presented in Table 3. A factor loading of .50 implies that 25% of variance in the
lower-order variable is attributable to the higher-order factor. Not surprisingly, factor loadings
and the implied convergent valid trait variance were larger for the mean peer reports
than for self-reports, since averages over two judges minimize individual biases and
increase reliability and, as a consequence, also convergent validity (Campbell & Fiske,
1959; Hofstee, 1994).
Table 1. Averaged cross-correlations among observed NEO-PI-R domain scores for MZ twins
                     Emotional
                     stability  Extraversion  Openness  Agreeableness  Conscientiousness
Emotional stability  0.26
Extraversion         0.11       0.43
Openness             0.03       0.15          0.45
Agreeableness        0.01       0.04          0.10      0.31
Conscientiousness    0.12       0.05          0.02      0.02           0.39
Note: Correlations are averaged hetero-method (self- versus peer report) coefficients across twin
siblings (twin A versus B). Correlations were averaged using Fisher’s r-to-z transformation. The
complete correlation matrix is given in the Appendix.
Table 2. Model fit statistics
Model        χ²       d.f.  CFI    RMSEA  90% CI
Big 5        1475.23  430   0.831  0.052  0.049–0.054
Big 5+1      1413.38  427   0.840  0.050  0.047–0.053
Big 5+2      1372.03  424   0.847  0.049  0.046–0.052
Big 5+2+1    1357.18  421   0.849  0.049  0.046–0.052
Big 5*       1302.89  429   0.859  0.047  0.044–0.050
Big 5+1*     1255.53  426   0.866  0.046  0.043–0.049
Big 5+2*     1153.26  423   0.882  0.043  0.040–0.046
Big 5+2+1*   1153.23  420   0.881  0.044  0.041–0.047
* Blended factor models allowing for secondary loadings from SR_E and PR_E on the latent ES-factor.
The best fitting model is Big 5+2*. Each latent variable is decomposed into two genetic and one
environmental variance components and thus the exclusion of one latent variable comprises three
degrees of freedom (d.f.).
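The note to Table 1 states that correlations were averaged using Fisher's r-to-z transformation. A minimal sketch of that averaging step is given here; the function name and the input values are hypothetical and serve only to show the mechanics (transform each r to z, average the z values, transform back).

```python
import math


def fisher_average(correlations):
    """Average correlations via Fisher's r-to-z transformation.

    Each r is mapped to z = atanh(r), the z values are averaged with
    equal weights, and the mean z is mapped back with tanh.
    """
    if not correlations:
        raise ValueError("need at least one correlation")
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical coefficients, not taken from the Appendix
example = [0.10, 0.30, 0.50]
print(f"Fisher-averaged r = {fisher_average(example):.3f}")
print(f"arithmetic mean   = {sum(example) / len(example):.3f}")
```

The equal weighting is an assumption of this sketch; with unequal sample sizes the z values are commonly weighted by n - 3.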
The latent Big Five factors accounted for 41% (self-reports on Emotional Stability,
SR_ES) to 67% (peer reports on Extraversion, PR_E) of variance in the manifest variables
(based on the Big 5 model, Table 3). Based on the best fitting model (Big 5+2*), 8% in ES,
13% in A and 10% in C was explained by Stability and 36% in E and 39% in O by Plasticity.
Consequently, the Stability factor accounted for 3% (SR_ES) to 7% (PR_A) and Plasticity
accounted for 17% (SR_E) to 29% (PR_O) of the total variance in the observed variables.
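The percentages above follow from squaring standardized loadings and, for indirect effects, from multiplying the loadings along the path before squaring. The following sketch retraces some of these figures from the loadings of the Big 5+2* model in Table 3. The helper name is hypothetical, and the product rule assumes a standardized hierarchical model in which the higher-order factor affects the manifest variable only through the first-order factor.

```python
def pct_variance(*loadings):
    """Percent of variance transmitted along a chain of standardized loadings."""
    product = 1.0
    for loading in loadings:
        product *= loading
    return 100.0 * product ** 2


# A loading of .50 corresponds to 25% explained variance
print(round(pct_variance(0.50)))

# Loadings taken from Table 3, model Big 5+2*
print(round(pct_variance(0.28)))        # ES on Stability
print(round(pct_variance(0.66, 0.28)))  # SR_ES via ES on Stability
print(round(pct_variance(0.72, 0.36)))  # PR_A via A on Stability
print(round(pct_variance(0.68, 0.60)))  # SR_E via E on Plasticity
print(round(pct_variance(0.83, 0.65)))  # PR_O via O on Plasticity
```

Because the tabled loadings are rounded to two decimals, results recomputed in this way can differ slightly from percentages based on unrounded estimates.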
For the sake of completeness, we additionally considered the case of the GFP based on
the best fitting model of unblended variable models (Big 5+2+1). The GFP accounted
for 71% of variance in Stability and 23% of variance in Plasticity. However, the GFP
explained only 6% (E) to 11% (A) of variance in the Big Five. Consequently, just about 2%
(SR_ES) to 6% (PR_A) of variance in the manifest variables was accounted for by the GFP.
That is, although the Big 5+2+1 model fitted the data significantly better than the Big
5+2 model, the GFP accounted for a marginal proportion of variance in measured
personality domains (unless the specific correlation between E and ES is taken into
account).
Table 3. Standardized factor loadings
                           Selected models
Factor loadings            Big 5  Big 5+2  Big 5+2+1  Big 5+2*  Big 5+2+1*
Convergent valid traits
SR_E on E                  0.72   0.74     0.74       0.68      0.68
PR_E on E                  0.82   0.85     0.85       0.76      0.76
SR_O on O                  0.64   0.67     0.68       0.67      0.67
PR_O on O                  0.78   0.82     0.83       0.83      0.83
SR_ES on ES                0.64   0.65     0.65       0.66      0.66
PR_ES on ES                0.75   0.74     0.75       0.75      0.75
SR_A on A                  0.67   0.69     0.69       0.69      0.69
PR_A on A                  0.71   0.73     0.73       0.72      0.72
SR_C on C                  0.70   0.71     0.71       0.71      0.71
PR_C on C                  0.72   0.73     0.73       0.73      0.73
E on Plasticity            –      0.51     0.53       0.60      0.60
O on Plasticity            –      0.60     0.60       0.65      0.65
ES on Stability            –      0.28     0.31       0.28      0.28
A on Stability             –      0.35     0.39       0.36      0.36
C on Stability             –      0.31     0.34       0.31      0.31
Stability on GFP           –      –        0.84       –         0.19
Plasticity on GFP          –      –        0.48       –         0.09
SR_E on ES                 –      –        –          0.34      0.34
PR_E on ES                 –      –        –          0.38      0.38
Self-report specificity
SR_E on B-Bias_SR          0.50   0.47     0.47       0.46      0.46
SR_O on B-Bias_SR          0.52   0.50     0.50       0.50      0.50
SR_ES on A-Bias_SR         0.32   0.31     0.31       0.29      0.29
SR_A on A-Bias_SR          0.44   0.42     0.41       0.40      0.40
SR_C on A-Bias_SR          0.39   0.37     0.37       0.36      0.36
Peer report specificity
PR_E on B-Bias_PR          0.57   0.55     0.55       0.56      0.56
PR_O on B-Bias_PR          0.43   0.34     0.33       0.33      0.33
PR_ES on A-Bias_PR         0.48   0.38     0.37       0.38      0.38
PR_A on A-Bias_PR          0.28   0.24     0.23       0.26      0.26
PR_C on A-Bias_PR          0.51   0.49     0.48       0.49      0.49
Note: SR, self-report; PR, averaged peer report; –, loading not part of the model.
* Blended factor models allowing for secondary loadings from SR_E and PR_E on the latent ES-factor.
The best fitting model is Big 5+2*.
Factor loadings from self- and mean peer report measures of the Big Five on self- and
peer-report-specific A-Bias and B-Bias factors are not easily convertible into variance
components of manifest variables, because A-Bias and B-Bias are allowed to correlate
across the models. Based on the best fitting model (Big 5+2*), the correlation between
A-Bias and B-Bias was r_SR = .51 for self-reports and r_PR = .35 for peer reports, indicating the
presence of self- and peer-report-specific general method components which might have
yielded a markedly significant GFP within mono-method studies (Musek, 2007; Rushton &
Irwing, 2008).
In addition to the phenotypic model comparisons, we compared different genetically
informative models based on the best fitting phenotypic model (Big 5+2*). First, we tested
for the significance of nonadditive genetic effects (d² = 0), and second, for the significance
of additive genetic effects (a² = 0). The restricted model in which all genetic effects were
assumed to be additive did not fit the data significantly worse than the full twin model
(Δχ² = 10.19, Δd.f. = 13, p > .05, CFI = 0.882, RMSEA = 0.043). However, the reduced
model in which all genetic effects were assumed to be due to dominance deviations fitted
the data significantly worse than the full twin model (Δχ² = 29.29, Δd.f. = 13, p < .05,
CFI = 0.879, RMSEA = 0.043). Thus, genetic effects on different levels of personality
were assumed to be additive.
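The χ²-difference tests reported in this section can be re-checked from the Δχ² and Δd.f. values alone. The sketch below is not the AMOS procedure; it merely evaluates the upper-tail probability of the χ² distribution for integer degrees of freedom with a standard recurrence for the regularized incomplete gamma function. The function name and the dictionary labels are our own choices for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # closed form for df = 1
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


# (delta chi-square, delta d.f.) as reported in the text
comparisons = {
    "Big 5 vs Big 5+1": (61.85, 3),
    "Big 5 vs Big 5+2": (103.20, 6),
    "Big 5+2 vs Big 5+2+1": (14.85, 3),
    "Big 5+2* vs Big 5+2+1*": (0.03, 3),
    "full twin model vs additive only": (10.19, 13),
    "full twin model vs dominance only": (29.29, 13),
}
p_values = {name: chi2_sf(dchi, ddf) for name, (dchi, ddf) in comparisons.items()}
for name, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.4f} ({verdict} at .05)")
```

With these inputs, the additive-only restriction yields a non-significant difference and the dominance-only restriction a significant one, in line with the decision to retain additive genetic effects.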
Additive genetic and nonshared environmental variance components for each latent
variable based on the best fitting model are presented in Table 4. Common variance of self-
and averaged peer reports, which was assumed to have a genetic basis (McCrae et al.,
2000), showed a clear genetic influence (on average 81%). Genetic effects on the convergent
valid second-order traits Plasticity and Stability were smaller, amounting to about half of the
variance. This indicates that common variance in O and E as well as in ES, A and C appears
to be genetically as well as environmentally influenced.
Table 4. Genetic (a²) and environmental (e²) effects on trait and method factors
                                 Variance components in %
Latent variable and covariances  a²   e²
Extraversion                     86   14
Openness                         92   8
Emotional Stability              59   41
Agreeableness                    85   15
Conscientiousness                81   19
Plasticity                       50   50
Stability                        40   60
A-Bias_SR                        83   17
B-Bias_SR                        64   36
A-Bias_PR                        32   68
B-Bias_PR                        0    100
s_SR                             53   47
s_PR                             0    100
Note: Parameter estimates derived from the best fitting model.
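The average genetic influence on the first-order trait factors can be retraced from the a² and e² percentages in Table 4. The short sketch below is only an arithmetic check; the variable names are our own.

```python
# a² and e² (in %) for the first-order trait factors, as listed in Table 4
a2 = {
    "Extraversion": 86,
    "Openness": 92,
    "Emotional Stability": 59,
    "Agreeableness": 85,
    "Conscientiousness": 81,
}
e2 = {
    "Extraversion": 14,
    "Openness": 8,
    "Emotional Stability": 41,
    "Agreeableness": 15,
    "Conscientiousness": 19,
}

# In a model without dominance and shared environment, a² + e² must equal 100%
components_sum_to_100 = all(a2[trait] + e2[trait] == 100 for trait in a2)

mean_a2 = sum(a2.values()) / len(a2)
print(f"components sum to 100%: {components_sum_to_100}")
print(f"mean a² across the Big Five: {mean_a2:.1f}%")
```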
Self-report-specific method factors showed a clear genetic basis, whereas of the peer
report method factors only the A-Bias was affected by genetic variance. The correlation
between A-Bias_SR and B-Bias_SR was attributable to genetic as well as environmental
effects, indicating a self-report-specific general factor, which has almost equal genetic and
environmental sources. The correlation between A-Bias_PR and B-Bias_PR was exclusively
due to environmental effects, indicating a peer-report-specific environmentally influenced
general factor.
DISCUSSION
The application of the MTMM-T design to self- and peer report data on the NEO-PI-R
supports four conclusions: (1) When method effects are controlled, there is no support for a
GFP, and the second-order factor Stability turns out to be a weak trait factor. (2) The
correlations between the two self-report-specific method factors and between the two
peer-report-specific method factors indicate method-specific general factors that are not validated
across methods. (3) The variance shared by self- and peer reports on the NEO-PI-R
domains is largely genetically determined (h² > 0.80) for all domains but Emotional
Stability. (4) Genetic effects on method factors indicate that especially self-reports but also
peer reports capture variance that shows convergent validity between twins but not between
methods. Each of these conclusions will be briefly discussed in the remainder of this section.
Campbell and Fiske (1959) have argued that mono-method studies are severely limited
with respect to the scientific study of personality. Our analyses strongly support this view.
While there is support for a common source of variance both in self- and in peer report data,
the conjecture of a meaningful GFP is not supported in the analysis of variance shared by
self- and peer reports and controlled for method-specific effects. Although, among the
models without blended factors, the inclusion of a GFP in the higher-order
structure resulted in a significant improvement of model fit, the GFP explains only
marginal proportions of the observed variance. In addition, in this model, the putative GFP
basically reflects the comparatively high correlation between Emotional Stability and
Extraversion (for a similar conclusion see Musek, 2007). This correlation is comparatively
high even across methods and between twin siblings (e.g. Twin A’s self-report on
Emotional Stability correlated with Twin B’s peer-report on Extraversion). Thus, a blended
factor model that takes this correlation into account at the first order factor (domain score)
level fits the data substantially better and supports this conclusion convincingly. We used
the blended factor models here only as a convenient way to test whether the correlation
between Plasticity and Stability is mainly due to the correlation between ES and E. These
models were inspired by Ashton et al.’s (2009) blended variables model.
While there is a notable correlation between Extraversion and Openness that gives rise to
the higher-order factor Plasticity, the Stability trait factor explains only a marginal
proportion of the variance in phenotypic measures of Emotional Stability, Agreeableness and
Conscientiousness, which make up this factor. A similar pattern of loadings on Stability and
Plasticity was found in a comparable analysis of American data (McCrae et al., 2008; study
2). Stability explained more variance in the American adult sample than in our analyses;
however, it was quite weak in a sample of American adolescents. Similarly, Anusic,
Schimmack, Pinkus and Lockwood (2009) were not able to identify Stability as a second
order factor in some of their analyses of Big Five measures. Thus, while our results are in
line with a hypothesized Plasticity factor, our German MTMM data question the usefulness
of the Stability factor. Since the Plasticity factor is based on a single correlation,
independent evidence is needed to clarify whether Plasticity is indeed a meaningful higher
order factor and not a misrepresentation of causal effects between E and O.
In addition, it should be kept in mind that we did not test blended variables models since
this would have resulted in overly complicated SEMs. Thus, the alternative explanation (Ashton
et al., 2009) that correlations among the NEO-PI-R domain scores are due to facets that
represent same-signed blends of orthogonal factors has not been addressed here. Ashton et al.
argue that certain combinations of orthogonal factors are socially more important for
judging persons than other combinations.
The two correlated method factors A-bias (loading on ES, C and A) and B-bias (loading
on E and O) explain more phenotypic variance on average than the second order trait
factors. The loadings of self- and peer reports on the method factors as well as the
correlation between method factors are in line with Anusic et al.’s (2009) finding that halo
error may be responsible for mono-method correlations among Big Five traits.
Furthermore, this pattern documents the importance of multiple sources of data for studies of
the structure of personality constructs. The method factor structure implies that both
mono-method self-report studies (e.g. Rushton & Irwing, 2009) and mono-method other-report
studies (e.g. Rushton, Bons, & Hur, 2008, study 3) will yield robust higher-order factors,
because these confound method and trait effects.
The behavioural genetic analysis of trait factors yields very high heritability estimates
for all first order trait factors except Emotional Stability. This replicates similar results
reported by Kandler et al. (in press) for NEO-PI-R facets and Riemann et al. (1997) for the
NEO-FFI (Costa & McCrae, 1989; Borkenau & Ostendorf, 1993). Obviously, those aspects
of personality that contribute to the conjoint basis of self- and peer reports have a strong
genetic basis. Genetic effects on the second order factors Stability and Plasticity are
substantially smaller. This indicates that the relative size of genetic effects does not simply
result from the aggregation of measures but rather depends on the sources of covariance
among personality constructs. The environment not shared by twin siblings seems to
contribute substantially to the covariation between Extraversion and Openness as well as
among the traits loading on Stability.
Most important in the present context is the behavioural genetic analysis of method
factors. The self-report method factors A-bias and B-bias show a very high heritability,
indicating that self-reports capture specific variance that is validated by the convergence
across twin siblings. Thus we can conclude that self-reports measure something that is not
assessed by peer reports. Results for the peer report method factors differ markedly. Peer
report A-bias is only moderately influenced by genetic variance and peer report B-bias
shows no genetic effects at all. With the exception of the moderate systematic variance
observed for A-bias, we may conclude that there is an asymmetrical relation between both
methods. Almost all systematic variance captured by peer reports is shared with self-
reports but not vice versa. These findings extend and support the results of Kandler et al.
(in press).
The difference in the sources of variance in self- and peer report data is in line with a
response bias explanation. Since in our study two independent peers rated each twin
sibling, rater bias may only correlate between twins, if it is triggered by a common feature
of the twins and shared by all raters (e.g. shared stereotypes). On the other hand, genetic
influences on all kinds of self-report response biases contribute to twin correlations. One
way to test this explanation is to analyse control scales that are included in numerous
questionnaires (e.g. Lie scale) or other traits (e.g. Narcissism) that may distort ratings on
traits actually measured. Loehlin and Martin (2001) found genetic effects on the Lie scale
of the EPQ (Eysenck, Eysenck, & Barrett, 1985). Vernon, Villani, Vickers and Harris
(2008) report substantial genetic influences on self-reports of Narcissism (about 0.60).
However, a test of the bias explanation requires not only estimating the
heritability of control scales in self-reports, but also analysing the pattern of
correlations between control scales and trait scores in self- and peer reports or,
alternatively, statistically controlling trait scores for control scale variance. For example, in
their analysis of self- and peer reports of the EPQ-RS (Eysenck & Eysenck, 1991),
Angleitner, Riemann and Strelau (1997) included the EPQ-RS Lie scale. The Lie scale
showed a heritability of 0.40 for self-reports. For peer reports, effects of the shared
environment were stronger than genetic effects. Although agreement on the Lie scale
between two peers was rather low (0.41), self- and peer report scores correlated 0.47. Thus,
the EPQ-RS Lie scale reflects not only self-report-specific response bias but also variance
common to self- and peer reports. We are not aware of any data that provide a test of the
bias explanation.
Alternatively, the strong genetic influence on self-report method factors might also be
interpreted as supporting the view that self-report measures of personality traits provide an
extended perspective on personality. That is, self-reports might capture aspects of traits that
are not readily accessible to external observers. Commenting on the relation between life
record data (including peer reports) and self-report measures, Cattell (1950) concluded that
‘presumably these [self-report data] deal with responses too confined to the ideational and
introspective fields to have anything but oblique representation in the basic, universal,
behavioural manifestation of the personality sphere’ (p. 83). Allport (1937) already
emphasized that personality can be described appropriately only if a multitude of measures
is combined. He emphasized that ‘there are a great many legitimate methods of studying
personality, each with a proper place in the armamentarium of the psychologist’ (p. 369). If
we consider modern physiological, neuroscience, genetic or information processing
approaches to personality measurement, this view seems even more plausible. Personality
assessment, however, becomes more complex if different measures capture different
aspects of personality, since the variance common to different measures of personality may
not represent central or core features of individual differences, but just the overlap that two
or more variables happen to measure conjointly. As anticipated by Allport, personality
description requires a thorough understanding of which measure captures which aspect of
personality or, as McCrae (2009) put it, ‘Because traits are not easily manipulated, trait
psychology depends on the thoughtful interpretation of careful observations’ (p. 673). The
MTMM-T study design is an important tool since it allows conclusions about the valid
variance specific to measures, and thus provides new insights into the interplay of
substance and artefact in personality measurement.
REFERENCES
Allport, G. W. (1937). Personality: A psychological interpretation. New York: Holt.
Angleitner, A., Riemann, R., & Strelau, J. (1997). Genetic and environmental influences on the EPQ-
RS scales: A twin study using self- and peer reports. Poster presented at the 8th Meeting of the
International Society for the Study of Individual Differences, Arhus, Denmark, July 19–23.
Anusic, I., Schimmack, U., Pinkus, R. T., & Lockwood, P. (2009). The nature and structure of
correlations among Big Five ratings: The Halo-Alpha-Beta model. Journal of Personality and
Social Psychology, 97, 1142–1156.
Arbuckle, J. L. (2007). AMOS users’ guide 17.0. Chicago: SPSS.
Ashton, M. C., Lee, K., Goldberg, L. R., & de Vries, R. E. (2009). Higher order factors of personality:
Do they exist? Personality and Social Psychology Review, 13, 79–91.
Bäckström, M., Björklund, F., & Larsson, M. R. (2009). Five-factor inventories have a major general
factor related to social desirability which can be reduced by framing items neutrally. Journal of
Research in Personality, 43, 335–344.
Bartels, M., Boomsma, D. I., Hudziak, J. J., van Beijsterveldt, T. C. E. M., & van den Oord, E. J. C. G.
(2007). Twins and the study of rater (dis)agreement. Psychological Methods, 12, 451–466.
Biesanz, J. C., & West, S. G. (2004). Toward understanding assessments of the Big Five: Multitrait-
multimethod analyses of convergent and discriminant validity across measurement occasion and
type of observer. Journal of Personality, 72, 845–876.
Borkenau, P., & Ostendorf, F. (1993). NEO-Fünf-Faktoren-Inventar (NEO-FFI) [NEO five-factor
inventory]. Göttingen: Hogrefe.
Borkenau, P., Riemann, R., Spinath, F. M., & Angleitner, A. (2000). Behavior genetics of personality:
The case of observational studies. In S. Hampson (Ed.), Advances in personality psychology
(Volume 1, pp. 107–137). Hove, England: Psychology Press.
Bouchard, T. J. Jr., & Loehlin, J. (2001). Genes, evolution, and personality. Behavior Genetics, 31,
243–273.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen, & J. S.
Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park: Sage.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-
multimethod matrix. Psychological Bulletin, 56, 81–105.
Cattell, R. B. (1950). Personality: A systematic theoretical and factual study. New York: McGraw-
Hill.
Costa, P. T., & McCrae, R. R. (1989). NEO personality inventory: Manual form S and form R. Odessa,
FL: Psychological Assessment Resources.
Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO personality inventory (NEO-PI-R) and NEO
five-factor inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment
Resources.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral
measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin, 52, 281–302.
DeYoung, C. G. (2006). Higher-order factors of the Big Five in a multiinformant sample. Journal of
Personality and Social Psychology, 91, 1138–1151.
DeYoung, C. G., Peterson, J. B., & Higgins, D. M. (2002). Higher-order factors of the Big Five
predict conformity: Are there neuroses of health? Personality and Individual Differences, 33,
533–552.
Digman, J. M. (1997). Higher-order factors of the Big Five. Journal of Personality and Social
Psychology, 73, 1246–1256.
Egloff, B., Wilhelm, F. H., Neubauer, D. H., Mauss, I. B., & Gross, J. J. (2002). Implicit anxiety
measure predicts cardiovascular reactivity to an evaluated speaking task. Emotion, 2, 3–11.
Eid, M., Lischetzke, T., Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects from
trait-specific method effects in multitrait-multimethod models: A multiple-indicator CT-C(M-1)
model. Psychological Methods, 8, 38–60.
Eysenck, H. J., & Eysenck, S. B. G. (1991). Manual of the Eysenck personality scales. London:
Hodder & Stoughton.
Eysenck, S. B. G., Eysenck, H. J., & Barrett, P. (1985). A revised version of the Psychoticism scale.
Personality and Individual Differences, 6, 21–29.
Funder, D. C., Kolar, D. C., & Blackman, M. C. (1995). Agreement among judges of personality:
Interpersonal relations, similarity, and acquaintanceship. Journal of Personality and Social
Psychology, 69, 656–672.
Graziano, W. G., & Ward, D. (1992). Probing the Big Five in adolescence: Personality and adjustment
during the developmental transition. Journal of Personality, 60, 425–439.
Grumm, M., & von Collani, G. (2007). Measuring Big Five personality dimensions with the implicit
association test – Implicit personality traits or self-esteem? Personality and Individual Differences,
43, 2205–2217.
Hofstee, W. K. B. (1994). Who should own the definition of personality? European Journal of
Personality, 8, 149–162.
Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we do
about it? Psychological Methods, 5, 64–86.
John, O. P., Goldberg, L. R., & Angleitner, A. (1984). Better than the alphabet: Taxonomies of
personality-descriptive terms in English, Dutch, and German. In H. Bonarius, G. van Heck, & N.
Smid (Eds.), Personality psychology in Europe: Theoretical and empirical developments. Lisse:
Swets & Zeitlinger.
Kandler, C., Riemann, R., Spinath, F., & Angleitner, A. (in press). Sources of variance in personality
facets: A twin study of self-self, peer-peer, and self-peer (dis-)agreement. Journal of Personality.
Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. New York: Guilford.
Kraemer, H. C., Measelle, J. R., Ablow, J. C., Essex, M. J., Boyce, W. T., & Kupfer, D. J. (2003).
A new approach to integrating data from multiple informants in psychiatric assessment and
research: Mixing and matching contexts and perspectives. American Journal of Psychiatry, 160,
1566–1577.
Letzring, T. D., Wells, S. M., & Funder, D. C. (2006). Information quantity and quality affect the realistic
accuracy of personality judgment. Journal of Personality and Social Psychology, 91, 111–123.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (second edition). New
Jersey: Wiley.
Loehlin, J. C., & Martin, N. G. (2001). Age changes in personality traits and their heritabilities during
the adult years: Evidence from Australian twin registry samples. Personality and Individual
Differences, 30, 1147–1160.
McCrae, R. R. (2009). The physics and chemistry of personality. Theory & Psychology, 19, 670–687.
McCrae, R. R., Costa, P. T., Jr., Hřebíčková, M., Ostendorf, F., Angleitner, A., Avia, M. D., et al.
(2000). Nature over nurture: Temperament, personality, and life span development. Journal of
Personality and Social Psychology, 78, 173–186.
McCrae, R. R., Yamagata, S., Jang, K. L., Riemann, R., Ando, J., Ono, Y., et al. (2008). Substance and
artifact in the higher-order factors of the Big Five. Journal of Personality and Social Psychology,
95, 442–455.
Musek, J. (2007). A general factor of personality: Evidence for the Big One in the five-factor model.
Journal of Research in Personality, 41, 1213–1233.
Neale, M. C., & Maes, H. H. M. (2004). Methodology for genetic studies of twins and families.
Dordrecht: Kluwer Academic Publishers B.V.
Neubauer, A. C., Spinath, F. M., Riemann, R., Borkenau, P., & Angleitner, A. (2000). Genetic and
environmental influences on two measures of speed of information processing and their relation to
psychometric intelligence: Evidence from the German observational study of adult twins.
Intelligence, 28, 267–289.
Ostendorf, F., & Angleitner, A. (2004). NEO-Persönlichkeitsinventar, revidierte Form, NEO-PI-R
nach Costa und McCrae [Revised NEO personality inventory, NEO-PI-R of Costa and McCrae].
Göttingen, Germany: Hogrefe.
Paulhus, D. L., & John, O. P. (1998). Egoistic and moralistic biases in self-perception: The interplay
of self-deceptive styles with basic traits and motives. Journal of Personality, 66, 1025–1060.
Riemann, R., Angleitner, A., & Strelau, J. (1997). Genetic and environmental influences on
personality: A study of twins reared together using the self- and peer report NEO-FFI scales.
Journal of Personality, 65, 449–475.
Rushton, J. P., Bons, T. A., & Hur, Y.-M. (2008). The genetics and evolution of the general factor of
personality. Journal of Research in Personality, 42, 1173–1185.
Rushton, J. P., & Irwing, P. (2008). A general factor of personality (GFP) from two meta-analyses of
the Big Five: Digman (1997) and Mount, Barrick, Scullen, and Rounds (2005). Personality and
Individual Differences, 45, 679–683.
Rushton, J. P., & Irwing, P. (2009). A general factor of personality in the Comrey personality scales,
the Minnesota multiphasic personality inventory-2, and the multicultural personality
questionnaire. Personality and Individual Differences, 46, 437–442.
Schultheiss, O. C., & Brunstein, J. C. (2001). Assessment of implicit motives with a research version
of the TAT: Picture profiles, gender differences, and relations to other personality measures.
Journal of Personality Assessment, 77, 71–86.
Spinath, F. M., Angleitner, A., Borkenau, P., Riemann, R., & Wolf, H. (2002). German observational
study of adult twins (GOSAT): A multimodal investigation of personality, temperament and
cognitive ability. Twin Research and Human Genetics, 5, 372–375.
Stößel, K., Kämpfe, N., & Riemann, R. (2006). The Jena twin registry and the Jena twin study of
social attitudes (JeTSSA). Twin Research and Human Genetics, 9, 783–786.
Tellegen, A., Grove, W. M., & Waller, N. G. (1991). Inventory of personal characteristics #7 (IPC7).
Unpublished materials, University of Minnesota.
Vernon, P. A., Villani, V. C., Vickers, L. C., & Harris, J. A. (2008). A behavioural genetic
investigation of the dark triad and the big five. Personality and Individual Differences, 44,
445–452.
Yik, M. S. M., & Bond, M. H. (1993). Exploring the dimensions of Chinese person perception with
indigenous and imported constructs: Creating a culturally balanced scale. International Journal of
Psychology, 28, 75–95.
APPENDIX
Multitrait-multimethod-twin correlation matrix
                     Twin A                                                      Twin B
                     Self                          Peer                          Self                          Peer
Twin  Method  Trait  ES    E     O     A     C     ES    E     O     A     C     ES    E     O     A     C     ES    E     O     A     C
A     Self    ES     –     0.44  0.02  0.04  0.42  0.48  0.20  0.10  0.11  0.09  0.20  0.15  0.07  0.01  0.11  0.19  0.15  0.04  0.01  0.12
              E      0.39  –     0.26  0.01  0.30  0.21  0.60  0.09  0.17  0.06  0.07  0.32  0.06  0.07  0.10  0.09  0.31  0.04  0.10  0.09
              O      0.07  0.45  –     0.08  0.10  0.06  0.12  0.59  0.02  0.04  0.01  0.03  0.28  0.02  0.03  0.02  0.10  0.17  0.03  0.04
              A      0.17  0.03  0.03  –     0.04  0.06  0.02  0.15  0.48  0.00  0.01  0.07  0.01  0.27  0.01  0.04  0.01  0.01  0.20  0.01
              C      0.42  0.25  0.03  0.11  –     0.25  0.11  0.19  0.10  0.52  0.12  0.16  0.04  0.13  0.28  0.11  0.06  0.08  0.12  0.21
      Peer    ES     0.40  0.24  0.04  0.01  0.25  –     0.27  0.11  0.03  0.44  0.01  0.01  0.04  0.06  0.07  0.15  0.02  0.11  0.08  0.04
              E      0.24  0.64  0.21  0.05  0.13  0.30  –     0.22  0.11  0.06  0.08  0.21  0.06  0.01  0.03  0.07  0.30  0.07  0.02  0.03
              O      0.10  0.24  0.56  0.06  0.00  0.12  0.42  –     0.17  0.01  0.05  0.02  0.17  0.03  0.02  0.13  0.01  0.14  0.00  0.09
              A      0.03  0.05  0.01  0.45  0.01  0.16  0.00  0.18  –     0.05  0.03  0.03  0.01  0.27  0.08  0.00  0.08  0.07  0.30  0.02
              C      0.19  0.04  0.02  0.03  0.54  0.45  0.12  0.10  0.16  –     0.05  0.03  0.06  0.09  0.11  0.11  0.00  0.13  0.05  0.11
B     Self    ES     0.53  0.17  0.01  0.04  0.24  0.22  0.13  0.04  0.02  0.09  –     0.43  0.10  0.09  0.39  0.61  0.34  0.04  0.07  0.34
              E      0.21  0.52  0.19  0.11  0.16  0.06  0.43  0.12  0.01  0.01  0.32  –     0.40  0.18  0.29  0.26  0.67  0.23  0.01  0.13
              O      0.06  0.25  0.56  0.08  0.03  0.05  0.20  0.49  0.06  0.02  0.05  0.40  –     0.18  0.02  0.03  0.18  0.55  0.03  0.05
              A      0.15  0.09  0.03  0.53  0.01  0.01  0.09  0.07  0.27  0.09  0.18  0.15  0.16  –     0.09  0.01  0.06  0.15  0.60  0.05
              C      0.23  0.07  0.01  0.00  0.55  0.11  0.09  0.06  0.02  0.37  0.42  0.23  0.06  0.06  –     0.25  0.15  0.03  0.06  0.61
      Peer    ES     0.30  0.10  0.01  0.01  0.15  0.37  0.14  0.10  0.03  0.17  0.48  0.17  0.02  0.04  0.24  –     0.36  0.00  0.13  0.48
              E      0.15  0.42  0.10  0.09  0.05  0.11  0.51  0.13  0.03  0.02  0.19  0.64  0.23  0.14  0.06  0.30  –     0.34  0.04  0.22
              O      0.01  0.16  0.41  0.17  0.10  0.04  0.26  0.49  0.15  0.04  0.06  0.23  0.59  0.23  0.12  0.00  0.46  –     0.20  0.12
              A      0.02  0.02  0.08  0.35  0.09  0.10  0.00  0.12  0.46  0.06  0.02  0.00  0.12  0.46  0.06  0.23  0.10  0.29  –     0.22
              C      0.14  0.06  0.05  0.11  0.40  0.20  0.05  0.03  0.09  0.43  0.17  0.07  0.04  0.08  0.48  0.43  0.12  0.05  0.20  –
Note: Above the diagonal the MTMM-T correlation matrix of DZ twins is shown, below the diagonal of MZ twins. Correlations >.25 are shown in boldface.
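For readers who wish to reanalyse the matrix, the layout described in the note (DZ above the diagonal, MZ below) can be separated into two full symmetric matrices. The following Python sketch is illustrative only and is not part of the original analysis: the function name `split_combined_matrix` is hypothetical, setting the diagonal to 1.0 is an assumption (the printed matrix leaves it empty), and the 3 x 3 excerpt uses the self-report ES, E and O entries of twin A exactly as they are printed above.

```python
def split_combined_matrix(combined):
    """Split a square matrix that stores MZ correlations below the diagonal
    and DZ correlations above it into two symmetric matrices (mz, dz).

    Diagonal entries of the input are ignored; the outputs carry 1.0 on the
    diagonal (an assumption, since the printed matrix leaves it empty).
    """
    n = len(combined)
    if any(len(row) != n for row in combined):
        raise ValueError("combined matrix must be square")
    mz = [[1.0] * n for _ in range(n)]
    dz = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # Below the diagonal (row i > column j): MZ twins
            mz[i][j] = mz[j][i] = combined[i][j]
            # Above the diagonal (row j < column i): DZ twins
            dz[i][j] = dz[j][i] = combined[j][i]
    return mz, dz


# Illustrative excerpt: twin A, self-report ES, E and O as printed above.
combined = [
    [None, 0.44, 0.02],
    [0.39, None, 0.26],
    [0.07, 0.45, None],
]
mz, dz = split_combined_matrix(combined)
```

Applied to the full 20 x 20 matrix, the same function would yield the two group-specific matrices that a multi-group twin model takes as input.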