CHAPTER 10
Measures of Empathy: Self-Report, Behavioral,
and Neuroscientific Approaches
David L. Neumann¹, Raymond C.K. Chan², Gregory J. Boyle³, Yi Wang² and H. Rae Westbury¹
¹Griffith University, Gold Coast, Queensland, Australia; ²Chinese Academy of Sciences, Beijing, China; ³University of
Melbourne, Parkville, Victoria, Australia
The measurement of empathy presents a serious challenge for researchers in disciplines ranging from social
psychology and individual differences to clinical psychology. Part of this challenge stems from the lack of a clear,
universal definition for empathy. Titchener (1909) used the term to describe how people may objectively enter
into the experience of another to gain a deeper appreciation and understanding of their experiences. However,
contemporary definitions are much more complex and highlight a range of cognitive, affective, and physiological
mechanisms. For example, Batson (2009) noted eight conceptualizations: (a) knowing another’s emotional and
cognitive state; (b) matching the posture or neural response of another; (c) feeling the same as another;
(d) projecting oneself into another’s situation; (e) imagining how another is feeling and thinking; (f) imagining how one
would think and feel in another’s situation; (g) feeling distress for the suffering of another; and (h) feeling for
another person who is suffering. Furthermore, empathy overlaps with related, although distinct, constructs such
as compassion and sympathy (Decety & Lamm, 2009; Hoffman, 2007; Preston & de Waal, 2002).
A review of the major definitions of empathy over the past 20 years reveals that there is no single definition
that is consistently cited; indeed, the multitude of definitions is often cited as a distinct feature of the field (e.g.,
Batson, 2009; Gerdes, Segal, & Lietz, 2010). Despite this disparity, some commonality can be seen across definitions,
and comprehensive theoretical conceptualizations have been provided (e.g., Preston & de Waal, 2002). At a
broad level, empathy involves an inductive affective (feeling) and cognitive evaluative (knowing) process that
allows the individual to vicariously experience the feelings and understand the given situation of another
(Hoffman, 2007). Its presence or absence is related to autonomic nervous system activity (Bradley, Codispoti,
Cuthbert, & Lang, 2001; Levenson & Ruef, 1992) and overt behaviors that are augmented by affective intensity
and cognitive accuracy (Ickes, Stinson, Bissonette, & Garcia, 1990; Plutchik, 1990). Further, empathy is a fundamental
emotional and motivational component that facilitates sympathy and prosocial behavior (responding
compassionately) (Thompson & Gullone, 2003).
Researchers have used various approaches to measure empathy with instruments dating back to the 1940s
(e.g., Dymond, 1949). Largely as a consequence of the cognitively oriented psychological zeitgeist of the
mid-20th century, empathy measurement was heavily influenced by cognitive approaches, although there were some
notable emotion-based measures (e.g., the Emotional Empathic Tendency Scale; Mehrabian & Epstein, 1972).
Prominent examples of such measures from the mid-20th century include the Diplomacy Test of Empathic
Ability (Kerr, 1960) and Hogan’s (1969) Empathy Scale. In the 1980s to 1990s, social and developmental psychologists
emphasized the multiplicity of empathy in terms of physiologically linked affective states (Batson, 1987),
cognitive processing, or a self-awareness of these feelings (Batson et al., 1997), and emotion regulation (Eisenberg
et al., 1994; Gross, 1998). Furthermore, throughout this period physiological measurements, such as skin conductance
and heart rate (e.g., Levenson & Ruef, 1992), were increasingly being used. From the 1990s through to the
present day, empathy measurement has been influenced by the development of social-cognitive neuroscience,
although self-report approaches have continued to be developed and extensively used.
Measures of Personality and Social Psychological Constructs.
DOI: http://dx.doi.org/10.1016/B978-0-12-386915-9.00010-3 © 2015 Elsevier Inc. All rights reserved.
Reviews of empathy measures have been provided in the past (e.g., Chlopan, McCain, Carbonell, & Hagen,
1985; Eisenberg & Fabes, 1990; Wispe, 1986). The present aim is to provide brief, succinct psychometric reviews
of contemporary empathy measures, and also to expand upon recent reviews on empathy measures constructed
for specific research audiences, such as the measurement of empathy in social work (Gerdes et al., 2010) and
medicine (Hemmerdinger, Stoddart, & Lilford, 2007; Pedersen, 2009). Hopefully, the present chapter will enable
researchers interested in measuring empathy to gain an appreciation of what approaches are available and an
understanding of the benefits and challenges that each of the reviewed measures presents. Using a combination of
measures may also counter the criticism that some measurement approaches are narrow in scope (Levenson &
Ruef, 1992).
MEASURES REVIEWED HERE
An extensive search of literature databases (PsycINFO, Social Sciences Citation Index, and Google Scholar),
test manuals and related publications, citation searches of original scale descriptions, and inspection of the reference
lists of relevant reports was carried out. Only measures that were constructed or extensively revised following
the first edition of this handbook were selected for review (i.e., post-1991). For this reason, questionnaires
that were constructed earlier have not been included even though they have been frequently used in research.
Examples include the Hogan Empathy Scale (Hogan, 1969), the Emotional Empathic Tendency Scale (Mehrabian
& Epstein, 1972), and the Interpersonal Reactivity Index (Davis, 1983). In addition, due to space limitations,
empathy measures designed for specific applications were excluded. Examples of such questionnaires include
the Consultation and Relational Empathy measure (Mercer, Maxwell, Heaney, & Watt, 2004), the Jefferson Scale
of Physician Empathy (Hojat et al., 2001), the Nursing Empathy Scale (Reynolds, 2000), the Autism Quotient
(Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001), the Japanese Adolescent Empathy Scale
(Hashimoto & Shiomi, 2002), the Scale of Ethnocultural Empathy (Wang et al., 2003), and the Emotional Empathy
Scale (Ashraf, 2004). The measures reviewed here were grouped into three categories: self-report instruments,
behavioral observational methods, and neuroscientific approaches.
Self-Report Measures
1. Balanced Emotional Empathy Scale (Mehrabian, 1996)
2. Multidimensional Emotional Empathy Scale (Caruso & Mayer, 1998)
3. Empathy Quotient (Baron-Cohen & Wheelwright, 2004)
4. Feeling and Thinking Scale (Garton & Gringart, 2005)
5. Basic Empathy Scale (Jolliffe & Farrington, 2006a)
6. Griffith Empathy Measure (Dadds et al., 2008)
7. Toronto Empathy Questionnaire (Spreng, McKinnon, Mar, & Levine, 2009)
8. Questionnaire of Cognitive and Affective Empathy (Reniers et al., 2011)
Behavioral Measures
1. Picture Viewing Paradigms (Westbury & Neumann, 2008)
2. Comic Strip Task (Völlm et al., 2006)
3. Picture Stories (Nummenmaa, Hirvonen, Parkkola, & Hietanen, 2008)
4. Kids Empathetic Development Scale (Reid et al., 2011)
Neuroscientific Measures
1. Magnetic Resonance Imaging (MRI)
2. Functional Magnetic Resonance Imaging (fMRI)
3. Facial Electromyography (fEMG)
4. Electroencephalogram (EEG)
5. Event-Related Potentials (ERPs)
258 10. MEASURES OF EMPATHY: SELF-REPORT, BEHAVIORAL, AND NEUROSCIENTIFIC APPROACHES
III. EMOTION REGULATION
OVERVIEW OF THE MEASURES
Self-report questionnaires include paper-and-pencil measures. Behavioral methods include evaluations of
experimental stimuli and performance on tests. Neuroscientific approaches include brain imaging techniques
(e.g., fMRI) and other measures of central nervous system activity (e.g., electroencephalography, EEG), measures
of facial electromyography (EMG), and autonomic nervous system measures (e.g., skin conductance, heart rate).
Space restrictions precluded an extensive discussion of all neuroscientific measures, and only some of the more
recent techniques are reviewed. Studies that have used more than one type of measure (e.g., fMRI and self-report
scales; Mathur, Harada, Lipke, & Chiao, 2010; Singer et al., 2004) generally show that the different measurement
approaches correlate well with each other.
Balanced Emotional Empathy Scale (BEES)
(Mehrabian, 1996).
Variable
The BEES is a unidimensional measure that conceptualizes empathy as an increased responsiveness to
another’s emotional experience. The measure assesses the degree to which the respondent can vicariously experi-
ence another’s happiness or suffering.
Description
‘The Balanced Emotional Empathy Scale (BEES) measures both of the aforementioned components of
Emotional Empathy (i.e., vicarious experience of others’ feelings; interpersonal positiveness) in a balanced way’
(Mehrabian, 1995–2010). The 30 items of the BEES are rated on a 9-point Likert-type response scale. The scale
yields a single score with higher scores representing greater levels of emotional empathy. A 7-item Likert-type
abbreviated scale and a French adaptation of the full scale also exist.
Sample
Separate samples of male and female college students were used in the initial construction of the BEES
(Mehrabian, 1996).
Reliability
Internal Consistency
Cronbach alpha coefficients for the BEES have been reported as follows: .87 (Mehrabian, 1997), .81 (Macaskill,
Maltby, & Day, 2002; Shapiro et al., 2004), .83 (Toussaint & Webb, 2005), .90 (Courtright, Mackey, & Packard,
2005), .85 (Smith, Lindsey, & Hansen, 2006), and .82 (Albiero, Matricardi, Speltri, & Toso, 2009).
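For readers less familiar with the statistic, the following is a minimal sketch of how a Cronbach alpha coefficient of the kind reported above can be computed from item-level ratings. It is an illustration only and not the procedure used in any of the cited studies; the function name `cronbach_alpha` and the small `ratings` matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of ratings per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 4 respondents rating 3 items (not taken from any study).
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 4, 5],
    [1, 2, 2],
]

print(round(cronbach_alpha(ratings), 2))
```

Alpha rises when items covary strongly relative to their individual variances, so it also tends to be lower for very short scales.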
Test–Retest
A test–retest reliability coefficient (r = .79) was reported by Bergemann (2009) over a six-week interval.
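A test–retest coefficient of this kind is typically the Pearson correlation between scores from the same respondents at two administrations. The sketch below assumes that definition; the function name `pearson_r` and the two score lists are hypothetical and do not reproduce Bergemann's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores for five respondents at time 1 and time 2.
time1 = [42, 55, 61, 38, 70]
time2 = [45, 52, 66, 40, 68]

print(round(pearson_r(time1, time2), 2))
```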
Validity
Convergent/Concurrent
The BEES correlates positively with the Emotional Empathetic Tendency Scale (r = .77) and with helping
behavior (r = .31; Smith et al., 2006). It correlates positively with the Basic Empathy Scale (Jolliffe & Farrington,
2006a) for both males (r = .59) and females (r = .70) in an Italian sample (Albiero et al., 2009). LeSure-Lester (2000)
reported that the BEES correlates positively with compliance with house rules (r = .67) and chores completed
(r = .57). Scores on the BEES are also positively associated with forgiveness of others (Macaskill et al., 2002) and,
in a sample of FBI agents, with negotiation skills (Van Hasselt et al., 2005). In an fMRI study, BEES scores were
positively correlated with activation of neurons that compose the pain matrix (anterior insula and rostral anterior
cingulate cortex, r = .52 and r = .72, respectively) when participants viewed significant others subjected to pain
(Singer et al., 2004).
Divergent/Discriminant
Smith et al. (2006) found that the BEES correlates negatively with aggression (r = −.21). Similarly, in a sample
of adolescents, negative correlations were reported between BEES scores and aggression towards peers (r = −.57)
and aggression towards staff (r = −.59) (LeSure-Lester, 2000). Mehrabian (1997) also reported that BEES scores
correlated negatively with aggression (r = −.31) and risk of eruptive violence (r = −.50).
Construct/Factor Analytic
A principal components analysis based on the item intercorrelations investigated the structure of the BEES
(see Mehrabian, 1997). Although three components had eigenvalues greater than one, it was concluded that a
unidimensional structure reflecting emotional empathy provided the most parsimonious interpretation.
Criterion/Predictive
Scores on the BEES increased significantly from pretest to posttest in educational programs designed to
increase empathy towards patients (Shapiro et al., 2004) and towards Holocaust victims (Farkas, 2002).
Location
Mehrabian, A. (1996). Manual for the Balanced Emotional Empathy Scale (BEES). Monterey, CA: Albert
Mehrabian.
Details available at:
www.kaaj.com/psych/scales/emp.html (Retrieved December 30, 2013).
Results and Comments
Gender differences have been reported with females tending to obtain higher scores than males on the full
BEES (Marzoli et al., 2011; Schulte-Rüther, Markowitsch, Shah, Fink, & Piefke, 2008; Toussaint & Webb, 2005) as
well as on an abbreviated version (Mehrabian, 2000). Although the BEES has been widely adopted by researchers,
empathy is commonly regarded as a multidimensional construct. The BEES is limited in its focus on emotional
empathy. The extent to which the single score on the measure is independent of cognitive empathy remains to be
determined.
BEES SAMPLE ITEMS
‘I cannot feel much sorrow for those who are
responsible for their own misery.’
‘Unhappy movie endings haunt me for hours
afterwards.’
Notes: Items are rated on a 9-point Likert-type
scale ranging from +4 = ‘Very strong agreement’ to
−4 = ‘Very strong disagreement’.
Copyright © 1995–2010 Albert Mehrabian.
Multidimensional Emotional Empathy Scale (MDEES)
(Caruso & Mayer, 1998).
Variable
The MDEES focuses on the affective component of empathy and is intended for use with adolescents and
adults.
Description
Thirty items describing positive and negative emotional situations are responded to on a 5-point Likert-type
scale. The MDEES is proposed to consist of six subscales labeled: Empathic Suffering, Positive Sharing,
Responsive Crying, Emotional Attention, Feeling for Others, and Emotional Contagion. The total scale score is
obtained by summing across all the items (six negatively worded items are reverse scored), although reverse-
worded items may measure a rather different construct (Boyle et al., 2008).
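The scoring rule described above (sum all items after reverse scoring the negatively worded ones) can be sketched as follows. This is a minimal illustration on a 5-point scale, not the authors' scoring key: the text does not say which items are reverse scored, so the item positions in `REVERSE_KEYED` and the example responses are hypothetical.

```python
LOW, HIGH = 1, 5  # endpoints of the 5-point response scale

# Hypothetical positions (1-based) of negatively worded items; the real key is not given here.
REVERSE_KEYED = {2, 5}


def reverse_score(rating, low=LOW, high=HIGH):
    """Mirror a rating around the scale midpoint, e.g. 1 -> 5 and 4 -> 2."""
    return low + high - rating


def total_score(responses, reverse_keyed=REVERSE_KEYED):
    """Sum item ratings, reverse scoring the items listed in reverse_keyed."""
    return sum(
        reverse_score(rating) if position in reverse_keyed else rating
        for position, rating in enumerate(responses, start=1)
    )


# Hypothetical responses to a six-item excerpt of the scale.
example = [5, 1, 4, 4, 2, 3]
print(total_score(example))
```

Reverse scoring keeps the direction of every item aligned, so that a higher total consistently indicates greater emotional empathy.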
Sample
The samples used in validating the MDEES included 503 adults (164 men and 333 women) whose mean age
was 23 years (ranging from 17 to 70 years) and 290 adolescents (115 male and 140 female; 35 no gender indicated)
whose mean age was 14 years (ranging from 11 to 18 years) (Caruso & Mayer, 1998).
Reliability
Internal Consistency
The Cronbach alpha coefficient for the entire scale of 30 items was found to be .88 (Caruso & Mayer, 1998).
Using the 26 items that formed six factors in the scale (see below) yielded an alpha coefficient of .86. The alpha
coefficients for the six subscales varied from .44 to .80 (Empathic Suffering = .80; Positive Sharing = .71;
Responsive Crying = .72; Emotional Attention = .63; Feeling for Others = .59; Emotional Contagion = .44). Using
the same items from the subscales described by Caruso and Mayer (1998), Olckers, Buys, and Grobler (2010)
reported alpha coefficients ranging from .32 to .82 (Empathic Suffering = .79; Positive Sharing = .85; Responsive
Crying = .69; Emotional Attention = .51; Feeling for Others = .61; Emotional Contagion = .32).
Test–Retest
Test–retest reliability of the MDEES is not currently available.
Validity
Convergent/Concurrent
In the sample of adolescents, there was a positive correlation (r = .63) with an adaptation of the Emotional
Empathetic Tendency Scale (Mehrabian & Epstein, 1972). Also, for the adult subsample, Emotional Attention correlated
positively (r = .34) with Eisenberg’s Parenting Style scale (Eisenberg, Fabes, & Losoya, 1997).
Divergent/Discriminant
Caruso and Mayer (1998, p. 14) reported that, ‘The new scale did not, generally, correlate with a measure of
social loneliness, with one exception: the correlation between the Responsive Crying scale and social loneliness
was −.13 (p < .05). However, the scores share less than 2% of the variance (r² = .016).’ Also, higher scores for
women than men have been shown for the overall scale score and on all subscale scores (all p < .001; Caruso &
Mayer, 1998), although this gender difference has not always been observed (Faye et al., 2011). Studies have also
shown significantly higher scores for older individuals (Caruso & Mayer, 1998; Faye et al., 2011).
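The ‘less than 2% of the variance’ statement in the quotation follows from squaring the correlation coefficient, which gives the proportion of variance two measures share. A minimal sketch of that arithmetic is given below; the function name `shared_variance` is hypothetical, and the only value taken from the text is the reported correlation of −.13.

```python
def shared_variance(r):
    """Proportion of variance shared by two variables correlated at r (i.e., r squared)."""
    return r ** 2


r_crying_loneliness = -0.13  # correlation quoted from Caruso and Mayer (1998)
percent_shared = 100 * shared_variance(r_crying_loneliness)

# The sign of r drops out when squaring; the result stays below 2%.
print(f"{percent_shared:.2f}% of variance shared")
```

The same calculation is a quick way to gauge the practical size of any correlation reported in this chapter, since a statistically significant r can still account for very little variance.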
Construct/Factor Analytic
Caruso and Mayer (1998) undertook a principal components analysis to examine the structure of the MDEES
in the sample of 793 adults and adolescents described above. The PCA yielded six components (with eigenvalues
greater than one) labeled: Empathic Suffering (8 items), Positive Sharing (5 items), Responsive Crying (3 items),
Emotional Attention (4 items), Feeling for Others (3 items), and Emotional Contagion (2 items). However, in a
confirmatory factor analysis using a sample of 212 adults, Olckers et al. (2010) were unable to verify the
six-dimensional structure claimed for the MDEES. Individual factor loadings were low for variables associated with
Emotional Attention, Feeling for Others, and Emotional Contagion.
Criterion/Predictive
The MDEES was found to predict a number of behavioral criteria. Caruso and Mayer (1998) examined the relationship
between MDEES scores and various lifespace scales. ‘Lifespace scales are self-report scales, similar to
bio-data scales, which record information on the types and frequency of behavior a subject engages in’ (Caruso &
Mayer, 1998, p. 8). The MDEES scores correlated with artistic skills (r = .12), satisfaction with one’s career, social
and personal life (r = .23), a warm, supportive upbringing (r = .20), and attendance at cultural events in the sample
of adults (r = .18) (Caruso & Mayer, 1998). Scores on the MDEES also predicted (r = .30) preferences for personal,
non-erotic touch in a sample (N = 129) of university students (Draper & Elmer, 2008). Also, in an Iranian
sample of 70 undergraduates, a cognitive-affective reading-based course that aids in emotion regulation significantly
predicted MDEES scores (Rouhani, 2008).
Location
Caruso, D. R., & Mayer, J. D. (1998). A measure of emotional empathy for adolescents and adults. Unpublished
Manuscript. Available online at:
www.google.com.au/url?sa5t&rct5j&q5&esrc5s&source5web&cd51&ved50CCkQFjAA&url5http%3A%
2%2Fwww.unh.edu%2Femotional_intelligence%2FEI%2520Assets%2FEmapthy%2520Scale%2FEmpathy%2520
Article%25202000.doc&ei510G-UsL9GK-0iQea3IDIDA&usg5AFQjCNHbIUirDCZr0fhyG3vTMsCfjecUYw&bvm5
bv.58187178,d.dGI
(Retrieved December 28, 2013).
Results and Comments
The MDEES aims to measure different components of affective empathy. However, Caruso and Mayer (1998)
cautioned against using the Emotional Contagion subscale given that it contains only two items. In addition,
Olckers et al. (2010) carried out a CFA that was unable to verify the purported MDEES structure. Test–retest
reliability of the MDEES also remains to be determined.
MDEES SAMPLE ITEMS
Circle the response which best indicates how much
you agree or disagree with each item.
The suffering of others deeply disturbs me.
I rarely take notice when other people treat each
other warmly.
Being around happy people makes me feel
happy, too.
I feel like crying when watching a sad movie.
Too much is made of the suffering of pets or animals.
I feel others’ pain.
My feelings are my own and don’t reflect how
others feel.
Note: Items are rated on a 5-point Likert-type scale ranging
from 1 = ‘Strongly disagree’ to 5 = ‘Strongly agree’.
Empathy Quotient (EQ)
(Baron-Cohen & Wheelwright, 2004).
Variable
Baron-Cohen and Wheelwright (2004) defined empathy as ‘the drive to identify another person’s emotions and
thoughts, and to respond to these with an appropriate emotion’ (p. 361). In line with this definition, the EQ was
designed to be a short, easy to use scale that measures both cognitive and affective components of empathy.
Description
The 60-item EQ comprises 40 empathy items and 20 filler/control items. Respondents rate each item on a 4-point
forced-choice scale (‘strongly agree’, ‘agree slightly’, ‘disagree slightly’, and ‘disagree strongly’), with higher
scores reflecting higher empathic capacity. The 20 control items were included to provide some distraction
and so minimize the ‘relentless focus on empathy’ while responding to the EQ measure (Baron-Cohen &
Wheelwright, 2004, p. 166). The control items can be used to check for response bias. Furthermore, approximately
half the items in the EQ are reverse worded, although reverse-worded items tend to measure a somewhat different
construct (Boyle et al., 2008).
Sample
Initial pilot testing of the EQ was undertaken on a small sample of 20 normal individuals (Baron-Cohen &
Wheelwright, 2004). Subsequent validation samples included 90 adults with Asperger syndrome or
high-functioning autism who were compared on the EQ with 90 age-matched controls, and 197 adults from the general
population (71 males whose mean age was 38.8 years; and 136 females whose mean age was 39.5 years)
(Baron-Cohen & Wheelwright, 2004).
Reliability
Internal Consistency
Baron-Cohen and Wheelwright (2004) reported a Cronbach alpha coefficient of .92. Other researchers have
also reported alpha coefficients of .87 (Hambrook, Tchanturia, Schmidt, Russell, & Treasure, 2008), .78 (Kim &
Lee, 2010), and .85 (Muncer & Ling, 2006). For a child-adapted version of the EQ (EQ-C), an alpha coefficient of
.93 was reported (Auyeung et al., 2009).
Test–Retest
Baron-Cohen and Wheelwright (2004) also reported a 12-month interval test–retest reliability coefficient of .97
for the EQ. In an independent study, the 12-month test–retest reliability coefficient was found to be .84
(Lawrence, Shaw, Baker, Baron-Cohen, & David, 2004). The test–retest reliability coefficients for the EQ in
Korean and Italian adaptations over a four-week period were r = .84 (Kim & Lee, 2010) and r = .85 (Preti et al.,
2011), respectively.
Validity
Convergent/Concurrent
In the Korean adaptation of the EQ, positive correlations were obtained between the EQ and the Interpersonal
Reactivity Index (IRI) subscales: Perspective Taking (r = .33), Empathetic Concern (r = .25), and Fantasy (r = .20)
(r = .17) (Kim & Lee, 2010). Lawrence et al. (2004, p. 917) reported (N = 52) that the Emotional Reactivity component
of the EQ correlated positively (.31) with Beck Anxiety Inventory (BAI) scores.
Divergent/Discriminant
The EQ score correlated negatively with the IRI Personal Distress subscale (r = −.17) (Kim & Lee, 2010). The
EQ also exhibits significant sex differences with women scoring more highly than men (Lawrence et al., 2004;
Muncer & Ling, 2006). Individuals with either Asperger’s syndrome or high-functioning autism obtained significantly
lower scores on the EQ than did normals (Baron-Cohen & Wheelwright, 2004; Kim & Lee, 2010). Lawrence
et al. (2004, p. 917) reported (N = 45) that the Social Skills component of the EQ correlated negatively (.35) with
Beck Depression Inventory (BDI) scores. In a French study, Berthoz, Wessa, Kedia, Wicker, and Grèzes (2008)
reported that the EQ correlated with the BDI (−.13), with Spielberger’s State STAI (−.08), and with the Trait
STAI (−.11). With regard to the three EQ components, only Social Skills correlated significantly with the BDI
(−.36), State STAI (−.34), and Trait STAI (−.37).
Construct/Factor Analytic
Lawrence et al. (2004) carried out a principal components analysis of the item intercorrelations and suggested
that the EQ could be better regarded as a 28-item scale with three related components of empathy (labeled: cognitive
empathy, emotional reactivity, and social skills), rather than a 40-item unifactorial scale. Muncer and Ling
(2006) conducted a confirmatory factor analysis that provided some support for the proposed three-factor structure.
Berthoz et al. (2008) undertook a confirmatory factor analysis of the EQ that provided support for the
three-dimensional structure of the measure. Allison, Baron-Cohen, Wheelwright, Stone, and Muncer (2011, p. 829)
investigated the structure of the EQ using both Rasch and CFA analyses, in samples of 658 autism spectrum disorder
patients, 1375 family members, and 3344 normals. The CFA suggested that a 26-item model exhibited a satisfactory
fit to the data (RMSEA = .05, CFI = .93), while the Rasch analysis suggested that the EQ provides a valid
measure of empathy.
Criterion/Predictive
The EQ has been shown to exhibit criterion/predictive validity in research pertaining to autism and gender
differences (Auyeung et al., 2009; Baron-Cohen & Wheelwright, 2004), social functioning and aging (Bailey,
Henry, & Von Hippel, 2008), schizophrenia (Bora, Gökçen, & Veznedaroglu, 2007), and eating disorders
(Hambrook et al., 2008).
Location
Baron-Cohen, S., & Wheelwright, S. (2004). The empathy quotient: An investigation of adults with Asperger
syndrome or high functioning autism, and normal sex differences. Journal of Autism and Developmental Disorders,
34, 163–175.
Results and Comments
There is some debate with regard to the structure of the EQ. Baron-Cohen and Wheelwright (2004) based the
scale on a model of empathy as having both affective and cognitive components. However, some evidence suggests
that the scale may consist of three factors (Lawrence et al., 2004; Muncer & Ling, 2006). Reniers et al. (2012)
pointed out that the EQ items tend to focus more on measuring the empathetic process rather than the empathy
construct itself.
EQ SAMPLE ITEMS
Cognitive empathy questions
I can easily work out what another person might
want to talk about
I am good at predicting how someone will feel
Affective empathy questions
Seeing people cry doesn’t really upset me
I usually stay emotionally detached when watching
a film
Notes: Items are rated on a 4-point scale with the
response options of ‘Strongly agree’; ‘Slightly agree’;
‘Slightly disagree’; and ‘Strongly disagree’.
Feeling and Thinking Scale (FTS)
(Garton & Gringart, 2005).
Variable
The FTS is an adaptation of the Interpersonal Reactivity Index (IRI; Davis, 1980) for use with children. The IRI
contains four independent subscales labeled: Empathic Concern, Perspective Taking, Personal Distress, and
Fantasy.
Description
The IRI items were reworded to be more easily understood by children. Item 16 (Fantasy subscale) and all
reverse worded items were removed as they were too difficult for children to comprehend. The final FTS scale
comprised 18 of the IRI items including four Empathetic Concern items, four Perspective-Taking items, six
Personal Distress items, and four Fantasy items (see Garton & Gringart, 2005).
Sample
The initial sample used by Garton and Gringart (2005) comprised 413 children (194 girls and 219 boys, aged
from 7.11 to 9.11 years).
Reliability
Internal Consistency
FTS items reflecting affective and cognitive components of empathy exhibited Cronbach alpha coefficients of
.69 and .54, respectively (Garton & Gringart, 2005). Likewise, Kokkinos and Kipritsi (2012) reported alpha coefficients
of .53 and .56 (for cognitive and affective components), and .68 for the total scale.
Test–Retest
Test–retest reliability coefficients for the FTS are not currently available.
Validity
Convergent/Concurrent
The FTS total scale score correlated positively with self-efficacy (r = .22), social self-efficacy (r = .27), and
academic self-efficacy (r = .23) (Kokkinos & Kipritsi, 2012).
Divergent/Discriminant
Kokkinos and Kipritsi (2012) found that the FTS total score correlated negatively with their Bullying and
Victimization scale or BVS (r = −.15). Girls scored more highly than boys on both the cognitive and affective
components of empathy (Garton & Gringart, 2005).
Construct/Factor Analytic
A principal components analysis with oblimin rotation using the sample of 413 school children resulted in a
four-component solution (Garton & Gringart, 2005). The resultant 12-item scale comprised a two-dimensional
structure reflecting both affective and cognitive components of empathy. Likewise, Kokkinos and Kipritsi (2012),
using a Greek sample of 206 Grade 6 children, conducted an exploratory factor analysis of the item intercorrelations,
resulting in separate cognitive and affective empathy factors.
Criterion/Predictive
No criterion/predictive validity evidence is currently available.
Location
Garton, A.F., & Gringart, E. (2005). The development of a scale to measure empathy in 8- and 9-year old children.
Australian Journal of Education and Developmental Psychology, 5, 17–25.
Results and Comments
Theory-consistent sex differences have been found with the FTS. Girls show significantly higher scores than
boys on both affective (p < .001) and cognitive (p < .01) factors (Garton & Gringart, 2005). During construction of
the FTS, Garton and Gringart (2005) proposed a two-factor model reflecting cognitive and affective components
of empathy. However, the FTS was based upon Davis’ (1980) IRI, which comprises four subscales. Evidently, the
relationship between the FTS and IRI requires further investigation. Also, the test–retest reliability as well as the
criterion/predictive validity remain to be investigated.
FTS SAMPLE ITEMS
Cognitive empathy question
I think people can have different opinions about the same thing.
Affective empathy question
Emergency situations make me feel worried and upset.
Note: Items are rated on a 5-point Likert-type scale ranging from: 1 = ‘Not like me at all’; 2 = ‘Hardly ever like me’;
3 = ‘Occasionally like me’; 4 = ‘Fairly like me’; and 5 = ‘Very like me’.
Basic Empathy Scale (BES)
(Jolliffe & Farrington, 2006a).
Variable
The BES is based on a definition of empathy proposed by Cohen and Strayer (1996) as the sharing and understanding
of another’s emotional state or context resulting from experiencing the emotive state (affective) and
understanding another’s (cognitive) emotions.
Description
The BES measures four basic emotions (fear, sadness, anger, and happiness) wherein the measurements relate
more generally to cognitive and affective empathy and not to a non-specific affective state (e.g., anxiety). For the
40-item scale, reverse worded items have been included with 20 items requiring a positive response and 20
requiring a negative response (Jolliffe & Farrington, 2006a). A shortened 20-item version is also available, along
with a French version for use with adults (Carré, Stefaniak, D’Ambrosio, Bensalah, & Besche-Richard, 2013).
Sample
The BES was constructed using a sample of 363 Year 10 adolescents (194 males and 169 females whose mean
age was 14.8 years). A separate validation sample included 357 Year 10 students (182 males and 175 females).
Reliability
Internal Consistency
Jolliffe and Farrington (2006a) reported an overall Cronbach alpha coefficient of .87 (.79 and .85 for cognitive
and affective components). Stavrinides et al. (2010) reported alpha coefficients for cognitive (.80 and .83) and
affective components (.71 and .77), respectively. In an Italian study (Albiero et al., 2009), an alpha coefficient of
.87 for the total scale was reported (.74 for cognitive and .86 for affective empathy). A Chinese study (Geng, Xia,
& Qin, 2012) reported an alpha coefficient of .77 for the total scale (.72 for cognitive, and .73 for affective
empathy). In a French study (N = 370), Carré et al. (2013) reported alpha coefficients for cognitive (.71) and affective
components (.74), respectively.
Test-retest
Test-retest reliability for the BES was demonstrated over a 3-week interval for a French adaptation (r = .66)
(D’Ambrosio et al., 2009) and over a 4-week interval for the Chinese version (r = .70) (Geng et al., 2012).
D’Ambrosio et al. also reported test-retest coefficients for the affective empathy subscale (r = .70) and for the
cognitive empathy subscale (r = .54). Likewise, Carré et al. (2013) reported a test-retest coefficient of .56 for the
BES cognitive empathy component (N = 222) over a 7-week interval.
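Test-retest (stability) coefficients such as those above are typically Pearson correlations between scores obtained from the same respondents on two occasions. The sketch below illustrates that computation under this assumption; the function name `pearson_r` and the two score lists are hypothetical and do not reproduce data from any cited study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both occasions")
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores for five respondents at two testing occasions.
time_1 = [70, 82, 65, 90, 77]
time_2 = [72, 80, 70, 85, 79]
stability = pearson_r(time_1, time_2)
```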
Validity
Convergent/Concurrent
Jolliffe and Farrington (2006a) reported that total scores on the BES correlate positively with total scores on the
IRI for males (r = .53) and females (r = .43), respectively. The BES affective component correlates more strongly
with IRI Perspective Taking (r = .51) than with Empathic Concern (r = .33) in males. Likewise, the BES cognitive
component correlates more strongly with IRI Perspective Taking (r = .44) than with Empathic Concern (r = .37)
for females. The BES also correlated positively with the earlier constructed BEES for both males (r = .59) and
females (r = .70) in an Italian sample (Albiero et al., 2009). Total BES scores correlate positively with agreeableness
in males (r = .30) and females (r = .24), conscientiousness for males only (r = .17), openness for males
(r = .34) and females (r = .15), and neuroticism for females only (r = .16) (Jolliffe & Farrington, 2006a).
Divergent/Discriminant
Jolliffe and Farrington (2006a) reported that total BES scores correlate negatively with a measure of alexithymia,
although this appeared to reflect a significant negative relationship with cognitive empathy only (r = -.21
for males; r = -.31 for females). Females obtain significantly higher scores than males on affective empathy,
cognitive empathy, and total empathy (Jolliffe & Farrington, 2006a). These sex differences in reported empathy
have been replicated in an Italian study (Albiero et al., 2009).
Construct/Factor Analytic
The BES was constructed using a principal components analysis (with orthogonal varimax rotation) to reduce
the 40-item scale into affective and cognitive empathy factors (Jolliffe & Farrington, 2006a). Confirmatory factor
analysis (N = 720) indicated a good fit to the data for the two-factor solution (GFI = .89,
AGFI = .86, RMSR = .06). The affective and cognitive subscales were significantly correlated for males
(r = .41) and females (r = .43). Subsequently, Carré et al. (2013) carried out a CFA (N = 370) which provided
support for both two- and three-dimensional BES structures.
Criterion/Predictive
For both males and females, BES total scores were higher among individuals who reported that they would
help in a real-life incident requiring their assistance, than in those who reported that the incident was none of
their business (Jolliffe & Farrington, 2006a).
Location
Jolliffe, D., & Farrington, D.P. (2006a). Development and validation of the Basic Empathy Scale. Journal of
Adolescence, 29, 589-611.
Results and Comments
The BES has been used in research into bullying (e.g., Jolliffe & Farrington, 2006b; Stavrinides et al., 2010) and
offending (Jolliffe & Farrington, 2007). There is a paucity of literature that provides stability coefficients for the
BES over a time interval greater than seven weeks. Despite collection of the BES on two occasions over a 6-month
period, Stavrinides et al. (2010) did not report test-retest reliability.
BES SAMPLE ITEMS
Cognitive empathy question
It is hard for me to understand when my friends are sad.
Affective empathy question
I usually feel calm when other people are scared.
Note: Items are rated on a 5-point Likert-type scale ranging from: 1 = ‘Strongly disagree’; 2 = ‘Disagree’; 3 = ‘Neither
agree nor disagree’; 4 = ‘Agree’; 5 = ‘Strongly agree’.
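Reverse-worded items such as those described for the BES are normally recoded before item scores are summed, so that higher totals consistently indicate greater empathy. The following is a minimal sketch of that recoding for a 1-5 response format. The function names, the example responses, and the choice of which items are reverse-keyed are all assumptions made for illustration; the published scoring key should be consulted for actual use.

```python
def reverse_score(response, low=1, high=5):
    """Recode a response on a low..high scale so that low becomes high and vice versa."""
    if not low <= response <= high:
        raise ValueError("response outside the scale range")
    return low + high - response


def total_score(responses, reverse_keyed, low=1, high=5):
    """Sum item responses, recoding the items whose (zero-based) index is reverse-keyed."""
    return sum(
        reverse_score(r, low, high) if i in reverse_keyed else r
        for i, r in enumerate(responses)
    )


# Hypothetical example: three items, of which the first and third are assumed
# to be reverse-keyed (e.g., negatively worded statements).
responses = [1, 5, 2]
score = total_score(responses, reverse_keyed={0, 2})
```

With these hypothetical responses, a respondent who strongly disagrees with a negatively worded statement receives the highest item score after recoding.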
Griffith Empathy Measure (GEM)
(Dadds et al., 2008)
Variable
The GEM was constructed due to the shortage of multi-informant assessment of empathy in children and adolescents,
deemed important for accurate measurement of empathy in this population group (Dadds et al., 2008,
p. 111). It is an adaptation of the Bryant Index of Empathy (Bryant, 1982) used by parents to assess child and
adolescent empathy (Dadds et al., 2008).
Description
The GEM contains 23 items that are rated on a 9-point Likert-type response scale to assess parents’ level of
agreement with statements concerning their child. The GEM appears to measure cognitive and affective
components of empathy (Dadds et al., 2008).
Sample
Construction of the GEM used a sample of 2612 parents of children aged 4 to 16 years (mean age = 7.71 years;
SD = 3.06) from primary and secondary schools in Australia.
Reliability
Internal Consistency
Dadds et al. (2008) reported a Cronbach alpha coefficient of .81 for the overall scale of 23 items, .62 for
cognitive empathy (6 items), and .83 for affective empathy (9 items). Subsequently, Dadds et al. (2009) reported alpha
coefficients of .62 (cognitive empathy) and .77 (affective empathy).
Test-retest
For a subsample of 31 parents with non-clinic children aged 5-12 years, Dadds et al. (2008) reported a
test-retest reliability coefficient over a one-week interval of .91 for the GEM (affective subscale: r = .93; cognitive
subscale: r = .89). In a further subsample of 127 parents with non-clinic children, Dadds et al. (p. 117) reported
an impressive six-month stability coefficient (r = .69).
Inter-Rater
Dadds et al. (2008) reported inter-parent correlations for total scores (boys r = .63, girls r = .69), affective
scores (boys r = .47, girls r = .41), and cognitive scores (boys r = .52, girls r = .47).
Validity
Convergent/Concurrent
Dadds and Hawes (2004) reported that for mothers, correlations between GEM total, cognitive, and affective
empathy scores, and Maximum Distress Allowed (measured via the Interpersonal Response Test) were .38, .56,
and .30, respectively. The GEM cognitive empathy component correlated .30 with verbal IQ scores (Dadds et al.,
2008). Positive correlations were found between the GEM and the Cruelty to Animals Inventory (Dadds et al.,
2004). Observed Pet Nurturance correlated .25 with the total GEM scale, and .34 with the GEM affective
component.
Divergent/Discriminant
Although the GEM did not correlate with verbal IQ (r = .01), the affective empathy component correlated -.15
with verbal IQ scores (Dadds et al., 2008). Negative correlations were found between the GEM and the Cruelty to
Animals Inventory (Dadds et al., 2004). Observed Pet Cruelty correlated -.31 with the total GEM scale, -.35
with the affective GEM component, and -.12 with the cognitive GEM component. Dadds et al. (2009) examined
the relationship between parent-rated cognitive and affective empathy (on the GEM) and psychopathic traits.
For males, psychopathic traits correlated negatively with cognitive (r = -.41) and affective (r = -.17) empathy.
For females, psychopathic traits correlated negatively with cognitive (r = -.39) but not affective (r = -.02) empathy.
For 155 mother and father ratings on the GEM, mothers tended to rate their children more highly on total,
cognitive, and affective components (Dadds et al., 2008).
Construct/Factor Analytic
GEM item intercorrelations were subjected to a principal components analysis with oblique (direct oblimin)
rotation, revealing separate cognitive and affective components (Dadds et al., 2008). The two components were
found to be independent (r = .07). A confirmatory factor analysis demonstrated an acceptable fit (CFI = .90;
RMSEA = .05), providing support for the proposed two-dimensional structure of the GEM across genders and
age groups.
Criterion/Predictive
Dadds and Hawes (2004) reported that Reaction Time (measured via the Interpersonal Response Test) correlated
negatively with total and affective empathy (r = -.56 and r = -.57, respectively) but not with cognitive empathy scores
(r = .15). In behavioral measures of children’s nurturing behavior as well as cruel behaviors towards pets,
Observed Pet Nurturance correlated .25 with the GEM total score (.34 with the affective component, and .05 with
the cognitive component) (Dadds et al., 2008).
Location
Dadds, M.R. et al. (2008). A measure of cognitive and affective empathy in children using parent ratings. Child
Psychiatry and Human Development, 39, 111-122.
Results and Comments
The cognitive component of the GEM, while seeming stable, does not show high internal consistency.
Furthermore, the principal components analysis extraction employed by Dadds et al. (2008) can increase the risk
of falsely inflating component loadings. Future research using the GEM should revisit
the scale’s factor structure. The GEM also does not incorporate a means of systematically reducing response bias.
GEM SAMPLE ITEMS
My child rarely understands why other people cry
My child becomes sad when other children around
him/her are sad
Note: Items are rated on a 9-point Likert-type scale
ranging from: +4 = ‘Strongly agree’ to -4 = ‘Strongly
disagree’.
Toronto Empathy Questionnaire (TEQ)
(Spreng et al., 2009).
Variable
The development of the TEQ did not begin with a conceptual definition of empathy other than to consider it
at the broadest level and derive a measure based on existing empathy scales.
Description
Spreng et al. (2009) factor analyzed responses made on every self-report measure of empathy they could identify,
resulting in 142 items from 11 different empathy and related questionnaires including the IRI (Davis, 1980,
1983), Hogan’s Empathy Scale (Hogan, 1969), Questionnaire Measure of Emotional Empathy (Mehrabian &
Epstein, 1972), BEES (Mehrabian, 2000), Scale of Ethnocultural Empathy (Wang et al., 2003), Jefferson Scale of
Physician Empathy (Hojat et al., 2001), Nursing Empathy Scale (Reynolds, 2000), Japanese Adolescent Empathy
Scale (Hashimoto & Shiomi, 2002), and the Measure of Emotional Intelligence (Schutte et al., 1998). An additional
36 items were composed to describe individuals with altered empathic responding due to neurological or psychiatric
disease. The resulting TEQ places an emphasis on the emotional component of empathy, and consists of
16 items, with an equal number of positively and reverse worded items. Responses are made using a 5-point
Likert-type scale.
Sample
The initial scale development sample consisted of 200 undergraduates (100 male, 100 female) (mean age = 18.8
years, SD = 1.2). A validation sample comprised 79 undergraduates (24 male, 55 female) of similar age
(mean = 18.9 years, SD = 3.0). Another validation sample consisted of 65 undergraduates (mean age = 18.6
years, SD = 2.3).
Reliability
Internal Consistency
The Cronbach alpha coefficient was found to be .85 for both the developmental and validation samples. For
the additional validation sample, the alpha coefficient was found to be .87 (Spreng et al., 2009).
Test-retest
For the subsample of 65 students who completed the TEQ again after a mean interval of 66 days, the stability
coefficient was .81 (Spreng et al., 2009).
Validity
Convergent/Concurrent
The TEQ correlated positively with IRI Empathic Concern (r = .74) and also after reworded Empathic Concern
items were removed (r = .71). Total TEQ scores also correlated positively with IRI Perspective Taking (r = .35).
TEQ scores correlated positively with IRI Empathic Concern (r = .74), with Perspective Taking (r = .29), and
Fantasy (r = .52). TEQ scores also correlated positively with EQ scores (r = .80) (Spreng et al., 2009).
Divergent/Discriminant
Scores on the TEQ correlated with behavioral measures of social comprehension (Reading the Mind in the
Eyes Test-Revised: r = .35, Interpersonal Perception Task-15: r = .23) in a sample of 79 undergraduates (Spreng
et al., 2009). In a sample of 200 students, a negative correlation was observed with the Autism Quotient
(r = -.30). Males and females did not differ significantly in total TEQ scores in the first sample, although in the
second sample, females scored significantly higher than males.
Construct/Factor Analytic
An iterative maximum-likelihood factor analysis with SMCs as initial communality estimates was undertaken
on the item intercorrelations (N = 200). Spreng et al. (2009) then conducted a further exploratory factor analysis
on the intercorrelations of the final 16 items of the TEQ, forcing a single-factor structure.
Criterion/Predictive
Criterion/predictive validity coefficients remain to be documented.
Location
Spreng, R.N., McKinnon, M.C., Mar, R.A., & Levine, B. (2009). The Toronto Empathy Questionnaire: Scale
development and initial validation of a factor-analytic solution to multiple empathy measures. Journal of
Personality Assessment, 91, 62-71.
Results and Comments
The TEQ loads on a single factor representative of ‘the broadest, common construct of empathy’. Spreng et al.
(2009) argued that since the TEQ correlates with IRI components of empathic concern, perspective taking, and
fantasy, it may not be necessary to use multiple subscales to measure empathy. However, the TEQ does not correlate
with the IRI subscale of personal distress, suggesting that it may not encapsulate all facets of empathy.
TEQ SAMPLE ITEMS
I enjoy making other people feel better
I am not really interested in how other people feel
I find it silly for people to cry out of happiness
I can tell when others are sad even when they do not
say anything
Note: Items are rated on a 5-point Likert-type scale
ranging from: 1 = ‘Never’; 2 = ‘Rarely’; 3 = ‘Sometimes’;
4 = ‘Often’; 5 = ‘Always’.
Questionnaire of Cognitive and Affective Empathy (QCAE)
(Reniers, Corcoran, Drake, Shryane, & Völlm, 2011).
Variable
The QCAE aims to build on earlier measures of empathy in which the constructs were considered too narrow,
inaccurate, or inconsistently defined, or in which psychometric properties were less than optimal (Reniers
et al., 2011). Both cognitive and affective components of empathy are measured.
Description
The QCAE is a 31-item measure with a 4-point forced-choice response scale. To create the QCAE, items were
derived from the EQ (Baron-Cohen & Wheelwright, 2004), Hogan’s Empathy Scale (Hogan, 1969), the Empathy
subscale of the Impulsiveness-Venturesomeness-Empathy Inventory (IVE; Eysenck & Eysenck, 1991), and the IRI
(Davis, 1980, 1983). Each item was assessed by two raters. If both raters agreed on an item as a measure of cognitive
or affective empathy it was included in the measure. The QCAE comprises five subscales (31 items) labeled:
perspective taking, online simulation, emotion contagion, proximal responsivity, and peripheral responsivity,
respectively (Reniers et al., 2011). The first two subscales measure cognitive empathy and the remaining three
subscales measure affective empathy.
Sample
The initial sample comprised 925 participants (284 males; 641 females) whose mean age was 26 years (SD = 9).
Some 81% of the participants were of European descent, with the majority specifying the United Kingdom
as their place of origin.
Reliability
Internal Consistency
Cronbach alpha coefficients have been reported as follows: perspective taking (.85), emotion contagion (.72),
online simulation (.83), peripheral responsivity (.65), and proximal responsivity (.70) (Reniers et al., 2011).
Test-retest
Test-retest reliability coefficients for the QCAE are not currently available.
Validity
Convergent/Concurrent
Reniers et al. (2011) reported that the cognitive and affective subscales of the QCAE share some variance in
common (r = .31). This suggests that while there is a relationship between the cognitive and affective subscales,
they still represent distinct forms of empathy. The BES correlates positively with the QCAE subscales of cognitive
(r = .62) and affective (r = .76) empathy (Reniers et al., 2011).
Divergent/Discriminant
Reniers et al. (2011) reported that females scored more highly than males on both the cognitive and affective
subscales. Reniers et al. (2012, p. 205) reported that the QCAE cognitive empathy subscale is negatively correlated
with secondary psychopathy (r = -.64) (as measured via the Levenson Self-Report Psychopathy Scale). No
relationship was observed between empathy scores and moral judgment competence scores (as measured via the
Moral Judgment Task).
Construct/Factor Analytic
A principal components analysis (with direct oblimin rotation) was carried out for the original 65-item scale
(N = 640). Both the Scree test (Cattell, 1978; Cattell & Vogelmann, 1977) and a parallel analysis (Velicer & Jackson,
1990) suggested five components, defining the subscales of the QCAE. Although a subsequent confirmatory factor
analysis in an independent sample (N = 318) provided support for the five-component structure, a two-dimensional
structure relating to cognitive and affective empathy ‘provided the best and most parsimonious fit
to the data’ (Reniers et al., 2011, p. 84).
Criterion/Predictive
Lang (2013) reported that QCAE scores decreased in a sample of 185 participants (82% female) following observation
of chronic pain portrayed in entertainment media. Also, predictive validity of the QCAE has been demonstrated
in studies into prenatal testosterone and the later development of behavioral traits (Kempe & Heffernan,
2011), as well as musical appreciation (Clemens, 2012).
Location
Reniers, R., Corcoran, R., Drake, R., Shryane, N.M., & Völlm, B.A. (2011). The QCAE: A questionnaire of cognitive
and affective empathy. Journal of Personality Assessment, 93, 84-95.
Results and Comments
The QCAE has been used alongside other empathy measures including the QMEE (Mehrabian & Epstein,
1972) and IRI (Davis, 1980, 1983) in research studies into empathy (Kempe & Heffernan, 2011) or music appreciation
(Clemens, 2012). The QCAE is the first online measure of empathy to date. However, test-retest reliability
remains to be determined for the QCAE.
QCAE SAMPLE ITEMS
I can easily work out what another person might
want to talk about.
I am good at predicting what someone will do.
It worries me when others are worrying and panicky.
Friends talk to me about their problems as they say
that I am very understanding.
It is hard for me to see why some things upset people
so much.
I try to look at everybody’s side of a disagreement
before I make a decision.
Note: Items are rated on a 4-point scale ranging from:
4 = ‘Strongly agree’; 3 = ‘Slightly agree’; 2 = ‘Slightly disagree’;
and 1 = ‘Strongly disagree’.
Picture Viewing Paradigms (PVP)
(Westbury & Neumann, 2008).
Variable
In the PVP, empathy is conceptualized as an individual’s self-reported response to empathy-eliciting visual
images.
Description
The PVP is a simple task in which images depict individuals (termed targets) in certain situations.
Often these are negative (e.g., confinement, injury, grief), but they may also be positive. Image duration is
typically between 6 and 10 seconds. Participants view the images and make a rating response. Ratings may also
relate to different components (e.g., affective and cognitive) or related constructs (e.g., sympathy, distress).
Physiological recordings may also be taken during the image presentation. Westbury and Neumann (2008)
defined empathy on a 9-point scale as, ‘to what degree you are able to imagine feeling and experiencing what the
target is experiencing, in other words, your ability to put yourself in the others’ situation.’ They also measured
corrugator electromyographic activity and skin conductance responses. Images were sourced from the
International Affective Picture System (IAPS; Lang, Bradley, and Cuthbert, 1999) or other media (e.g., Internet).
Variations of the PVP were also used, such as using video clips instead of static images (Westbury & Neumann,
2008). In addition, participants were asked to concentrate on their own feelings while viewing the images or to
concentrate on the feelings of the target, a ‘self’ versus ‘other’ distinction (e.g., Schulte-Rüther et al., 2008).
Sample
Westbury and Neumann (2008) used a sample of 73 undergraduates (mean age = 22.5 years, SD = 9.41). A second
sample comprised 33 undergraduates (mean age = 24.6 years). Neumann, Boyle, and Chan (2013) subsequently
employed a sample of 26 male and 73 female Caucasian participants (mean age = 25.44 years, SD = 9.41)
as well as a sample of 29 male and 70 female Asian participants (mean age = 20.89 years, SD = 1.70).
Reliability
Internal Consistency
Westbury and Neumann (2008) reported Cronbach alpha coefficients for subjective empathy ratings of .91 (first
sample) and .94 (second sample). Subsequently, Neumann et al. (2012) reported high alpha coefficients for
empathy-perspective taking (α = .98), empathy-affect (α = .98), and empathy-understanding (α = .98), suggesting
the possibility of some narrowness of measurement (cf. Boyle, 1991).
Test-retest
Test-retest reliability has not been reported for the empathy-related PVP itself. In research unrelated to empathy
that used the IAPS, Lang et al. (1993) reported stability coefficients (time interval unspecified) for arousal
(r = .93), valence (r = .99), the corrugator response (r = .98) and zygomatic response (r = .84).
Validity
Convergent/Concurrent
Self-reported PVP empathy ratings in Westbury and Neumann (2008) correlated positively with BEES scores
in the first (r = .56) and second (r = .43) samples. In the second sample, empathy ratings correlated positively
with ratings of sympathy (r = .66) and distress (r = .59).
Divergent/Discriminant
Kring and Gordon (1998) used videotaped facial expressions that represented the emotions of happiness, sadness,
and fear. Participants watched video clips unaware that their facial expressions were being recorded during film
presentation. Following each clip, participants were asked to rate the extent to which they experienced sadness,
fear, disgust, and happiness. Females reacted more expressively than males across all film clips.
Criterion/Predictive
No criterion/predictive validity coefficients have been reported to-date.
Location
Westbury, H.R., & Neumann, D.L. (2008). Empathy-related responses to moving film stimuli depicting human
and non-human animal targets in negative circumstances. Biological Psychology, 78, 66-74.
Results and Comments
The picture viewing paradigm is commonly employed in experimental research in which experimental manipulations
are used (e.g., empathy towards different animal types; Westbury & Neumann, 2008) or in neuroscientific
research (e.g., fMRI). Researchers have rarely used the same stimuli across different experiments.
In addition, the results obtained depend on the specific way in which empathy-related responding is quantified
(e.g., self-report versus physiological response). The psychometric properties of the PVP approach require further
investigation.
Comic Strip Task (CST)
(Völlm et al., 2006).
Variable
The CST paradigm as an indicator of empathy is based on how well one can correctly assess other individuals’
mental states (desires, intentions, and beliefs).
Description
The CST derives from the original attribution of intention task of Sarfati et al. (1997) and Brunet, Sarfati,
Hardy-Bayle, and Decety (2000). This is a non-verbal task that presents a series of comic strips and asks participants
to choose the best one out of two or three strips on an answer card to finish the story. Völlm et al. (2006)
modified the original paradigm using some of the original comic strips from Brunet et al. (2000) from the ‘attribution
of intention’ condition, but also generated new comic strips for assessing cognitive empathy. In the pilot
study of the empathy stimuli, Völlm et al. (2006) reported that participants ‘rate each cartoon for clarity and
empathic understanding on a scale from 1-5 (very poor, poor, average, good and excellent) ... [with] ... the following
instruction: “The cartoons that will be presented require you to put yourself in the situation of the main
character”.’ There are four conditions: theory of mind, empathy, physical attribution with one character, and
physical attribution with two characters. In the cognitive empathy condition, participants choose one of two pictures
to finish the story that makes the main character in the story feel better.
Sample
Völlm et al. (2006) used a small sample of 13 male participants recruited from the general community and university
populations whose mean age was 24.9 years (ranging from 19 to 36 years).
Reliability
Internal Consistency
No information is currently available on internal consistency.
Test-retest
Test-retest reliability coefficients for the CST are not currently available.
Validity
Convergent/Concurrent
Evidence on convergent/concurrent validity is not currently available.
Divergent/Discriminant
Brunet, Sarfati, Hardy-Bayle, and Decety (2003) showed that performance of schizophrenic patients was significantly
lower than that of normal control participants on all three conditions measuring successful attribution of
intention.
Construct/Factor Analytic
Using an earlier version of the CST, Brunet et al. (2000) defined four conditions: attribution of intention (AI),
physical causality with characters (PC-Ch), physical causality with objects (PC-Ob), and a rest condition.
Brunet et al. (2000) conducted a principal components analysis for all experimental conditions with two main
components extracted; the first component loaded positively on AI and PC-Ch and negatively on PC-Ob. The second
component loaded positively on PC-Ch and negatively on AI.
Criterion/Predictive
Völlm et al. (2006) showed that affective empathy conditions activated the medial prefrontal cortex (mPFC),
temporo-parietal junction (TPJ), middle temporal gyrus, middle occipital gyrus, lingual gyrus, and cerebellum.
Affective empathy was associated with more activations of the paracingulate, anterior and posterior cingulate, and
the amygdala, regions related to emotional processing.
Location
Völlm, B.A. et al. (2006). Neuronal correlates of theory of mind and empathy: A functional magnetic resonance
imaging study in a nonverbal task. NeuroImage, 29, 90-98.
Results and Comments
The CST may be overly simplistic and unable to appropriately estimate an individual’s cognitive understanding
or responsiveness in an empathy-inducing situation (Reid et al., 2012). Also, this type of stimulus has been
characterized as not reflecting ‘real-life’ situations which are often more complex and involve multiple persons
(Reid et al., 2012). The psychometric properties of the task require further investigation. However, the CST does
provide a performance-based measure (i.e., it is an actual test) of empathy, in contrast to the plethora of subjective
self-report measures.
Picture Story Stimuli (PSS)
(Nummenmaa et al., 2008).
Variable
In the PSS, empathy is conceptualized as the ability to interpret visual scenes and predict the most likely
behavioral consequence based on cognitive or affective cues.
Description
Nummenmaa et al. (2008) used 60 digitized color pictures. The pictures comprise two categories depicting two
individuals in visually matched aversive (30) and neutral (30) scenes. Aversive pictures depict interpersonal
attack scenes, such as strangling, while neutral pictures present daily (non-emotional) scenes, such as having a
conversation. Participants are required either to ‘watch’ the scene (as though watching TV) or to ‘empathize’ (mentally
simulate how the person in the scene thinks and feels). Yellow arrows in the corners of the picture instruct
participants how to respond. For instance, during an ‘empathize’ block, all arrows point towards the area in which
the target of the empathy is depicted in the scene (e.g., an attacker, victim, or a person engaged in a non-emotional
activity). On ‘watch’ blocks, the arrows in the left visual field point left and those in the right visual
field point right. The pictures are matched on visual variables such as luminosity, average contrast density,
global energy, complexity, and pixel area covered by faces in each scene, as well as how often the actors look
towards the camera.
Reliability
No test-retest reliability coefficients for the PSS are currently available.
Validity
Convergent/Concurrent
No convergent/concurrent validity evidence is currently available.
Divergent/Discriminant
No divergent/discriminant validity evidence is currently available.
Criterion/Predictive
Nummenmaa et al. (2008) showed that emotional pictures depicting an attack scene increase experience of
fear, anger and disgust while decreasing experience of pleasure in participants. Affective empathy stimuli
resulted in increased activity in the thalamus (involved in emotional processing), left fusiform gyrus (face perception),
right brain stem and networks associated with mirroring (inferior parietal lobule). Furthermore, the thalamus,
primary somatosensory and motor cortices showed augmented functional coupling in relation to emotional
empathy (Nummenmaa et al., 2008).
Location
Nummenmaa, L., Hirvonen, J., Parkkola, R., & Hietanen, J.K. (2008). Is emotional contagion special? An fMRI
study on neural systems for affective and cognitive empathy. NeuroImage, 43, 571-580.
Results and Comments
The PSS has not been used extensively in research into empathy. The psychometric properties of the picture
story approach, including test-retest reliability, internal consistency, as well as convergent and discriminant
validity remain to be determined.
Kids’ Empathetic Development Scale (KEDS)
(Reid et al., 2012).
Variable
Cognitive, affective, and behavioral components of empathy are examined using emotion recognition, picture
based scenarios, and behavioral self-report techniques.
Description
The KEDS is ‘a measure of complex emotion and mental state comprehension as well as a behavioral measure
of empathy’ (Reid et al., 2012, p. 11). It is a multidimensional measure of empathy for school-aged children, comprising
12 ‘faceless’ pictographic stimuli that are scenarios of events or multiple characters. The figures are ‘faceless’
to ensure the measurement of affective inference as opposed to emotion recognition. Emotional
identification response cards consist of faces used to match up with the figures in scenes. Faces incorporate both
simple (happy, sad, angry) and complex (relaxed, surprised, afraid) emotions. Prior to administration, children
are shown the emotional identification response cards and identify the sex, mental, and emotional states.
Children ascribe one of six emotions presented to a person/s in each of the scenes by pointing to the picture or
by verbally labeling the emotion. Following each stimulus presentation, children are prompted with questions
pertaining to inferred affective empathy (e.g., ‘How do you think this boy/girl/man feels?’), cognitive empathy
(e.g., ‘Can you tell me why this boy/girl/man feels ... ?’ and ‘Please tell me more about what is happening’), as
well as behavioral elements of empathy (‘What would you do, if you were that boy/girl/man?’). In six scenarios,
two characters have blank faces and children are asked the same questions for each. The number of males and
females presented in each scene are counterbalanced.
Sample
The initial developmental sample comprised 220 children, aged from 7 to almost 11 years (Reid et al., 2012).
Reliability
Internal Consistency
Reid et al. (2012) reported Cronbach alpha coefficients of .84 for all 17 character scenarios, .63 for the
affective scale, .82 for the cognitive scale, and .84 for the behavioral scale.
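As an illustration of how an internal consistency coefficient of this kind is obtained, the following is a minimal sketch of the standard Cronbach alpha formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the small data set are hypothetical and are not taken from Reid et al. (2012).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach alpha for a list of respondents, each a list of k item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five children (rows) scored on three items (columns).
hypothetical_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
print(round(cronbach_alpha(hypothetical_scores), 2))  # 0.93
```

Values closer to 1 indicate that the items vary together; in this sketch the same calculation would be applied separately to the items of each subscale.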
Test-Retest
Test-retest reliability coefficients are not currently available.
Validity
Convergent/Concurrent
There is a positive correlation between the cognitive and behavioral subscales (.42) (Reid et al., 2012). Also,
the KEDS total score and cognitive and behavioral subscales correlate positively with the Bryant Index of Empathy
(.21, .14, and .20, respectively). The total and cognition scores correlate positively with both the Emotion
Vocabulary Test (Dyck et al., 2001) and the Happe Strange Stories test (Happe, 1994). The KEDS total score
correlates .21 with the BEQ and .25 with emotional vocabulary, while behavior scores correlate .24 with emotional
vocabulary. The Wechsler Intelligence Scale for Children (WISC-IV; Wechsler, 2003) Full-Scale IQ, Verbal
Comprehension (VCI) and Perceptual Reasoning (PRI) subtests correlate positively with the KEDS total score, as
well as with affect and behavior subscales. KEDS total and affect scores correlate positively with Working
Memory (WMI).
Divergent/Discriminant
The KEDS total and cognition scores do not correlate with the Emotion Recognition Task (Baron-Cohen et al.,
1997). For total scores and for affective, cognitive, and behavioral subscales, older children exhibit
significantly higher mean scores on each scale than younger children. The KEDS total scale correlates negatively
(-.23) with the WCST-PE, while subscale correlations with the WCST-PE were as follows: affect (-.24), behavior
(-.18). Also, females score more highly on total KEDS and the cognition subscale than do males (Reid et al., 2012).
Construct/Factor Analytic
A principal components analysis with varimax rotation produced four components. The first component
exhibited the highest loadings on items with single figures, positive emotions, and unhappy situations where
affect could be inferred without other characters’ mental states; this component was labeled ‘Simple’. The second
component loaded on items of figures experiencing conflicting emotions or where an expectation was violated
(situations which involve reconciling two perspectives); this component was labeled ‘Complex’. The third
component entailed items where figures were in conflict, attacking, or taking advantage of another figure; this
component was labeled ‘Aggression’. The fourth component loaded on items from a scenario that reflected a
parent/child interaction and was labeled ‘Authority’.
Criterion/Predictive
No criterion/predictive validity evidence is currently available.
Location
Reid, C., Davis, D., Horlin, C., Anderson, M., Baughman, N., & Campbell, C. (2013). The kids’ empathic
development scale (KEDS): A multi-dimensional measure of empathy in primary school-aged children. British Journal
of Developmental Psychology, 31, 231-256.
Results and Comments
The KEDS aims to provide a comprehensive measure of empathy that overcomes problems in how an individual
estimates empathy, the simplicity of scenarios in other story-based scales, observer and expectancy bias that
transpires from self-report measures, as well as language constraints in young children. It also distinguishes
between empathy, sympathy, and distress. All KEDS scales (except the cognitive subscale) display significant
correlations with the WISC-IV and the VCI, suggesting that performance on the KEDS depends to some extent on a
child’s general verbal comprehension. The cognitive scale, unlike the affective and behavioral scales, in most
cases does not require the child to go beyond the stimulus picture to infer the answer.
NEUROSCIENTIFIC MEASURES OF EMPATHY
Can neuroscientific measures such as MRI be used to measure empathy? The answer is yes. To limit our
measurements of empathy to self-report or behavioral tasks would not satisfactorily progress research in the field.
We include neuroscientific measures here, highlighting their importance and future use (see Gerdes et al., 2010).
Magnetic Resonance Imaging (MRI)
(cf. Banissy, Kanai, Walsh, & Rees, 2012).
Variable
MRI is a magnetic field neuroimaging technique that produces non-invasive images of the internal structures
of the body, including the central nervous system.
Description
An MRI scanner uses a strong magnetic field to align atomic nuclei in the body and radio frequency fields to
perturb that alignment. The resulting signals are processed by the scanner to reconstruct an image of internal
structures. The MRI scanner produces excellent spatial resolution (approximately 2 mm or better) and high levels
of contrast between tissues of the brain. The MRI technique is essentially a measure of the volume of certain
brain regions, with the dependent variable being a volume measure (e.g., voxels). MRI does require compliance on
behalf of the participant to ensure accurate measurement (e.g., minimal movements during the scanning).
Sample
Due to the use of specialized equipment and the time-consuming testing protocol, empathy assessment using
MRI has typically used small sample sizes. In addition, it is also necessary that the participants are screened to
rule out the potential influence of a range of other factors on the measurements. Screening is done for history of
psychiatric or neurological disorders, use of medications that affect central nervous system function, head
trauma, substance abuse, and other serious medical conditions.
Reliability
Inter-Rater
Levin et al. (2004) reported that two technicians assessed MRI images on three separate occasions to assess
inter-rater reliability. Both technicians showed good intra-class correlations between trials 1 and 2 (ICC = .99
and 1.00) and between trials 2 and 3 (ICC = 1.00 and 1.00). These findings were replicated by Kumari et al. (2009).
Validity
Convergent/Concurrent
Certain brain regions subserve empathy (e.g., ACC, IFG) and so these are focused on in MRI (and fMRI)
research into empathy. Correlations between self-report measures such as the Interpersonal Reactivity Index (IRI)
and the Empathy Quotient (EQ) and these brain regions would seem to represent appropriate evidence of
convergent validity. Banissy et al. (2012) examined the correlations between grey matter and IRI scores in 118
healthy adults. They reported that Perspective Taking scores correlated positively with left anterior cingulate
volume (.25). Sassa et al. (2012) examined the neural correlates between grey matter volume and scores on the
child version of the EQ in 136 boys and 125 girls (aged from 5.6 to 15.9 years). EQ scores correlated
significantly (positively) with the regional grey matter volume of the precentral gyrus, the inferior frontal
gyrus, the superior temporal gyrus, and the insula. Hooker, Bruce, Lincoln, Fisher, and Vinogradov (2011)
examined the correlation between grey matter volume, IRI scores, and Quality of Life Scale (QLS) scores in 21
schizophrenia spectrum disorder patients and 17 healthy controls. Brain regions significantly associated with IRI
Perspective Taking were the hippocampus, anterior cingulate cortex (VMPFC), superior temporal gyrus, insula, and
precuneus. In addition, there were also some regions relating to QLS-Empathy, including the insula, precentral
gyrus, superior/middle frontal gyrus, and anterior cingulate cortex.
Divergent/Discriminant
Banissy et al. (2012) also reported evidence of divergent validity, wherein the IRI measure of Empathic
Concern was found to correlate significantly (negatively) with grey matter volume in the left inferior frontal
gyrus (-.36). Also, Empathic Concern scores were significantly and negatively associated with left precuneus
(-.27), left anterior cingulate (-.25), and left insula volume (-.35).
Location
Banissy, M.J., Kanai, R., Walsh, V., & Rees, G. (2012). Inter-individual differences in empathy are reflected in
human brain structure. NeuroImage, 62, 2034-2039.
Results and Comments
MRI measures neuroanatomical structures that subserve empathy. Taken together, the findings indicate that the
cognitive component of empathy is associated with grey matter volume of the ventromedial prefrontal cortex
(vmPFC), whereas the affective component of empathy is associated with grey matter volume of the inferior frontal
gyrus, insula and precuneus. MRI cannot show the empathic process in action. The extent to which the size of a
given brain structure reflects a particular level of empathy remains to be determined.
Functional Magnetic Resonance Imaging (fMRI)
(cf. Singer, 2006).
Variable
Functional magnetic resonance imaging (fMRI) is an extension of MRI in which high resolution images of
activity levels in neural structures are obtained. Whereas MRI provides images of structural brain anatomy, fMRI
provides real-time images of brain activity by detecting increased blood supply and metabolic function (Blood
Oxygen Level Dependence or BOLD).
Description
A common technique in fMRI is blood oxygen level dependency (BOLD), which measures the hemodynamic
response related to energy use in neurons. Those neurons that are more active will consume more oxygen. fMRI
measures are used with tasks or stimuli that elicit empathy (e.g., PVP) and the corresponding brain activation is
measured. Like MRI, fMRI has excellent spatial resolution (approximately 2 mm), but has comparatively
poorer temporal resolution (500 to 1000 ms). Another technique that produces spatial representations of active
neurons is positron emission tomography (PET). However, this method has not been used extensively in empathy
research (e.g., see Ruby & Decety, 2004; Shamay-Tsoory et al., 2005). Research using fMRI reveals that the
following brain regions are associated with the empathic response: medial, dorsal medial, ventromedial and
ventrolateral prefrontal cortex (Krämer, Mohammadi, Doñamayor, Samii, & Münte, 2010; Lawrence et al., 2006;
Seitz et al., 2008), superior temporal sulcus (Krämer et al., 2010), presupplementary motor area (Seitz et al.,
2008; Lawrence et al., 2006), insula and supramarginal gyrus (Lawrence et al., 2006; Carr, Iacoboni, Dubeau,
Mazziotta, & Lenzi, 2003), and amygdala (Carr et al., 2003). Some of these findings have been extended to children
(Pfeifer, Iacoboni, Mazziotta, & Dapretto, 2008).
Sample
As with MRI, due to the use of specialized equipment and the time-consuming testing protocol, empathy
assessment using fMRI has typically used small sample sizes. In addition, it is also necessary that the participants
are screened to rule out the potential influence of a range of other factors on the measurements. Screening is
done for history of psychiatric or neurological disorders, use of medications that affect central nervous system
function, head trauma, substance abuse, and other serious medical conditions. It is also standard practice that
researchers state the number of right-handed participants due to the laterality of brain functions. In many fMRI
studies, IQ scores and confirmation of normal or corrected-to-normal vision are also stated.
Most research has used healthy adult participants recruited from the university population or local
community. This has included Carr et al. (2003) who used 7 males and 4 females with a mean age of 29.0 years
(range = 21 to 39), Lawrence et al. (2006) who used 6 males and 6 females with a mean age of 32.2 years
(SD = 9.95), Jackson et al. (2005) who used 8 males and 7 females with a mean age of 22.0 years (SD = 2.6 years),
Gazzola, Aziz-Zadeh, and Keysers (2006) who used 7 males and 9 females with a mean age of 31 years
(range = 25 to 45), Seitz et al. (2008) who used 7 males and 7 females with a mean age of 28.6 years (SD = 5.5),
Hooker, Verosky, Germine, Knight, and D’Esposito (2010) who used 8 males and 7 females with a mean age of
21.0 years (range = 18 to 25), and Krämer et al. (2010) who used 11 males and 6 females with a mean age of 27.8
years (SD = 4.8). Unlike prior research that has used samples consisting of males and females, Nummenmaa et al.
(2008) used only females (N = 10) with a mean age of 26 years (SD = 5.6 years). The researchers cited maximizing
statistical power as the reason for the female-only sample because females were argued to experience generally
more intense emotional responsivity. Sex differences in fMRI were specifically examined by Schulte-Rüther et al.
(2008) who used 12 males with a mean age of 24.4 years (SD = 3.0) and 14 females with a mean age of 24.8 years
(SD = 3.7). Xu et al. (2009) examined ethnic differences with a sample of eight male and nine female Chinese
college students (mean age = 23.0 years, SD = 2.0) and eight male and eight female Caucasian college students
(mean age = 23.0 years, SD = 3.7). Few studies have used adolescent or child samples. Sterzer, Stadler,
Poustka, and Kleinschmidt (2007) used 12 male adolescents with conduct disorder (mean age of 12.75 years,
SEM = 0.49) recruited from clinics of the Department of Child and Adolescent Psychiatry in Germany and
compared this sample with 12 healthy male adolescents (mean age of 12.5 years, SEM = 0.45). Pfeifer et al. (2008)
used a sample of 16 children (nine boys and seven girls) aged from 9.6 to 10.8 years (M = 10.2 years, SD = 0.4).
Reliability
Activations across the entire brain consistently resulted in positive correlations for lateralized indices of
encoding (r = .82) and recognition (r = .59) (Wagner et al., 2005, p. 126).
Test-Retest
Wagner et al. (2005) investigated test-retest reliability of activation patterns elicited in the medial temporal
lobes using fMRI and a verbal episodic memory paradigm over a 7 to 10-month time interval. They reported
significant test-retest coefficients of medial temporal lobe activations for encoding (r = .41) but not for
recognition (r = -.24).
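Test-retest coefficients such as those above are Pearson correlations between the same measure taken at two sessions. The following is a minimal sketch of that calculation; the function name `pearson_r` and the session values are hypothetical and do not reproduce the data of Wagner et al. (2005).

```python
from math import sqrt


def pearson_r(session_1, session_2):
    """Pearson correlation between paired measurements from two sessions."""
    n = len(session_1)
    if n != len(session_2) or n < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    mean_1 = sum(session_1) / n
    mean_2 = sum(session_2) / n
    # Sums of cross-products and squared deviations from each session mean.
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(session_1, session_2))
    ss_1 = sum((a - mean_1) ** 2 for a in session_1)
    ss_2 = sum((b - mean_2) ** 2 for b in session_2)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("a session with zero variance has no defined correlation")
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical activation indices for six participants at test and at retest.
test_scores = [0.42, 0.55, 0.31, 0.60, 0.48, 0.37]
retest_scores = [0.40, 0.50, 0.35, 0.58, 0.52, 0.33]
print(round(pearson_r(test_scores, retest_scores), 2))
```

A coefficient near 1 indicates that participants keep their relative standing across sessions, whereas a coefficient near zero or below, as reported for recognition, indicates that they do not.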
Validity
Convergent/Concurrent
As indicated above in relation to MRI, certain brain regions subserve empathy (e.g., ACC, IFG) and these are
also focused on in fMRI research into empathy. Evidence of convergent validity between self-report empathy
scales and fMRI has been obtained. The IRI Perspective Taking subscale correlates positively with activation of a
mirror neuron system for auditory stimuli related to motor execution (Gazzola et al., 2006). Activation in the
somatosensory cortex, inferior frontal gyrus, superior temporal sulcus, and middle temporal gyrus was positively
correlated with self-reported cognitive empathy as measured by the IRI Perspective Taking and Fantasy subscales
(Hooker et al., 2010). Activity in the precentral gyrus was also significantly correlated with IRI Empathic
Concern and IRI Personal Distress subscales (Hooker et al., 2010). Sterzer et al. (2007) reported that anterior
insula activity was positively associated with Impulsiveness-Venturesomeness-Empathy Questionnaire (Eysenck &
Eysenck, 1991) scores. Singer et al. (2004) reported that activation in the ACC and left anterior insula was
positively correlated with scores on the BEES (ACC: r = .52; left insula: r = .72) and the IRI Empathic Concern
subscale (ACC: r = .62; left insula: r = .52). A significant correlation (r = .77) has been found between fMRI
medial prefrontal cortex activity and favorable ingroup biases (ingroup-outgroup) in ratings of the amount of
empathy felt towards individuals in pain scenarios (1 = not at all to 4 = very much; Mathur et al., 2010).
Shamay-Tsoory et al. (2005) used Positron Emission Tomography (PET) and showed that the cerebellum, thalamus,
occipitotemporal cortex, and frontal gyrus were more strongly activated during an empathy-eliciting interview
than a neutral interview.
Divergent/Discriminant
Xu et al. (2009), using fMRI, showed that Caucasian and Chinese participants who viewed images of faces
receiving a painful injection showed more activity of the ACC and insular cortex if those images depicted people
of their own ethnicity than if they depicted people of another ethnic group. African-American participants have
shown greater activity of the medial prefrontal cortex when viewing members of their same ethnic group than
other ethnic groups (Mathur et al., 2010).
Likewise, sex differences in brain regions activated during fMRI are apparent from various research studies.
For example, Schulte-Rüther et al. (2008) tested 12 males and 14 females in a picture viewing paradigm.
Participants viewed synthetic fearful or angry faces and were asked to concentrate on their own feelings when
viewing the faces (self-task) or on the emotional state in the target (other-task). Female participants scored more
highly on the BEES and rated the intensity of their own emotions when viewing the stimuli as higher than male
participants. Sex differences in fMRI were found in the comparison of the self-task with a baseline task wherein
females showed stronger activation of the right inferior frontal cortex, right superior temporal sulcus, and right
cerebellum than males. Males showed stronger activation of the left temporoparietal junction than females. In the
comparison of the other-task with the baseline, females showed stronger activation in the inferior frontal cortex
than males.
Criterion/Predictive
Using fMRI, Jackson et al. (2005) asked participants to imagine the feelings of another person and oneself in
painful situations and to rate the pain level from different perspectives. Adopting the perspective of the other
person was found to correlate positively with regional activation in the posterior cingulate/precuneus and right
temporo-parietal junction. Jackson et al. (2006) found in a sample of 15 healthy adults that subjective ratings of
pain of targets in photographic stimuli correlated significantly with activity in the anterior cingulate cortex,
suggesting predictive validity for the brain region activations, and possibly of empathy for pain in others.
Nummenmaa et al. (2008) compared fMRI scans to images designed to elicit affective or cognitive components of
empathy. The cognitive empathy conditions depicted targets in everyday situations, whereas the affective
conditions depicted targets in harm, threat, or suffering situations. The affective condition elicited greater
activation of the thalamus (emotion processing), fusiform gyrus (face and body perception), and inferior parietal
lobule and premotor cortex (mirroring of motor actions) than did the cognitive condition.
Location
Singer, T. (2006). The neuronal basis and ontogeny of empathy and mind reading: Review of literature and
implications for future research. Neuroscience and Biobehavioral Reviews, 30, 855-863.
Results and Comments
Among fMRI and PET research analyzing empathy, most of the studies have investigated empathy for pain
(Jackson et al., 2005), disgust (Wicker et al., 2003; Benuzzi, Lui, Duzzi, Nichelli, & Porro, 2008), threat
(Nummenmaa et al., 2008) and pleasantness (Jabbi, Swart, & Keysers, 2007). Research that examines empathy
using stimuli depicting facial expressions in different situations or social interactions is at risk of confounding
empathy with emotion perception. In addition, fMRI and PET research is interpreted to reflect the neural
responses related to empathy. However, it might be argued that such responses are actually related to aversive
responses coupled with motor preparation for defensive actions in general (Yamada & Decety, 2009).
Facial Electromyography (EMG)
(cf. Westbury & Neumann, 2008).
Variable
Electromyography is the measurement of the electrical potentials produced by skeletal muscles when they
contract (Neumann & Westbury, 2011). In contrast to alternative approaches to measuring facial expressions (e.g.,
observer ratings), EMG activity has the advantage of being able to detect muscle activity that occurs below the
visual threshold. It provides a non-verbal index of motor mimicry which many theorists argue underlies
empathic responding (e.g., Preston & de Waal, 2002).
Description
Facial EMG recordings can be obtained by attaching small surface electrodes on the skin over the site of the
muscles that play a role in the facial expression of interest. These muscles are primarily the corrugator supercilii,
zygomaticus major, lateral frontalis, medial frontalis, levator labii superioris, orbicularis oculi, and masseter.
Inferences regarding the intensity of the facial expression are gained by measuring the magnitude of the EMG signal.
However, the application of electrodes onto the face may increase awareness of facial expressiveness and lead to
exaggerated facial reactions or more general demand characteristics.
Sample
The four studies that have examined facial EMG measurement of empathy have sampled from healthy adult
university populations. Westbury and Neumann (2008) used 36 male and 37 female university students with a
mean age of 22.5 years (SD = 6.9). Similarly, Sonnby-Borgström (2002) used 21 male and 22 female university
students with a median age of 23 years (range 18 to 37) and Sonnby-Borgström, Jönsson, and Svensson (2003) used
36 male and 24 female university students with a median age of 22 years (range 19 to 35). Brown, Bradley, and
Lang (2006) recruited two samples from a university population: one consisting of 21 male and 22 female African
Americans and the other 20 male and 20 female European Americans. The ages for each sample were not
described, although it was reported that 98% of the total sample were aged between 17 and 25 years.
Reliability
Internal Consistency
Westbury and Neumann (2008) reported a Cronbach alpha coefficient of .92 over all stimuli used in a picture-
viewing paradigm.
Test-Retest
‘Facial EMG shows moderate test-retest stability over relatively long intervals...’ (Harrigan, Rosenthal, &
Scherer, 2008, p. 41).
Validity
Convergent/Concurrent
‘Facial EMG has high concurrent validity with visible intensity changes in onset phase of zygomatic major,
with average correlation above 0.90.’ (Harrigan et al., 2008, p. 40). Westbury and Neumann (2008) reported that
ratings on the BEES were significantly correlated with corrugator EMG activity when viewing images of human
and non-human animals in negative circumstances (r = .35). In addition, subjective ratings of empathy towards
the targets in the images were significantly correlated with corrugator EMG activity (r = .41). Subjective empathy
ratings and corrugator EMG showed the same pattern across different animal groups (e.g., higher for human
targets than bird targets). Activity of the orbicularis oculi muscle when viewing another person receiving painful
sonar treatment has been shown to be significantly correlated with scores on the IRI Perspective Taking subscale
(r = .39). Facial EMG during pictures of happy and angry facial expressions has been shown to be correlated with
scores on the EETS (Sonnby-Borgström, 2002; Sonnby-Borgström et al., 2003). In recordings of the orbicularis
oculi, indicative of wincing, participants showed greater activity relative to a pre-stimulus baseline when
viewing others undergoing painful sonar treatment when taking the perspective of the other person (Lamm, Porges,
Cacioppo, & Decety, 2008).
Divergent/Discriminant
Brown et al. (2006) conducted a study in which African American and European American participants
viewed images depicting pleasant and unpleasant facial expressions. African American participants showed
larger corrugator EMG responses to unpleasant pictures of Black targets than to unpleasant pictures of White
targets. However, the same ethnic difference was not found in the European American participants. Sex differences
may also be observed in facial EMG (Dimberg & Lundquist, 1990).
Location
Westbury, H.R., & Neumann, D.L. (2008). Empathy-related responses to moving film stimuli depicting human
and non-human animal targets in negative circumstances. Biological Psychology, 78, 66-74.
Results and Comments
EMG activity is advantageous in its ability to detect muscle activity that occurs below the visual threshold.
However, researchers should take care to ensure that any motor mimicry observed through facial EMG recording
reflects the stimuli the participant is being exposed to and not other stimuli. For example, corrugator EMG can be
elicited by non-facial visual stimuli and even sounds (Larsen, Norris, & Cacioppo, 2003).
Electroencephalogram (EEG) and Event Related Potentials (ERPs)
(cf. Neumann & Westbury, 2011).
Variable
The EEG and ERP measure the electrical activity produced by the firing of neurons in the brain, as recorded at
the scalp. The firing of the neurons is presumed to reflect psychological processes, including the empathic
response. Short-term changes in the EEG that follow a stimulus or event are termed event-related potentials (ERPs).
Description
The recordings are taken through electrodes placed on the surface of the scalp. Electrode locations are based
on the 10-20 System that defines regions as frontal (F), central (C), parietal (P), temporal (T), and occipital (O).
Electrode caps are designed to correspond to these regions and may contain 32, 64, 128, or 256 potential electrode
locations. The EEG signal is characterized according to the pattern of brain waves defined according to the
frequency band in which they are found. The frequency bands include alpha (8 to 13 Hz), beta (14 to 30 Hz), gamma
(30 to 100+ Hz), theta (4 to 7 Hz), and delta (0.5 to 3.5 Hz). ERPs are described in terms of whether the potential
is a positive or negative wave and the latency at which the wave occurs. The N100, for example, is a negative
change that occurs approximately 100 ms following stimulus onset. EEG and ERP show excellent temporal
resolution by being able to sample brain activity at 2000 Hz or better.
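The band boundaries and the ERP naming convention described above can be expressed as a short sketch. The function names `classify_band` and `parse_erp_label` are hypothetical. Two further assumptions are made because the listed ranges leave gaps (e.g., between 7 and 8 Hz) and meet at 30 Hz: frequencies falling in a gap are returned as unclassified, and 30 Hz is assigned to beta, with gamma taken as anything above 30 Hz.

```python
import re

# Band limits in Hz as listed in the text (inclusive at both ends).
BANDS = [
    ("delta", 0.5, 3.5),
    ("theta", 4.0, 7.0),
    ("alpha", 8.0, 13.0),
    ("beta", 14.0, 30.0),
]


def classify_band(freq_hz):
    """Return the EEG band name for a frequency, or None if it falls in a gap."""
    for name, low, high in BANDS:
        if low <= freq_hz <= high:
            return name
    # Assumption: gamma is open-ended above 30 Hz ("30 to 100+ Hz" in the text).
    if freq_hz > 30.0:
        return "gamma"
    return None


def parse_erp_label(label):
    """Split an ERP label such as 'N100' into (polarity, latency in ms)."""
    match = re.fullmatch(r"([NP])(\d+)", label)
    if match is None:
        raise ValueError(f"not a recognized ERP label: {label!r}")
    polarity = "negative" if match.group(1) == "N" else "positive"
    return polarity, int(match.group(2))


print(classify_band(10.0))      # alpha
print(parse_erp_label("N100"))  # ('negative', 100)
```

The latency in an ERP label is nominal: as noted above, the N100 occurs approximately, not exactly, 100 ms after stimulus onset.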
Sample
Light et al. (2009) examined data from children aged 6 years (8 children), 7 years (25 children), 8 years (45
children), 9 years (27 children) and 10 years (6 children). The resulting sample had a mean age of 7.92 years
(SD = 0.98) and consisted of 52 males and 56 females. In their research, Gutsell and Inzlicht (2012) tested 17 male
and 13 female White right-handed university students with a mean age of 18.46 years (SD = 3.81). Mu, Fan, Mao,
and Han (2008) recruited 11 male and 4 female adults with a mean age of 20.8 years (SD = 1.82) who were
all right-handed and had normal vision. Similarly, Han, Fan, and Mao (2008) recruited 13 males (mean age = 20.9
years, SD = 2.25) and 13 females (mean age = 21.0 years, SD = 1.47) who were screened for normal or
corrected-to-normal vision and were all right-handed.
Reliability
Schmidt et al. (2012) reported evidence of ERP reliability and split-half reliability.
Test-Retest
Schmidt et al. (2012) also reported test-retest reliability in absolute frontal (r = .86 to .87), central (r = .94),
and parietal (r = .95 to .96) EEG alpha power for a resting condition over a one-week period. Williams, Simms,
Clark, and Paul (2005) reported that over a 4-week interval, for both eyes open and eyes closed resting periods (of
two minutes duration), EEG data did not differ across sessions with respect to alpha, beta, theta, and delta
waves. Test-retest reliability coefficients (r = .71 to .95) were reported with larger reliability coefficients for
eyes open as compared with eyes closed conditions. Numerous other studies have also provided evidence for
test-retest reliability of up to 1 year (Cassidy, Robertson, & O’Connell, 2012; Hämmerer, Li, Völkle, Müller, &
Lindenberger, 2012; Segalowitz & Barnes, 2007). Williams et al. (2005) revealed that for oddball targets, N100
amplitude and latency (.76 and .72 respectively), P200 amplitude (.68), N200 amplitude and latency (.47 and .71
respectively) and P300 latency (.56) all provided significant partial correlations over a 4-week interval. For
oddball non-targets, N100 amplitude and latency (.74 and .63 respectively) and P200 amplitude and latency (.82 and
.62 respectively) also showed moderate test-retest reliability. Furthermore, Williams et al. (2005) provided
test-retest reliability coefficients for P150 amplitude and late