
Viewers extract the mean from images of the same person: A route to face learning

Robin S. S. Kramer
School of Psychology, University of Aberdeen, Aberdeen, UK
Department of Psychology, University of York, York, UK
Kay L. Ritchie
School of Psychology, University of Aberdeen, Aberdeen, UK
Department of Psychology, University of York, York, UK
A. Mike Burton
School of Psychology, University of Aberdeen, Aberdeen, UK
Department of Psychology, University of York, York, UK
Research on ensemble encoding has found that viewers
extract summary information from sets of similar items.
When shown a set of four faces of different people,
viewers merge identity information from the exemplars
into a representation of the set average. Here, we
presented sets containing unconstrained images of the
same identity. In response to a subsequent probe,
viewers recognized the exemplars accurately. However,
they also reported having seen a merged average of
these images. Importantly, viewers reported seeing the
matching average of the set (the average of the four
presented images) more often than a nonmatching
average (an average of four other images of the same
identity). These results were consistent for both
simultaneous and sequential presentation of the sets.
Our findings support previous research suggesting that
viewers form representations of both the exemplars and
the set average. Given the unconstrained nature of the
photographs, we also provide further evidence that the
average representation is invariant to several high-level
characteristics.
Introduction
When viewers are shown sets of perceptually similar
items, there is growing evidence to suggest that
summary statistics, such as the mean, may be
represented. This ‘‘ensemble encoding’’ is thought to
provide an efficient way of summarizing both low-level
and more complex scene information. For example,
when participants were shown sets containing circles of
different sizes, they tended incorrectly to identify a test
circle as having been present when it had a similar size
to the mean of the set (Ariely, 2001). In addition,
participants were near chance when asked to identify
which circles had actually been present. This now
common pattern of findings is interpreted as viewers
forming an accurate representation of the average of a
set while retaining less (if any) information regarding
individual exemplars. As well as basic size averaging,
similar results have been found, for example, with
judgments of orientation (Robitaille & Harris, 2011),
speed (Atchley & Andersen, 1995), and dynamic
displays (Albrecht & Scholl, 2010).
More recently, researchers have begun to consider
whether viewers also encode the average for a more
complex set of stimuli: human faces. Evidence suggests
that participants form an accurate representation of the
average emotional expression (Haberman, Harp, &
Whitney, 2009; Haberman & Whitney, 2007, 2009),
gender (Haberman & Whitney, 2007), and gaze
direction (Sweeny & Whitney, 2014) from a set of faces.
Further, this process is not mediated by low-level
features, luminance cues, or other nonconfigural cues
(Haberman & Whitney, 2009). Of note, given the rapid
extraction of summary information (e.g., the average
expression from 16 faces presented for 500 ms or less),
this ensemble encoding is likely distinct from the
prototype effect (building an abstract prototypical
representation based on repeated occurrences), which
typically operates on the order of minutes (e.g., Fiser
& Aslin, 2001).
Citation: Kramer, R. S. S., Ritchie, K. L., & Burton, A. M. (2015). Viewers extract the mean from images of the same person: A route to face learning. Journal of Vision, 15(4):1, 1–9, http://www.journalofvision.org/content/15/4/1, doi:10.1167/15.4.1.
Journal of Vision (2015) 15(4):1, 1–9, http://www.journalofvision.org/content/15/4/1
doi: 10.1167/15.4.1. ISSN 1534-7362. © 2015 ARVO. Received October 29, 2014; published April 17, 2015
Downloaded From: http://jov.arvojournals.org/pdfaccess.ashx?url=/data/Journals/JOV/933740/ on 04/20/2015 Terms of Use:
Several studies have now demonstrated that identity
information is also represented by summary statistics.
When shown four images of different people, participants
extract the mean identity, whether these faces are
unfamiliar (de Fockert & Wolfenstein,
2009) or familiar (Neumann, Schweinberger, & Burton,
2013). Importantly, by presenting sets of faces that
incorporated different viewpoints, subsequent research has
ruled out the possibility that viewers are simply extracting
the mean retinal image (Leib et al., 2014). As
such, the evidence suggests that ensemble encoding can
operate on viewpoint-invariant representations, which
broadens its applicability and usefulness for real-world
scenes.
To date, research investigating the ensemble encoding
of identity has only considered sets of multiple identities.
The suggestion is that, by averaging across people,
viewers form the gist of a crowd, for example.
Although it may be useful to encode the average
expression (‘‘this crowd looks angry’’), it is less clear
why a representation of the average identity may be
beneficial (‘‘the average of all these people’s faces would
look like this’’). In contrast, if we are exposed to
multiple instances of a single person, perhaps over
several encounters or movies, then encoding the
average of those instances has clear advantages. The
average of a set of instances can provide a stable
representation of an individual by washing away
aspects of the set that change from one photo to the
next while preserving aspects that are consistent across
the set (Burton, Jenkins, Hancock, & White, 2005;
Jenkins & Burton, 2008). These representations are also
robust to errors in that incorporating a few photographs
of the wrong person makes little difference to
the average (Jenkins, Burton, & White, 2006; see also
Haberman & Whitney, 2010, for evidence that the
visual system discounts outliers when encoding the
average emotional expression). As such, encoding an
average for a within-person set of images may underpin
the process of becoming familiar with a face through
accumulated exposure to different instances. In the current studies,
we focus on this within-person encoding by only
including images of a single identity in each trial.
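A toy simulation can make the ‘‘washing away’’ argument concrete. The sketch below rests on assumptions that are not in the paper: each photo is treated as a feature vector made of a stable identity signal plus independent Gaussian noise standing in for photo-to-photo variation (lighting, pose, expression, camera), and `DIM`, `NOISE_SD`, `average_error`, and the other helpers are hypothetical names. It is not image morphing. Under these assumptions, the distance between the set average and the identity signal shrinks roughly as `NOISE_SD * sqrt(DIM / n)`, and one photo of the wrong person in a larger set shifts the average only slightly.

```python
import random

# Toy model, assumed purely for illustration (not the authors' procedure):
# each photo is a feature vector = stable identity signal + photo-specific noise.
DIM = 50         # hypothetical number of features per photo
NOISE_SD = 1.0   # hypothetical size of photo-to-photo variation


def make_identity(rng: random.Random) -> list[float]:
    """The stable, person-specific part of the representation."""
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def photo_of(identity: list[float], rng: random.Random) -> list[float]:
    """One unconstrained photo: the identity plus independent per-photo noise."""
    return [x + rng.gauss(0.0, NOISE_SD) for x in identity]


def average(photos: list[list[float]]) -> list[float]:
    """Feature-wise mean across a set of photos."""
    return [sum(column) / len(photos) for column in zip(*photos)]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_error(n_photos: int, n_wrong: int = 0, seed: int = 0) -> float:
    """Distance from the set average to the person's true identity signal.

    n_wrong of the n_photos come from a different person, mimicking a set
    contaminated with photos of the wrong identity.
    """
    rng = random.Random(seed)
    person, stranger = make_identity(rng), make_identity(rng)
    photos = [photo_of(person, rng) for _ in range(n_photos - n_wrong)]
    photos += [photo_of(stranger, rng) for _ in range(n_wrong)]
    return distance(average(photos), person)


for n in (1, 4, 30):
    print(f"{n:>2} photos: error {average_error(n):.2f}")
print(f"30 photos, 1 wrong person: error {average_error(30, n_wrong=1):.2f}")
```

Running it should show the error falling as photos are added, with the contaminated 30-photo set still much closer to the identity signal than a clean set of four; the exact numbers depend on the seed.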
Although some experiments involving faces have
utilized simultaneous presentation (e.g., de Fockert &
Wolfenstein, 2009; Neumann et al., 2013), others have
implemented a sequential design (e.g., Haberman et al.,
2009; Leib et al., 2014). Even when viewers were
presented with face images one after the other, results
have demonstrated that the average of these images was
encoded. In the following two studies, we investigate
both methods of presentation.
Previous experiments have, for the most part, used
relatively homogeneous, grayscale stimuli (e.g., with
little variability in pose). By specifically varying pose,
Leib and colleagues (2014) demonstrated how encoded
representations are viewpoint-invariant. Here, we use
‘‘ambient’’ color images (Jenkins, White, Van Montfort,
& Burton, 2011). These are photographs sampled
from the real world, and they incorporate a great deal
of variability in pose but also in lighting, expression,
focal length, etc. Therefore, encoding of the average of
these images would need to take place at a sufficiently
high level to deal with these differences.
Finally, given evidence of identity averaging for both
unfamiliar and familiar faces (de Fockert & Wolfenstein,
2009; Neumann et al., 2013), we included
consideration of familiarity in the current research in
order to allow for a direct comparison of these two
categories of faces. For within-person set averaging, it
may be that encoding the average of a familiar person is
in some way disrupted by our previous experiences with
that identity, including exposure to a potentially large
number of prior exemplars. Equally, recognition of that
identity may hinder (or help) the encoding process.
Experiment 1: Simultaneous presentation
In this experiment, we presented participants with
four images of the same person simultaneously. As
such, we investigated whether participants represented
simultaneously presented images of the same person as
an average.
Methods
Participants
Twenty undergraduate students from the University
of Aberdeen (12 women; age M = 24.1 years, SD = 9.3
years) volunteered to take part in the study and
received money for their participation. All provided
informed consent prior to participation (in accordance
with the ethical standards stated in the 1964
Declaration of Helsinki).
Stimuli
Thirty images were downloaded from the Internet
for each of 20 celebrities. We used celebrity photos in
order to ensure that many images of each person were
available. Ten of these celebrities (five women) were
Hollywood actors and so were chosen to be familiar to
participants. The other 10 (five women) were Australian
celebrities, who were selected in order that they
would be unfamiliar to UK participants.
For each identity, we entered the name into Google
Images as a search term along with criteria specifying
full-color, large, face images only. We then chose the
first 30 images delivered that met the following criteria:
(a) no part of the face should be obscured (for example
by clothing, glasses, or a hand); (b) pose should be very
broadly full-face in order to allow the placement of
landmarks; and (c) pose should be standing or sitting,
but not lying down, in order to limit the angle of the
head to relatively upright. Note that as a result of
obtaining images from the Internet, image variation
(lighting, pose, expression, age, etc.) for each identity
was large (for an example, see Figure 1). All images
were cropped and rotated so that both pupils were
aligned to the same transverse plane. Images were also
resized so that they appeared as 11.8 cm high with a
varying width of approximately 7.5 cm on-screen.
The first 28 images of each identity were divided into
seven sets of four images, arbitrarily based on the order
in which they were downloaded. For each set, the
average was created by morphing across the four
images using custom MATLAB software. The first four
sets were chosen as the display sets, i.e., those that
participants would view during the experiment. The
other three formed nondisplay sets and provided
additional averages for use as test faces (see below).
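The split of each identity's images into display and nondisplay sets can be written as a short function. This is a sketch that follows the description above; the function name and the integer image identifiers are hypothetical, and the averaging step itself (morphing with the authors' custom MATLAB software) is not reproduced here.

```python
def partition_images(image_ids, set_size=4, n_sets=7, n_display=4):
    """Split one identity's images (in download order) into sets of four.

    Follows the description in the text: the first 28 images form seven
    sets of four; the first four sets are display sets, and the remaining
    three are nondisplay sets used only to build additional test averages.
    """
    needed = set_size * n_sets
    if len(image_ids) < needed:
        raise ValueError(f"need at least {needed} images, got {len(image_ids)}")
    used = list(image_ids[:needed])
    sets = [used[i:i + set_size] for i in range(0, needed, set_size)]
    return sets[:n_display], sets[n_display:]


# Hypothetical identifiers 1..30 for one celebrity's downloaded images.
display_sets, nondisplay_sets = partition_images(list(range(1, 31)))
print(display_sets[0])      # [1, 2, 3, 4]
print(nondisplay_sets[-1])  # [25, 26, 27, 28]
```

In this sketch images 29 and 30 belong to no set, since only the first 28 images are grouped.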
Procedure
The procedure closely followed that of Neumann
and colleagues (2013). Participants were shown four
trials for each identity (80 trials in total). In each trial, a
central fixation cross appeared for 1 s. This was
followed by four images presented simultaneously (the
display set) for 1500 ms with each image randomly
assigned to one of four specified positions on-screen.
Immediately following the display set (interstimulus
interval [ISI] = 0 ms), a test face was presented for 500 ms,
smaller in size than the display set images (7.9 cm ×
approximately 5 cm). Participants used both index
fingers to indicate via button press whether the test face
had or had not been present in the previous display set.
Test faces were (a) a matching exemplar (a randomly
selected image from the preceding display set), (b) a
nonmatching exemplar (a randomly selected image that
was not seen individually or within an average for that
identity in other trials), (c) a matching average (the
average of the four display set images), or (d) a
nonmatching average (the average of four different
images, randomly selected from the nondisplay sets).
Figure 1 provides an example display set and possible
test faces. A blank screen lasting 2200 ms followed the
test face, allowing for a total response window of 2700
ms.
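The trial timing above can be laid out as a simple event list, which makes the 2700 ms response window (test face plus blank screen) explicit. This sketch assumes the response window starts at test-face onset and models no inter-trial interval, since none is reported here; the names are illustrative.

```python
# Experiment 1 trial events and durations (ms), taken from the Procedure.
# No inter-trial interval is reported, so none is modeled.
EXP1_EVENTS = [
    ("fixation cross", 1000),
    ("display set", 1500),   # four faces shown simultaneously
    ("ISI", 0),
    ("test face", 500),
    ("blank screen", 2200),
]


def timeline(events):
    """Return (label, onset_ms, offset_ms) for events shown back to back."""
    t = 0
    out = []
    for label, duration in events:
        out.append((label, t, t + duration))
        t += duration
    return out


def response_window(events, start_label="test face"):
    """Time from the onset of `start_label` to the end of the last event."""
    steps = timeline(events)
    start = next(on for label, on, _ in steps if label == start_label)
    return steps[-1][2] - start


for label, on, off in timeline(EXP1_EVENTS):
    print(f"{label:<15} {on:>5}-{off:>5} ms")
print("response window:", response_window(EXP1_EVENTS), "ms")  # 2700 ms
```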
For each of these four conditions, 20 trials were
presented—one for each identity. These 80 trials were
presented in a random order for each participant.
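The fully crossed design (4 test-face conditions × 20 identities, shuffled per participant) can be sketched as below. The identity labels and function name are hypothetical, and the text does not say which of an identity's four display sets was paired with which condition, so that assignment is left out. The sketch also shows that only matching-exemplar trials, a quarter of the total, have ‘‘present’’ as the correct answer.

```python
import random

# The four test-face conditions; only a matching exemplar was actually shown.
CONDITIONS = (
    "matching exemplar",
    "nonmatching exemplar",
    "matching average",
    "nonmatching average",
)


def build_trials(identities, seed=None):
    """One trial per identity x condition, in a fresh random order."""
    rng = random.Random(seed)
    trials = [
        {
            "identity": identity,
            "condition": condition,
            "correct": "present" if condition == "matching exemplar" else "absent",
        }
        for identity in identities
        for condition in CONDITIONS
    ]
    rng.shuffle(trials)
    return trials


identities = [f"celebrity_{i:02d}" for i in range(1, 21)]  # hypothetical labels
trials = build_trials(identities, seed=1)
share_present = sum(t["correct"] == "present" for t in trials) / len(trials)
print(len(trials), share_present)  # 80 0.25
```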
It is important to note that for each of the four trials
for a given identity, all images in the display sets and all
test face exemplars and averages contained only images
of that identity. As such, our focus was solely on
within-person representations.
Prior to the experiment proper, participants were
given 16 practice trials and provided with trial-by-trial
on-screen feedback on their accuracy. (No feedback
was given during the actual experiment.) Note that the
correct answer to average test faces is always ‘‘absent.’’
In order to prevent participants from learning this
association, averages were not presented in the practice
block. None of the four practice identities appeared in
the experimental block.
After completing the practice and experimental
blocks, the familiarity of the 20 experiment identities
was checked by giving participants a new image (which
was not part of the stimulus set) of each celebrity on a
printed sheet and asking if they were familiar with that
person. It was made clear that familiarity referred to
prior experience rather than what participants had
seen during the experiment. As expected, familiarity
with Hollywood celebrities was high (number of
identities recognized M = 8.5, SD = 1.7), and
familiarity with Australian celebrities was low (M = 0.7,
SD = 0.8).
Figure 1. An example display set, followed by the four possible
test faces. From left to right, a member of the presented set, an
exemplar that did not appear in any display set, the morphed
average of the presented set, and the morphed average of a
nondisplay set. Note the variability in the display set images.
(Copyright restrictions prevent publication of the original
images used in these experiments. Images shown here, also
used in Figure 3, feature an identity who did not appear in the
experiments. She has given permission for her images to be
reproduced here.)
Results and discussion
Response data for Experiment 1 are shown in Figure 2.
Data were entered into a 2 (Familiarity: Familiar,
Unfamiliar) × 2 (Image Type: Exemplar, Average) × 2
(Test Face: Matching, Nonmatching) ANOVA. All
factors were within-subjects. We found a significant
main effect of Test Face, F(1, 19) = 213.87, p < 0.001,
$\eta_p^2 = 0.92$, with participants responding ‘‘present’’ more
often for test faces that matched the preceding set (M =
80.4%) than for those that did not (M = 41.3%). We
also found a significant Familiarity × Test Face
interaction, F(1, 19) = 24.41, p < 0.001, $\eta_p^2 = 0.56$.
Simple main effects showed higher ‘‘present’’ responses
for matching test faces in both the Familiar condition,
F(1, 19) = 131.93, p < 0.001, $\eta_p^2 = 0.87$, and the
Unfamiliar condition, F(1, 19) = 168.16, p < 0.001,
$\eta_p^2 = 0.90$, with the interaction being driven by a larger
effect for unfamiliar faces. No other effects or
interactions were significant.
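Because every effect here has one numerator degree of freedom, each reported $\eta_p^2$ can be recovered from its F ratio with the standard identity $\eta_p^2 = F \cdot df_{effect} / (F \cdot df_{effect} + df_{error})$. The short check below (function and dictionary names are illustrative) recomputes the Experiment 1 values; each rounds to the reported figure.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom.

    eta_p^2 = SS_effect / (SS_effect + SS_error), and because
    F = (SS_effect / df_effect) / (SS_error / df_error), this equals
    F * df_effect / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and effect sizes reported above for Experiment 1 (all df = 1, 19).
REPORTED_EXP1 = {
    "Test Face": (213.87, 0.92),
    "Familiarity x Test Face": (24.41, 0.56),
    "Test Face, familiar faces": (131.93, 0.87),
    "Test Face, unfamiliar faces": (168.16, 0.90),
}

for effect, (f, eta_reported) in REPORTED_EXP1.items():
    eta = partial_eta_squared(f, df_effect=1, df_error=19)
    print(f"{effect:<28} recomputed {eta:.2f}  reported {eta_reported:.2f}")
```

The same function applies to the later analyses by changing `df_error` (for example, 38 in the combined analysis).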
This is an interesting result. It is clear from the
‘‘exemplar’’ conditions that participants were sensitive
to the faces they had actually seen, giving significantly
more ‘‘present’’ responses to images they had seen over
those they had not. However, what is particularly
interesting is that this effect was exactly replicated for
the ‘‘average face’’ test stimuli: participants were just
as likely to claim that they had seen the average of the
display set as to recognize an image that had actually been shown. This effect is not a simple preference for
averages: Participants claimed to have seen the average
of the particular photos from the display set, making
fewer ‘‘present’’ responses to another average of this
person derived from different photos. Because averages
of a particular face will eventually converge as more
images are included, this is a rather striking result.
It suggests that the average formation is
quite tightly image-bound while, at the same time, not
being due to low-level perceptual averaging (Leib et al.,
2014).
We propose that this averaging process may underlie
face learning. If people are able to extract an average
from different photos of the same person, this is a fast
route to forming a robust representation that can be
used for subsequent recognition of that person.
However, in natural settings, we never see the same face
represented in different ways simultaneously. In order
for this mechanism to be a plausible one for the process
of face learning, it should also be evident following
sequential presentation of images. This is explored in
Experiment 2.
Experiment 2: Sequential presentation
In this experiment, we presented participants with
four images of the same person sequentially. In other
respects, the design was the same as in Experiment 1.
As above, our aim is to establish whether participants
acquire an average representation of multiple images of
the same person. In particular, do they believe they
have seen an average of the set when, in fact, they have
not?
Methods
Participants
A further 20 undergraduate students from the
University of Aberdeen (16 women; age M = 22.1 years,
SD = 4.7 years) volunteered to take part in the study
and received money for their participation. None had
taken part in the previous experiment. All provided
informed consent prior to participation (in accordance
with the ethical standards stated in the 1964
Declaration of Helsinki).
Figure 2. Mean percentage of ‘‘present’’ responses for (a) familiar and (b) unfamiliar test faces. Error bars represent 95% confidence
intervals.
Procedure
The stimuli were those used in Experiment 1. The
procedure was also identical to the first experiment with
one important difference: The four images for each
display set were presented sequentially. In each trial, a
central fixation cross appeared for 1 s. This was
followed by the four images, presented one at a time in
a random order. Each image appeared on-screen for
375 ms (a quarter of the presentation time for all four
images in Experiment 1; Neumann et al., 2013). In
order to avoid the possibility of low-level perceptual
averaging simply due to the overlap in on-screen locations
of the four images (e.g., if all images were
presented centrally), each image appeared with its
center at a random position along the circumference of
a circle of radius 4.2 cm (see Figure 3). Within each
trial, no two images appeared at an angle of less than
30° to each other around this circle. A blank screen
appeared after each image for 375 ms. Prior to the
presentation of the test face, a central fixation cross
appeared on-screen for 1 s in order to highlight for
participants that the sequence had finished and the next
image would be the test face. All other details remained
unchanged from the first experiment.
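One way to generate positions that satisfy the constraints described above is rejection sampling: draw four angles uniformly, keep them only if every pair is at least 30° apart, and place each image center on the 4.2 cm circle. The Procedure does not say how positions were actually generated, so this method, the coordinate convention (cm from screen center), and the names are assumptions.

```python
import math
import random

RADIUS_CM = 4.2           # circle radius given in the Procedure
MIN_SEPARATION_DEG = 30   # no two images closer than 30 degrees around the circle


def angular_gap(a_deg, b_deg):
    """Smallest angle between two directions on the circle, in degrees."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


def sample_positions(n_images=4, rng=None, max_tries=10_000):
    """Rejection-sample image centers on the circle (screen center = origin, cm).

    Returns (angle_deg, x_cm, y_cm) tuples, one per image in presentation order.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        angles = [rng.uniform(0.0, 360.0) for _ in range(n_images)]
        if all(
            angular_gap(a, b) >= MIN_SEPARATION_DEG
            for i, a in enumerate(angles)
            for b in angles[i + 1:]
        ):
            return [
                (a,
                 RADIUS_CM * math.cos(math.radians(a)),
                 RADIUS_CM * math.sin(math.radians(a)))
                for a in angles
            ]
    raise RuntimeError("could not satisfy the separation constraint")


for angle, x, y in sample_positions(rng=random.Random(7)):
    print(f"{angle:6.1f} deg -> ({x:+.2f}, {y:+.2f}) cm")
```

Rejection sampling keeps every accepted configuration uniformly distributed among those that meet the separation rule, at the cost of discarding some draws.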
After completion of the practice and experimental
blocks, the familiarity of the identities was checked as
in Experiment 1. As expected, familiarity with Hollywood
celebrities was high (M = 8.1, SD = 2.1), and
familiarity with Australian celebrities was low (M = 0.7,
SD = 0.9).
Results and discussion
Response data for Experiment 2 are shown in Figure 4.
Data were entered into a 2 (Familiarity: Familiar,
Unfamiliar) × 2 (Image Type: Exemplar, Average) × 2
(Test Face: Matching, Nonmatching) ANOVA. All
factors were within-subjects. We found a significant
main effect of Test Face, F(1, 19) = 113.25, p < 0.001,
$\eta_p^2 = 0.86$, with participants responding ‘‘present’’ more
often for test faces that matched the preceding set (M =
82.8%) than for those that did not (M = 43.5%). We
also found a significant Familiarity × Test Face
interaction, F(1, 19) = 7.56, p = 0.013, $\eta_p^2 = 0.29$. Simple
main effects showed a larger ‘‘present’’ response for
matching test faces in both the Familiar condition,
F(1, 19) = 54.16, p < 0.001, $\eta_p^2 = 0.74$, and the
Unfamiliar condition, F(1, 19) = 119.00, p < 0.001,
$\eta_p^2 = 0.86$, with the interaction being driven by a larger
effect for unfamiliar faces.
We also found a significant Image Type × Test
Face interaction, F(1, 19) = 4.44, p = 0.049, $\eta_p^2 = 0.19$.
Simple main effects showed a larger ‘‘present’’
response for matching test faces in both the Exemplar
condition, F(1, 19) = 101.06, p < 0.001, $\eta_p^2 = 0.84$, and
the Average condition, F(1, 19) = 62.40, p < 0.001,
$\eta_p^2 = 0.77$, with the interaction being driven by a larger
effect for exemplars. No other effects or interactions
were significant.
Once again, we find the same effect as in Experiment
1. Participants are equally willing to claim that they
have seen the average of a set as they are to recognize a
real exemplar. As in the previous experiment, this effect
is tied specifically to the average of the images they
have seen with averages of novel photos being rejected
at a similar rate as novel instances. This seems to be
good evidence for the proposal that viewers automatically
extract the average of a set of faces of the same
person—a mechanism that could plausibly underlie
face learning.
Combined analysis of Experiments 1 and 2
Given the similar patterns of results for the two
experiments, we carried out a mixed ANOVA to
determine whether there was a significant effect of the
type of presentation. Data from the two experiments
were entered into a 2 (Presentation Type: Simultaneous,
Sequential) × 2 (Familiarity: Familiar, Unfamiliar) ×
2 (Image Type: Exemplar, Average) × 2 (Test Face:
Matching, Nonmatching) ANOVA. Presentation
Type was between-subjects, and the remaining factors
were within-subjects. We found no main effect of
Presentation Type, F(1, 38) = 0.75, p = 0.392, $\eta_p^2 = 0.02$,
and no significant interactions involving this factor (all
ps > 0.387).
Figure 3. Example of sequential presentation. Each face in the
sequence appeared on-screen for 375 ms, and the ISI was also
375 ms.
We found a significant main effect of Test Face,
F(1, 38) = 295.73, p < 0.001, $\eta_p^2 = 0.89$, with
participants responding ‘‘present’’ more often for test
faces that matched the preceding set (M = 81.6%) than
for those that did not (M = 42.4%). In addition, we
found a significant main effect of Familiarity, F(1, 38) =
4.69, p = 0.037, $\eta_p^2 = 0.11$, with a larger ‘‘present’’
response for familiar test faces (M = 63.9%) than for
unfamiliar ones (M = 60.1%). There was also an almost
significant main effect of Image Type, F(1, 38) = 4.06,
p = 0.051, $\eta_p^2 = 0.10$, with participants responding
‘‘present’’ more often for average test faces (M = 63.8%)
than for exemplars (M = 60.2%).
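The combined means can be checked against the separate experiments. With 20 participants in each experiment, a pooled mean is simply the average of the two experiment means, and because the design is fully balanced, the two levels of every factor should average to the same grand mean. The sketch below (names illustrative) applies both checks to the values reported above.

```python
def pooled_mean(mean_a, mean_b, n_a=20, n_b=20):
    """Sample-size-weighted mean of two group means (equal n gives a plain average)."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


# Percentage of "present" responses reported for each experiment (20 participants each).
EXP1 = {"matching": 80.4, "nonmatching": 41.3}
EXP2 = {"matching": 82.8, "nonmatching": 43.5}

for key in EXP1:
    print(key, round(pooled_mean(EXP1[key], EXP2[key]), 1))
# matching 81.6
# nonmatching 42.4

# In a balanced design, both levels of any factor average to the same grand mean.
COMBINED_MARGINALS = {
    "Test Face": (81.6, 42.4),
    "Familiarity": (63.9, 60.1),
    "Image Type": (63.8, 60.2),
}
for factor, (level_1, level_2) in COMBINED_MARGINALS.items():
    print(factor, round((level_1 + level_2) / 2, 1))  # 62.0 for each factor
```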
These main effects were qualified by two interactions.
The first was a significant Familiarity × Test
Face interaction, F(1, 38) = 27.13, p < 0.001, $\eta_p^2 = 0.42$.
Simple main effects showed a larger ‘‘present’’ response
for matching test faces in both the Familiar condition,
F(1, 38) = 148.02, p < 0.001, $\eta_p^2 = 0.80$, and the
Unfamiliar condition, F(1, 38) = 280.97, p < 0.001,
$\eta_p^2 = 0.88$, with the interaction being driven by a larger
effect for unfamiliar faces. The second was a significant
Image Type × Test Face interaction, F(1, 38) = 4.82,
p = 0.034, $\eta_p^2 = 0.11$. Simple main effects showed a larger
‘‘present’’ response for average test faces in the Non-
matching condition, F(1, 38) = 7.29, p = 0.010,
$\eta_p^2 = 0.16$, but no difference between averages and exemplars
in the Matching condition, F(1, 38) = 0.03, p = 0.874,
$\eta_p^2 < 0.01$.
To summarize, these results mirrored those found when
each experiment was analyzed separately, as expected
given that there was no effect of presentation type.
In addition, we found that participants responded
‘‘present’’ significantly more often when a nonmatching
average was presented than when a nonmatching
exemplar was presented, while no difference was found for
matching test faces. This result was suggested by the
findings of Experiment 2 and has been confirmed here.
General discussion
We investigated set averaging for both simultaneous
and sequential presentation designs. In contrast with
previous work, we used color images that incorporated
a large amount of variability, and each display set
contained images of only one identity. For both
experiments, we found a consistent pattern of results.
First, participants demonstrated good memory for
exemplars for both methods of presentation. That is to
say, participants were able to report accurately that a
test exemplar was present (approximately 80% correct)
in the display set they had previously seen. Participants
found it harder to correctly report the absence of a test
exemplar (approximately 40% incorrect). This inaccuracy
is higher than in previous work (de Fockert &
Wolfenstein, 2009; Neumann et al., 2013) and is likely
to arise because in the experiments presented here all
four display set images were of the same identity.
Perceiving that a fifth, novel image of the same identity
was not present may be more difficult than comparing
this image with four previous images that each depicted
a different identity, as was the case in previous studies.
Second, we find clear evidence for encoding the
average. In both experiments, participants responded
‘‘present’’ in around 80% of trials in which the test face
was the matching average. This suggests that viewers
formed an average representation of the four images
with the result that they believed they had previously
seen that average when asked. Importantly, participants
were significantly less likely to think they had
seen an average of a different set of four images of the
same identity. This is an important result, demonstrating
that it is not simply any average image that causes
viewers to respond ‘‘present’’—even an average made
up of different images of the same person. Therefore,
participants must be forming an average representation
that is specific to the images they have encountered.
Figure 4. Mean percentage of ‘‘present’’ responses for (a) familiar and (b) unfamiliar test faces. Error bars represent 95% confidence
intervals.
Third, although we find these patterns of results for
both familiar and unfamiliar faces, the sizes of the
effects are larger for unfamiliar identities. Figures 2 and
4 suggest that this may be due to more ‘‘present’’
responses for nonmatching test faces in the familiar
condition. This makes intuitive sense in that, for
familiar identities, even when the test face is nonmatching
(a new image of the person or a new average),
participants are more likely to think they have already
seen the image because they have prior experience with
that identity and so may feel they have seen images that
have not appeared in the experiment. However, given
the relatively small effect size of this interaction, we
would recommend further research before drawing any
conclusions from this result.
Fourth, for the sequential method of presentation,
we found a just significant interaction between Image
Type and Test Face. This effect was also present in the
combined analysis. Inspection of the figures suggests
that viewers are slightly less likely to respond ‘‘present’’
for nonmatching exemplars in comparison with nonmatching
averages. Again, this follows intuition in that
a nonmatching average may appear more similar to the
display set images than a nonmatching exemplar,
resulting in more incorrect ‘‘present’’ responses. However,
we note the very small effect size here.
Other than this slight difference for nonmatching
exemplars and averages, we find no effects due to
Image Type. We find no evidence in the current
experiments to suggest viewers form a strong representation
of the matching average while failing to
represent the individual exemplars. This result is in line
with some research (Neumann et al., 2013) but
contrasts with other work (Haberman & Whitney,
2007, 2009). Although representing both exemplars and
their average appears inefficient as a solution, it may be
that hierarchical representations in working memory
are formed at multiple levels of abstraction (Brady &
Alvarez, 2011). Items in working memory may benefit
from a combination of representations, in which
information about the average can increase accuracy
when exemplar memory is unreliable or inaccurate.
However, there is an alternative interpretation.
Although viewers appear to remember individual
exemplars, it may be that a strong representation of the
average is sufficient for producing this apparent
accuracy. Matching exemplars will, on average, be more
similar to the matching average than nonmatching
exemplars. Therefore, simply by referencing an encoded
average representation, viewers may be able to
discriminate, at least to some extent, between matching
and nonmatching exemplars. Indeed, previous research
provides direct evidence against the idea that individual
exemplars are represented (Ariely, 2001; Corbett &
Oriet, 2011). The designs of the experiments presented
here do not allow for a test of this interpretation, and
so further research is required in order to address this
specifically.
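The alternative account can be illustrated with a toy simulation. Assume, purely for illustration, that each photo is an identity vector plus independent noise of variance $\sigma^2$ per feature. For a set of n photos, a member's expected squared distance from the set average is $\sigma^2 (n-1)/n$ per feature, whereas a new photo of the same person sits at $\sigma^2 (n+1)/n$; with n = 4 that is $0.75\sigma^2$ versus $1.25\sigma^2$. A viewer holding only the average could therefore, on average, separate matching from nonmatching exemplars. The function name and its parameters are hypothetical.

```python
import random


def mean_sq_distance_to_average(n_set=4, dim=20, n_trials=2000, seed=0):
    """Toy check of the average-only account (illustrative assumptions only).

    Each photo is an identity vector plus independent Gaussian noise (sd = 1).
    Returns the mean squared distance per feature from the display-set average
    to (a) a member of the display set and (b) a new photo of the same identity.
    """
    rng = random.Random(seed)
    matching = nonmatching = 0.0
    for _ in range(n_trials):
        identity = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        display = [[x + rng.gauss(0.0, 1.0) for x in identity] for _ in range(n_set)]
        novel = [x + rng.gauss(0.0, 1.0) for x in identity]
        avg = [sum(column) / n_set for column in zip(*display)]
        member = display[rng.randrange(n_set)]
        matching += sum((m - a) ** 2 for m, a in zip(member, avg))
        nonmatching += sum((v - a) ** 2 for v, a in zip(novel, avg))
    scale = n_trials * dim
    return matching / scale, nonmatching / scale


match_d, nonmatch_d = mean_sq_distance_to_average()
print(f"matching exemplar: {match_d:.2f}, nonmatching exemplar: {nonmatch_d:.2f}")
```

With the default settings the two estimates should land close to 0.75 and 1.25. The gap holds only on average; on individual trials a new photo can fall closer to the average than a member of the set does.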
In the current work, ‘‘present’’ responses should only
have been given in 25% of trials: those in which the test
face was a matching exemplar. It is possible that
participants had inflated expectations regarding the
required ratio of ‘‘present’’ responses because of
experience with the practice block (50% correct
‘‘present’’ responses) or psychology experiments more
generally. However, previous research that controlled
for this by informing participants of the correct
frequency of ‘‘present’’ responses suggests that this
possibility is unlikely to account for the results presented
here (experiments 2 and 3, Neumann et al., 2013). In
addition, even if participants were motivated to increase
the number of ‘‘present’’ responses given, this should not
favor any particular condition. As such, this account
fails to explain why matching averages received more
‘‘present’’ responses than nonmatching averages.
Previous research on the ensemble encoding of faces
has mainly focused on simultaneous presentation (de
Fockert & Wolfenstein, 2009; Haberman & Whitney,
2007, 2009; Neumann et al., 2013). However, evidence
also supports the idea that facial expressions (Haberman
et al., 2009) and identities (Leib et al., 2014;
experiment 4, Neumann et al., 2013) are averaged
across sequential presentations. Indeed, this mechanism
appears to be viewpoint-invariant in that the
average identity can be formed by averaging images
that vary in viewing angle (Arnold & Siéroff, 2012;
Leib et al., 2014). Here, by using ambient images, we
provide additional evidence that ensemble encoding
can operate on viewpoint-invariant representations.
Moreover, our stimuli also varied in expression,
lighting, gaze direction, and numerous other real-
world factors. As such, our findings provide a strong
argument that encoding is invariant with regard to
multiple high-level features.
Although previous research has shown that viewers
represent the averages for both familiar (Neumann et
al., 2013) and unfamiliar (de Fockert & Wolfenstein,
2009) sets of images containing different identities, the
current work demonstrates that this is also true for sets
containing different images of the same identity.
Mechanisms that result in the averaging of faces across
identities provide no obvious advantages (for discussion,
see Neumann et al., 2013). However, it is easy to
imagine the benefits of averaging together different
images of the same identity. We know from previous
research that the average of a set of instances can
provide a more stable and robust representation of an
individual (Burton et al., 2005). The formation of a
within-person average may therefore provide a useful
tool that could help explain why viewers perform much
better with familiar faces in both recognition (Bruce, 1986)
and matching (Bruce et al., 1999).
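The claim that an average of several instances gives a more stable representation (Burton et al., 2005) can be illustrated with a toy template-matching simulation. The model is a hypothetical stand-in for real images: each identity is a random feature vector, each photo adds independent image-specific noise, and recognition picks the stored template nearest to a new photo. Averaging n photos shrinks the noise variance in a template by a factor of n, so templates built from averages should identify new photos more accurately than templates built from single photos. The function name and parameter values are illustrative assumptions; real face averages are built from aligned images, which these vectors only abstract.

```python
import numpy as np

def identification_accuracy(n_per_template, n_identities=10, n_features=20,
                            noise_sd=1.5, n_tests=2000, seed=0):
    """Nearest-template identification of new photos (hypothetical toy model)."""
    rng = np.random.default_rng(seed)
    identities = rng.normal(size=(n_identities, n_features))
    # Template for each identity = average of n_per_template noisy photos
    photos = identities[:, None, :] + rng.normal(
        scale=noise_sd, size=(n_identities, n_per_template, n_features))
    templates = photos.mean(axis=1)
    # New, unseen photos of randomly chosen identities
    true_ids = rng.integers(n_identities, size=n_tests)
    tests = identities[true_ids] + rng.normal(scale=noise_sd, size=(n_tests, n_features))
    # Euclidean distance from every test photo to every template
    dists = np.linalg.norm(tests[:, None, :] - templates[None, :, :], axis=2)
    return np.mean(dists.argmin(axis=1) == true_ids)

for n in (1, 4, 16):
    print(f"photos per template = {n:2d}: accuracy = {identification_accuracy(n):.2f}")
```

In this sketch, accuracy on novel photos rises as more photos contribute to each template, which is the sense in which a within-person average is "more robust" than any single image.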
Journal of Vision (2015) 15(4):1, 1–9 Kramer, Ritchie, & Burton 7
Downloaded From: http://jov.arvojournals.org/pdfaccess.ashx?url=/data/Journals/JOV/933740/ on 04/20/2015 Terms of Use:
Although there may be clear advantages to creating
the average representation for a single identity,
previous evidence that we also average across identities
may suggest that the current findings are the result of a
more general ensemble encoding mechanism rather
than anything specific to the creation of stable person
representations. In line with this idea, developmental
prosopagnosics can extract ensemble characteristics
from sets of faces equivalently to controls (Leib et al.,
2012), perhaps suggesting a general process that has
been co-opted for faces. However, the findings that ensemble
encoding of faces is not mediated by low-level features,
luminance cues, or other nonconfigural cues (Haberman
& Whitney, 2009), and that it can operate on
high-level, view-invariant information (Leib et al.,
2014), may suggest at least some specialization for
faces.
In conclusion, we have shown that viewers extract an
average from different images of the same identity
while apparently continuing to represent the individual
exemplars. This process appears unaffected by whether
the images are presented simultaneously or sequentially.
Our findings may provide one account of how
stable representations of identities are formed.
However, representing the average alone remains a very
limited statistical summary, and we recommend further
investigation in order to determine whether other
important information, such as the distribution of a set
of faces (Burton, Kramer, Ritchie, & Jenkins, in press),
is also encoded.
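As a simple illustration of why the mean alone is a limited summary, consider two hypothetical sets of measurements of one facial feature across photos of two people (invented numbers, not data). The sets share a mean but differ in spread, so a representation that stored only the mean could not tell them apart, whereas adding a second statistic such as the standard deviation can. This is only a toy example; the distributional representations discussed by Burton, Kramer, Ritchie, and Jenkins (in press) are richer than a single spread statistic.

```python
import statistics

# Hypothetical measurements of one facial feature across photos of two people
# (invented values for illustration only).
person_a = [0.48, 0.50, 0.52, 0.49, 0.51]
person_b = [0.30, 0.70, 0.45, 0.55, 0.50]

for name, photos in (("A", person_a), ("B", person_b)):
    print(name,
          "mean =", round(statistics.mean(photos), 2),
          "sd =", round(statistics.stdev(photos), 3))
```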
Keywords: set representation, ensemble encoding,
face, identity, averaging
Acknowledgments
The research leading to these results has received
funding from the European Research Council under
the European Union’s Seventh Framework Programme
(FP/2007-2013) / ERC Grant Agreement n.323262, and
from the Economic and Social Research Council, UK
(ES/J022950/1).
Commercial relationships: none.
Corresponding author: Robin S. S. Kramer.
Email: remarknibor@gmail.com.
Address: Department of Psychology, University of
York, York, UK.
References
Albrecht, A. R., & Scholl, B. J. (2010). Perceptually
averaging in a continuous visual world: Extracting
statistical summary representations over time.
Psychological Science, 21, 560–567.
Ariely, D. (2001). Seeing sets: Representation by
statistical properties. Psychological Science, 12,
157–162.
Arnold, G., & Siéroff, E. (2012). Temporal integration
of face view sequences and recognition of novel
views. Visual Cognition, 20, 793–814.
Atchley, P., & Andersen, G. (1995). Discrimination of
speed distributions: Sensitivity to statistical properties.
Vision Research, 35, 3131–3144.
Brady, T. F., & Alvarez, G. A. (2011). Hierarchical
encoding in visual working memory: Ensemble
statistics bias memory for individual items.
Psychological Science, 22, 384–392.
Bruce, V. (1986). Influences of familiarity on the
processing of faces. Perception, 15, 387–397.
Bruce, V., Henderson, Z., Greenwood, K., Hancock, P.
J. B., Burton, A. M., & Miller, P. (1999).
Verification of face identities from images captured
on video. Journal of Experimental Psychology:
Applied, 5, 339–360.
Burton, A. M., Jenkins, R., Hancock, P. J. B., & White,
D. (2005). Robust representations for face recognition:
The power of averages. Cognitive Psychology,
51, 256–284.
Burton, A. M., Kramer, R. S. S., Ritchie, K. L., &
Jenkins, R. (in press). Identity from variation:
Representations of faces derived from multiple
instances. Cognitive Science.
Corbett, J. E., & Oriet, C. (2011). The whole is indeed
more than the sum of its parts: Perceptual
averaging in the absence of individual item
representation. Acta Psychologica, 138, 289–301.
de Fockert, J., & Wolfenstein, C. (2009). Rapid
extraction of mean identity from sets of faces.
Quarterly Journal of Experimental Psychology, 62,
1716–1722.
Fiser, J., & Aslin, R. N. (2001). Unsupervised statistical
learning of higher-order spatial structures from
visual scenes. Psychological Science, 12, 499–504.
Haberman, J., Harp, T., & Whitney, D. (2009).
Averaging facial expression over time. Journal of
Vision, 9(11):1, 1–13, http://www.journalofvision.org/content/9/11/1,
doi:10.1167/9.11.1.
Haberman, J., & Whitney, D. (2007). Rapid extraction
of mean emotion and gender from sets of faces.
Current Biology, 17, R751–R753.
Haberman, J., & Whitney, D. (2009). Seeing the mean:
Ensemble coding for sets of faces. Journal of
Experimental Psychology: Human Perception and
Performance, 35, 718–734.
Haberman, J., & Whitney, D. (2010). The visual system
discounts emotional deviants when extracting
average expression. Attention, Perception, &
Psychophysics, 72, 1825–1838.
Jenkins, R., & Burton, A. M. (2008). 100% accuracy in
automatic face recognition. Science, 319, 435.
Jenkins, R., Burton, A. M., & White, D. (2006). Face
recognition from unconstrained images: Progress
with prototypes. In Proceedings of the 7th IEEE
International Conference on Automatic Face and
Gesture Recognition, Southampton, UK, 10-12 April
(pp. 25–30). Los Alamitos, CA: IEEE Computer
Society.
Jenkins, R., White, D., Van Montfort, X., & Burton,
A. M. (2011). Variability in photos of the same
face. Cognition, 121, 313–323.
Leib, A. Y., Fischer, J., Liu, Y., Qiu, S., Robertson, L.,
& Whitney, D. (2014). Ensemble crowd perception:
A viewpoint-invariant mechanism to represent
average crowd identity. Journal of Vision, 14(8):26,
1–13, http://www.journalofvision.org/content/14/8/26,
doi:10.1167/14.8.26.
Leib, A. Y., Puri, A. M., Fischer, J., Bentin, S.,
Whitney, D., & Robertson, L. (2012). Crowd
perception in prosopagnosia. Neuropsychologia, 50,
1698–1707.
Neumann, M. F., Schweinberger, S. R., & Burton, A.
M. (2013). Viewers extract mean and individual
identity from sets of famous faces. Cognition, 128,
56–63.
Robitaille, N., & Harris, I. M. (2011). When more is
less: Extraction of summary statistics benefits from
larger sets. Journal of Vision, 11(12):18, 1–8,
http://www.journalofvision.org/content/11/12/18,
doi:10.1167/11.12.18.
Sweeny, T. D., & Whitney, D. (2014). Perceiving crowd
attention: Ensemble perception of a crowd’s gaze.
Psychological Science, 25, 1903–1913.
Journal of Vision (2015) 15(4):1, 1–9 Kramer, Ritchie, & Burton 9
Downloaded From: http://jov.arvojournals.org/pdfaccess.ashx?url=/data/Journals/JOV/933740/ on 04/20/2015 Terms of Use:
... Exposure to the ways in which a person's appearance varies is critical (Andrews, Jenkins, Cursiter, & Burton, 2015;Bindemann & Sandford, 2011;Dowsett, Sandford, & Burton, 2016;Menon, White, & Kemp, 2015a,b;Ritchie & Burton, 2017). Exposure to variability in appearance allows the perceiver to represent multiple variations of a person's appearance (Burton, Kramer, Ritchie, & Jenkins, 2016;Young & Burton, 2017) and/or form an average representation of a person's appearance that contains cues that are reliably diagnostic of that identity, excluding cues that are specific to a particular instance (Burton, Jenkins, Hancock, & White, 2005;Kramer, Ritchie, & Burton, 2015). Adults' ability to identify a target from a 30-image lineup improves as they are able to view more images of the target (Dowsett, Sandford, & Burton, 2016;. ...
... One avenue for future research is to examine 4-and 5year-olds' ability to extract average representations of an identity's appearance-a process known as ensemble coding. Average representations contain cues that are diagnostic to identity and eliminate cues that are specific to a particular instance, facilitating recognition of that identity in novel instances (Burton et al., 2005;Kramer et al., 2015). Studies have shown that children as young as 6 years and adults extract average representations of facial identity when viewing multiple images of a newly encountered face (Davis, Matthews, & Mondloch, 2021;Kramer, Ritchie, & Burton, 2015;. ...
... Average representations contain cues that are diagnostic to identity and eliminate cues that are specific to a particular instance, facilitating recognition of that identity in novel instances (Burton et al., 2005;Kramer et al., 2015). Studies have shown that children as young as 6 years and adults extract average representations of facial identity when viewing multiple images of a newly encountered face (Davis, Matthews, & Mondloch, 2021;Kramer, Ritchie, & Burton, 2015;. In these studies, participants viewed four images of an identity, followed by a test image that was (a) one of the displayed images, (b) an average of the displayed images, (c) a new image of the identity, or (d) an average of four new images of the identity. ...
Article
Children under 6 years of age have difficulty recognizing a familiar face across changes in appearance and telling the face apart from similar-looking people. Understanding the process by which newly encountered faces become familiar can provide insights into these difficulties. Exposure to the ways in which a person varies in appearance is one mechanism by which adults and older children (≥6 years) learn new faces. We provide the first investigation of whether this mechanism for face learning functions in younger children. Children aged 4 and 5 years were read two storybooks featuring an unfamiliar character. Participants viewed six images of the character in one story and one image of the character in the other story. After each story, children were asked to identify novel images of the character that were intermixed with images of a similar-looking distractor. Like older children, 4- and 5-year-olds were more sensitive to identity in the 6-image condition, but they also adapted a less conservative criterion. Young children identified more images of the character after viewing six images versus one image. However, many also incorrectly identified more images of the distractor after viewing six images versus one image, an effect not previously found for older children and adults. These results suggest that this mechanism for face learning is not fully refined before 6 years of age.
... Further evidence comes from ensemble coding. Adults and children automatically form an average after briefly viewing four different images of the same identity; they recognize the average when asked if it had been present in a study array (Davis et al., 2021;Kramer et al., 2015;Matthews et al., 2018). ...
... Evidence from ensemble coding also supports the hypothesis that humans store a representation of within-person variability. Participants retain little information about individual exemplars when tested with low-level object categories (i.e., they recognize the average but not the exemplars comprising that average; Ariely, 2001), but recognize both the average and individual exemplars when asked whether a particular face image had been in the study array (Kramer et al., 2015;Matthews et al., 2018;Neumann et al., 2013). The ability to retain a representation of exemplars while forming an average likely allows for a representation of familiar faces that includes both an average and a representation of idiosyncratic variability in appearance. ...
Article
Full-text available
Matching identity in images of unfamiliar faces is error prone, but we can easily recognize highly variable images of familiar faces – even images taken decades apart. Recent theoretical development based on computational modelling can account for how we recognize extremely variable instances of the same identity. We provide complementary behavioural data by examining older adults’ representation of older celebrities who were also famous when young. In Experiment 1, participants completed a long‐lag repetition priming task in which primes and test stimuli were the same age or different ages. In Experiment 2, participants completed an identity after effects task in which the adapting stimulus was an older or young photograph of one celebrity and the test stimulus was a morph between the adapting identity and a different celebrity; the adapting stimulus was the same age as the test stimulus on some trials (e.g., both old) or a different age (e.g., adapter young, test stimulus old). The magnitude of priming and identity after effects were not influenced by whether the prime and adapting stimulus were the same age or different age as the test face. Collectively, our findings suggest that humans have one common mental representation for a familiar face (e.g., Paul McCartney) that incorporates visual changes across decades, rather than multiple age‐specific representations. These findings make novel predictions for state‐of‐the‐art algorithms (e.g., Deep Convolutional Neural Networks).
... This representation allows the perceiver to extrapolate beyond the stored instances to recognize novel instances of the same person. Second, individual instances are integrated into an average representation of the identity that contains cues that are diagnostic of identity but excludes non-diagnostic cues (i.e., those that are specific to a particular instance; Burton et al., 2005;Kramer et al., 2015). Indeed, rapidly extracting an average representation has been claimed as a route to face learning. ...
... We provide evidence that this domain-general skill is preserved with aging. It is noteworthy that both young and older adults recognized both the individual exemplars and the average of those exemplars, consistent with other studies examining ensemble coding of facial identity (Davis et al., 2020;Kramer et al., 2015;Neumann et al., 2013). This is in contrast to studies examining ensemble coding of low-level object categories, in which observers retain little information about individual exemplars (Ariely, 2001). ...
Article
Recent research has emphasized the importance of using images that incorporate natural variability in appearance (i.e., ambient images) to assess face learning and recognition. Across five tasks, we provide the first examination of older adults’ face learning and recognition in ambient images. Young and older adults showed comparable performance in three tasks: when recognizing a familiar face across ambient images, extracting average representations of an identity (i.e., ensemble coding) and learning a new identity from multiple images in a perceptual task. However, compared to young adults, older adults have even more difficulty matching images of unfamiliar faces and despite showing comparable benefits in sensitivity, older adults adopted a more conservative response bias after being exposed to low variability in appearance in a face memory task, resulting in them failing to recognize novel instances of a newly learned identity. We discuss the implications of our findings for older adults and the insights our findings provide for understanding both the development of face learning and recognition in childhood and the own-race recognition advantage.
... De Fockert and Wolfenstein (2009) also reported observers were more likely to incorrectly select the mean identity of a set of four unfamiliar faces as present versus an individual identity that was present. This finding was later replicated for sets of four familiar famous faces (Neumann et al., 2013), and four exemplars of the same celebrity regardless of simultaneous or sequential presentation (Kramer et al., 2015). ...
Book
This Element outlines the recent understanding of ensemble representations in perception in a holistic way aimed to engage the general audience, novel and expert alike. The Element highlights the ubiquitous nature of this summary process, paving the way for a discussion of the theoretical and cortical underpinnings, and why ensemble encoding should be considered a basic, inherently necessary component of human perception. Following an overview of the topic, including a brief history of the field, the Element introduces overarching themes and a corresponding outline of the present work.
... This prototypical representation is considered to be the statistical average of all seen faces, and provides the basis for a neural 'face space' , where other faces can be encoded relative to this prototype (Valentine, 1991). Indeed, evidence suggests that the formation of face prototypes is both rapid and implicit (de Fockert & Wolfenstein, 2009;Kramer et al., 2015;Neumann et al., 2013;Or & Wilson, 2013), and may even take place in infancy (de Haan et al., 2001). ...
Article
Full-text available
Facial first impressions are known to influence how we behave towards others. As a result of the COVID-19 pandemic, we often view incomplete faces due to the commonplace wearing of face masks. Previous research has shown that perceptions of attractiveness are often increased due to these coverings, with initial evidence suggesting that this may be caused by viewers using a mental representation of the average face to complete any missing information. Here, we directly address this hypothesis by presenting participants with incomplete faces (either the lower or upper half removed) and asking them to decide how they thought the actual, full face looked. Participants were able to manipulate the missing half of the face onscreen by increasing or decreasing the averageness of its shape. Our results demonstrated that participants did not select the original versions of the faces but instead chose more average versions when manipulating both the lower and upper face. Further, the typicality of the original image influenced responses, with less typical faces (in comparison with more typical ones) being completed using an even more average version of the missing half of the faces. Taken together, these findings provide the first direct evidence that people utilise an average/typical internal representation when inferring information about incomplete faces. This result has theoretical importance in terms of visual perception, as well as real-world relevance in a time where face masks are commonplace due to the COVID-19 pandemic.
... However, individual representations can be as precise as mean representations. Kramer et al. (2015) reported that the mean representation of a set of familiar faces was as accurate as the individual representations. To explain this finding, it was suggested that processing individual familiar faces requires fewer attentional resources. ...
Article
Full-text available
Individuals can perceive the mean emotion or mean identity of a group of faces. It has been considered that individual representations are discarded when extracting a mean representation; for example, the “element-independent assumption” asserts that the extraction of a mean representation does not depend on recognizing or remembering individual items. The “element-dependent assumption” proposes that the extraction of a mean representation is closely connected to the processing of individual items. The processing mechanism of mean representations and individual representations remains unclear. The present study used a classic member-identification paradigm and manipulated the exposure time and set size to investigate the effect of attentional resources allocated to individual faces on the processing of both the mean emotion representation and individual representations in a set and the relationship between the two types of representations. The results showed that while the precision of individual representations was affected by attentional resources, the precision of the mean emotion representation did not change with it. Our results indicate that two different pathways may exist for extracting a mean emotion representation and individual representations and that the extraction of a mean emotion representation may have higher priority. Moreover, we found that individual faces in a group could be processed to a certain extent even under extremely short exposure time and that the precision of individual representations was relatively poor but individual representations were not discarded.
... We argue that when face information and contextual information compete for attention, prioritizing the processing of faces while ignoring contextual distractions might particularly facilitate two aspects of face recognition. First, it might help the visual system to filter what information is critical for identification and what is not so as to enhance the accurate abstraction of an average face representation across the various factors that can alter a person's appearance (i.e., ensemble coding; Burton et al., 2005;Davis et al., 2021;Kramer et al., 2015), consistent with Bruce's notion of learning "stability from variation" (Bruce, 1994;Young & Burton, 2018). However, how ensemble coding is related to the use of internal and external cues requires further investigation. ...
Article
Full-text available
Everyday face recognition presents a difficult challenge because faces vary naturally in appearance due to changes in lighting, expression, viewing angle, and hairstyle. We know little about how humans develop the ability to learn faces despite naturalistic facial variability. In the current study, we provide the first examination of attentional mechanisms underlying adults’ and infants’ learning of naturally-varying faces. Adults (n = 48) and 6- to 12-month-old infants (n = 48) viewed videos of models reading a storybook, whose appearance contained either high or low variability, then viewed the learned face paired with a novel face. Infants showed adult-like prioritization of face over non-face regions; both age groups fixated the face region more in the high than low variability condition. Overall, however, infants showed less ability to resist contextual distractions during learning, which potentially contributed to their lack of discrimination between the learned and novel faces. Mechanisms underlying face learning across natural variability are discussed.
Article
Familiar faces can be confidently recognized despite sometimes radical changes in their appearance. Exposure to within-person variability—differences in facial characteristics over successive encounters—contributes to face familiarization. Research also suggests that viewers create mental averages of the different views of faces they encounter while learning them. Averaging over within-person variability is thus a promising mechanism for face familiarization. In Experiment 1, 153 Canadian undergraduates (88 female; age: M = 21 years, SD = 5.24) learned six target identities from eight different photos of each target interspersed among 32 distractor identities. Face-matching accuracy improved similarly irrespective of awareness of the target’s identity, confirming that target faces presented among distractors can be learned incidentally. In Experiment 2, 170 Canadian undergraduates (125 female; age: M = 22.6 years, SD = 6.02) were tested using a novel indirect measure of learning. The results show that viewers update a mental average of a person’s face as it becomes learned. Our findings are the first to show how averaging within-person variability over time leads to face familiarization.
Article
Full-text available
An ensemble or statistical summary can be extracted from facial expressions presented in different spatial locations simultaneously. However, how such complicated objects are represented in the mind is not clear. It is known that the aftereffect of facial expressions, in which prolonged viewing of facial expressions biases the perception of subsequent facial expressions of the same category, occurs only when a visual representation is formed. Using this methodology, we examined whether an ensemble can be represented with visualized information. Experiment 1 revealed that the presentation of multiple facial expressions biased the perception of subsequent facial expressions to less happy as much as the presentation of a single face did. Experiment 2 compared the presentation of faces comprising strong and weak intensities of emotional expressions with an individual face as the adaptation stimulus. The results indicated that the perceptual biases were found after the presentation of four faces and a strong single face, but not after the weak single face presentation. Experiment 3 employed angry expressions, a distinct category from the test expression used as an adaptation stimulus; no aftereffect was observed. Finally, Experiment 4 clearly demonstrated the perceptual bias with a higher number of faces. Altogether, these results indicate that an ensemble average extracted from multiple faces leads to the perceptual bias, and this effect is similar in terms of its properties to that of a single face. This supports the idea that an ensemble of faces is represented with visualized information as a single face.
Article
It is well-established that individuals are better at recognising faces of their own-race compared to other-races, however there is ongoing debate regarding the perceptual mechanisms that may be involved and therefore sensitive to face-race. Here we ask whether serial dependence of facial identity, a bias where the perception of a face’s identity is biased towards a previously presented face, shows an other-race effect. Serial dependence is associated with face recognition ability and appears to operate on high-level, face-selective representations, like other candidate mechanisms (e.g., holistic coding). We therefore expected to find an other-race effect for serial dependence for our Caucasian and Asian participants. While participants showed robust effects of serial dependence for all faces, only Caucasian participants showed stronger serial dependence for own-race faces. Intriguingly, we found that individual variation in own-race, but not other-race, serial dependence was significantly associated with face recognition abilities. Preliminary evidence also suggested that other-race contact is associated with other-race serial dependence. In conclusion, though we did not find an overall difference in serial dependence for own versus other-race faces in both participant groups, our results highlight that this bias may be functionally different for own versus other-race faces and sensitive to racial experience.
Article
Full-text available
Research in face recognition has tended to focus on discriminating between individuals, or "telling people apart." It has recently become clear that it is also necessary to understand how images of the same person can vary, or "telling people together." Learning a new face, and tracking its representation as it changes from unfamiliar to familiar, involves an abstraction of the variability in different images of that person's face. Here, we present an application of principal components analysis computed across different photos of the same person. We demonstrate that people vary in systematic ways, and that this variability is idiosyncratic-the dimensions of variability in one face do not generalize well to another. Learning a new face therefore entails learning how that face varies. We present evidence for this proposal and suggest that it provides an explanation for various effects in face recognition. We conclude by making a number of testable predictions derived from this framework. © 2015 Cognitive Science Society, Inc.
Article
Full-text available
In nearly every interpersonal encounter, people readily gather socio-visual cues to guide their behavior. Intriguingly, social information is most effective in directing behavior when it is perceived in crowds. For example, the shared gaze of a crowd is more likely to direct attention than is a single person's gaze. Are people equipped with mechanisms to perceive a crowd's gaze as an ensemble? Here, we provide the first evidence that the visual system extracts a summary representation of a crowd's attention; observers rapidly pooled information from multiple crowd members to perceive the direction of a group's collective gaze. This pooling occurred in high-level stages of visual processing, with gaze perceived as a global-level combination of information from head and pupil rotation. These findings reveal an important and efficient mechanism for assessing crowd gaze, which could underlie the ability to perceive group intentions, orchestrate joint attention, and guide behavior.
Article
Full-text available
Individuals can rapidly and precisely judge the average of a set of similar items, including both low-level (Ariely, 2001) and high-level objects (Haberman & Whitney, 2007). However, to date, it is unclear whether ensemble perception is based on viewpoint-invariant object representations. Here, we tested this question by presenting participants with crowds of sequentially presented faces. The number of faces in each crowd and the viewpoint of each face varied from trial to trial. This design required participants to integrate information from multiple viewpoints into one ensemble percept. Participants reported the mean identity of crowds (e.g., family resemblance) using an adjustable, forward-oriented test face. Our results showed that participants accurately perceived the mean crowd identity even when required to incorporate information across multiple face orientations. Control experiments showed that the precision of ensemble coding was not solely dependent on the length of time participants viewed the crowd. Moreover, control analyses demonstrated that observers did not simply sample a subset of faces in the crowd but rather integrated many faces into their estimates of average crowd identity. These results demonstrate that ensemble perception can operate at the highest levels of object recognition after 3-D viewpoint-invariant faces are represented.
Article
When faces are learned from rotating view sequences, novel views may be recognized by matching them with an integrated representation of the sequence or with individual views. An integrated-representation process should benefit from short view durations, and thus from the inclusion of views in a short temporal window, allowing the distribution of attention over the entire sequence. A view-matching process should benefit from long view durations, allowing the attention to focus on each view. In a sequential comparison task, we tested the recognition of learned and novel interpolated and extrapolated views after learning faces from rapid and slow sequences (240 ms or 960 ms for each view). We found a superiority of rapid over slow sequences, in favour of the integrated-representation hypothesis. In addition, the recognition pattern for the different viewpoints in the sequence depended on the absence or presence of extrapolated views, showing a bias of the distribution of attention.
When viewers are shown sets of similar objects (for example, circles), they may extract summary information (e.g., average size) while retaining almost no information about the individual items. A similar observation can be made with sets of unfamiliar faces: Viewers tend to merge identity or expression information from the set exemplars into a single abstract representation, the set average. Here, across four experiments, sets of well-known, famous faces were presented. In response to a subsequent probe, viewers recognized the individual faces very accurately. However, they also reported having seen a merged 'average' of these faces. These findings suggest abstraction of set characteristics even in circumstances that favor individuation of the items. Moreover, the present data suggest that, although seemingly incompatible, exemplar and average representations co-exist for sets consisting of famous faces. This result suggests that representations are simultaneously formed at multiple levels of abstraction.
Four experiments investigated matching of unfamiliar target faces taken from high-quality video against arrays of photographs. In Experiment 1, targets were present in 50% of arrays. Accuracy was poor and worsened when viewpoint and expression differed between target and array faces. In Experiment 2, targets were present in every array, but performance remained highly error prone. In Experiment 3, short video clips of the targets were shown and replayed as often as necessary, but performance was only slightly better than in Experiment 2. Experiment 4 showed that matching was dominated by external face features. The results urge caution in the use of video images to identify people who have committed crimes. Superficial impressions of resemblance or dissimilarity between face images can be highly misleading.
Two experiments examining the ability of human observers to detect differences in the statistical properties underlying velocity distributions were conducted. A four-alternative forced-choice methodology, using four simultaneous velocity distributions, was used in both experiments. In the first experiment, the value of one statistical moment (mean, variance, skewness, or kurtosis) was manipulated while the others were held constant. The subjects' task was to determine which of the four velocity distributions contained the dissimilar value. In the second experiment, only the latter three moments were examined. A similar procedure was used; however, feedback was given after each trial to maximize observer performance. The results from both experiments indicate that human observers can reliably detect differences in both mean and variance information underlying velocity distributions. The results of this research have important implications for image segmentation and the detection of heading from optic flow.
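The four statistical moments manipulated in these experiments can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the authors' stimulus or analysis code: `moments` uses population (biased) estimators and reports kurtosis as excess kurtosis, and `odd_one_out` is a hypothetical idealized observer that sees exact sample moments and picks, from four distributions, the one whose chosen moment lies farthest from the mean of the other three.

```python
from statistics import fmean


def moments(samples: list[float]) -> dict[str, float]:
    """First four moments of a set of velocities (population estimators).

    Kurtosis is returned as excess kurtosis, so a normal distribution
    scores about 0. Assumes the samples are not all identical.
    """
    n = len(samples)
    mu = fmean(samples)
    deviations = [x - mu for x in samples]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment (variance)
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    return {
        "mean": mu,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


def odd_one_out(distributions: list[list[float]], moment: str) -> int:
    """Hypothetical ideal observer for a four-alternative forced choice.

    Returns the index of the distribution whose value on `moment` deviates
    most from the mean of the remaining distributions.
    """
    values = [moments(d)[moment] for d in distributions]

    def deviation(i: int) -> float:
        others = [v for j, v in enumerate(values) if j != i]
        return abs(values[i] - fmean(others))

    return max(range(len(values)), key=deviation)


# Same mean, skewness, and kurtosis; only the variance of the third set differs.
narrow = [4.0, 5.0, 6.0]
wide = [3.0, 5.0, 7.0]
print(odd_one_out([narrow, narrow, wide, narrow], "variance"))  # 2
```

Human observers are of course noisier than this idealized chooser; the sketch only shows what "one moment differs while the others are held constant" means for the stimulus sets.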
Despite several processing limitations that have been identified in the visual system, research shows that statistical information about a set of objects can be perceived as accurately as information about a single object. It has been suggested that extraction of summary statistics represents a different mode of visual processing, one that employs a parallel mechanism free of capacity limitations. Here, using reaction time measures, we demonstrate that increasing the number of stimuli in the set results in faster reaction times and better accuracy when estimating the mean of the set. These results provide clear evidence that extraction of summary statistics relies on a distributed attention mode that operates across the whole display at once, and that this process benefits from larger samples across which the summary statistics are calculated.
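Why larger samples could help a mean estimate can be illustrated with a toy simulation. This is a hedged sketch under an assumption the abstract does not state: each item is encoded with independent Gaussian noise, and the observer reports the mean of the noisy encodings. `mean_estimate_error` and its parameters are hypothetical, and the model says nothing about reaction times or about the distributed-attention mechanism itself; it only shows how pooling more noisy samples can make a mean estimate more accurate.

```python
import random
from statistics import fmean


def mean_estimate_error(set_size: int, noise_sd: float = 1.0,
                        trials: int = 5000, seed: int = 1) -> float:
    """Mean absolute error of a hypothetical noisy-averaging observer.

    On each trial, `set_size` item values are drawn at random, each value is
    encoded with independent Gaussian noise (SD = noise_sd), and the
    observer's estimate is the mean of the encoded values. Error is measured
    against the true mean of the displayed set.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    total = 0.0
    for _ in range(trials):
        items = [rng.uniform(0.0, 10.0) for _ in range(set_size)]
        encoded = [x + rng.gauss(0.0, noise_sd) for x in items]
        total += abs(fmean(encoded) - fmean(items))
    return total / trials


# Under this model the error shrinks roughly as 1 / sqrt(set_size).
for n in (2, 4, 8, 16):
    print(n, round(mean_estimate_error(n), 3))
```

Because the item values cancel out of the error, only the encoding noise matters here; whether real observers pool items in this way is exactly what experiments like the one above test.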
We tested Ariely's (2001) proposal that the visual system represents the overall statistical properties of sets of objects against alternative accounts of rapid averaging involving sub-sampling strategies. In four experiments, observers could rapidly extract the mean size of a set of circles presented in an RSVP sequence, but could not reliably identify individual members. Experiment 1 contrasted performance on a member identification task with performance on a mean judgment task, and showed that the tasks could be dissociated based on whether the test probe was presented before or after the sequence, suggesting that member identification and mean judgment are subserved by different mechanisms. In Experiment 2, we confirmed that when given a choice between a probe corresponding to the mean size of the set and a foil corresponding to the mean of the smallest and largest items only, observers preferred the former, even when explicitly instructed to average only the smallest and largest items. Experiment 3 showed that a test item corresponding to the mean size of the set could be reliably discriminated from a foil, but the largest item in the set, differing by an equivalent amount, could not. In Experiment 4, observers rejected test items dissimilar to the mean size of the set in a member identification task, favoring test items that corresponded to the mean of the set over items that were actually shown. These findings suggest that mean representation is accomplished without explicitly encoding individual items.
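The probe and foil contrasted in Experiment 2, and the member-identification result of Experiment 4, can be sketched in a few lines. This is an illustration only: each circle's size is treated as a single number (for example, its diameter), which is an assumption because the abstract does not say how size was scaled, and the function names and the example sequence are hypothetical.

```python
from statistics import fmean


def mean_probe(sizes: list[float]) -> float:
    """Probe matching the mean size of the whole set."""
    return fmean(sizes)


def extremes_foil(sizes: list[float]) -> float:
    """Foil matching the mean of only the smallest and largest items."""
    return (min(sizes) + max(sizes)) / 2


def was_shown(value: float, sizes: list[float], tol: float = 1e-9) -> bool:
    """True if `value` matches one of the items actually presented."""
    return any(abs(value - s) <= tol for s in sizes)


# Hypothetical RSVP sequence of circle sizes (arbitrary units).
sequence = [10.0, 12.0, 13.0, 21.0]
print(mean_probe(sequence))                        # 14.0
print(extremes_foil(sequence))                     # 15.5
print(was_shown(mean_probe(sequence), sequence))   # False
```

In this example the mean probe (14.0) matches none of the displayed sizes, so preferring it in a member identification task, as observers did in Experiment 4, amounts to choosing an item that was never shown.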